The process and utility of classification and regression tree methodology in nursing research
Kuhn, Lisa; Page, Karen; Ward, John; Worrall-Carter, Linda
2014-01-01
Aim: This paper presents a discussion of classification and regression tree analysis and its utility in nursing research. Background: Classification and regression tree analysis is an exploratory research method used to illustrate associations between variables not suited to traditional regression analysis. Complex interactions are demonstrated between covariates and variables of interest in inverted tree diagrams. Design: Discussion paper. Data sources: English language literature was sourced from eBooks, Medline Complete and CINAHL Plus databases, Google and Google Scholar, hard copy research texts and retrieved reference lists for terms including classification and regression tree* and derivatives and recursive partitioning from 1984–2013. Discussion: Classification and regression tree analysis is an important method used to identify previously unknown patterns amongst data. Whilst there are several reasons to embrace this method as a means of exploratory quantitative research, issues regarding quality of data as well as the usefulness and validity of the findings should be considered. Implications for Nursing Research: Classification and regression tree analysis is a valuable tool to guide nurses to reduce gaps in the application of evidence to practice. With the ever-expanding availability of data, it is important that nurses understand the utility and limitations of the research method. Conclusion: Classification and regression tree analysis is an easily interpreted method for modelling interactions between health-related variables that would otherwise remain obscured. Knowledge is presented graphically, providing insightful understanding of complex and hierarchical relationships in an accessible and useful way to nursing and other health professions. PMID:24237048
The process and utility of classification and regression tree methodology in nursing research.
Kuhn, Lisa; Page, Karen; Ward, John; Worrall-Carter, Linda
2014-06-01
This paper presents a discussion of classification and regression tree analysis and its utility in nursing research. Classification and regression tree analysis is an exploratory research method used to illustrate associations between variables not suited to traditional regression analysis. Complex interactions are demonstrated between covariates and variables of interest in inverted tree diagrams. Discussion paper. English language literature was sourced from eBooks, Medline Complete and CINAHL Plus databases, Google and Google Scholar, hard copy research texts and retrieved reference lists for terms including classification and regression tree* and derivatives and recursive partitioning from 1984-2013. Classification and regression tree analysis is an important method used to identify previously unknown patterns amongst data. Whilst there are several reasons to embrace this method as a means of exploratory quantitative research, issues regarding quality of data as well as the usefulness and validity of the findings should be considered. Classification and regression tree analysis is a valuable tool to guide nurses to reduce gaps in the application of evidence to practice. With the ever-expanding availability of data, it is important that nurses understand the utility and limitations of the research method. Classification and regression tree analysis is an easily interpreted method for modelling interactions between health-related variables that would otherwise remain obscured. Knowledge is presented graphically, providing insightful understanding of complex and hierarchical relationships in an accessible and useful way to nursing and other health professions. © 2013 The Authors. Journal of Advanced Nursing Published by John Wiley & Sons Ltd.
Anantha M. Prasad; Louis R. Iverson; Andy Liaw
2006-01-01
We evaluated four statistical models - Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS) - for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model.
Park, Ji Hyun; Kim, Hyeon-Young; Lee, Hanna; Yun, Eun Kyoung
2015-12-01
This study compares the performance of the logistic regression and decision tree analysis methods for assessing the risk factors for infection in cancer patients undergoing chemotherapy. The subjects were 732 cancer patients who were receiving chemotherapy at K university hospital in Seoul, Korea. The data were collected between March 2011 and February 2013 and were processed for descriptive analysis, logistic regression and decision tree analysis using the IBM SPSS Statistics 19 and Modeler 15.1 programs. The most common risk factors for infection in cancer patients receiving chemotherapy were identified as alkylating agents, vinca alkaloids and underlying diabetes mellitus. The logistic regression model achieved a sensitivity of 66.7% and a specificity of 88.9%. The decision tree analysis achieved a sensitivity of 55.0% and a specificity of 89.0%. Overall classification accuracy was 88.0% for the logistic regression and 87.2% for the decision tree analysis. The logistic regression analysis thus showed higher sensitivity and classification accuracy. Therefore, logistic regression analysis is concluded to be the more effective and useful method for establishing an infection prediction model for patients undergoing chemotherapy. Copyright © 2015 Elsevier Ltd. All rights reserved.
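The three measures compared in this abstract (sensitivity, specificity and overall classification accuracy) all come from the same confusion matrix. The following is a minimal sketch of how they are computed; the function name and the counts are hypothetical illustrations and are not the study's data.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: infected patients predicted as infected / not infected
    tn/fp: non-infected patients predicted as not infected / infected
    """
    sensitivity = tp / (tp + fn)          # share of true cases that were detected
    specificity = tn / (tn + fp)          # share of non-cases correctly ruled out
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts chosen only for illustration (NOT taken from the study).
sens, spec, acc = classification_metrics(tp=30, fn=10, tn=80, fp=20)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

When the outcome is rare, a model can have high accuracy and specificity while missing many true cases, which is why the abstract reports sensitivity separately.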
Lo, Benjamin W Y; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A H
2016-01-01
Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56-2.45, P < 0.01). A clinically useful classification tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH.
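Recursive partitioning, as described in this abstract, repeatedly looks for the cut point that makes the resulting subgroups more homogeneous. The sketch below shows one such step for a single numeric predictor, using Gini impurity as the homogeneity measure. Gini is a common choice in CART software, but the abstract does not state which criterion was used, so treat that as an assumption. The ages and outcomes are invented toy values.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly homogeneous)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best binary split on one predictor.

    The threshold is None if no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_score = None, gini(labels)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical predictor values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_threshold, best_score


# Toy data (hypothetical): age in years and a dichotomized outcome.
ages = [30, 40, 50, 60, 70, 80]
outcome = ["good", "good", "good", "poor", "poor", "poor"]
print(best_split(ages, outcome))
```

A full tree applies the same search to every predictor, keeps the best split, and then recurses into each child node until a stopping rule is met.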
Louis R Iverson; Anantha M. Prasad; Mark W. Schwartz
2005-01-01
We predict current distribution and abundance for tree species present in eastern North America, and subsequently estimate potential suitable habitat for those species under a changed climate with 2 x CO2. We used a series of statistical models (i.e., Regression Tree Analysis (RTA), Multivariate Adaptive Regression Splines (MARS), Bagging Trees (...
Differences in Risk Factors for Rotator Cuff Tears between Elderly Patients and Young Patients.
Watanabe, Akihisa; Ono, Qana; Nishigami, Tomohiko; Hirooka, Takahiko; Machida, Hirohisa
2018-02-01
It has been unclear whether the risk factors for rotator cuff tears are the same at all ages or differ between young and older populations. In this study, we examined the risk factors for rotator cuff tears using classification and regression tree analysis as a method of nonlinear regression analysis. There were 65 patients in the rotator cuff tears group and 45 patients in the intact rotator cuff group. Classification and regression tree analysis was performed to predict rotator cuff tears. The target factor was rotator cuff tears; explanatory variables were age, sex, trauma, and critical shoulder angle ≥35°. In the classification and regression tree analysis, the tree was first divided at age 64. For patients aged ≥64, the tree was divided at trauma. For patients aged <64, the tree was divided at critical shoulder angle ≥35°. The odds ratio for critical shoulder angle ≥35° was significant for all ages (5.89) and for patients aged <64 (10.3), while trauma was a significant factor only for patients aged ≥64 (5.13). Age, trauma, and critical shoulder angle ≥35° were related to rotator cuff tears in this study. However, these risk factors showed different trends according to age group rather than a linear relationship.
What Satisfies Students?: Mining Student-Opinion Data with Regression and Decision Tree Analysis
ERIC Educational Resources Information Center
Thomas, Emily H.; Galambos, Nora
2004-01-01
To investigate how students' characteristics and experiences affect satisfaction, this study uses regression and decision tree analysis with the CHAID algorithm to analyze student-opinion data. A data mining approach identifies the specific aspects of students' university experience that most influence three measures of general satisfaction. The…
ERIC Educational Resources Information Center
Koon, Sharon; Petscher, Yaacov
2015-01-01
The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…
Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson
2010-08-01
Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this review was to assess machine learning alternatives to logistic regression, which may accomplish the same goals but with fewer assumptions or greater accuracy. We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (classification and regression trees [CART]), and meta-classifiers (in particular, boosting). Although the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and, to a lesser extent, decision trees (particularly CART), appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. Copyright (c) 2010 Elsevier Inc. All rights reserved.
Modeling vertebrate diversity in Oregon using satellite imagery
NASA Astrophysics Data System (ADS)
Cablk, Mary Elizabeth
Vertebrate diversity was modeled for the state of Oregon using a parametric approach to regression tree analysis. This exploratory data analysis effectively modeled the non-linear relationships between vertebrate richness and phenology, terrain, and climate. Phenology was derived from time-series NOAA-AVHRR satellite imagery for the year 1992 using two methods: principal component analysis and derivation of EROS data center greenness metrics. These two measures of spatial and temporal vegetation condition incorporated the critical temporal element in this analysis. The first three principal components were shown to contain spatial and temporal information about the landscape and discriminated phenologically distinct regions in Oregon. Principal components 2 and 3, 6 greenness metrics, elevation, slope, aspect, annual precipitation, and annual seasonal temperature difference were investigated as correlates to amphibians, birds, all vertebrates, reptiles, and mammals. The variation explained by the regression tree for each taxon was: amphibians (91%), birds (67%), all vertebrates (66%), reptiles (57%), and mammals (55%). Spatial statistics were used to quantify the pattern of each taxon and assess the validity of the resulting predictions from regression tree models. Regression tree analysis was relatively robust against spatial autocorrelation in the response data, and graphical results indicated that the models were well fit to the data.
ERIC Educational Resources Information Center
Brabant, Marie-Eve; Hebert, Martine; Chagnon, Francois
2013-01-01
This study explored the clinical profiles of 77 female teenager survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideations. Analyses revealed that 64% of participants experienced suicidal ideations. Findings from classification and regression tree analysis indicated that depression,…
ERIC Educational Resources Information Center
Thomas, Emily H.; Galambos, Nora
To investigate how students' characteristics and experiences affect satisfaction, this study used regression and decision-tree analysis with the CHAID algorithm to analyze student opinion data from a sample of 1,783 college students. A data-mining approach identifies the specific aspects of students' university experience that most influence three…
Jose F. Negron
1998-01-01
Infested and uninfested areas within Douglas-fir, Pseudotsuga menziesii (Mirb.) Franco, stands affected by the Douglas-fir beetle, Dendroctonus pseudotsugae Hopk., were sampled in the Colorado Front Range, CO. Classification tree models were built to predict probabilities of infestation. Regression trees and linear regression analysis were used to model amount of tree...
Modeling time-to-event (survival) data using classification tree analysis.
Linden, Ariel; Yarnold, Paul R
2017-12-01
Time to the occurrence of an event is often studied in health research. Survival analysis differs from other designs in that follow-up times for individuals who do not experience the event by the end of the study (called censored) are accounted for in the analysis. Cox regression is the standard method for analysing censored data, but the assumptions required of these models are easily violated. In this paper, we introduce classification tree analysis (CTA) as a flexible alternative for modelling censored data. Classification tree analysis is a "decision-tree"-like classification model that provides parsimonious, transparent (ie, easy to visually display and interpret) decision rules that maximize predictive accuracy, derives exact P values via permutation tests, and evaluates model cross-generalizability. Using empirical data, we identify all statistically valid, reproducible, longitudinally consistent, and cross-generalizable CTA survival models and then compare their predictive accuracy to estimates derived via Cox regression and an unadjusted naïve model. Model performance is assessed using integrated Brier scores and a comparison between estimated survival curves. The Cox regression model best predicts average incidence of the outcome over time, whereas CTA survival models best predict either relatively high, or low, incidence of the outcome over time. Classification tree analysis survival models offer many advantages over Cox regression, such as explicit maximization of predictive accuracy, parsimony, statistical robustness, and transparency. Therefore, researchers interested in accurate prognoses and clear decision rules should consider developing models using the CTA-survival framework. © 2017 John Wiley & Sons, Ltd.
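Model performance in this abstract is assessed with integrated Brier scores. As a simplified illustration, the sketch below computes the plain Brier score at a single time point: the mean squared difference between predicted event probabilities and observed 0/1 outcomes. It deliberately ignores censoring weights and integration over time, both of which the integrated version requires. The function name and data are hypothetical.

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Lower is better; 0.0 is a perfect forecast. This single-time-point version
    does NOT handle censored observations.
    """
    if len(predicted_probs) != len(outcomes):
        raise ValueError("inputs must have the same length")
    if not outcomes:
        raise ValueError("inputs must not be empty")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted_probs, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical predictions for four individuals (1 = event occurred by time t).
print(brier_score([0.9, 0.2, 0.7, 0.1], [1, 0, 1, 0]))
```

Because the score penalizes confident wrong predictions heavily, it rewards calibration as well as discrimination.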
Regression analysis using dependent Polya trees.
Schörgendorfer, Angela; Branscum, Adam J
2013-11-30
Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.
L.R. Iverson; A.M. Prasad; A. Liaw
2004-01-01
More and better machine learning tools are becoming available for landscape ecologists to aid in understanding species-environment relationships and to map probable species occurrence now and potentially into the future. To that end, we evaluated three statistical models: Regression Tree Analysis (RTA), Bagging Trees (BT) and Random Forest (RF) for their utility in...
Equations for predicting biomass in 2- to 6-year-old Eucalyptus saligna in Hawaii
Craig D. Whitesell; Susan C. Miyasaka; Robert F. Strand; Thomas H. Schubert; Katharine E. McDuffie
1988-01-01
Eucalyptus saligna trees grown in short-rotation plantations on the island of Hawaii were measured, harvested, and weighed to provide data for developing regression equations using non-destructive stand measurements. Regression analysis of the data from 190 trees in the 2.0- to 3.5-year range and 96 trees in the 4- to 6-year range related stem-only...
Henrard, S; Speybroeck, N; Hermans, C
2015-11-01
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two examples using CART analysis and previously published in this field are didactically explained in detail. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
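For continuous outcomes, the "certain criterion" mentioned in this abstract is commonly the within-node sum of squared errors: a regression tree picks the cut point whose two subgroups are each closest to their own mean. The abstract does not name the criterion, so that choice is an assumption. The sketch below shows one partitioning step, with hypothetical function names and invented toy values.

```python
def sse(ys):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_regression_split(xs, ys):
    """Return (threshold, total_sse) for the best binary split on one predictor.

    The threshold is None if no split reduces the sum of squared errors.
    """
    pairs = sorted(zip(xs, ys))
    best_threshold, best_sse = None, sse(ys)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between identical predictor values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best_sse:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_sse = total
    return best_threshold, best_sse


# Toy data (hypothetical): one predictor and a continuous outcome.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
print(best_regression_split(xs, ys))
```

Each resulting leaf predicts the mean outcome of its subgroup, whereas a classification tree leaf predicts the majority class.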
CADDIS Volume 4. Data Analysis: Basic Analyses
Use of statistical tests to determine if an observation is outside the normal range of expected values. Details of CART, regression analysis, use of quantile regression analysis, CART in causal analysis, simplifying or pruning resulting trees.
DIF Trees: Using Classification Trees to Detect Differential Item Functioning
ERIC Educational Resources Information Center
Vaughn, Brandon K.; Wang, Qiu
2010-01-01
A nonparametric tree classification procedure is used to detect differential item functioning for items that are dichotomously scored. Classification trees are shown to be an alternative procedure to detect differential item functioning other than the use of traditional Mantel-Haenszel and logistic regression analysis. A nonparametric…
[Hyperspectral Estimation of Apple Tree Canopy LAI Based on SVM and RF Regression].
Han, Zhao-ying; Zhu, Xi-cun; Fang, Xian-yi; Wang, Zhuo-yuan; Wang, Ling; Zhao, Geng-Xing; Jiang, Yuan-mao
2016-03-01
Leaf area index (LAI) is a dynamic index of crop population size. Hyperspectral technology can be used to estimate apple canopy LAI rapidly and nondestructively, and can provide a reference for monitoring tree growth and estimating yield. Red Fuji apple trees in the full fruit-bearing period were the research objects. Canopy spectral reflectance and LAI values of ninety apple trees were measured with the ASD Fieldspec3 spectrometer and LAI-2200 in thirty orchards over two consecutive years in the Qixia research area of Shandong Province. The optimal vegetation indices were selected by correlation analysis of the original spectral reflectance and vegetation indices. Models for predicting LAI were built with two multivariate regression methods, support vector machine (SVM) and random forest (RF). The new vegetation indices GNDVI527, NDVI676, RVI682, FD-NVI656 and GRVI517 and the two previous main vegetation indices, NDVI670 and NDVI705, are in accordance with LAI. In the RF regression model, the calibration set coefficient of determination (C-R2) of 0.920 and the validation set coefficient of determination (V-R2) of 0.889 are higher than those of the SVM regression model by 0.045 and 0.033, respectively. The calibration set root mean square error (C-RMSE) of 0.249 and the validation set root mean square error (V-RMSE) of 0.236 are lower than those of the SVM regression model by 0.054 and 0.058, respectively. The relative analysis error of the calibration set (C-RPD) and of the validation set (V-RPD) reached 3.363 and 2.520, higher than those of the SVM regression model by 0.598 and 0.262, respectively. The slopes of the trend lines in the scatterplots of measured versus predicted values for the calibration and validation sets (C-S and V-S) are close to 1. The RF regression model gives better estimates than the SVM model and can be used to estimate the LAI of Red Fuji apple trees in the full fruit period.
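The abstract compares the two models on R2, RMSE and RPD. A minimal sketch of how these could be computed from measured and predicted LAI follows. RPD is assumed here to be the standard deviation of the measured values divided by the RMSE, a common definition in spectroscopy. The abstract does not define it, and the values below are invented.

```python
import math
import statistics


def regression_metrics(observed, predicted):
    """Return (r2, rmse, rpd) for paired observed and predicted values.

    rpd is computed as sample SD of observed / RMSE (an assumed definition).
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot            # coefficient of determination
    rmse = math.sqrt(ss_res / n)          # root mean square error
    rpd = statistics.stdev(observed) / rmse
    return r2, rmse, rpd


# Hypothetical measured vs. predicted LAI values (not from the study).
measured = [1.0, 2.0, 3.0, 4.0]
estimated = [1.1, 1.9, 3.2, 3.8]
r2, rmse, rpd = regression_metrics(measured, estimated)
print(f"R2={r2:.3f} RMSE={rmse:.3f} RPD={rpd:.3f}")
```

Computing the same three numbers separately on the calibration and validation sets gives the C- and V- prefixed figures reported above.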
Forest type mapping of the Interior West
Bonnie Ruefenacht; Gretchen G. Moisen; Jock A. Blackard
2004-01-01
This paper develops techniques for the mapping of forest types in Arizona, New Mexico, and Wyoming. The methods involve regression-tree modeling using a variety of remote sensing and GIS layers along with Forest Inventory Analysis (FIA) point data. Regression-tree modeling is a fast and efficient technique of estimating variables for large data sets with high accuracy...
ERIC Educational Resources Information Center
Cohen, Ira L.; Liu, Xudong; Hudson, Melissa; Gillis, Jennifer; Cavalari, Rachel N. S.; Romanczyk, Raymond G.; Karmel, Bernard Z.; Gardner, Judith M.
2016-01-01
In order to improve discrimination accuracy between Autism Spectrum Disorder (ASD) and similar neurodevelopmental disorders, a data mining procedure, Classification and Regression Trees (CART), was used on a large multi-site sample of PDD Behavior Inventory (PDDBI) forms on children with and without ASD. Discrimination accuracy exceeded 80%,…
Digression and Value Concatenation to Enable Privacy-Preserving Regression.
Li, Xiao-Bai; Sarkar, Sumit
2014-09-01
Regression techniques can be used not only for legitimate data analysis, but also to infer private information about individuals. In this paper, we demonstrate that regression trees, a popular data-analysis and data-mining technique, can be used to effectively reveal individuals' sensitive data. This problem, which we call a "regression attack," has not been addressed in the data privacy literature, and existing privacy-preserving techniques are not appropriate in coping with this problem. We propose a new approach to counter regression attacks. To protect against privacy disclosure, our approach introduces a novel measure, called digression, which assesses the sensitive value disclosure risk in the process of building a regression tree model. Specifically, we develop an algorithm that uses the measure for pruning the tree to limit disclosure of sensitive data. We also propose a dynamic value-concatenation method for anonymizing data, which better preserves data utility than a user-defined generalization scheme commonly used in existing approaches. Our approach can be used for anonymizing both numeric and categorical data. An experimental study is conducted using real-world financial, economic and healthcare data. The results of the experiments demonstrate that the proposed approach is very effective in protecting data privacy while preserving data quality for research and analysis.
Improving Cluster Analysis with Automatic Variable Selection Based on Trees
2014-12-01
regression trees Daisy DISsimilAritY PAM partitioning around medoids PMA penalized multivariate analysis SPC sparse principal components UPGMA unweighted...unweighted pair-group average method (UPGMA). This method measures dissimilarities between all objects in two clusters and takes the average value
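The UPGMA rule described in this excerpt (average the dissimilarities between all pairs of objects drawn from two clusters) can be sketched in a few lines. The function name, the absolute-difference dissimilarity and the toy clusters are illustrative assumptions, not taken from the report.

```python
def average_linkage(cluster_a, cluster_b, dissimilarity):
    """Mean dissimilarity over every pair (a, b) with a in cluster_a, b in cluster_b."""
    if not cluster_a or not cluster_b:
        raise ValueError("both clusters must be non-empty")
    total = sum(dissimilarity(a, b) for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


# Toy one-dimensional clusters with absolute difference as the dissimilarity.
distance = average_linkage([0.0, 1.0], [4.0, 6.0], lambda a, b: abs(a - b))
print(distance)
```

In agglomerative clustering, the pair of clusters with the smallest such average is merged at each step.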
Identification of extremely premature infants at high risk of rehospitalization.
Ambalavanan, Namasivayam; Carlo, Waldemar A; McDonald, Scott A; Yao, Qing; Das, Abhik; Higgins, Rosemary D
2011-11-01
Extremely low birth weight infants often require rehospitalization during infancy. Our objective was to identify at the time of discharge which extremely low birth weight infants are at higher risk for rehospitalization. Data from extremely low birth weight infants in Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network centers from 2002-2005 were analyzed. The primary outcome was rehospitalization by the 18- to 22-month follow-up, and secondary outcome was rehospitalization for respiratory causes in the first year. Using variables and odds ratios identified by stepwise logistic regression, scoring systems were developed with scores proportional to odds ratios. Classification and regression-tree analysis was performed by recursive partitioning and automatic selection of optimal cutoff points of variables. A total of 3787 infants were evaluated (mean ± SD birth weight: 787 ± 136 g; gestational age: 26 ± 2 weeks; 48% male, 42% black). Forty-five percent of the infants were rehospitalized by 18 to 22 months; 14.7% were rehospitalized for respiratory causes in the first year. Both regression models (area under the curve: 0.63) and classification and regression-tree models (mean misclassification rate: 40%-42%) were moderately accurate. Predictors for the primary outcome by regression were shunt surgery for hydrocephalus, hospital stay of >120 days for pulmonary reasons, necrotizing enterocolitis stage II or higher or spontaneous gastrointestinal perforation, higher fraction of inspired oxygen at 36 weeks, and male gender. By classification and regression-tree analysis, infants with hospital stays of >120 days for pulmonary reasons had a 66% rehospitalization rate compared with 42% without such a stay. The scoring systems and classification and regression-tree analysis models identified infants at higher risk of rehospitalization and might assist planning for care after discharge.
Identification of Extremely Premature Infants at High Risk of Rehospitalization
Carlo, Waldemar A.; McDonald, Scott A.; Yao, Qing; Das, Abhik; Higgins, Rosemary D.
2011-01-01
OBJECTIVE: Extremely low birth weight infants often require rehospitalization during infancy. Our objective was to identify at the time of discharge which extremely low birth weight infants are at higher risk for rehospitalization. METHODS: Data from extremely low birth weight infants in Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network centers from 2002–2005 were analyzed. The primary outcome was rehospitalization by the 18- to 22-month follow-up, and secondary outcome was rehospitalization for respiratory causes in the first year. Using variables and odds ratios identified by stepwise logistic regression, scoring systems were developed with scores proportional to odds ratios. Classification and regression-tree analysis was performed by recursive partitioning and automatic selection of optimal cutoff points of variables. RESULTS: A total of 3787 infants were evaluated (mean ± SD birth weight: 787 ± 136 g; gestational age: 26 ± 2 weeks; 48% male, 42% black). Forty-five percent of the infants were rehospitalized by 18 to 22 months; 14.7% were rehospitalized for respiratory causes in the first year. Both regression models (area under the curve: 0.63) and classification and regression-tree models (mean misclassification rate: 40%–42%) were moderately accurate. Predictors for the primary outcome by regression were shunt surgery for hydrocephalus, hospital stay of >120 days for pulmonary reasons, necrotizing enterocolitis stage II or higher or spontaneous gastrointestinal perforation, higher fraction of inspired oxygen at 36 weeks, and male gender. By classification and regression-tree analysis, infants with hospital stays of >120 days for pulmonary reasons had a 66% rehospitalization rate compared with 42% without such a stay. CONCLUSIONS: The scoring systems and classification and regression-tree analysis models identified infants at higher risk of rehospitalization and might assist planning for care after discharge. PMID:22007016
Risk Factors of Falls in Community-Dwelling Older Adults: Logistic Regression Tree Analysis
ERIC Educational Resources Information Center
Yamashita, Takashi; Noe, Douglas A.; Bailer, A. John
2012-01-01
Purpose of the Study: A novel logistic regression tree-based method was applied to identify fall risk factors and possible interaction effects of those risk factors. Design and Methods: A nationally representative sample of American older adults aged 65 years and older (N = 9,592) in the Health and Retirement Study 2004 and 2006 modules was used.…
Distribution of cavity trees in midwestern old-growth and second-growth forests
Zhaofei Fan; Stephen R. Shifley; Martin A. Spetich; Frank R. Thompson; David R. Larsen
2003-01-01
We used classification and regression tree analysis to determine the primary variables associated with the occurrence of cavity trees and the hierarchical structure among those variables. We applied that information to develop logistic models predicting cavity tree probability as a function of diameter, species group, and decay class. Inventories of cavity abundance in...
Distribution of cavity trees in midwestern old-growth and second-growth forests
Zhaofei Fan; Stephen R. Shifley; Martin A. Spetich; Frank R., III Thompson; David R. Larsen
2003-01-01
We used classification and regression tree analysis to determine the primary variables associated with the occurrence of cavity trees and the hierarchical structure among those variables. We applied that information to develop logistic models predicting cavity tree probability as a function of diameter, species group, and decay class. Inventories of cavity abundance in...
Huang, Li-Shan; Myers, Gary J.; Davidson, Philip W.; Cox, Christopher; Xiao, Fenyuan; Thurston, Sally W.; Cernichiari, Elsa; Shamlaye, Conrad F.; Sloane-Reeves, Jean; Georger, Lesley; Clarkson, Thomas W.
2007-01-01
Studies of the association between prenatal methylmercury exposure from maternal fish consumption during pregnancy and neurodevelopmental test scores in the Seychelles Child Development Study have found no consistent pattern of associations through age nine years. The analyses for the most recent nine-year data examined the population effects of prenatal exposure, but did not address the possibility of non-homogeneous susceptibility. This paper presents a regression tree approach: covariate effects are treated nonlinearly and non-additively and non-homogeneous effects of prenatal methylmercury exposure are permitted among the covariate clusters identified by the regression tree. The approach allows us to address whether children in the lower or higher ends of the developmental spectrum differ in susceptibility to subtle exposure effects. Of twenty-one endpoints available at age nine years, we chose the Wechsler Full Scale IQ and its associated covariates to construct the regression tree. The prenatal mercury effect in each of the nine resulting clusters was assessed linearly and non-homogeneously. In addition we reanalyzed five other nine-year endpoints that in the linear analysis had a two-tailed p-value <0.2 for the effect of prenatal exposure. In this analysis, motor proficiency and activity level improved significantly with increasing MeHg for 53% of the children who had an average home environment. Motor proficiency significantly decreased with increasing prenatal MeHg exposure in 7% of the children whose home environment was below average. The regression tree results support previous analyses of outcomes in this cohort. However, this analysis raises the intriguing possibility that an effect may be non-homogeneous among children with different backgrounds and IQ levels. PMID:17942158
ERIC Educational Resources Information Center
Kitsantas, Anastasia; Kitsantas, Panagiota; Kitsantas, Thomas
2012-01-01
The purpose of this exploratory study was to assess the relative importance of a number of variables in predicting students' interest in math and/or computer science. Classification and regression trees (CART) were employed in the analysis of survey data collected from 276 college students enrolled in two U.S. and Greek universities. The results…
Jon C. Regelbrugge
1993-01-01
Abstract. We modeled tree mortality occurring two years following wildfire in Pinus ponderosa forests using data from 1275 trees in 25 stands burned during the 1987 Stanislaus Complex fires. We used logistic regression analysis to develop models relating the probability of wildfire-induced mortality with tree size and fire severity for Pinus ponderosa, Calocedrus...
Prediction of strontium bromide laser efficiency using cluster and decision tree analysis
NASA Astrophysics Data System (ADS)
Iliev, Iliycho; Gocheva-Ilieva, Snezhana; Kulin, Chavdar
2018-01-01
The subject of this investigation is a new high-powered strontium bromide (SrBr2) vapor laser emitting in a multiline region of wavelengths. The laser is an alternative to atomic strontium lasers and free electron lasers, especially at the 6.45 μm line, which is used in surgery to process biological tissues and bones with minimal damage. In this paper, experimental data from measurements of the laser's operational and output characteristics are statistically processed by means of cluster analysis and tree-based regression techniques. The aim is to extract from the available data the most important relationships and dependences that influence the increase in overall laser efficiency. A set of cluster models is constructed and analyzed. Different cluster methods show that the seven investigated operational characteristics (laser tube diameter, length, supplied electrical power, and others) and laser efficiency are combined in 2 clusters. Regression tree models built with the Classification and Regression Trees (CART) technique yield dependences that predict the values of efficiency, and especially the maximum efficiency, with over 95% accuracy.
Logistic regression trees for initial selection of interesting loci in case-control studies
Nickolov, Radoslav Z; Milanov, Valentin B
2007-01-01
Modern genetic epidemiology faces the challenge of dealing with hundreds of thousands of genetic markers. The selection of a small initial subset of interesting markers for further investigation can greatly facilitate genetic studies. In this contribution we suggest the use of a logistic regression tree algorithm known as logistic tree with unbiased selection. Using the simulated data provided for Genetic Analysis Workshop 15, we show how this algorithm, with incorporation of multifactor dimensionality reduction method, can reduce an initial large pool of markers to a small set that includes the interesting markers with high probability. PMID:18466557
Louis R. Iverson; Anantha M. Prasad
2002-01-01
Global climate change could have profound effects on the Earth's biota, including large redistributions of tree species and forest types. We used DISTRIB, a deterministic regression tree analysis model, to examine environmental drivers related to current forest-species distributions and then model potential suitable habitat under five climate change scenarios...
Shi, K-Q; Zhou, Y-Y; Yan, H-D; Li, H; Wu, F-L; Xie, Y-Y; Braddock, M; Lin, X-Y; Zheng, M-H
2017-02-01
At present, there is no ideal model for predicting the short-term outcome of patients with acute-on-chronic hepatitis B liver failure (ACHBLF). This study aimed to establish and validate a prognostic model by using the classification and regression tree (CART) analysis. A total of 1047 patients from two separate medical centres with suspected ACHBLF were screened in the study, which were recognized as derivation cohort and validation cohort, respectively. CART analysis was applied to predict the 3-month mortality of patients with ACHBLF. The accuracy of the CART model was tested using the area under the receiver operating characteristic curve, which was compared with the model for end-stage liver disease (MELD) score and a new logistic regression model. CART analysis identified four variables as prognostic factors of ACHBLF: total bilirubin, age, serum sodium and INR, and three distinct risk groups: low risk (4.2%), intermediate risk (30.2%-53.2%) and high risk (81.4%-96.9%). The new logistic regression model was constructed with four independent factors, including age, total bilirubin, serum sodium and prothrombin activity by multivariate logistic regression analysis. The performances of the CART model (0.896), similar to the logistic regression model (0.914, P=.382), exceeded that of MELD score (0.667, P<.001). The results were confirmed in the validation cohort. We have developed and validated a novel CART model superior to MELD for predicting three-month mortality of patients with ACHBLF. Thus, the CART model could facilitate medical decision-making and provide clinicians with a validated practical bedside tool for ACHBLF risk stratification. © 2016 John Wiley & Sons Ltd.
Seligman, D A; Pullinger, A G
2000-01-01
Confusion about the relationship of occlusion to temporomandibular disorders (TMD) persists. This study attempted to identify occlusal and attrition factors plus age that would characterize asymptomatic normal female subjects. A total of 124 female patients with intracapsular TMD were compared with 47 asymptomatic female controls for associations to 9 occlusal factors, 3 attrition severity measures, and age using classification tree, multiple stepwise logistic regression, and univariate analyses. Models were tested for accuracy (sensitivity and specificity) and total contribution to the variance. The classification tree model had 4 terminal nodes that used only anterior attrition and age. "Normals" were mainly characterized by low attrition levels, whereas patients had higher attrition and tended to be younger. The tree model was only moderately useful (sensitivity 63%, specificity 94%) in predicting normals. The logistic regression model incorporated unilateral posterior crossbite and mediotrusive attrition severity in addition to the 2 factors in the tree, but was slightly less accurate than the tree (sensitivity 51%, specificity 90%). When only occlusal factors were considered in the analysis, normals were additionally characterized by a lack of anterior open bite, smaller overjet, and smaller RCP-ICP slides. The log likelihood accounted for was similar for both the tree (pseudo R(2) = 29.38%; mean deviance = 0.95) and the multiple logistic regression (Cox Snell R(2) = 30.3%, mean deviance = 0.84) models. The occlusal and attrition factors studied were only moderately useful in differentiating normals from TMD patients.
Louis R. Iverson; Anantha Prasad; Mark W. Schwartz
1999-01-01
We are using a deterministic regression tree analysis model (DISTRIB) and a stochastic migration model (SHIFT) to examine potential distributions of ~66 individual species of eastern US trees under a 2 x CO2 climate change scenario. This process is demonstrated for Virginia pine (Pinus virginiana).
Potential Changes in Tree Species Richness and Forest Community Types following Climate Change
Louis R. Iverson; Anantha M. Prasad
2001-01-01
Potential changes in tree species richness and forest community types were evaluated for the eastern United States according to five scenarios of future climate change resulting from a doubling of atmospheric carbon dioxide (CO2). DISTRIB, an empirical model that uses a regression tree analysis approach, was used to generate suitable habitat, or potential future...
Estimating tree crown widths for the primary Acadian species in Maine
Matthew B. Russell; Aaron R. Weiskittel
2012-01-01
In this analysis, data for seven conifer and eight hardwood species were gathered from across the state of Maine for estimating tree crown widths. Maximum and largest crown width equations were developed using tree diameter at breast height as the primary predicting variable. Quantile regression techniques were used to estimate the maximum crown width and a constrained...
Du, Hua Qiang; Sun, Xiao Yan; Han, Ning; Mao, Fang Jie
2017-10-01
By synergistically using object-based image analysis (OBIA) and classification and regression tree (CART) methods, the distribution, the stand indexes (including diameter at breast height, tree height, and crown closure), and the aboveground carbon storage (AGC) of moso bamboo forest in Shanchuan Town, Anji County, Zhejiang Province were investigated. The results showed that the moso bamboo forest could be accurately delineated by integrating the multi-scale image segmentation of the OBIA technique with CART, which connected the image objects at various scales, with a producer's accuracy of 89.1%. The indexes estimated by the regression tree model, which was constructed from features extracted from the image objects, reached normal or better accuracy; the crown closure model achieved the best estimating accuracy of 67.9%. The estimating accuracy of diameter at breast height and tree height was relatively low, which was consistent with the conclusion that estimating diameter at breast height and tree height using optical remote sensing could not achieve satisfactory results. Estimation of AGC reached relatively high accuracy, and accuracy in the high-value region was above 80%.
Bevilacqua, M; Ciarapica, F E; Giacchetta, G
2008-07-01
This work is an attempt to apply classification tree methods to data regarding accidents in a medium-sized refinery, so as to identify the important relationships between the variables, which can be considered as decision-making rules when adopting any measures for improvement. The results obtained using the CART (Classification And Regression Trees) method proved to be the most precise and, in general, they are encouraging concerning the use of tree diagrams as preliminary explorative techniques for the assessment of the ergonomic, management and operational parameters which influence high accident risk situations. The Occupational Injury analysis carried out in this paper was planned as a dynamic process and can be repeated systematically. The CART technique, which considers a very wide set of objective and predictive variables, shows new cause-effect correlations in occupational safety which had never been previously described, highlighting possible injury risk groups and supporting decision-making in these areas. The use of classification trees must not, however, be seen as an attempt to supplant other techniques, but as a complementary method which can be integrated into traditional types of analysis.
Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson
2010-01-01
Summary Objective Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this Review was to assess machine learning alternatives to logistic regression which may accomplish the same goals but with fewer assumptions or greater accuracy. Study Design and Setting We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. Results We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (CART), and meta-classifiers (in particular, boosting). Conclusion While the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and to a lesser extent decision trees (particularly CART) appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. PMID:20630332
Kadiyala, Akhil; Kaur, Devinder; Kumar, Ashok
2013-02-01
The present study developed a novel approach to modeling indoor air quality (IAQ) of a public transportation bus by the development of hybrid genetic-algorithm-based neural networks (also known as evolutionary neural networks) with input variables optimized using regression trees, referred to as the GART approach. This study validated the applicability of the GART modeling approach in solving complex nonlinear systems by accurately predicting the monitored contaminants of carbon dioxide (CO2), carbon monoxide (CO), nitric oxide (NO), sulfur dioxide (SO2), 0.3-0.4 microm sized particle numbers, 0.4-0.5 microm sized particle numbers, particulate matter (PM) concentrations less than 1.0 microm (PM1.0), and PM concentrations less than 2.5 microm (PM2.5) inside a public transportation bus operating on 20% grade biodiesel in Toledo, OH. First, the important variables affecting each monitored in-bus contaminant were determined using regression trees. Second, the analysis of variance was used as a complementary sensitivity analysis to the regression tree results to determine a subset of statistically significant variables affecting each monitored in-bus contaminant. Finally, the identified subsets of statistically significant variables were used as inputs to develop three artificial neural network (ANN) models. The models developed were regression tree-based back-propagation network (BPN-RT), regression tree-based radial basis function network (RBFN-RT), and GART models. Performance measures were used to validate the predictive capacity of the developed IAQ models. The results from this approach were compared with the results obtained from using a theoretical approach and a generalized practicable approach to modeling IAQ that included the consideration of additional independent variables when developing the aforementioned ANN models. The hybrid GART models were able to capture the majority of the variance in the monitored in-bus contaminants. 
The genetic-algorithm-based neural network IAQ models outperformed the traditional ANN methods of the back-propagation and the radial basis function networks. The novelty of this research is the development of a novel approach to modeling vehicular indoor air quality by integration of the advanced methods of genetic algorithms, regression trees, and the analysis of variance for the monitored in-vehicle gaseous and particulate matter contaminants, and comparing the results obtained from using the developed approach with conventional artificial intelligence techniques of back propagation networks and radial basis function networks. This study validated the newly developed approach using holdout and threefold cross-validation methods. These results are of great interest to scientists, researchers, and the public in understanding the various aspects of modeling an indoor microenvironment. This methodology can easily be extended to other fields of study also.
"Mad or bad?": burden on caregivers of patients with personality disorders.
Bauer, Rita; Döring, Antje; Schmidt, Tanja; Spießl, Hermann
2012-12-01
The burden on caregivers of patients with personality disorders is often greatly underestimated or completely disregarded. Possibilities for caregiver support have rarely been assessed. Thirty interviews were conducted with caregivers of such patients to assess illness-related burden. Responses were analyzed with a mixed method of qualitative and quantitative analysis in a sequential design. Patient and caregiver data, including sociodemographic and disease-related variables, were evaluated with regression analysis and regression trees. Caregiver statements (n = 404) were summarized into 44 global statements. The most frequent global statements were worries about the burden on other family members (70.0%), poor cooperation with clinical centers and other institutions (60.0%), financial burden (56.7%), worry about the patient's future (53.3%), and dissatisfaction with the patient's treatment and rehabilitation (53.3%). Linear regression and regression tree analysis identified predictors for more burdened caregivers. Caregivers of patients with personality disorders experience a variety of burdens, some disorder specific. Yet these caregivers often receive little attention or support.
Error analysis of leaf area estimates made from allometric regression models
NASA Technical Reports Server (NTRS)
Feiveson, A. H.; Chhikara, R. S.
1986-01-01
Biological net productivity, measured in terms of the change in biomass with time, affects global productivity and the quality of life through biochemical and hydrological cycles and by its effect on the overall energy balance. Estimating leaf area for large ecosystems is one of the more important means of monitoring this productivity. For a particular forest plot, the leaf area is often estimated by a two-stage process. In the first stage, known as dimension analysis, a small number of trees are felled so that their areas can be measured as accurately as possible. These leaf areas are then related to non-destructive, easily-measured features such as bole diameter and tree height, by using a regression model. In the second stage, the non-destructive features are measured for all or for a sample of trees in the plots and then used as input into the regression model to estimate the total leaf area. Because both stages of the estimation process are subject to error, it is difficult to evaluate the accuracy of the final plot leaf area estimates. This paper illustrates how a complete error analysis can be made, using an example from a study made on aspen trees in northern Minnesota. The study was a joint effort by NASA and the University of California at Santa Barbara known as COVER (Characterization of Vegetation with Remote Sensing).
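The two-stage estimation procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions: a single predictor (bole diameter) and an ordinary least-squares straight line, whereas allometric models in practice often use several predictors or transformed variables. It does not reproduce the paper's error analysis. All names and numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Stage 1 (hypothetical data): felled trees with measured leaf area.
felled_diameter_cm = [10.0, 12.0, 14.0, 16.0]
felled_leaf_area_m2 = [20.0, 24.0, 28.0, 32.0]
intercept, slope = fit_line(felled_diameter_cm, felled_leaf_area_m2)

# Stage 2 (hypothetical data): diameters measured non-destructively on the plot.
plot_diameter_cm = [11.0, 13.0, 15.0]
plot_leaf_area_m2 = sum(intercept + slope * d for d in plot_diameter_cm)
print(intercept, slope, plot_leaf_area_m2)
```

The plot total inherits uncertainty from both the fitted coefficients in stage 1 and the sampling of trees in stage 2, which is why a combined error analysis is needed.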
Jose Negron
1997-01-01
Classification trees and linear regression analysis were used to build models to predict probabilities of infestation and amount of tree mortality in terms of basal area resulting from roundheaded pine beetle, Dendroctonus adjunctus Blandford, activity in ponderosa pine, Pinus ponderosa Laws., in the Sacramento Mountains, New Mexico. Classification trees were built for...
An Extension of CART's Pruning Algorithm. Program Statistics Research Technical Report No. 91-11.
ERIC Educational Resources Information Center
Kim, Sung-Ho
Among the computer-based methods used for the construction of trees such as AID, THAID, CART, and FACT, the only one that uses an algorithm that first grows a tree and then prunes the tree is CART. The pruning component of CART is analogous in spirit to the backward elimination approach in regression analysis. This idea provides a tool in…
Towards lidar-based mapping of tree age at the Arctic forest tundra ecotone.
NASA Astrophysics Data System (ADS)
Jensen, J.; Maguire, A.; Oelkers, R.; Andreu-Hayles, L.; Boelman, N.; D'Arrigo, R.; Griffin, K. L.; Jennewein, J. S.; Hiers, E.; Meddens, A. J.; Russell, M.; Vierling, L. A.; Eitel, J.
2017-12-01
Climate change may cause spatial shifts in the forest-tundra ecotone (FTE). To improve our ability to study these spatial shifts, information on tree demography along the FTE is needed. The objective of this study was to assess the suitability of lidar-derived tree heights as a surrogate for tree age. We calculated individual tree age from 48 tree cores collected at basal height from white spruce (Picea glauca) within the FTE in northern Alaska. Tree height was obtained from terrestrial lidar scans (<1 cm spatial resolution). The relationship between age and height was examined using a linear regression model forced through the origin. We found a very strong predictive relationship between tree height and age (R2 = 0.90, RMSE = 19.34 years) for trees that ranged from 14 to 230 years in age. Separate regression models were also developed for small (height < 3 m) and large trees (height >= 3 m), yielding strong predictive relationships between height and age (R2 = 0.86, RMSE = 12.21 years, and R2 = 0.93, RMSE = 25.16 years, respectively). The slope coefficients for the small and large tree models (16.83 and 12.98 years/m, respectively) indicate that small trees grow 1.3 times faster than large trees at these FTE study sites. Although a strong, predictive relationship between age and height is uncommon in light-limited forest environments, our findings suggest that the sparseness of trees within the FTE may explain the strong tree height-age relationships found herein. Further analysis of 36 additional tree cores recently collected within the FTE near Inuvik, Canada will be performed. Our preliminary analysis suggests that lidar-derived tree height could be a reliable proxy for tree age at the FTE, thereby establishing a new technique for scaling tree structure and demographics across larger portions of this sensitive ecotone.
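A linear regression forced through the origin has a closed-form slope, the sum of x*y divided by the sum of x squared. The sketch below shows that calculation for an age = slope * height model. The height and age values are hypothetical toy data, not the study's tree cores; only the two slope values in the final lines are taken from the abstract.

```python
def slope_through_origin(x, y):
    """Least-squares slope for the model y = slope * x (no intercept)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Hypothetical toy data: tree height in metres and tree age in years.
height_m = [1.0, 2.0, 3.0]
age_years = [15.0, 30.0, 45.0]

slope = slope_through_origin(height_m, age_years)  # years per metre
print(slope)

# Ratio of the two slopes reported in the abstract (years/m).
reported_ratio = 16.83 / 12.98
print(round(reported_ratio, 1))
```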
Tanaka, Tomohiro; Voigt, Michael D
2018-03-01
Non-melanoma skin cancer (NMSC) is the most common de novo malignancy in liver transplant (LT) recipients; it behaves more aggressively and it increases mortality. We used decision tree analysis to develop a tool to stratify and quantify risk of NMSC in LT recipients. We performed Cox regression analysis to identify which predictive variables to enter into the decision tree analysis. Data were from the Organ Procurement Transplant Network (OPTN) STAR files of September 2016 (n = 102984). NMSC developed in 4556 of the 105984 recipients, a mean of 5.6 years after transplant. The 5/10/20-year rates of NMSC were 2.9/6.3/13.5%, respectively. Cox regression identified male gender, Caucasian race, age, body mass index (BMI) at LT, and sirolimus use as key predictive or protective factors for NMSC. These factors were entered into a decision tree analysis. The final tree stratified non-Caucasians as low risk (0.8%), and Caucasian males > 47 years, BMI < 40 who did not receive sirolimus, as high risk (7.3% cumulative incidence of NMSC). The predictions in the derivation set were almost identical to those in the validation set (r2 = 0.971, p < 0.0001). Cumulative incidence of NMSC in low, moderate and high risk groups at 5/10/20 years was 0.5/1.2/3.3, 2.1/4.8/11.7 and 5.6/11.6/23.1% (p < 0.0001). The decision tree model accurately stratifies the risk of developing NMSC in the long-term after LT.
Gmur, Stephan; Vogt, Daniel; Zabowski, Darlene; Moskal, L. Monika
2012-01-01
The characterization of soil attributes using hyperspectral sensors has revealed patterns in soil spectra that are known to respond to mineral composition, organic matter, soil moisture and particle size distribution. Soil samples from different soil horizons of replicated soil series from sites located within Washington and Oregon were analyzed with the FieldSpec Spectroradiometer to measure their spectral signatures across the electromagnetic range of 400 to 1,000 nm. Similarity rankings of individual soil samples reveal differences between replicate series as well as samples within the same replicate series. Using classification and regression tree statistical methods, regression trees were fitted to each spectral response using concentrations of nitrogen, carbon, carbonate and organic matter as the response variables. Statistics resulting from fitted trees were: nitrogen R2 0.91 (p < 0.01) at 403, 470, 687, and 846 nm spectral band widths, carbonate R2 0.95 (p < 0.01) at 531 and 898 nm band widths, total carbon R2 0.93 (p < 0.01) at 400, 409, 441 and 907 nm band widths, and organic matter R2 0.98 (p < 0.01) at 300, 400, 441, 832 and 907 nm band widths. Use of the 400 to 1,000 nm electromagnetic range utilizing regression trees provided a powerful, rapid and inexpensive method for assessing nitrogen, carbon, carbonate and organic matter for upper soil horizons in a nondestructive method. PMID:23112620
Scollo, Annalisa; Gottardo, Flaviana; Contiero, Barbara; Edwards, Sandra A
2017-10-01
Tail biting in pigs has been an identified behavioural, welfare and economic problem for decades, and requires appropriate but sometimes difficult on-farm interventions. The aim of the paper is to introduce the Classification and Regression Tree (CRT) methodologies to develop a tool for prevention of acute tail biting lesions in pigs on-farm. A sample of 60 commercial farms rearing heavy pigs were involved; an on-farm visit and an interview with the farmer collected data on general management, herd health, disease prevention, climate control, feeding and production traits. Results suggest a value for the CRT analysis in managing the risk factors behind tail biting on a farm-specific level, showing 86.7% sensitivity for the Classification Tree and a correlation of 0.7 between observed and predicted prevalence of tail biting obtained with the Regression Tree. CRT analysis showed five main variables (stocking density, ammonia levels, number of pigs per stockman, type of floor and timeliness in feed supply) as critical predictors of acute tail biting lesions, which demonstrate different importance in different farms subgroups. The model might have reliable and practical applications for the support and implementation of tail biting prevention interventions, especially in case of subgroups of pigs with higher risk, helping farmers and veterinarians to assess the risk in their own farm and to manage their predisposing variables in order to reduce acute tail biting lesions. Copyright © 2017 Elsevier B.V. All rights reserved.
Ultrasonographic Diagnosis of Biliary Atresia Based on a Decision-Making Tree Model.
Lee, So Mi; Cheon, Jung-Eun; Choi, Young Hun; Kim, Woo Sun; Cho, Hyun-Hae; Cho, Hyun-Hye; Kim, In-One; You, Sun Kyoung
2015-01-01
To assess the diagnostic value of various ultrasound (US) findings and to make a decision-tree model for US diagnosis of biliary atresia (BA). From March 2008 to January 2014, the following US findings were retrospectively evaluated in 100 infants with cholestatic jaundice (BA, n = 46; non-BA, n = 54): length and morphology of the gallbladder, triangular cord thickness, hepatic artery and portal vein diameters, and visualization of the common bile duct. Logistic regression analyses were performed to determine the features that would be useful in predicting BA. Conditional inference tree analysis was used to generate a decision-making tree for classifying patients into the BA or non-BA groups. Multivariate logistic regression analysis showed that abnormal gallbladder morphology and greater triangular cord thickness were significant predictors of BA (p = 0.003 and 0.001; adjusted odds ratio: 345.6 and 65.6, respectively). In the decision-making tree using conditional inference tree analysis, gallbladder morphology and triangular cord thickness (optimal cutoff value of triangular cord thickness, 3.4 mm) were also selected as significant discriminators for differential diagnosis of BA, and gallbladder morphology was the first discriminator. The diagnostic performance of the decision-making tree was excellent, with sensitivity of 100% (46/46), specificity of 94.4% (51/54), and overall accuracy of 97% (97/100). Abnormal gallbladder morphology and greater triangular cord thickness (> 3.4 mm) were the most useful predictors of BA on US. We suggest that the gallbladder morphology should be evaluated first and that triangular cord thickness should be evaluated subsequently in cases with normal gallbladder morphology.
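The two-step decision rule and the reported performance figures can be checked with a short sketch. The rule below is one reading of the abstract: abnormal gallbladder morphology is classified as BA, and otherwise a triangular cord thickness above 3.4 mm is classified as BA. The confusion-matrix counts are derived from the reported fractions (46/46 and 51/54). The function names are hypothetical and this is not the authors' conditional inference tree.

```python
def classify(abnormal_gallbladder, cord_thickness_mm, cutoff_mm=3.4):
    """Two-step rule: gallbladder morphology first, then cord thickness."""
    if abnormal_gallbladder:
        return "BA"
    return "BA" if cord_thickness_mm > cutoff_mm else "non-BA"


def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Counts implied by the abstract: 46 of 46 BA and 51 of 54 non-BA classified correctly.
metrics = diagnostic_metrics(tp=46, fn=0, tn=51, fp=3)
for name, value in metrics.items():
    print(name, round(value * 100, 1))
```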
Perceived Organizational Support for Enhancing Welfare at Work: A Regression Tree Model
Giorgi, Gabriele; Dubin, David; Perez, Javier Fiz
2016-01-01
When trying to examine outcomes such as welfare and well-being, research tends to focus on main effects and take into account limited numbers of variables at a time. There are a number of techniques that may help address this problem. For example, many statistical packages available in R provide easy-to-use methods of modeling complicated analysis such as classification and regression trees (i.e., recursive partitioning). The present research illustrates the value of recursive partitioning in the prediction of perceived organizational support in a sample of more than 6000 Italian bankers. Utilizing the tree function of the party package in R, we estimated a regression tree model predicting perceived organizational support from a multitude of job characteristics including job demand, lack of job control, lack of supervisor support, training, etc. The resulting model appears particularly helpful in pointing out several interactions in the prediction of perceived organizational support. In particular, training is the dominant factor. Another dimension that seems to influence organizational support is reporting (perceived communication about safety and stress concerns). Results are discussed from a theoretical and methodological point of view. PMID:28082924
Guo, Huey-Ming; Shyu, Yea-Ing Lotus; Chang, Her-Kun
2006-01-01
In this article, the authors provide an overview of a research method for predicting quality of care in a home health nursing data set. The results of this study can be visualized through classification and regression tree (CART) graphs. The analysis was more effective, and the results were more informative, because the home health nursing data set was analyzed with a combination of logistic regression and CART; the two techniques complement each other. The combined results showed that more patient characteristics were related to quality of care in home care. The results help home health nurses predict patient outcomes in case management. Improved prediction is needed for interventions to be appropriately targeted for improved patient outcome and quality of care.
Geospatial relationships of tree species damage caused by Hurricane Katrina in south Mississippi
Mark W. Garrigues; Zhaofei Fan; David L. Evans; Scott D. Roberts; William H. Cooke III
2012-01-01
Hurricane Katrina generated substantial impacts on the forests and biological resources of the affected area in Mississippi. This study seeks to use classification tree analysis (CTA) to determine which variables are significant in predicting hurricane damage (shear or windthrow) in the Southeast Mississippi Institute for Forest Inventory District. Logistic regressions...
Updated generalized biomass equations for North American tree species
David C. Chojnacky; Linda S. Heath; Jennifer C. Jenkins
2014-01-01
Historically, tree biomass at large scales has been estimated by applying dimensional analysis techniques and field measurements such as diameter at breast height (dbh) in allometric regression equations. Equations often have been developed using differing methods and applied only to certain species or isolated areas. We previously had compiled and combined (in meta-...
Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V
2012-01-01
In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999
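To illustrate what bootstrap aggregation (bagging) of regression trees means in the abstract above, here is a minimal sketch using one-split trees (stumps) on a single predictor. The data, function names and the choice of stumps are assumptions made for illustration only; the study itself used full regression trees on large clinical cohorts.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D data by minimising squared error."""
    best = None
    for cut in sorted(set(xs))[1:]:          # candidate cut points
        left = [y for x, y in zip(xs, ys) if x < cut]
        right = [y for x, y in zip(xs, ys) if x >= cut]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, cut, lm, rm)
    if best is None:                         # all x identical: predict the mean
        m = sum(ys) / len(ys)
        return (float("inf"), m, m)
    return best[1:]                          # (cut, left mean, right mean)

def predict_stump(stump, x):
    cut, lm, rm = stump
    return lm if x < cut else rm

def bagged_predict(xs, ys, x_new, n_trees=50, seed=0):
    """Average the predictions of stumps fitted to bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(predict_stump(stump, x_new))
    return sum(preds) / len(preds)

# Made-up data: a risk score (x) and a binary outcome (y, 1 = died).
xs = list(range(10))
ys = [0] * 5 + [1] * 5
print(round(bagged_predict(xs, ys, 0), 2), round(bagged_predict(xs, ys, 9), 2))
```

Averaging over resampled trees smooths the hard step of a single tree, which is the mechanism behind the improvement over conventional regression trees that the abstract reports.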
Kirk M. Stueve; Dawna L. Cerney; Regina M. Rochefort; Laurie L. Kurth
2009-01-01
We performed classification analysis of 1970 satellite imagery and 2003 aerial photography to delineate establishment. Local site conditions were calculated from a LIDAR-based DEM, ancillary climate data, and 1970 tree locations in a GIS. We used logistic regression on a spatially weighted landscape matrix to rank variables.
National scale biomass estimators for United States tree species
Jennifer C. Jenkins; David C. Chojnacky; Linda S. Heath; Richard A. Birdsey
2003-01-01
Estimates of national-scale forest carbon (C) stocks and fluxes are typically based on allometric regression equations developed using dimensional analysis techniques. However, the literature is inconsistent and incomplete with respect to large-scale forest C estimation. We compiled all available diameter-based allometric regression equations for estimating total...
Brabant, Marie-Eve; Hébert, Martine; Chagnon, François
2013-01-01
This study explored the clinical profiles of 77 female teenager survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideations. Analyses revealed that 64% of participants experienced suicidal ideations. Findings from classification and regression tree analysis indicated that depression, posttraumatic stress symptoms, and hopelessness discriminated profiles of suicidal and nonsuicidal survivors. The elevated prevalence of suicidal ideations among adolescent survivors of sexual abuse underscores the importance of investigating the presence of suicidal ideations in sexual abuse survivors. However, suicidal ideation is not the sole variable that needs to be investigated; depression, hopelessness and posttraumatic stress symptoms are also related to suicidal ideations in survivors and could therefore guide interventions.
Ye, Fang; Chen, Zhi-Hua; Chen, Jie; Liu, Fang; Zhang, Yong; Fan, Qin-Ying; Wang, Lin
2016-01-01
Background: In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of population in recent years, available information on infant anemia is inconclusive in large cities of China, especially with comparison between native residents and floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. Methods: As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6–12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. Results: The prevalence of anemia was 12.60% with a range of 3.47%–40.00% in different subgroup characteristics. The CHAID decision tree model has demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by logistic regression model including maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population, CHAID decision tree analysis also identified the fourth risk factor, the maternal educational level, with higher overall classification accuracy and larger area below the receiver operating characteristic curve. Conclusions: The infant anemic status in metropolis is complex and should be carefully considered by the basic health care practitioners. CHAID decision tree analysis has demonstrated a better performance in hierarchical analysis of population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities. PMID:27174328
Ye, Fang; Chen, Zhi-Hua; Chen, Jie; Liu, Fang; Zhang, Yong; Fan, Qin-Ying; Wang, Lin
2016-05-20
In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of population in recent years, available information on infant anemia is inconclusive in large cities of China, especially with comparison between native residents and floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6-12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. The prevalence of anemia was 12.60% with a range of 3.47%-40.00% in different subgroup characteristics. The CHAID decision tree model has demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by logistic regression model including maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population, CHAID decision tree analysis also identified the fourth risk factor, the maternal educational level, with higher overall classification accuracy and larger area below the receiver operating characteristic curve. The infant anemic status in metropolis is complex and should be carefully considered by the basic health care practitioners. CHAID decision tree analysis has demonstrated a better performance in hierarchical analysis of population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities.
More Trees, More Poverty? The Socioeconomic Effects of Tree Plantations in Chile, 2001-2011
NASA Astrophysics Data System (ADS)
Andersson, Krister; Lawrence, Duncan; Zavaleta, Jennifer; Guariguata, Manuel R.
2016-01-01
Tree plantations play a controversial role in many nations' efforts to balance goals for economic development, ecological conservation, and social justice. This paper seeks to contribute to this debate by analyzing the socioeconomic impact of such plantations. We focus our study on Chile, a country that has experienced extraordinary growth of industrial tree plantations. Our analysis draws on a unique dataset with longitudinal observations collected in 180 municipal territories during 2001-2011. Employing panel data regression techniques, we find that growth in plantation area is associated with higher than average rates of poverty during this period.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Andrew G. Peterson; J. Timothy Ball; Yiqi Luo
1998-09-25
Estimation of leaf photosynthetic rate (A) from leaf nitrogen content (N) is both conceptually and numerically important in models of plant, ecosystem and biosphere responses to global change. The relationship between A and N has been studied extensively at ambient CO₂ but much less at elevated CO₂. This study was designed to (1) assess whether the A-N relationship was more similar for species within than between community and vegetation types, and (2) examine how growth at elevated CO₂ affects the A-N relationship. Data were obtained for 39 C₃ species grown at ambient CO₂ and 10 C₃ species grown at ambient and elevated CO₂. A regression model was applied to each species as well as to species pooled within different community and vegetation types. Cluster analysis of the regression coefficients indicated that species measured at ambient CO₂ did not separate into distinct groups matching community or vegetation type. Instead, most community and vegetation types shared the same general parameter space for regression coefficients. Growth at elevated CO₂ increased photosynthetic nitrogen use efficiency for pines and deciduous trees. When species were pooled by vegetation type, the A-N relationship for deciduous trees expressed on a leaf-mass basis was not altered by elevated CO₂, while the intercept increased for pines. When regression coefficients were averaged to give mean responses for different vegetation types, elevated CO₂ increased the intercept and the slope for deciduous trees but increased only the intercept for pines. There were no statistical differences between the pines and deciduous trees for the effect of CO₂. Generalizations about the effect of elevated CO₂ on the A-N relationship, and differences between pines and deciduous trees will be enhanced as more data become available.
Rieger, Isaak; Kowarik, Ingo; Cherubini, Paolo; Cierjacks, Arne
2017-01-01
Aboveground carbon (C) sequestration in trees is important in global C dynamics, but reliable techniques for its modeling in highly productive and heterogeneous ecosystems are limited. We applied an extended dendrochronological approach to disentangle the functioning of drivers from the atmosphere (temperature, precipitation), the lithosphere (sedimentation rate), the hydrosphere (groundwater table, river water level fluctuation), the biosphere (tree characteristics), and the anthroposphere (dike construction). Carbon sequestration in aboveground biomass of riparian Quercus robur L. and Fraxinus excelsior L. was modeled (1) over time using boosted regression tree analysis (BRT) on cross-datable trees characterized by equal annual growth ring patterns and (2) across space using a subsequent classification and regression tree analysis (CART) on cross-datable and not cross-datable trees. While C sequestration of cross-datable Q. robur responded to precipitation and temperature, cross-datable F. excelsior also responded to a low Danube river water level. However, CART revealed that C sequestration over time is governed by tree height and parameters that vary over space (magnitude of fluctuation in the groundwater table, vertical distance to mean river water level, and longitudinal distance to upstream end of the study area). Thus, a uniform response to climatic drivers of aboveground C sequestration in Q. robur was only detectable in trees of an intermediate height class and in taller trees (>21.8m) on sites where the groundwater table fluctuated little (≤0.9m). The detection of climatic drivers and the river water level in F. excelsior depended on sites at lower altitudes above the mean river water level (≤2.7m) and along a less dynamic downstream section of the study area. Our approach indicates unexploited opportunities of understanding the interplay of different environmental drivers in aboveground C sequestration. 
Results may support species-specific and locally adapted forest management plans to increase carbon dioxide sequestration from the atmosphere in trees. Copyright © 2016 Elsevier B.V. All rights reserved.
Steen, Paul J.; Passino-Reader, Dora R.; Wiley, Michael J.
2006-01-01
As a part of the Great Lakes Regional Aquatic Gap Analysis Project, we evaluated methodologies for modeling associations between fish species and habitat characteristics at a landscape scale. To do this, we created brook trout Salvelinus fontinalis presence and absence models based on four different techniques: multiple linear regression, logistic regression, neural networks, and classification trees. The models were tested in two ways: by application to an independent validation database and cross-validation using the training data, and by visual comparison of statewide distribution maps with historically recorded occurrences from the Michigan Fish Atlas. Although differences in the accuracy of our models were slight, the logistic regression model predicted with the least error, followed by multiple regression, then classification trees, then the neural networks. These models will provide natural resource managers a way to identify habitats requiring protection for the conservation of fish species.
Hayes, Timothy; Usami, Satoshi; Jacobucci, Ross; McArdle, John J
2015-12-01
In this article, we describe a recent development in the analysis of attrition: using classification and regression trees (CART) and random forest methods to generate inverse sampling weights. These flexible machine learning techniques have the potential to capture complex nonlinear, interactive selection models, yet to our knowledge, their performance in the missing data analysis context has never been evaluated. To assess the potential benefits of these methods, we compare their performance with commonly employed multiple imputation and complete case techniques in 2 simulations. These initial results suggest that weights computed from pruned CART analyses performed well in terms of both bias and efficiency when compared with other methods. We discuss the implications of these findings for applied researchers. (c) 2015 APA, all rights reserved.
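The idea of tree-based inverse sampling weights described above can be sketched as follows, under the assumption that a tree has already assigned every baseline participant to a terminal node (leaf). Each retained participant is then weighted by the inverse of the retention rate in their leaf. The leaf labels, data and function name are hypothetical.

```python
from collections import defaultdict

def inverse_probability_weights(leaf_ids, retained):
    """Weight each retained case by 1 / (retention rate of its tree leaf)."""
    totals = defaultdict(int)    # participants per leaf at baseline
    stayed = defaultdict(int)    # participants per leaf who remained in the study
    for leaf, r in zip(leaf_ids, retained):
        totals[leaf] += 1
        stayed[leaf] += int(r)
    weights = []
    for leaf, r in zip(leaf_ids, retained):
        if r:
            weights.append(totals[leaf] / stayed[leaf])
        else:
            weights.append(0.0)  # dropouts contribute no outcome data
    return weights

# Made-up example: leaf "A" retains 4 of 4 participants, leaf "B" retains 2 of 4.
leaf_ids = ["A", "A", "A", "A", "B", "B", "B", "B"]
retained = [1, 1, 1, 1, 1, 0, 0, 1]
weights = inverse_probability_weights(leaf_ids, retained)
print(weights)  # prints [1.0, 1.0, 1.0, 1.0, 2.0, 0.0, 0.0, 2.0]
```

The weights of the retained cases sum to the original sample size, so the retained participants in the high-attrition leaf stand in for those who dropped out.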
Hayes, Timothy; Usami, Satoshi; Jacobucci, Ross; McArdle, John J.
2016-01-01
In this article, we describe a recent development in the analysis of attrition: using classification and regression trees (CART) and random forest methods to generate inverse sampling weights. These flexible machine learning techniques have the potential to capture complex nonlinear, interactive selection models, yet to our knowledge, their performance in the missing data analysis context has never been evaluated. To assess the potential benefits of these methods, we compare their performance with commonly employed multiple imputation and complete case techniques in 2 simulations. These initial results suggest that weights computed from pruned CART analyses performed well in terms of both bias and efficiency when compared with other methods. We discuss the implications of these findings for applied researchers. PMID:26389526
Boosted regression tree, table, and figure data
Spreadsheets are included here to support the manuscript Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition. This dataset is associated with the following publication: Golden, H., C. Lane, A. Prues, and E. D'Amico. Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition. JAWRA. American Water Resources Association, Middleburg, VA, USA, 52(5): 1251-1274, (2016).
Travis Woolley; David C. Shaw; Lisa M. Ganio; Stephen Fitzgerald
2012-01-01
Logistic regression models used to predict tree mortality are critical to post-fire management, planning prescribed burns and understanding disturbance ecology. We review literature concerning post-fire mortality prediction using logistic regression models for coniferous tree species in the western USA. We include synthesis and review of: methods to develop, evaluate...
Diameter-growth model across shortleaf pine range using regression tree analysis
Daniel Yaussy; Louis Iverson; Anantha Prasad
1999-01-01
Diameter growth of a tree in most gap-phase models is limited by light, nutrients, moisture, and temperature. Growing-season temperature is represented by growing degree days (gdd), which is the sum of the average daily temperatures above a baseline temperature. Gap-phase models determine the north-south range of a species by the gdd limits at the north and south...
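As a small worked example of the growing degree days definition given above, the sketch below sums the amount by which each day's mean temperature exceeds the baseline. The 5 °C baseline and the temperature values are assumptions chosen for illustration; the abstract does not state the baseline used.

```python
def growing_degree_days(daily_mean_temps_c, base_temp_c=5.0):
    """Sum the degrees by which each daily mean temperature exceeds the base."""
    return sum(max(0.0, t - base_temp_c) for t in daily_mean_temps_c)

# One made-up week of daily mean temperatures in degrees Celsius.
week = [3.0, 6.5, 10.0, 12.5, 4.0, 8.0, 15.0]
print(growing_degree_days(week))  # prints 27.0
```

Days at or below the baseline contribute nothing, so only the five warmer days in this example add to the total.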
Equations relating compacted and uncompacted live crown ratio for common tree species in the South
KaDonna C. Randolph
2010-01-01
Species-specific equations to predict uncompacted crown ratio (UNCR) from compacted live crown ratio (CCR), tree length, and stem diameter were developed for 24 species and 12 genera in the southern United States. Using data from the US Forest Service Forest Inventory and Analysis program, nonlinear regression was used to model UNCR with a logistic function. Model...
An Intelligent Decision Support System for Workforce Forecast
2011-01-01
ARIMA) model to forecast the demand for construction skills in Hong Kong. This model was based ... Decision Trees ARIMA Rule Based Forecasting Segmentation Forecasting Regression Analysis Simulation Modeling Input-Output Models LP and NLP Markovian ... data • When results are needed as a set of easily interpretable rules 4.1.4 ARIMA Auto-regressive, integrated, moving-average (ARIMA) models
Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula
2011-01-01
Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets comes the inherent challenges of new methods of statistical analysis and modeling. Considering a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.
Wylie, Bruce K.; Howard, Daniel; Dahal, Devendra; Gilmanov, Tagir; Ji, Lei; Zhang, Li; Smith, Kelcy
2016-01-01
This paper presents the methodology and results of two ecological-based net ecosystem production (NEP) regression tree models capable of upscaling measurements made at various flux tower sites throughout the U.S. Great Plains. Separate grassland and cropland NEP regression tree models were trained using various remote sensing data and other biogeophysical data, along with 15 flux towers contributing to the grassland model and 15 flux towers for the cropland model. The models yielded weekly mean daily grassland and cropland NEP maps of the U.S. Great Plains at 250 m resolution for 2000–2008. The grassland and cropland NEP maps were spatially summarized and statistically compared. The results of this study indicate that grassland and cropland ecosystems generally performed as weak net carbon (C) sinks, absorbing more C from the atmosphere than they released from 2000 to 2008. Grasslands demonstrated higher carbon sink potential (139 g C·m−2·year−1) than non-irrigated croplands. A closer look into the weekly time series reveals the C fluctuation through time and space for each land cover type.
Duncan, Dustin T; Kawachi, Ichiro; Kum, Susan; Aldstadt, Jared; Piras, Gianfranco; Matthews, Stephen A; Arbia, Giuseppe; Castro, Marcia C; White, Kellee; Williams, David R
2014-04-01
The racial/ethnic and income composition of neighborhoods often influences local amenities, including the potential spatial distribution of trees, which are important for population health and community wellbeing, particularly in urban areas. This ecological study used spatial analytical methods to assess the relationship between neighborhood socio-demographic characteristics (i.e. minority racial/ethnic composition and poverty) and tree density at the census tract level in Boston, Massachusetts (US). We examined spatial autocorrelation with the Global Moran's I for all study variables and in the ordinary least squares (OLS) regression residuals as well as computed Spearman correlations non-adjusted and adjusted for spatial autocorrelation between socio-demographic characteristics and tree density. Next, we fit traditional regressions (i.e. OLS regression models) and spatial regressions (i.e. spatial simultaneous autoregressive models), as appropriate. We found significant positive spatial autocorrelation for all neighborhood socio-demographic characteristics (Global Moran's I range from 0.24 to 0.86, all P = 0.001), for tree density (Global Moran's I = 0.452, P = 0.001), and in the OLS regression residuals (Global Moran's I range from 0.32 to 0.38, all P < 0.001). Therefore, we fit the spatial simultaneous autoregressive models. There was a negative correlation between neighborhood percent non-Hispanic Black and tree density (r_S = -0.19; conventional P-value = 0.016; spatially adjusted P-value = 0.299) as well as a negative correlation between predominantly non-Hispanic Black (over 60% Black) neighborhoods and tree density (r_S = -0.18; conventional P-value = 0.019; spatially adjusted P-value = 0.180).
While the conventional OLS regression model found a marginally significant inverse relationship between Black neighborhoods and tree density, we found no statistically significant relationship between neighborhood socio-demographic composition and tree density in the spatial regression models. Methodologically, our study suggests the need to take into account spatial autocorrelation as findings/conclusions can change when the spatial autocorrelation is ignored. Substantively, our findings suggest no need for policy intervention vis-à-vis trees in Boston, though we hasten to add that replication studies, and more nuanced data on tree quality, age and diversity are needed.
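The Global Moran's I statistic used in the study above measures whether neighbouring areas have similar values. A minimal sketch of the standard formula, I = (n / S0) · Σ w_ij (x_i − x̄)(x_j − x̄) / Σ (x_i − x̄)², is given below. The four-tract chain layout, the binary adjacency weights and the values are made up for illustration and do not reproduce the Boston analysis.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)              # total of all weights
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))     # spatial cross-products
    den = sum(d * d for d in dev)                      # total squared deviation
    return (n / s0) * num / den

# Four hypothetical tracts in a row; each is a neighbour of the adjacent tract(s).
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
gradient = [1, 2, 3, 4]      # similar values sit next to each other
alternating = [1, 4, 1, 4]   # neighbours always differ
print(morans_i(gradient, w), morans_i(alternating, w))
```

The smooth gradient gives a positive value (about 0.33) and the alternating pattern gives −1, matching the usual reading of positive and negative spatial autocorrelation.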
Teng, Ju-Hsi; Lin, Kuan-Chia; Ho, Bin-Shenq
2007-10-01
A community-based aboriginal study was conducted and analysed to explore the application of classification tree and logistic regression. A total of 1066 aboriginal residents in Yilan County were screened during 2003-2004. The independent variables include demographic characteristics, physical examinations, geographic location, health behaviours, dietary habits and family hereditary diseases history. Risk factors of cardiovascular diseases were selected as the dependent variables in further analysis. The completion rate for the health interview was 88.9%. The classification tree results show that if body mass index is higher than 25.72 kg/m² and age is above 51 years, the predicted probability of having ≥3 cardiovascular risk factors is 73.6% (population 322). If body mass index is higher than 26.35 kg/m² and the geographical latitude of the village is lower than 24°22.8′, the predicted probability of having ≥4 cardiovascular risk factors is 60.8% (population 74). The logistic regression results indicate that body mass index, drinking habit and menopause are the top three significant independent variables. The classification tree model specifically shows the discrimination paths and interactions between the risk groups. The logistic regression model presents and analyses the statistical independent factors of cardiovascular risks. Applying both models to specific situations will provide a different angle for the design and management of future health intervention plans after community-based study.
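The two classification tree paths reported above can be read as simple if-then rules. The sketch below encodes only those two leaves, using the cut-offs and probabilities stated in the abstract; the function names are hypothetical, and the remaining leaves of each tree are not given in the abstract, so the functions return None for them.

```python
def prob_three_or_more_risk_factors(bmi, age_years):
    """Reported leaf: BMI > 25.72 kg/m^2 and age > 51 years gives 73.6%."""
    if bmi > 25.72 and age_years > 51:
        return 0.736
    return None  # other leaves are not described in the abstract

def prob_four_or_more_risk_factors(bmi, latitude_deg):
    """Reported leaf: BMI > 26.35 kg/m^2 and latitude below 24 deg 22.8 min gives 60.8%."""
    cutoff = 24 + 22.8 / 60  # convert degrees and minutes to decimal degrees
    if bmi > 26.35 and latitude_deg < cutoff:
        return 0.608
    return None  # other leaves are not described in the abstract

print(prob_three_or_more_risk_factors(bmi=27.0, age_years=60))
print(prob_four_or_more_risk_factors(bmi=27.0, latitude_deg=24.3))
```

Reading a path this way shows the interaction the tree captures: the BMI cut-off matters only in combination with age or village location.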
Batterham, Philip J; Christensen, Helen; Mackinnon, Andrew J
2009-11-22
Relative to physical health conditions such as cardiovascular disease, little is known about risk factors that predict the prevalence of depression. The present study investigates the expected effects of a reduction of these risks over time, using the decision tree method favoured in assessing cardiovascular disease risk. The PATH through Life cohort was used for the study, comprising 2,105 20-24 year olds, 2,323 40-44 year olds and 2,177 60-64 year olds sampled from the community in the Canberra region, Australia. A decision tree methodology was used to predict the presence of major depressive disorder after four years of follow-up. The decision tree was compared with a logistic regression analysis using ROC curves. The decision tree was found to distinguish and delineate a wide range of risk profiles. Previous depressive symptoms were most highly predictive of depression after four years; however, modifiable risk factors such as substance use and employment status played significant roles in assessing the risk of depression. The decision tree was found to have better sensitivity and specificity than a logistic regression using identical predictors. The decision tree method was useful in assessing the risk of major depressive disorder over four years. Application of the model to the development of a predictive tool for tailored interventions is discussed.
Charles E. Rose; Thomas B. Lynch
2001-01-01
A method was developed for estimating parameters in an individual tree basal area growth model using a system of equations based on dbh rank classes. The estimation method developed is a compromise between an individual tree and a stand level basal area growth model that accounts for the correlation between trees within a plot by using seemingly unrelated regression (...
Susan L. King
2003-01-01
The performance of two classifiers, logistic regression and neural networks, is compared for modeling noncatastrophic individual tree mortality for 21 species of trees in West Virginia. The output of the classifier is usually a continuous number between 0 and 1. A threshold is selected between 0 and 1 and all of the trees below the threshold are classified as...
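The thresholding step described above can be sketched as a simple partition of classifier outputs. Because the abstract is truncated before naming the classes, the sketch only separates scores into "below" and "at or above" the threshold; the scores, thresholds and function name are made up for illustration.

```python
def split_by_threshold(scores, threshold):
    """Partition classifier outputs in [0, 1] around a chosen threshold."""
    below = [s for s in scores if s < threshold]
    at_or_above = [s for s in scores if s >= threshold]
    return below, at_or_above

scores = [0.05, 0.20, 0.45, 0.55, 0.90]  # made-up classifier outputs
for t in (0.3, 0.5):
    below, at_or_above = split_by_threshold(scores, t)
    print(t, len(below), len(at_or_above))
```

Moving the threshold from 0.3 to 0.5 shifts one tree from one group to the other, which is why the choice of threshold affects the reported classification accuracy.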
Park, Myonghwa; Choi, Sora; Shin, A Mi; Koo, Chul Hoi
2013-02-01
The purpose of this study was to develop a prediction model for the characteristics of older adults with depression using the decision tree method. A large dataset from the 2008 Korean Elderly Survey was used and data of 14,970 elderly people were analyzed. The target variable was depression, and the 53 input variables covered general characteristics, family and social relationships, economic status, health status, health behavior, functional status, leisure and social activity, quality of life, and living environment. Data were analyzed by decision tree analysis, a data mining technique, using the SPSS Window 19.0 and Clementine 12.0 programs. The decision trees were classified into five different rules to define the characteristics of older adults with depression. Classification & Regression Tree (C&RT) showed the best prediction with an accuracy of 80.81% among data mining models. Factors in the rules were life satisfaction, nutritional status, daily activity difficulty due to pain, functional limitation for basic or instrumental daily activities, number of chronic diseases and daily activity difficulty due to disease. The different rules classified by the decision tree model in this study should contribute as baseline data for discovering informative knowledge and developing interventions tailored to these individual characteristics.
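Classification trees of the CART family commonly choose each split by reducing Gini impurity. The abstract above does not state the splitting criterion used, so the following is only a sketch of that common criterion with made-up labels (1 = depressed, 0 = not depressed).

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_gini(left, right):
    """Size-weighted Gini impurity of the two child nodes of a split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

parent = [1, 1, 1, 1, 0, 0, 0, 0]
left, right = [1, 1, 1, 0], [0, 0, 0, 1]   # a hypothetical candidate split
print(gini(parent), split_gini(left, right))  # prints 0.5 0.375
```

A split is preferred when the weighted impurity of the children (0.375 here) falls below the impurity of the parent (0.5); the tree-growing algorithm searches over variables and cut-offs for the largest such reduction.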
Novak, Klemen; de Luis, Martin; Saz, Miguel A.; Longares, Luis A.; Serrano-Notivoli, Roberto; Raventós, Josep; Čufar, Katarina; Gričar, Jožica; Di Filippo, Alfredo; Piovesan, Gianluca; Rathgeber, Cyrille B. K.; Papadopoulos, Andreas; Smith, Kevin T.
2016-01-01
Climate predictions for the Mediterranean Basin include increased temperatures, decreased precipitation, and increased frequency of extreme climatic events (ECE). These conditions are associated with decreased tree growth and increased vulnerability to pests and diseases. The anatomy of tree rings responds to these environmental conditions. Quantitatively, the width of a tree ring is largely determined by the rate and duration of cell division by the vascular cambium. In the Mediterranean climate, this division may occur throughout almost the entire year. Alternatively, cell division may cease during relatively cool and dry winters, only to resume in the same calendar year with milder temperatures and increased availability of water. Under particularly adverse conditions, no xylem may be produced in parts of the stem, resulting in a missing ring (MR). A dendrochronological network of Pinus halepensis was used to determine the relationship of MR to ECE. The network consisted of 113 sites, 1,509 trees, 2,593 cores, and 225,428 tree rings throughout the distribution range of the species. A total of 4,150 MR were identified. Binomial logistic regression analysis determined that MR frequency increased with increased cambial age. Spatial analysis indicated that the geographic areas of south-eastern Spain and northern Algeria contained the greatest frequency of MR. Dendroclimatic regression analysis indicated a non-linear relationship of MR to total monthly precipitation and mean temperature. MR are strongly associated with the combination of monthly mean temperature from previous October till current February and total precipitation from previous September till current May. They are likely to occur with total precipitation lower than 50 mm and temperatures higher than 5°C. This conclusion is global and can be applied to every site across the distribution area. 
Rather than simply being a complication for dendrochronology, MR formation is a fundamental response of trees to adverse environmental conditions. The demonstrated relationship of MR formation to ECE across this dendrochronological network in the Mediterranean basin shows the potential of MR analysis to reconstruct the history of past climatic extremes and to predict future forest dynamics in a changing climate. PMID:27303421
A study of Solar-Enso correlation with southern Brazil tree ring index (1955- 1991)
NASA Astrophysics Data System (ADS)
Rigozo, N.; Nordemann, D.; Vieira, L.; Echer, E.
The effects of solar activity and El Niño-Southern Oscillation on tree growth in Southern Brazil were studied by correlation analysis. Trees for this study were native Araucaria (Araucaria angustifolia) from four locations in Rio Grande do Sul State, in Southern Brazil: Canela (29°18'S, 50°51'W, 790 m asl), Nova Petropolis (29°2'S, 51°10'W, 579 m asl), Sao Francisco de Paula (29°25'S, 50°24'W, 930 m asl) and Sao Martinho da Serra (29°30'S, 53°53'W, 484 m asl). From these four sites, an average tree ring Index for this region was derived for the period 1955-1991. Linear correlations were computed on annual and 10 year running averages of this tree ring Index, of sunspot number Rz and of the Southern Oscillation Index (SOI). For annual averages, the correlation coefficients were low, and the multiple regression between tree ring Index and SOI and Rz indicates that 20% of the variance in tree rings was explained by solar activity and ENSO variability. However, when the correlations were computed on 10 year running averages, the correlation coefficients were much higher. A clear anticorrelation is observed between SOI and Index (r=-0.81) whereas Rz and Index show a positive correlation (r=0.67). The multiple regression of 10 year running averages indicates that 76% of the variance in tree ring Index was explained by solar activity and ENSO. These results indicate that the effects of solar activity and ENSO on tree rings are better seen on long timescales.
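The contrast between annual and 10 year running-average correlations can be reproduced on synthetic data. The sketch below is only an illustration of the smoothing effect, not a reanalysis: both series are invented (a shared slow cycle plus unrelated year-to-year fluctuations), and the window length of 10 is the only value taken from the abstract.

```python
import math


def running_mean(values, window):
    """Mean of every consecutive `window`-length slice (no end padding)."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic 37-"year" series: a shared slow cycle plus unrelated fast noise.
years = range(37)
slow = [math.sin(2 * math.pi * t / 20) for t in years]
ring_index = [s + 1.5 * (1 if t % 2 == 0 else -1) for t, s in zip(years, slow)]
driver = [s + 1.5 * (1.0 if t % 3 == 0 else -0.5) for t, s in zip(years, slow)]

r_annual = pearson(ring_index, driver)
r_smooth = pearson(running_mean(ring_index, 10), running_mean(driver, 10))
print(f"annual r = {r_annual:.2f}, 10-year running-mean r = {r_smooth:.2f}")
```

Averaging suppresses the fast fluctuations that differ between the series and leaves the shared slow component, so the smoothed correlation is much higher. Running means also make neighbouring values strongly dependent, which should be kept in mind when judging the significance of such coefficients.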
Barlin, Joyce N; Zhou, Qin; St Clair, Caryn M; Iasonos, Alexia; Soslow, Robert A; Alektiar, Kaled M; Hensley, Martee L; Leitao, Mario M; Barakat, Richard R; Abu-Rustum, Nadeem R
2013-09-01
The objectives of the study are to evaluate which clinicopathologic factors influenced overall survival (OS) in endometrial carcinoma and to determine if the surgical effort to assess para-aortic (PA) lymph nodes (LNs) at initial staging surgery impacts OS. All patients diagnosed with endometrial cancer from 1/1993-12/2011 who had LNs excised were included. PALN assessment was defined by the identification of one or more PALNs on final pathology. A multivariate analysis was performed to assess the effect of PALNs on OS. A form of recursive partitioning called classification and regression tree (CART) analysis was implemented. Variables included: age, stage, tumor subtype, grade, myometrial invasion, total LNs removed, evaluation of PALNs, and adjuvant chemotherapy. The cohort included 1920 patients, with a median age of 62 years. The median number of LNs removed was 16 (range, 1-99). The removal of PALNs was not associated with OS (P=0.450). Using the CART hierarchically, stage I vs. stages II-IV and grades 1-2 vs. grade 3 emerged as predictors of OS. If the tree was allowed to grow, further branching was based on age and myometrial invasion. Total number of LNs removed and assessment of PALNs as defined in this study were not predictive of OS. This innovative CART analysis emphasized the importance of proper stage assignment and a binary grading system in impacting OS. Notably, the total number of LNs removed and specific evaluation of PALNs as defined in this study were not important predictors of OS. Copyright © 2013 Elsevier Inc. All rights reserved.
Duncan, Dustin T.; Kawachi, Ichiro; Kum, Susan; Aldstadt, Jared; Piras, Gianfranco; Matthews, Stephen A.; Arbia, Giuseppe; Castro, Marcia C.; White, Kellee; Williams, David R.
2017-01-01
The racial/ethnic and income composition of neighborhoods often influences local amenities, including the potential spatial distribution of trees, which are important for population health and community wellbeing, particularly in urban areas. This ecological study used spatial analytical methods to assess the relationship between neighborhood socio-demographic characteristics (i.e. minority racial/ethnic composition and poverty) and tree density at the census tract level in Boston, Massachusetts (US). We examined spatial autocorrelation with the Global Moran’s I for all study variables and in the ordinary least squares (OLS) regression residuals as well as computed Spearman correlations non-adjusted and adjusted for spatial autocorrelation between socio-demographic characteristics and tree density. Next, we fit traditional regressions (i.e. OLS regression models) and spatial regressions (i.e. spatial simultaneous autoregressive models), as appropriate. We found significant positive spatial autocorrelation for all neighborhood socio-demographic characteristics (Global Moran’s I range from 0.24 to 0.86, all P=0.001), for tree density (Global Moran’s I=0.452, P=0.001), and in the OLS regression residuals (Global Moran’s I range from 0.32 to 0.38, all P<0.001). Therefore, we fit the spatial simultaneous autoregressive models. There was a negative correlation between neighborhood percent non-Hispanic Black and tree density (rS=−0.19; conventional P-value=0.016; spatially adjusted P-value=0.299) as well as a negative correlation between predominantly non-Hispanic Black (over 60% Black) neighborhoods and tree density (rS=−0.18; conventional P-value=0.019; spatially adjusted P-value=0.180). 
While the conventional OLS regression model found a marginally significant inverse relationship between Black neighborhoods and tree density, we found no statistically significant relationship between neighborhood socio-demographic composition and tree density in the spatial regression models. Methodologically, our study suggests the need to take into account spatial autocorrelation as findings/conclusions can change when the spatial autocorrelation is ignored. Substantively, our findings suggest no need for policy intervention vis-à-vis trees in Boston, though we hasten to add that replication studies, and more nuanced data on tree quality, age and diversity are needed. PMID:29354668
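For readers unfamiliar with the Global Moran's I statistic used above, the sketch below computes it from its standard definition, I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of all weights. The chain-shaped neighbourhood structure and the values are invented for illustration; the study's census-tract weights matrix and its permutation-based P-values are not reproduced here.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` under a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / w_total) * cross / sum(d * d for d in dev)


def chain_weights(n):
    """Binary contiguity weights for n areas arranged in a line (assumed layout)."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


# Smooth gradient: neighbours resemble each other, so I is positive (0.5 here).
gradient = morans_i([1, 2, 3, 4, 5], chain_weights(5))

# Alternating values: neighbours differ as much as possible, so I is negative (-1 here).
checkerboard = morans_i([1, -1, 1, -1], chain_weights(4))

print(gradient, checkerboard)
```

Positive values such as those reported for the OLS residuals indicate that nearby observations are more alike than chance would suggest, which is the situation in which conventional P-values tend to be too optimistic.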
Huang, C.; Townshend, J.R.G.
2003-01-01
A stepwise regression tree (SRT) algorithm was developed for approximating complex nonlinear relationships. Based on the regression tree of Breiman et al. (BRT) and a stepwise linear regression (SLR) method, this algorithm represents an improvement over SLR in that it can approximate nonlinear relationships and over BRT in that it gives more realistic predictions. The applicability of this method to estimating subpixel forest was demonstrated using three test data sets, on all of which it gave more accurate predictions than SLR and BRT. SRT also generated more compact trees and performed better than or at least as well as BRT at all 10 equal forest proportion intervals ranging from 0 to 100%. This method is appealing for estimating subpixel land cover over large areas.
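The general idea of combining a tree with linear models in its leaves can be sketched with a single split. The code below is not the SRT algorithm itself, whose stepwise variable selection and stopping rules are not described in the abstract; it only contrasts constant leaves (as in a conventional regression tree) with simple one-variable linear leaves on invented, piecewise-linear data.

```python
def mean(values):
    return sum(values) / len(values)


def sse_constant(xs, ys):
    """Squared error when the leaf predicts the mean response."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def sse_linear(xs, ys):
    """Squared error when the leaf fits a least-squares line y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # no spread in x: fall back to a constant leaf
        return sse_constant(xs, ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def best_split(xs, ys, leaf_sse, min_leaf=2):
    """Return (threshold, total SSE) of the best single split on x."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        left, right = pairs[:i], pairs[i:]
        total = leaf_sse(*zip(*left)) + leaf_sse(*zip(*right))
        if best is None or total < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, total)
    return best


# Invented data: rises linearly up to x = 5, then falls linearly.
xs = list(range(10))
ys = [0, 1, 2, 3, 4, 5, 3, 2, 1, 0]

lin_threshold, lin_sse = best_split(xs, ys, sse_linear)
const_threshold, const_sse = best_split(xs, ys, sse_constant)
print(lin_threshold, lin_sse, const_sse)
```

With linear leaves one split is enough to describe this data, whereas constant leaves would need many more splits to approximate the same shape, which is one plausible reading of why such hybrids can yield more compact trees.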
An introduction to tree-structured modeling with application to quality of life data.
Su, Xiaogang; Azuero, Andres; Cho, June; Kvale, Elizabeth; Meneses, Karen M; McNees, M Patrick
2011-01-01
Investigators addressing nursing research are faced increasingly with the need to analyze data that involve variables of mixed types and are characterized by complex nonlinearity and interactions. Tree-based methods, also called recursive partitioning, are gaining popularity in various fields. In addition to efficiency and flexibility in handling multifaceted data, tree-based methods offer ease of interpretation. The aims of this study were to introduce tree-based methods, discuss their advantages and pitfalls in application, and describe their potential use in nursing research. In this article, (a) an introduction to tree-structured methods is presented, (b) the technique is illustrated via quality of life (QOL) data collected in the Breast Cancer Education Intervention study, and (c) implications for their potential use in nursing research are discussed. As illustrated by the QOL analysis example, tree methods generate interesting and easily understood findings that cannot be uncovered via traditional linear regression analysis. The expanding breadth and complexity of nursing research may entail the use of new tools to improve efficiency and gain new insights. In certain situations, tree-based methods offer an attractive approach that helps address such needs.
Eric H. Wharton; Tiberius Cunia
1987-01-01
Proceedings of a workshop co-sponsored by the USDA Forest Service, the State University of New York, and the Society of American Foresters. Presented were papers on the methodology of sample tree selection, tree biomass measurement, construction of biomass tables and estimation of their error, and combining the error of biomass tables with that of the sample plots or...
Chen, Guangchao; Li, Xuehua; Chen, Jingwen; Zhang, Ya-Nan; Peijnenburg, Willie J G M
2014-12-01
Biodegradation is the principal environmental dissipation process of chemicals. As such, it is a dominant factor determining the persistence and fate of organic chemicals in the environment, and is therefore of critical importance to chemical management and regulation. In the present study, the authors developed in silico methods assessing biodegradability based on a large heterogeneous set of 825 organic compounds, using the techniques of the C4.5 decision tree, the functional inner regression tree, and logistic regression. External validation was subsequently carried out by 2 independent test sets of 777 and 27 chemicals. As a result, the functional inner regression tree exhibited the best predictability with predictive accuracies of 81.5% and 81.0%, respectively, on the training set (825 chemicals) and test set I (777 chemicals). Performance of the developed models on the 2 test sets was subsequently compared with that of the Estimation Program Interface (EPI) Suite Biowin 5 and Biowin 6 models, which also showed a better predictability of the functional inner regression tree model. The model built in the present study exhibits a reasonable predictability compared with existing models while possessing a transparent algorithm. Interpretation of the mechanisms of biodegradation was also carried out based on the models developed. © 2014 SETAC.
KAWAGUCHI, TAKUMI; SUETSUGU, TAKURO; OGATA, SHYOU; IMANAGA, MINAMI; ISHII, KUMIKO; ESAKI, NAO; SUGIMOTO, MASAKO; OTSUYAMA, JYURI; NAGAMATSU, AYU; TANIGUCHI, EITARO; ITOU, MINORU; ORIISHI, TETSUHARU; IWASAKI, SHOKO; MIURA, HIROKO; TORIMURA, TAKUJI
2016-01-01
The incidence of traffic accidents in patients with chronic liver disease (CLD) is high in the USA. However, the characteristics of patients, including dietary habits, differ between Japan and the USA. The present study investigated the incidence of traffic accidents in CLD patients and the clinical profiles associated with traffic accidents in Japan using a data-mining analysis. A cross-sectional study was performed and 256 subjects [148 CLD patients (CLD group) and 106 patients with other digestive diseases (disease control group)] were enrolled; 2 patients were excluded. The incidence of traffic accidents was compared between the two groups. Independent factors for traffic accidents were analyzed using logistic regression and decision-tree analyses. The incidence of traffic accidents did not differ between the CLD and disease control groups (8.8 vs. 11.3%). The results of the logistic regression analysis showed that yoghurt consumption was the only independent risk factor for traffic accidents (odds ratio, 0.37; 95% confidence interval, 0.16–0.85; P=0.0197). Similarly, the results of the decision-tree analysis showed that yoghurt consumption was the initial divergence variable. In patients who consumed yoghurt habitually, the incidence of traffic accidents was 6.6%, while that in patients who did not consume yoghurt was 16.0%. CLD was not identified as an independent factor in the logistic regression and decision-tree analyses. In conclusion, the difference in the incidence of traffic accidents in Japan between the CLD and disease control groups was insignificant. Furthermore, yoghurt consumption was an independent negative risk factor for traffic accidents in patients with digestive diseases, including CLD. PMID:27123257
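As a rough consistency check on the figures quoted above, a crude (unadjusted) odds ratio can be computed from the two reported incidences. This is only arithmetic on the rounded percentages in the abstract, treating habitual yoghurt consumers as the exposed group; the published odds ratio comes from the logistic regression model, so close agreement is plausible but not guaranteed.

```python
def odds(probability):
    """Convert a probability into odds."""
    return probability / (1 - probability)


def odds_ratio(p_exposed, p_unexposed):
    """Crude odds ratio comparing two groups' event probabilities."""
    return odds(p_exposed) / odds(p_unexposed)


# Incidences reported in the abstract: 6.6% (yoghurt) vs 16.0% (no yoghurt).
crude_or = odds_ratio(0.066, 0.160)
print(round(crude_or, 2))  # prints 0.37
```

The crude value is close to the reported estimate of 0.37, consistent with the decision-tree and logistic regression results pointing at the same contrast.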
Robertson, Dale M.; Saad, D.A.; Heisey, D.M.
2006-01-01
Various approaches are used to subdivide large areas into regions containing streams that have similar reference or background water quality and that respond similarly to different factors. For many applications, such as establishing reference conditions, it is preferable to use physical characteristics that are not affected by human activities to delineate these regions. However, most approaches, such as ecoregion classifications, rely on land use to delineate regions or have difficulties compensating for the effects of land use. Land use not only directly affects water quality, but it is often correlated with the factors used to define the regions. In this article, we describe modifications to SPARTA (spatial regression-tree analysis), a relatively new approach applied to water-quality and environmental characteristic data to delineate zones with similar factors affecting water quality. In this modified approach, land-use-adjusted (residualized) water quality and environmental characteristics are computed for each site. Regression-tree analysis is applied to the residualized data to determine the most statistically important environmental characteristics describing the distribution of a specific water-quality constituent. Geographic information for small basins throughout the study area is then used to subdivide the area into relatively homogeneous environmental water-quality zones. For each zone, commonly used approaches are subsequently used to define its reference water quality and how its water quality responds to changes in land use. SPARTA is used to delineate zones of similar reference concentrations of total phosphorus and suspended sediment throughout the upper Midwestern part of the United States. © 2006 Springer Science+Business Media, Inc.
Dyer, Betsey D.; Kahn, Michael J.; LeBlanc, Mark D.
2008-01-01
Classification and regression tree (CART) analysis was applied to genome-wide tetranucleotide frequencies (genomic signatures) of 195 archaea and bacteria. Although genomic signatures have typically been used to classify evolutionary divergence, in this study, convergent evolution was the focus. Temperature optima for most of the organisms examined could be distinguished by CART analyses of tetranucleotide frequencies. This suggests that pervasive (nonlinear) qualities of genomes may reflect certain environmental conditions (such as temperature) in which those genomes evolved. The predominant use of GAGA and AGGA as the discriminating tetramers in CART models suggests that purine-loading and codon biases of thermophiles may explain some of the results. PMID:19054742
NASA Astrophysics Data System (ADS)
Niemeijer, Meindert; Dumitrescu, Alina V.; van Ginneken, Bram; Abrámoff, Michael D.
2011-03-01
Parameters extracted from the vasculature on the retina are correlated with various conditions such as diabetic retinopathy and cardiovascular diseases such as stroke. Segmentation of the vasculature on the retina has been a topic that has received much attention in the literature over the past decade. Analysis of the segmentation result, however, has only received limited attention with most works describing methods to accurately measure the width of the vessels. Analyzing the connectedness of the vascular network is an important step towards the characterization of the complete vascular tree. The retinal vascular tree, from an image interpretation point of view, originates at the optic disc and spreads out over the retina. The tree bifurcates and the vessels also cross each other. The points where this happens form the key to determining the connectedness of the complete tree. We present a supervised method to detect the bifurcations and crossing points of the vasculature of the retina. The method uses features extracted from the vasculature as well as the image in a location regression approach to find those locations of the segmented vascular tree where the bifurcation or crossing occurs (from here, POI, points of interest). We evaluate the method on the publicly available DRIVE database in which an ophthalmologist has marked the POI.
Assessing wildfire risks at multiple spatial scales
Justin Fitch
2008-01-01
In continuation of the efforts to advance wildfire science and develop tools for wildland fire managers, a spatial wildfire risk assessment was carried out using Classification and Regression Tree analysis (CART) and Geographic Information Systems (GIS). The analysis was performed at two scales. The small-scale assessment covered the entire state of New Mexico, while...
Classification of sodium MRI data of cartilage using machine learning.
Madelin, Guillaume; Poidevin, Frederick; Makrymallis, Antonios; Regatte, Ravinder R
2015-11-01
To assess the possible utility of machine learning for classifying subjects with and subjects without osteoarthritis using sodium magnetic resonance imaging data. Theory: Support vector machine, k-nearest neighbors, naïve Bayes, discriminant analysis, linear regression, logistic regression, neural networks, decision tree, and tree bagging were tested. Sodium magnetic resonance imaging with and without fluid suppression by inversion recovery was acquired on the knee cartilage of 19 controls and 28 osteoarthritis patients. Sodium concentrations were measured in regions of interest in the knee for both acquisitions. Mean (MEAN) and standard deviation (STD) of these concentrations were measured in each region of interest, and the minimum, maximum, and mean of these two measurements were calculated over all regions of interest for each subject. The resulting 12 variables per subject were used as predictors for classification. Either Min [STD] alone, or in combination with Mean [MEAN] or Min [MEAN], all from fluid suppressed data, were the best predictors with an accuracy >74%, mainly with linear logistic regression and linear support vector machine. Other good classifiers include discriminant analysis, linear regression, and naïve Bayes. Machine learning is a promising technique for classifying osteoarthritis patients and controls from sodium magnetic resonance imaging data. © 2014 Wiley Periodicals, Inc.
Suchetana, Bihu; Rajagopalan, Balaji; Silverstein, JoAnn
2017-11-15
A regression tree-based diagnostic approach is developed to evaluate factors affecting US wastewater treatment plant compliance with ammonia discharge permit limits using Discharge Monthly Report (DMR) data from a sample of 106 municipal treatment plants for the period of 2004-2008. Predictor variables used to fit the regression tree are selected using random forests, and consist of the previous month's effluent ammonia, influent flow rates and plant capacity utilization. The tree models are first used to evaluate compliance with existing ammonia discharge standards at each facility and then applied assuming more stringent discharge limits, under consideration in many states. The model predicts that the ability to meet both current and future limits depends primarily on the previous month's treatment performance. With more stringent discharge limits, the predicted ammonia concentration relative to the discharge limit increases. In-sample validation shows that the regression trees can provide a median classification accuracy of >70%. The regression tree model is validated using ammonia discharge data from an operating wastewater treatment plant and is able to accurately predict the observed ammonia discharge category approximately 80% of the time, indicating that the regression tree model can be applied to predict compliance for individual treatment plants, providing practical guidance for utilities and regulators with an interest in controlling ammonia discharges. The proposed methodology is also used to demonstrate how to delineate reliable sources of demand and supply in a point source-to-point source nutrient credit trading scheme, as well as how planners and decision makers can set reasonable discharge limits in future. Copyright © 2017 Elsevier B.V. All rights reserved.
Factor complexity of crash occurrence: An empirical demonstration using boosted regression trees.
Chung, Yi-Shih
2013-12-01
Factor complexity is a characteristic of traffic crashes. This paper proposes a novel method, namely boosted regression trees (BRT), to investigate the complex and nonlinear relationships in high-variance traffic crash data. The Taiwanese 2004-2005 single-vehicle motorcycle crash data are used to demonstrate the utility of BRT. Traditional logistic regression and classification and regression tree (CART) models are also used to compare their estimation results and external validities. Both the in-sample cross-validation and out-of-sample validation results show that an increase in tree complexity provides improved, although declining, classification performance, indicating a limited factor complexity of single-vehicle motorcycle crashes. The effects of crucial variables including geographical, time, and sociodemographic factors explain some fatal crashes. Relatively unique fatal crashes are better approximated by interactive terms, especially combinations of behavioral factors. BRT models generally provide better transferability than conventional logistic regression and CART models. This study also discusses the implications of the results for devising safety policies. Copyright © 2012 Elsevier Ltd. All rights reserved.
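The boosting idea behind BRT can be sketched in a few lines: many small trees are fitted in sequence, each to the residuals left by the ensemble so far, and added with a shrinkage factor. The code below is a deliberately simplified assumption-laden version (one predictor, depth-one trees, squared-error loss, invented data); the study itself models crash outcomes as a classification problem with many predictors and deeper trees, none of which is reproduced here.

```python
def fit_stump(xs, residuals):
    """Best depth-one tree for the residuals: (threshold, left mean, right mean)."""
    best = None
    distinct = sorted(set(xs))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lv) ** 2 for r in left)
               + sum((r - rv) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]


def boost(xs, ys, rounds, rate=0.5):
    """Fit `rounds` stumps sequentially to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        preds = [p + rate * (lv if x <= t else rv) for x, p in zip(xs, preds)]
    return {"base": base, "rate": rate, "stumps": stumps}


def predict(model, x):
    """Sum the base value and every shrunken stump contribution."""
    total = model["base"]
    for t, lv, rv in model["stumps"]:
        total += model["rate"] * (lv if x <= t else rv)
    return total


def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Invented nonlinear data: y = x squared.
xs = list(range(10))
ys = [x * x for x in xs]
for rounds in (0, 1, 20):
    print(rounds, round(mse(boost(xs, ys, rounds), xs, ys), 3))
```

Training error falls as rounds are added, which is why boosted models are normally tuned against held-out data, in line with the in-sample and out-of-sample validation the abstract describes.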
Yamashita, Takashi; Kart, Cary S; Noe, Douglas A
2012-12-01
Type 2 diabetes is known to contribute to health disparities in the U.S. and failure to adhere to recommended self-care behaviors is a contributing factor. Intervention programs face difficulties as a result of patient diversity and limited resources. With data from the 2005 Behavioral Risk Factor Surveillance System, this study employs a logistic regression tree algorithm to identify characteristics of sub-populations with type 2 diabetes according to their reported frequency of adherence to four recommended diabetes self-care behaviors including blood glucose monitoring, foot examination, eye examination and HbA1c testing. Using Andersen's health behavior model, need factors appear to dominate the definition of which sub-groups were at greatest risk for low as well as high adherence. Findings demonstrate the utility of easily interpreted tree diagrams to design specific culturally appropriate intervention programs targeting sub-populations of diabetes patients who need to improve their self-care behaviors. Limitations and contributions of the study are discussed.
Dynamic travel time estimation using regression trees.
DOT National Transportation Integrated Search
2008-10-01
This report presents a methodology for travel time estimation by using regression trees. The dissemination of travel time information has become crucial for effective traffic management, especially under congested road conditions. In the absence of c...
Using nonlinear quantile regression to estimate the self-thinning boundary curve
Quang V. Cao; Thomas J. Dean
2015-01-01
The relationship between tree size (quadratic mean diameter) and tree density (number of trees per unit area) has been a topic of research and discussion for many decades. Starting with Reineke in 1933, the maximum size-density relationship, on a log-log scale, has been assumed to be linear. Several techniques, including linear quantile regression, have been employed...
Predicting the limits to tree height using statistical regressions of leaf traits.
Burgess, Stephen S O; Dawson, Todd E
2007-01-01
Leaf morphology and physiological functioning demonstrate considerable plasticity within tree crowns, with various leaf traits often exhibiting pronounced vertical gradients in very tall trees. It has been proposed that the trajectory of these gradients, as determined by regression methods, could be used in conjunction with theoretical biophysical limits to estimate the maximum height to which trees can grow. Here, we examined this approach using published and new experimental data from tall conifer and angiosperm species. We showed that height predictions were sensitive to tree-to-tree variation in the shape of the regression and to the biophysical endpoints selected. We examined the suitability of proposed end-points and their theoretical validity. We also noted that site and environment influenced height predictions considerably. Use of leaf mass per unit area or leaf water potential coupled with vulnerability of twigs to cavitation poses a number of difficulties for predicting tree height. Photosynthetic rate and carbon isotope discrimination show more promise, but in the second case, the complex relationship between light, water availability, photosynthetic capacity and internal conductance to CO2 must first be characterized.
Weather Impact on Airport Arrival Meter Fix Throughput
NASA Technical Reports Server (NTRS)
Wang, Yao
2017-01-01
Time-based flow management provides arrival aircraft schedules based on arrival airport conditions, airport capacity, required spacing, and weather conditions. In order to meet a scheduled time at which arrival aircraft can cross an airport arrival meter fix prior to entering the airport terminal airspace, air traffic controllers make regulations on air traffic. Severe weather may create an airport arrival bottleneck if one or more of airport arrival meter fixes are partially or completely blocked by the weather and the arrival demand has not been reduced accordingly. Under these conditions, aircraft are frequently being put in holding patterns until they can be rerouted. A model that predicts the weather impacted meter fix throughput may help air traffic controllers direct arrival flows into the airport more efficiently, minimizing arrival meter fix congestion. This paper presents an analysis of air traffic flows across arrival meter fixes at the Newark Liberty International Airport (EWR). Several scenarios of weather impacted EWR arrival fix flows are described. Furthermore, multiple linear regression and regression tree ensemble learning approaches for translating multiple sector Weather Impacted Traffic Indexes (WITI) to EWR arrival meter fix throughputs are examined. These weather translation models are developed and validated using the EWR arrival flight and weather data for the period of April-September in 2014. This study also compares the performance of the regression tree ensemble with traditional multiple linear regression models for estimating the weather impacted throughputs at each of the EWR arrival meter fixes. For all meter fixes investigated, the results from the regression tree ensemble weather translation models show a stronger correlation between model outputs and observed meter fix throughputs than that produced from multiple linear regression method.
NASA Astrophysics Data System (ADS)
Wilson, Barry T.; Knight, Joseph F.; McRoberts, Ronald E.
2018-03-01
Imagery from the Landsat Program has been used frequently as a source of auxiliary data for modeling land cover, as well as a variety of attributes associated with tree cover. With ready access to all scenes in the archive since 2008 due to the USGS Landsat Data Policy, new approaches to deriving such auxiliary data from dense Landsat time series are required. Several methods have previously been developed for use with finer temporal resolution imagery (e.g. AVHRR and MODIS), including image compositing and harmonic regression using Fourier series. The manuscript presents a study, using Minnesota, USA during the years 2009-2013 as the study area and timeframe. The study examined the relative predictive power of land cover models, in particular those related to tree cover, using predictor variables based solely on composite imagery versus those using estimated harmonic regression coefficients. The study used two common non-parametric modeling approaches (i.e. k-nearest neighbors and random forests) for fitting classification and regression models of multiple attributes measured on USFS Forest Inventory and Analysis plots using all available Landsat imagery for the study area and timeframe. The estimated Fourier coefficients developed by harmonic regression of tasseled cap transformation time series data were shown to be correlated with land cover, including tree cover. Regression models using estimated Fourier coefficients as predictor variables showed a two- to threefold increase in explained variance for a small set of continuous response variables, relative to comparable models using monthly image composites. Similarly, the overall accuracies of classification models using the estimated Fourier coefficients were approximately 10-20 percentage points higher than the models using the image composites, with corresponding individual class accuracies between six and 45 percentage points higher.
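The harmonic regression mentioned in this abstract can be illustrated with a minimal sketch. It assumes a single first-order harmonic with an annual period and ordinary least squares on irregularly spaced observation dates; the function name `fit_first_harmonic`, the 365.25-day period, and the sample values are illustrative assumptions, not details taken from the study.

```python
import math

def fit_first_harmonic(days, values, period=365.25):
    """Least-squares fit of y = a0 + a1*cos(w*t) + b1*sin(w*t).

    days:   observation times in days (may be irregularly spaced)
    values: observed index values at those times
    Returns the estimated coefficients (a0, a1, b1).
    """
    w = 2.0 * math.pi / period
    rows = [(1.0, math.cos(w * t), math.sin(w * t)) for t in days]
    # Normal equations (X^T X) beta = X^T y as a 3x4 augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, values))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(3):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [x - factor * p for x, p in zip(aug[k], aug[col])]
    return tuple(aug[i][3] / aug[i][i] for i in range(3))

# Hypothetical acquisition days and index values for one pixel.
days = [5, 21, 53, 85, 120, 149, 181, 213, 260, 290, 330, 350]
values = [0.41, 0.39, 0.33, 0.27, 0.22, 0.20, 0.21, 0.25, 0.33, 0.38, 0.42, 0.42]
a0, a1, b1 = fit_first_harmonic(days, values)
```

In a workflow like the one described, the fitted coefficients (rather than the raw composites) would then serve as predictor variables for a classifier or regressor; higher-order harmonics would add further cosine and sine columns.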
Bianca N.I. Eskelson; Hailemariam Temesgen; Tara M. Barrett
2009-01-01
Cavity tree and snag abundance data are highly variable and contain many zero observations. We predict cavity tree and snag abundance from variables that are readily available from forest cover maps or remotely sensed data using negative binomial (NB), zero-inflated NB, and zero-altered NB (ZANB) regression models as well as nearest neighbor (NN) imputation methods....
Aaron Weiskittel; Jereme Frank; David Walker; Phil Radtke; David Macfarlane; James Westfall
2015-01-01
Prediction of forest biomass and carbon is becoming an important issue in the United States. However, estimating forest biomass and carbon is difficult and relies on empirically-derived regression equations. Based on recent findings from a national gap analysis and comprehensive assessment of the USDA Forest Service Forest Inventory and Analysis (USFS-FIA) component...
Su, Xiaogang; Peña, Annette T; Liu, Lei; Levine, Richard A
2018-04-29
Assessing heterogeneous treatment effects is a growing interest in advancing precision medicine. Individualized treatment effects (ITEs) play a critical role in such an endeavor. Concerning experimental data collected from randomized trials, we put forward a method, termed random forests of interaction trees (RFIT), for estimating ITE on the basis of interaction trees. To this end, we propose a smooth sigmoid surrogate method, as an alternative to greedy search, to speed up tree construction. The RFIT outperforms the "separate regression" approach in estimating ITE. Furthermore, standard errors for the estimated ITE via RFIT are obtained with the infinitesimal jackknife method. We assess and illustrate the use of RFIT via both simulation and the analysis of data from an acupuncture headache trial. Copyright © 2018 John Wiley & Sons, Ltd.
Using decision tree analysis to identify risk factors for relapse to smoking
Piper, Megan E.; Loh, Wei-Yin; Smith, Stevens S.; Japuntich, Sandra J.; Baker, Timothy B.
2010-01-01
This research used classification tree analysis and logistic regression models to identify risk factors related to short- and long-term abstinence. Baseline and cessation outcome data from two smoking cessation trials, conducted from 2001 to 2002, in two Midwestern urban areas, were analyzed. There were 928 participants (53.1% women, 81.8% white) with complete data. Both analyses suggest that relapse risk is produced by interactions of risk factors and that early and late cessation outcomes reflect different vulnerability factors. The results illustrate the dynamic nature of relapse risk and suggest the importance of efficient modeling of interactions in relapse prediction. PMID:20397871
The application of data mining techniques to oral cancer prognosis.
Tseng, Wan-Ting; Chiang, Wei-Fan; Liu, Shyun-Yeu; Roan, Jinsheng; Lin, Chun-Nan
2015-05-01
This study adopted an integrated procedure that combines the clustering and classification features of data mining technology to determine the differences between the symptoms shown in past cases where patients died from or survived oral cancer. Two data mining tools, namely decision tree and artificial neural network, were used to analyze the historical cases of oral cancer, and their performance was compared with that of logistic regression, the popular statistical analysis tool. Both decision tree and artificial neural network models outperformed the traditional statistical model. However, for clinicians, the trees created by the decision tree models are easier to interpret than the artificial neural network models. Cluster analysis also found that stage 4 patients who also have the following four characteristics have an extremely low survival rate: pN is N2b, level of RLNM is level I-III, AJCC-T is T4, and the cell mutation situation (G) is moderate.
Static terrestrial laser scanning of juvenile understory trees for field phenotyping
NASA Astrophysics Data System (ADS)
Wang, Huanhuan; Lin, Yi
2014-11-01
This study tested static terrestrial laser scanning (TLS), a cutting-edge 3D remote sensing technique, for parametric 3D reconstruction of juvenile understory trees. The test data were collected with a Leica HDS6100 TLS system in single-scan mode. The geometrical structures of juvenile understory trees are extracted by model fitting. Cones are used to model trunks and branches. Principal component analysis (PCA) is adopted to calculate their major axes. Coordinate transformation and orthogonal projection are used to estimate the parameters of the cones. Then, AutoCAD is utilized to simulate the morphological characteristics of the understory trees, and to add secondary branches and leaves in a random way. Comparison of the reference values and the estimated values gives the regression equation and shows that the proposed parameter-extraction algorithm is credible. The results broadly verify the applicability of TLS for field phenotyping of juvenile understory trees.
Spectral analysis of white ash response to emerald ash borer infestations
NASA Astrophysics Data System (ADS)
Calandra, Laura
The emerald ash borer (EAB) (Agrilus planipennis Fairmaire) is an invasive insect that has killed over 50 million ash trees in the US. The goal of this research was to establish a method to identify ash trees infested with EAB using remote sensing techniques at the leaf-level and tree crown level. First, a field-based study at the leaf-level used the range of spectral bands from the WorldView-2 sensor to determine if there was a significant difference between EAB-infested white ash (Fraxinus americana) and healthy leaves. Binary logistic regression models were developed using individual and combinations of wavelengths; the most successful model included 545 and 950 nm bands. The second half of this research employed imagery to identify healthy and EAB-infested trees, comparing pixel- and object-based methods by applying an unsupervised classification approach and a tree crown delineation algorithm, respectively. The pixel-based models attained the highest overall accuracies.
Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi
2017-06-01
Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. The logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB (p < 0.05). Identifying and training mothers at risk as well as improving prenatal care may reduce the PTB rate. We also recommend that statisticians utilize the logistic regression model for the classification of risk groups for PTB.
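Two of the accuracy indices named in this abstract, sensitivity and specificity, can be computed from paired actual and predicted labels as in the sketch below. The function name and the toy labels are illustrative assumptions; they are not the study's data, and the area under the curve is left out because it needs predicted probabilities rather than hard labels.

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for boolean label sequences.

    actual:    True where the outcome (e.g. preterm birth) occurred
    predicted: True where the model predicted the outcome
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # true positives
    fn = sum(1 for a, p in pairs if a and not p)      # false negatives
    tn = sum(1 for a, p in pairs if not a and not p)  # true negatives
    fp = sum(1 for a, p in pairs if not a and p)      # false positives
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical labels for five cases.
actual = [True, True, False, False, False]
predicted = [True, False, False, False, True]
sens, spec = sensitivity_specificity(actual, predicted)
```

With a rare outcome such as the 5.5% PTB rate reported here, the two indices are worth reading together, since a model can reach high specificity while missing most positive cases.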
Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition
Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on ...
NASA Astrophysics Data System (ADS)
Pham, Binh Thai; Prakash, Indra; Tien Bui, Dieu
2018-02-01
A hybrid machine learning approach of Random Subspace (RSS) and Classification And Regression Trees (CART) is proposed to develop a model named RSSCART for spatial prediction of landslides. This model combines the RSS method, which is known as an efficient ensemble technique, with CART, which is a state-of-the-art classifier. The Luc Yen district of Yen Bai province, a prominent landslide-prone area of Viet Nam, was selected for the model development. Performance of the RSSCART model was evaluated through the Receiver Operating Characteristic (ROC) curve, statistical analysis methods, and the Chi Square test. Results were compared with other benchmark landslide models, namely Support Vector Machines (SVM), single CART, Naïve Bayes Trees (NBT), and Logistic Regression (LR). In developing the model, ten important landslide-affecting factors related to geomorphology, geology and geo-environment were considered, namely slope angles, elevation, slope aspect, curvature, lithology, distance to faults, distance to rivers, distance to roads, and rainfall. The RSSCART model (AUC = 0.841) performed best among the compared landslide models, namely SVM (0.835), single CART (0.822), NBT (0.821), and LR (0.723). These results indicate that RSSCART is a promising method for spatial landslide prediction.
Bayesian Ensemble Trees (BET) for Clustering and Prediction in Heterogeneous Data
Duan, Leo L.; Clancy, John P.; Szczesniak, Rhonda D.
2016-01-01
We propose a novel “tree-averaging” model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian Ensemble Trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with the other ensemble methods, BET requires much fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides variable selection criterion and interpretation for each subset. We developed an efficient estimating procedure with improved estimation strategies in both CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplemental materials are available online. PMID:27524872
Shafizadeh-Moghadam, Hossein; Valavi, Roozbeh; Shahabi, Himan; Chapi, Kamran; Shirzadi, Ataollah
2018-07-01
In this research, eight individual machine learning and statistical models are implemented and compared, and based on their results, seven ensemble models for flood susceptibility assessment are introduced. The individual models included artificial neural networks, classification and regression trees, flexible discriminant analysis, generalized linear model, generalized additive model, boosted regression trees, multivariate adaptive regression splines, and maximum entropy, and the ensemble models were Ensemble Model committee averaging (EMca), Ensemble Model confidence interval Inferior (EMciInf), Ensemble Model confidence interval Superior (EMciSup), Ensemble Model to estimate the coefficient of variation (EMcv), Ensemble Model to estimate the mean (EMmean), Ensemble Model to estimate the median (EMmedian), and Ensemble Model based on weighted mean (EMwmean). The data set covered 201 flood events in the Haraz watershed (Mazandaran province in Iran) and 10,000 randomly selected non-occurrence points. Among the individual models, boosted regression trees had the highest Area Under the Receiver Operating Characteristic curve (AUROC; 0.975) and the generalized linear model the lowest (0.642). On the other hand, the proposed EMmedian resulted in the highest accuracy (0.976) among all models. Despite the outstanding performance of some models, variability among the predictions of individual models was considerable. Therefore, to reduce uncertainty and create more generalizable, more stable, and less sensitive models, ensemble forecasting approaches, in particular EMmedian, are recommended for flood susceptibility assessment. Copyright © 2018 Elsevier Ltd. All rights reserved.
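The general idea behind several of the ensemble rules named in this abstract (mean, median, weighted mean, committee averaging) can be sketched for a single map location as follows. The abstract does not give the authors' exact definitions, so the formulas, the 0.5 voting threshold, the function name `combine`, and the example probabilities are all assumptions made for illustration.

```python
from statistics import mean, median

def combine(predictions, weights=None, threshold=0.5):
    """Combine per-model flood probabilities for one location.

    predictions: probabilities in [0, 1], one per individual model
    weights:     optional per-model weights (e.g. validation scores)
    threshold:   probability at which a model is counted as voting "flood"
    """
    combined = {
        "mean": mean(predictions),
        "median": median(predictions),
        # Committee averaging: share of models voting for the event.
        "committee": sum(1 for p in predictions if p >= threshold) / len(predictions),
    }
    if weights is not None:
        total = sum(weights)
        combined["wmean"] = sum(w * p for w, p in zip(weights, predictions)) / total
    return combined

# Hypothetical outputs of three individual models at one grid cell.
result = combine([0.2, 0.6, 0.9], weights=[1.0, 1.0, 2.0])
```

The median is less affected by a single outlying model than the mean, which is one plausible reading of why a median-based ensemble might be more stable when individual predictions vary considerably.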
Mani, Ashutosh; Rao, Marepalli; James, Kelley; Bhattacharya, Amit
2015-01-01
The purpose of this study was to explore data-driven models, based on decision trees, to develop practical and easy to use predictive models for early identification of firefighters who are likely to cross the threshold of hyperthermia during live-fire training. Predictive models were created for three consecutive live-fire training scenarios. The final predicted outcome was a categorical variable: will a firefighter cross the upper threshold of hyperthermia - Yes/No. Two tiers of models were built, one with and one without taking into account the outcome (whether a firefighter crossed hyperthermia or not) from the previous training scenario. First tier of models included age, baseline heart rate and core body temperature, body mass index, and duration of training scenario as predictors. The second tier of models included the outcome of the previous scenario in the prediction space, in addition to all the predictors from the first tier of models. Classification and regression trees were used independently for prediction. The response variable for the regression tree was the quantitative variable: core body temperature at the end of each scenario. The predicted quantitative variable from regression trees was compared to the upper threshold of hyperthermia (38°C) to predict whether a firefighter would enter hyperthermia. The performance of classification and regression tree models was satisfactory for the second (success rate = 79%) and third (success rate = 89%) training scenarios but not for the first (success rate = 43%). Data-driven models based on decision trees can be a useful tool for predicting physiological response without modeling the underlying physiological systems. Early prediction of heat stress coupled with proactive interventions, such as pre-cooling, can help reduce heat stress in firefighters.
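The step in which a regression tree's predicted core body temperature is compared with the 38°C threshold can be sketched as below. Only the threshold value comes from the abstract; the strict "greater than" comparison, the function names, and the sample numbers are assumptions for illustration.

```python
HYPERTHERMIA_THRESHOLD_C = 38.0  # upper threshold stated in the abstract

def predicts_hyperthermia(predicted_core_temp_c, threshold=HYPERTHERMIA_THRESHOLD_C):
    """Turn a predicted end-of-scenario core temperature into a yes/no call.

    Whether "crossing" means > or >= the threshold is an assumption here.
    """
    return predicted_core_temp_c > threshold

def success_rate(predicted_temps_c, crossed):
    """Share of firefighters whose yes/no prediction matched the observed outcome."""
    hits = sum(
        1 for temp, observed in zip(predicted_temps_c, crossed)
        if predicts_hyperthermia(temp) == observed
    )
    return hits / len(crossed)

# Hypothetical predictions and observed outcomes for four firefighters.
rate = success_rate([37.5, 38.4, 38.1, 37.9], [False, True, False, False])
```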
Austin, Peter C.; Tu, Jack V.; Ho, Jennifer E.; Levy, Daniel; Lee, Douglas S.
2014-01-01
Objective Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines. Study design and Setting We compared the performance of these classification methods with those of conventional classification trees to classify patients with heart failure according to the following sub-types: heart failure with preserved ejection fraction (HFPEF) vs. heart failure with reduced ejection fraction (HFREF). We also compared the ability of these methods to predict the probability of the presence of HFPEF with that of conventional logistic regression. Results We found that modern, flexible tree-based methods from the data mining literature offer substantial improvement in prediction and classification of heart failure sub-type compared to conventional classification and regression trees. However, conventional logistic regression had superior performance for predicting the probability of the presence of HFPEF compared to the methods proposed in the data mining literature. Conclusion The use of tree-based methods offers superior performance over conventional classification and regression trees for predicting and classifying heart failure subtypes in a population-based sample of patients from Ontario. However, these methods do not offer substantial improvements over logistic regression for predicting the presence of HFPEF. PMID:23384592
VanEngelsdorp, Dennis; Speybroeck, Niko; Evans, Jay D; Nguyen, Bach Kim; Mullin, Chris; Frazier, Maryann; Frazier, Jim; Cox-Foster, Diana; Chen, Yanping; Tarpy, David R; Haubruge, Eric; Pettis, Jeffrey S; Saegerman, Claude
2010-10-01
Colony collapse disorder (CCD), a syndrome whose defining trait is the rapid loss of adult worker honey bees, Apis mellifera L., is thought to be responsible for a minority of the large overwintering losses experienced by U.S. beekeepers since the winter 2006-2007. Using the same data set developed to perform a monofactorial analysis (PloS ONE 4: e6481, 2009), we conducted a classification and regression tree (CART) analysis in an attempt to better understand the relative importance and interrelations among different risk variables in explaining CCD. Fifty-five exploratory variables were used to construct two CART models: one model with and one model without a cost of misclassifying a CCD-diagnosed colony as a non-CCD colony. The resulting model tree that permitted for misclassification had a sensitivity and specificity of 85 and 74%, respectively. Although factors measuring colony stress (e.g., adult bee physiological measures, such as fluctuating asymmetry or mass of head) were important discriminating values, six of the 19 variables having the greatest discriminatory value were pesticide levels in different hive matrices. Notably, coumaphos levels in brood (a miticide commonly used by beekeepers) had the highest discriminatory value and were highest in control (healthy) colonies. Our CART analysis provides evidence that CCD is probably the result of several factors acting in concert, making afflicted colonies more susceptible to disease. This analysis highlights several areas that warrant further attention, including the effect of sublethal pesticide exposure on pathogen prevalence and the role of variability in bee tolerance to pesticides on colony survivorship.
A self-trained classification technique for producing 30 m percent-water maps from Landsat data
Rover, Jennifer R.; Wylie, Bruce K.; Ji, Lei
2010-01-01
Small bodies of water can be mapped with moderate-resolution satellite data using methods where water is mapped as subpixel fractions using field measurements or high-resolution images as training datasets. A new method, developed from a regression-tree technique, uses a 30 m Landsat image for training the regression tree that, in turn, is applied to the same image to map subpixel water. The self-trained method was evaluated by comparing the percent-water map with three other maps generated from established percent-water mapping methods: (1) a regression-tree model trained with a 5 m SPOT 5 image, (2) a regression-tree model based on endmembers and (3) a linear unmixing classification technique. The results suggest that subpixel water fractions can be accurately estimated when high-resolution satellite data or intensively interpreted training datasets are not available, which increases our ability to map small water bodies or small changes in lake size at a regional scale.
Scalable Regression Tree Learning on Hadoop using OpenPlanet
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yin, Wei; Simmhan, Yogesh; Prasanna, Viktor
As scientific and engineering domains attempt to effectively analyze the deluge of data arriving from sensors and instruments, machine learning is becoming a key data mining tool to build prediction models. The regression tree is a popular learning model that combines decision trees and linear regression to forecast numerical target variables based on a set of input features. MapReduce is well suited for addressing such data intensive learning applications, and a proprietary regression tree algorithm, PLANET, using MapReduce has been proposed earlier. In this paper, we describe an open source implementation of this algorithm, OpenPlanet, on the Hadoop framework using a hybrid approach. Further, we evaluate the performance of OpenPlanet using real-world datasets from the Smart Power Grid domain to perform energy use forecasting, and propose tuning strategies of Hadoop parameters to improve the performance of the default configuration by 75% for a training dataset of 17 million tuples on a 64-core Hadoop cluster on FutureGrid.
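The core operation in growing a regression tree, choosing the split that most reduces squared error, can be sketched for one numeric feature as follows. This is a single-machine illustration with constant (mean) predictions in each child; it is not the PLANET or OpenPlanet algorithm, and the function name and data are hypothetical.

```python
def best_split(xs, ys):
    """Find the threshold on one feature that minimises total squared error.

    Returns (threshold, sse). threshold is None if no split improves on
    predicting the overall mean.
    """
    def sse(vals):
        # Sum of squared deviations from the mean of vals.
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    pairs = sorted(zip(xs, ys))
    best = (None, sse(list(ys)))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best

# Hypothetical feature values and numeric targets.
threshold, error = best_split([1, 2, 3, 10, 11, 12], [1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
```

A full tree applies this search recursively to each child and across all features; distributed implementations typically compute the per-split statistics in parallel rather than sorting the data on one machine.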
Record of the Solar Activity and of Other Geophysical Phenomenons in Tree Ring
NASA Astrophysics Data System (ADS)
Rigozo, Nivaor Rodolfo
1999-01-01
Tree ring studies are usually used to determine or verify climatic factors which prevail in a given place or region and may cause tree ring width variations. Few studies are dedicated to the geophysical phenomena which may underlie these tree ring width variations. In order to look for periodicities which may be associated with solar activity and/or other geophysical phenomena which may influence tree ring growth, a new interactive image analysis method to measure tree ring width was developed and is presented here. This method makes use of a computer and a high resolution flatbed scanner; a program was also developed in Interactive Data Language (IDL 5.0) to study digitized ring images and transform them into time series. The main advantage of this method is interactive analysis of tree ring images without complex and high cost instrumentation. Thirty-nine samples were collected: 12 from Concordia - S. C., 9 from Canela - R. S., 14 from Sao Francisco de Paula - R. S., one from Nova Petropolis - R. S., 2 from Sao Martinho da Serra - R. S. and one from Chile. Fit functions are applied to the ring width time series to obtain the best long-term trend (growth rate of each tree) curves, which are then removed through a standardization process that gives the tree ring index time series, on which spectral analysis is performed by the maximum entropy method and iterative regression. The results show periodicities close to the 11 yr and 22 yr (Hale) solar cycles and to 5.5 yr for all sampling locations, and 52 yr and Gleissberg cycles for the Concordia - S. C. and Chile samples. El Nino events were also observed with periods around 4 and 7 yr.
Chin, Weng-Yee; Wan, Eric Yuk Fai; Dowrick, Christopher; Arroll, Bruce; Lam, Cindy Lo Kuen
2018-04-26
The aim of this study was to explore the relationship between patient self-reported Patient Health Questionnaire-9 (PHQ-9) symptoms and doctor diagnosis of depression using a tree analysis approach. This was a secondary analysis on a dataset obtained from 10 179 adult primary care patients and 59 primary care physicians (PCPs) across Hong Kong. Patients completed a waiting room survey collecting data on socio-demographics and the PHQ-9. Blinded doctors documented whether they thought the patient had depression. Data were analyzed using multiple logistic regression and conditional inference decision tree modeling. PCPs diagnosed 594 patients with depression. Logistic regression identified gender, age, employment status, past history of depression, family history of mental illness and recent doctor visit as factors associated with a depression diagnosis. Tree analyses revealed different pathways of association between PHQ-9 symptoms and depression diagnosis for patients with and without past depression. The PHQ-9 symptom model revealed low mood, sense of worthlessness, fatigue, sleep disturbance and functional impairment as early classifiers. The PHQ-9 total score model revealed cut-off scores of >12 and >15 were most frequently associated with depression diagnoses in patients with and without past depression. A past history of depression is the most significant factor associated with the diagnosis of depression. PCPs appear to utilize a hypothetico-deductive problem-solving approach incorporating pre-test probability, with different associated factors for patients with and without past depression. Diagnostic thresholds may be too low for patients with past depression and too high for those without, potentially leading to over and under diagnosis of depression.
Smitley, D R; Rebek, E J; Royalty, R N; Davis, T W; Newhouse, K F
2010-02-01
We conducted field trials at five different locations over a period of 6 yr to investigate the efficacy of imidacloprid applied each spring as a basal soil drench for protection against emerald ash borer, Agrilus planipennis Fairmaire (Coleoptera: Buprestidae). Canopy thinning and emerald ash borer larval density were used to evaluate efficacy for 3-4 yr at each location while treatments continued. Test sites included small urban trees (5-15 cm diameter at breast height [dbh]), medium to large (15-65 cm dbh) trees at golf courses, and medium to large street trees. Annual basal drenches with imidacloprid gave complete protection of small ash trees for three years. At three sites where the size of trees ranged from 23 to 37 cm dbh, we successfully protected all ash trees beginning the test with <60% canopy thinning. Regression analysis of data from two sites reveals that tree size explains 46% of the variation in efficacy of imidacloprid drenches. The smallest trees (<30 cm dbh) remained in excellent condition for 3 yr, whereas most of the largest trees (>38 cm dbh) declined to a weakened state and undesirable appearance. The five-fold increase in trunk and branch surface area of ash trees as the tree dbh doubles may account for reduced efficacy on larger trees, and suggests a need to increase treatment rates for larger trees.
Method for estimating potential tree-grade distributions for northeastern forest species
Daniel A. Yaussy
1993-01-01
Generalized logistic regression was used to distribute trees into four potential tree grades for 20 northeastern species groups. The potential tree grade is defined as the tree grade based on the length and amount of clear cuttings and defects only, disregarding minimum grading diameter. The algorithms described use site index and tree diameter as the predictive...
López-Sampson, Arlene; Cernusak, Lucas A; Page, Tony
2017-05-01
Physiological traits are frequently used as indicators of tree productivity. Aquilaria species growing in a research planting were studied to investigate relationships between leaf-productivity traits and tree growth. Twenty-eight trees were selected to measure isotopic composition of carbon (δ13C) and nitrogen (δ15N) and monitor six leaf attributes. Trees were sampled randomly within each of four diametric classes (at 150 mm above ground level) ensuring the variability in growth of the whole population was represented. A model averaging technique based on the Akaike's information criterion was computed to identify whether leaf traits could assist in diameter prediction. Regression analysis was performed to test for relationships between carbon isotope values and diameter and leaf traits. Approximately one new leaf per week was produced by a shoot. The rate of leaf expansion was estimated as 1.45 mm day-1. The range of δ13C values in leaves of Aquilaria species was from -25.5‰ to -31‰, with an average of -28.4 ‰ (±1.5‰ SD). A moderate negative correlation (R2 = 0.357) between diameter and δ13C in leaf dry matter indicated that individuals with high intercellular CO2 concentrations (low δ13C) and associated low water-use efficiency sustained rapid growth. Analysis of the 95% confidence of best-ranked regression models indicated that the predictors that could best explain growth in Aquilaria species were δ13C, δ15N, petiole length, number of new leaves produced per week and specific leaf area. The model constructed with these variables explained 55% (R2 = 0.55) of the variability in stem diameter. This demonstrates that leaf traits can assist in the early selection of high-productivity trees in Aquilaria species. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Additivity of nonlinear biomass equations
Bernard R. Parresol
2001-01-01
Two procedures that guarantee the property of additivity among the components of tree biomass and total tree biomass utilizing nonlinear functions are developed. Procedure 1 is a simple combination approach, and procedure 2 is based on nonlinear joint-generalized regression (nonlinear seemingly unrelated regressions) with parameter restrictions. Statistical theory is...
Jovanovic, Milos; Radovanovic, Sandro; Vukicevic, Milan; Van Poucke, Sven; Delibasic, Boris
2016-09-01
Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, high dimensionality, sparsity, and class imbalance of electronic health data and the complexity of risk quantification, challenge the development of accurate predictive models. Predictive models require a certain level of interpretability in order to be applicable in real settings and create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population, by integrating a data-driven model (sparse logistic regression) and domain knowledge based on the international classification of diseases 9th-revision clinical modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and inspect the stability of alternative solutions. The analysis was conducted on >66,000 pediatric hospital discharge records from California, State Inpatient Databases, Healthcare Cost and Utilization Project between 2009 and 2011. We incorporated domain knowledge based on the ICD-9-CM hierarchy in a data driven, Tree-Lasso regularized logistic regression model, providing the framework for model interpretation. This approach was compared with traditional Lasso logistic regression resulting in models that are easier to interpret by fewer high-level diagnoses, with comparable prediction accuracy. The results revealed that the use of a Tree-Lasso model was as competitive in terms of accuracy (measured by area under the receiver operating characteristic curve-AUC) as the traditional Lasso logistic regression, but integration with the ICD-9-CM hierarchy of diseases provided more interpretable models in terms of high-level diagnoses. Additionally, interpretations of models are in accordance with existing medical understanding of pediatric readmission. 
The best-performing models perform similarly, reaching AUC values of 0.783 and 0.779 for traditional Lasso and Tree-Lasso, respectively. However, the information loss of the Lasso models is 0.35 bits higher than that of the Tree-Lasso model. We propose a method for building predictive models applicable to the detection of readmission risk based on electronic health records. Integration of domain knowledge (in the form of the ICD-9-CM taxonomy) and a data-driven, sparse predictive algorithm (Tree-Lasso logistic regression) increased the interpretability of the resulting model. The models are interpreted for the readmission prediction problem in the general pediatric population in California, as well as in several important subpopulations, and the interpretations of the models comply with existing medical understanding of pediatric readmission. Finally, a quantitative assessment of the interpretability of the models is given that goes beyond simple counts of selected low-level features. Copyright © 2016 Elsevier B.V. All rights reserved.
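The accuracy measure used in the abstract above, area under the ROC curve (AUC), can be computed directly as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch with made-up scores follows; the function name `auc` and the data are assumptions for illustration, not taken from the study.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical predicted readmission risks and true outcomes (1 = readmitted).
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]
print(auc(scores, labels))
```

The pairwise form is quadratic in the number of cases, so it is suited to small illustrations rather than datasets of the size described.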
Gu, Yingxin; Wylie, Bruce K.; Boyte, Stephen; Picotte, Joshua J.; Howard, Danny; Smith, Kelcy; Nelson, Kurtis
2016-01-01
Regression tree models have been widely used for remote sensing-based ecosystem mapping. Improper use of the sample data (model training and testing data) may cause overfitting and underfitting effects in the model. The goal of this study is to develop an optimal sampling data usage strategy for any dataset and identify an appropriate number of rules in the regression tree model that will improve its accuracy and robustness. Landsat 8 data and Moderate-Resolution Imaging Spectroradiometer-scaled Normalized Difference Vegetation Index (NDVI) were used to develop regression tree models. A Python procedure was designed to generate random replications of model parameter options across a range of model development data sizes and rule number constraints. The mean absolute difference (MAD) between the predicted and actual NDVI (scaled NDVI, value from 0–200) and its variability across the different randomized replications were calculated to assess the accuracy and stability of the models. In our case study, a six-rule regression tree model developed from 80% of the sample data had the lowest MAD (MADtraining = 2.5 and MADtesting = 2.4), which was suggested as the optimal model. This study demonstrates how the training data and rule number selections impact model accuracy and provides important guidance for future remote-sensing-based ecosystem modeling.
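The evaluation idea in the abstract above, comparing the mean absolute difference (MAD) on training and testing data, can be sketched with a one-split regression tree. This is a deliberate simplification under stated assumptions: the study used multi-rule regression tree models on remote sensing data, whereas the data, the 80/20 split rule and the names `fit_stump`, `predict` and `mean_abs_diff` below are hypothetical.

```python
def mean_abs_diff(pred, actual):
    """Mean absolute difference (MAD) between predictions and observations."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)

def fit_stump(xs, ys):
    """Fit a one-split regression tree: pick the threshold minimising squared error."""
    best = None
    for t in sorted(set(xs))[1:]:  # every distinct x except the smallest
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]  # (threshold, left mean, right mean)

def predict(stump, xs):
    t, lm, rm = stump
    return [lm if x < t else rm for x in xs]

# Hypothetical predictor and response values (illustrative only).
xs = list(range(10))
ys = [10, 12, 11, 9, 10, 50, 52, 49, 51, 50]

# Deterministic 80/20 split: samples at index 2 and 7 are held out for testing.
train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % 5 != 2]
test = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % 5 == 2]

stump = fit_stump([x for x, _ in train], [y for _, y in train])
mad_train = mean_abs_diff(predict(stump, [x for x, _ in train]), [y for _, y in train])
mad_test = mean_abs_diff(predict(stump, [x for x, _ in test]), [y for _, y in test])
print(stump, mad_train, mad_test)
```

Repeating this over many random splits and model sizes, and looking at the spread of the testing MAD, would mirror the stability assessment the abstract describes.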
Regression: The Apple Does Not Fall Far From the Tree.
Vetter, Thomas R; Schober, Patrick
2018-05-15
Researchers and clinicians are frequently interested in either: (1) assessing whether there is a relationship or association between 2 or more variables and quantifying this association; or (2) determining whether 1 or more variables can predict another variable. The strength of such an association is mainly described by the correlation. However, regression analysis and regression models can be used not only to identify whether there is a significant relationship or association between variables but also to generate estimations of such a predictive relationship between variables. This basic statistical tutorial discusses the fundamental concepts and techniques related to the most common types of regression analysis and modeling, including simple linear regression, multiple regression, logistic regression, ordinal regression, and Poisson regression, as well as the common yet often underrecognized phenomenon of regression toward the mean. The various types of regression analysis are powerful statistical techniques, which when appropriately applied, can allow for the valid interpretation of complex, multifactorial data. Regression analysis and models can assess whether there is a relationship or association between 2 or more observed variables and estimate the strength of this association, as well as determine whether 1 or more variables can predict another variable. Regression is thus being applied more commonly in anesthesia, perioperative, critical care, and pain research. However, it is crucial to note that regression can identify plausible risk factors; it does not prove causation (a definitive cause and effect relationship). The results of a regression analysis instead identify independent (predictor) variable(s) associated with the dependent (outcome) variable. As with other statistical methods, applying regression requires that certain assumptions be met, which can be tested with specific diagnostics.
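As a small illustration of the distinction drawn in the abstract above between correlation (strength of association) and regression (a predictive equation), the sketch below fits a simple linear regression by ordinary least squares and also returns the Pearson correlation. The data and the function name `simple_linear_regression` are hypothetical.

```python
def simple_linear_regression(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5  # Pearson correlation coefficient
    return slope, intercept, r

# Hypothetical paired observations (illustrative only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept, r = simple_linear_regression(xs, ys)
print(slope, intercept, r)
```

The correlation `r` summarises how tightly the points follow a line, while the slope and intercept give the prediction rule; neither establishes causation, as the abstract notes.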
USDA-ARS's Scientific Manuscript database
Risk factors for obesity and weight gain are typically evaluated individually while "adjusting for" the influence of other confounding factors, and few studies, if any, have created risk profiles by clustering risk factors. We identified subgroups of postmenopausal women homogeneous in their cluster...
Exploring Race Differences in Correlates of Seniors' Satisfaction with Undergraduate Education
ERIC Educational Resources Information Center
Einarson, Marne K.; Matier, Michael W.
2005-01-01
This study employed multiple linear regression and decision tree analysis to examine the correlates of overall satisfaction with undergraduate education for white, Asian American, Latino and African American seniors enrolled at 17 doctoral/research universities. Satisfaction with the overall quality of instruction and social involvement were the…
Exploring Race Differences in Correlates of Seniors' Satisfaction with Undergraduate Education
ERIC Educational Resources Information Center
Einarson, Marne K.; Matier, Michael W.
2004-01-01
This study employed multiple linear regression and decision tree analysis to examine the correlates of overall satisfaction with undergraduate education for white, Asian American, Hispanic and African American seniors enrolled at 17 research-extensive universities. Satisfaction with the overall quality of instruction and social involvement were…
Biomass estimates of eastern red cedar tree components
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schnell, R.L.
1976-02-01
Fresh and dry-weight relationships of species of the eastern red cedar (Juniperus virginiana L.) found in the Tennessee Valley are presented. Both wood and bark were analyzed. All fresh and dry weights tabulated were computed from predicting equations developed by multiple regression analysis of field data. (JGB)
Many units in public housing or other low-income urban dwellings may have elevated pesticide residues, given recurring infestation, but it would be logistically and economically infeasible to sample a large number of units to identify highly exposed households to design interven...
Environmental factors affecting understory diversity in second-growth deciduous forests
Cynthia D. Huebner; J.C. Randolph; G.R. Parker
1995-01-01
The purpose of this study was to determine the most important nonanthropogenic factors affecting understory (herbs, shrubs and low-growing vines) diversity in forested landscapes of southern Indiana. Fourteen environmental variables were measured for 46 sites. Multiple regression analysis showed significant positive correlation between understory diversity and tree...
Individual tree growth models for natural even-aged shortleaf pine
Chakra B. Budhathoki; Thomas B. Lynch; James M. Guldin
2006-01-01
Shortleaf pine (Pinus echinata Mill.) measurements were available from permanent plots established in even-aged stands of the Ouachita Mountains for studying growth. Annual basal area growth was modeled with a least-squares nonlinear regression method utilizing three measurements. The analysis showed that the parameter estimates were in agreement...
USDA-ARS's Scientific Manuscript database
Incomplete meteorological data has been a problem in environmental modeling studies. The objective of this work was to develop a technique to reconstruct missing daily precipitation data in the central part of Chesapeake Bay Watershed using regression trees (RT) and artificial neural networks (ANN)....
USDA-ARS's Scientific Manuscript database
Missing meteorological data have to be estimated for agricultural and environmental modeling. The objective of this work was to develop a technique to reconstruct the missing daily precipitation data in the central part of the Chesapeake Bay Watershed using regression trees (RT) and artificial neura...
Fire frequency in the Interior Columbia River Basin: Building regional models from fire history data
McKenzie, D.; Peterson, D.L.; Agee, James K.
2000-01-01
Fire frequency affects vegetation composition and successional pathways; thus it is essential to understand fire regimes in order to manage natural resources at broad spatial scales. Fire history data are lacking for many regions for which fire management decisions are being made, so models are needed to estimate past fire frequency where local data are not yet available. We developed multiple regression models and tree-based (classification and regression tree, or CART) models to predict fire return intervals across the interior Columbia River basin at 1-km resolution, using georeferenced fire history, potential vegetation, cover type, and precipitation databases. The models combined semiqualitative methods and rigorous statistics. The fire history data are of uneven quality; some estimates are based on only one tree, and many are not cross-dated. Therefore, we weighted the models based on data quality and performed a sensitivity analysis of the effects on the models of estimation errors that are due to lack of cross-dating. The regression models predict fire return intervals from 1 to 375 yr for forested areas, whereas the tree-based models predict a range of 8 to 150 yr. Both types of models predict latitudinal and elevational gradients of increasing fire return intervals. Examination of regional-scale output suggests that, although the tree-based models explain more of the variation in the original data, the regression models are less likely to produce extrapolation errors. Thus, the models serve complementary purposes in elucidating the relationships among fire frequency, the predictor variables, and spatial scale. The models can provide local managers with quantitative information and provide data to initialize coarse-scale fire-effects models, although predictions for individual sites should be treated with caution because of the varying quality and uneven spatial coverage of the fire history database. 
The models also demonstrate the integration of qualitative and quantitative methods when requisite data for fully quantitative models are unavailable. They can be tested by comparing new, independent fire history reconstructions against their predictions and can be continually updated, as better fire history data become available.
NASA Astrophysics Data System (ADS)
Tang, Jie; Liu, Rong; Zhang, Yue-Li; Liu, Mou-Ze; Hu, Yong-Fang; Shao, Ming-Jie; Zhu, Li-Jun; Xin, Hua-Wen; Feng, Gui-Wen; Shang, Wen-Jun; Meng, Xiang-Guang; Zhang, Li-Rong; Ming, Ying-Zi; Zhang, Wei
2017-02-01
Tacrolimus has a narrow therapeutic window and considerable variability in clinical use. Our goal was to compare the performance of multiple linear regression (MLR) and eight machine learning techniques in pharmacogenetic algorithm-based prediction of tacrolimus stable dose (TSD) in a large Chinese cohort. A total of 1,045 renal transplant patients were recruited, 80% of which were randomly selected as the “derivation cohort” to develop dose-prediction algorithm, while the remaining 20% constituted the “validation cohort” to test the final selected algorithm. MLR, artificial neural network (ANN), regression tree (RT), multivariate adaptive regression splines (MARS), boosted regression tree (BRT), support vector regression (SVR), random forest regression (RFR), lasso regression (LAR) and Bayesian additive regression trees (BART) were applied and their performances were compared in this work. Among all the machine learning models, RT performed best in both derivation [0.71 (0.67-0.76)] and validation cohorts [0.73 (0.63-0.82)]. In addition, the ideal rate of RT was 4% higher than that of MLR. To our knowledge, this is the first study to use machine learning models to predict TSD, which will further facilitate personalized medicine in tacrolimus administration in the future.
Tree Stem and Canopy Biomass Estimates from Terrestrial Laser Scanning Data
NASA Astrophysics Data System (ADS)
Olofsson, K.; Holmgren, J.
2017-10-01
In this study, an automatic method for estimating both the tree stem and the tree canopy biomass is presented. The point cloud tree extraction techniques operate on terrestrial laser scanning (TLS) data, and the biomass is modelled using the estimated stem and canopy volume as independent variables. The regression model fit error is of the order of less than 5 kg, which gives a relative model error of about 5 % for the stem estimate and 10-15 % for the spruce and pine canopy biomass estimates. The canopy biomass estimate was improved by separating the models by tree species, which indicates that the method is allometry dependent and that the regression models need to be recomputed for areas with different climate and vegetation.
Du, Ning; Fan, Jintu; Chen, Shuo; Liu, Yang
2008-07-21
Although recent investigations [Ryan, M.G., Yoder, B.J., 1997. Hydraulic limits to tree height and tree growth. Bioscience 47, 235-242; Koch, G.W., Sillett, S.C.,Jennings, G.M.,Davis, S.D., 2004. The limits to tree height. Nature 428, 851-854; Niklas, K.J., Spatz, H., 2004. Growth and hydraulic (not mechanical) constraints govern the scaling of tree height and mass. Proc. Natl Acad. Sci. 101, 15661-15663; Ryan, M.G., Phillips, N., Bond, B.J., 2006. Hydraulic limitation hypothesis revisited. Plant Cell Environ. 29, 367-381; Niklas, K.J., 2007. Maximum plant height and the biophysical factors that limit it. Tree Physiol. 27, 433-440; Burgess, S.S.O., Dawson, T.E., 2007. Predicting the limits to tree height using statistical regressions of leaf traits. New Phytol. 174, 626-636] suggested that the hydraulic limitation hypothesis (HLH) is the most plausible theory to explain the biophysical limits to maximum tree height and the decline in tree growth rate with age, the analysis is largely qualitative or based on statistical regression. Here we present an integrated biophysical model based on the principle that trees develop physiological compensations (e.g. the declined leaf water potential and the tapering of conduits with heights [West, G.B., Brown, J.H., Enquist, B.J., 1999. A general model for the structure and allometry of plant vascular systems. Nature 400, 664-667]) to resist the increasing water stress with height, the classical HLH and the biochemical limitations on photosynthesis [von Caemmerer, S., 2000. Biochemical Models of Leaf Photosynthesis. CSIRO Publishing, Australia]. The model has been applied to the tallest trees in the world (viz. Coast redwood (Sequoia sempervirens)). Xylem water potential, leaf carbon isotope composition, leaf mass to area ratio at different heights derived from the model show good agreements with the experimental measurements of Koch et al. [2004. The limits to tree height. Nature 428, 851-854]. 
The model also accounts well for the universal trend of declining growth rate with age.
Estimating tree species diversity in the savannah using NDVI and woody canopy cover
NASA Astrophysics Data System (ADS)
Madonsela, Sabelo; Cho, Moses Azong; Ramoelo, Abel; Mutanga, Onisimo; Naidoo, Laven
2018-04-01
Remote sensing applications in biodiversity research often rely on the establishment of relationships between spectral information from the image and tree species diversity measured in the field. Most studies have used the normalized difference vegetation index (NDVI) to estimate tree species diversity on the basis that it is sensitive to primary productivity, which defines spatial variation in plant diversity. The NDVI signal is influenced by photosynthetically active vegetation which, in the savannah, includes woody canopy foliage and grasses. The question is whether the relationship between NDVI and tree species diversity in the savannah depends on the woody cover percentage. This study explored the relationship between woody canopy cover (WCC) and tree species diversity in the savannah woodland of southern Africa and also investigated whether there is a significant interaction between seasonal NDVI and WCC in the factorial model when estimating tree species diversity. To fulfil our aim, we followed a stratified random sampling approach and surveyed tree species in 68 plots of 90 m × 90 m across the study area. Within each plot, all trees with a diameter at breast height of >10 cm were sampled, and the Shannon index - a common measure of species diversity which considers both species richness and abundance - was used to quantify tree species diversity. We then extracted WCC in each plot from an existing fractional woody cover product produced from Synthetic Aperture Radar (SAR) data. A factorial regression model was used to determine the interaction effect between NDVI and WCC when estimating tree species diversity. 
Results from regression analysis showed that (i) WCC has a highly significant relationship with tree species diversity (r2 = 0.21; p < 0.01), and (ii) the interaction between NDVI and WCC is not significant; however, the factorial model significantly reduced the error of prediction (RMSE = 0.47, p < 0.05) compared to the NDVI (RMSE = 0.49) or WCC (RMSE = 0.49) model during the senescence period. The result justifies our assertion that combining NDVI with WCC will be optimal for biodiversity estimation during the senescence period.
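The Shannon index used in the abstract above to quantify tree species diversity can be computed from per-species stem counts in a plot. A minimal sketch follows, assuming the natural logarithm (the abstract does not state the base); the counts and the function name `shannon_index` are hypothetical.

```python
import math

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over species with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical stem counts per species in two plots (illustrative only).
even_plot = [10, 10, 10, 10]   # four equally abundant species
uneven_plot = [37, 1, 1, 1]    # same richness, dominated by one species
print(shannon_index(even_plot), shannon_index(uneven_plot))
```

Both plots have the same species richness, but the evenly distributed plot scores higher, which is why the index is said to consider both richness and abundance.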
Singh, Gyanendra; Sachdeva, S N; Pal, Mahesh
2016-11-01
This work examines the application of the M5 model tree and the conventionally used fixed/random effect negative binomial (FENB/RENB) regression models for accident prediction on non-urban sections of highway in Haryana (India). Road accident data for a period of 2-6 years on different sections of 8 National and State Highways in Haryana were collected from police records. Data on road geometry, traffic and road environment variables were collected through field studies. A total of 222 data points were gathered by dividing highways into sections with certain uniform geometric characteristics. For prediction of accident frequencies using fifteen input parameters, two modeling approaches were used: FENB/RENB regression and the M5 model tree. Results suggest that both models perform comparably well in terms of correlation coefficient and root mean square error values. The M5 model tree provides simple linear equations that are easy to interpret and provide better insight, indicating that this approach can effectively be used as an alternative to the RENB approach if the sole purpose is to predict motor vehicle crashes. Sensitivity analysis using the M5 model tree also suggests that its results reflect the physical conditions. Both models clearly indicate that, to improve safety on Indian highways, minor accesses to the highways need to be properly designed and controlled, service roads need to be made functional, and the dispersion of speeds needs to be reduced. Copyright © 2016 Elsevier Ltd. All rights reserved.
New flux based dose-response relationships for ozone for European forest tree species.
Büker, P; Feng, Z; Uddling, J; Briolat, A; Alonso, R; Braun, S; Elvira, S; Gerosa, G; Karlsson, P E; Le Thiec, D; Marzuoli, R; Mills, G; Oksanen, E; Wieser, G; Wilkinson, M; Emberson, L D
2015-11-01
To derive O3 dose-response relationships (DRR) for five European forest tree species and broadleaf deciduous and needleleaf tree plant functional types (PFTs), phytotoxic O3 doses (PODy) were related to biomass reductions. PODy was calculated using a stomatal flux model with a range of cut-off thresholds (y) indicative of varying detoxification capacities. Linear regression analysis showed that DRR for PFT and individual tree species differed in their robustness. A simplified parameterisation of the flux model was tested and showed that for most non-Mediterranean tree species, this simplified model led to similarly robust DRR as compared to a species- and climate region-specific parameterisation. Experimentally induced soil water stress was not found to substantially reduce PODy, mainly due to the short duration of soil water stress periods. This study validates the stomatal O3 flux concept and represents a step forward in predicting O3 damage to forests in a spatially and temporally varying climate. Crown Copyright © 2015. Published by Elsevier Ltd. All rights reserved.
Anderson, Weston; Guikema, Seth; Zaitchik, Ben; Pan, William
2014-01-01
Obtaining accurate small area estimates of population is essential for policy and health planning but is often difficult in countries with limited data. In lieu of available population data, small area estimate models draw information from previous time periods or from similar areas. This study focuses on model-based methods for estimating population when no direct samples are available in the area of interest. To explore the efficacy of tree-based models for estimating population density, we compare six different model structures including Random Forest and Bayesian Additive Regression Trees. Results demonstrate that without information from prior time periods, non-parametric tree-based models produced more accurate predictions than did conventional regression methods. Improving estimates of population density in non-sampled areas is important for regions with incomplete census data and has implications for economic, health and development policies. PMID:24992657
Ozge, C; Toros, F; Bayramkaya, E; Camdeviren, H; Sasmaz, T
2006-08-01
The purpose of this study is to evaluate the most important sociodemographic factors affecting the smoking status of high school students using a broad randomised epidemiological survey. Using an in-class, self-administered questionnaire about sociodemographic variables and smoking behaviour, a representative sample of 3304 students in preparatory, 9th, 10th, and 11th grades, from 22 randomly selected schools in Mersin, was evaluated, and discriminative factors were determined using appropriate statistics. In addition to binary logistic regression analysis, the study evaluated the combined effects of these factors using classification and regression tree methodology, as a new statistical method. The data showed that 38% of the students reported lifetime smoking and 16.9% reported current smoking, with a male predominance and increasing prevalence with age. Second-hand smoking was reported at a frequency of 74.3%, with father predominance (56.6%). The significant factors affecting current smoking in these age groups were increased household size, late birth rank, certain school types, low academic performance, increased second-hand smoking, and stress (especially reported as separation from a close friend or because of violence at home). Classification and regression tree methodology showed the importance of some neglected sociodemographic factors, with good classification capacity. It was concluded that smoking, being closely related to sociocultural factors, was a common problem in this young population, generating an important academic and social burden in youth life, and that with increasing data about this behaviour and the use of new statistical methods, effective coping strategies could be devised.
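Classification trees of the kind referred to in the abstract above typically choose each split by how much it reduces node impurity. The sketch below scores two candidate binary predictors by reduction in Gini impurity, a common CART criterion; whether the study used Gini is not stated, and the data and the names `gini` and `split_gain` are hypothetical.

```python
def gini(labels):
    """Gini impurity of a node with binary 0/1 labels: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def split_gain(feature, labels):
    """Reduction in Gini impurity from splitting a node on a binary 0/1 feature."""
    left = [y for f, y in zip(feature, labels) if f == 0]
    right = [y for f, y in zip(feature, labels) if f == 1]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted

# Hypothetical survey rows: 1 = current smoker (illustrative only).
smoker = [1, 1, 1, 0, 0, 0, 0, 0]
second_hand_exposure = [1, 1, 1, 1, 0, 0, 0, 0]
other_factor = [1, 0, 1, 0, 1, 0, 1, 0]

gain_exposure = split_gain(second_hand_exposure, smoker)
gain_other = split_gain(other_factor, smoker)
print(gain_exposure, gain_other)
```

The predictor with the larger gain would be chosen for the first split, and the procedure would then be repeated within each resulting subgroup.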
DOE Office of Scientific and Technical Information (OSTI.GOV)
Temple, P.J.; Mutters, R.J.; Adams, C.
1995-06-01
Biomass sampling plots were established at 29 locations within the dominant vegetation zones of the study area. Estimates of foliar biomass were made for each plot by three independent methods: regression analysis on the basis of tree diameter, calculation of the amount of light intercepted by the leaf canopy, and extrapolation from branch leaf area. Multivariate regression analysis was used to relate these foliar biomass estimates for oak plots and conifer plots to several independent predictor variables, including elevation, slope, aspect, temperature, precipitation, and soil chemical characteristics.
Generalized and synthetic regression estimators for randomized branch sampling
David L. R. Affleck; Timothy G. Gregoire
2015-01-01
In felled-tree studies, ratio and regression estimators are commonly used to convert more readily measured branch characteristics to dry crown mass estimates. In some cases, data from multiple trees are pooled to form these estimates. This research evaluates the utility of both tactics in the estimation of crown biomass following randomized branch sampling (...
Cloud-Free Satellite Image Mosaics with Regression Trees and Histogram Matching.
E.H. Helmer; B. Ruefenacht
2005-01-01
Cloud-free optical satellite imagery simplifies remote sensing, but land-cover phenology limits existing solutions to persistent cloudiness to compositing temporally resolute, spatially coarser imagery. Here, a new strategy for developing cloud-free imagery at finer resolution permits simple automatic change detection. The strategy uses regression trees to predict...
Regression estimators for late-instar gypsy moth larvae at low population densities
W.E. Wallner; A.S. Devito; Stanley J. Zarnoch
1989-01-01
Two regression estimators were developed for determining densities of late-instar gypsy moth, Lymantria dispar (Lepidoptera: Lymantriidae), larvae from burlap band and pyrethrin spray counts on oak trees in Vermont, Massachusetts, Connecticut, and New York. Studies were conducted by marking larvae on individual burlap banded trees within 15...
Fokkema, M; Smits, N; Zeileis, A; Hothorn, T; Kelderman, H
2017-10-25
Identification of subgroups of patients for whom treatment A is more effective than treatment B, and vice versa, is of key importance to the development of personalized medicine. Tree-based algorithms are helpful tools for the detection of such interactions, but none of the available algorithms allow for taking into account clustered or nested dataset structures, which are particularly common in psychological research. Therefore, we propose the generalized linear mixed-effects model tree (GLMM tree) algorithm, which allows for the detection of treatment-subgroup interactions, while accounting for the clustered structure of a dataset. The algorithm uses model-based recursive partitioning to detect treatment-subgroup interactions, and a GLMM to estimate the random-effects parameters. In a simulation study, GLMM trees show higher accuracy in recovering treatment-subgroup interactions, higher predictive accuracy, and lower type II error rates than linear-model-based recursive partitioning and mixed-effects regression trees. Also, GLMM trees show somewhat higher predictive accuracy than linear mixed-effects models with pre-specified interaction effects, on average. We illustrate the application of GLMM trees on an individual patient-level data meta-analysis on treatments for depression. We conclude that GLMM trees are a promising exploratory tool for the detection of treatment-subgroup interactions in clustered datasets.
González-Costa, Juan José; Reigosa, Manuel Joaquín; Matías, José María; Fernández-Covelo, Emma
2017-01-01
This study determines the influence of the different soil components and of the cation-exchange capacity on the adsorption and retention of different heavy metals: cadmium, chromium, copper, nickel, lead and zinc. To do so, regression models were created through decision trees and the importance of soil components was assessed. The variables used were: humified organic matter, specific cation-exchange capacity, percentages of sand and silt, proportions of Mn, Fe and Al oxides and hematite, the proportions of quartz, plagioclase and mica, and the proportions of the different clays: kaolinite, vermiculite, gibbsite and chlorite. The most important components in the obtained models were vermiculite and gibbsite, especially for the adsorption of cadmium and zinc, while clays were less relevant. Oxides are less important than clays, especially for the adsorption of chromium and lead and the retention of chromium, copper and lead. PMID:28072849
NASA Astrophysics Data System (ADS)
Dokuchaev, P. M.; Meshalkina, J. L.; Yaroslavtsev, A. M.
2018-01-01
A comparative analysis of geospatial soil modeling using multinomial logistic regression, decision tree, random forest, regression tree and support vector machine algorithms was conducted. Visual interpretation of the resulting digital maps and their comparison with the existing map, together with quantitative assessment of the overall accuracy of individual soil group detection and of the models' kappa, showed that multiple logistic regression, support vector machine, and random forest models, with spatial prediction of the conditional distribution of soil groups, can be reliably used for mapping the study area. Detection was most accurate for lightly and moderately eroded sod-podzolic soils (Phaeozems Albic). In second place, according to the mean overall accuracy of the prediction, were non-eroded and warp sod-podzolic soils, as well as sod-gley soils (Umbrisols Gleyic) and alluvial soils (Fluvisols Dystric, Umbric). Heavily eroded sod-podzolic and gray forest soils (Phaeozems Albic) were detected worst of all by the automatic classification methods.
Freitas, Alex A; Limbu, Kriti; Ghafourian, Taravat
2015-01-01
Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). Hence, the pros and cons of several different types of decision tree-based regression methods have been discussed. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p) - often used in physiologically-based pharmacokinetics. Therefore, this work has assessed whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values. Comparison of the models that used only molecular descriptors, in particular, the Bagging decision tree (mean fold error of 2.33), with those employing predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that the use of predicted Kt:p values as descriptors may be beneficial for accurate prediction of Vss using decision trees if prior feature selection is applied. Decision tree based models presented in this work have an accuracy that is reasonable and similar to the accuracy of reported Vss inter-species extrapolations in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that are able to integrate large and varied sources of data and flexible non-linear data mining methods such as decision trees, which can produce interpretable models. Graphical abstract: Decision trees for the prediction of tissue partition coefficient and volume of distribution of drugs.
Prediction model of critical weight loss in cancer patients during particle therapy.
Zhang, Zhihong; Zhu, Yu; Zhang, Lijuan; Wang, Ziying; Wan, Hongwei
2018-01-01
The objective of this study is to investigate the predictors of critical weight loss in cancer patients receiving particle therapy, and build a prediction model based on its predictive factors. Patients receiving particle therapy were enrolled between June 2015 and June 2016. Body weight was measured at the start and end of particle therapy. Associations between critical weight loss (defined as >5%) during particle therapy and patients' demographic and clinical characteristics, pre-therapeutic nutrition risk screening (NRS 2002) and BMI were evaluated by logistic regression and decision tree analysis. Finally, 375 cancer patients receiving particle therapy were included. Mean weight loss was 0.55 kg, and 11.5% of patients experienced critical weight loss during particle therapy. The main predictors of critical weight loss during particle therapy were head and neck tumour location, total radiation dose ≥70 Gy on the primary tumour, and without post-surgery, as indicated by both logistic regression and decision tree analysis. The prediction model that included tumour location, total radiation dose and post-surgery had good predictive ability, with an area under the receiver operating characteristic curve of 0.79 (95% CI: 0.71-0.88) and 0.78 (95% CI: 0.69-0.86) for the decision tree and logistic regression models, respectively. Cancer patients with head and neck tumour location, total radiation dose ≥70 Gy and without post-surgery were at higher risk of critical weight loss during particle therapy, and early intensive nutrition counselling or intervention should be targeted at this population. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
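The area under the receiver operating characteristic curve reported above can be read as the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it. The following is a minimal sketch of that pairwise definition; the function name `roc_auc` and the toy labels and scores are illustrative assumptions and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = critical weight loss, 0 = no critical weight loss.
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.4, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a sketch; rank-based formulas give the same value more efficiently on large samples.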
NASA Astrophysics Data System (ADS)
Rooper, Christopher N.; Zimmermann, Mark; Prescott, Megan M.
2017-08-01
Deep-sea coral and sponge ecosystems are widespread throughout most of Alaska's marine waters, and are associated with many different species of fishes and invertebrates. These ecosystems are vulnerable to the effects of commercial fishing activities and climate change. We compared four commonly used species distribution models (general linear models, generalized additive models, boosted regression trees and random forest models) and an ensemble model to predict the presence or absence and abundance of six groups of benthic invertebrate taxa in the Gulf of Alaska. All four model types performed adequately on training data for predicting presence and absence, with regression forest models having the best overall performance measured by the area under the receiver-operating-curve (AUC). The models also performed well on the test data for presence and absence with average AUCs ranging from 0.66 to 0.82. For the test data, ensemble models performed the best. For abundance data, there was an obvious demarcation in performance between the two regression-based methods (general linear models and generalized additive models), and the tree-based models. The boosted regression tree and random forest models out-performed the other models by a wide margin on both the training and testing data. However, there was a significant drop-off in performance for all models of invertebrate abundance (50%) when moving from the training data to the testing data. Ensemble model performance was between the tree-based and regression-based methods. The maps of predictions from the models for both presence and abundance agreed very well across model types, with an increase in variability in predictions for the abundance data.
We conclude that where data conforms well to the modeled distribution (such as the presence-absence data and binomial distribution in this study), the four types of models will provide similar results, although the regression-type models may be more consistent with biological theory. For data with highly zero-inflated distributions and non-normal distributions such as the abundance data from this study, the tree-based methods performed better. Ensemble models that averaged predictions across the four model types, performed better than the GLM or GAM models but slightly poorer than the tree-based methods, suggesting ensemble models might be more robust to overfitting than tree methods, while mitigating some of the disadvantages in predictive performance of regression methods.
USDA-ARS's Scientific Manuscript database
Sudden losses of managed honey bee (Apis mellifera L.) colonies are considered an important problem worldwide but the underlying cause or causes of these losses are currently unknown. In the United States, this syndrome was termed Colony Collapse Disorder (CCD), since the defining trait was a rapid ...
ERIC Educational Resources Information Center
Montoya, Isaac D.
2008-01-01
Three classification techniques (Chi-square Automatic Interaction Detection [CHAID], Classification and Regression Tree [CART], and discriminant analysis) were tested to determine their accuracy in predicting Temporary Assistance for Needy Families program recipients' future employment. Technique evaluation was based on proportion of correctly…
Design of Probabilistic Random Forests with Applications to Anticancer Drug Sensitivity Prediction
Rahman, Raziur; Haider, Saad; Ghosh, Souparno; Pal, Ranadip
2015-01-01
Random forests consisting of an ensemble of regression trees with equal weights are frequently used for design of predictive models. In this article, we consider an extension of the methodology by representing the regression trees in the form of probabilistic trees and analyzing the nature of heteroscedasticity. The probabilistic tree representation allows for analytical computation of confidence intervals (CIs), and the tree weight optimization is expected to provide stricter CIs with comparable performance in mean error. We approached the ensemble of probabilistic trees’ prediction from the perspectives of a mixture distribution and as a weighted sum of correlated random variables. We applied our methodology to the drug sensitivity prediction problem on synthetic and cancer cell line encyclopedia dataset and illustrated that tree weights can be selected to reduce the average length of the CI without increase in mean error. PMID:27081304
NASA Astrophysics Data System (ADS)
Kwon, Y.
2013-12-01
As evidence of global warming continues to increase, being able to predict forest response to climate changes, such as the expected rise in temperature and precipitation, will be vital for maintaining the sustainability and productivity of forests. Mapping forest species redistribution under climate change scenarios has been successful; however, most species redistribution maps lack the mechanistic understanding needed to explain why trees grow under the novel conditions of a changing climate. A distributional map is only capable of predicting under the equilibrium assumption that the communities would exist following a prolonged period under the new climate. In this context, forest NPP as a surrogate for growth rate, the most important facet that determines stand dynamics, can lead to valid prediction of the transition stage to a new vegetation-climate equilibrium, as it represents changes in forest structure reflecting site conditions and climate factors. The objective of this study is to develop a forest growth map using regression tree analysis by extracting large-scale non-linear structures from both field-based FIA and remotely sensed MODIS data sets. The major issue addressed in this approach is the non-linear spatial patterns of forest attributes. Forest inventory data showed complex spatial patterns that reflect environmental states and processes that originate at different spatial scales. At broad scales, non-linear spatial trends in forest attributes and a mixture of continuous and discrete types of environmental variables make traditional statistical (multivariate regression) and geostatistical (kriging) models inefficient. This calls into question some traditional underlying assumptions about spatial trends that are uncritically accepted in forest data. To address the controversy surrounding the suitability of forest data, regression tree analysis is performed using the software See5 and Cubist.
Four publicly available data sets were obtained. First, the field-based Forest Inventory and Analysis (USDA Forest Service) data set for the 31 easternmost United States. Second, 8-day composites of MODIS Land Cover, FPAR, LAI and GPP/NPP data were obtained from Jan 2001 to Dec 2004 (182 composites in total), and each product was filtered by pixel-level quality assurance data to select the best-quality pixels. Third, 30-year averaged climate data were collected from the National Oceanic and Atmospheric Administration (NOAA), and five climatic variables were obtained: monthly temperature, precipitation, annual heating and cooling days, and annual frost-free days. Fourth, topographic data were obtained from a digital elevation model (1 km by 1 km). This research will provide a better understanding of large-scale forest responses to environmental factors that will be beneficial for the development of important forest management applications.
Ruseva, Tatyana B; Evans, Tom P; Fischer, Burnell C
2015-05-15
This study uses a mail survey of private landowners in the Midwest United States to understand the characteristics of owners who have planted trees or intend to plant trees in the future. The analysis examines what policy tools encourage owners to plant trees, and how policy tools operate across different ownership attributes to promote tree-planting on private lands. Logistic regression results suggest that cost-subsidizing policy tools, such as low-cost and free seedlings, significantly increase the odds of actual and planned reforestation when landowners consider them important for increasing forest cover. Individuals most likely to plant trees, when low-cost seedlings are available and important, are fairly recent (<5 years), college-educated owners who own small parcels (<4 ha) and use the land for recreation. Motivations to reforest were also shaped by owners' planning horizons, connection to the land, previous tree-planting experience, and peer influence. The study has relevance for the design of policy approaches that can encourage private forestation through provision of economic incentives and capacity to private landowners. Copyright © 2015 Elsevier Ltd. All rights reserved.
Schmid, Matthias; Küchenhoff, Helmut; Hoerauf, Achim; Tutz, Gerhard
2016-02-28
Survival trees are a popular alternative to parametric survival modeling when there are interactions between the predictor variables or when the aim is to stratify patients into prognostic subgroups. A limitation of classical survival tree methodology is that most algorithms for tree construction are designed for continuous outcome variables. Hence, classical methods might not be appropriate if failure time data are measured on a discrete time scale (as is often the case in longitudinal studies where data are collected, e.g., quarterly or yearly). To address this issue, we develop a method for discrete survival tree construction. The proposed technique is based on the result that the likelihood of a discrete survival model is equivalent to the likelihood of a regression model for binary outcome data. Hence, we modify tree construction methods for binary outcomes such that they result in optimized partitions for the estimation of discrete hazard functions. By applying the proposed method to data from a randomized trial in patients with filarial lymphedema, we demonstrate how discrete survival trees can be used to identify clinically relevant patient groups with similar survival behavior. Copyright © 2015 John Wiley & Sons, Ltd.
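The equivalence mentioned above, between the likelihood of a discrete survival model and that of a binary regression model, rests on expanding each subject into one binary row per period at risk. The following is a minimal sketch of that expansion, assuming each subject is recorded as a hypothetical `(last_observed_period, event_occurred)` pair; the helper names and the toy data are illustrative and do not come from the paper. Any tree method for binary outcomes could then be fitted to the expanded rows.

```python
from collections import defaultdict


def to_person_period(subjects):
    """Expand (last_period, event) pairs into (subject_id, period, y) rows.

    y is 1 only in the subject's final period, and only if the event occurred;
    censored subjects contribute rows with y = 0 throughout.
    """
    rows = []
    for subject_id, (last_period, event) in enumerate(subjects):
        for period in range(1, last_period + 1):
            y = 1 if (event and period == last_period) else 0
            rows.append((subject_id, period, y))
    return rows


def discrete_hazards(rows):
    """Estimate the hazard in each period as events divided by subjects at risk."""
    at_risk = defaultdict(int)
    events = defaultdict(int)
    for _, period, y in rows:
        at_risk[period] += 1
        events[period] += y
    return {t: events[t] / at_risk[t] for t in sorted(at_risk)}


# Hypothetical toy cohort: (last observed period, True if the event occurred).
toy_subjects = [(1, True), (2, True), (2, False), (3, True), (3, False)]
toy_rows = to_person_period(toy_subjects)
toy_hazards = discrete_hazards(toy_rows)
```

In this unstratified case the mean of the binary response within each period is the familiar life-table hazard estimate, which is why binary-outcome machinery can be reused for discrete hazards.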
Buchner, Florian; Wasem, Jürgen; Schillo, Sonja
2017-01-01
Risk equalization formulas have been refined since their introduction about two decades ago. Because of the complexity and the abundance of possible interactions between the variables used, hardly any interactions are considered. A regression tree is used to systematically search for interactions, a methodologically new approach in risk equalization. Analyses are based on a data set of nearly 2.9 million individuals from a major German social health insurer. A two-step approach is applied: In the first step a regression tree is built on the basis of the learning data set. Terminal nodes characterized by more than one morbidity-group-split represent interaction effects of different morbidity groups. In the second step the 'traditional' weighted least squares regression equation is expanded by adding interaction terms for all interactions detected by the tree, and regression coefficients are recalculated. The resulting risk adjustment formula shows an improvement in the adjusted R² from 25.43% to 25.81% on the evaluation data set. Predictive ratios are calculated for subgroups affected by the interactions. The R² improvement detected is only marginal. According to the sample-level performance measures used, omitting a considerable number of morbidity interactions causes no relevant loss in accuracy. Copyright © 2015 John Wiley & Sons, Ltd.
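The first step described above depends on the basic operation of a regression tree: choosing the split that most reduces the within-node sum of squared errors. The following is a minimal sketch of a single split search on one numeric predictor, assuming the standard CART-style least-squares criterion; the function names and toy data are illustrative assumptions and do not reproduce the study's model.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, cost) minimising SSE(left) + SSE(right) for one predictor.

    Candidate thresholds are midpoints between consecutive distinct x values.
    Returns None if x has no two distinct values.
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


# Hypothetical toy data with an obvious jump in y between x = 3 and x = 10.
toy_x = [1, 2, 3, 10, 11, 12]
toy_y = [5, 6, 5, 20, 21, 19]
toy_threshold, toy_cost = best_split(toy_x, toy_y)
```

A full tree repeats this search over all predictors and recurses into each child node; a terminal node reached through splits on two different variables is what signals a candidate interaction term.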
Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De
2014-01-01
As the fraudulent financial statement of an enterprise is increasingly serious with each passing day, establishing a valid forecasting fraudulent financial statement model of an enterprise has become an important question for academic research and financial practice. After screening the important variables using the stepwise regression, the study also matches the logistic regression, support vector machine, and decision tree to construct the classification models to make a comparison. The study adopts financial and nonfinancial variables to assist in establishment of the forecasting fraudulent financial statement model. Research objects are the companies to which the fraudulent and nonfraudulent financial statement happened between years 1998 to 2012. The findings are that financial and nonfinancial information are effectively used to distinguish the fraudulent financial statement, and decision tree C5.0 has the best classification effect 85.71%. PMID:25302338
Lisa M. Ganio; Robert A. Progar
2017-01-01
Wild and prescribed fire-induced injury to forest trees can produce immediate or delayed tree mortality but fire-injured trees can also survive. Land managers use logistic regression models that incorporate tree-injury variables to discriminate between fatally injured trees and those that will survive. We used data from 4024 ponderosa pine (Pinus ponderosa...
NASA Astrophysics Data System (ADS)
Vaglio Laurin, Gaia; Puletti, Nicola; Chen, Qi; Corona, Piermaria; Papale, Dario; Valentini, Riccardo
2016-10-01
Estimates of forest aboveground biomass are fundamental for carbon monitoring and accounting; delivering information at very high spatial resolution is especially valuable for local management, conservation and selective logging purposes. In tropical areas, hosting large biomass and biodiversity resources which are often threatened by unsustainable anthropogenic pressures, frequent forest resources monitoring is needed. Lidar is a powerful tool to estimate aboveground biomass at fine resolution; however, its application in tropical forests has been limited, with high variability in the accuracy of results. Lidar pulses scan the forest vertical profile, and can provide structure information which is also linked to biodiversity. In the last decade the remote sensing of biodiversity has received great attention, but few studies focused on the use of lidar for assessing tree species richness in tropical forests. This research aims at estimating aboveground biomass and tree species richness using discrete return airborne lidar in Ghana forests. We tested an advanced statistical technique, Multivariate Adaptive Regression Splines (MARS), which does not require assumptions on data distribution or on the relationships between variables, being suitable for studying ecological variables. We compared the MARS regression results with those obtained by multilinear regression and found that both algorithms were effective, but MARS provided higher accuracy for both biomass (R2 = 0.72) and species richness (R2 = 0.64). We also noted strong correlation between biodiversity and biomass field values. Even if the forest areas under analysis are limited in extent and represent peculiar ecosystems, the preliminary indications produced by our study suggest that instruments such as lidar, specifically useful for pinpointing forest structure, can also be exploited as a support for tree species richness assessment.
Using Evidence-Based Decision Trees Instead of Formulas to Identify At-Risk Readers. REL 2014-036
ERIC Educational Resources Information Center
Koon, Sharon; Petscher, Yaacov; Foorman, Barbara R.
2014-01-01
This study examines whether the classification and regression tree (CART) model improves the early identification of students at risk for reading comprehension difficulties compared with the more difficult to interpret logistic regression model. CART is a type of predictive modeling that relies on nonparametric techniques. It presents results in…
Quirke, Michael; Curran, Emma May; O'Kelly, Patrick; Moran, Ruth; Daly, Eimear; Aylward, Seamus; McElvaney, Gerry; Wakai, Abel
2018-01-01
To measure the percentage rate and risk factors for amendment in the type, duration and setting of outpatient parenteral antimicrobial therapy (OPAT) for the treatment of cellulitis. A retrospective cohort study of adult patients receiving OPAT for cellulitis was performed. Treatment amendment (TA) was defined as hospital admission or change in antibiotic therapy in order to achieve clinical response. Multivariable logistic regression (MVLR) and classification and regression tree (CART) analysis were performed. There were 307 patients enrolled. TA occurred in 36 patients (11.7%). Significant risk factors for TA on MVLR were increased age, increased Numerical Pain Scale Score (NPSS) and immunocompromise. The median OPAT duration was 7 days. Increased age, heart rate and C-reactive protein were associated with treatment prolongation. CART analysis selected age <64.5 years, female gender and NPSS <2.5 in the final model, generating a low-sensitivity (27.8%), high-specificity (97.1%) decision tree. Increased age, NPSS and immunocompromise were associated with OPAT amendment. These identified risk factors can be used to support an evidence-based approach to patient selection for OPAT in cellulitis. The CART algorithm has good specificity but lacks sensitivity and is shown to be inferior in this study to logistic regression modelling. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
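Sensitivity and specificity, as quoted for the decision tree above, are simple ratios taken from the confusion matrix of predicted against observed outcomes. The following is a minimal sketch of the calculation; the function name and the toy labels are illustrative assumptions and are not the study's patient-level data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1.

    sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 1 = treatment amendment, 0 = no amendment.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
toy_sens, toy_spec = sensitivity_specificity(toy_true, toy_pred)
```

A tree with this profile rarely raises false alarms but misses most true cases, which matches the abstract's point that high specificity alone does not make a rule clinically adequate.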
Ramírez, J; Górriz, J M; Segovia, F; Chaves, R; Salas-Gonzalez, D; López, M; Alvarez, I; Padilla, P
2010-03-19
This letter shows a computer aided diagnosis (CAD) technique for the early detection of the Alzheimer's disease (AD) by means of single photon emission computed tomography (SPECT) image classification. The proposed method is based on partial least squares (PLS) regression model and a random forest (RF) predictor. The challenge of the curse of dimensionality is addressed by reducing the large dimensionality of the input data by downscaling the SPECT images and extracting score features using PLS. A RF predictor then forms an ensemble of classification and regression tree (CART)-like classifiers being its output determined by a majority vote of the trees in the forest. A baseline principal component analysis (PCA) system is also developed for reference. The experimental results show that the combined PLS-RF system yields a generalization error that converges to a limit when increasing the number of trees in the forest. Thus, the generalization error is reduced when using PLS and depends on the strength of the individual trees in the forest and the correlation between them. Moreover, PLS feature extraction is found to be more effective for extracting discriminative information from the data than PCA yielding peak sensitivity, specificity and accuracy values of 100%, 92.7%, and 96.9%, respectively. Moreover, the proposed CAD system outperformed several other recently developed AD CAD systems. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.
Zhao, Yang; Zheng, Wei; Zhuo, Daisy Y; Lu, Yuefeng; Ma, Xiwen; Liu, Hengchang; Zeng, Zhen; Laird, Glen
2017-10-11
Personalized medicine, or tailored therapy, has been an active and important topic in recent medical research. Many methods have been proposed in the literature for predictive biomarker detection and subgroup identification. In this article, we propose a novel decision tree-based approach applicable in randomized clinical trials. We model the prognostic effects of the biomarkers using additive regression trees and the biomarker-by-treatment effect using a single regression tree. A Bayesian approach is utilized to periodically revise the split variables and the split rules of the decision trees, which provides a better overall fit. A Gibbs sampler is implemented in the MCMC procedure, which updates the prognostic trees and the interaction tree separately. We use the posterior distribution of the interaction tree to construct the predictive scores of the biomarkers and to identify the subgroup where the treatment is superior to the control. Numerical simulations show that our proposed method performs well under various settings compared with existing methods. We also demonstrate an application of our method in a real clinical trial.
Nateghi, Roshanak; Guikema, Seth D; Quiring, Steven M
2011-12-01
This article compares statistical methods for modeling power outage durations during hurricanes and examines the predictive accuracy of these methods. Being able to make accurate predictions of power outage durations is valuable because the information can be used by utility companies to plan their restoration efforts more efficiently. This information can also help inform customers and public agencies of the expected outage times, enabling better collective response planning, and coordination of restoration efforts for other critical infrastructures that depend on electricity. In the long run, outage duration estimates for future storm scenarios may help utilities and public agencies better allocate risk management resources to balance the disruption from hurricanes with the cost of hardening power systems. We compare the out-of-sample predictive accuracy of five distinct statistical models for estimating power outage duration times caused by Hurricane Ivan in 2004. The methods compared include both regression models (accelerated failure time (AFT) and Cox proportional hazard models (Cox PH)) and data mining techniques (regression trees, Bayesian additive regression trees (BART), and multivariate additive regression splines). We then validate our models against two other hurricanes. Our results indicate that BART yields the best prediction accuracy and that it is possible to predict outage durations with reasonable accuracy. © 2011 Society for Risk Analysis.
Louys, Julien; Meloro, Carlo; Elton, Sarah; Ditchfield, Peter; Bishop, Laura C
2015-01-01
We test the performance of two models that use mammalian communities to reconstruct multivariate palaeoenvironments. While both models exploit the correlation between mammal communities (defined in terms of functional groups) and arboreal heterogeneity, the first uses a multiple multivariate regression of community structure and arboreal heterogeneity, while the second uses a linear regression of the principal components of each ecospace. The success of these methods means the palaeoenvironment of a particular locality can be reconstructed in terms of the proportions of heavy, moderate, light, and absent tree canopy cover. The linear regression is less biased, and more precisely and accurately reconstructs heavy tree canopy cover than the multiple multivariate model. However, the multiple multivariate model performs better than the linear regression for all other canopy cover categories. Both models consistently perform better than randomly generated reconstructions. We apply both models to the palaeocommunity of the Upper Laetolil Beds, Tanzania. Our reconstructions indicate that there was very little heavy tree cover at this site (likely less than 10%), with the palaeo-landscape instead comprising a mixture of light and absent tree cover. These reconstructions help resolve the previous conflicting palaeoecological reconstructions made for this site. Copyright © 2014 Elsevier Ltd. All rights reserved.
Cowles, Richard S
2010-10-01
The armored scales Fiorinia externa Ferris and Aspidiotus cryptomeriae Kuwana (Hemiptera: Diaspididae) are increasingly damaging to Christmas tree plantings in southern New England. The systemic insecticide dinotefuran was investigated for selectively suppressing armored scale populations relative to their natural enemies in cooperating growers' fields in 2008 and 2009. Banded soil application of dinotefuran resulted in poor control. However, a dinotefuran spray applied to the basal 25 cm of trunk resulted in its absorption through the bark, translocation to the foliage, and good efficacy. The basal bark spray did not significantly impact the activity of predators Chilocorus stigma (Say) or Cybocephalus nipponicus Enrody-Younga and in 2009 showed a dosage-dependent improvement in the percentage of scales parasitized by Encarsia citrina Craw. A field dosage-response factorial experiment revealed that a 0.25% (vol:vol) addition of a surfactant with dinotefuran did not enhance insecticidal effect. Probit-transformed scale population reduction relative to the untreated check was subjected to linear regression analysis; reduction of scale populations was proportional to the log of insecticide dosage, whereas basal bark spray efficacy declined in proportion to the cube of tree height. The regression equation can be used to optimize dosage relative to tree height. Excellent efficacy resulted from basal bark spray application dates of 28 April (prebud break) to mid-June, but earlier spray timing within that treatment window had fewer crawlers discoloring new growth with their short-lived feeding. A basal bark spray of dinotefuran is well suited for integration with natural enemies to manage armored scales in Christmas tree plantations.
Rainfall and streamflow from small tree-covered and fern-covered and burned watersheds in Hawaii
H. W. Anderson; P. D. Duffy; Teruo Yamamoto
1966-01-01
Streamflow from two 30-acre watersheds near Honolulu was studied by using principal components regression analysis. Models using data on monthly, storm, and peak discharges were tested against several variables expressing amount and intensity of rainfall, and against variables expressing antecedent rainfall. Explained variation ranged from 78 to 94 percent. The...
Knowledge and Community: The Effect of a First-Year Seminar on Student Persistence
ERIC Educational Resources Information Center
Pittendrigh, Adele; Borkowski, John; Swinford, Steven; Plumb, Carolyn
2016-01-01
This study explores the effects of an academic seminar on the persistence of first-year college students, including effects on students most at risk of dropping out. A secondary interest was demonstrating the utility of using classification and regression tree analysis to identify relevant predictors of student persistence. The results of the…
Evaluation of open source data mining software packages
Bonnie Ruefenacht; Greg Liknes; Andrew J. Lister; Haans Fisk; Dan Wendt
2009-01-01
Since 2001, the USDA Forest Service (USFS) has used classification and regression-tree technology to map USFS Forest Inventory and Analysis (FIA) biomass, forest type, forest type groups, and National Forest vegetation. This prior work used Cubist/See5 software for the analyses. The objective of this project, sponsored by the Remote Sensing Steering Committee (RSSC),...
Özge, C; Toros, F; Bayramkaya, E; Çamdeviren, H; Şaşmaz, T
2006-01-01
Background The purpose of this study is to evaluate the most important sociodemographic factors affecting the smoking status of high school students using a broad randomised epidemiological survey. Methods Using an in-class, self-administered questionnaire about their sociodemographic variables and smoking behaviour, a representative sample of 3304 students in the preparatory, 9th, 10th, and 11th grades, from 22 randomly selected schools in Mersin, was evaluated, and discriminative factors were determined using appropriate statistics. In addition to binary logistic regression analysis, the study evaluated the combined effects of these factors using classification and regression tree methodology, as a new statistical method. Results The data showed that 38% of the students reported lifetime smoking and 16.9% reported current smoking, with a male predominance and increasing prevalence with age. Second-hand smoking was reported at a frequency of 74.3%, with father predominance (56.6%). The significant factors affecting current smoking in these age groups were increased household size, late birth rank, certain school types, low academic performance, increased second-hand smoking, and stress (especially reported as separation from a close friend or because of violence at home). Classification and regression tree methodology showed the importance of some neglected sociodemographic factors with a good classification capacity. Conclusions It was concluded that smoking, being closely related to sociocultural factors, was a common problem in this young population, generating an important academic and social burden in youth life, and that with increasing data about this behaviour and the use of new statistical methods, effective coping strategies could be developed. PMID:16891446
Mukherjee, Arideep; Agrawal, Madhoolika
2018-05-15
Responses of urban vegetation to air pollution stress in relation to their tolerance and sensitivity have been extensively studied; however, studies related to air pollution responses based on different leaf functional traits and tree characteristics are limited. In this paper, we have tried to assess combined and individual effects of major air pollutants PM10 (particulate matter ≤ 10 µm), TSP (total suspended particulate matter), SO2 (sulphur dioxide), NO2 (nitrogen dioxide) and O3 (ozone) on thirteen tropical tree species in relation to fifteen leaf functional traits and different tree characteristics. Stepwise linear regression, a general linear modelling approach, was used to quantify the pollution response of trees against air pollutants. The study was performed for six successive seasons over two years in three distinct urban areas (traffic, industrial and residential) of Varanasi city in India. At all the study sites, concentrations of air pollutants, specifically PM (particulate matter) and NO2, were above the specified standards. Distinct variations were recorded in all the fifteen leaf functional traits with pollution load. Caesalpinia sappan was identified as the most tolerant species, followed by Psidium guajava, Dalbergia sissoo and Albizia lebbeck. Stepwise regression analysis identified the maximum response of Eucalyptus citriodora and P. guajava to air pollutants, explaining overall 59% and 58% of the variability in leaf functional traits, respectively. Among leaf functional traits, the maximum effect of air pollutants was observed on non-enzymatic antioxidants, followed by photosynthetic pigments and leaf water status. Among the pollutants, PM was identified as the major stress factor followed by O3, explaining 47% and 33% of the variability in leaf functional traits. Tolerance and pollution response were regulated by different tree characteristics such as height, canopy size, leaf form, texture and nature of tree.
Outcomes of this study will help in urban forest development by selection of specific pollutant tolerant tree species and leaf traits, which is suitable as air pollution mitigation measure. Copyright © 2018 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Park, Seonyoung; Im, Jungho; Park, Sumin; Rhee, Jinyoung
2017-04-01
Soil moisture is one of the most important keys for understanding regional and global climate systems. Soil moisture is directly related to agricultural processes as well as hydrological processes because soil moisture highly influences vegetation growth and determines water supply in the agroecosystem. Accurate monitoring of the spatiotemporal pattern of soil moisture is important. Soil moisture has been generally provided through in situ measurements at stations. Although field survey from in situ measurements provides accurate soil moisture with high temporal resolution, it requires high cost and does not provide the spatial distribution of soil moisture over large areas. Microwave satellite-based approaches (e.g., Advanced Microwave Scanning Radiometer on the Earth Observing System (AMSR2), the Advanced Scatterometer (ASCAT), and Soil Moisture Active Passive (SMAP)) and numerical models such as Global Land Data Assimilation System (GLDAS) and Modern-Era Retrospective Analysis for Research and Applications (MERRA) provide spatiotemporally continuous soil moisture products at global scale. However, since those global soil moisture products have coarse spatial resolution (25-40 km), their applications for agriculture and water resources at local and regional scales are very limited. Thus, soil moisture downscaling is needed to overcome the limitation of the spatial resolution of soil moisture products. In this study, GLDAS soil moisture data were downscaled to 1 km spatial resolution through the integration of AMSR2 and ASCAT soil moisture data, Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM), and Moderate Resolution Imaging Spectroradiometer (MODIS) data (Land Surface Temperature, Normalized Difference Vegetation Index, and Land cover) using modified regression trees over East Asia from 2013 to 2015. Modified regression trees were implemented using Cubist, a commercial software tool based on machine learning.
An optimization based on pruning of rules derived from the modified regression trees was conducted. Root Mean Square Error (RMSE) and correlation coefficients (r) were used to optimize the rules, and finally 59 rules from modified regression trees were produced. The results show high validation r (0.79) and low validation RMSE (0.0556 m3/m3). The 1 km downscaled soil moisture was evaluated using ground soil moisture data at 14 stations, and both soil moisture data showed similar temporal patterns (average r=0.51 and average RMSE=0.041). The spatial distribution of the 1 km downscaled soil moisture corresponded well with GLDAS soil moisture, which captured both extremely dry and wet regions. Correlation between GLDAS and the 1 km downscaled soil moisture during growing season was positive (mean r=0.35) in most regions.
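The two validation statistics named in the abstract above, RMSE and the correlation coefficient r, can be sketched in a few lines. This is a minimal illustration using the standard definitions, not the authors' code; the function names and the sample values are hypothetical and are not data from the study.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical soil moisture values (m3/m3), for illustration only.
downscaled = [0.21, 0.25, 0.30, 0.28, 0.19]
station = [0.20, 0.27, 0.33, 0.26, 0.18]

print(rmse(downscaled, station), pearson_r(downscaled, station))
```

A low RMSE indicates small absolute errors, while a high r indicates that the two series rise and fall together; the abstract reports both because a product can track temporal patterns well and still be biased.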
A hierarchical linear model for tree height prediction.
Vicente J. Monleon
2003-01-01
Measuring tree height is a time-consuming process. Often, tree diameter is measured and height is estimated from a published regression model. Trees used to develop these models are clustered into stands, but this structure is ignored and independence is assumed. In this study, hierarchical linear models that account explicitly for the clustered structure of the data...
Modeling individual tree survival
Quang V. Cao
2016-01-01
Information provided by growth and yield models is the basis for forest managers to make decisions on how to manage their forests. Among different types of growth models, whole-stand models offer predictions at stand level, whereas individual-tree models give detailed information at tree level. The well-known logistic regression is commonly used to predict tree...
Log and tree sawing times for hardwood mills
Everette D. Rast
1974-01-01
Data on 6,850 logs and 1,181 trees were analyzed to predict sawing times. For both logs and trees, regression equations were derived that express (in minutes) sawing time per log or tree and per Mbf. For trees, merchantable height is expressed in number of logs as well as in feet. One of the major uses for the tables of average sawing times is as a bench mark against...
A Millennial-length Reconstruction of the Western Pacific Pattern with Associated Paleoclimate
NASA Astrophysics Data System (ADS)
Wright, W. E.; Guan, B. T.; Wei, K.
2010-12-01
The Western Pacific Pattern (WP) is a lesser known 500 hPa pressure pattern similar to the NAO or PNA. As defined, the poles of the WP index are centered on 60°N over the Kamchatka peninsula and the neighboring Pacific and on 32.5°N over the western north Pacific. However, the area of influence for the southern half of the dipole includes a wide swath from East Asia, across Taiwan, through the Philippine Sea, to the western north Pacific. Tree rings of Taiwanese Chamaecyparis obtusa var. formosana in this extended region show significant correlation with the WP, and with local temperature. The WP is also significantly correlated with atmospheric temperatures over Taiwan, especially at 850 hPa and 700 hPa, pressure levels that bracket the tree site. Spectral analysis indicates that variations in the WP occur at relatively high frequency, with most power at less than 5 years. Simple linear regression against high frequency variants of the tree-ring chronology yielded the most significant correlation coefficients. Two reconstructions are presented. The first uses a tree-ring time series produced as the first intrinsic mode function (IMF) from an Ensemble Empirical Mode Decomposition (EEMD), based on the Hilbert-Huang Transform. The regression using the EEMD-derived time series was much more significant than regressions using time series produced by traditional high-pass filtering. The second also uses the first IMF of a tree-ring time series, but the dataset was first sorted and partitioned at a specified quantile prior to EEMD decomposition, with the mean of the partitioned data forming the input to the EEMD. The partitioning was done to filter out the less climatically sensitive tree rings, a common problem with shade tolerant trees. Time series statistics indicate that the first reconstruction is reliable to 1241 of the Common Era.
Reliability of the second reconstruction is dependent on the development of statistics related to the quantile partitioning, and the consequent reduction in sample depth. However, the correlation coefficients from regressions over the instrumental period greatly exceed those from any other method of chronology generation, and so the technique holds promise. Additional atmospheric parameters having significant correlations against the WPO and tree ring time series with similar spatial patterns are also presented. These include vertical wind shear (850hPa-700hPa) over the northern Philippines and the Philippine Sea, surface Omega and 850hPa v-winds over the East China Sea, Japan and Taiwan. Possible links to changes in the subtropical jet stream will also be discussed.
Hostettler, Isabel Charlotte; Muroi, Carl; Richter, Johannes Konstantin; Schmid, Josef; Neidert, Marian Christoph; Seule, Martin; Boss, Oliver; Pangalu, Athina; Germans, Menno Robbert; Keller, Emanuela
2018-01-19
OBJECTIVE The aim of this study was to create prediction models for outcome parameters by decision tree analysis based on clinical and laboratory data in patients with aneurysmal subarachnoid hemorrhage (aSAH). METHODS The database consisted of clinical and laboratory parameters of 548 patients with aSAH who were admitted to the Neurocritical Care Unit, University Hospital Zurich. To examine the model performance, the cohort was randomly divided into a derivation cohort (60% [n = 329]; training data set) and a validation cohort (40% [n = 219]; test data set). The classification and regression tree prediction algorithm was applied to predict death, functional outcome, and ventriculoperitoneal (VP) shunt dependency. Chi-square automatic interaction detection was applied to predict delayed cerebral infarction on days 1, 3, and 7. RESULTS The overall mortality was 18.4%. The accuracy of the decision tree models was good for survival on day 1 and favorable functional outcome at all time points, with a difference between the training and test data sets of < 5%. Prediction accuracy for survival on day 1 was 75.2%. The most important differentiating factor was the interleukin-6 (IL-6) level on day 1. Favorable functional outcome, defined as Glasgow Outcome Scale scores of 4 and 5, was observed in 68.6% of patients. Favorable functional outcome at all time points had a prediction accuracy of 71.1% in the training data set, with procalcitonin on day 1 being the most important differentiating factor at all time points. A total of 148 patients (27%) developed VP shunt dependency. The most important differentiating factor was hyperglycemia on admission. CONCLUSIONS The multiple variable analysis capability of decision trees enables exploration of dependent variables in the context of multiple changing influences over the course of an illness. 
The decision tree currently generated increases awareness of the early systemic stress response, which is seemingly pertinent for prognostication.
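Two steps from the abstract above lend themselves to a short sketch: the random 60%/40% derivation/validation split, and the way a classification tree chooses a cut-point on one predictor. The code below is a minimal illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, the Gini impurity criterion is the usual CART default but is not confirmed by the abstract, and the marker values are invented.

```python
import random


def split_cohort(records, train_fraction=0.6, seed=0):
    """Randomly divide records into derivation and validation sets."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted Gini) for the best split 'value <= threshold'.

    Returns (None, parent impurity) when no split improves on the parent node.
    """
    n = len(values)
    best = (None, gini(labels))
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2  # candidate cut-point midway between neighbours
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# 548 patients split 60/40 reproduces the cohort sizes in the abstract.
train, test = split_cohort(list(range(548)))
print(len(train), len(test))

# Hypothetical marker values and outcomes (1 = death), for illustration only.
marker = [1, 2, 3, 10, 11, 12]
died = [0, 0, 0, 1, 1, 1]
print(best_threshold(marker, died))
```

A full tree repeats the threshold search over every predictor and then recurses into each child node; comparing accuracy on the held-out 40% with accuracy on the 60% is what the abstract means by a difference of less than 5% between data sets.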
Andrew T. Hudak; Nicholas L. Crookston; Jeffrey S. Evans; Michael K. Falkowski; Alistair M. S. Smith; Paul E. Gessler; Penelope Morgan
2006-01-01
We compared the utility of discrete-return light detection and ranging (lidar) data and multispectral satellite imagery, and their integration, for modeling and mapping basal area and tree density across two diverse coniferous forest landscapes in north-central Idaho. We applied multiple linear regression models subset from a suite of 26 predictor variables derived...
Yılmaz Isıkhan, Selen; Karabulut, Erdem; Alpar, Celal Reha
2016-01-01
Background/Aim. Evaluating the success of dose prediction based on genetic or clinical data has substantially advanced recently. The aim of this study is to predict various clinical dose values from DNA gene expression datasets using data mining techniques. Materials and Methods. Eleven real gene expression datasets containing dose values were included. First, important genes for dose prediction were selected using iterative sure independence screening. Then, the performances of regression trees (RTs), support vector regression (SVR), RT bagging, SVR bagging, and RT boosting were examined. Results. The results demonstrated that a regression-based feature selection method substantially reduced the number of irrelevant genes from raw datasets. Overall, the best prediction performance in nine of 11 datasets was achieved using SVR; the second most accurate performance was provided using a gradient-boosting machine (GBM). Conclusion. Analysis of various dose values based on microarray gene expression data identified common genes found in our study and the referenced studies. According to our findings, SVR and GBM can be good predictors of dose-gene datasets. Another result of the study was to identify the sample size of n = 25 as a cutoff point for RT bagging to outperform a single RT.
Ozcelik, Ramazan; Gul, Altay Ugur; Merganic, Jan; Merganicova, Katarina
2008-05-01
We studied the effects of stand parameters (crown closure, basal area, stand volume, age, mean stand diameter, number of trees, and heterogeneity index) and geomorphology features (elevation, aspect and slope) on tree species diversity in an example of untreated natural mixed forest stands in the eastern Black Sea region of Turkey. Tree species diversity and basal area heterogeneity in forest ecosystems are quantified using the Shannon-Weaver and Simpson indices. The relationships between tree species diversity, basal area heterogeneity, stand parameters and geomorphology features are examined using regression analysis. Our work revealed that the relationship between tree species diversity and stand parameters is loose, with a correlation coefficient between 0.02 and 0.70. The correlation of basal area heterogeneity with stand parameters fluctuated between 0.004 and 0.77 (R2). According to our results, stands with higher tree species diversity are characterised by higher mean stand diameter, number of diameter classes and basal area, and a lower homogeneity index value. Considering the effect of geomorphology features on tree species diversity or basal area heterogeneity, we found that all investigated relationships are loose, with R ≤ 0.24. A significant correlation was detected only between tree species diversity and aspect. Future work is required to verify the detected trends in the behaviour of tree species diversity if it is to be estimated from the usual forest stand parameters and topography characteristics.
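The two diversity indices named in the abstract above can be sketched as follows. This is an illustrative implementation of the standard formulas, not the authors' code. The abstract does not say which form of the Simpson index was used, so the complement form (one minus the sum of squared proportions) is an assumption, and the stand counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon-Weaver index H' = -sum(p_i * ln(p_i)) over species proportions."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Simpson diversity in complement form, 1 - sum(p_i ** 2) (assumed form)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical stem counts per species in two stands, for illustration only.
even_stand = [10, 10, 10, 10]
dominated_stand = [37, 1, 1, 1]

print(shannon_index(even_stand), simpson_index(even_stand))
print(shannon_index(dominated_stand), simpson_index(dominated_stand))
```

Both indices are higher for the evenly mixed stand than for the stand dominated by one species, even though the species count is the same. Applying the same functions to basal area per diameter class instead of stem counts per species gives a heterogeneity measure of the kind the abstract describes.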
Finding structure in data using multivariate tree boosting
Miller, Patrick J.; Lubke, Gitta H.; McArtor, Daniel B.; Bergeman, C. S.
2016-01-01
Technology and collaboration enable dramatic increases in the size of psychological and psychiatric data collections, but finding structure in these large data sets with many collected variables is challenging. Decision tree ensembles such as random forests (Strobl, Malley, & Tutz, 2009) are a useful tool for finding structure, but are difficult to interpret with multiple outcome variables which are often of interest in psychology. To find and interpret structure in data sets with multiple outcomes and many predictors (possibly exceeding the sample size), we introduce a multivariate extension to a decision tree ensemble method called gradient boosted regression trees (Friedman, 2001). Our extension, multivariate tree boosting, is a method for nonparametric regression that is useful for identifying important predictors, detecting predictors with nonlinear effects and interactions without specification of such effects, and for identifying predictors that cause two or more outcome variables to covary. We provide the R package ‘mvtboost’ to estimate, tune, and interpret the resulting model, which extends the implementation of univariate boosting in the R package ‘gbm’ (Ridgeway et al., 2015) to continuous, multivariate outcomes. To illustrate the approach, we analyze predictors of psychological well-being (Ryff & Keyes, 1995). Simulations verify that our approach identifies predictors with nonlinear effects and achieves high prediction accuracy, exceeding or matching the performance of (penalized) multivariate multiple regression and multivariate decision trees over a wide range of conditions. PMID:27918183
NASA Astrophysics Data System (ADS)
Freeman, Mary Pyott
ABSTRACT An Analysis of Tree Mortality Using High Resolution Remotely-Sensed Data for Mixed-Conifer Forests in San Diego County by Mary Pyott Freeman The montane mixed-conifer forests of San Diego County are currently experiencing extensive tree mortality, which is defined as dieback where whole stands are affected. This mortality is likely the result of the complex interaction of many variables, such as altered fire regimes, climatic conditions such as drought, as well as forest pathogens and past management strategies. Conifer tree mortality and its spatial pattern and change over time were examined in three components. In component 1, two remote sensing approaches were compared for their effectiveness in delineating dead trees, a spatial contextual approach and an OBIA (object based image analysis) approach, utilizing various dates and spatial resolutions of airborne image data. For each approach transforms and masking techniques were explored, which were found to improve classifications, and an object-based assessment approach was tested. In component 2, dead tree maps produced by the most effective techniques derived from component 1 were utilized for point pattern and vector analyses to further understand spatio-temporal changes in tree mortality for the years 1997, 2000, 2002, and 2005 for three study areas: Palomar, Volcan and Laguna mountains. Plot-based fieldwork was conducted to further assess mortality patterns. Results indicate that conifer mortality was significantly clustered, increased substantially between 2002 and 2005, and was non-random with respect to tree species and diameter class sizes. In component 3, multiple environmental variables were used in Generalized Linear Model (GLM-logistic regression) and decision tree classifier model development, revealing the importance of climate and topographic factors such as precipitation and elevation, in being able to predict areas of high risk for tree mortality. 
The results from this study highlight the importance of multi-scale spatial as well as temporal analyses, in order to understand mixed-conifer forest structure, dynamics, and processes of decline, which can lead to more sustainable management of forests with continued natural and anthropogenic disturbance.
Liu, Yang; Lü, Yi-he; Zheng, Hai-feng; Chen, Li-ding
2010-05-01
Based on the 10-day SPOT VEGETATION NDVI data and the daily meteorological data from 1998 to 2007 in Yan'an City, the main meteorological variables affecting the annual and interannual variations of NDVI were determined using regression trees. It was found that the effects of the tested meteorological variables on the variability of NDVI differed with seasons and time lags. Temperature and precipitation were the most important meteorological variables affecting the annual variation of NDVI, and the average highest temperature was the most important meteorological variable affecting the interannual variation of NDVI. The regression tree was very powerful in determining the key meteorological variables affecting NDVI variation, but could not build quantitative relations between NDVI and meteorological variables, which limited its further and wider application.
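The limitation noted in the abstract above follows from how a regression tree works: each split replaces a continuous relationship with two constant means. The sketch below shows a single split chosen by minimizing the summed squared error, which is the usual regression tree criterion; it is an illustration rather than the authors' method, and the function names and the temperature and NDVI values are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_regression_split(x, y):
    """Return (threshold, left_mean, right_mean) minimizing the total SSE.

    Returns None when x has fewer than two distinct values.
    """
    distinct = sorted(set(x))
    best = None
    best_total = float("inf")
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2  # candidate cut-point midway between neighbours
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        total = sse(left) + sse(right)
        if total < best_total:
            best_total = total
            best = (t, sum(left) / len(left), sum(right) / len(right))
    return best


# Hypothetical 10-day mean temperatures (deg C) and NDVI, for illustration only.
temperature = [5, 8, 12, 20, 24, 28]
ndvi = [0.25, 0.25, 0.25, 0.5, 0.5, 0.5]

print(best_regression_split(temperature, ndvi))
```

The split identifies temperature as informative and gives a threshold, but the prediction on each side is a single mean value rather than a slope, which is why a tree ranks variables well without yielding an equation relating NDVI to them.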
NASA Astrophysics Data System (ADS)
Muller, Sybrand Jacobus; van Niekerk, Adriaan
2016-07-01
Soil salinity often leads to reduced crop yield and quality and can render soils barren. Irrigated areas are particularly at risk due to intensive cultivation and secondary salinization caused by waterlogging. Regular monitoring of salt accumulation in irrigation schemes is needed to keep its negative effects under control. The dynamic spatial and temporal characteristics of remote sensing can provide a cost-effective solution for monitoring salt accumulation at irrigation scheme level. This study evaluated a range of pan-fused SPOT-5 derived features (spectral bands, vegetation indices, image textures and image transformations) for classifying salt-affected areas in two distinctly different irrigation schemes in South Africa, namely Vaalharts and Breede River. The relationships between the input features and electrical conductivity measurements were investigated using regression modelling (stepwise linear regression, partial least squares regression, curve fit regression modelling) and supervised classification (maximum likelihood, nearest neighbour, decision tree analysis, support vector machine and random forests). Classification and regression trees and random forest were used to select the most important features for differentiating salt-affected and unaffected areas. The results showed that the regression analyses produced weak models (R squared < 0.4). Better results were achieved using the supervised classifiers, but the algorithms tend to over-estimate salt-affected areas. A key finding was that none of the feature sets or classification algorithms stood out as being superior for monitoring salt accumulation at irrigation scheme level. This was attributed to the large variations in the spectral responses of different crop types at different growing stages, coupled with their individual tolerances to saline conditions.
A form of two-phase sampling utilizing regression analysis
Michael A. Fiery; John R. Brooks
2007-01-01
A two-phase sampling technique was introduced and tested on several horizontal point sampling inventories of hardwood tracts located in northern West Virginia and western Maryland. In this sampling procedure species and dbh are recorded for all "in-trees" on all sample points. Sawlog merchantable height was recorded on a subsample of intensively measured (second phase...
2007-06-01
other databases such as MySQL, Oracle, and Derby will be added to future versions of the program. Setting a factor requires more than changing a single...Non-Penetrating vs. Penetrating Results.............106 a. Coverage...Interaction Profile for D U-2 and C RQ-4 .......................................................89 Figure 59. R-Squared vs. Number of Regression Tree
Partitioning sources of variation in vertebrate species richness
Boone, R.B.; Krohn, W.B.
2000-01-01
Aim: To explore biogeographic patterns of terrestrial vertebrates in Maine, USA using techniques that would describe local and spatial correlations with the environment. Location: Maine, USA. Methods: We delineated the ranges within Maine (86,156 km2) of 275 species using literature and expert review. Ranges were combined into species richness maps, and compared to geomorphology, climate, and woody plant distributions. Methods were adapted that compared richness of all vertebrate classes to each environmental correlate, rather than assessing a single explanatory theory. We partitioned variation in species richness into components using tree and multiple linear regression. Methods were used that allowed for useful comparisons between tree and linear regression results. For both methods we partitioned variation into broad-scale (spatially autocorrelated) and fine-scale (spatially uncorrelated) explained and unexplained components. By partitioning variance, and using both tree and linear regression in analyses, we explored the degree of variation in species richness for each vertebrate group that could be explained by the relative contribution of each environmental variable. Results: In tree regression, climate variation explained richness better (92% of mean deviance explained for all species) than woody plant variation (87%) and geomorphology (86%). Reptiles were highly correlated with environmental variation (93%), followed by mammals, amphibians, and birds (each with 84-82% deviance explained). In multiple linear regression, climate was most closely associated with total vertebrate richness (78%), followed by woody plants (67%) and geomorphology (56%). Again, reptiles were closely correlated with the environment (95%), followed by mammals (73%), amphibians (63%) and birds (57%).
Main conclusions: Comparing variation explained using tree and multiple linear regression quantified the importance of nonlinear relationships and local interactions between species richness and environmental variation, identifying the importance of linear relationships between reptiles and the environment, and nonlinear relationships between birds and woody plants, for example. Conservation planners should capture climatic variation in broad-scale designs; temperatures may shift during climate change, but the underlying correlations between the environment and species richness will presumably remain.
Ensemble habitat mapping of invasive plant species
Stohlgren, T.J.; Ma, P.; Kumar, S.; Rocca, M.; Morisette, J.T.; Jarnevich, C.S.; Benson, N.
2010-01-01
Ensemble species distribution models combine the strengths of several species environmental matching models, while minimizing the weakness of any one model. Ensemble models may be particularly useful in risk analysis of recently arrived, harmful invasive species because species may not yet have spread to all suitable habitats, leaving species-environment relationships difficult to determine. We tested five individual models (logistic regression, boosted regression trees, random forest, multivariate adaptive regression splines (MARS), and maximum entropy model or Maxent) and ensemble modeling for selected nonnative plant species in Yellowstone and Grand Teton National Parks, Wyoming; Sequoia and Kings Canyon National Parks, California, and areas of interior Alaska. The models are based on field data provided by the park staffs, combined with topographic, climatic, and vegetation predictors derived from satellite data. For the four invasive plant species tested, ensemble models were the only models that ranked in the top three models for both field validation and test data. Ensemble models may be more robust than individual species-environment matching models for risk analysis. © 2010 Society for Risk Analysis.
Using decision trees to understand structure in missing data
Tierney, Nicholas J; Harden, Fiona A; Harden, Maurice J; Mengersen, Kerrie L
2015-01-01
Objectives Demonstrate the application of decision trees—classification and regression trees (CARTs), and their cousins, boosted regression trees (BRTs)—to understand structure in missing data. Setting Data taken from employees at 3 different industrial sites in Australia. Participants 7915 observations were included. Materials and methods The approach was evaluated using an occupational health data set comprising results of questionnaires, medical tests and environmental monitoring. Statistical methods included standard statistical tests and the ‘rpart’ and ‘gbm’ packages for CART and BRT analyses, respectively, from the statistical software ‘R’. A simulation study was conducted to explore the capability of decision tree models in describing data with missingness artificially introduced. Results CART and BRT models were effective in highlighting a missingness structure in the data, related to the type of data (medical or environmental), the site in which it was collected, the number of visits, and the presence of extreme values. The simulation study revealed that CART models were able to identify variables and values responsible for inducing missingness. There was greater variation in variable importance for unstructured as compared to structured missingness. Discussion Both CART and BRT models were effective in describing structural missingness in data. CART models may be preferred over BRT models for exploratory analysis of missing data, and selecting variables important for predicting missingness. BRT models can show how values of other variables influence missingness, which may prove useful for researchers. Conclusions Researchers are encouraged to use CART and BRT models to explore and understand missing data. PMID:26124509
Poulos, Helen M; Camp, Ann E
2010-04-01
The abundance and distribution of species reflect how the niche requirements of species and the dynamics of populations interact with spatial and temporal variation in the environment. This study investigated the influence of geographical variation in environmental site conditions on tree dominance and diversity patterns in three topographically dissected mountain ranges in west Texas, USA, and northern Mexico. We measured tree abundance and basal area using a systematic sampling design across the forested areas of three mountain ranges and related these data to a suite of environmental parameters derived from field and digital elevation model data. We employed cluster analysis, classification and regression trees (CART), and rarefaction to identify (1) the dominant forest cover types across the three study sites and (2) environmental influences on tree distribution and diversity patterns. Elevation, topographic position, and incident solar radiation were the major influences on tree dominance and diversity. Mesic valley bottoms hosted high-diversity vegetation types, while hotter and drier mid-slopes and ridgetops supported lower tree diversity. Valley bottoms and other topographic positions shared few species, indicating high species turnover at the landscape scale. Mountain ranges with high topographic complexity also had higher species richness, suggesting that geographical variability in environmental conditions was a major influence on tree diversity. This study stressed the importance of landscape- and regional-scale topographic variability as a key factor controlling vegetation pattern and diversity in southwestern North America.
Ensemble of trees approaches to risk adjustment for evaluating a hospital's performance.
Liu, Yang; Traskin, Mikhail; Lorch, Scott A; George, Edward I; Small, Dylan
2015-03-01
A commonly used method for evaluating a hospital's performance on an outcome is to compare the hospital's observed outcome rate to the hospital's expected outcome rate given its patient (case) mix and service. The process of calculating the hospital's expected outcome rate given its patient mix and service is called risk adjustment (Iezzoni 1997). Risk adjustment is critical for accurately evaluating and comparing hospitals' performances since we would not want to unfairly penalize a hospital just because it treats sicker patients. The key to risk adjustment is accurately estimating the probability of an outcome given patient characteristics. For cases with binary outcomes, the method that is commonly used in risk adjustment is logistic regression. In this paper, we consider ensemble of trees methods as alternatives for risk adjustment, including random forests and Bayesian additive regression trees (BART). Both random forests and BART are modern machine learning methods that have been shown recently to have excellent performance for prediction of outcomes in many settings. We apply these methods to carry out risk adjustment for the performance of neonatal intensive care units (NICU). We show that these ensemble of trees methods outperform logistic regression in predicting mortality among babies treated in NICU, and provide a superior method of risk adjustment compared to logistic regression.
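The observed-versus-expected comparison described above can be sketched in a few lines of Python. The sketch assumes that some fitted model (logistic regression, random forest or BART) has already produced a predicted outcome probability for each patient; the outcome vectors and probabilities below are invented for illustration and are not taken from the paper.

```python
# Minimal sketch of an observed-to-expected (O/E) ratio for risk adjustment.
# Predicted probabilities would come from a fitted model; here they are
# hypothetical numbers supplied by hand.

def expected_rate(predicted_probs):
    """Expected outcome rate: mean of per-patient predicted probabilities."""
    return sum(predicted_probs) / len(predicted_probs)

def observed_to_expected(outcomes, predicted_probs):
    """Ratio of the observed outcome rate to the case-mix-expected rate."""
    observed = sum(outcomes) / len(outcomes)
    return observed / expected_rate(predicted_probs)

# Hospital A treats sicker patients (high predicted risk).
outcomes_a = [1, 0, 0, 1, 0]
probs_a = [0.6, 0.3, 0.2, 0.7, 0.2]

# Hospital B treats healthier patients (low predicted risk).
outcomes_b = [1, 0, 0, 0, 0]
probs_b = [0.1, 0.05, 0.05, 0.1, 0.2]

ratio_a = observed_to_expected(outcomes_a, probs_a)
ratio_b = observed_to_expected(outcomes_b, probs_b)
print(round(ratio_a, 2), round(ratio_b, 2))
```

Hospital A has the higher raw outcome rate but an O/E ratio near 1, whereas hospital B has a lower raw rate but roughly twice as many events as its case mix predicts, which is why the accuracy of the predicted probabilities matters so much.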
Teale, Stephen A; Letkowski, Steven; Matusick, George; Stehman, Stephen V; Castello, John D
2009-08-01
Beech scale, Cryptococcus fagisuga Lindinger, is a non-native invasive insect associated with beech bark disease. A quantitative method of measuring viable scale density at the levels of the individual tree and localized bark patches was developed. Bark patches (10 cm²) were removed at 0, 1, and 2 m above the ground and at the four cardinal directions from 13 trees in northern New York and 12 trees in northern Michigan. Digital photographs of each patch were made, and the wax mass area was measured from two random 1-cm² subsamples on each bark patch using image analysis software. Viable scale insects were counted after removing the wax under a dissecting microscope. Separate regression analyses at the whole tree level for the New York and Michigan sites each showed a strong positive relationship of wax mass area with the number of underlying viable scale insects. The relationships for the New York and Michigan data were not significantly different from each other, and when pooling data from the two sites, there was still a significant positive relationship between wax mass area and the number of scale insects. The relationships between viable scale insects and wax mass area were different at the 0-, 1-, and 2-m sampling heights but do not seem to affect the relationship. This method does not disrupt the insect or its interactions with the host tree.
Climate Response of Tree Radial Growth at Different Timescales in the Qinling Mountains.
Sun, Changfeng; Liu, Yu
2016-01-01
The analysis of the tree radial growth response to climate is crucial for dendroclimatological research. However, the response relationships between tree-ring indices and climatic factors at different timescales are not yet clear. In this study, the tree-ring width of Huashan pine (Pinus armandii) from Huashan in the Qinling Mountains, north-central China, was used to explore the response differences of tree growth to climatic factors at daily, pentad (5 days), dekad (10 days) and monthly timescales. Correlation function and linear regression analysis were applied in this paper. The tree-ring width showed a more sensitive response to daily and pentad climatic factors. As the timescale shortened, the absolute value of the maximum correlation coefficient between the tree-ring data and both precipitation and temperature (mean, minimum and maximum) increased. Compared to the other three timescales, the pentad was more suitable for analysing the response of tree growth to climate. Relative to the monthly climate data, the association between the tree-ring data and the pentad climate data was more remarkable and accurate, and the reconstruction function based on the pentad climate was also more reliable and stable. We found that the major climatic factor limiting Huashan pine growth was the precipitation of pentads 20-35 (from April 6 to June 24) rather than the well-known April-June precipitation. The pentad also proved to be a better timescale for analysing the climate and tree growth in the western and eastern Qinling Mountains. The formation of the earlywood density of Chinese pine (Pinus tabulaeformis) from Shimenshan in western Qinling was mainly affected by the maximum temperature of pentads 28-32 (from May 16 to June 9). The maximum temperature of pentads 28-33 (from May 16 to June 14) was the major factor affecting the ring width of Chinese pine from Shirenshan in eastern Qinling.
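The pentad windows quoted above can be checked with a short Python helper that converts pentad numbers to calendar dates. The sketch assumes the usual convention that pentad 1 covers 1-5 January and that the year is not a leap year; the function name `pentad_range` and the default year 2015 are choices made for the example only.

```python
from datetime import date, timedelta

def pentad_range(first_pentad, last_pentad, year=2015):
    """Return (start_date, end_date) spanned by consecutive 5-day pentads.

    Assumes pentad 1 is 1-5 January and a non-leap year (2015 is an
    arbitrary non-leap year used only to obtain calendar dates).
    """
    jan1 = date(year, 1, 1)
    start = jan1 + timedelta(days=5 * (first_pentad - 1))
    end = jan1 + timedelta(days=5 * last_pentad - 1)
    return start, end

for first, last in [(20, 35), (28, 32), (28, 33)]:
    start, end = pentad_range(first, last)
    print(first, last, start.strftime("%B %d"), end.strftime("%B %d"))
```

Under these assumptions the three windows reproduce the date ranges given in the abstract (April 6 to June 24, May 16 to June 9, and May 16 to June 14).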
Modeling Caribbean tree stem diameters from tree height and crown width measurements
Thomas Brandeis; KaDonna Randolph; Mike Strub
2009-01-01
Regression models to predict diameter at breast height (DBH) as a function of tree height and maximum crown radius were developed for Caribbean forests based on data collected by the U.S. Forest Service in the Commonwealth of Puerto Rico and Territory of the U.S. Virgin Islands. The model predicting DBH from tree height fit reasonably well (R2 = 0.7110), with...
Leaf phenological characters of main tree species in urban forest of Shenyang.
Xu, Sheng; Xu, Wenduo; Chen, Wei; He, Xingyuan; Huang, Yanqing; Wen, Hua
2014-01-01
Plant leaves, as the main photosynthetic organs and the high energy converters among primary producers in terrestrial ecosystems, have attracted significant research attention. Leaf lifespan is an adaptive characteristic formed by plants to obtain the maximum carbon in the long-term adaptation process. It determines important functional and structural characteristics exhibited in the environmental adaptation of plants. However, the leaf lifespan and leaf characteristics of urban forests have not been studied until now. Leaf phenological characters of the main tree species in the urban forest of Shenyang were observed for five years and analyzed using statistical, linear regression, and correlation methods to obtain the leafing phenology (including leafing start time, end time, and duration), defoliating phenology (including defoliation start time, end time, and duration), and the leaf lifespan of the main tree species. Moreover, the relationships between temperature and leafing phenology, defoliating phenology, and leaf lifespan were analyzed. The timing of leafing differed greatly among species. Species that started leafing early also finished leafing relatively early, and species that took longer to finish leafing completed leafing later. The timing of defoliation also varied significantly among species; species that began defoliation early had a relatively longer duration of defoliation. For each 1°C rise in mean spring temperature, leafing occurred 5 days earlier. For each 1°C decline in mean temperature, defoliation was delayed by 3 days in autumn. There was a significant correlation between leaf longevity and the time of leafing and defoliation. According to correlation analysis and regression analysis, there was a significant correlation between temperature and leafing and defoliation phenology. Early-leafing species had a longer leaf lifespan and consequently an advantage in carbon accumulation compared with later-defoliating species.
Breeding habitat preference of preimaginal black flies (Diptera: Simuliidae) in Peninsular Malaysia.
Ya'cob, Zubaidah; Takaoka, Hiroyuki; Pramual, Pairot; Low, Van Lun; Sofian-Azirun, Mohd
2016-01-01
To investigate the breeding habitat preference of black flies, a comprehensive black fly survey was conducted for the first time in Peninsular Malaysia. Preimaginal black flies (pupae and larvae) were collected manually from 180 stream points encompassing the northern, southern, central and east coast regions of Peninsular Malaysia. A total of 47 black fly species were recorded in this study. The predominant species were Simulium trangense (36.7%) and Simulium angulistylum (33.3%). Relatively common species were Simulium cheongi (29.4%), Simulium tani (25.6%), Simulium nobile (16.2%), Simulium sheilae (14.5%) and Simulium bishopi (10.6%). Principal Component Analysis (PCA) of all stream variables revealed four PCs that accounted for 69.3% of the total intersite variance. Regression analysis revealed that high species richness is associated with larger, deeper, faster and higher discharge streams with larger streambed particles, more riparian vegetation and low pH (F=22.7, d.f.=1, 173; P<0.001). Relationship between species occurrence of seven common species (present in >10% of the sampling sites) was assessed. Forward logistic regression analysis indicated that four species were significantly related to the stream variables. S. nobile and S. tani prefer large, fast flowing streams with higher pH, large streambed particles and riparian trees. S. bishopi was commonly found at high elevation with cooler stream, low conductivity, higher conductivity and more riparian trees. In contrast, S. sheilae was negatively correlated with PC-2; thus, this species was commonly found at low elevation, in warmer streams with low conductivity and fewer riparian trees. The results of this study are consistent with previous studies from other geographic regions, which indicated that both physical and chemical stream conditions are the key factors for black fly ecology. Copyright © 2015 Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Serrato, M.; Jungho, I.; Jensen, J.
2012-01-17
Remote sensing technology can provide a cost-effective tool for monitoring hazardous waste sites. This study investigated the usability of HyMap airborne hyperspectral remote sensing data (126 bands at 2.3 x 2.3 m spatial resolution) to characterize the vegetation at U.S. Department of Energy uranium processing sites near Monticello, Utah and Monument Valley, Arizona. Grass and shrub species were mixed on an engineered disposal cell cover at the Monticello site while shrub species were dominant in the phytoremediation plantings at the Monument Valley site. The specific objectives of this study were to: (1) estimate leaf-area-index (LAI) of the vegetation using three different methods (i.e., vegetation indices, red-edge positioning (REP), and machine learning regression trees), and (2) map the vegetation cover using machine learning decision trees based on either the scaled reflectance data or mixture tuned matched filtering (MTMF)-derived metrics and vegetation indices. Regression trees resulted in the best calibration performance of LAI estimation (R² > 0.80). The use of REPs failed to accurately predict LAI (R² < 0.2). The use of the MTMF-derived metrics (matched filter scores and infeasibility) and a range of vegetation indices in decision trees improved the vegetation mapping when compared to the decision tree classification using just the scaled reflectance. Results suggest that hyperspectral imagery is useful for characterizing biophysical characteristics (LAI) and vegetation cover on capped hazardous waste sites. However, it is believed that the vegetation mapping would benefit from the use of higher spatial resolution hyperspectral data due to the small size of many of the vegetation patches (< 1 m) found on the sites.
Pestana, Maribela; Beja, Pedro; Correia, Pedro José; de Varennes, Amarilis; Faria, Eugénio Araújo
2005-06-01
To determine if flower nutrient composition can be used to predict fruit quality, a field experiment was conducted over three seasons (1996-1999) in a commercial orange orchard (Citrus sinensis (L.) Osbeck cv. 'Valencia Late', budded on Troyer citrange rootstock) established on a calcareous soil in southern Portugal. Flowers were collected from 20 trees during full bloom in April and their nutrient composition determined, and fruits were harvested the following March and their quality evaluated. Patterns of covariation in flower nutrient concentrations and in fruit quality variables were evaluated by principal component analysis. Regression models relating fruit quality variables to flower nutrient composition were developed by stepwise selection procedures. The predictive power of the regression models was evaluated with an independent data set. Nutrient composition of flowers at full bloom could be used to predict the fruit quality variables fresh fruit mass and maturation index in the following year. Magnesium, Ca and Zn concentrations measured in flowers were related to fruit fresh mass estimations and N, P, Mg and Fe concentrations were related to fruit maturation index. We also established reference values for the nutrient composition of flowers based on measurements made in trees that produced large (> 76 mm in diameter) fruit.
Reconstructions of Soil Moisture for the Upper Colorado River Basin Using Tree-Ring Chronologies
NASA Astrophysics Data System (ADS)
Tootle, G.; Anderson, S.; Grissino-Mayer, H.
2012-12-01
Soil moisture is an important factor in the global hydrologic cycle, but existing reconstructions of historic soil moisture are limited. Tree-ring chronologies (TRCs) were used to reconstruct annual soil moisture in the Upper Colorado River Basin (UCRB). Gridded soil moisture data were spatially regionalized using principal components analysis and k-nearest neighbor techniques. Moisture sensitive tree-ring chronologies in and adjacent to the UCRB were correlated with regional soil moisture and tested for temporal stability. TRCs that were positively correlated and stable for the calibration period were retained. Stepwise linear regression was applied to identify the best predictor combinations for each soil moisture region. The regressions explained 42-78% of the variability in soil moisture data. We performed reconstructions for individual soil moisture grid cells to enhance understanding of the disparity in reconstructive skill across the regions. Reconstructions that used chronologies based on ponderosa pines (Pinus ponderosa) and pinyon pines (Pinus edulis) explained increased variance in the datasets. Reconstructed soil moisture was standardized and compared with standardized reconstructed streamflow and snow water equivalent from the same region. Soil moisture reconstructions were highly correlated with streamflow and snow water equivalent reconstructions, indicating reconstructions of soil moisture in the UCRB using TRCs successfully represent hydrologic trends, including the identification of periods of prolonged drought.
Cheng, Feon W; Gao, Xiang; Bao, Le; Mitchell, Diane C; Wood, Craig; Sliwinski, Martin J; Smiciklas-Wright, Helen; Still, Christopher D; Rolston, David D K; Jensen, Gordon L
2017-07-01
To examine the risk factors of developing functional decline and make probabilistic predictions by using a tree-based method that allows higher order polynomials and interactions of the risk factors. The conditional inference tree analysis, a data mining approach, was used to construct a risk stratification algorithm for developing functional limitation based on BMI and other potential risk factors for disability in 1,951 older adults without functional limitations at baseline (baseline age 73.1 ± 4.2 y). We also analyzed the data with multivariate stepwise logistic regression and compared the two approaches (e.g., cross-validation). Over a mean of 9.2 ± 1.7 years of follow-up, 221 individuals developed functional limitation. Higher BMI, age, and comorbidity were consistently identified as significant risk factors for functional decline using both methods. Based on these factors, individuals were stratified into four risk groups via the conditional inference tree analysis. Compared to the low-risk group, all other groups had a significantly higher risk of developing functional limitation. The odds ratio comparing two extreme categories was 9.09 (95% confidence interval: 4.68, 17.6). Higher BMI, age, and comorbid disease were consistently identified as significant risk factors for functional decline among older individuals across all approaches and analyses. © 2017 The Obesity Society.
Hitt, Nathaniel P.; Floyd, Michael; Compton, Michael; McDonald, Kenneth
2016-01-01
Chrosomus cumberlandensis (Blackside Dace [BSD]) and Etheostoma spilotum (Kentucky Arrow Darter [KAD]) are fish species of conservation concern due to their fragmented distributions, their low population sizes, and threats from anthropogenic stressors in the southeastern United States. We evaluated the relationship between fish abundance and stream conductivity, an index of environmental quality and potential physiological stressor. We modeled occurrence and abundance of KAD in the upper Kentucky River basin (208 samples) and BSD in the upper Cumberland River basin (294 samples) for sites sampled between 2003 and 2013. Segmented regression indicated a conductivity change-point for BSD abundance at 343 μS/cm (95% CI: 123–563 μS/cm) and for KAD abundance at 261 μS/cm (95% CI: 151–370 μS/cm). In both cases, abundances were negligible above estimated conductivity change-points. Post-hoc randomizations accounted for variance in estimated change points due to unequal sample sizes across the conductivity gradients. Boosted regression-tree analysis indicated stronger effects of conductivity than other natural and anthropogenic factors known to influence stream fishes. Boosted regression trees further indicated threshold responses of BSD and KAD occurrence to conductivity gradients in support of segmented regression results. We suggest that the observed conductivity relationship may indicate energetic limitations for insectivorous fishes due to changes in benthic macroinvertebrate community composition.
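The idea of a conductivity change-point can be illustrated with a small Python sketch of segmented regression by grid search. It assumes a hinge-shaped response (abundance declines linearly up to the change-point and is flat beyond it), which is only one possible segmented form, and it uses synthetic, noise-free data; neither the model form nor the numbers are taken from the study.

```python
# Minimal sketch: estimate a change-point by trying each candidate value c,
# regressing abundance on the hinge term max(0, c - conductivity), and
# keeping the candidate with the smallest residual sum of squares.
# Data are synthetic; the hinge form is an assumption for illustration.

def fit_line(h, y):
    """Least-squares fit y = b0 + b1*h; returns (b0, b1, sse) or None."""
    n = len(h)
    mean_h, mean_y = sum(h) / n, sum(y) / n
    sxx = sum((hi - mean_h) ** 2 for hi in h)
    if sxx == 0:
        return None  # hinge term is constant, slope is not identifiable
    b1 = sum((hi - mean_h) * (yi - mean_y) for hi, yi in zip(h, y)) / sxx
    b0 = mean_y - b1 * mean_h
    sse = sum((yi - (b0 + b1 * hi)) ** 2 for hi, yi in zip(h, y))
    return b0, b1, sse

def find_change_point(x, y, candidates):
    """Return (change_point, sse) minimising the hinge-model SSE."""
    best = None
    for c in candidates:
        fit = fit_line([max(0, c - xi) for xi in x], y)
        if fit is not None and (best is None or fit[2] < best[1]):
            best = (c, fit[2])
    return best

conductivity = list(range(50, 601, 50))                # hypothetical, in uS/cm
abundance = [max(0, 300 - c) / 10 for c in conductivity]  # zero above 300

change_point, sse = find_change_point(conductivity, abundance, conductivity)
print(change_point, sse)
```

With real, noisy survey data the minimum would be less sharp, which is why the study reports confidence intervals around each change-point and uses randomizations to account for unequal sampling along the gradient.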
A quantitative analysis to objectively appraise drought indicators and model drought impacts
NASA Astrophysics Data System (ADS)
Bachmair, S.; Svensson, C.; Hannaford, J.; Barker, L. J.; Stahl, K.
2016-07-01
Drought monitoring and early warning is an important measure to enhance resilience towards drought. While there are numerous operational systems using different drought indicators, there is no consensus on which indicator best represents drought impact occurrence for any given sector. Furthermore, thresholds are widely applied in these indicators but, to date, little empirical evidence exists as to which indicator thresholds trigger impacts on society, the economy, and ecosystems. The main obstacle for evaluating commonly used drought indicators is a lack of information on drought impacts. Our aim was therefore to exploit text-based data from the European Drought Impact report Inventory (EDII) to identify indicators that are meaningful for region-, sector-, and season-specific impact occurrence, and to empirically determine indicator thresholds. In addition, we tested the predictability of impact occurrence based on the best-performing indicators. To achieve these aims we applied a correlation analysis and an ensemble regression tree approach, using Germany and the UK (the most data-rich countries in the EDII) as test beds. As candidate indicators we chose two meteorological indicators (Standardized Precipitation Index, SPI, and Standardized Precipitation Evaporation Index, SPEI) and two hydrological indicators (streamflow and groundwater level percentiles). The analysis revealed that accumulation periods of SPI and SPEI best linked to impact occurrence are longer for the UK compared with Germany, but there is variability within each country, among impact categories and, to some degree, seasons. The median of regression tree splitting values, which we regard as estimates of thresholds of impact occurrence, was around -1 for SPI and SPEI in the UK; distinct differences between northern/northeastern vs. southern/central regions were found for Germany. 
Predictions with the ensemble regression tree approach yielded reasonable results for regions with good impact data coverage. The predictions also provided insights into the EDII, in particular highlighting drought events where missing impact reports may reflect a lack of recording rather than true absence of impacts. Overall, the presented quantitative framework proved to be a useful tool for evaluating drought indicators, and to model impact occurrence. In summary, this study demonstrates the information gain for drought monitoring and early warning through impact data collection and analysis. It highlights the important role that quantitative analysis with impact data can have in providing "ground truth" for drought indicators, alongside more traditional stakeholder-led approaches.
Facchinello, Yann; Beauséjour, Marie; Richard-Denis, Andreane; Thompson, Cynthia; Mac-Thiong, Jean-Marc
2017-10-25
Predicting the long-term functional outcome following traumatic spinal cord injury is needed to adapt medical strategies and to plan an optimized rehabilitation. This study investigates the use of regression trees for the development of predictive models based on acute clinical and demographic predictors. This prospective study was performed on 172 patients hospitalized following traumatic spinal cord injury. Functional outcome was quantified using the Spinal Cord Independence Measure collected within the first year post injury. Age, delay prior to surgery and Injury Severity Score were considered as continuous predictors while energy of injury, trauma mechanisms, neurological level of injury, injury severity, occurrence of early spasticity, urinary tract infection, pressure ulcer and pneumonia were coded as categorical inputs. A simplified model was built using only injury severity, neurological level, energy and age as predictors and was compared to a more complex model considering all 11 predictors mentioned above. The models built using 4 and 11 predictors were found to explain 51.4% and 62.3% of the variance of the Spinal Cord Independence Measure total score after validation, respectively. The severity of the neurological deficit at admission was found to be the most important predictor. Other important predictors were the Injury Severity Score, age, neurological level and delay prior to surgery. Regression trees offer promising performance for predicting the functional outcome after a traumatic spinal cord injury. They could help to determine the number and type of predictors leading to a prediction model of the functional outcome that can be used clinically in the future.
Decision trees in epidemiological research.
Venkatasubramaniam, Ashwini; Wolfson, Julian; Mitchell, Nathan; Barnes, Timothy; JaKa, Meghan; French, Simone
2017-01-01
In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.
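The distinction drawn above, that CTree decides whether to split through a formal hypothesis test, can be illustrated with a simplified Python sketch. It is a stand-in rather than the CTree algorithm itself: a plain permutation test of the difference in mean outcome between two levels of a binary covariate, with a split allowed only when the p-value falls below a chosen level. The function name, the data and the 0.05 threshold are all assumptions made for the example.

```python
import random

def permutation_p_value(outcome, covariate, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in mean outcome
    between covariate == 1 and covariate == 0 (simplified illustration)."""
    def mean_diff(y, x):
        g1 = [yi for yi, xi in zip(y, x) if xi == 1]
        g0 = [yi for yi, xi in zip(y, x) if xi == 0]
        return abs(sum(g1) / len(g1) - sum(g0) / len(g0))

    rng = random.Random(seed)
    observed = mean_diff(outcome, covariate)
    shuffled = list(covariate)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between covariate and outcome
        if mean_diff(outcome, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical outcomes: strongly related to covariate_a, unrelated to covariate_b.
outcome = [1, 2, 1, 2, 1, 2, 8, 9, 8, 9, 8, 9]
covariate_a = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
covariate_b = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]

alpha = 0.05
p_a = permutation_p_value(outcome, covariate_a)
p_b = permutation_p_value(outcome, covariate_b)
print("split on a:", p_a < alpha, "split on b:", p_b < alpha)
```

Because the stopping decision is tied to a significance level rather than to post hoc pruning, the final tree needs no separate pruning step, which is part of the simplicity the authors attribute to CTree.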
Indicators of Terrorism Vulnerability in Africa
2015-03-26
the terror threat and vulnerabilities across Africa. Key words: Terrorism, Africa, Negative Binomial Regression, Classification Tree
NASA Astrophysics Data System (ADS)
Di, Nur Faraidah Muhammad; Satari, Siti Zanariah
2017-05-01
Outlier detection in linear data sets has been studied extensively, but only a small amount of work has been done on outlier detection in circular data. In this study, we propose a method for detecting multiple outliers in circular regression models based on a clustering algorithm. Clustering techniques use a distance measure to define the distance between data points. Here, we introduce a similarity distance based on Euclidean distance for the circular model and obtain a cluster tree using the single linkage clustering algorithm. Then, a stopping rule for the cluster tree based on the mean direction and circular standard deviation of the tree height is proposed. We classify the cluster group that exceeds the stopping rule as a potential outlier. Our aim is to demonstrate the effectiveness of the proposed algorithms with the similarity distances in detecting the outliers. It is found that the proposed methods perform well and are applicable to the circular regression model.
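The two circular summary statistics named in the stopping rule, the mean direction and the circular standard deviation, can be computed as in the Python sketch below. The abstract does not give the exact form of the stopping rule, so the sketch shows only the standard definitions of the two statistics (mean resultant vector, and the square root of -2 ln R); the sample angles are invented.

```python
import math

def circular_mean_and_sd(angles_rad):
    """Return (mean direction in radians, circular standard deviation).

    Uses the standard definitions: the mean direction is the angle of the
    mean resultant vector, and the circular SD is sqrt(-2 * ln(R)), where
    R is the mean resultant length.
    """
    n = len(angles_rad)
    c = sum(math.cos(a) for a in angles_rad) / n
    s = sum(math.sin(a) for a in angles_rad) / n
    r = min(math.hypot(c, s), 1.0)  # guard against rounding above 1
    if r == 0.0:
        return float("nan"), float("inf")  # direction undefined
    return math.atan2(s, c), math.sqrt(-2.0 * math.log(r))

# Two directions either side of north: 350 degrees and 10 degrees.
angles = [math.radians(350), math.radians(10)]
mean_dir, circ_sd = circular_mean_and_sd(angles)
print(math.degrees(mean_dir), math.degrees(circ_sd))
```

The arithmetic mean of 350 and 10 degrees is 180 degrees, which points in the opposite direction; the circular mean is approximately 0 degrees, which is why distance measures and stopping rules for circular data have to be built on statistics of this kind.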
Yoshioka, Fumi; Azuma, Emiko; Nakajima, Takae; Hashimoto, Masafumi; Toyoshima, Kyoichiro; Komachi, Yoshio
2004-08-01
To clarify the living environment factors that increase the risk of allergic sensitization to house dust mites, we applied a regression binary tree-based method (CART, Classification & Regression Trees) to an epidemiological study on airway allergy. The utility of the tree map in personal sanitary guidance for preventing allergic sensitization was examined with respect to feasibility and validity. A questionnaire was given to 386 healthy adult women, asking them about their individual living environments. Also, blood samples were collected to measure Dermatophagoides pteronyssinus (Dp)-specific IgE, the presence/absence of Dp-sensitization being expressed as positive/negative. The questionnaire consisted of nine items on (1) home ventilation by keeping windows open, (2) personal or family smoking habits, (3) use of air conditioners in hot weather, (4) type of flooring (tatami/wooden/carpet) in the living room, (5) visible mold proliferation in the kitchen, (6) type of housing (concrete/wooden), (7) residential area (heavy or light traffic area) (8) heating system (use of unventilated combustion appliances), and (9) frequency of cleaning (every day or less often). There also were queries on the past history of airway allergic diseases, such as bronchial asthma and allergic rhinitis. CART and a multivariate logistic regression analysis (MLRA) were performed. The subjects were first classified into two groups, with and without a history of airway allergic diseases (Groups WPH and WOPH). In each group, the involvement of living environment factors in Dp-sensitization was examined using CART and MLRA. In the MLRA study, individual living environment factors showed promotional or suppressive effects on Dp-sensitization with differences between the two groups. With respect to the CART results, the two groups were first split by the factor that had the most significant odds ratio for MLRA. 
In Group WPH, which had a Dp-sensitization risk of 19.5%, the first split was by the factor of visible mold proliferation in the kitchen into the factor-present group with a risk value of 45.5% and the factor-absent group with 13.5%. The mold proliferation group was split with reference to frequent cleaning, and the risk rose to 75% in the factor-absent group and to 100% when family smoking habits were reported. Group WOPH (the risk: 10.8%) was first split into two groups according to the use of air conditioners in hot weather for more than 6 hours a day or less, which showed risk values of 16.7% and 6.9%, respectively. The risk of the group that intensively used air conditioners fell to 8.3% with tatami as flooring in the living room, and, if others, rose to 20.8%. The risk of the factor-lacking group fell to 4.0% without wooden flooring. CART analysis enables us to express complex relationships between living environment factors and Dp-sensitization simply by a binary regression tree, pointing to preventive strategies that can be flexibly changed according to the individual living environments of the subjects.
Developing Models to Forecast Sales of Natural Christmas Trees
Lawrence D. Garrett; Thomas H. Pendleton
1977-01-01
A study of practices for marketing Christmas trees in Winston-Salem, North Carolina, and Denver, Colorado, revealed that such factors as retail lot competition, tree price, consumer traffic, and consumer income were very important in determining a particular retailer's sales. Analyses of 4 years of market data were used in developing regression models for...
Comprehensive database of diameter-based biomass regressions for North American tree species
Jennifer C. Jenkins; David C. Chojnacky; Linda S. Heath; Richard A. Birdsey
2004-01-01
A database consisting of 2,640 equations compiled from the literature for predicting the biomass of trees and tree components from diameter measurements of species found in North America. Bibliographic information, geographic locations, diameter limits, diameter and biomass units, equation forms, statistical errors, and coefficients are provided for each equation,...
Tree-ring variation in western larch (Larix occidentalis Nutt.) exposed to sulfur dioxide emissions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fox, C.A.; Kincaid, W.B.; Nash, T.H. III
1984-12-01
Tree-ring analysis of western larch (Larix occidentalis Nutt.) demonstrated both direct and indirect effects of sulfur dioxide emissions from the lead/zinc smelter at Trail, B.C. Tree cores were collected from 5 stands known to have been polluted and from 3 control stands. Age effects were removed by fitting theoretical growth curves, and macroclimate was modeled using the average of the controls and two lagged values thereof. Separate analyses were performed for years before and after installation of two tall stacks, for drought and nondrought years, and for years prior to initiation of smelting. Regression analyses revealed a negative effect on annual growth that diminished with increasing distance from the smelter and during drought years. Furthermore, chronology statistics suggested an increase in sensitivity to climate that persisted decades beyond implementation of pollution controls, which reduced emissions 10-fold. 38 references, 6 figures, 3 tables.
Identification of patients with gout: elaboration of a questionnaire for epidemiological studies.
Richette, P; Clerson, P; Bouée, S; Chalès, G; Doherty, M; Flipo, R M; Lambert, C; Lioté, F; Poiraud, T; Schaeverbeke, T; Bardin, T
2015-09-01
In France, the prevalence of gout is currently unknown. We aimed to design a questionnaire to detect gout that would be suitable for use in a telephone survey by non-physicians and assessed its performance. We designed a 62-item questionnaire covering comorbidities, clinical features and treatment of gout. In a case-control study, we enrolled patients with a history of arthritis who had undergone arthrocentesis for synovial fluid analysis and crystal detection. Cases were patients with crystal-proven gout and controls were patients who had arthritis and effusion with no monosodium urate crystals in synovial fluid. The questionnaire was administered by phone to cases and controls by non-physicians who were unaware of the patient diagnosis. Logistic regression analysis and classification and regression trees were used to select items discriminating cases and controls. We interviewed 246 patients (102 cases and 142 controls). Two logistic regression models (sensitivity 88.0% and 87.5%; specificity 93.0% and 89.8%, respectively) and one classification and regression tree model (sensitivity 81.4%, specificity 93.7%) revealed 11 informative items that allowed for classifying 90.0%, 88.8% and 88.5% of patients, respectively. We developed a questionnaire to detect gout containing 11 items that is fast and suitable for use in a telephone survey by non-physicians. The questionnaire demonstrated good properties for discriminating patients with and without gout. It will be administered in a large sample of the general population to estimate the prevalence of gout in France. Published by the BMJ Publishing Group Limited.
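The performance figures reported for the classification and regression tree model can be related to a confusion matrix with the short Python sketch below. The cell counts are not given in the abstract; the values used here (83 true positives among 102 cases, 133 true negatives among 142 controls) are back-calculated assumptions that happen to be consistent with the reported percentages and should not be read as the study's actual table.

```python
# Minimal sketch: sensitivity, specificity and overall accuracy from a
# 2 x 2 confusion matrix. The counts are assumed, chosen to be consistent
# with the percentages reported for the tree model.

def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) as fractions."""
    sensitivity = tp / (tp + fn)   # cases correctly identified
    specificity = tn / (tn + fp)   # controls correctly identified
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

tp, fn = 83, 19    # 102 cases (assumed split)
tn, fp = 133, 9    # 142 controls (assumed split)

sens, spec, acc = diagnostic_metrics(tp, fn, tn, fp)
print(round(100 * sens, 1), round(100 * spec, 1), round(100 * acc, 1))
```

Under these assumed counts the three values round to 81.4%, 93.7% and 88.5%, matching the figures quoted for the tree model when the denominator is the 244 patients in the two groups.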
Atlas of climate change effects in 150 bird species of the Eastern United States
Stephen Matthews; Raymond O'Connor; Louis R. Iverson; Anantha M. Prasad
2004-01-01
NOTE: Instructions for navigating this publication can be found on the front cover. This atlas documents the current and potential future distribution of 150 common bird species in the Eastern United States. Distribution data for individual species were derived from the Breeding Bird Survey (BBS) from 1981 to 1990. Regression tree analysis was used to model the BBS...
Rifai, Sami W; Urquiza Muñoz, José D; Negrón-Juárez, Robinson I; Ramírez Arévalo, Fredy R; Tello-Espinoza, Rodil; Vanderwel, Mark C; Lichstein, Jeremy W; Chambers, Jeffrey Q; Bohlman, Stephanie A
2016-10-01
Wind disturbance can create large forest blowdowns, which greatly reduce live biomass and add uncertainty to the strength of the Amazon carbon sink. Observational studies from within the central Amazon have quantified blowdown size and estimated total mortality but have not determined which trees are most likely to die from a catastrophic wind disturbance. Also, the impact of spatial dependence upon tree mortality from wind disturbance has seldom been quantified, which is important because wind disturbance often kills clusters of trees due to large treefalls killing surrounding neighbors. We examine (1) the causes of differential mortality between adult trees from a 300-ha blowdown event in the Peruvian region of the northwestern Amazon, (2) how accounting for spatial dependence affects mortality predictions, and (3) how incorporating both differential mortality and spatial dependence affects the landscape-level estimation of necromass produced from the blowdown. Standard regression and spatial regression models were used to estimate how stem diameter, wood density, elevation, and a satellite-derived disturbance metric influenced the probability of tree death from the blowdown event. The model parameters regarding tree characteristics, topography, and spatial autocorrelation of the field data were then used to determine the consequences of non-random mortality for landscape production of necromass through a simulation model. Tree mortality was highly non-random within the blowdown, where tree mortality rates were highest for trees that were large, had low wood density, and were located at high elevation. Of the differential mortality models, the non-spatial models overpredicted necromass, whereas the spatial model slightly underpredicted necromass.
When parameterized from the same field data, the spatial regression model with differential mortality estimated only 7.5% more dead trees across the entire blowdown than the random mortality model, yet it estimated 51% greater necromass. We suggest that predictions of forest carbon loss from wind disturbance are sensitive to not only the underlying spatial dependence of observations, but also the biological differences between individuals that promote differential levels of mortality. © 2016 by the Ecological Society of America.
Swetnam, Tyson L; O'Connor, Christopher D; Lynch, Ann M
2016-01-01
A significant concern about Metabolic Scaling Theory (MST) in real forests relates to consistent differences between the values of power law scaling exponents of tree primary size measures used to estimate mass and those predicted by MST. Here we consider why observed scaling exponents for diameter and height relationships deviate from MST predictions across three semi-arid conifer forests in relation to: (1) tree condition and physical form, (2) the level of inter-tree competition (e.g. open vs closed stand structure), (3) increasing tree age, and (4) differences in site productivity. Scaling exponent values derived from non-linear least-squares regression for trees in excellent condition (n = 381) were above the MST prediction at the 95% confidence level, while the exponent for trees in good condition was no different from the MST prediction (n = 926). Trees that were in fair or poor condition, characterized as diseased, leaning, or sparsely crowned, had exponent values below MST predictions (n = 2,058), as did recently dead standing trees (n = 375). The exponent value of the mean-tree model that disregarded tree condition (n = 3,740) was consistent with other studies that reject MST scaling. Ostensibly, as stand density and competition increased, trees exhibited greater morphological plasticity whereby the majority had characteristically fair or poor growth forms. Fitting by least-squares regression biases the mean-tree model scaling exponent toward values that are below MST idealized predictions. For 368 trees from Arizona with known establishment dates, increasing age had no significant impact on expected scaling. We further suggest height to diameter ratios below MST relate to vertical truncation caused by limitation in plant water availability. Even with environmentally imposed height limitation, proportionality between height and diameter scaling exponents was consistent with the predictions of MST. PMID:27391084
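A power-law scaling exponent of the kind discussed above can be illustrated with a small fit of height = a * diameter**b. The sketch below uses ordinary least squares on log-transformed values, which is a simpler stand-in and not the non-linear least-squares procedure named in the abstract. The diameters, the coefficient 2.0 and the exponent 2/3 used to build the synthetic heights are assumptions for illustration only.

```python
import math


def fit_power_law(x, y):
    """Estimate a and b in y = a * x**b by least squares on log-log axes."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (w - my) for u, w in zip(lx, ly))
    b = sxy / sxx                 # slope on log-log axes = scaling exponent
    a = math.exp(my - b * mx)     # intercept back-transformed to the coefficient
    return a, b


# Synthetic, noise-free data (assumption): heights follow 2.0 * D**(2/3).
diameters = [5.0, 10.0, 20.0, 40.0, 80.0]
heights = [2.0 * d ** (2.0 / 3.0) for d in diameters]
a, b = fit_power_law(diameters, heights)
print(f"coefficient a={a:.3f}, exponent b={b:.3f}")
```

With real, noisy field data the log-log fit and a non-linear fit on the original scale can give different exponents, which is one reason the choice of fitting method matters for comparisons against a theoretical value.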
Association between split selection instability and predictive error in survival trees.
Radespiel-Tröger, M; Gefeller, O; Rabenstein, T; Hothorn, T
2006-01-01
To evaluate split selection instability in six survival tree algorithms and its relationship with predictive error by means of a bootstrap study. We study the following algorithms: logrank statistic with multivariate p-value adjustment without pruning (LR), Kaplan-Meier distance of survival curves (KM), martingale residuals (MR), Poisson regression for censored data (PR), within-node impurity (WI), and exponential log-likelihood loss (XL). With the exception of LR, initial trees are pruned by using split-complexity, and final trees are selected by means of cross-validation. We employ a real dataset from a clinical study of patients with gallbladder stones. The predictive error is evaluated using the integrated Brier score for censored data. The relationship between split selection instability and predictive error is evaluated by means of box-percentile plots, covariate and cutpoint selection entropy, and cutpoint selection coefficients of variation, respectively, in the root node. We found a positive association between covariate selection instability and predictive error in the root node. LR yields the lowest predictive error, while KM and MR yield the highest predictive error. The predictive error of survival trees is related to split selection instability. Based on the low predictive error of LR, we recommend the use of this algorithm for the construction of survival trees. Unpruned survival trees with multivariate p-value adjustment can perform equally well compared to pruned trees. The analysis of split selection instability can be used to communicate the results of tree-based analyses to clinicians and to support the application of survival trees.
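Covariate selection entropy, one of the instability measures mentioned above, can be sketched as the Shannon entropy of the covariate chosen for the root split across bootstrap resamples. The function name and the covariate labels below are hypothetical, and the exact definition used in the paper may differ from this minimal version.

```python
import math
from collections import Counter


def selection_entropy(selected):
    """Shannon entropy (bits) of the covariates chosen across bootstrap trees."""
    counts = Counter(selected)
    n = len(selected)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical root-node choices from 10 bootstrap trees (assumption).
stable = ["age"] * 10                      # same covariate every time
unstable = ["age", "sex"] * 5              # two covariates chosen equally often
print(selection_entropy(stable), selection_entropy(unstable))
```

An entropy of zero means every resample picked the same root covariate; larger values indicate that the root split changes from resample to resample.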
Barazani, Oz; Waitz, Yoni; Tugendhaft, Yizhar; Dorman, Michael; Dag, Arnon; Hamidat, Mohammed; Hijawi, Thameen; Kerem, Zohar; Westberg, Erik; Kadereit, Joachim W
2017-02-06
A previous multi-locus lineage (MLL) analysis of SSR-microsatellite data of old olive trees in the southeast Mediterranean area had shown the predominance of the Souri cultivar (MLL1) among grafted trees. The MLL analysis had also identified an MLL (MLL7) that was more common among rootstocks than other MLLs. We here present a comparison of the MLL combinations MLL1 (scion)/MLL7 (rootstock) and MLL1/MLL1 in order to investigate the possible influence of rootstock on scion phenotype. A linear regression analysis demonstrated that the abundance of MLL1/MLL7 trees decreases and of MLL1/MLL1 trees increases along a gradient of increasing aridity. Hypothesizing that grafting on MLL7 provides an advantage under certain conditions, Akaike information criterion (AIC) model selection procedure was used to assess the influence of different environmental conditions on phenotypic characteristics of the fruits and oil of the two MLL combinations. The most parsimonious models indicated differential influences of environmental conditions on parameters of olive oil quality in trees belonging to the MLL1/MLL7 and MLL1/MLL1 combinations, but a similar influence on fruit characteristics and oil content. These results suggest that in certain environments grafting of the local Souri cultivar on MLL7 rootstocks and the MLL1/MLL1 combination result in improved oil quality. The decreasing number of MLL1/MLL7 trees along an aridity gradient suggests that use of this genotype combination in arid sites was not favoured because of sensitivity of MLL7 to drought. Our results thus suggest that MLL1/MLL7 and MLL1/MLL1 combinations were selected by growers in traditional rain-fed cultivation under Mediterranean climate conditions in the southeast Mediterranean area.
Bark flammability as a fire-response trait for subalpine trees
Frejaville, Thibaut; Curt, Thomas; Carcaillet, Christopher
2013-01-01
Relationships between the flammability properties of a given plant and its chances of survival after a fire remain unknown. We hypothesize that the bark flammability of a tree reduces the potential for tree survival following surface fires, and that if tree resistance to fire is provided by a thick insulating bark, the latter must have low flammability. We test, on subalpine tree species, the relationship between the flammability of bark and its insulating ability, identify the biological traits that determine bark flammability, and assess the relative susceptibility of these species to surface fires from their bark properties. The experimental set of burning properties was analyzed by Principal Component Analysis to assess the bark flammability. Bark insulating ability was expressed by the critical time to cambium kill computed from bark thickness. Log-linear regressions indicated that bark flammability varies with the bark thickness and the density of wood under bark and that the most flammable barks have poor insulating ability. Susceptibility to surface fires increases from gymnosperm to angiosperm subalpine trees. The co-dominant subalpine species Larix decidua (Mill.) and Pinus cembra (L.) exhibit large differences in both flammability and insulating ability of the bark that should partly explain their contrasted responses to fires in the past. PMID:24324473
Prognoses of diameter and height of trees of eucalyptus using artificial intelligence.
Vieira, Giovanni Correia; de Mendonça, Adriano Ribeiro; da Silva, Gilson Fernandes; Zanetti, Sidney Sára; da Silva, Mayra Marques; Dos Santos, Alexandre Rosa
2018-04-01
Models of individual trees are composed of sub-models that generally estimate competition, mortality, and growth in height and diameter of each tree. They are usually adopted when we want more detailed information to estimate forest multiproduct. In these models, estimates of growth in diameter at 1.30m above the ground (DBH) and total height (H) are obtained by regression analysis. Recently, artificial intelligence techniques (AIT) have been used with satisfactory performance in forest measurement. Therefore, the objective of this study was to evaluate the performance of two AIT, artificial neural networks and adaptive neuro-fuzzy inference system, to estimate the growth in DBH and H of eucalyptus trees. We used data of continuous forest inventories of eucalyptus, with annual measurements of DBH, H, and the dominant height of trees of 398 plots, plus two qualitative variables: genetic material and site index. It was observed that the two AIT showed accuracy in growth estimation of DBH and H. Therefore, the two techniques discussed can be used for the prognosis of DBH and H in even-aged eucalyptus stands. The techniques used could also be adapted to other areas and forest species. Copyright © 2017 Elsevier B.V. All rights reserved.
Lofaro, Danilo; Jager, Kitty J; Abu-Hanna, Ameen; Groothoff, Jaap W; Arikoski, Pekka; Hoecker, Britta; Roussey-Kesler, Gwenaelle; Spasojević, Brankica; Verrina, Enrico; Schaefer, Franz; van Stralen, Karlijn J
2016-02-01
Identification of patient groups by risk of renal graft loss might be helpful for accurate patient counselling and clinical decision-making. Survival tree models are an alternative statistical approach to identify subgroups, offering cut-off points for covariates and an easy-to-interpret representation. Within the European Society of Pediatric Nephrology/European Renal Association-European Dialysis and Transplant Association (ESPN/ERA-EDTA) Registry data we identified paediatric patient groups with specific profiles for 5-year renal graft survival. Two analyses were performed, including (i) parameters known at time of transplantation and (ii) additional clinical measurements obtained early after transplantation. The identified subgroups were added as covariates in two survival models. The prognostic performance of the models was tested and compared with conventional Cox regression analyses. The first analysis included 5275 paediatric renal transplants. The best 5-year graft survival (90.4%) was found among patients who received a renal graft as a pre-emptive transplantation or after short-term dialysis (<45 days), whereas graft survival was poorest (51.7%) in adolescents transplanted after long-term dialysis (>2.2 years). The Cox model including both pre-transplant factors and tree subgroups had a significantly better predictive performance than conventional Cox regression (P < 0.001). In the analysis including clinical factors, graft survival ranged from 97.3% [younger patients with estimated glomerular filtration rate (eGFR) >30 mL/min/1.73 m² and dialysis <20 months] to 34.7% (adolescents with eGFR <60 mL/min/1.73 m² and dialysis >20 months). Also in this case, combining tree findings and clinical factors improved the predictive performance as compared with conventional Cox models (P < 0.0001). In conclusion, we demonstrated the tree model to be an accurate and attractive tool to predict graft failure for patients with specific characteristics.
This may aid the evaluation of individual graft prognosis and thereby the design of measures to improve graft survival in the poor prognosis groups. © The Author 2015. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.
Unravelling the limits to tree height: a major role for water and nutrient trade-offs.
Cramer, Michael D
2012-05-01
Competition for light has driven forest trees to grow exceedingly tall, but the lack of a single universal limit to tree height indicates multiple interacting environmental limitations. Because soil nutrient availability is determined by both nutrient concentrations and soil water, water and nutrient availabilities may interact in determining realised nutrient availability and consequently tree height. In SW Australia, which is characterised by nutrient impoverished soils that support some of the world's tallest forests, total [P] and water availability were independently correlated with tree height (r = 0.42 and 0.39, respectively). However, interactions between water availability and each of total [P], pH and [Mg] contributed to a multiple linear regression model of tree height (r = 0.72). A boosted regression tree model showed that maximum tree height was correlated with water availability (24%), followed by soil properties including total P (11%), Mg (10%) and total N (9%), amongst others, and that there was an interaction between water availability and total [P] in determining maximum tree height. These interactions indicated a trade-off between water and P availability in determining maximum tree height in SW Australia. This is enabled by a species assemblage capable of growing tall and surviving (some) disturbances. The mechanism for this trade-off is suggested to be through water enabling mass-flow and diffusive mobility of P, particularly of relatively mobile organic P, although water interactions with microbial activity could also play a role.
Multivariate regression model for partitioning tree volume of white oak into round-product classes
Daniel A. Yaussy; David L. Sonderman
1984-01-01
Describes the development of multivariate equations that predict the expected cubic volume of four round-product classes from independent variables composed of individual tree-quality characteristics. Although the model has limited application at this time, it does demonstrate the feasibility of partitioning total tree cubic volume into round-product classes based on...
Kathleen L. Kavanaugh; Matthew B. Dickinson; Anthony S. Bova
2010-01-01
Current operational methods for predicting tree mortality from fire injury are regression-based models that only indirectly consider underlying causes and, thus, have limited generality. A better understanding of the physiological consequences of tree heating and injury are needed to develop biophysical process models that can make predictions under changing or novel...
Height-age relationships for regeneration-size trees in the northern Rocky Mountains, USA
Dennis E. Ferguson; Clinton E. Carlson
2010-01-01
Regression equations were developed to predict heights of 10 conifer species in regenerating stands in central and northern Idaho, western Montana, and eastern Washington. Most sample trees were natural regeneration that became established after conventional harvest and site preparation methods. Heights are predicted as a function of tree age, residual overstory density...
Toward a Periodic Table of Niches, or Exploring the Lizard Niche Hypervolume.
Pianka, Eric R; Vitt, Laurie J; Pelegrin, Nicolás; Fitzgerald, Daniel B; Winemiller, Kirk O
2017-11-01
Widespread niche convergence suggests that species can be organized according to functional trait combinations to create a framework analogous to a periodic table. We compiled ecological data for lizards to examine patterns of global and regional niche diversification, and we used multivariate statistical approaches to develop the beginnings for a periodic table of niches. Data (50+ variables) for five major niche dimensions (habitat, diet, life history, metabolism, defense) were compiled for 134 species of lizards representing 24 of the 38 extant families. Principal coordinates analyses were performed on niche dimensional data sets, and species scores for the first three axes were used as input for a principal components analysis to ordinate species in continuous niche space and for a regression tree analysis to separate species into discrete niche categories. Three-dimensional models facilitate exploration of species positions in relation to major gradients within the niche hypervolume. The first gradient loads on body size, foraging mode, and clutch size. The second was influenced by metabolism and terrestrial versus arboreal microhabitat. The third was influenced by activity time, life history, and diet. Natural dichotomies are activity time, foraging mode, parity mode, and habitat. Regression tree analysis identified 103 cases of extreme niche conservatism within clades and 100 convergences between clades. Extending this approach to other taxa should lead to a wider understanding of niche evolution.
NASA Astrophysics Data System (ADS)
Deng, Chengbin; Wu, Changshan
2013-12-01
Urban impervious surface information is essential for urban and environmental applications at the regional/national scales. As a popular image processing technique, spectral mixture analysis (SMA) has rarely been applied to coarse-resolution imagery due to the difficulty of deriving endmember spectra using traditional endmember selection methods, particularly within heterogeneous urban environments. To address this problem, we derived endmember signatures through a least squares solution (LSS) technique with known abundances of sample pixels, and integrated these endmember signatures into SMA for mapping large-scale impervious surface fraction. In addition, with the same sample set, we carried out objective comparative analyses among SMA (i.e. fully constrained and unconstrained SMA) and machine learning (i.e. Cubist regression tree and Random Forests) techniques. Analysis of results suggests three major conclusions. First, with the extrapolated endmember spectra from stratified random training samples, the SMA approaches performed relatively well, as indicated by small MAE values. Second, Random Forests yields more reliable results than Cubist regression tree, and its accuracy is improved with increased sample sizes. Finally, comparative analyses suggest a tentative guide for selecting an optimal approach for large-scale fractional imperviousness estimation: unconstrained SMA might be a favorable option with a small number of samples, while Random Forests might be preferred if a large number of samples are available.
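The idea of deriving endmember signatures by least squares from sample pixels with known abundances can be sketched for a two-endmember linear mixture, where each pixel spectrum is modelled as f1 * e1 + f2 * e2. This is a minimal illustration under that assumption, solved per band through the 2x2 normal equations; it is not the authors' implementation, and the endmember names, fractions and reflectance values are invented for the example.

```python
def estimate_endmembers(fractions, spectra):
    """Least-squares endmember spectra for a two-endmember linear mixture.

    fractions: list of (f1, f2) known abundances, one pair per sample pixel
    spectra:   list of per-pixel reflectance lists, one value per band
    Returns [e1, e2], each a list with one estimated reflectance per band.
    """
    # Entries of the 2x2 normal matrix F^T F
    s11 = sum(f1 * f1 for f1, _ in fractions)
    s12 = sum(f1 * f2 for f1, f2 in fractions)
    s22 = sum(f2 * f2 for _, f2 in fractions)
    det = s11 * s22 - s12 * s12
    e1, e2 = [], []
    for band in range(len(spectra[0])):
        # Right-hand side F^T r for this band
        t1 = sum(f[0] * r[band] for f, r in zip(fractions, spectra))
        t2 = sum(f[1] * r[band] for f, r in zip(fractions, spectra))
        # Apply the closed-form inverse of the 2x2 normal matrix
        e1.append((s22 * t1 - s12 * t2) / det)
        e2.append((s11 * t2 - s12 * t1) / det)
    return [e1, e2]


# Synthetic example (all values are assumptions): three bands, four sample pixels.
true_impervious = [0.20, 0.25, 0.30]
true_vegetation = [0.05, 0.45, 0.30]
fractions = [(0.1, 0.9), (0.5, 0.5), (0.8, 0.2), (0.3, 0.7)]
spectra = [
    [f1 * a + f2 * b for a, b in zip(true_impervious, true_vegetation)]
    for f1, f2 in fractions
]
est_impervious, est_vegetation = estimate_endmembers(fractions, spectra)
print(est_impervious, est_vegetation)
```

The recovered signatures could then be used as fixed endmembers in a spectral mixture analysis; with more than two endmembers the same normal-equation approach applies but needs a general linear solver.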
Effect of Landscape Pattern on Insect Species Density within Urban Green Spaces in Beijing, China
Su, Zhimin; Li, Xiaoma; Zhou, Weiqi; Ouyang, Zhiyun
2015-01-01
Urban green space is an important refuge of biodiversity in urban areas. Therefore, it is crucial to understand the relationship between the landscape pattern of green spaces and biodiversity to mitigate the negative effects of urbanization. In this study, we collected insects from 45 green patches in Beijing during July 2012 using suction sampling. The green patches were dominated by managed lawns, mixed with scattered trees and shrubs. We examined the effects of landscape pattern on insect species density using hierarchical partitioning analysis and partial least squares regression. The results of the hierarchical partitioning analysis indicated that five explanatory variables, i.e., patch area (with 19.9% independent effects), connectivity (13.9%), distance to nearest patch (13.8%), diversity for patch types (11.0%), and patch shape (8.3%), significantly contributed to insect species density. With the partial least squares regression model, we found species density was negatively related to patch area, shape, connectivity, diversity for patch types and proportion of impervious surface at the significance level of p < 0.05 and positively related to proportion of vegetated land. Regression tree analysis further showed that the highest species density was found in green patches with an area <500 m2. Our results indicated that improvement in habitat quality, such as patch area and connectivity that are typically thought to be important for conservation, did not actually increase species density. However, increasing compactness (low-edge) of patch shape and landscape composition did have the expected effect. Therefore, it is recommended that the composition of the surrounding landscape should be considered simultaneously with planned improvements in local habitat quality. PMID:25793897
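A regression tree finds thresholds such as the patch-area cut-off reported above by searching for the split that minimises the squared error of the response within the two resulting groups. The sketch below shows a single such split. The patch areas and species densities are hypothetical values, and the function is a bare illustration of the splitting criterion rather than the software used in the study.

```python
def best_split(x, y):
    """Return (threshold, sse) for the single split on x that minimises
    the summed squared error of y within the two resulting groups."""
    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # equal x values cannot be separated by a threshold
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2  # midpoint between neighbours
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical data (assumption): patch area in m2 and insect species density.
area_m2 = [120, 300, 450, 700, 1500, 4000]
density = [9.0, 8.5, 9.2, 4.1, 3.8, 4.4]
threshold, error = best_split(area_m2, density)
print(f"split at area < {threshold} m2")
```

A full tree repeats this search recursively within each group and over every candidate predictor, then prunes the result.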
[Soil and forest structure in the Colombian Amazon].
Calle-Rendón, Bayron R; Moreno, Flavio; Cárdenas López, Dairon
2011-09-01
Forest structural differences could result from environmental variations at different scales. Because soils are an important component of a plant's environment, it is possible that edaphic and structural variables are associated and that, in consequence, spatial autocorrelation occurs. This paper aims to answer two questions: (1) are structural and edaphic variables associated at local scale in a terra firme forest of Colombian Amazonia? and (2) are these variables regionalized at the scale of work? To answer these questions we analyzed the data of a 6 ha plot established in a terra firme forest of the Amacayacu National Park. Structural variables included basal area and density of large trees (diameter ≥ 10 cm) (Gdos and Ndos), basal area and density of understory individuals (diameter < 10 cm) (Gsot and Nsot) and number of species of large trees (sp). Edaphic variables included were pH, organic matter, P, Mg, Ca, K, Al, sand, silt and clay. Structural and edaphic variables were reduced through a principal component analysis (PCA); then, the association between edaphic and structural components from PCA was evaluated by multiple regressions. The existence of regionalization of these variables was studied through isotropic variograms, and autocorrelated variables were spatially mapped. PCA found two significant components for structure, corresponding to the structure of large trees (G, Gdos, Ndos and sp) and of small trees (N, Nsot and Gsot), which explained 43.9% and 36.2% of total variance, respectively. Four components were identified for edaphic variables, which globally explained 81.9% of total variance and basically represent drainage and soil fertility.
Regression analyses were significant (p < 0.05) and showed that the structure of both large and small trees is associated with greater sand contents and low soil fertility, though they explained a low proportion of total variability (R² was 4.9% and 16.5% for the structure of large trees and small trees, respectively). Variables with spatial autocorrelation were the structure of small trees, Al, silt, and sand. Among them, Nsot and sand content showed similar patterns of spatial distribution inside the plot.
NASA Astrophysics Data System (ADS)
Rao, M.; George, L. A.
2012-12-01
Nitrogen dioxide (NO2), an atmospheric pollutant generated primarily by anthropogenic combustion processes, is typically found at higher concentrations in urban areas compared to non-urbanized environments. Elevated NO2 levels have multiple ecosystem effects at different spatial scales. At the local scale, elevated levels affect human health directly and through the formation of secondary pollutants such as ozone and aerosols; at the regional scale secondary pollutants such as nitric acid and organic nitrates have deleterious effects on non-urbanized areas; and, at the global scale, nitrogen oxide emissions significantly alter the natural biogeochemical nitrogen cycle. As cities globally become larger and larger sources of nitrogen oxide emissions, it is important to assess possible mitigation strategies to reduce the impact of emissions locally, regionally and globally. In this study, we build a national land-use regression (LUR) model to compare the impacts of deciduous and evergreen trees on urban NO2 levels in the United States. We use the EPA monitoring network values of NO2 levels for 2006, the 2006 NLCD tree canopy data for deciduous and evergreen canopies, and the US Census Bureau's TIGER shapefiles for roads, railroads, impervious area and population density as proxies for NO2 sources on-road traffic, railroad traffic, off-road and area sources, respectively. Our preliminary LUR model corroborates previous LUR studies showing that the presence of trees is associated with reduced urban NO2 levels. Additionally, our model indicates that deciduous and evergreen trees reduce NO2 to different extents, and that the amount of NO2 reduced varies seasonally. The model indicates that every square kilometer of deciduous canopy within a 2 km buffer is associated with a reduction in ambient NO2 levels of 0.64 ppb in summer and 0.46 ppb in winter.
Similarly, every square kilometer of evergreen tree canopy within a 2 km buffer is associated with a reduction in ambient NO2 of 0.53 ppb in summer and 0.84 ppb in winter. Thus, the model indicates that deciduous trees are associated with a 30% smaller reduction in NO2 in winter as compared to summer, while evergreens are associated with a 60% increase in the reduction of NO2 in winter, for every square kilometer of deciduous or evergreen canopy within a 2 km buffer. Leaf- and local canopy-level studies have shown that trees are a sink for urban NO2 through deposition as well as stomatal and cuticular uptake. The wintertime versus summertime effects suggest that leaf-level deposition may not be the only uptake mechanism and point to the need for a more holistic analysis of tree and canopy-level deposition for urban air pollution models. Since deposition velocities for NO2 vary by tree species, the reduction may also vary by species. These findings have implications for designing cities to reduce the impact of air pollution.
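The seasonal percentages quoted above follow from the per-square-kilometer coefficients as simple relative changes. The short sketch below reproduces that arithmetic from the four values in the abstract; the function name is illustrative. The results come out at roughly 28% and 58%, so the 30% and 60% in the text appear to be rounded figures.

```python
def relative_change(summer, winter):
    """Winter reduction relative to the summer reduction, as a fraction."""
    return (winter - summer) / summer


# NO2 reduction per km2 of canopy within a 2 km buffer (ppb), from the abstract.
deciduous = relative_change(summer=0.64, winter=0.46)
evergreen = relative_change(summer=0.53, winter=0.84)
print(f"deciduous: {deciduous:+.1%}, evergreen: {evergreen:+.1%}")
```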
NASA Astrophysics Data System (ADS)
Künne, A.; Fink, M.; Kipka, H.; Krause, P.; Flügel, W.-A.
2012-06-01
In this paper, a method is presented to estimate excess nitrogen on large scales considering single field processes. The approach was implemented by using the physically based model J2000-S to simulate the nitrogen balance as well as the hydrological dynamics within meso-scale test catchments. The model input data, the parameterization, the results and a detailed system understanding were used to generate the regression tree models with GUIDE (Loh, 2002). For each landscape type in the federal state of Thuringia a regression tree was calibrated and validated using the model data and results of excess nitrogen from the test catchments. Hydrological parameters such as precipitation and evapotranspiration were also used to predict excess nitrogen by the regression tree model. Hence they also had to be calculated and regionalized for the state of Thuringia. Here the model J2000g was used to simulate the water balance on the macro scale. With the regression trees the excess nitrogen was regionalized for each landscape type of Thuringia. The approach allows calculating the potential nitrogen input into the streams of the drainage area. The results show that the applied methodology was able to transfer the detailed model results of the meso-scale catchments to the entire state of Thuringia with low computing time and without losing the detailed knowledge from the nitrogen transport modeling. This was validated with modeling results from Fink (2004) in a catchment lying in the regionalization area. The regionalized and modeled excess nitrogen values show 94% agreement. The study was conducted within the framework of a project in collaboration with the Thuringian Environmental Ministry, whose overall aim was to assess the effect of agro-environmental measures regarding load reduction in the water bodies of Thuringia to fulfill the requirements of the European Water Framework Directive (Bäse et al., 2007; Fink, 2006; Fink et al., 2007).
NASA Technical Reports Server (NTRS)
Mcgwire, K.; Friedl, M.; Estes, J. E.
1993-01-01
This article describes research related to sampling techniques for establishing linear relations between land surface parameters and remotely-sensed data. Predictive relations are estimated between percentage tree cover in a savanna environment and a normalized difference vegetation index (NDVI) derived from the Thematic Mapper sensor. Spatial autocorrelation in original measurements and regression residuals is examined using semi-variogram analysis at several spatial resolutions. Sampling schemes are then tested to examine the effects of autocorrelation on predictive linear models in cases of small sample sizes. Regression models between image and ground data are affected by the spatial resolution of analysis. Reducing the influence of spatial autocorrelation by enforcing minimum distances between samples may also improve empirical models which relate ground parameters to satellite data.
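One way to picture the idea of enforcing minimum distances between samples is a greedy spatial sampler. The sketch below is an assumption-laden illustration, not the sampling scheme used in the article: the grid of candidate pixels, the 3-unit spacing and the function name `sample_with_min_distance` are all hypothetical.

```python
import math
import random


def sample_with_min_distance(points, n, min_dist, seed=0):
    """Greedily pick up to n points so that every pair is at least min_dist apart."""
    rng = random.Random(seed)
    candidates = list(points)
    rng.shuffle(candidates)  # random visiting order, reproducible via the seed
    chosen = []
    for p in candidates:
        if all(math.dist(p, q) >= min_dist for q in chosen):
            chosen.append(p)
            if len(chosen) == n:
                break
    return chosen


# Hypothetical 10 x 10 grid of candidate pixel centres (arbitrary units).
points = [(x, y) for x in range(10) for y in range(10)]
chosen = sample_with_min_distance(points, n=4, min_dist=3.0)
print(chosen)
```

Widening `min_dist` toward the range of the semi-variogram would, in principle, reduce the autocorrelation among the retained samples at the cost of a smaller achievable sample size.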
Predicting Diameter at Breast Height from Stump Diameters for Northeastern Tree Species
Eric H. Wharton
1984-01-01
Presents equations to predict diameter at breast height from stump diameter measurements for 17 northeastern tree species. Simple linear regression was used to develop the equations. Application of the equations is discussed.
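A simple linear regression of this kind can be sketched in a few lines. The stump and breast-height diameters below are invented for illustration (generated from DBH = 1.0 + 0.8 × stump diameter), so the fitted coefficients are not those of any species equation in the report.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical paired measurements (same length unit for both columns).
stump_diameter = [20.0, 30.0, 40.0, 50.0, 60.0]
dbh = [17.0, 25.0, 33.0, 41.0, 49.0]

intercept, slope = fit_line(stump_diameter, dbh)
predicted_dbh = intercept + slope * 45.0  # DBH for a 45-unit stump
print(intercept, slope, predicted_dbh)
```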
Russell, W.H.; Carnell, K.; McBride, J.R.
2001-01-01
Feeding damage to trees by black bears (Ursus americanus Pallas) was recorded in proximity to timber harvest edges in harvested and old-growth stands of coast redwood (Sequoia sempervirens [D. Don] Endl.) in northern California, USA. Bears exhibited distinct preference in their feeding patterns related to stand structure and composition and to distance from the timber-harvest edge. Most damage was recorded within regenerating stands. Regression analysis indicated that density of damaged trees was negatively correlated with distance from timber harvest edges within old-growth stands. A significant negative correlation was also found between the density of trees damaged by bears and habitat diversity (H') as measured by the Shannon diversity index. In addition, bears exhibited preference for pole-size trees (dbh = 10-50 cm) over all other size classes, and coast redwood over other species. In general, damage by bears appeared to act as a natural thinning agent in even-aged stands. No damage was recorded in old-growth stands except in close proximity to the timber-harvest edge where subcanopy recruitment was high.
Prediction of cadmium enrichment in reclaimed coastal soils by classification and regression tree
NASA Astrophysics Data System (ADS)
Ru, Feng; Yin, Aijing; Jin, Jiaxin; Zhang, Xiuying; Yang, Xiaohui; Zhang, Ming; Gao, Chao
2016-08-01
Reclamation of coastal land is one of the most common ways to obtain land resources in China. However, it has long been acknowledged that the artificial interference with coastal land has disadvantageous effects, such as heavy metal contamination. This study aimed to develop a prediction model for cadmium enrichment levels and assess the importance of affecting factors in typical reclaimed land in Eastern China (DFCL: Dafeng Coastal Land). Two hundred and twenty seven surficial soil/sediment samples were collected and analyzed to identify the enrichment levels of cadmium and the possible affecting factors in soils and sediments. The classification and regression tree (CART) model was applied in this study to predict cadmium enrichment levels. The prediction results showed that cadmium enrichment levels assessed by the CART model had an accuracy of 78.0%. The CART model could extract more information on factors affecting the environmental behavior of cadmium than correlation analysis. The integration of correlation analysis and the CART model showed that fertilizer application and organic carbon accumulation were the most important factors affecting soil/sediment cadmium enrichment levels, followed by particle size effects (Al2O3, TFe2O3 and SiO2), contents of Cl and S, surrounding construction areas and reclamation history.
Post-wildfire summer greening depends on winter snowpack
NASA Astrophysics Data System (ADS)
Wilson, A.; Nolin, A. W.
2017-12-01
Forested, mountain landscapes in the Pacific Northwest (PNW) are changing at an unprecedented rate, largely due to shifts in the regional climate regime. Documented climatic trends include increasing wildfire frequency and intensity and an increasingly ephemeral snowpack, especially at moderate elevations. One relationship that has yet to be studied thoroughly is the dependence of post-wildfire forest recovery on winter snowpack. This study will correlate winter snowpack with summer greenness in the context of 15 recent severe wildfires across the PNW. Winter snow water equivalent will be estimated using a new Snow Cover Frequency (SCF) metric derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) daily snow cover product. Summer forest greenness will be assessed using the Enhanced Vegetation Index (EVI), also derived from daily MODIS reflectance data. Regression tree analysis will be employed to characterize the relative importance of snowpack, elevation, slope, aspect, soil texture, and summer precipitation to summer greenness. Using findings from the regression tree analysis, the most critical physiographic factors will frame a multivariate time series spanning the 5 years pre-wildfire and 5 years post-wildfire in an effort to illustrate how the snowpack-revegetation relationship persists over time. As northwestern mountainous forests become more vulnerable to wildfire activity, it will be vital to continue deepening our understanding of how snowpack matters to post-wildfire forest recovery.
NASA Technical Reports Server (NTRS)
Cohen, Warren B.; Spies, Thomas A.
1992-01-01
Relationships between spectral and texture variables derived from SPOT HRV 10 m panchromatic and Landsat TM 30 m multispectral data and 16 forest stand structural attributes are evaluated to determine the utility of satellite data for analysis of hemlock forests west of the Cascade Mountains crest in Oregon and Washington, USA. Texture of the HRV data was found to be strongly related to many of the stand attributes evaluated, whereas TM texture was weakly related to all attributes. Data analysis based on regression models indicates that both TM and HRV imagery should yield equally accurate estimates of forest age class and stand structure. It is concluded that the satellite data are a valuable source for estimation of the standard deviation of tree sizes, mean size and density of trees in the upper canopy layers, a structural complexity index, and stand age.
Kim, So-Ra; Kwak, Doo-Ahn; Lee, Woo-Kyun; Son, Yowhan; Bae, Sang-Won; Kim, Choonsig; Yoo, Seongjin
2010-07-01
The objective of this study was to estimate the carbon storage capacity of Pinus densiflora stands using remotely sensed data by combining digital aerial photography with light detection and ranging (LiDAR) data. A digital canopy model (DCM), generated from the LiDAR data, was combined with aerial photography for segmenting crowns of individual trees. To eliminate errors in over and under-segmentation, the combined image was smoothed using a Gaussian filtering method. The processed image was then segmented into individual trees using a marker-controlled watershed segmentation method. After measuring the crown area from the segmented individual trees, the individual tree diameter at breast height (DBH) was estimated using a regression function developed from the relationship observed between the field-measured DBH and crown area. The above ground biomass of individual trees could be calculated by an image-derived DBH using a regression function developed by the Korea Forest Research Institute. The carbon storage, based on individual trees, was estimated by simple multiplication using the carbon conversion index (0.5), as suggested in guidelines from the Intergovernmental Panel on Climate Change. The mean carbon storage per individual tree was estimated and then compared with the field-measured value. This study suggested that the biomass and carbon storage in a large forest area can be effectively estimated using aerial photographs and LiDAR data.
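The estimation chain described above (crown area to DBH, DBH to above-ground biomass, biomass to carbon) can be sketched as three small functions. Only the carbon conversion factor of 0.5 comes from the abstract; the regression forms and every coefficient below are placeholders, not the functions fitted in the study or published by the Korea Forest Research Institute.

```python
CARBON_FRACTION = 0.5  # carbon conversion index quoted in the abstract


def dbh_from_crown_area(crown_area_m2, a=5.0, b=0.9):
    """Hypothetical linear regression: DBH (cm) from segmented crown area (m^2)."""
    return a + b * crown_area_m2


def biomass_from_dbh(dbh_cm, c=0.1, d=2.4):
    """Hypothetical allometric equation: above-ground biomass (kg) from DBH (cm)."""
    return c * dbh_cm ** d


def carbon_storage(crown_area_m2):
    """Carbon (kg) per tree: biomass multiplied by the carbon fraction."""
    return CARBON_FRACTION * biomass_from_dbh(dbh_from_crown_area(crown_area_m2))


# Example: a tree whose segmented crown covers 20 m^2 (made-up value).
print(dbh_from_crown_area(20.0), carbon_storage(20.0))
```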
McKenzie, D.; Hessl, Amy E.; Peterson, D.L.
2001-01-01
We explored spatial patterns of low-frequency variability in radial tree growth among western North American conifer species and identified predictors of the variability in these patterns. Using 185 sites from the International Tree-Ring Data Bank, each of which contained 10–60 raw ring-width series, we rebuilt two chronologies for each site, using two conservative methods designed to retain any low-frequency variability associated with recent environmental change. We used factor analysis to identify regional low-frequency patterns in site chronologies and estimated the slope of the growth trend since 1850 at each site from a combination of linear regression and time-series techniques. This slope was the response variable in a regression-tree model to predict the effects of environmental gradients and species-level differences on growth trends. Growth patterns at 27 sites from the American Southwest were consistent with quasi-periodic patterns of drought. Either 12 or 32 of the 185 sites demonstrated patterns of increasing growth between 1850 and 1980 A.D., depending on the standardization technique used. Pronounced growth increases were associated with high-elevation sites (above 3000 m) and high-latitude sites in maritime climates. Future research focused on these high-elevation and high-latitude sites should address the precise mechanisms responsible for increased 20th century growth.
Custer, Christine M.; Custer, Thomas W.; Dummer, Paul; Etterson, Matthew A.; Thogmartin, Wayne E.; Wu, Qian; Kannan, Kurunthachalam; Trowbridge, Annette; McKann, Patrick C.
2013-01-01
The exposure and effects of perfluoroalkyl substances (PFASs) were studied at eight locations in Minnesota and Wisconsin between 2007 and 2011 using tree swallows (Tachycineta bicolor). Concentrations of PFASs were quantified as were reproductive success end points. The sample egg method was used wherein an egg sample is collected, and the hatching success of the remaining eggs in the nest is assessed. The association between PFAS exposure and reproductive success was assessed by site comparisons, logistic regression analysis, and multistate modeling, a technique not previously used in this context. There was a negative association between concentrations of perfluorooctane sulfonate (PFOS) in eggs and hatching success. The concentration at which effects became evident (150–200 ng/g wet weight) was far lower than effect levels found in laboratory feeding trials or egg-injection studies of other avian species. This discrepancy likely arose because behavioral effects and other extrinsic factors are not accounted for in these laboratory studies, and possibly because tree swallows are unusually sensitive to PFASs. The results from multistate modeling and simple logistic regression analyses were nearly identical. Multistate modeling provides a better method to examine possible effects of additional covariates and assessment of models using Akaike information criteria analyses. There was a credible association between PFOS concentrations in plasma and eggs, so extrapolation between these two commonly sampled tissues can be performed.
[Aboveground biomass of three conifers in Qianyanzhou plantation].
Li, Xuanran; Liu, Qijing; Chen, Yongrui; Hu, Lile; Yang, Fengting
2006-08-01
In this paper, regression models of the aboveground biomass of Pinus elliottii, P. massoniana and Cunninghamia lanceolata in Qianyanzhou of subtropical China were established, and regression analysis of the dry weight of leaf biomass and total biomass against branch diameter (d), branch length (L), d³ and d²L was conducted with linear, power and exponential functions. A power equation with a single parameter (d) proved to be better than the rest for P. massoniana and C. lanceolata, and a linear equation with parameter d³ was better for P. elliottii. The canopy biomass was derived from the regression equations for all branches. These equations were also used to fit the relationships of total tree biomass, branch biomass and foliage biomass with tree diameter at breast height (D), tree height (H), D³ and D²H, respectively. D²H was found to be the best parameter for estimating total biomass. For foliage and branch biomass, both parameters and equation forms showed some differences among species. Correlations were highly significant (P < 0.001) for foliage, branch and total biomass, and highest for total biomass. Using these equations, the aboveground biomass and its allocation were estimated, with the aboveground biomass of P. massoniana, P. elliottii, and C. lanceolata forests being 83.6, 72.1 and 59 t·hm⁻², respectively, and with more stem biomass than foliage and branch biomass. According to previous studies, the underground biomass of these three forests was estimated to be 10.44, 9.42 and 11.48 t·hm⁻², and the amount of fixed carbon was 47.94, 45.14 and 37.52 t·hm⁻², respectively.
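A power-form allometric equation such as W = a·(D²H)^b, with D²H meaning squared diameter at breast height times tree height, is commonly fitted by linear regression on log-transformed values. The sketch below assumes that approach and uses synthetic trees generated from a = 0.05 and b = 0.9; neither the data nor the coefficients are taken from the Qianyanzhou study.

```python
import math


def fit_power_law(x, w):
    """Fit w = a * x**b by least squares on log(x) and log(w)."""
    log_x = [math.log(v) for v in x]
    log_w = [math.log(v) for v in w]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_w = sum(log_w) / n
    b = sum((p - mean_x) * (q - mean_w) for p, q in zip(log_x, log_w)) / sum(
        (p - mean_x) ** 2 for p in log_x
    )
    a = math.exp(mean_w - b * mean_x)
    return a, b


# Synthetic trees: (D in cm, H in m); biomass generated from a known power law.
trees = [(10.0, 8.0), (15.0, 11.0), (20.0, 14.0), (25.0, 16.0)]
d2h = [d * d * h for d, h in trees]
biomass = [0.05 * v ** 0.9 for v in d2h]

a, b = fit_power_law(d2h, biomass)
print(a, b)
```

Because the synthetic data follow the power law exactly, the fit recovers the generating coefficients; with field data a log-transformation bias correction is often applied as well.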
Andreas, Sylke; Harries-Hedder, Karin; Schwenk, Wolfgang; Hausberg, Maria; Koch, Uwe; Schulz, Holger
2010-07-01
The Health of the Nation Outcome Scales (HoNOS) is an internationally established clinician-rated instrument. The aim of the study was to assess its psychometric properties in inpatients with substance-related disorders. The HoNOS was applied in a multicenter, consecutive sample of 417 inpatients. Interrater reliability coefficients, confirmatory factor analysis, and regression tree analyses were calculated to assess the reliability and validity of the HoNOS. The factor validity of the HoNOS and its total score could not be confirmed. After training, all items of the HoNOS revealed sufficient values of interrater reliability. As the results of the regression tree analyses showed, the single items of the HoNOS were among the most important predictors of service utilization. The HoNOS can be recommended for obtaining detailed ratings of the problems of inpatients with substance-related disorders as a clinical application in routine mental health care at present. Further studies should include comparisons of HoNOS and Addiction Severity Index. Copyright 2010 Elsevier Inc. All rights reserved.
Using CART to Identify Thresholds and Hierarchies in the Determinants of Funding Decisions.
Schilling, Chris; Mortimer, Duncan; Dalziel, Kim
2017-02-01
There is much interest in understanding decision-making processes that determine funding outcomes for health interventions. We use classification and regression trees (CART) to identify cost-effectiveness thresholds and hierarchies in the determinants of funding decisions. The hierarchical structure of CART is suited to analyzing complex conditional and nonlinear relationships. Our analysis uncovered hierarchies where interventions were grouped according to their type and objective. Cost-effectiveness thresholds varied markedly depending on which group the intervention belonged to: lifestyle-type interventions with a prevention objective had an incremental cost-effectiveness threshold of $2356, suggesting that such interventions need to be close to cost saving or dominant to be funded. For lifestyle-type interventions with a treatment objective, the threshold was much higher at $37,024. Lower down the tree, intervention attributes such as the level of patient contribution and the eligibility for government reimbursement influenced the likelihood of funding within groups of similar interventions. Comparison between our CART models and previously published results demonstrated concurrence with standard regression techniques while providing additional insights regarding the role of the funding environment and the structure of decision-maker preferences.
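The way a classification tree arrives at a threshold can be illustrated with a single split: every midpoint between sorted values is tried, and the one with the lowest weighted Gini impurity is kept. The sketch below is a generic one-variable illustration with made-up cost-effectiveness ratios and funding outcomes; it is not the authors' model or data.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best single binary split."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no split point between identical values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2.0
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical incremental cost-effectiveness ratios ($) and funding decisions.
icer = [500, 1200, 2000, 3000, 8000, 15000, 40000]
funded = [1, 1, 1, 0, 0, 0, 0]

threshold, impurity = best_threshold(icer, funded)
print(threshold, impurity)
```

A full CART model repeats this search over all predictors and recursively within each resulting group, which is how group-specific thresholds and hierarchies of the kind reported above can emerge.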
Zhao, Cai-Yun; Xu, Jing; Liu, Xiao-Yan
2017-01-01
Globalization increases the opportunities for unintentionally introduced invasive alien species, especially for insects, and most of these species could damage ecosystems and cause economic loss in China. In this study, we analyzed drivers of the distribution of unintentionally introduced invasive alien insects. Based on the number of unintentionally introduced invasive alien insects and their presence/absence records in each province in mainland China, regression trees were built to elucidate the roles of environmental and anthropogenic factors on the number distribution and similarity of species composition of these insects. Classification and regression trees indicated that climatic suitability (the mean temperature in January) and human economic activity (sum of total freight) are primary drivers of the number distribution pattern of unintentionally introduced invasive alien insects at the provincial scale, while, based on the multivariate regression trees, only environmental factors (the mean January temperature, the annual precipitation and the areas of provinces) significantly affect the similarity of their species composition. PMID:28973576
Acid rain, air pollution, and tree growth in southeastern New York
Puckett, L.J.
1982-01-01
This study examined whether dendroecological analyses could be used to detect changes in the relationship of tree growth to climate that might have resulted from chronic exposure to components of the acid rain-air pollution complex. Tree-ring indices of white pine (Pinus strobus L.), eastern hemlock (Tsuga canadensis (L.) Carr.), pitch pine (Pinus rigida Mill.), and chestnut oak (Quercus prinus L.) were regressed against orthogonally transformed values of temperature and precipitation in order to derive a response-function relationship. Results of the regression analyses for three time periods, 1901–1920, 1926–1945, and 1954–1973, suggest that the relationship of tree growth to climate has been altered. Statistical tests of the temperature and precipitation data suggest that this change was nonclimatic. Temporally, the shift in growth response appears to correspond with the suspected increase in acid rain and air pollution in the Shawangunk Mountain area of southeastern New York in the early 1950's. This change could be the result of physiological stress induced by components of the acid rain-air pollution complex, causing climatic conditions to be more limiting to tree growth.
Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali
2016-01-01
Groundwater is considered one of the most valuable fresh water resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence locations of springs were considered in this research. These factors include slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Subsequently, groundwater spring potential was modeled and mapped using CART, RF, and BRT algorithms. The predicted results from the three models were validated using the receiver operating characteristics curve (ROC). From 864 springs identified, 605 (≈70 %) locations were used for the spring potential mapping, while the remaining 259 (≈30 %) springs were used for the model validation. The area under the curve (AUC) for the BRT model was calculated as 0.8103 and for CART and RF the AUC were 0.7870 and 0.7119, respectively. Therefore, it was concluded that the BRT model produced the best prediction results while predicting locations of springs followed by CART and RF models, respectively. Geospatially integrated BRT, CART, and RF methods proved to be useful in generating the spring potential map (SPM) with reasonable accuracy.
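The validation step described above rests on two simple calculations: a 70/30 split of the 864 springs and the area under the ROC curve. The sketch below reproduces the split counts from the abstract and computes an AUC with the rank-based (Mann-Whitney) definition; the scores and labels are invented, and the function name `auc` is illustrative rather than taken from the study's software.

```python
def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Training/validation counts implied by the abstract's 70/30 split.
total_springs = 864
n_train = round(0.70 * total_springs)
n_valid = total_springs - n_train
print(n_train, n_valid)

# Hypothetical predicted spring potential and observed presence (1) / absence (0).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
example_auc = auc(scores, labels)
print(round(example_auc, 4))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values of 0.8103, 0.7870 and 0.7119 are compared.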
Eskelson, Bianca N.I.; Hagar, Joan; Temesgen, Hailemariam
2012-01-01
Snags (standing dead trees) are an essential structural component of forests. Because wildlife use of snags depends on size and decay stage, snag density estimation without any information about snag quality attributes is of little value for wildlife management decision makers. Little work has been done to develop models that allow multivariate estimation of snag density by snag quality class. Using climate, topography, Landsat TM data, stand age and forest type collected for 2356 forested Forest Inventory and Analysis plots in western Washington and western Oregon, we evaluated two multivariate techniques for their abilities to estimate density of snags by three decay classes. The density of live trees and snags in three decay classes (D1: recently dead, little decay; D2: decay, without top, some branches and bark missing; D3: extensive decay, missing bark and most branches) with diameter at breast height (DBH) ≥ 12.7 cm was estimated using a nonparametric random forest nearest neighbor imputation technique (RF) and a parametric two-stage model (QPORD), for which the number of trees per hectare was estimated with a Quasipoisson model in the first stage and the probability of belonging to a tree status class (live, D1, D2, D3) was estimated with an ordinal regression model in the second stage. The presence of large snags with DBH ≥ 50 cm was predicted using a logistic regression and RF imputation. Because of the more homogenous conditions on private forest lands, snag density by decay class was predicted with higher accuracies on private forest lands than on public lands, while presence of large snags was more accurately predicted on public lands, owing to the higher prevalence of large snags on public lands. RF outperformed the QPORD model in terms of percent accurate predictions, while QPORD provided smaller root mean square errors in predicting snag density by decay class. 
The logistic regression model achieved more accurate presence/absence classification of large snags than the RF imputation approach. Adjusting the decision threshold to account for unequal size for presence and absence classes is more straightforward for the logistic regression than for the RF imputation approach. Overall, model accuracies were poor in this study, which can be attributed to the poor predictive quality of the explanatory variables and the large range of forest types and geographic conditions observed in the data.
Associations between regional moisture gradient, tree species dominance, and downed wood abundance
NASA Astrophysics Data System (ADS)
Johnson, A. C.; Mills, J.
2007-12-01
Downed wood functions as a source of nurse logs, physical structure in streams, food, and carbon. Because downed wood is important in upland and aquatic habitats, an understanding of wood recruitment along a continuum from wet to dry landscapes is critical for both preservation of biodiversity and restoration of natural ecosystem structure and function. We assessed downed wood in public and private forests of Washington and Oregon by using a subset of the Forest Inventory and Analysis (FIA) database including 15,842 sampled conditions. Multivariate regression trees, ANOVA, and t-tests were used to discern environmental conditions most closely associated with abundance of woody debris. Of the 16 parameters included in the analysis, rainfall, forest ownership, number of damaged standing trees, and forest elevation were most indicative of woody debris abundance. The Hemlock/spruce Group, including hemlock, spruce, cedar, and white pine, most associated with wetter soils, had significantly more downed wood than 12 other forest groups. The Ponderosa Pine Group, indicative of drier sites with higher fire frequencies, included ponderosa pine, sugar pine, and incense cedar, and had significantly less downed wood volume. Overall, the amount of woody debris in either the Hemlock/spruce Group or the Ponderosa Pine Group did not change significantly as tree age increased from 5 to 350 years. Plots within the Hemlock/spruce Group with greater standing tree volume also had significantly greater downed wood volume. In contrast, greater downed wood volume was not associated with greater standing tree volume in the Ponderosa Pine Group. Knowledge of linkages among environmental variables and stand characteristics is useful in development of regional forest models aimed at understanding the effects of climate change and disturbance on forest succession.
Araucaria growth response to solar and climate variability in South Brazil
NASA Astrophysics Data System (ADS)
Prestes, Alan; Klausner, Virginia; Rojahn da Silva, Iuri; Ojeda-González, Arian; Lorensi, Caren
2018-05-01
In this work, the Sun-Earth-climate relationship is studied using tree growth rings of Araucaria angustifolia (Bertol.) O. Kuntze collected in the city of Passo Fundo, located in the state of Rio Grande do Sul (RS), Brazil. These samples were previously studied by Rigozo et al. (2008); however, their main interest was to search for the solar periodicities in the tree-ring width mean time series without interpreting the rest of the periodicities found. The question arises as to what drivers are related to those periodicities. For this reason, the classical method of spectral analysis by iterative regression and wavelet methods are applied to find periodicities and trends present in each tree-ring growth series, in the Southern Oscillation Index (SOI), and in the annual mean temperature anomaly between 24 and 44° S. In order to address the aforementioned question, this paper discusses the correlation between the growth rate of the tree rings and temperature and SOI. In each tree-ring growth series, periods between 2 and 7 years were found, possibly related to the El Niño/La Niña phenomena, and a ~23-year period was found, which may be related to temperature variation. These novel results might represent the tree-ring growth response to local climate conditions during the tree's lifetime, and to nonlinear coupling between the Sun and the local climate variability responsible for the regional climate variations.
Integration of vessel traits, wood density, and height in angiosperm shrubs and trees.
Martínez-Cabrera, Hugo I; Schenk, H Jochen; Cevallos-Ferriz, Sergio R S; Jones, Cynthia S
2011-05-01
Trees and shrubs tend to occupy different niches within and across ecosystems; therefore, traits related to their resource use and life history are expected to differ. Here we analyzed how growth form is related to variation in integration among vessel traits, wood density, and height. We also considered the ecological and evolutionary consequences of such differences. In a sample of 200 woody plant species (65 shrubs and 135 trees) from Argentina, Mexico, and the United States, standardized major axis (SMA) regression, correlation analyses, and ANOVA were used to determine whether relationships among traits differed between growth forms. The influence of phylogenetic relationships was examined with a phylogenetic ANOVA and phylogenetically independent contrasts (PICs). A principal component analysis was conducted to determine whether trees and shrubs occupy different portions of multivariate trait space. Wood density did not differ between shrubs and trees, but there were significant differences in vessel diameter, vessel density, theoretical conductivity, and as expected, height. In addition, relationships between vessel traits and wood density differed between growth forms. Trees showed coordination among vessel traits, wood density, and height, but in shrubs, wood density and vessel traits were independent. These results hold when phylogenetic relationships were considered. In the multivariate analyses, these differences translated as significantly different positions in multivariate trait space occupied by shrubs and trees. Differences in trait integration between growth forms suggest that evolution of growth form in some lineages might be associated with the degree of trait interrelation.
Gretchen G. Moisen; Elizabeth A. Freeman; Jock A. Blackard; Tracey S. Frescino; Niklaus E. Zimmermann; Thomas C. Edwards
2006-01-01
Many efforts are underway to produce broad-scale forest attribute maps by modelling forest class and structure variables collected in forest inventories as functions of satellite-based and biophysical information. Typically, variants of classification and regression trees implemented in Rulequest's© See5 and Cubist (for binary and continuous responses,...
E. Freeman; G. Moisen; J. Coulston; B. Wilson
2014-01-01
Random forests (RF) and stochastic gradient boosting (SGB), both involving an ensemble of classification and regression trees, are compared for modeling tree canopy cover for the 2011 National Land Cover Database (NLCD). The objectives of this study were twofold. First, sensitivity of RF and SGB to choices in tuning parameters was explored. Second, performance of the...
STX--Fortran-4 program for estimates of tree populations from 3P sample-tree-measurements
L. R. Grosenbaugh
1967-01-01
Describes how to use an improved and greatly expanded version of an earlier computer program (1964) that converts dendrometer measurements of 3P-sample trees to population values in terms of whatever units user desires. Many new options are available, including that of obtaining a product-yield and appraisal report based on regression coefficients supplied by user....
Portable Language-Independent Adaptive Translation from OCR. Phase 1
2009-04-01
including brute-force k-Nearest Neighbors ( kNN ), fast approximate kNN using hashed k-d trees, classification and regression trees, and locality...achieved by refinements in ground-truthing protocols. Recent algorithmic improvements to our approximate kNN classifier using hashed k-D trees allows...recent years discriminative training has been shown to outperform phonetic HMMs estimated using ML for speech recognition. Standard ML estimation
Evaluation of Oil-Palm Fungal Disease Infestation with Canopy Hyperspectral Reflectance Data
Lelong, Camille C. D.; Roger, Jean-Michel; Brégand, Simon; Dubertret, Fabrice; Lanore, Mathieu; Sitorus, Nurul A.; Raharjo, Doni A.; Caliman, Jean-Pierre
2010-01-01
Fungal disease detection in perennial crops is a major issue in estate management and production. However, nowadays such diagnostics are long and difficult when only made from visual symptom observation, and very expensive and damaging when based on root or stem tissue chemical analysis. As an alternative, we propose in this study to evaluate the potential of hyperspectral reflectance data to help detecting the disease efficiently without destruction of tissues. This study focuses on the calibration of a statistical model of discrimination between several stages of Ganoderma attack on oil palm trees, based on field hyperspectral measurements at tree scale. Field protocol and measurements are first described. Then, combinations of pre-processing, partial least square regression and linear discriminant analysis are tested on about hundred samples to prove the efficiency of canopy reflectance in providing information about the plant sanitary status. A robust algorithm is thus derived, allowing classifying oil-palm in a 4-level typology, based on disease severity from healthy to critically sick stages, with a global performance close to 94%. Moreover, this model discriminates sick from healthy trees with a confidence level of almost 98%. Applications and further improvements of this experiment are finally discussed. PMID:22315565
Jose M. Iniguez; Joseph L. Ganey; Peter J. Daughtery; John D. Bailey
2005-01-01
The objective of this study was to develop a rule-based cover type classification system for the forest and woodland vegetation in the Sky Islands of southeastern Arizona. In order to develop such a system we qualitatively and quantitatively compared a hierarchical (Ward's) and a non-hierarchical (k-means) clustering method. Ecologically, unique groups represented by...
Jose M. Iniguez; Joseph L. Ganey; Peter J. Daugherty; John D. Bailey
2005-01-01
The objective of this study was to develop a rule-based cover type classification system for the forest and woodland vegetation in the Sky Islands of southeastern Arizona. In order to develop such a system we qualitatively and quantitatively compared a hierarchical (Ward's) and a non-hierarchical (k-means) clustering method. Ecologically, unique groups and plots...
Predicting U.S. Army Reserve Unit Manning Using Market Demographics
2015-06-01
develops linear regression, classification tree, and logistic regression models to determine the ability of the location to support manning requirements... logistic regression model delivers predictive results that allow decision-makers to identify locations with a high probability of meeting unit...manning requirements. The recommendation of this thesis is that the USAR implement the logistic regression model.
Bayesian models for comparative analysis integrating phylogenetic uncertainty.
de Villemereuil, Pierre; Wells, Jessie A; Edwards, Robert D; Blomberg, Simon P
2012-06-28
Uncertainty in comparative analyses can come from at least two sources: a) phylogenetic uncertainty in the tree topology or branch lengths, and b) uncertainty due to intraspecific variation in trait values, either due to measurement error or natural individual variation. Most phylogenetic comparative methods do not account for such uncertainties. Not accounting for these sources of uncertainty leads to false perceptions of precision (confidence intervals will be too narrow) and inflated significance in hypothesis testing (e.g. p-values will be too small). Although there is some application-specific software for fitting Bayesian models accounting for phylogenetic error, more general and flexible software is desirable. We developed models to directly incorporate phylogenetic uncertainty into a range of analyses that biologists commonly perform, using a Bayesian framework and Markov Chain Monte Carlo analyses. We demonstrate applications in linear regression, quantification of phylogenetic signal, and measurement error models. Phylogenetic uncertainty was incorporated by applying a prior distribution for the phylogeny, where this distribution consisted of the posterior tree sets from Bayesian phylogenetic tree estimation programs. The models were analysed using simulated data sets, and applied to a real data set on plant traits, from rainforest plant species in Northern Australia. Analyses were performed using the free and open source software OpenBUGS and JAGS. Incorporating phylogenetic uncertainty through an empirical prior distribution of trees leads to more precise estimation of regression model parameters than using a single consensus tree and enables a more realistic estimation of confidence intervals. In addition, models incorporating measurement errors and/or individual variation, in one or both variables, are easily formulated in the Bayesian framework. 
We show that BUGS is a useful, flexible general purpose tool for phylogenetic comparative analyses, particularly for modelling in the face of phylogenetic uncertainty and accounting for measurement error or individual variation in explanatory variables. Code for all models is provided in the BUGS model description language.
Bayesian models for comparative analysis integrating phylogenetic uncertainty
2012-01-01
Background Uncertainty in comparative analyses can come from at least two sources: a) phylogenetic uncertainty in the tree topology or branch lengths, and b) uncertainty due to intraspecific variation in trait values, either due to measurement error or natural individual variation. Most phylogenetic comparative methods do not account for such uncertainties. Not accounting for these sources of uncertainty leads to false perceptions of precision (confidence intervals will be too narrow) and inflated significance in hypothesis testing (e.g. p-values will be too small). Although there is some application-specific software for fitting Bayesian models accounting for phylogenetic error, more general and flexible software is desirable. Methods We developed models to directly incorporate phylogenetic uncertainty into a range of analyses that biologists commonly perform, using a Bayesian framework and Markov Chain Monte Carlo analyses. Results We demonstrate applications in linear regression, quantification of phylogenetic signal, and measurement error models. Phylogenetic uncertainty was incorporated by applying a prior distribution for the phylogeny, where this distribution consisted of the posterior tree sets from Bayesian phylogenetic tree estimation programs. The models were analysed using simulated data sets, and applied to a real data set on plant traits, from rainforest plant species in Northern Australia. Analyses were performed using the free and open source software OpenBUGS and JAGS. Conclusions Incorporating phylogenetic uncertainty through an empirical prior distribution of trees leads to more precise estimation of regression model parameters than using a single consensus tree and enables a more realistic estimation of confidence intervals. In addition, models incorporating measurement errors and/or individual variation, in one or both variables, are easily formulated in the Bayesian framework. 
We show that BUGS is a useful, flexible general purpose tool for phylogenetic comparative analyses, particularly for modelling in the face of phylogenetic uncertainty and accounting for measurement error or individual variation in explanatory variables. Code for all models is provided in the BUGS model description language. PMID:22741602
Simulation of land use change in the three gorges reservoir area based on CART-CA
NASA Astrophysics Data System (ADS)
Yuan, Min
2018-05-01
This study proposes a new method for simulating complex spatiotemporal changes among multiple land uses with a cellular automata (CA) model based on the classification and regression tree (CART) algorithm. In this model, the CART algorithm is used to calculate land class conversion probabilities, which are combined with a neighborhood factor and a random factor to extract cellular transformation rules. In the land dynamic simulation of the Three Gorges Reservoir area from 2000 to 2010, the overall Kappa coefficient is 0.8014 and the overall accuracy is 0.8821, and the simulation results are satisfactory.
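The two agreement measures reported in this abstract can be computed from a confusion matrix of simulated versus observed land classes. The following is a minimal sketch only; the function name and the small 2-class matrix are hypothetical and are not taken from the study.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] counts cells observed in class i and simulated as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of cells on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical confusion matrix, for illustration only.
example = [[40, 10], [5, 45]]
accuracy, kappa = overall_accuracy_and_kappa(example)
```

Kappa discounts the agreement that would arise by chance, which is why it is lower than the overall accuracy for the same map comparison.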
Tree Nuts Are Inversely Associated with Metabolic Syndrome and Obesity: The Adventist Health Study-2
Jaceldo-Siegl, Karen; Haddad, Ella; Oda, Keiji; Fraser, Gary E.; Sabaté, Joan
2014-01-01
Objective To examine the relationships of nut consumption, metabolic syndrome (MetS), and obesity in the Adventist Health Study-2, a relatively healthy population with a wide range of nut intake. Research Design and Methods Cross-sectional analysis was conducted on clinical, dietary, anthropometric, and demographic data of 803 adults. MetS was defined according to the American Heart Association and the National Heart, Lung, and Blood Institute diagnostic criteria. We assessed intake of total nuts, tree nuts and peanuts, and also classified subjects into low tree nut/low peanut (LT/LP), low tree/high peanut (LT/HP), high tree nut/high peanut (HT/HP), and high tree/low peanut (HT/LP) consumers. Odds ratios were estimated using multivariable logistic regression. Results 32% of subjects had MetS. Compared to LT/LP consumers, obesity was lower in LT/HP (OR = 0.89; 95% CI = 0.53, 1.48), HT/HP (OR = 0.63; 95% CI = 0.40, 0.99) and HT/LP (OR = 0.54; 95% CI = 0.34, 0.88) consumers, p for trend = 0.006. For MetS, odds ratios (95% CI) were 0.77 (0.47, 1.28), 0.65 (0.42, 1.00) and 0.68 (0.43, 1.07), respectively (p for trend = 0.056). Frequency of nut intake (once/week) had significant inverse associations with MetS (3% less for tree nuts and 2% less for total nuts) and obesity (7% less for tree nuts and 3% less for total nuts). Conclusions Tree nuts appear to have strong inverse association with obesity, and favorable though weaker association with MetS independent of demographic, lifestyle and dietary factors. PMID:24416351
Jaceldo-Siegl, Karen; Haddad, Ella; Oda, Keiji; Fraser, Gary E; Sabaté, Joan
2014-01-01
To examine the relationships of nut consumption, metabolic syndrome (MetS), and obesity in the Adventist Health Study-2, a relatively healthy population with a wide range of nut intake. Cross-sectional analysis was conducted on clinical, dietary, anthropometric, and demographic data of 803 adults. MetS was defined according to the American Heart Association and the National Heart, Lung, and Blood Institute diagnostic criteria. We assessed intake of total nuts, tree nuts and peanuts, and also classified subjects into low tree nut/low peanut (LT/LP), low tree/high peanut (LT/HP), high tree nut/high peanut (HT/HP), and high tree/low peanut (HT/LP) consumers. Odds ratios were estimated using multivariable logistic regression. 32% of subjects had MetS. Compared to LT/LP consumers, obesity was lower in LT/HP (OR = 0.89; 95% CI = 0.53, 1.48), HT/HP (OR = 0.63; 95% CI = 0.40, 0.99) and HT/LP (OR = 0.54; 95% CI = 0.34, 0.88) consumers, p for trend = 0.006. For MetS, odds ratios (95% CI) were 0.77 (0.47, 1.28), 0.65 (0.42, 1.00) and 0.68 (0.43, 1.07), respectively (p for trend = 0.056). Frequency of nut intake (once/week) had significant inverse associations with MetS (3% less for tree nuts and 2% less for total nuts) and obesity (7% less for tree nuts and 3% less for total nuts). Tree nuts appear to have strong inverse association with obesity, and favorable though weaker association with MetS independent of demographic, lifestyle and dietary factors.
Smitley, David; Davis, Terrance; Rebek, Eric
2008-10-01
Our objective was to characterize the rate at which ash (Fraxinus spp.) trees decline in areas adjacent to the leading edge of visible ash canopy thinning due to emerald ash borer, Agrilus planipennis Fairmaire (Coleoptera: Buprestidae). Trees in southeastern Michigan were surveyed from 2003 to 2006 for canopy thinning and dieback by comparing survey trees with a set of 11 standard photographs. Freeways stemming from Detroit in all directions were used as survey transects. Between 750 and 1,100 trees were surveyed each year. A rapid method of sampling populations of emerald ash borer was developed by counting emerald ash borer emergence holes with binoculars and then felling trees to validate binocular counts. Approximately 25% of the trees surveyed for canopy thinning in 2005 and 2006 also were sampled for emerald ash borer emergence holes using binoculars. Regression analysis indicates that 41-53% of the variation in ash canopy thinning can be explained by the number of emerald ash borer emergence holes per tree. Emerald ash borer emergence holes were found at every site where ash canopy thinning averaged > 40%. In 2003, ash canopy thinning averaged 40% at a distance of 19.3 km from the epicenter of the emerald ash borer infestation in Canton. By 2006, the point at which ash trees averaged 40% canopy thinning had increased to a distance of 51.2 km away from Canton. Therefore, the point at which ash trees averaged 40% canopy thinning, a state of decline clearly visible to the average person, moved outward at a rate of 10.6 km/yr during this period.
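The spread rate quoted at the end of this abstract follows from the two distances and survey years it reports. A small check in Python, with illustrative variable names:

```python
# Distance from Canton at which ash canopy thinning averaged 40%.
distance_2003_km = 19.3
distance_2006_km = 51.2

# Elapsed time between the two surveys.
elapsed_years = 2006 - 2003

# Average outward movement of the 40% thinning point per year.
spread_rate_km_per_yr = (distance_2006_km - distance_2003_km) / elapsed_years
print(f"{spread_rate_km_per_yr:.1f} km/yr")
```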
Association of tree nut and coconut sensitizations.
Polk, Brooke I; Dinakarpandian, Deendayal; Nanda, Maya; Barnes, Charles; Dinakar, Chitra
2016-10-01
Coconut (Cocos nucifera), despite being a drupe, was added to the US Food and Drug Administration list of tree nuts in 2006, causing potential confusion regarding the prevalence of coconut allergy among tree nut allergic patients. To determine whether sensitization to tree nuts is associated with increased odds of coconut sensitization. A single-center retrospective analysis of serum specific IgE levels to coconut, tree nuts (almond, Brazil nut, cashew, chestnut, hazelnut, macadamia, pecan, pistachio, and walnut), and controls (milk and peanut) was performed using deidentified data from January 2000 to August 2012. Spearman correlation (ρ) between coconut and each tree nut was determined, followed by hierarchical clustering. Sensitization was defined as a nut specific IgE level of 0.35 kU/L or higher. Unadjusted and adjusted associations between coconut and tree nut sensitization were tested by logistic regression. Of 298 coconut IgE values, 90 (30%) were considered positive results, with a mean (SD) of 1.70 (8.28) kU/L. Macadamia had the strongest correlation (ρ = 0.77), whereas most other tree nuts had significant (P < .05) but low correlation (ρ < 0.5) with coconut. The adjusted odds ratio between coconut and macadamia was 7.39 (95% confidence interval, 2.60-21.02; P < .001) and 5.32 (95% confidence interval, 2.18-12.95; P < .001) between coconut and almond, with other nuts not being statistically significant. Our findings suggest that although sensitization to most tree nuts appears to correlate with coconut, this is largely explained by sensitization to almond and macadamia. This finding has not previously been reported in the literature. Further study correlating these results with clinical symptoms is planned.
In situ detection of tree root distribution and biomass by multi-electrode resistivity imaging.
Amato, Mariana; Basso, Bruno; Celano, Giuseppe; Bitella, Giovanni; Morelli, Gianfranco; Rossi, Roberta
2008-10-01
Traditional methods for studying tree roots are destructive and labor intensive, but available nondestructive techniques are applicable only to small scale studies or are strongly limited by soil conditions and root size. Soil electrical resistivity measured by geoelectrical methods has the potential to detect belowground plant structures, but quantitative relationships of these measurements with root traits have not been assessed. We tested the ability of two-dimensional (2-D) DC resistivity tomography to detect the spatial variability of roots and to quantify their biomass in a tree stand. A high-resolution resistivity tomogram was generated along a 11.75 m transect under an Alnus glutinosa (L.) Gaertn. stand based on an alpha-Wenner configuration with 48 electrodes spaced 0.25 m apart. Data were processed by a 2-D finite-element inversion algorithm, and corrected for soil temperature. Data acquisition, inversion and imaging were completed in the field within 60 min. Root dry mass per unit soil volume (root mass density, RMD) was measured destructively on soil samples collected to a depth of 1.05 m. Soil sand, silt, clay and organic matter contents, electrical conductivity, water content and pH were measured on a subset of samples. The spatial pattern of soil resistivity closely matched the spatial distribution of RMD. Multiple linear regression showed that only RMD and soil water content were related to soil resistivity along the transect. Regression analysis of RMD against soil resistivity revealed a highly significant logistic relationship (n = 97), which was confirmed on a separate dataset (n = 67), showing that soil resistivity was quantitatively related to belowground tree root biomass. This relationship provides a basis for developing quick nondestructive methods for detecting root distribution and quantifying root biomass, as well as for optimizing sampling strategies for studying root-driven phenomena.
Dendroclimatic estimates of a drought index for northern Virginia
Puckett, Larry J.
1981-01-01
A 230-year record of the Palmer drought-severity index (PDSI) was estimated for northern Virginia from variations in widths of tree rings. Increment cores were extracted from eastern hemlock, Tsuga canadensis (L.) Carr., at three locations in northern Virginia. Measurements of annual growth increments were made and converted to standardized indices of growth. A response function was derived for hemlock to determine the growth-climate relationship. Growth was positively correlated with precipitation and negatively correlated with temperature during the May-July growing season. Combined standardized indices of growth were calibrated with the July PDSI. Growth accounted for 20-30 percent of the PDSI variance. Further regressions using factor scores of combined tree growth indices resulted in a small but significant improvement. Greatest improvement was made by using factor scores of growth indices of individual trees, thereby accounting for 64 percent of the July PDSI variance in the regression. Comparison of the results with a 241-year reconstruction from New York showed good agreement between low-frequency climatic trends. Analysis of the estimated Central Mountain climatic division of Virginia PDSI record indicated that, relative to the long-term record (1746-1975), dry years have occurred in disproportionally larger numbers during the last half of the 19th century and the mid-20th century. This trend appears reversed for the last half of the 18th century and the first half of the 19th century. Although these results are considered first-generation products, they are encouraging, suggesting that once additional tree-ring chronologies are constructed and techniques are refined, it will be possible to obtain more accurate estimates of prior climatic conditions in the mid-Atlantic region.
Increased spruce tree growth in Central Europe since 1960s.
Cienciala, Emil; Altman, Jan; Doležal, Jiří; Kopáček, Jiří; Štěpánek, Petr; Ståhl, Göran; Tumajer, Jan
2018-04-01
Tree growth response to recent environmental changes is of key interest for forest ecology. This study addressed the following questions with respect to Norway spruce (Picea abies, L. Karst.) in Central Europe: Has tree growth accelerated during the last five decades? What are the main environmental drivers of the observed tree radial stem growth and how much variability can be explained by them? Using a nationwide dendrochronological sampling of Norway spruce in the Czech Republic (1246 trees, 266 plots), novel regional tree-ring width chronologies for 40(±10)- and 60(±10)-year old trees were assembled, averaged across three elevation zones (break points at 500 and 700 m). Correspondingly averaged drivers, including temperature, precipitation, nitrogen (N) deposition and ambient CO2 concentration, were used in a general linear model (GLM) to analyze the contribution of these in explaining tree ring width variability for the period from 1961 to 2013. Spruce tree radial stem growth responded strongly to the changing environment in Central Europe during the period, with a mean tree ring width increase of 24 and 32% for the 40- and 60-year old trees, respectively. The indicative General Linear Model analysis identified CO2, precipitation during the vegetation season, spring air temperature (March-May) and N-deposition as the significant covariates of growth, with the latter including interactions with elevation zones. The regression models explained 57% and 55% of the variability in the two tree ring width chronologies, respectively. Growth response to N-deposition showed the highest variability along the elevation gradient with growth stimulation/limitation at sites below/above 700 m. A strong sensitivity of stem growth to CO2 was also indicated, suggesting that the effect of rising ambient CO2 concentration (direct or indirect by increased water use efficiency) should be considered in analyses of long-term growth together with climatic factors and N-deposition.
Imai, Kenji; Takai, Koji; Watanabe, Satoshi; Hanai, Tatsunori; Suetsugu, Atsushi; Shiraki, Makoto; Shimizu, Masahito
2017-09-22
Sarcopenia impairs survival in patients with hepatocellular carcinoma (HCC). This study aimed to clarify the factors that contribute to decreased skeletal muscle volume in patients with HCC. The third lumbar vertebra skeletal muscle index (L3 SMI) in 351 consecutive patients with HCC was calculated to identify sarcopenia. Sarcopenia was defined as an L3 SMI value ≤ 29.0 cm²/m² for women and ≤ 36.0 cm²/m² for men. The factors affecting L3 SMI were analyzed by multiple linear regression analysis and tree-based models. Of the 351 HCC patients, 33 were diagnosed as having sarcopenia and showed poor prognosis compared with non-sarcopenia patients (p = 0.007). However, this significant difference disappeared after the adjustments for age, sex, Child-Pugh score, maximum tumor size, tumor number, and the degree of portal vein invasion by propensity score matching analysis. Multiple linear regression analysis showed that age (p = 0.015) and sex (p < 0.0001) were significantly correlated with a decrease in L3 SMI. Tree-based models revealed that sex (female) is the most significant factor that affects L3 SMI. In male patients, L3 SMI was decreased by aging, increased Child-Pugh score (≥56 years), and enlarged tumor size (<56 years). Maintaining liver functional reserve and early diagnosis and therapy for HCC are vital to prevent skeletal muscle depletion and improve the prognosis of patients with HCC.
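The sex-specific cut-offs stated in this abstract can be written as a simple rule. The sketch below is illustrative only: the function names are hypothetical, and computing L3 SMI as muscle area divided by height squared is an assumption inferred from the cm²/m² units, not a procedure described in the abstract.

```python
# Cut-offs quoted in the abstract (cm^2/m^2); at or below means sarcopenia.
SARCOPENIA_CUTOFFS = {"female": 29.0, "male": 36.0}


def l3_smi(muscle_area_cm2, height_m):
    """Assumed definition: L3 skeletal muscle area normalised by height squared."""
    return muscle_area_cm2 / height_m**2


def is_sarcopenic(smi, sex):
    """Apply the sex-specific L3 SMI threshold from the abstract."""
    return smi <= SARCOPENIA_CUTOFFS[sex]


# Hypothetical patient, for illustration only.
example_smi = l3_smi(120.0, 2.0)
example_flag = is_sarcopenic(example_smi, "male")
```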
Leaf Phenological Characters of Main Tree Species in Urban Forest of Shenyang
Xu, Sheng; Xu, Wenduo; Chen, Wei; He, Xingyuan; Huang, Yanqing; Wen, Hua
2014-01-01
Background Plant leaves, as the main photosynthetic organs and the high energy converters among primary producers in terrestrial ecosystems, have attracted significant research attention. Leaf lifespan is an adaptive characteristic formed by plants to obtain the maximum carbon in the long-term adaptation process. It determines important functional and structural characteristics exhibited in the environmental adaptation of plants. However, the leaf lifespan and leaf characteristics of urban forests have not been studied until now. Methods Using statistical, linear regression, and correlation analyses, leaf phenological characters of main tree species in the urban forest of Shenyang were observed for five years to obtain the leafing phenology (including leafing start time, end time, and duration), defoliating phenology (including defoliation start time, end time, and duration), and the leaf lifespan of the main tree species. Moreover, the relationships between temperature and leafing phenology, defoliating phenology, and leaf lifespan were analyzed. Findings The timing of leafing differed greatly among species. Early-leafing species tended to finish leafing relatively early; the longer leafing lasted, the later it was completed. The timing of defoliation varied significantly among species; species that began defoliating early tended to have a relatively longer duration of defoliation. A 1°C rise in mean spring temperature would advance leafing by 5 days. A 1°C decline in mean temperature would delay defoliation by 3 days in autumn. Interpretation There is a significant correlation between leaf longevity and the time of leafing and defoliation. According to correlation analysis and regression analysis, there is a significant correlation between temperature and leafing and defoliation phenology.
Early-leafing species would have a longer lifespan and consequently an advantage in carbon accumulation compared with later-defoliating species. PMID:24963625
James W. Flewelling
2009-01-01
Remotely sensed data can be used to make digital maps showing individual tree crowns (ITC) for entire forests. Attributes of the ITCs may include area, shape, height, and color. The crown map is sampled in a way that provides an unbiased linkage between ITCs and identifiable trees measured on the ground. Methods of avoiding edge bias are given. In an example from a...
Austin Troy; J. Morgan Grove; Jarlath O' Neill-Dunne
2012-01-01
The extent to which urban tree cover influences crime is in debate in the literature. This research took advantage of geocoded crime point data and high resolution tree canopy data to address this question in Baltimore City and County, MD, an area that includes a significant urban-rural gradient. Using ordinary least squares and spatially adjusted regression and...
Extensions and applications of ensemble-of-trees methods in machine learning
NASA Astrophysics Data System (ADS)
Bleich, Justin
Ensemble-of-trees algorithms have emerged to the forefront of machine learning due to their ability to generate high forecasting accuracy for a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) focus on an underlying Bayesian probability model to generate the fits. These new probability model-based approaches show much promise versus their algorithmic counterparts, but also offer substantial room for improvement. The first part of this thesis focuses on methodological advances for ensemble-of-trees techniques with an emphasis on the more recent Bayesian approaches. In particular, we focus on extensions of BART in four distinct ways. First, we develop a more robust implementation of BART for both research and application. We then develop a principled approach to variable selection for BART as well as the ability to naturally incorporate prior information on important covariates into the algorithm. Next, we propose a method for handling missing data that relies on the recursive structure of decision trees and does not require imputation. Last, we relax the assumption of homoskedasticity in the BART model to allow for parametric modeling of heteroskedasticity. The second part of this thesis returns to the classic algorithmic approaches in the context of classification problems with asymmetric costs of forecasting errors. First we consider the performance of RF and SGB more broadly and demonstrate its superiority to logistic regression for applications in criminology with asymmetric costs. Next, we use RF to forecast unplanned hospital readmissions upon patient discharge with asymmetric costs taken into account. 
Finally, we explore the construction of stable decision trees for forecasts of violence during probation hearings in court systems.
Spatial Assessment of Model Errors from Four Regression Techniques
Lianjun Zhang; Jeffrey H. Gove
2005-01-01
Forest modelers have attempted to account for the spatial autocorrelations among trees in growth and yield models by applying alternative regression techniques such as linear mixed models (LMM), generalized additive models (GAM), and geographically weighted regression (GWR). However, the model errors are commonly assessed using average errors across the entire study...
NASA Astrophysics Data System (ADS)
Jakubowski, J.; Stypulkowski, J. B.; Bernardeau, F. G.
2017-12-01
The first phase of the Abu Hamour drainage and storm tunnel was completed in early 2017. The 9.5 km long, 3.7 m diameter tunnel was excavated with two Earth Pressure Balance (EPB) Tunnel Boring Machines from Herrenknecht. TBM operation processes were monitored and recorded by a Data Acquisition and Evaluation System. The authors coupled the collected TBM drive data with available information on rock mass properties; the data were then cleansed, supplemented with secondary variables, and aggregated by weeks and shifts. Correlations and descriptive statistics charts were examined. Multivariate Linear Regression and CART regression tree models linking TBM penetration rate (PR), penetration per revolution (PPR) and field penetration index (FPI) with TBM operational and geotechnical characteristics were built for the conditions of the weak/soft rock of Doha. Both regression methods are interpretable, and the data were screened with different computational approaches, allowing enriched insight. The primary goal of the analysis was to investigate empirical relations between multiple explanatory and responding variables, to search for best subsets of explanatory variables and to evaluate the strength of linear and non-linear relations. For each of the penetration indices, a predictive model coupling both regression methods was built and validated. The resultant models appeared to be stronger than the constituent ones and indicated an opportunity for more accurate and robust TBM performance predictions.
Zhang, Ling Yu; Liu, Zhao Gang
2017-12-01
Based on data collected from 108 permanent plots of the forest resources survey in Maoershan Experimental Forest Farm during 2004-2016, this study investigated the spatial distribution of recruitment trees in natural secondary forest using global Poisson regression and geographically weighted Poisson regression (GWPR) with four bandwidths of 2.5, 5, 10 and 15 km. The simulation performance of the five regressions and the factors influencing recruitment trees in stands were analyzed, and the spatial autocorrelation of the regression residuals was described at global and local levels using Moran's I. The results showed that the spatial distribution of the number of recruitment trees in natural secondary forest was significantly influenced by stand and topographic factors, especially average DBH. The GWPR model at the small scale (2.5 km) had high model-fitting accuracy, generated a large range of model parameter estimates, and captured the localized spatial distribution of the model parameters. The GWPR models at small scales (2.5 and 5 km) produced a small range of model residuals, and model stability was improved. The global spatial autocorrelation of the GWPR model residuals at the small scale (2.5 km) was the lowest, and the local spatial autocorrelation was significantly reduced, forming an ideal spatial distribution pattern of small clusters with different observations. The local model at the small scale (2.5 km) was much better than the global model in simulating the spatial distribution of the number of recruitment trees.
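The abstract above uses Moran's I to describe spatial autocorrelation in regression residuals. A minimal sketch of the global statistic follows, assuming a user-supplied spatial weight matrix; the function name, the chain-shaped neighbour structure and the residual values are hypothetical and do not come from the study.

```python
def morans_i(values, weights):
    """Global Moran's I for values with a spatial weight matrix weights[i][j]."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    weight_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations for every weighted pair of locations.
    cross = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    variance_term = sum(d * d for d in deviations)
    return (n / weight_sum) * cross / variance_term


# Four plots along a line; each plot neighbours the adjacent ones (hypothetical).
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1.0, 2.0, 3.0, 4.0], chain_weights)
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain_weights)
```

Positive values indicate that similar residuals sit next to each other, while negative values indicate that neighbouring residuals tend to differ; values near zero suggest little spatial structure is left in the residuals.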
Shi, Huilan; Jia, Junya; Li, Dong; Wei, Li; Shang, Wenya; Zheng, Zhenfeng
2018-02-09
Precise renal histopathological diagnosis guides therapy strategy in patients with lupus nephritis. Blood oxygen level dependent (BOLD) magnetic resonance imaging (MRI) is an applicable noninvasive technique in renal disease. The current study was performed to explore whether BOLD MRI could contribute to diagnosing renal pathological patterns. Adult patients with a renal pathological diagnosis of lupus nephritis were recruited for this study. Renal biopsy tissues were assessed based on the lupus nephritis ISN/RPS 2003 classification. BOLD MRI was used to obtain the functional magnetic resonance parameter, R2* values. Several functions of R2* values were calculated and used to construct algorithmic models for renal pathological patterns. In addition, the algorithmic models were compared as to their diagnostic capability. Both histopathology and BOLD MRI were used to examine a total of twelve patients. Renal pathological patterns included five class III cases (including 3 as class III + V) and seven class IV cases (including 4 as class IV + V). Three algorithmic models, including decision tree, line discriminant, and logistic regression, were constructed to distinguish the renal pathological patterns of class III and class IV. The sensitivity of the decision tree model was better than that of the line discriminant model (71.87% vs 59.48%, P < 0.001) and inferior to that of the logistic regression model (71.87% vs 78.71%, P < 0.001). The specificity of the decision tree model was equivalent to that of the line discriminant model (63.87% vs 63.73%, P = 0.939) and higher than that of the logistic regression model (63.87% vs 38.0%, P < 0.001). The area under the ROC curve (AUROCC) of the decision tree model was greater than that of the line discriminant model (0.765 vs 0.629, P < 0.001) and the logistic regression model (0.765 vs 0.662, P < 0.001).
BOLD MRI is a useful non-invasive imaging technique for the evaluation of lupus nephritis. Decision tree models constructed using functions of R2* values may facilitate the prediction of renal pathological patterns.
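The sensitivity and specificity figures used above to compare the three models come from a confusion matrix. A minimal sketch follows, assuming class IV is coded as the positive class (1) and class III as the negative class (0); that coding, the function name, and the label vectors are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical labels: 1 = class IV, 0 = class III
y_true = [1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```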
Lin, Lei; Wang, Qian; Sadek, Adel W
2016-06-01
The duration of freeway traffic accidents is an important factor, which affects traffic congestion, environmental pollution, and secondary accidents. Among previous studies, the M5P algorithm has been shown to be an effective tool for predicting incident duration. M5P builds a tree-based model, like the traditional classification and regression tree (CART) method, but with multiple linear regression models as its leaves. The problem with M5P for accident duration prediction, however, is that whereas linear regression assumes that the conditional distribution of accident durations is normally distributed, the distribution for a "time-to-an-event" is almost certainly nonsymmetrical. A hazard-based duration model (HBDM) is a better choice for this kind of "time-to-event" modeling scenario, and given this, HBDMs have been previously applied to analyze and predict traffic accident duration. Previous research, however, has not yet applied HBDMs for accident duration prediction in association with clustering or classification of the dataset to minimize data heterogeneity. The current paper proposes a novel approach for accident duration prediction, which improves on the original M5P tree algorithm through the construction of an M5P-HBDM model, in which the leaves of the M5P tree model are HBDMs instead of linear regression models. Such a model offers the advantage of minimizing data heterogeneity through dataset classification, and avoids the need for the incorrect assumption of normality for traffic accident durations. The proposed model was then tested on two freeway accident datasets. For each dataset, the first 500 records were used to train the following three models: (1) an M5P tree; (2) an HBDM; and (3) the proposed M5P-HBDM, and the remainder of the data were used for testing. The results show that the proposed M5P-HBDM managed to identify more significant and meaningful variables than either M5P or HBDMs.
Moreover, the M5P-HBDM had the lowest overall mean absolute percentage error (MAPE). Copyright © 2016 Elsevier Ltd. All rights reserved.
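The mean absolute percentage error (MAPE) used above to rank the models can be computed as in the sketch below. The function name and the duration values are hypothetical and do not come from the paper's datasets.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.
    Assumes no actual value is zero."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

# Hypothetical accident durations in minutes
durations = [30.0, 60.0, 120.0]
predictions = [33.0, 54.0, 120.0]
error = mape(durations, predictions)
```

Because each error is scaled by the observed duration, a ten-minute miss on a short incident counts for more than the same miss on a long one.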
Observed Methods for Felling Hardwood Trees with Chain Saws
Jerry L. Koger
1983-01-01
The angles and lengths of the cutting surfaces made by chain saw operators on hardwood tree stumps are described by means, standard deviations, ranges, and regression equations. Recommended felling guidelines are compared with observed felling methods used by experienced timber cutters in the southern Appalachian Mountains.
NASA Astrophysics Data System (ADS)
Hadley, Brian Christopher
This dissertation assessed remotely sensed data and geospatial modeling technique(s) to map the spatial distribution of total above-ground biomass present on the surface of the Savannah River National Laboratory's (SRNL) Mixed Waste Management Facility (MWMF) hazardous waste landfill. Ordinary least squares (OLS) regression, regression kriging, and tree-structured regression were employed to model the empirical relationship between in-situ measured Bahia (Paspalum notatum Flugge) and Centipede [Eremochloa ophiuroides (Munro) Hack.] grass biomass against an assortment of explanatory variables extracted from fine spatial resolution passive optical and LIDAR remotely sensed data. Explanatory variables included: (1) discrete channels of visible, near-infrared (NIR), and short-wave infrared (SWIR) reflectance, (2) spectral vegetation indices (SVI), (3) spectral mixture analysis (SMA) modeled fractions, (4) narrow-band derivative-based vegetation indices, and (5) LIDAR derived topographic variables (i.e. elevation, slope, and aspect). Results showed that a linear combination of the first- (1DZ_DGVI), second- (2DZ_DGVI), and third-derivative of green vegetation indices (3DZ_DGVI) calculated from hyperspectral data recorded over the 400-960 nm wavelengths of the electromagnetic spectrum explained the largest percentage of statistical variation (R2 = 0.5184) in the total above-ground biomass measurements. In general, the topographic variables did not correlate well with the MWMF biomass data, accounting for less than five percent of the statistical variation. It was concluded that tree-structured regression represented the optimum geospatial modeling technique due to a combination of model performance and efficiency/flexibility factors.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rupšys, P.
A system of stochastic differential equations (SDE) with mixed-effects parameters and a multivariate normal copula density function was used to develop a tree height model for Scots pine trees in Lithuania. A two-step maximum likelihood parameter estimation method is used and computational guidelines are given. After fitting the conditional probability density functions to outside bark diameter at breast height and total tree height, a bivariate normal copula distribution model was constructed. Predictions from the mixed-effects parameters SDE tree height model calculated during this research were compared to the regression tree height equations. The results are implemented in the symbolic computational language MAPLE.
Zhao, Cai-Yun; Li, Jun-Sheng; Xu, Jing; Liu, Xiao-Yan
2017-05-01
Globalization increases the opportunities for unintentionally introduced invasive alien species, especially for insects, and most of these species could damage ecosystems and cause economic loss in China. In this study, we analyzed drivers of the distribution of unintentionally introduced invasive alien insects. Based on the number of unintentionally introduced invasive alien insects and their presence/absence records in each province in mainland China, regression trees were built to elucidate the roles of environmental and anthropogenic factors on the number distribution and similarity of species composition of these insects. Classification and regression trees indicated climatic suitability (the mean temperature in January) and human economic activity (sum of total freight) are primary drivers for the number distribution pattern of unintentionally introduced invasive alien insects at provincial scale, while only environmental factors (the mean January temperature, the annual precipitation and the areas of provinces) significantly affect the similarity of them based on the multivariate regression trees. © The Authors 2017. Published by Oxford University Press on behalf of Entomological Society of America.
Predicting the risk of patients with biopsy Gleason score 6 to harbor a higher grade cancer.
Gofrit, Ofer N; Zorn, Kevin C; Taxy, Jerome B; Lin, Shang; Zagaja, Gregory P; Steinberg, Gary D; Shalhav, Arieh L
2007-11-01
Prostate cancer Gleason score 3 + 3 = 6 is currently the most common score assigned on prostatic biopsies. We analyzed the clinical variables that predict the likelihood of a patient with biopsy Gleason score 6 to harbor a higher grade tumor. The study population consisted of 448 patients with a mean age of 59.1 years who underwent radical prostatectomy between February 2003 and October 2006 for Gleason score 6 adenocarcinoma. The effect of preoperative variables on the probability of a Gleason score upgrade on final pathological evaluation was evaluated using logistic regression, and classification and regression tree analysis. Gleason score upgrade was found in 91 of 448 patients (20.3%). Logistic regression showed that only serum prostate specific antigen and the greatest percent of cancer in a core were significantly associated with a score upgrade (p = 0.0014 and 0.023, respectively). Classification and regression tree analysis showed that the risk of a Gleason score upgrade was 62% when serum prostate specific antigen was higher than 12 ng/ml and 18% when serum prostate specific antigen was 12 ng/ml or less. In patients with serum prostate specific antigen lower than 12 ng/ml the risk of a score upgrade could be dichotomized at a greatest percent of cancer in a core of 5%. The risk was 22.6% and 10.5% when the greatest percent of cancer in a core was higher than 5% and 5% or lower, respectively. The probability of patients with a prostate biopsy Gleason score of 6 to conceal a Gleason score of 7 or higher can be predicted using serum prostate specific antigen and the greatest percent of cancer in a core. With these parameters it is possible to predict upgrade rates as high as 62% and as low as 10.5%.
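The tree reported in this abstract amounts to two nested threshold rules, which can be written out as a small lookup. This is an illustrative sketch only and not a clinical tool; the function name is hypothetical, and treating a PSA of exactly 12 ng/ml as part of the lower branch is an assumption, since the abstract words that boundary in two ways.

```python
def upgrade_risk(psa_ng_ml, max_core_cancer_pct):
    """Return the Gleason upgrade risk reported for each leaf of the tree.

    Thresholds and risks are those quoted in the abstract; the handling
    of PSA == 12 is an assumption made for this sketch.
    """
    if psa_ng_ml > 12:
        return 0.62           # high PSA leaf
    if max_core_cancer_pct > 5:
        return 0.226          # lower PSA, more than 5% cancer in a core
    return 0.105              # lower PSA, 5% or less cancer in a core

risk_high_psa = upgrade_risk(15.0, 2.0)
risk_low_psa_high_core = upgrade_risk(8.0, 10.0)
risk_low_psa_low_core = upgrade_risk(8.0, 5.0)
```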
Arenja, Nisha; Riffel, Johannes H; Fritz, Thomas; André, Florian; Aus dem Siepen, Fabian; Mueller-Hennessen, Matthias; Giannitsis, Evangelos; Katus, Hugo A; Friedrich, Matthias G; Buss, Sebastian J
2017-06-01
Purpose To assess the utility of established functional markers versus two additional functional markers derived from standard cardiovascular magnetic resonance (MR) images for their incremental diagnostic and prognostic information in patients with nonischemic dilated cardiomyopathy (NIDCM). Materials and Methods Approval was obtained from the local ethics committee. MR images from 453 patients with NIDCM and 150 healthy control subjects were included between 2005 and 2013 and were analyzed retrospectively. Myocardial contraction fraction (MCF) was calculated by dividing left ventricular (LV) stroke volume by LV myocardial volume, and long-axis strain (LAS) was calculated from the distances between the epicardial border of the LV apex and the midpoint of a line connecting the origins of the mitral valve leaflets at end systole and end diastole. Receiver operating characteristic curve, Kaplan-Meier method, Cox regression, and classification and regression tree (CART) analyses were performed for diagnostic and prognostic performances. Results LAS (area under the receiver operating characteristic curve [AUC] = 0.93, P < .001) and MCF (AUC = 0.92, P < .001) can be used to discriminate patients with NIDCM from age- and sex-matched control subjects. A total of 97 patients reached the combined end point during a median follow-up of 4.8 years. In multivariate Cox regression analysis, only LV ejection fraction (EF) and LAS independently indicated the combined end point (hazard ratio = 2.8 and 1.9, respectively; P < .001 for both). In a risk stratification approach with classification and regression tree analysis, combined LV EF and LAS cutoff values were used to stratify patients into three risk groups (log-rank test, P < .001). Conclusion Cardiovascular MR-derived MCF and LAS serve as reliable diagnostic and prognostic markers in patients with NIDCM. 
LAS, as a marker for longitudinal contractile function, is an independent parameter for outcome and offers incremental information beyond LV EF and the presence of myocardial fibrosis. © RSNA, 2017 Online supplemental material is available for this article.
Gao, Lei; Xi, Qian Qian; Wu, Jun; Han, Yu; Dai, Wei; Su, Yuan Yuan; Zhang, Xin
2015-09-01
To investigate the association between autism and prenatal environmental risk factors. A case-control study was conducted among 193 children with autism from the special educational schools and 733 typical development controls matched by age and gender by using questionnaire in Tianjin from 2007 to 2012. Statistical analysis included quick unbiased efficient statistical tree (QUEST) and logistic regression in SPSS 20.0. There were four predictors by QUEST and the logistic regression analysis, maternal air conditioner use during pregnancy (OR=0.316, 95% CI: 0.215-0.463) was the single first-level node (χ²=50.994, P=0.000); newborn complications (OR=4.277, 95% CI: 2.314-7.908) and paternal consumption of freshwater fish (OR=0.383, 95% CI: 0.256-0.573) were second-layer predictors (χ²=45.248, P=0.000; χ²=24.212, P=0.000); and maternal depression (OR=4.822, 95% CI: 3.047-7.631) was the single third-level predictor (χ²=23.835, P=0.000). The prediction accuracy of the tree was 89.2%. The air conditioner use during pregnancy and paternal freshwater fish diet might be beneficial for the prevention of autism, while newborn complications and maternal depression might be the risk factors. Copyright © 2015 The Editorial Board of Biomedical and Environmental Sciences. Published by China CDC. All rights reserved.
NASA Astrophysics Data System (ADS)
Zabret, Katarina; Rakovec, Jože; Šraj, Mojca
2018-03-01
Rainfall partitioning is an important part of the ecohydrological cycle, influenced by numerous variables. Rainfall partitioning for pine (Pinus nigra Arnold) and birch (Betula pendula Roth.) trees was measured from January 2014 to June 2017 in an urban area of Ljubljana, Slovenia. 180 events from more than three years of observations were analyzed, focusing on 13 meteorological variables, including the number of raindrops, their diameter, and velocity. Regression tree and boosted regression tree analyses were performed to evaluate the influence of the variables on rainfall interception loss, throughfall, and stemflow in different phenoseasons. The amount of rainfall was recognized as the most influential variable, followed by rainfall intensity and the number of raindrops. Higher rainfall amount, intensity, and the number of drops decreased percentage of rainfall interception loss. Rainfall amount and intensity were the most influential on interception loss by birch and pine trees during the leafed and leafless periods, respectively. Lower wind speed was found to increase throughfall, whereas wind direction had no significant influence. Consideration of drop size spectrum properties proved to be important, since the number of drops, drop diameter, and median volume diameter were often recognized as important influential variables.
Brito-Rocha, E; Schilling, A C; Dos Anjos, L; Piotto, D; Dalmolin, A C; Mielke, M S
2016-01-01
Individual leaf area (LA) is a key variable in studies of tree ecophysiology because it directly influences light interception, photosynthesis and evapotranspiration of adult trees and seedlings. We analyzed the leaf dimensions (length - L and width - W) of seedlings and adults of seven Neotropical rainforest tree species (Brosimum rubescens, Manilkara maxima, Pouteria caimito, Pouteria torta, Psidium cattleyanum, Symphonia globulifera and Tabebuia stenocalyx) with the objective to test the feasibility of single regression models to estimate LA of both adults and seedlings. In southern Bahia, Brazil, a first set of data was collected between March and October 2012. From the seven species analyzed, only two (P. cattleyanum and T. stenocalyx) had very similar relationships between LW and LA in both ontogenetic stages. For these two species, a second set of data was collected in August 2014, in order to validate the single models encompassing adult and seedlings. Our results show the possibility of development of models for predicting individual leaf area encompassing different ontogenetic stages for tropical tree species. The development of these models was more dependent on the species than the differences in leaf size between seedlings and adults.
Bigorgne, Emilie; Custer, Thomas W.; Dummer, Paul; Erickson, Richard A.; Karouna-Renier, Natalie K.; Schultz, Sandra; Custer, Christine M.; Thogmartin, Wayne E.; Matson, Cole W.
2015-01-01
The health of tree swallows, Tachycineta bicolor, on the Upper Mississippi River (UMR) was assessed in 2010 and 2011 using biomarkers at six sites downriver of Minneapolis/St. Paul, MN metropolitan area, a tributary into the UMR, and a nearby lake. Chromosomal damage was evaluated in nestling blood by measuring the coefficient of variation of DNA content (DNA CV) using flow cytometry. Cytochrome P450 1A activity in nestling liver was measured using the ethoxyresorufin-O-dealkylase (EROD) assay, and oxidative stress was estimated in nestling livers via determination of thiobarbituric acid reacting substances (TBARS), reduced glutathione (GSH), oxidized glutathione (GSSG), the ratio GSSG/GSH, total sulfhydryl, and protein bound sulfhydryl (PBSH). A multilevel regression model (DNA CV) and simple regressions (EROD and oxidative stress) were used to evaluate biomarker responses for each location. Chromosomal damage was significantly elevated at two sites on the UMR (Pigs Eye and Pool 2) relative to the Green Mountain Lake reference site, while the induction of EROD activity was only observed at Pigs Eye. No measures of oxidative stress differed among sites. Multivariate analysis confirmed an increased DNA CV at Pigs Eye and Pool 2, and elevated EROD activity at Pigs Eye. These results suggest that the health of tree swallows has been altered at the DNA level at Pigs Eye and Pool 2 sites, and at the physiological level at Pigs Eye site only.
Deciphering factors controlling groundwater arsenic spatial variability in Bangladesh
NASA Astrophysics Data System (ADS)
Tan, Z.; Yang, Q.; Zheng, C.; Zheng, Y.
2017-12-01
Elevated concentrations of geogenic arsenic in groundwater have been found in many countries to exceed 10 μg/L, the WHO's guideline value for drinking water. A common yet unexplained characteristic of groundwater arsenic spatial distribution is the extensive variability at various spatial scales. This study investigates factors influencing the spatial variability of groundwater arsenic in Bangladesh to improve the accuracy of models predicting arsenic exceedance rate spatially. A novel boosted regression tree method is used to establish a weak-learning ensemble model, which is compared to a linear model using a conventional stepwise logistic regression method. The boosted regression tree models offer the advantage of parametric interaction when big datasets are analyzed in comparison to the logistic regression. The point data set (n=3,538) of groundwater hydrochemistry with 19 parameters was obtained by the British Geological Survey in 2001. The spatial data sets of geological parameters (n=13) were from the Consortium for Spatial Information, Technical University of Denmark, University of East Anglia and the FAO, while the soil parameters (n=42) were from the Harmonized World Soil Database. The aforementioned parameters were regressed to categorical groundwater arsenic concentrations below or above three thresholds: 5 μg/L, 10 μg/L and 50 μg/L to identify respective controlling factors. Boosted regression tree method outperformed logistic regression methods in all three threshold levels in terms of accuracy, specificity and sensitivity, resulting in an improvement of spatial distribution map of probability of groundwater arsenic exceeding all three thresholds when compared to disjunctive-kriging interpolated spatial arsenic map using the same groundwater arsenic dataset. Boosted regression tree models also show that the most important controlling factors of groundwater arsenic distribution include groundwater iron content and well depth for all three thresholds. 
The probability of a well with iron content higher than 5mg/L to contain greater than 5 μg/L, 10 μg/L and 50 μg/L As is estimated to be more than 91%, 85% and 51%, respectively, while the probability of a well from depth more than 160m to contain more than 5 μg/L, 10 μg/L and 50 μg/L As is estimated to be less than 38%, 25% and 14%, respectively.
NASA Astrophysics Data System (ADS)
Rao, M.; Vuong, H.
2013-12-01
The overall objective of this study is to develop a method for estimating total aboveground biomass of redwood stands in Jackson Demonstration State Forest, Mendocino, California using airborne LiDAR data. LiDAR data, owing to their vertical and horizontal accuracy, are increasingly being used to characterize landscape features including ground surface elevation and canopy height. These LiDAR-derived metrics, which capture structural signatures at higher precision and accuracy, can help better understand ecological processes at various spatial scales. Our study is focused on two major species of the forest: redwood (Sequoia sempervirens [D. Don] Endl.) and Douglas-fir (Pseudotsuga menziesii [Mirb.] Franco). Specifically, the objectives included fitting linear regression models of tree diameter at breast height (dbh) to LiDAR-derived height for each species. From 23 random points in the study area, field measurements (dbh and tree coordinates) were collected for more than 500 redwood and Douglas-fir trees over 0.2 ha plots. The USFS-FUSION application software along with its LiDAR Data Viewer (LDV) was used to extract the Canopy Height Model (CHM) from which tree heights would be derived. Based on the LiDAR-derived height and ground-based dbh, a linear regression model was developed to predict dbh. The predicted dbh was used to estimate the biomass at the single tree level using Jenkins' formula (Jenkins et al. 2003). The linear regression models were able to explain 65% of the variability associated with redwood dbh and 80% of that associated with Douglas-fir dbh.
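The dbh-on-height regression described above, and the "variability explained" figure it reports, can be sketched with an ordinary least-squares line and its coefficient of determination. The height and dbh values below are hypothetical and the variable names are illustrative; none of this is the authors' data or code.

```python
import numpy as np

# Hypothetical LiDAR-derived heights (m) and field-measured dbh (cm)
height = np.array([10.0, 20.0, 30.0, 40.0])
dbh = np.array([15.0, 32.0, 44.0, 61.0])

# Least-squares line: dbh = slope * height + intercept
slope, intercept = np.polyfit(height, dbh, 1)
predicted = slope * height + intercept

# R^2 = 1 - SS_res / SS_tot, the share of dbh variability explained
ss_res = ((dbh - predicted) ** 2).sum()
ss_tot = ((dbh - dbh.mean()) ** 2).sum()
r_squared = 1.0 - ss_res / ss_tot
```

An R² of 0.65 or 0.80, as reported for the two species, would correspond to the same calculation on the field plots.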
Dispersion patterns and sampling plans for Diaphorina citri (Hemiptera: Psyllidae) in citrus.
Sétamou, Mamoudou; Flores, Daniel; French, J Victor; Hall, David G
2008-08-01
The abundance and spatial dispersion of Diaphorina citri Kuwayama (Hemiptera: Psyllidae) were studied in 34 grapefruit (Citrus paradisi Macfad.) and six sweet orange [Citrus sinensis (L.) Osbeck] orchards from March to August 2006 when the pest is more abundant in southern Texas. Although flush shoot infestation levels did not vary with host plant species, densities of D. citri eggs, nymphs, and adults were significantly higher on sweet orange than on grapefruit. D. citri immatures also were found in significantly higher numbers in the southeastern quadrant of trees than other parts of the canopy. The spatial distribution of D. citri nymphs and adults was analyzed using Iwao's patchiness regression and Taylor's power law. Taylor's power law fitted the data better than Iwao's model. Based on both regression models, the field dispersion patterns of D. citri nymphs and adults were aggregated among flush shoots in individual trees as indicated by the regression slopes that were significantly >1. For the average density of each life stage obtained during our surveys, the minimum number of flush shoots per tree needed to estimate D. citri densities varied from eight for eggs to four flush shoots for adults. Projections indicated that a sampling plan consisting of 10 trees and eight flush shoots per tree would provide density estimates of the three developmental stages of D. citri acceptable enough for population studies and management decisions. A presence-absence sampling plan with a fixed precision level was developed and can be used to provide a quick estimation of D. citri populations in citrus orchards.
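Taylor's power law relates the sample variance to the sample mean as s² = a·m^b, and the slope b is estimated by regressing log variance on log mean. A minimal sketch follows; the means and variances are hypothetical values constructed for illustration and are not the survey data.

```python
import numpy as np

# Hypothetical per-orchard mean densities and variances (built as 2 * m**1.5)
means = np.array([1.0, 2.0, 4.0, 8.0])
variances = 2.0 * means ** 1.5

# log(s^2) = log(a) + b * log(m), fitted by least squares
b, log_a = np.polyfit(np.log(means), np.log(variances), 1)
a = np.exp(log_a)

# A slope above 1 indicates an aggregated dispersion pattern
aggregated = b > 1.0
```

In practice the test is whether b is significantly greater than 1, which requires the standard error of the slope as well as the point estimate.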
Vegetation placement for summer built surface temperature moderation in an urban microclimate.
Millward, Andrew A; Torchia, Melissa; Laursen, Andrew E; Rothman, Lorne D
2014-06-01
Urban vegetation can mitigate increases in summer air temperature by reducing the solar gain received by buildings. To quantify the temperature-moderating influence of city trees and vine-covered buildings, a total of 13 pairs of temperature loggers were installed on the surfaces of eight buildings in downtown Toronto, Canada, for 6 months during the summer of 2008. One logger in each pair was shaded by vegetation while the other measured built surface temperature in full sunlight. We investigated the temperature-moderating benefits of solitary mature trees, clusters of trees, and perennial vines using a linear-mixed model and a multiple regression analysis of degree hour difference. We then assessed the temperature-moderating effect of leaf area, plant size and proximity to building, and plant location relative to solar path. During a period of high solar intensity, we measured an average temperature differential of 11.7 °C, with as many as 10-12 h of sustained cooler built surface temperatures. Vegetation on the west-facing aspect of built structures provided the greatest temperature moderation, with maximum benefit (peak temperature difference) occurring late in the afternoon. Large mature trees growing within 5 m of buildings showed the greatest ability to moderate built surface temperature, with those growing in clusters delivering limited additional benefit compared with isolated trees. Perennial vines proved as effective as trees at moderating rise in built surface temperature to the south and west sides of buildings, providing an attractive alternative to shade trees where soil volume and space are limited.
Long-Term Vegetation Trends Detected In Northern Canada Using Landsat Image Stacks
NASA Astrophysics Data System (ADS)
Fraser, R.; Olthof, I.; Carrière, M.; Deschamps, A.; Pouliot, D.
2011-12-01
Evidence of recent productivity increases in arctic vegetation comes from a variety of sources. At local scales, long-term plot measurements in North America are beginning to record increases in vascular plant cover and biomass. At landscape scales, expansion and densification of shrubs has been observed using repeat oblique photographs. Finally, continental-scale increases in vegetation "greenness" have been documented based on analysis of coarse resolution (≥ 1 km) NOAA-AVHRR satellite imagery. In this study we investigated intermediate, regional-level changes occurring in tundra vegetation since 1984 using the Landsat TM and ETM+ satellite image archive. Four study areas averaging 13,619 km2 were located over widely distributed national parks in northern Canada (Ivvavik, Sirmilik, Torngat Mountains, and Wapusk). Time-series image stacks of 16-41 growing-season Landsat scenes from overlapping WRS-2 frames were acquired spanning periods of 17-25 years. Each pixel's unique temporal database of clear-sky values was then analyzed for trends in four indices (NDVI, Tasseled Cap Brightness, Greenness and Wetness) using robust linear regression. The trends were further related to changes in the fractional cover of functional vegetation types using regression tree models trained with plot data and high resolution (≤ 10 m) satellite imagery. We found all four study areas to have a larger proportion of significant (p<0.05) positive greenness trends (range 6.1-25.5%) by comparison to negative trends (range 0.3-4.1%). For the three study areas where regression tree models could be derived, consistent trends of increasing shrub or vascular fractional cover and decreasing bare cover were predicted. The Landsat-based observations were associated with warming trends in each park over the analysis periods. Many of the major changes observed could be corroborated using published studies or field observations.
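The per-pixel trend analysis described above can be sketched in two steps: compute NDVI from near-infrared and red reflectance, then fit a robust slope through the time series. The abstract says only "robust linear regression"; the Theil-Sen estimator (median of pairwise slopes) used here is one such method and is an assumption, as are the reflectance values.

```python
from itertools import combinations

import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    nir, red = np.asarray(nir, float), np.asarray(red, float)
    return (nir - red) / (nir + red)

def theil_sen_slope(t, y):
    """Median of all pairwise slopes; resistant to a few outlying scenes."""
    slopes = [(y[j] - y[i]) / (t[j] - t[i])
              for i, j in combinations(range(len(t)), 2)
              if t[j] != t[i]]
    return float(np.median(slopes))

# Hypothetical clear-sky values for one pixel; the last scene is an outlier
years = [1985, 1990, 1995, 2000, 2005]
nir = [0.30, 0.32, 0.34, 0.36, 0.60]
red = [0.10, 0.10, 0.10, 0.10, 0.10]

series = ndvi(nir, red)
trend = theil_sen_slope(years, series)  # NDVI change per year
```

Taking the median of pairwise slopes keeps one anomalous scene from dominating the trend, which matters when each pixel has only a few dozen clear-sky observations.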
Yannakoulia, Mary; Lykou, Anastasia; Kastorini, Christina Maria; Saranti Papasaranti, Eirini; Petralias, Athanassios; Veloudaki, Afroditi; Linos, Athena
2016-02-01
To explore factors affecting children's and adolescents' diet quality, in the framework of a food aid and promotion of healthy nutrition programme implemented in areas of low socio-economic status of Greece, during the current financial crisis. From a total of 162 schools participating in the programme during 2012-2013, we gathered 15 897 questionnaires recording sociodemographic characteristics, lifestyle parameters and dietary habits of children and their families. As a measure of socio-economic status, the Family Affluence Scale (FAS) was used; whereas for the assessment of diet quality, the KIDMED score was computed. Associations between KIDMED and FAS, physical activity and socio-economic parameters were examined using regression and classification-regression tree analysis (CART). The higher the FAS score, the greater the percentage of children and adolescents who reported to consume, on a daily basis, fruits and vegetables, dairy products and breakfast (P<0·001). Results from CART showed that children and adolescents in the medium or high FAS groups had higher KIDMED score, compared with those in the low FAS group. For those in the low FAS group, KIDMED score is expected to increase by 12·4 % when they spend more than 0·25 h/week in sports activities. The respective threshold for the medium and high FAS groups is 1·75 h/week, while education of the mother and father affected KIDMED score significantly as well. Diet quality is strongly influenced by socio-economic parameters in children and adolescents living in economically disadvantaged areas of Greece, so that lower family affluence is associated with worse diet quality.
Stohlgren, T.J.; Bachand, R.R.; Onami, Y.; Binkley, Dan
1998-01-01
Do relationships between species and environmental gradients strengthen or weaken with tree life-stage (i.e., small seedlings, large seedlings, saplings, and mature trees)? Strengthened relationships may lead to distinct forest type boundaries, or weakening connections could lead to gradual ecotones and heterogeneous forest landscapes. We quantified the changes in forest dominance (basal area of tree species by life-stage) and environmental factors (elevation, slope, aspect, intercepted photosynthetically active radiation (PAR), summer soil moisture, and soil depth and texture) across 14 forest ecotones (n = 584, 10 m x 10 m plots) in Rocky Mountain National Park, Colorado, U.S.A. Local, ecotone-specific species-environment relationships, based on multiple regression techniques, generally strengthened from the small seedling stage (multiple R2 ranged from 0.00 to 0.26) to the tree stage (multiple R2 ranged from 0.20 to 0.61). At the landscape scale, combined canonical correspondence analysis (CCA) among species and for all tree life-stages suggested that the seedlings of most species became established in lower-elevation, drier sites than where mature trees of the same species dominated. However, conflicting evidence showed that species-environment relationships may weaken with tree life-stage. Seedlings were only found in a subset of plots (habitats) occupied by mature trees of the same species. At the landscape scale, CCA results showed that species-environment relationships weakened somewhat from the small seedling stage (86.4% of the variance explained by the first two axes) to the tree stage (76.6% of variance explained). The basal area of tree species co-occurring with Pinus contorta Doug. ex. Loud declined more gradually than P. contorta basal area declined across ecotones, resulting in less-distinct forest type boundaries. 
We conclude that broad, gradual ecotones and heterogeneous forest landscapes are created and maintained by: (1) sporadic establishment of seedlings in sub-optimal habitats; (2) survivorship of saplings and mature trees in a wider range of environmental conditions than seedlings presently endure; and (3) the longevity of trees and persistence of tree species in a broad range of soils, climates, and disturbance regimes.
A Riparian Approach to Dendrochronological Flow Reconstruction, Yellowstone River, Montana
NASA Astrophysics Data System (ADS)
Schook, D. M.; Rathburn, S. L.; Friedman, J. M.
2015-12-01
Tree ring-based flow reconstructions can reveal river discharge variability over durations far exceeding the gauged record, building perspective for both the measured record and future flows. We use plains cottonwood (Populus deltoides subsp. monilifera) tree rings collected from four rivers to reconstruct flow history of the Yellowstone River near its confluence with the Missouri River. Upland trees in dry regions are typically used in flow reconstruction because their annual growth is controlled by the same precipitation that drives downstream flow, but our study improves flow reconstruction by including floodplain trees that are directly affected by the river. Cores from over 1000 cottonwoods along the Yellowstone, Powder, Little Missouri, and Redwater Rivers were collected from within a 170 km radius to reconstruct flows using the Age Curve Standardization technique in a multiple regression analysis. The large sample from trees spanning many age classes allows us to use only the rings that were produced when each tree was less than 50 years old and growth was most strongly correlated to river discharge. Using trees from a range of rivers improves our ability to differentiate between growth resulting from local precipitation and river flow, and we show that cottonwood growth differs across these neighboring rivers having different watersheds. Using the program Seascorr, tree growth is found to be better correlated to seasonal river discharge (R = 0.69) than to local precipitation (R = 0.45). Our flow reconstruction reveals that the most extreme multi-year or multi-decade drought periods of the last 250 years on either the Yellowstone (1817-1821) or Powder (1846-1865) Rivers are missed by the gauged discharge record. Across all sites, we document increased growth in the 20th century compared to the 19th, a finding unattainable with conventional methods but having important implications for flow management.
1985-12-01
consists of the node t and all descendants of t in T. (3) Definition 3. Pruning a branch Tt from a tree T consists of deleting from T all... The default is 1.0 so that actually, this keyword did not need to appear in the above file. (5) DELETE. This keyword does not appear in our example, but... when it is used associated with some variable names, it indicates that we want to delete these variables from the regression. If this keyword is
NASA Astrophysics Data System (ADS)
Schwan, M. R.; Herrick, C.; Hobbie, E. A.; Chen, J.; Varner, R. K.; Palace, M. W.; Marek, E.; Kashi, N. N.; Smith, S. L.
2015-12-01
Rapid warming in arctic and sub-arctic environments shifts plant community structure which in turn can alter carbon cycling by releasing large stocks of carbon sequestered in arctic soils. Much work has been done in sub-arctic peatlands to understand how shifts in dominant vegetation cover can ultimately affect global carbon balances, but less focus has been given to upland environments where similar changes are occurring. Recent circumpolar expansion of deciduous shrubs and trees in sub-arctic upland environments may alter carbon cycling due to shrubs and trees sequestering less C in soils than the heath plants they typically replace. In this study we explored the relationship between nutrient and carbon cycling and above-ground vegetation on six transects which traverse an ecotone gradient from heath tundra (dominated by ericoid mycorrhizal plants) through deciduous shrubs to deciduous trees (dominated by ectomycorrhizal plants) in upland environments of sub-arctic Sweden near Vassijaure (~850 mm precipitation) and Abisko (~300 mm precipitation). We collected soil and foliage for analysis of natural abundances of stable carbon and nitrogen isotopes (δ13C and δ15N), which can be a sensitive indicator of C and N dynamics. We also took high-resolution remote aerial imagery over the transects to calculate percent cover of vegetation types using GIS software. We concurrently estimated percent cover in smaller plots on the ground of three dominant species, Empetrum nigrum, Betula nana, and Betula pubescens, to serve as ground-truthing for the aerial imagery. Analysis of vegetation cover data shows significant differences in vegetation types along the transects. Preliminary multiple regression analysis of isotopes shows that δ13C in organic soil at the Vassijaure site is mostly controlled by distance along the transect, an interaction term between transect distance and soil depth, and δ15N (adjusted r2 = 0.85, p < 0.0001). 
Values of δ13C were lower in soils in the shrubs and forest than in the heath. In regression analyses, δ15N was primarily controlled by depth, and secondarily by heath cover (adjusted r2 = 0.68, p < 0.0001). These results suggest that trees and shrubs are sequestering carbon, and interactions between plants and belowground soil communities may be driving nitrogen dynamics.
Xia, Jiangzhou; Liang, Shunlin; Chen, Jiquan; Yuan, Wenping; Liu, Shuguang; Li, Linghao; Cai, Wenwen; Zhang, Li; Fu, Yang; Zhao, Tianbao; Feng, Jinming; Ma, Zhuguo; Ma, Mingguo; Liu, Shaomin; Zhou, Guangsheng; Asanuma, Jun; Chen, Shiping; Du, Mingyuan; Davaa, Gombo; Kato, Tomomichi; Liu, Qiang; Liu, Suhong; Li, Shenggong; Shao, Changliang; Tang, Yanhong; Zhao, Xiang
2014-01-01
The regression tree method is used to upscale evapotranspiration (ET) measurements at eddy-covariance (EC) towers to the grassland ecosystems over the Dryland East Asia (DEA). The regression tree model was driven by satellite and meteorology datasets, and explained 82% and 76% of the variations of ET observations in the calibration and validation datasets, respectively. The annual ET estimates ranged from 222.6 to 269.1 mm yr−1 over the DEA region with an average of 245.8 mm yr−1 from 1982 through 2009. Ecosystem ET showed decreased trends over 61% of the DEA region during this period, especially in most regions of Mongolia and eastern Inner Mongolia due to decreased precipitation. The increased ET occurred primarily in the western and southern DEA region. Over the entire study area, water balance (the difference between precipitation and ecosystem ET) decreased substantially during the summer and growing season. Precipitation reduction was an important cause for the severe water deficits. The drying trend occurring in the grassland ecosystems of the DEA region can exert profound impacts on a variety of terrestrial ecosystem processes and functions. PMID:24845063
NASA Astrophysics Data System (ADS)
Beguet, Benoit; Guyon, Dominique; Boukir, Samia; Chehata, Nesrine
2014-10-01
The main goal of this study is to design a method to describe the structure of forest stands from Very High Resolution satellite imagery, relying on some typical variables such as crown diameter, tree height, trunk diameter, tree density and tree spacing. The emphasis is placed on automating the identification of the most relevant image features for the forest structure retrieval task, exploiting both spectral and spatial information. Our approach is based on linear regressions between the forest structure variables to be estimated and various spectral and Haralick's texture features. The main drawback of this well-known texture representation is that its underlying parameters are extremely difficult to set due to the spatial complexity of the forest structure. To tackle this major issue, an automated feature selection process is proposed which is based on statistical modeling, exploring a wide range of parameter values. It provides texture measures of diverse spatial parameters, hence implicitly inducing a multi-scale texture analysis. A new feature selection technique, which we call Random PRiF, is proposed. It relies on random sampling in feature space and carefully addresses the multicollinearity issue in multiple linear regression while ensuring accurate prediction of forest variables. Our automated forest variable estimation scheme was tested on Quickbird and Pléiades panchromatic and multispectral images, acquired at different periods on the maritime pine stands of two sites in South-Western France. It outperforms two well-established variable subset selection techniques. It has been successfully applied to identify the best texture features in modeling the five considered forest structure variables. 
The RMSE of all predicted forest variables is improved by combining multispectral and panchromatic texture features, with various parameterizations, highlighting the potential of a multi-resolution approach for retrieving forest structure variables from VHR satellite images. Thus an average prediction error of ~1.1 m is expected on crown diameter, ~0.9 m on tree spacing, ~3 m on height and ~0.06 m on diameter at breast height.
Qian, S.S.; Anderson, Chauncey W.
1999-01-01
We analyzed available concentration data of five commonly used herbicides and three pesticides collected from small streams in the Willamette River Basin in Oregon to identify factors that affect the variation of their concentrations in the area. The emphasis of this paper is the innovative use of classification and regression tree models for exploratory data analysis as well as analyzing data with a substantial amount of left-censored values. Among variables included in this analysis, land-use pattern in the watershed is the most important for all but one (simazine) of the eight pesticides studied, followed by geographic location, intensity of agriculture activities in the watershed (represented by nutrient concentrations in the stream), and the size of the watershed. The significant difference between urban sites and agriculture sites is the variability of stream concentrations. While all 16 nonurban watersheds have significantly higher variation than urban sites, the same is not necessarily true for the mean concentrations. Seasonal variation accounts for only a small fraction of the total variance in all eight pesticides.
MINER: exploratory analysis of gene interaction networks by machine learning from expression data.
Kadupitige, Sidath Randeni; Leung, Kin Chun; Sellmeier, Julia; Sivieng, Jane; Catchpoole, Daniel R; Bain, Michael E; Gaëta, Bruno A
2009-12-03
The reconstruction of gene regulatory networks from high-throughput "omics" data has become a major goal in the modelling of living systems. Numerous approaches have been proposed, most of which attempt only "one-shot" reconstruction of the whole network with no intervention from the user, or offer only simple correlation analysis to infer gene dependencies. We have developed MINER (Microarray Interactive Network Exploration and Representation), an application that combines multivariate non-linear tree learning of individual gene regulatory dependencies, visualisation of these dependencies as both trees and networks, and representation of known biological relationships based on common Gene Ontology annotations. MINER allows biologists to explore the dependencies influencing the expression of individual genes in a gene expression data set in the form of decision, model or regression trees, using their domain knowledge to guide the exploration and formulate hypotheses. Multiple trees can then be summarised in the form of a gene network diagram. MINER is being adopted by several of our collaborators and has already led to the discovery of a new significant regulatory relationship with subsequent experimental validation. Unlike most gene regulatory network inference methods, MINER allows the user to start from genes of interest and build the network gene-by-gene, incorporating domain expertise in the process. This approach has been used successfully with RNA microarray data but is applicable to other quantitative data produced by high-throughput technologies such as proteomics and "next generation" DNA sequencing.
Tree-ring reconstructions of hydroclimatic variability in the Upper Colorado River Basin
NASA Astrophysics Data System (ADS)
Hidalgo-Leon, Hugo
Three major sources of improvements in tree-ring analysis and reconstruction of hydroclimatic variables are presented for the Upper Colorado River Basin (UCRB) in the southwestern U.S.: (1) Cross validation statistics are used for identifying optimal reconstruction models based on different alternatives of PCA-based regression. Results showed that a physically-consistent parsimonious model with low mean square error can be obtained by using strict rules for principal component selection and cross validation statistics. The improved methods were used to produce a ~500-year high-resolution reconstruction of the UCRB's streamflow and compared with results of a previous reconstruction based on traditional procedures. (2) Tree-species' type was found to be a factor for determining chronology selection from dendrohydroclimatic models. The relative sensitivity of six tree species (Pinus edulis, Pseudotsuga menziesii, Pinus ponderosa, Pinus flexilis, Pinus aristata, and Picea engelmanni) to hydroclimatic extreme variations was determined using contingency table scores of tree-ring growth (at different lags) against hydroclimatic observations. Pinus edulis and Pseudotsuga menziesii were found to be the species most sensitive to low water. Results showed that tree-rings are biased towards greater sensitivity to hot-dry conditions and less responsive to cool-moist conditions. Results also showed higher streamflow response scores compared to precipitation, implying a good integration and persistence representation of the basin through normal hydrological processes. (3) Previous reconstructions on the basin used data extending only up to 1963. This is an important limitation since hydroclimatic records from 1963 to the present show significantly different variation than prior to 1963. The changes are caused by variations in the strength of forcing mechanisms from the Pacific Ocean. 
A comparative analysis of the influence of North Pacific variation and El Niño/Southern Oscillation (ENSO) showed that the responses of Tropical and North Pacific forcing in UCRB's hydroclimate are different for annual precipitation and total streamflow and that these relationships have changed at decadal time scales. Furthermore, most of the few tree-rings available up to 1985 present the same shifts as the hydroclimatic variables studied. To capture the full range of variability observed in instrumental data, it is necessary to collect new tree-ring samples.
A P2P Botnet detection scheme based on decision tree and adaptive multilayer neural networks.
Alauthaman, Mohammad; Aslam, Nauman; Zhang, Li; Alasem, Rafe; Hossain, M A
2018-01-01
In recent years, Botnets have been adopted as a popular method to carry and spread many malicious codes on the Internet. These malicious codes pave the way to execute many fraudulent activities including spam mail, distributed denial-of-service attacks and click fraud. While many Botnets are set up using centralized communication architecture, the peer-to-peer (P2P) Botnets can adopt a decentralized architecture using an overlay network for exchanging command and control data, making their detection even more difficult. This work presents a method of P2P Bot detection based on an adaptive multilayer feed-forward neural network in cooperation with decision trees. A classification and regression tree is applied as a feature selection technique to select relevant features. With these features, a multilayer feed-forward neural network training model is created using a resilient back-propagation learning algorithm. A comparison of feature set selection based on the decision tree, principal component analysis and the ReliefF algorithm indicated that the neural network model with feature selection based on the decision tree has better identification accuracy along with lower rates of false positives. The usefulness of the proposed approach is demonstrated by conducting experiments on real network traffic datasets. In these experiments, an average detection rate of 99.08 % with false positive rate of 0.75 % was observed.
Balk, Benjamin; Elder, Kelly
2000-01-01
We model the spatial distribution of snow across a mountain basin using an approach that combines binary decision tree and geostatistical techniques. In April 1997 and 1998, intensive snow surveys were conducted in the 6.9‐km2 Loch Vale watershed (LVWS), Rocky Mountain National Park, Colorado. Binary decision trees were used to model the large‐scale variations in snow depth, while the small‐scale variations were modeled through kriging interpolation methods. Binary decision trees related depth to the physically based independent variables of net solar radiation, elevation, slope, and vegetation cover type. These decision tree models explained 54–65% of the observed variance in the depth measurements. The tree‐based modeled depths were then subtracted from the measured depths, and the resulting residuals were spatially distributed across LVWS through kriging techniques. The kriged estimates of the residuals were added to the tree‐based modeled depths to produce a combined depth model. The combined depth estimates explained 60–85% of the variance in the measured depths. Snow densities were mapped across LVWS using regression analysis. Snow‐covered area was determined from high‐resolution aerial photographs. Combining the modeled depths and densities with a snow cover map produced estimates of the spatial distribution of snow water equivalence (SWE). This modeling approach offers improvement over previous methods of estimating SWE distribution in mountain basins.
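The combination step described above (tree-based depth plus a spatially interpolated residual) can be sketched as follows. This is only an illustration of the arithmetic: inverse-distance weighting stands in for kriging, which would additionally require a fitted variogram, and all function names, coordinates and depths are hypothetical.

```python
def idw(points, values, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` from (x, y) points.
    Used here as a simple stand-in for kriging."""
    num = den = 0.0
    for (px, py), value in zip(points, values):
        d2 = (px - target[0]) ** 2 + (py - target[1]) ** 2
        if d2 == 0:
            return value  # target coincides with a survey point
        weight = 1.0 / d2 ** (power / 2)
        num += weight * value
        den += weight
    return num / den


def combined_depth(tree_pred_at_target, survey_xy, survey_depth,
                   survey_tree_pred, target_xy):
    """Tree-based depth at the target plus the interpolated residual."""
    residuals = [obs - pred for obs, pred in zip(survey_depth, survey_tree_pred)]
    return tree_pred_at_target + idw(survey_xy, residuals, target_xy)


# Two made-up survey points; the tree model predicts 1.0 m everywhere.
xy = [(0.0, 0.0), (2.0, 0.0)]
measured = [1.2, 0.8]
tree_pred = [1.0, 1.0]
midpoint = combined_depth(1.0, xy, measured, tree_pred, (1.0, 0.0))
at_survey = combined_depth(1.0, xy, measured, tree_pred, (0.0, 0.0))
```

At a surveyed location the combined estimate reproduces the measured depth; midway between the two points the opposite residuals cancel and the tree prediction is returned unchanged.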
Can Sap Flow Help Us to Better Understand Transpiration Patterns in Landscapes?
NASA Astrophysics Data System (ADS)
Hassler, S. K.; Weiler, M.; Blume, T.
2017-12-01
Transpiration is a key process in the hydrological cycle and a sound understanding and quantification of transpiration and its spatial variability is essential for management decisions and for improving the parameterisation of hydrological and soil-vegetation-atmosphere transfer models. At the tree scale, transpiration is commonly estimated by measuring sap flow. Besides evaporative demand and water availability, tree-specific characteristics such as species, size or social status, stand-specific characteristics such as basal area or stand density and site-specific characteristics such as geology, slope position or aspect control sap flow of individual trees. However, little is known about the relative importance or the dynamic interplay of these controls. We studied these influences with multiple linear regression models to explain the variability of sap velocity measurements in 61 beech and oak trees, located at 24 sites spread over a 290 km²-catchment in Luxembourg. For each of 132 consecutive days of the growing season of 2014 we applied linear models to the daily spatial pattern of sap velocity and determined the importance of the different predictors. By upscaling sap velocities to the tree level with the help of species-dependent empirical estimates for sapwood area we also examined patterns of sap flow as a more direct representation of transpiration. Results indicate that a combination of mainly tree- and site-specific factors controls sap velocity patterns in this landscape, namely tree species, tree diameter, geology and aspect. For sap flow, the site-specific predictors provided the largest contribution to the explained variance; however, in contrast to the sap velocity analysis, geology was more important than aspect. Spatial variability of atmospheric demand and soil moisture explained only a small fraction of the variance. 
However, the temporal dynamics of the explanatory power of the tree-specific characteristics, especially species, were correlated to the temporal dynamics of potential evaporation. We conclude that spatial representation of transpiration in models could benefit from including patterns according to tree and site characteristics.
Inferring gene regression networks with model trees
2010-01-01
Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes, building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: the Saccharomyces cerevisiae and E. coli data sets. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the Regulon database) is used as control to compare the results to that of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. 
They are very often more precise than linear regression models because they can fit different linear regressions to separate areas of the search space, favoring the inference of localized similarities over a more global similarity. Furthermore, experimental results show the good performance of REGNET. PMID:20950452
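The abstract does not name the procedure used to control the false discovery rate. As one common possibility, the Benjamini-Hochberg step-up procedure is sketched below; its use here is an assumption, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True where it is rejected while
    controlling the false discovery rate at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) whose p-value satisfies p_(k) <= k/m * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Made-up p-values for five candidate gene-gene relationships.
p = [0.001, 0.045, 0.025, 0.20, 0.012]
keep = benjamini_hochberg(p, alpha=0.05)
```

Only the relationships flagged True would be drawn as edges in the resulting network.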
International consensus on preliminary definitions of improvement in adult and juvenile myositis.
Rider, Lisa G; Giannini, Edward H; Brunner, Hermine I; Ruperto, Nicola; James-Newton, Laura; Reed, Ann M; Lachenbruch, Peter A; Miller, Frederick W
2004-07-01
To use a core set of outcome measures to develop preliminary definitions of improvement for adult and juvenile myositis as composite end points for therapeutic trials. Twenty-nine experts in the assessment of myositis achieved consensus on 102 adult and 102 juvenile paper patient profiles as clinically improved or not improved. Two hundred twenty-seven candidate definitions of improvement were developed using the experts' consensus ratings as a gold standard and their judgment of clinically meaningful change in the core set of measures. Seventeen additional candidate definitions of improvement were developed from classification and regression tree analysis, a data-mining decision tree tool analysis. Six candidate definitions specifying percentage change or raw change in the core set of measures were developed using logistic regression analysis. Adult and pediatric working groups ranked the 13 top-performing candidate definitions for face validity, clinical sensibility, and ease of use, in which the sensitivity and specificity were ≥75% in adult, pediatric, and combined data sets. Nominal group technique was used to facilitate consensus formation. The definition of improvement (common to the adult and pediatric working groups) that ranked highest was 3 of any 6 of the core set measures improved by ≥20%, with no more than 2 worse by ≥25% (which could not include manual muscle testing to assess strength). Five and 4 additional preliminary definitions of improvement for adult and juvenile myositis, respectively, were also developed, with several definitions common to both groups. Participants also agreed to prospectively test 6 logistic regression definitions of improvement in clinical trials. Consensus preliminary definitions of improvement were developed for adult and juvenile myositis, and these incorporate clinically meaningful change in all myositis core set measures in a composite end point. 
These definitions require prospective validation, but they are now proposed for use as end points in all myositis trials.
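The top-ranked definition can be written as a small rule check. This sketch assumes percentage changes are signed so that positive values mean improvement, and it reads the parenthetical as "manual muscle testing may not be among the worsened measures"; the measure keys other than "mmt" are placeholders, since the abstract does not list the core set.

```python
def meets_improvement_definition(pct_change, mmt_key="mmt"):
    """pct_change maps each of the six core set measures to its percentage
    change, signed so that positive means improvement."""
    improved = [name for name, change in pct_change.items() if change >= 20]
    worsened = [name for name, change in pct_change.items() if change <= -25]
    return (
        len(improved) >= 3          # at least 3 of 6 improved by >= 20%
        and len(worsened) <= 2      # no more than 2 worse by >= 25%
        and mmt_key not in worsened  # worsening may not include muscle testing
    )


# Made-up patient profiles.
patient_a = {"mmt": 25, "m2": 30, "m3": 20, "m4": 0, "m5": -30, "m6": -10}
patient_b = {"mmt": -30, "m2": 30, "m3": 25, "m4": 40, "m5": 0, "m6": 0}
result_a = meets_improvement_definition(patient_a)
result_b = meets_improvement_definition(patient_b)
```

Patient A meets the definition, while patient B does not because manual muscle testing worsened by more than 25%.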
Lidar-derived estimate and uncertainty of carbon sink in successional phases of woody encroachment
NASA Astrophysics Data System (ADS)
Sankey, Temuulen; Shrestha, Rupesh; Sankey, Joel B.; Hardegree, Stuart; Strand, Eva
2013-07-01
Woody encroachment is a globally occurring phenomenon that contributes to the global carbon sink. The magnitude of this contribution needs to be estimated at regional and local scales to address uncertainties present in the global- and continental-scale estimates, and guide regional policy and management in balancing restoration activities, including removal of woody plants, with greenhouse gas mitigation goals. The objective of this study was to estimate carbon stored in various successional phases of woody encroachment. Using lidar measurements of individual trees, we present high-resolution estimates of aboveground carbon storage in juniper woodlands. Segmentation analysis of lidar point cloud data identified a total of 60,628 juniper tree crowns across four watersheds. Tree heights, canopy cover, and density derived from lidar were strongly correlated with field measurements of 2613 juniper stems measured in 85 plots (30 × 30 m). Aboveground total biomass of individual trees was estimated using a regression model with lidar-derived height and crown area as predictors (Adj. R2 = 0.76, p < 0.001, RMSE = 0.58 kg). The predicted mean aboveground woody carbon storage for the study area was 677 g/m2. Uncertainty in carbon storage estimates was examined with a Monte Carlo approach that addressed major error sources. Ranges predicted with uncertainty analysis in the mean, individual tree, aboveground woody C, and associated standard deviation were 0.35 - 143.6 kg and 0.5 - 1.25 kg, respectively. Later successional phases of woody encroachment had, on average, twice the aboveground carbon relative to earlier phases. Woody encroachment might be more successfully managed and balanced with carbon storage goals by identifying priority areas in earlier phases of encroachment where intensive treatments are most effective.
High-Elevation Sierra Nevada Conifers Reveal Increasing Reliance on Snow Water with Changing Climate
NASA Astrophysics Data System (ADS)
Lepley, K. S.; Meko, D. M.; Touchan, R.; Shamir, E.; Graham, R.
2017-12-01
Snowpack in the Sierra Nevada Mountains accounts for around one third of California's water supply. Melting snow can provide water into dry summer months characteristic of the region's Mediterranean climate. As climate changes, understanding patterns of snowpack, snowmelt, and biological response is critical in this region of agricultural, recreational, and ecological value. Tree rings can act as proxy records to inform scientists and resource managers of past climate variability where instrumental data are unavailable. Here we investigate relationships between tree rings of high-elevation, snow-adapted conifer trees (Tsuga mertensiana, Abies magnifica) and April 1st snow-water equivalent (SWE) in the northern Sierra Nevada Mountains. The 1st principal component of 29 highly correlated regional SWE time series was modeled using multiple linear regression of four tree-ring chronologies including two lagged chronologies. Split-period verification analysis of this model revealed poor predictive skill in the early half (1929 - 1966) of the calibration period (1929 - 2003). Further analysis revealed a significant (p < 0.01) change in correlation between the Tsuga mertensiana chronology and SWE during early (1929 - 1970) and late (1971 - 2013) periods. Running 31-year correlations between this chronology and SWE rose from r = 0.10 in 1950 to r = 0.77 in 1996. This strengthening relationship is coincident with a positive trend in temperature, a negative trend in SWE, and increased variability in precipitation through time. Snow water is becoming a more limiting resource to tree growth as average temperatures rise and the hydrologic regime shifts. These results highlight the need for resource managers and policy makers to consider that biological response to climate is not static.
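The running 31-year correlation mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes two equal-length annual series and a window centred on each year; the abstract does not say whether the authors centred their windows, so that choice, like the function names, is an assumption made here for illustration only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def running_correlation(years, x, y, window=31):
    """Return (centre_year, r) for every full window of `window` years.

    Assumption: the window is centred on the reported year, so the first
    and last window // 2 years of the record get no value.
    """
    half = window // 2
    results = []
    for i in range(half, len(years) - half):
        lo, hi = i - half, i + half + 1
        results.append((years[i], pearson_r(x[lo:hi], y[lo:hi])))
    return results
```

With an 85-year record (1929 to 2013) and a 31-year window, this yields 55 values, centred on 1944 through 1998.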
The wisdom of the commons: ensemble tree classifiers for prostate cancer prognosis.
Koziol, James A; Feng, Anne C; Jia, Zhenyu; Wang, Yipeng; Goodison, Seven; McClelland, Michael; Mercola, Dan
2009-01-01
Classification and regression trees have long been used for cancer diagnosis and prognosis. Nevertheless, instability and variable selection bias, as well as overfitting, are well-known problems of tree-based methods. In this article, we investigate whether ensemble tree classifiers can ameliorate these difficulties, using data from two recent studies of radical prostatectomy in prostate cancer. Using time to progression following prostatectomy as the relevant clinical endpoint, we found that ensemble tree classifiers robustly and reproducibly identified three subgroups of patients in the two clinical datasets: non-progressors, early progressors and late progressors. Moreover, the consensus classifications were independent predictors of time to progression compared to known clinical prognostic factors.
Beating the Odds: Trees to Success in Different Countries
ERIC Educational Resources Information Center
Finch, W. Holmes; Marchant, Gregory J.
2017-01-01
A recursive partitioning model approach in the form of classification and regression trees (CART) was used with 2012 PISA data for five countries (Canada, Finland, Germany, Singapore-China, and the United States). The objective of the study was to determine demographic and educational variables that differentiated between low SES students that were…
Using Classification Trees to Predict Alumni Giving for Higher Education
ERIC Educational Resources Information Center
Weerts, David J.; Ronca, Justin M.
2009-01-01
As the relative level of public support for higher education declines, colleges and universities aim to maximize alumni-giving to keep their programs competitive. Anchored in a utility maximization framework, this study employs the classification and regression tree methodology to examine characteristics of alumni donors and non-donors at a…
The microcomputer scientific software series 5: the BIOMASS user's guide.
George E. Host; Stephen C. Westin; William G. Cole; Kurt S. Pregitzer
1989-01-01
BIOMASS is an interactive microcomputer program that uses allometric regression equations to calculate aboveground biomass of common tree species of the Lake States. The equations are species-specific and most use both diameter and height as independent variables. The program accommodates fixed area and variable radius sample designs and produces both individual tree...
Northern Arkansas Spring Precipitation Reconstructed from Tree Rings, 1023-1992 A.D.
Malcolm K. Cleaveland
2001-01-01
Three baldcypress (Taxodium distichum (L.) Rich.) tree-ring chronologies in northeastern Arkansas and southeastern Missouri respond strongly to April-June (spring) rainfall in northern Arkansas. I used regression to reconstruct an average of spring rainfall in the three climatic divisions of northern Arkansas since 1023 A.D. The reconstruction was...
Decision tree analysis of factors influencing rainfall-related building damage
NASA Astrophysics Data System (ADS)
Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.
2014-04-01
Flood damage prediction models are essential building blocks in flood risk assessments. Little research has been dedicated so far to damage from small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period of 1998-2011. The databases include claims of water-related damage, for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor. Response variables being modelled are average claim size and claim frequency, per district per day. The set of predictors includes rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), building age (property data only), ownership structure (content data only) and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size, which suggests that variability in average claim size is related to explanatory variables that cannot be defined at the district scale. Cross-validation results show that decision trees were able to predict 22-26% of variance in claim frequency, which is considerably better than the results from global multiple regression models (11-18% of variance explained).
Still, a large part of the variance in claim frequency is left unexplained, which is likely to be caused by variations in data at subdistrict scale and missing explanatory variables.
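The core step behind a regression tree, and the "variance explained" figure quoted above, can be sketched in a few lines of Python. The sketch finds the single threshold on one predictor that minimises the summed squared error of the response in the two resulting groups, then reports the in-sample share of variance that split explains. It is not the authors' procedure: they grew full trees on many predictors and reported cross-validated figures. The function names are invented for illustration.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Threshold on x that minimises the total SSE of y in the two child nodes.

    Returns (threshold, sse_after_split); threshold is None when no split
    improves on the unsplit node.
    """
    pairs = sorted(zip(x, y))
    best_t, best_sse = None, sse(list(y))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between identical predictor values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best_sse:
            best_t, best_sse = threshold, total
    return best_t, best_sse


def variance_explained(x, y):
    """In-sample share of the variance of y removed by the best single split.

    Assumes y is not constant (otherwise the total SSE is zero).
    """
    threshold, after = best_split(x, y)
    return threshold, 1.0 - after / sse(list(y))
```

A full tree repeats this search recursively inside each child node and over every predictor; estimating out-of-sample performance, as the abstract does, additionally requires fitting on one part of the data and scoring on another.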
Yang, Yang; Velayudhan, Ajoy; Thornhill, Nina F; Farid, Suzanne S
2017-09-01
The need for high-concentration formulations for subcutaneous delivery of therapeutic monoclonal antibodies (mAbs) can present manufacturability challenges for the final ultrafiltration/diafiltration (UF/DF) step. Viscosity levels and the propensity to aggregate are key considerations for high-concentration formulations. This work presents novel frameworks for deriving a set of manufacturability indices related to viscosity and thermostability to rank high-concentration mAb formulation conditions in terms of their ease of manufacture. This is illustrated by analyzing published high-throughput biophysical screening data that explores the influence of different formulation conditions (pH, ions, and excipients) on the solution viscosity and product thermostability. A decision tree classification method, CART (Classification and Regression Tree), is used to identify the critical formulation conditions that influence the viscosity and thermostability. In this work, three different multi-criteria data analysis frameworks were investigated to derive manufacturability indices from analysis of the stress maps and the process conditions experienced in the final UF/DF step. Polynomial regression techniques were used to transform the experimental data into a set of stress maps that show viscosity and thermostability as functions of the formulation conditions. A mathematical filtrate flux model was used to capture the time profiles of protein concentration and flux decay behavior during UF/DF. Multi-criteria decision-making analysis was used to identify the optimal formulation conditions that minimize the potential for both viscosity and aggregation issues during UF/DF. Biotechnol. Bioeng. 2017;114: 2043-2056. © 2017 The Authors. Biotechnology and Bioengineering Published by Wiley Periodicals, Inc.
Yu, Huibin; Song, Yonghui; Liu, Ruixia; Pan, Hongwei; Xiang, Liancheng; Qian, Feng
2014-10-01
The stabilization of latent tracers of dissolved organic matter (DOM) of wastewater was analyzed by three-dimensional excitation-emission matrix (EEM) fluorescence spectroscopy coupled with self-organizing map and classification and regression tree analysis (CART) in wastewater treatment performance. DOM of water samples collected from primary sedimentation, anaerobic, anoxic, oxic and secondary sedimentation tanks in a large-scale wastewater treatment plant contained four fluorescence components: tryptophan-like (C1), tyrosine-like (C2), microbial humic-like (C3) and fulvic-like (C4) materials extracted by self-organizing map. These components showed good positive linear correlations with dissolved organic carbon of DOM. C1 and C2 were representative components in the wastewater, and they were removed to a higher extent than those of C3 and C4 in the treatment process. C2 was a latent parameter determined by CART to differentiate water samples of oxic and secondary sedimentation tanks from the successive treatment units, indirectly proving that most of tyrosine-like material was degraded by anaerobic microorganisms. C1 was an accurate parameter to comprehensively separate the samples of the five treatment units from each other, indirectly indicating that tryptophan-like material was decomposed by anaerobic and aerobic bacteria. EEM fluorescence spectroscopy in combination with self-organizing map and CART analysis can be a nondestructive effective method for characterizing structural component of DOM fractions and monitoring organic matter removal in wastewater treatment process. Copyright © 2014 Elsevier Ltd. All rights reserved.
Velayudhan, Ajoy; Thornhill, Nina F.
2017-01-01
The need for high‐concentration formulations for subcutaneous delivery of therapeutic monoclonal antibodies (mAbs) can present manufacturability challenges for the final ultrafiltration/diafiltration (UF/DF) step. Viscosity levels and the propensity to aggregate are key considerations for high‐concentration formulations. This work presents novel frameworks for deriving a set of manufacturability indices related to viscosity and thermostability to rank high‐concentration mAb formulation conditions in terms of their ease of manufacture. This is illustrated by analyzing published high‐throughput biophysical screening data that explores the influence of different formulation conditions (pH, ions, and excipients) on the solution viscosity and product thermostability. A decision tree classification method, CART (Classification and Regression Tree), is used to identify the critical formulation conditions that influence the viscosity and thermostability. In this work, three different multi‐criteria data analysis frameworks were investigated to derive manufacturability indices from analysis of the stress maps and the process conditions experienced in the final UF/DF step. Polynomial regression techniques were used to transform the experimental data into a set of stress maps that show viscosity and thermostability as functions of the formulation conditions. A mathematical filtrate flux model was used to capture the time profiles of protein concentration and flux decay behavior during UF/DF. Multi‐criteria decision‐making analysis was used to identify the optimal formulation conditions that minimize the potential for both viscosity and aggregation issues during UF/DF. Biotechnol. Bioeng. 2017;114: 2043–2056. © 2017 The Authors. Biotechnology and Bioengineering Published by Wiley Periodicals, Inc. PMID:28464235
Altamirano, J; Augustin, S; Muntaner, L; Zapata, L; González-Angulo, A; Martínez, B; Flores-Arroyo, A; Camargo, L; Genescá, J
2010-01-01
Variceal bleeding (VB) is the main cause of death among cirrhotic patients. About 30-50% of early rebleeding occurs within a few days of the acute episode of VB. It is necessary to identify patients at high risk of very early rebleeding (VER) for more aggressive therapies. However, prognostic models for this purpose are few and incompletely understood. The aims were to determine the risk factors associated with VER after an acute VB, and to assess and compare a novel prognostic model generated by Classification and Regression Tree Analysis (CART) with the commonly used models (MELD and Child-Pugh [CP]). Sixty consecutive cirrhotic patients with acute variceal bleeding were studied. CART analysis, MELD and Child-Pugh scores were performed at admission. Receiver operating characteristic (ROC) curves were constructed to evaluate the predictive performance of the models. The very early rebleeding rate was 13%. Variables associated with VER were: serum albumin (p = 0.027), creatinine (p = 0.021) and transfused blood units in the first 24 hrs (p = 0.05). The areas under the ROC curve for MELD, Child-Pugh and CART were 0.46, 0.50 and 0.82, respectively. The cut-off values identified by CART for the significant variables were: 1) albumin 2.85 mg/dL, 2) packed red cells 2 units and 3) creatinine 1.65 mg/dL. Serum albumin, creatinine and number of transfused blood units were associated with VER. A simple CART algorithm combining these variables allows an accurate predictive assessment of VER after acute variceal bleeding. Key words: cirrhosis, variceal bleeding, esophageal varices, prognosis, portal hypertension.
Tzeng, Hsy-Yu; Wang, Wei; Tseng, Yen-Hsueh; Chiu, Ching-An; Kuo, Chu-Chia
2018-01-01
Global warming-induced extreme climatic changes have increased the frequency of severe typhoons bringing heavy rains; this has considerably affected the stability of the forest ecosystems. Since the Taiwan 921 earthquake occurred on 21 September 1999, the mountain geology of the Island of Taiwan has become unstable and typhoon-induced floods and mudslides have changed the topography and geomorphology of the area; this has further affected the stability and functions of the riparian ecosystem. In this study, the vegetation of the unique Aowanda Formosan gum forest in Central Taiwan was monitored for 3 years after the occurrence of floods and mudslides during 2009–2011. Tree growth and survival, effects of floods and mudslides, and factors influencing tree survival were investigated. We hypothesized that (1) the effects of floods on the survival are significantly different for each tree species; (2) tree diameter at breast height (DBH) affects tree survival, i.e., the larger the DBH, the higher the survival rate; and (3) the relative position of trees affects tree survival after disturbances by floods and mudslides, i.e., the farther trees are from the river, the higher their survival rate. Our results showed that after floods and mudslides, the lifespans of the major tree species varied significantly. Liquidambar formosana displayed the highest flood tolerance, and the trunks of Lagerstroemia subcostata began rooting after disturbances. Multiple regression analysis indicated that factors such as species, DBH, distance from sampled tree to the above boundary of sample plot (far from the riverbank), and distance from the upstream of the river affected the lifespans of trees; the three factors affected each tree species to different degrees. Furthermore, we showed that insect infestation had a critical role in determining tree survival rate.
Our 3-year monitoring investigation revealed that severe typhoon-induced floods and mudslides disturbed the riparian vegetation in the Formosan gum forest, replacing the original vegetation and beginning secondary succession. Moreover, flooding provided new habitats for various plants to establish their progeny. By using our results, lifecycles of trees (including death) can be understood in detail, facilitating riparian vegetation engineering in forests severely disturbed by typhoon-induced floods and mudslides. PMID:29304149
Tzeng, Hsy-Yu; Wang, Wei; Tseng, Yen-Hsueh; Chiu, Ching-An; Kuo, Chu-Chia; Tsai, Shang-Te
2018-01-01
Global warming-induced extreme climatic changes have increased the frequency of severe typhoons bringing heavy rains; this has considerably affected the stability of the forest ecosystems. Since the Taiwan 921 earthquake occurred on 21 September 1999, the mountain geology of the Island of Taiwan has become unstable and typhoon-induced floods and mudslides have changed the topography and geomorphology of the area; this has further affected the stability and functions of the riparian ecosystem. In this study, the vegetation of the unique Aowanda Formosan gum forest in Central Taiwan was monitored for 3 years after the occurrence of floods and mudslides during 2009-2011. Tree growth and survival, effects of floods and mudslides, and factors influencing tree survival were investigated. We hypothesized that (1) the effects of floods on the survival are significantly different for each tree species; (2) tree diameter at breast height (DBH) affects tree survival, i.e., the larger the DBH, the higher the survival rate; and (3) the relative position of trees affects tree survival after disturbances by floods and mudslides, i.e., the farther trees are from the river, the higher their survival rate. Our results showed that after floods and mudslides, the lifespans of the major tree species varied significantly. Liquidambar formosana displayed the highest flood tolerance, and the trunks of Lagerstroemia subcostata began rooting after disturbances. Multiple regression analysis indicated that factors such as species, DBH, distance from sampled tree to the above boundary of sample plot (far from the riverbank), and distance from the upstream of the river affected the lifespans of trees; the three factors affected each tree species to different degrees. Furthermore, we showed that insect infestation had a critical role in determining tree survival rate.
Our 3-year monitoring investigation revealed that severe typhoon-induced floods and mudslides disturbed the riparian vegetation in the Formosan gum forest, replacing the original vegetation and beginning secondary succession. Moreover, flooding provided new habitats for various plants to establish their progeny. By using our results, lifecycles of trees (including death) can be understood in detail, facilitating riparian vegetation engineering in forests severely disturbed by typhoon-induced floods and mudslides.
Stemflow in low-density and hedgerow olive orchards in Portugal
NASA Astrophysics Data System (ADS)
Dias, Pedro D.; Valente, Fernanda; Pereira, Fernando L.; Abreu, Francisco G.
2015-04-01
Stemflow (Sf) is responsible for a localized water and solute input to soil around tree's trunks, playing an important eco-hydrological role in forest and agricultural ecosystems. Sf was monitored for seven months in 25 Olea europaea L. trees distributed in three orchards managed in two different ways, traditional low-density and super high density hedgerow. The orchards were located in central Portugal in the regions of Santarém (Várzea and Azóia) and Lisboa (Tapada). Seven olive varieties were analysed: Arbequina, Galega, Picual, Maçanilha, Cordovil, Azeiteira, Negrinha and Blanqueta. Measured Sf ranged from 7.5 to 87.2 mm (relative to crown-projected area), corresponding to 1.2 and 16.7% of gross rainfall (Pg). To understand better the variables that affect Sf and to be able to predict its value, linear regression models were fitted to these data. Whenever possible, the linear models were simplified using the backward stepwise algorithm based on the Akaike information criterion. For each tree, multiple linear regressions were adjusted between Sf and the duration, volume and intensity of rainfall episodes and maximum evaporation rate. In the low-density Várzea grove the more relevant explanatory variables were the three rainfall characteristics. In the super high density Azóia orchard only rainfall volume and intensity were considered relevant. In the low-density Tapada's grove all trees had a different sub-model with Pg being the only common variable. To try to explain differences between trees and to improve the quality of the modeling in each orchard, another set of explanatory variables was added: canopy volume, tree and trunk heights and trunk perimeter at the height of the first branches. The variables present in all sub-models were rainfall volume and intensity and the tree and trunk heights. Canopy volume and rainfall duration were also present in the sub-models of the two low-density groves (Tapada and Várzea). 
The determination coefficient (R2) of all models ranged from 0.5 to 0.76. The size of leaves was also analysed. Although there were significant differences between varieties and between trees of the same variety, they did not seem to affect the amount of Sf generated. Through analysis of bark storage capacity, it was found that older trees, with rough and thick bark, had higher trunk storage capacity and, therefore, originated less Sf. The results confirm the need for considering the contribution of stemflow when trying to correctly assess interception loss in olive orchards. Although the use of simple and general statistical models may be an attractive option, their precision may be small, making direct measurements or conceptual modelling preferable methods.
Kropat, Georg; Bochud, Francois; Jaboyedoff, Michel; Laedermann, Jean-Pascal; Murith, Christophe; Palacios Gruson, Martha; Baechler, Sébastien
2015-09-01
According to estimations around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary in order to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics and to develop mapping and predictive tools in order to improve local radon prediction. About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pair-wise Kolmogorov distances between IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). The automated classification groups lithological units well in terms of their IRC characteristics. Especially the IRC differences in metamorphic rocks like gneiss are well revealed by this method. The maps produced by random forests soundly represent the regional difference of IRCs in Switzerland and improve the spatial detail compared to existing approaches. We could explain 33% of the variations in IRC data with random forests. Additionally, the influence of a variable evaluated by random forests shows that building characteristics are less important predictors for IRCs than spatial/geological influences. BART could explain 29% of IRC variability and produced maps that indicate the prediction uncertainty. Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRCs. Automatic clustering of lithological units complements this method by facilitating the interpretation of radon properties of rock types. This study provides an important element for radon risk communication. 
Future approaches should consider taking into account further variables like soil gas radon measurements as well as more detailed geological information. Copyright © 2015 Elsevier Ltd. All rights reserved.
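The pair-wise Kolmogorov distance used above to compare lithological units can be sketched in Python as the largest gap between two empirical cumulative distribution functions. The sketch covers only the distance step; the k-medoids clustering that the study applies to these distances is not shown. The function names and the dictionary layout (unit name mapped to a list of measurements) are assumptions made for illustration.

```python
def kolmogorov_distance(a, b):
    """Largest absolute gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        value = min(a[i], b[j])
        # Step both CDFs past every observation equal to the current value.
        while i < len(a) and a[i] == value:
            i += 1
        while j < len(b) and b[j] == value:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    # Once one sample is exhausted its CDF equals 1 and the gap only shrinks.
    return gap


def pairwise_distances(groups):
    """Distance for every unordered pair of named groups.

    `groups` maps a (hypothetical) unit name to its list of measurements.
    """
    names = sorted(groups)
    return {
        (p, q): kolmogorov_distance(groups[p], groups[q])
        for p in names
        for q in names
        if p < q
    }
```

A distance of 0 means the two samples have identical empirical distributions and 1 means they do not overlap at all, which is what lets a clustering method group units with similar radon characteristics.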
Explaining Match Outcome During The Men’s Basketball Tournament at The Olympic Games
Leicht, Anthony S.; Gómez, Miguel A.; Woods, Carl T.
2017-01-01
In preparation for the Olympics, there is a limited opportunity for coaches and athletes to interact regularly with team performance indicators providing important guidance to coaches for enhanced match success at the elite level. This study examined the relationship between match outcome and team performance indicators during men’s basketball tournaments at the Olympic Games. Twelve team performance indicators were collated from all men’s teams and matches during the basketball tournament of the 2004-2016 Olympic Games (n = 156). Linear and non-linear analyses examined the relationship between match outcome and team performance indicator characteristics; namely, binary logistic regression and a conditional interference (CI) classification tree. The most parsimonious logistic regression model retained ‘assists’, ‘defensive rebounds’, ‘field-goal percentage’, ‘fouls’, ‘fouls against’, ‘steals’ and ‘turnovers’ (delta AIC <0.01; Akaike weight = 0.28) with a classification accuracy of 85.5%. Conversely, four performance indicators were retained with the CI classification tree with an average classification accuracy of 81.4%. However, it was the combination of ‘field-goal percentage’ and ‘defensive rebounds’ that provided the greatest probability of winning (93.2%). Match outcome during the men’s basketball tournaments at the Olympic Games was identified by a unique combination of performance indicators. Despite the average model accuracy being marginally higher for the logistic regression analysis, the CI classification tree offered a greater practical utility for coaches through its resolution of non-linear phenomena to guide team success. Key points A unique combination of team performance indicators explained 93.2% of winning observations in men’s basketball at the Olympics. Monitoring of these team performance indicators may provide coaches with the capability to devise multiple game plans or strategies to enhance their likelihood of winning. 
Incorporation of machine learning techniques with team performance indicators may provide a valuable and strategic approach to explain patterns within multivariate datasets in sport science. PMID:29238245
NASA Astrophysics Data System (ADS)
Mangla, Rohit; Kumar, Shashi; Nandy, Subrata
2016-05-01
SAR and LiDAR remote sensing have already shown the potential of active sensors for forest parameter retrieval. SAR sensor in its fully polarimetric mode has an advantage to retrieve scattering property of different component of forest structure and LiDAR has the capability to measure structural information with very high accuracy. This study was focused on retrieval of forest aboveground biomass (AGB) using Terrestrial Laser Scanner (TLS) based point clouds and scattering property of forest vegetation obtained from decomposition modelling of RISAT-1 fully polarimetric SAR data. TLS data was acquired for 14 plots of Timli forest range, Uttarakhand, India. The forest area is dominated by Sal trees and random sampling with plot size of 0.1 ha (31.62m*31.62m) was adopted for TLS and field data collection. RISAT-1 data was processed to retrieve SAR data based variables and TLS point clouds based 3D imaging was done to retrieve LiDAR based variables. Surface scattering, double-bounce scattering, volume scattering, helix and wire scattering were the SAR based variables retrieved from polarimetric decomposition. Tree heights and stem diameters were used as LiDAR based variables retrieved from single tree vertical height and least square circle fit methods respectively. All the variables obtained for forest plots were used as an input in a machine learning based Random Forest Regression Model, which was developed in this study for forest AGB estimation. Modelled output for forest AGB showed reliable accuracy (RMSE = 27.68 t/ha) and a good coefficient of determination (0.63) was obtained through the linear regression between modelled AGB and field-estimated AGB. The sensitivity analysis showed that the model was more sensitive for the major contributed variables (stem diameter and volume scattering) and these variables were measured from two different remote sensing techniques. This study strongly recommends the integration of SAR and LiDAR data for forest AGB estimation.
Nattee, Cholwich; Khamsemanan, Nirattaya; Lawtrakul, Luckhana; Toochinda, Pisanu; Hannongbua, Supa
2017-01-01
Malaria is still one of the most serious diseases in tropical regions. This is due in part to the high resistance against available drugs for the inhibition of parasites, Plasmodium, the cause of the disease. New potent compounds with high clinical utility are urgently needed. In this work, we created a novel model using a regression tree to study structure-activity relationships and predict the inhibition constant, Ki, of three different antimalarial analogues (Trimethoprim, Pyrimethamine, and Cycloguanil) based on their molecular descriptors. To the best of our knowledge, this work is the first attempt to study the structure-activity relationships of all three analogues combined. The most relevant descriptors and appropriate parameters of the regression tree are harvested using extremely randomized trees. These descriptors are water accessible surface area, Log of the aqueous solubility, total hydrophobic van der Waals surface area, and molecular refractivity. Out of all possible combinations of these selected parameters and descriptors, the tree with the strongest coefficient of determination is selected to be our prediction model. Predicted Ki values from the proposed model show a strong coefficient of determination, R2 = 0.996, to experimental Ki values. From the structure of the regression tree, compounds with high accessible surface area of all hydrophobic atoms (ASA_H) and low aqueous solubility of inhibitors (Log S) generally possess low Ki values. Our prediction model can also be utilized as a screening test for new antimalarial drug compounds which may reduce the time and expenses for new drug development. New compounds with high predicted Ki should be excluded from further drug development. It is also our inference that a threshold of ASA_H greater than 575.80 and Log S less than or equal to -4.36 is a sufficient condition for a new compound to possess a low Ki. Copyright © 2016 Elsevier Inc. All rights reserved.
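The screening rule stated at the end of this abstract can be written as a short Python check. The two thresholds are taken from the abstract; the function and constant names are invented here. Because the abstract describes the rule as a sufficient condition only, a False result means the rule does not apply, not that the inhibition constant is high.

```python
ASA_H_THRESHOLD = 575.80  # threshold quoted in the abstract
LOG_S_THRESHOLD = -4.36   # threshold quoted in the abstract


def meets_low_ki_condition(asa_h, log_s):
    """True when the abstract's sufficient condition for a low inhibition constant holds.

    asa_h: accessible surface area of all hydrophobic atoms (ASA_H).
    log_s: log of the aqueous solubility (Log S).
    """
    return asa_h > ASA_H_THRESHOLD and log_s <= LOG_S_THRESHOLD
```

Note the asymmetry in the comparison operators: ASA_H must be strictly greater than its threshold, while Log S may equal its threshold, as stated in the abstract.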
Koch, George W; Sillett, Stephen C; Jennings, Gregory M; Davis, Stephen D
2004-04-22
Trees grow tall where resources are abundant, stresses are minor, and competition for light places a premium on height growth. The height to which trees can grow and the biophysical determinants of maximum height are poorly understood. Some models predict heights of up to 120 m in the absence of mechanical damage, but there are historical accounts of taller trees. Current hypotheses of height limitation focus on increasing water transport constraints in taller trees and the resulting reductions in leaf photosynthesis. We studied redwoods (Sequoia sempervirens), including the tallest known tree on Earth (112.7 m), in wet temperate forests of northern California. Our regression analyses of height gradients in leaf functional characteristics estimate a maximum tree height of 122-130 m barring mechanical damage, similar to the tallest recorded trees of the past. As trees grow taller, increasing leaf water stress due to gravity and path length resistance may ultimately limit leaf expansion and photosynthesis for further height growth, even with ample soil moisture.
Comparison of Sub-Pixel Classification Approaches for Crop-Specific Mapping
This paper examined two non-linear models, Multilayer Perceptron (MLP) regression and Regression Tree (RT), for estimating sub-pixel crop proportions using time-series MODIS-NDVI data. The sub-pixel proportions were estimated for three major crop types including corn, soybean, a...
Explicit criteria for prioritization of cataract surgery
Ma Quintana, José; Escobar, Antonio; Bilbao, Amaia
2006-01-01
Background Consensus techniques have been used previously to create explicit criteria to prioritize cataract extraction; however, the appropriateness of the intervention was not included explicitly in previous studies. We developed a prioritization tool for cataract extraction according to the RAND method. Methods Criteria were developed using a modified Delphi panel judgment process. A panel of 11 ophthalmologists was assembled. Ratings were analyzed regarding the level of agreement among panelists. We studied the effect of all variables on the final panel score using general linear and logistic regression models. Priority scoring systems were developed by means of optimal scaling and general linear models. The explicit criteria developed were summarized by means of regression tree analysis. Results Eight variables were considered to create the indications. Of the 310 indications that the panel evaluated, 22.6% were considered high priority, 52.3% intermediate priority, and 25.2% low priority. Agreement was reached for 31.9% of the indications and disagreement for 0.3%. Logistic regression and general linear models showed that the preoperative visual acuity of the cataractous eye, visual function, and anticipated visual acuity postoperatively were the most influential variables. Alternative and simple scoring systems were obtained by optimal scaling and general linear models where the previous variables were also the most important. The decision tree also shows the importance of the previous variables and the appropriateness of the intervention. Conclusion Our results showed acceptable validity as an evaluation and management tool for prioritizing cataract extraction. It also provides easy algorithms for use in clinical practice. PMID:16512893
Leaf-on canopy closure in broadleaf deciduous forests predicted during winter
Twedt, Daniel J.; Ayala, Andrea J.; Shickel, Madeline R.
2015-01-01
Forest canopy influences light transmittance, which in turn affects tree regeneration and survival, thereby having an impact on forest composition and habitat conditions for wildlife. Because leaf area is the primary impediment to light penetration, quantitative estimates of canopy closure are normally made during summer. Studies of forest structure and wildlife habitat that occur during winter, when deciduous trees have shed their leaves, may inaccurately estimate canopy closure. We estimated percent canopy closure during both summer (leaf-on) and winter (leaf-off) in broadleaf deciduous forests in Mississippi and Louisiana using gap light analysis of hemispherical photographs that were obtained during repeat visits to the same locations within bottomland and mesic upland hardwood forests and hardwood plantation forests. We used mixed-model linear regression to predict leaf-on canopy closure from measurements of leaf-off canopy closure, basal area, stem density, and tree height. Competing predictive models all included leaf-off canopy closure (relative importance = 0.93), whereas basal area and stem density, more traditional predictors of canopy closure, had relative model importance of ≤ 0.51.
NASA Astrophysics Data System (ADS)
Susanti, Yuliana; Zukhronah, Etik; Pratiwi, Hasih; Respatiwulan; Sri Sulistijowati, H.
2017-11-01
To achieve food resilience in Indonesia, food diversification that explores the potential of local foods is required. Corn is one of the alternative staple foods of Javanese society. For that reason, corn production needs to be improved by considering the factors that influence it. CHAID and CRT are data mining methods that can be used to classify the influencing variables. The present study seeks information on the potential local availability of corn in the regencies and cities of Java Island. CHAID analysis yields four classifications with an accuracy of 78.8%, while CRT analysis yields seven classifications with an accuracy of 79.6%.
Predicting heavy metal concentrations in soils and plants using field spectrophotometry
NASA Astrophysics Data System (ADS)
Muradyan, V.; Tepanosyan, G.; Asmaryan, Sh.; Sahakyan, L.; Saghatelyan, A.; Warner, T. A.
2017-09-01
The aim of this study is to predict heavy metal (HM) concentrations in soils and plants using field remote sensing methods. The studied sites were the industrial town of Kajaran and the city of Yerevan. The research also included sampling of soils and leaves of two tree species exposed to different pollution levels and determination of HM contents under laboratory conditions. The obtained spectral values were then collated with the HM contents of Kajaran soils and of the tree leaves sampled in Yerevan, and statistical analysis was performed. Zn and Pb have a negative correlation coefficient (p < 0.01) in the 2498 nm spectral range for soils. Pb has a significantly higher correlation at the red edge for plants. Regression models and an artificial neural network (ANN) for HM prediction were developed. Good results were obtained for the best stress-sensitive spectral band ANN (R2 0.9, RPD 2.0), Simple Linear Regression (SLR) and Partial Least Squares Regression (PLSR) (R2 0.7, RPD 1.4) models. The Multiple Linear Regression (MLR) model was not applicable for predicting Pb and Zn concentrations in soils in this research. Almost all full-spectrum PLS models provide good calibration and validation results (RPD > 1.4). Full-spectrum ANN models are characterized by excellent calibration R2, rRMSE and RPD (0.9, 0.1 and > 2.5, respectively). For prediction of Pb and Ni contents in plants, SLR and PLS models were used; the two provide almost the same results. Our findings indicate that a coarse direct estimation of HM content in soils and plants is possible using rapid and economical reflectance spectroscopy.
Ricker, Martin; Peña Ramírez, Víctor M.; von Rosen, Dietrich
2014-01-01
Growth curves are monotonically increasing functions that measure repeatedly the same subjects over time. The classical growth curve model in the statistical literature is the Generalized Multivariate Analysis of Variance (GMANOVA) model. In order to model the tree trunk radius (r) over time (t) of trees on different sites, GMANOVA is combined here with the adapted PL regression model Q = A·T+E, where for and for , A = initial relative growth to be estimated, , and E is an error term for each tree and time point. Furthermore, Ei[–b·r] = , , with TPR being the turning point radius in a sigmoid curve, and at is an estimated calibrating time-radius point. Advantages of the approach are that growth rates can be compared among growth curves with different turning point radiuses and different starting points, hidden outliers are easily detectable, the method is statistically robust, and heteroscedasticity of the residuals among time points is allowed. The model was implemented with dendrochronological data of 235 Pinus montezumae trees on ten Mexican volcano sites to calculate comparison intervals for the estimated initial relative growth . One site (at the Popocatépetl volcano) stood out, with being 3.9 times the value of the site with the slowest-growing trees. Calculating variance components for the initial relative growth, 34% of the growth variation was found among sites, 31% among trees, and 35% over time. Without the Popocatépetl site, the numbers changed to 7%, 42%, and 51%. Further explanation of differences in growth would need to focus on factors that vary within sites and over time. PMID:25402427
Remote sensing of changes in morphology and physiology of trees under stress
NASA Technical Reports Server (NTRS)
Olson, C. E., Jr.; Rohde, W. G.; Ward, J. M.
1970-01-01
Results of continuing studies of forest trees subjected to varying types of stress are reported. Both greenhouse and field studies are included. Greenhouse work with tree seedlings exposed to varying levels of NaCl and CaCl2 in the soil indicated that, in the initial stages, palisade cells shrink and the amount of air space in the leaf increases. As the severity of damage increases, the cells of the spongy mesophyll shrink and flatten, and the amount of air space in the leaf decreases. Statistical analysis of foliar reflectance and associated moisture content data led to a series of regression equations for predicting foliar moisture content from reflectance data. Equations were calculated for three species, yellow birch (Betula alleghaniensis Britton), sugar maple (Acer saccharum Marsh.) and white ash (Fraxinus americana L.) having multiple correlation coefficients of 0.98, 0.94 and 0.93 respectively. Interpretation of multispectral imagery of the Ann Arbor Forestry Test Site (NASA Site 190) provided evidence that infections of Fomes annosus can be detected in the early stages. Infections of two needle cast diseases were also detected in conifer plantations in the test site. A study of automatic interpretation of multispectral scanner imagery for tree species recognition provided encouraging results.
Gentilesca, Tiziana; Rita, Angelo; Brunetti, Michele; Giammarchi, Francesco; Leonardi, Stefano; Magnani, Federico; van Noije, Twan; Tonon, Giustino; Borghetti, Marco
2018-07-01
In this study, we investigated the role of climatic variability and atmospheric nitrogen deposition in driving long-term tree growth in canopy beech trees along a geographic gradient in the montane belt of the Italian peninsula, from the Alps to the southern Apennines. We sampled dominant trees at different developmental stages (from young to mature tree cohorts, with tree ages spanning from 35 to 160 years) and used stem analysis to infer historic reconstruction of tree volume and dominant height. Annual growth volume (G_V) and height (G_H) variability were related to annual variability in model-simulated atmospheric nitrogen deposition, site-specific climatic variables (i.e. mean annual temperature, total annual precipitation, mean growing period temperature, total growing period precipitation, and standard precipitation evapotranspiration index) and atmospheric CO2 concentration, including tree cambial age among growth predictors. Generalized additive models (GAM), linear mixed-effects models (LMM), and Bayesian regression models (BRM) were independently employed to assess explanatory variables. The main results from our study were as follows: (i) tree age was the main explanatory variable for long-term growth variability; (ii) GAM, LMM, and BRM results consistently indicated that climatic variables and CO2 effects on G_V and G_H were weak, therefore evidence of recent climatic variability influence on beech annual growth rates was limited in the montane belt of the Italian peninsula; (iii) instead, significant positive nitrogen deposition (N_dep) effects were repeatedly observed in G_V and G_H; the positive effects of N_dep on canopy height growth rates, which tended to level off at N_dep values greater than approximately 1.0 g m-2 y-1, were interpreted as positive impacts on forest stand above-ground net productivity at the selected study sites. © 2018 John Wiley & Sons Ltd.
Quantifying post-fire fallen trees using multi-temporal lidar
NASA Astrophysics Data System (ADS)
Bohlin, Inka; Olsson, Håkan; Bohlin, Jonas; Granström, Anders
2017-12-01
Massive tree-felling due to root damage is a common fire effect on burnt areas in Scandinavia, but has so far not been analyzed in detail. Here we explore if pre- and post-fire lidar data can be used to estimate the proportion of fallen trees. The study was carried out within a large (14,000 ha) area in central Sweden burnt in August 2014, where we had access to airborne lidar data from both 2011 and 2015. Three data-sets of predictor variables were tested: POST (post-fire lidar metrics), DIF (difference between post- and pre-fire lidar metrics) and combination of those two (POST_DIF). Fractional logistic regression was used to predict the proportion of fallen trees. Training data consisted of 61 plots, where the number of fallen and standing trees was calculated both in the field and with interpretation of drone images. The accuracy of the best model was tested based on 100 randomly selected validation plots with a size of 25 × 25 m. Our results showed that multi-temporal lidar together with field-collected training data can be used for quantifying post-fire tree felling over large areas. Several height-, density- and intensity metrics correlated with the proportion of fallen trees. The best model combined metrics from both datasets (POST_DIF), resulting in a RMSE of 0.11. Results were slightly poorer in the validation plots with RMSE of 0.18 using pixel size of 12.5 m and RMSE of 0.15 using pixel size of 6.25 m. Our model performed least well for stands that had been exposed to high-intensity crown fire. This was likely due to the low amount of echoes from the standing black tree skeletons. Wall-to-wall maps produced with this model can be used for landscape level analysis of fire effects and to explore the relationship between fallen trees and forest structure, soil type, fire intensity or topography.
Anderson, S.C.; Kupfer, J.A.; Wilson, R.R.; Cooper, R.J.
2000-01-01
The purpose of this research was to develop a model that could be used to provide a spatial representation of the effects of uneven-aged silvicultural treatments on forest crown area. We began by developing species-specific linear regression equations relating tree DBH to crown area for eight bottomland tree species at White River National Wildlife Refuge, Arkansas, USA. The relationships were highly significant for all species, with coefficients of determination (r2) ranging from 0.37 for Ulmus crassifolia to nearly 0.80 for Quercus nuttallii and Taxodium distichum. We next located and measured the diameters of more than 4000 stumps from a single tree-group selection timber harvest. Stump locations were recorded with respect to an established grid point system and entered into a Geographic Information System (ARC/INFO). The area occupied by the crown of each logged individual was then estimated by using the stump dimensions (adjusted to DBHs) and the regression equations relating tree DBH to crown area. Our model projected that the selection cuts removed roughly 300 m2 of basal area from the logged sites, resulting in the loss of approximately 55 000 m2 of crown area. The model developed in this research represents a tool that can be used in conjunction with remote sensing applications to assist in forest inventory and management, as well as to estimate the impacts of selective timber harvest on wildlife.
Annual Tree Growth Predictions From Periodic Measurements
Quang V. Cao
2004-01-01
Data from annual measurements of a loblolly pine (Pinus taeda L.) plantation were available for this study. Regression techniques were employed to model annual changes of individual trees in terms of diameters, heights, and survival probabilities. Subsets of the data that include measurements every 2, 3, 4, 5, and 6 years were used to fit the same...
Understory response following varying levels of overstory removal in mixed conifer stands
Fabian C.C. Uzoh; Leroy K. Dolph; John R. Anstead
1997-01-01
Diameter growth rates of understory trees were measured for periods both before and after overstory removal on six study areas in northern California. All the species responded with increased diameter growth after adjusting to their new environments. Linear regression equations that predict post treatment diameter growth increment of the residual trees are presented...
Delayed conifer tree mortality following fire in California
Sharon M. Hood; Sheri L. Smith; Daniel R. Cluck
2007-01-01
Fire injury was characterized and survival monitored for 5,246 trees from five wildfires in California that occurred between 1999 and 2002. Logistic regression models for predicting the probability of mortality were developed for incense-cedar, Jeffrey pine, ponderosa pine, red fir and white fir. Two-year post-fire preliminary models were developed for incense-cedar,...
Estimating leaf area and leaf biomass of open-grown deciduous urban trees
David J. Nowak
1996-01-01
Logarithmic regression equations were developed to predict leaf area and leaf biomass for open-grown deciduous urban trees based on stem diameter and crown parameters. Equations based on crown parameters produced more reliable estimates. The equations can be used to help quantify forest structure and functions, particularly in urbanizing and urban/suburban areas.
Biomass of Yellow-Poplar in Natural Stands in Western North Carolina
Alexander Clark; James G. Schroeder
1977-01-01
Aboveground biomass was determined for yellow-poplar (Liriodendron tulipifera L.) trees 6 to 28 inches d.b.h. growing in natural, uneven-aged mountain cove stands in western North Carolina. Specific gravity, moisture content, and green weight per cubic foot are presented for the total tree and its components. Tables developed from regression equations show weight and...
The wisdom of the commons: ensemble tree classifiers for prostate cancer prognosis
Koziol, James A.; Feng, Anne C.; Jia, Zhenyu; Wang, Yipeng; Goodison, Seven; McClelland, Michael; Mercola, Dan
2009-01-01
Motivation: Classification and regression trees have long been used for cancer diagnosis and prognosis. Nevertheless, instability and variable selection bias, as well as overfitting, are well-known problems of tree-based methods. In this article, we investigate whether ensemble tree classifiers can ameliorate these difficulties, using data from two recent studies of radical prostatectomy in prostate cancer. Results: Using time to progression following prostatectomy as the relevant clinical endpoint, we found that ensemble tree classifiers robustly and reproducibly identified three subgroups of patients in the two clinical datasets: non-progressors, early progressors and late progressors. Moreover, the consensus classifications were independent predictors of time to progression compared to known clinical prognostic factors. Contact: dmercola@uci.edu PMID:18628288
Decision tree modeling using R.
Zhang, Zhongheng
2016-08-01
In the machine learning field, the decision tree learner is powerful and easy to interpret. It employs a recursive binary partitioning algorithm that splits the sample on the partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example, I focus on the conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. Because a single tree is sensitive to small changes in the training data, the random forests procedure is introduced to address this problem. The sources of diversity for random forests come from random sampling and from the restricted set of input variables available for selection. Finally, I introduce R functions to perform model-based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building.
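Recursive binary partitioning with a stopping criterion, as described in this abstract, can be sketched in a few lines of Python. Note the assumptions: the abstract's examples use R and conditional inference trees, which choose the split variable by statistical association tests, whereas this sketch swaps in the simpler CART-style criterion of minimising the within-node sum of squared errors. The function names, the dictionary layout of the tree, and the stopping parameters (`min_size`, `max_depth`) are all hypothetical.

```python
from statistics import mean


def sse(ys):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(rows, ys, min_size):
    """Search every feature and midpoint threshold.

    Returns (feature_index, threshold, total_child_sse) for the split
    with the lowest child SSE, or None if no admissible split exists.
    """
    best = None
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, ys) if r[j] <= t]
            right = [y for r, y in zip(rows, ys) if r[j] > t]
            if len(left) < min_size or len(right) < min_size:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


def grow(rows, ys, min_size=2, max_depth=3, depth=0):
    """Recursively partition the sample until a stopping criterion is met."""
    split = best_split(rows, ys, min_size) if depth < max_depth else None
    if split is None or split[2] >= sse(ys):
        return {"leaf": mean(ys)}          # stop: predict the node mean
    j, t, _ = split
    left = [(r, y) for r, y in zip(rows, ys) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, ys) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow([r for r, _ in left], [y for _, y in left],
                     min_size, max_depth, depth + 1),
        "right": grow([r for r, _ in right], [y for _, y in right],
                      min_size, max_depth, depth + 1),
    }


def predict(tree, row):
    """Follow the splits down to a leaf and return its mean response."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


if __name__ == "__main__":
    # Toy data, invented for illustration.
    X = [[1], [2], [3], [10], [11], [12]]
    y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
    fitted = grow(X, y)
    print(fitted)
    print(predict(fitted, [2]), predict(fitted, [11]))
```

A random forest would repeat `grow` on bootstrap resamples of the rows while restricting each split search to a random subset of features, then average the predictions.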
Jose F. Negron; Willis C. Schaupp; Kenneth E. Gibson; John Anhold; Dawn Hansen; Ralph Thier; Phil Mocettini
1999-01-01
Data collected from Douglas-fir stands infected by the Douglas-fir beetle in Wyoming, Montana, Idaho, and Utah, were used to develop models to estimate amount of mortality in terms of basal area killed. Models were built using stepwise linear regression and regression tree approaches. Linear regression models using initial Douglas-fir basal area were built for all...
NASA Astrophysics Data System (ADS)
Zhang, Wangfei; Chen, Erxue; Li, Zengyuan; Feng, Qi; Zhao, Lei
2016-08-01
The DEM Differential Method is an effective and efficient way to assess forest tree height with polarimetric and interferometric technology; however, its accuracy depends on the accuracy of the interferometric results and of the DEM. TerraSAR-X/TanDEM-X, which established the first spaceborne bistatic interferometer, can provide highly accurate cross-track interferometric images globally without inherent accuracy limitations such as temporal decorrelation and atmospheric disturbance. These characteristics give TerraSAR-X/TanDEM-X great potential for global or regional tree height assessment, which has been constrained by temporal decorrelation in traditional repeat-pass interferometry. Currently, in China, collecting a highly accurate DEM with lidar is costly. At the same time, it is difficult to obtain truly representative ground survey samples to test and verify the assessment results. In this paper, we analyzed the feasibility of using TerraSAR-X/TanDEM-X data to assess forest tree height with currently free DEM data such as ASTER-GDEM and archived ground in situ data such as forest management inventory (FMI) data. First, the accuracy of ASTER-GDEM and of the FMI data was assessed against the DEM and canopy height model (CHM) extracted from lidar data. The results show that the average elevation RMSE between ASTER-GDEM and the lidar DEM is about 13 meters, but the two are highly correlated, with a correlation coefficient of 0.96. With a linear regression model, we can compensate ASTER-GDEM and improve its accuracy to nearly that of the lidar DEM at the same scale. The correlation coefficient between FMI and CHM is 0.40; its accuracy can be improved by a linear regression model within 95% confidence intervals. After compensating ASTER-GDEM and FMI, we calculated the tree height at the Mengla test site with the DEM Differential Method. The results showed that the corrected ASTER-GDEM can effectively improve the assessment accuracy. 
The average assessment accuracy before and after correction is 0.73 and 0.76, and the RMSE is 5.5 and 4.4, respectively.
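The compensation step described here, in which a free DEM is regressed against a lidar reference and the fitted line is then used to correct it, might look like the sketch below. This is an illustration under stated assumptions, not the authors' procedure: the abstract gives no regression coefficients, so the ordinary least-squares fit, the function names, and the elevation values are all hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def compensate(values, slope, intercept):
    """Apply the fitted line to correct the less accurate data set."""
    return [slope * v + intercept for v in values]


def rmse(reference, estimate):
    """Root mean square error between two equally long sequences."""
    n = len(reference)
    return math.sqrt(sum((r - e) ** 2 for r, e in zip(reference, estimate)) / n)


if __name__ == "__main__":
    # Invented elevations in metres: a free DEM and a lidar reference
    # sampled at the same locations.
    free_dem = [100.0, 200.0, 300.0, 400.0]
    lidar_dem = [112.0, 214.0, 321.0, 425.0]

    slope, intercept = fit_line(free_dem, lidar_dem)
    corrected = compensate(free_dem, slope, intercept)

    print("RMSE before:", rmse(lidar_dem, free_dem))
    print("RMSE after: ", rmse(lidar_dem, corrected))
```

In practice the line would be fitted on one set of locations and evaluated on another, so that the reported improvement is not just a training-set effect.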
Chiang, Peggy Pei-Chia; Xie, Jing; Keeffe, Jill Elizabeth
2011-04-25
To identify the critical success factors (CSF) associated with coverage of low vision services. Data were collected from a survey distributed to Vision 2020 contacts, government, and non-government organizations (NGOs) in 195 countries. The Classification and Regression Tree Analysis (CART) was used to identify the critical success factors of low vision service coverage. Independent variables were sourced from the survey: policies, epidemiology, provision of services, equipment and infrastructure, barriers to services, human resources, and monitoring and evaluation. Socioeconomic and demographic independent variables: health expenditure, population statistics, development status, and human resources in general, were sourced from the World Health Organization (WHO), World Bank, and the United Nations (UN). The findings identified that having >50% of children obtaining devices when prescribed (χ(2) = 44; P < 0.000), multidisciplinary care (χ(2) = 14.54; P = 0.002), >3 rehabilitation workers per 10 million of population (χ(2) = 4.50; P = 0.034), higher percentage of population urbanized (χ(2) = 14.54; P = 0.002), a level of private investment (χ(2) = 14.55; P = 0.015), and being fully funded by government (χ(2) = 6.02; P = 0.014), are critical success factors associated with coverage of low vision services. This study identified the most important predictors for countries with better low vision coverage. The CART is a useful and suitable methodology in survey research and is a novel way to simplify a complex global public health issue in eye care.
Yaghoubian, Arezou; de Virgilio, Christian; Dauphine, Christine; Lewis, Roger J; Lin, Matthew
2007-09-01
Simple admission laboratory values can be used to classify patients with necrotizing soft-tissue infection (NSTI) into high and low mortality risk groups. Chart review. Public teaching hospital. All patients with NSTI from 1997 through 2006. Variables analyzed included medical history, admission vital signs, laboratory values, and microbiologic findings. Data analyses included univariate and classification and regression tree analyses. Mortality. One hundred twenty-four patients were identified with NSTI. The overall mortality rate was 21 of 124 (17%). On univariate analysis, factors associated with mortality included a history of cancer (P = .03), intravenous drug abuse (P < .001), low systolic blood pressure on admission (P = .03), base deficit (P = .009), and elevated white blood cell count (P = .06). On exploratory classification and regression tree analysis, admission serum lactate and sodium levels were predictors of mortality, with a sensitivity of 100%, specificity of 28%, positive predictive value of 23%, and negative predictive value of 100%. A serum lactate level greater than or equal to 54.1 mg/dL (6 mmol/L) alone was associated with a 32% mortality, whereas a serum sodium level greater than or equal to 135 mEq/L combined with a lactate level less than 54.1 mg/dL was associated with a mortality of 0%. Mortality for NSTIs remains high. A simple model, using admission serum lactate and serum sodium levels, may help identify patients at greatest risk for death.
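The two-variable rule reported in this abstract can be expressed as a short function. This is a sketch for illustration only: the cutoffs and mortality figures are those quoted in the abstract, the function and constant names are hypothetical, and the abstract does not report an outcome for patients with lactate below the cutoff and sodium below 135 mEq/L, so that branch is returned as an explicit "not reported" category rather than guessed.

```python
LACTATE_CUTOFF_MG_DL = 54.1   # equivalent to 6 mmol/L, per the abstract
SODIUM_CUTOFF_MEQ_L = 135.0


def nsti_risk_group(lactate_mg_dl: float, sodium_meq_l: float) -> str:
    """Classify an admission by the lactate/sodium rule in the abstract."""
    if lactate_mg_dl >= LACTATE_CUTOFF_MG_DL:
        # Lactate at or above the cutoff alone: 32% mortality reported.
        return "high"
    if sodium_meq_l >= SODIUM_CUTOFF_MEQ_L:
        # Lactate below cutoff with sodium >= 135 mEq/L: 0% mortality reported.
        return "low"
    # Lactate below cutoff with sodium < 135 mEq/L: not given in the abstract.
    return "not reported"


if __name__ == "__main__":
    # Invented admission values, for illustration.
    for lactate, sodium in [(60.0, 130.0), (20.0, 140.0), (20.0, 128.0)]:
        print(lactate, sodium, nsti_risk_group(lactate, sodium))
```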
Carriage of methicillin-resistant Staphylococcus aureus by veterinarians in Australia.
Jordan, D; Simon, J; Fury, S; Moss, S; Giffard, P; Maiwald, M; Southwell, P; Barton, M D; Axon, J E; Morris, S G; Trott, D J
2011-05-01
To estimate the prevalence of carriage of methicillin-resistant Staphylococcus aureus (MRSA) among Australian veterinarians. Individuals attending veterinary conferences in Australia in 2009 were recruited to provide nasal swabs and complete a questionnaire about their professional activities. Swabs were processed by standard methods for detecting MRSA, and questionnaire responses were used to group veterinarians according to their areas of major work emphasis (species and practice type). Prevalence was estimated for each of these groupings, and contingency tables and regression tree analysis were used to explain the variation in MRSA carriage. Among the 771 respondents, 'industry and government veterinarians' (controls) had the lowest prevalence of MRSA carriage at 0.9%. Veterinarians with horses as a major area of work emphasis had a prevalence of 11.8% (13-fold that of controls) and those whose only major emphasis was horses had a prevalence of 21.4% (23-fold that of controls). Veterinarians with dogs and cats as a major activity had a 4.9% prevalence (5-fold that of controls). Prevalence rates for other major activities (pigs, dairy and beef cattle, avian and wildlife) were also increased, but were estimated from smaller numbers of respondents. Regression tree analysis clearly isolated equine veterinarians and dog and cat practitioners as groups at increased risk of carriage of MRSA. Carriage of MRSA is a notable occupational health issue for veterinarians in clinical practice in Australia, particularly those who work with horses. © 2011 The Authors. Australian Veterinary Journal © 2011 Australian Veterinary Association.
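The fold differences quoted in this abstract are prevalence ratios relative to the control group, and the arithmetic can be reproduced as below. The percentages are taken from the abstract; the dictionary keys and function name are hypothetical. Ratios recomputed from the rounded percentages differ slightly from the published fold values, presumably because the authors worked from unrounded counts.

```python
CONTROL_PREVALENCE_PCT = 0.9   # industry and government veterinarians

# Prevalence of MRSA carriage (%) by major work emphasis, from the abstract.
GROUP_PREVALENCE_PCT = {
    "horses (a major emphasis)": 11.8,
    "horses (only major emphasis)": 21.4,
    "dogs and cats": 4.9,
}


def prevalence_ratio(group_pct: float, control_pct: float) -> float:
    """Ratio of a group's prevalence to the control prevalence."""
    return group_pct / control_pct


if __name__ == "__main__":
    for group, pct in GROUP_PREVALENCE_PCT.items():
        ratio = prevalence_ratio(pct, CONTROL_PREVALENCE_PCT)
        print(f"{group}: {ratio:.1f}-fold that of controls")
```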
NASA Astrophysics Data System (ADS)
Deo, Ravinesh C.; Kisi, Ozgur; Singh, Vijay P.
2017-02-01
Drought forecasting using standardized metrics of rainfall is a core task in hydrology and water resources management. The Standardized Precipitation Index (SPI) is a rainfall-based metric that caters for the different time-scales at which drought occurs and, due to its standardization, is well suited for forecasting drought at different periods in climatically diverse regions. This study advances drought modelling using multivariate adaptive regression splines (MARS), least squares support vector machine (LSSVM), and M5Tree models by forecasting SPI in eastern Australia. The MARS model incorporated rainfall as a mandatory predictor, with month (periodicity), Southern Oscillation Index, Pacific Decadal Oscillation Index and Indian Ocean Dipole, ENSO Modoki and Nino 3.0, 3.4 and 4.0 data added gradually. The performance was evaluated with root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (r2). The best MARS model required different input combinations: rainfall, sea surface temperature and periodicity were used for all stations, but ENSO Modoki and Pacific Decadal Oscillation indices were not required for Bathurst, Collarenebri and Yamba, and the Southern Oscillation Index was not required for Collarenebri. Inclusion of periodicity increased the r2 value by 0.5-8.1% and reduced RMSE by 3.0-178.5%. Comparisons showed that MARS outperformed its counterparts for three out of five stations, with lower MAE by 15.0-73.9% and 7.3-42.2%, respectively. For the other stations, M5Tree was better than MARS/LSSVM with lower MAE by 13.8-13.4% and 25.7-52.2%, respectively, and for Bathurst, LSSVM yielded more accurate results. For droughts identified by SPI ≤ -0.5, accurate forecasts were attained by MARS/M5Tree for Bathurst, Yamba and Peak Hill, whereas for Collarenebri and Barraba, M5Tree was better than LSSVM/MARS. Seasonal analysis revealed disparate results where MARS/M5Tree was better than LSSVM. 
The results highlight the importance of periodicity in drought forecasting and also indicate that model accuracy varies with geographic and seasonal factors, owing to the complexity of drought and its relationship with inputs and data attributes that can affect the evolution of drought events.
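The three evaluation metrics named in this abstract can be computed as in the following sketch. One assumption should be flagged: the coefficient of determination is implemented here as 1 - SS_res / SS_tot, whereas some hydrology papers report the squared Pearson correlation under the same symbol, and the abstract does not say which was used. The function names and the sample SPI values are hypothetical.

```python
import math


def rmse(observed, forecast):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, forecast)) / n)


def mae(observed, forecast):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - f) for o, f in zip(observed, forecast)) / n


def coefficient_of_determination(observed, forecast):
    """1 - SS_res / SS_tot (assumed definition; see the note above)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, forecast))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Invented monthly SPI values, for illustration only.
    spi_observed = [-1.2, -0.4, 0.3, 1.1, 0.6, -0.8]
    spi_forecast = [-1.0, -0.5, 0.1, 0.9, 0.7, -0.6]
    print("RMSE:", rmse(spi_observed, spi_forecast))
    print("MAE: ", mae(spi_observed, spi_forecast))
    print("r2:  ", coefficient_of_determination(spi_observed, spi_forecast))
```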
NASA Astrophysics Data System (ADS)
Khodasevich, M. A.; Sinitsyn, G. V.; Skorbanova, E. A.; Rogovaya, M. V.; Kambur, E. I.; Aseev, V. A.
2016-06-01
Analysis of multiparametric data on transmission spectra of 24 divins (Moldovan cognacs) in the 190-2600 nm range allows identification of outliers and their removal from a sample under study in the following consideration. The principal component analysis and classification tree with a single-rank predictor constructed in the 2D space of principal components allow classification of divin manufacturers. It is shown that the accuracy of syringaldehyde, ethyl acetate, vanillin, and gallic acid concentrations in divins calculated with the regression to latent structures depends on the sample volume and is 3, 6, 16, and 20%, respectively, which is acceptable for the application.
Biomass expansion factor and root-to-shoot ratio for Pinus in Brazil.
Sanquetta, Carlos R; Corte, Ana Pd; da Silva, Fernando
2011-09-24
The Biomass Expansion Factor (BEF) and the Root-to-Shoot Ratio (R) are variables used to quantify carbon stock in forests. In most studies they are treated as constants or as species- or area-specific values. This study aimed to show the dependence of BEF and R on tree size and age, and proposed equations to improve estimates of forest biomass and carbon stock. Data from 70 sample trees of Pinus spp. grown in southern Brazil, in different diameter classes and ages, were used to demonstrate the correlation of BEF and R with forest inventory data such as DBH, tree height and age. Total dry biomass, carbon stock and CO2 equivalent were simulated using the IPCC default values of BEF and R, the corresponding averages calculated from the data used in this study, and the values estimated by regression equations. The mean values of BEF and R calculated in this study were 1.47 and 0.17, respectively. BEF and R were inversely related to the tree measurement variables, following a negative exponential pattern. Simulations indicated that the use of fixed values of BEF and R, either IPCC defaults or current average data, may lead to unreliable estimates in carbon stock inventories and CDM projects. It was concluded that accounting for the variations in BEF and R, and using regression equations to relate them to DBH, tree height and age, is fundamental to obtaining reliable estimates of forest tree biomass, carbon sink and CO2 equivalent.
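How BEF and R feed into a carbon estimate can be shown with a short worked calculation. The sketch below assumes the conventional definitions (aboveground biomass = stem biomass × BEF; root biomass = aboveground biomass × R), a carbon fraction of 0.5 of dry biomass, and the molar mass ratio 44/12 to convert carbon to CO2; the abstract does not state which carbon fraction the authors used, so that value is an assumption. The mean BEF and R are from the abstract, while the stem biomass input and the coefficients passed to the negative exponential function are hypothetical.

```python
import math

MEAN_BEF = 1.47                 # mean value reported in the abstract
MEAN_ROOT_TO_SHOOT = 0.17       # mean value reported in the abstract
CARBON_FRACTION = 0.5           # assumed; not stated in the abstract
CO2_PER_CARBON = 44.0 / 12.0    # molar mass of CO2 over that of C


def total_biomass(stem_biomass, bef, root_to_shoot):
    """Total dry biomass = aboveground biomass plus root biomass."""
    aboveground = stem_biomass * bef
    roots = aboveground * root_to_shoot
    return aboveground + roots


def co2_equivalent(biomass, carbon_fraction=CARBON_FRACTION):
    """Convert dry biomass to carbon, then to CO2 equivalent."""
    return biomass * carbon_fraction * CO2_PER_CARBON


def bef_from_dbh(dbh, a, b, c):
    """Negative exponential form a + b * exp(-c * dbh).

    The abstract reports a negative exponential pattern but gives no
    equation, so this functional form and its coefficients are
    placeholders to be supplied by the user.
    """
    return a + b * math.exp(-c * dbh)


if __name__ == "__main__":
    stem = 100.0  # hypothetical stem dry biomass, in kg
    biomass = total_biomass(stem, MEAN_BEF, MEAN_ROOT_TO_SHOOT)
    print("total biomass (kg):", biomass)
    print("CO2 equivalent (kg):", co2_equivalent(biomass))
```

Replacing the fixed `MEAN_BEF` with a size-dependent value such as `bef_from_dbh(dbh, a, b, c)` is the kind of refinement the abstract argues for, since small trees would otherwise be assigned the same expansion factor as large ones.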
Park, M; Lee, S-K; Choi, J; Kim, S-H; Kim, S H; Shin, N-Y; Kim, J; Ahn, S S
2015-10-01
Cystic pituitary adenomas may mimic Rathke cleft cysts when there is no solid enhancing component found on MR imaging, and preoperative differentiation may enable a more appropriate selection of treatment strategies. We investigated the diagnostic potential of MR imaging features to differentiate cystic pituitary adenomas from Rathke cleft cysts and to develop a diagnostic model. This retrospective study included 54 patients with a cystic pituitary adenoma (40 women; mean age, 37.7 years) and 28 with a Rathke cleft cyst (18 women; mean age, 31.5 years) who underwent MR imaging followed by surgery. The following imaging features were assessed: the presence or absence of a fluid-fluid level, a hypointense rim on T2-weighted images, septation, an off-midline location, the presence or absence of an intracystic nodule, size change, and signal change. On the basis of the results of logistic regression analysis, a diagnostic tree model was developed to differentiate between cystic pituitary adenomas and Rathke cleft cysts. External validation was performed for an additional 16 patients with a cystic pituitary adenoma and 8 patients with a Rathke cleft cyst. The presence of a fluid-fluid level, a hypointense rim on T2-weighted images, septation, and an off-midline location were more common with pituitary adenomas, whereas the presence of an intracystic nodule was more common with Rathke cleft cysts. Multiple logistic regression analysis showed that cystic pituitary adenomas and Rathke cleft cysts can be distinguished on the basis of the presence of a fluid-fluid level, septation, an off-midline location, and the presence of an intracystic nodule (P = .006, .032, .001, and .023, respectively). Among 24 patients in the external validation population, 22 were classified correctly on the basis of the diagnostic tree model used in this study. 
A systematic approach using this diagnostic tree model can be helpful in distinguishing cystic pituitary adenomas from Rathke cleft cysts. © 2015 by American Journal of Neuroradiology.
NASA Astrophysics Data System (ADS)
Parajuli, A.; Nadeau, D.; Anctil, F.; Parent, A. C.; Bouchard, B.; Jutras, S.
2017-12-01
In snow-fed catchments, it is crucial to monitor and to model snow water equivalent (SWE), particularly to simulate the melt water runoff. However, the distribution of SWE can be highly heterogeneous, particularly within forested environments, mainly because of the large variability in snow depths. Although the boreal forest is the dominant land cover in Canada and in a few other northern countries, very few studies have quantified the spatiotemporal variability of snow depths and snowpack dynamics within this biome. The objective of this paper is to fill this research gap, through a detailed monitoring of snowpack dynamics at nine locations within a 3.57 km² experimental forested catchment in southern Quebec, Canada (47°N, 71°W). The catchment receives 6 m of snow annually on average and is predominantly covered with balsam fir stands with some traces of spruce and white birch. In this study, we used a network of nine so-called 'snow profiling stations', providing automated snow depth and snowpack temperature profile measurements, as well as three contrasting sites (juvenile, sapling and open areas) where sublimation rates were directly measured with flux towers. In addition, a total of 1401 manual snow samples supported by 20 snow pit measurements were collected throughout the winter of 2017. This paper presents some preliminary analyses of this unique dataset. Simple empirical relations relating SWE to easy-to-determine proxies, such as snow depths and snow temperature, are tested. Then, binary regression trees and multiple regression analysis are used to model SWE using topographic characteristics (slope, aspect, elevation), forest features (tree height, tree diameter, forest density and gap fraction) and meteorological forcing (solar radiation, wind speed, snow-pack temperature profile, air temperature, humidity). An analysis of sublimation rates comparing open area, saplings and juvenile forest is also presented in this paper.
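The core step of a binary regression tree, choosing the threshold on one predictor that minimises the summed squared error of the response in the two resulting groups, can be sketched as follows. This is a generic illustration, not the authors' implementation; the function name is hypothetical and any inputs (for example snow depth as predictor and SWE as response) would be the reader's own data.

```python
def best_split(xs, ys):
    """Return (threshold, sse) for the single split on xs that minimises
    the summed squared error of ys around the two group means."""

    def sse(values):
        # Sum of squared deviations from the group mean.
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best
```

A full tree repeats this search recursively over all predictors within each resulting group until a stopping rule is met.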
Pollen-limited reproduction in blue oak: Implications for wind pollination in fragmented populations
Knapp, E.E.; Goedde, M.A.; Rice, K.J.
2001-01-01
Human activities are fragmenting forests and woodlands worldwide, but the impact of reduced tree population densities on pollen transfer in wind-pollinated trees is poorly understood. In a 4-year study, we evaluated relationships among stand density, pollen availability, and seed production in a thinned and fragmented population of blue oak (Quercus douglasii). Geographic coordinates were established and flowering interval determined for 100 contiguous trees. The number of neighboring trees within 60 m that released pollen during each tree's flowering period was calculated and relationships with acorn production explored using multiple regression. We evaluated the effects of female flower production, average temperature, and relative humidity during the pollination period, and number of pollen-producing neighbors on individual trees' acorn production. All factors except temperature were significant in at least one of the years of our study, but the combination of factors influencing acorn production varied among years. In 1996, a year of large acorn crop size, acorn production was significantly positively associated with number of neighboring pollen producers and density of female flowers. In 1997, 1998, and 1999, many trees produced few or no acorns, and significant associations between number of pollen-producing neighbors and acorn production were only apparent among moderately to highly reproductive trees. Acorn production by these reproductive trees in 1997 was significantly positively associated with number of neighboring pollen producers and significantly negatively associated with average relative humidity during the pollination period. In 1998, no analysis was possible, because too few trees produced a moderate to large acorn crop. Only density of female flowers was significantly associated with acorn production of moderately to highly reproductive trees in 1999. 
The effect of spatial scale was also investigated by conducting analyses with pollen producers counted in radii ranging from 30 m to 80 m. The association between number of pollen-producing neighbors and acorn production was strongest when neighborhood sizes of 60 m or larger were considered. Our results suggest that fragmentation and thinning of blue oak woodlands may reduce pollen availability and limit reproduction in this wind-pollinated species.
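The neighbourhood variable described above can be sketched as a simple radius count over tree coordinates. This is a simplified, hypothetical illustration: the study counted neighbours that released pollen during each tree's own flowering period, whereas the sketch below assumes a single precomputed flag per tree, and the function name and tuple layout are assumptions.

```python
import math


def count_pollen_neighbors(trees, radius_m=60.0):
    """trees: list of (x_m, y_m, sheds_pollen) tuples.

    Returns, for each tree, the number of other trees flagged as pollen
    producers that lie within radius_m metres.
    """
    counts = []
    for i, (xi, yi, _) in enumerate(trees):
        n = 0
        for j, (xj, yj, sheds) in enumerate(trees):
            if i != j and sheds and math.hypot(xi - xj, yi - yj) <= radius_m:
                n += 1
        counts.append(n)
    return counts
```

Re-running the count with different `radius_m` values mirrors the spatial-scale comparison (30 m to 80 m) reported in the abstract.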
Evaluation of impacts of trees on PM2.5 dispersion in urban streets
NASA Astrophysics Data System (ADS)
Jin, Sijia; Guo, Jiankang; Wheeler, Stephen; Kan, Liyan; Che, Shengquan
2014-12-01
Reducing airborne particulate matter (PM), especially PM2.5 (PM with aerodynamic diameters of 2.5 μm or less), in urban street canyons is critical to the health of central city populations. Tree-planting in urban street canyons is a double-edged sword, providing landscape benefits while inevitably resulting in PM2.5 concentrating at street level, thus showing negative environmental effects. It is therefore necessary to quantify the impact of trees on PM2.5 dispersion and obtain the optimum structure of street trees for minimizing the PM2.5 concentration in street canyons. However, most of the previous findings in this field were derived from wind tunnel or numerical simulation rather than on-site measurement data. In this study, a seasonal investigation was performed in six typical street canyons in the residential area of central Shanghai, which has been suffering from haze pollution while having large numbers of green streets. We monitored and measured PM2.5 concentrations at five heights, structural parameters of street trees and weather. For tree-free street canyons, declining PM2.5 concentrations were found with increasing height. However, in the presence of trees the reduction rate of PM2.5 concentrations was less pronounced, and in some cases the concentrations even increased at the top of street canyons, indicating that tree canopies are trapping PM2.5. To quantify the decrease of PM2.5 reduction rate, we developed the attenuation coefficient of PM2.5 (PMAC). The wind speed was significantly lower in street canyons with trees than in tree-free ones. A mixed-effects model indicated that canopy density (CD), leaf area index (LAI), and rate of change of wind speed were the most significant predictors influencing PMAC. Further regression analysis showed that in order to balance both environmental and landscape benefits of green streets, the optimum range of CD and LAI was 50%-60% and 1.5-2.0 respectively. 
We concluded by suggesting an optimized tree-planting pattern and discussing strategies for a better green streets planning and pruning.
Holtschlag, David J.; Shively, Dawn; Whitman, Richard L.; Haack, Sheridan K.; Fogarty, Lisa R.
2008-01-01
Regression analyses and hydrodynamic modeling were used to identify environmental factors and flow paths associated with Escherichia coli (E. coli) concentrations at Memorial and Metropolitan Beaches on Lake St. Clair in Macomb County, Mich. Lake St. Clair is part of the binational waterway between the United States and Canada that connects Lake Huron with Lake Erie in the Great Lakes Basin. Linear regression, regression-tree, and logistic regression models were developed from E. coli concentration and ancillary environmental data. Linear regression models on log10 E. coli concentrations indicated that rainfall prior to sampling, water temperature, and turbidity were positively associated with bacteria concentrations at both beaches. Flow from Clinton River, changes in water levels, wind conditions, and log10 E. coli concentrations 2 days before or after the target bacteria concentrations were statistically significant at one or both beaches. In addition, various interaction terms were significant at Memorial Beach. Linear regression models for both beaches explained only about 30 percent of the variability in log10 E. coli concentrations. Regression-tree models were developed from data from both Memorial and Metropolitan Beaches but were found to have limited predictive capability in this study. The results indicate that too few observations were available to develop reliable regression-tree models. Linear logistic models were developed to estimate the probability of E. coli concentrations exceeding 300 most probable number (MPN) per 100 milliliters (mL). Rainfall amounts before bacteria sampling were positively associated with exceedance probabilities at both beaches. Flow of Clinton River, turbidity, and log10 E. coli concentrations measured before or after the target E. coli measurements were related to exceedances at one or both beaches. The linear logistic models were effective in estimating bacteria exceedances at both beaches. 
A receiver operating characteristic (ROC) analysis was used to determine cut points for maximizing the true positive rate prediction while minimizing the false positive rate. A two-dimensional hydrodynamic model was developed to simulate horizontal current patterns on Lake St. Clair in response to wind, flow, and water-level conditions at model boundaries. Simulated velocity fields were used to track hypothetical massless particles backward in time from the beaches along flow paths toward source areas. Reverse particle tracking for idealized steady-state conditions shows changes in expected flow paths and traveltimes with wind speeds and directions from 24 sectors. The results indicate that three to four sets of contiguous wind sectors have similar effects on flow paths in the vicinity of the beaches. In addition, reverse particle tracking was used for transient conditions to identify expected flow paths for 10 E. coli sampling events in 2004. These results demonstrate the ability to track hypothetical particles from the beaches, backward in time, to likely source areas. This ability, coupled with a greater frequency of bacteria sampling, may provide insight into changes in bacteria concentrations between source and sink areas.
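One common way to pick a cut point that trades off true positive rate against false positive rate is to maximise Youden's J (TPR minus FPR). The abstract does not say which criterion was used, so the sketch below is an assumption offered for illustration, with a hypothetical function name and inputs (modelled exceedance probabilities and observed exceedance flags).

```python
def youden_cut_point(probabilities, exceeded):
    """Return (cut, j) where cut maximises J = TPR - FPR.

    probabilities: modelled exceedance probabilities.
    exceeded: matching booleans, True where the limit was exceeded.
    Assumes at least one True and one False observation.
    """
    positives = sum(1 for e in exceeded if e)
    negatives = len(exceeded) - positives
    best_cut, best_j = None, -1.0
    for cut in sorted(set(probabilities)):
        # Classify as "exceedance" when the probability reaches the cut.
        tp = sum(1 for p, e in zip(probabilities, exceeded) if e and p >= cut)
        fp = sum(1 for p, e in zip(probabilities, exceeded) if not e and p >= cut)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j
```

Other criteria, such as weighting false negatives more heavily for public-health warnings, would give different cut points from the same ROC curve.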
NASA Astrophysics Data System (ADS)
Shao, G.; Gallion, J.; Fei, S.
2016-12-01
Sound forest aboveground biomass estimation is required to monitor diverse forest ecosystems and their impacts on the changing climate. Lidar-based regression models have provided promising biomass estimations in most forest ecosystems. However, considerable uncertainties of biomass estimations have been reported in the temperate hardwood and hardwood-dominated mixed forests. Varied site productivities in temperate hardwood forests largely diversified height and diameter growth rates, which significantly reduced the correlation between tree height and diameter at breast height (DBH) in mature and complex forests. It is, therefore, difficult to utilize height-based lidar metrics to predict DBH-based field-measured biomass through a simple regression model regardless of the variation of site productivity. In this study, we established a multi-dimension nonlinear regression model incorporating lidar metrics and site productivity classes derived from soil features. In the regression model, lidar metrics provided horizontal and vertical structural information and productivity classes differentiated good and poor forest sites. The selection and combination of lidar metrics were discussed. Multiple regression models were employed and compared. Uncertainty analysis was applied to the best fit model. The effects of site productivity on the lidar-based biomass model were addressed.
NASA Astrophysics Data System (ADS)
Lilly, P.; Yanai, R. D.; Buckley, H. L.; Case, B. S.; Woollons, R. C.; Holdaway, R. J.; Johnson, J.
2016-12-01
Calculations of forest biomass and elemental content require many measurements and models, each contributing uncertainty to the final estimates. While sampling error is commonly reported, based on replicate plots, error due to uncertainty in the regression used to estimate biomass from tree diameter is usually not quantified. Some published estimates of uncertainty due to the regression models have used the uncertainty in the prediction of individuals, ignoring uncertainty in the mean, while others have propagated uncertainty in the mean while ignoring individual variation. Using the simple case of the calcium concentration of sugar maple leaves, we compare the variation among individuals (the standard deviation) to the uncertainty in the mean (the standard error) and illustrate the declining importance in the prediction of individual concentrations as the number of individuals increases. For allometric models, the analogous statistics are the prediction interval (or the residual variation in the model fit) and the confidence interval (describing the uncertainty in the best fit model). The effect of propagating these two sources of error is illustrated using the mass of sugar maple foliage. The uncertainty in individual tree predictions was large for plots with few trees; for plots with 30 trees or more, the uncertainty in individuals was less important than the uncertainty in the mean. Authors of previously published analyses have reanalyzed their data to show the magnitude of these two sources of uncertainty in scales ranging from experimental plots to entire countries. The most correct analysis will take both sources of uncertainty into account, but for practical purposes, country-level reports of uncertainty in carbon stocks, as required by the IPCC, can ignore the uncertainty in individuals. Ignoring the uncertainty in the mean will lead to exaggerated estimates of confidence in estimates of forest biomass and carbon and nutrient contents.
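The distinction between variation among individuals (standard deviation) and uncertainty in the mean (standard error) can be shown with a short sketch using only the standard library. The function name is hypothetical and any input values would be the reader's own measurements.

```python
import math
import statistics


def sd_and_se(values):
    """Return (standard deviation, standard error of the mean).

    The standard deviation describes spread among individuals and does not
    shrink with more samples; the standard error is sd / sqrt(n) and does.
    """
    sd = statistics.stdev(values)          # sample standard deviation
    se = sd / math.sqrt(len(values))       # uncertainty in the mean
    return sd, se
```

Because the standard error falls with the square root of the sample size, averaging over many trees reduces the contribution of individual variation, which is consistent with the abstract's observation about plots with 30 trees or more.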
Pitcher, Brandon; Alaqla, Ali; Noujeim, Marcel; Wealleans, James A; Kotsakis, Georgios; Chrepa, Vanessa
2017-03-01
Cone-beam computed tomographic (CBCT) analysis allows for 3-dimensional assessment of periradicular lesions and may facilitate preoperative periapical cyst screening. The purpose of this study was to develop and assess the predictive validity of a cyst screening method based on CBCT volumetric analysis alone or combined with designated radiologic criteria. Three independent examiners evaluated 118 presurgical CBCT scans from cases that underwent apicoectomies and had an accompanying gold standard histopathological diagnosis of either a cyst or granuloma. Lesion volume, density, and specific radiologic characteristics were assessed using specialized software. Logistic regression models with histopathological diagnosis as the dependent variable were constructed for cyst prediction, and receiver operating characteristic curves were used to assess the predictive validity of the models. A conditional inference binary decision tree based on a recursive partitioning algorithm was constructed to facilitate preoperative screening. Interobserver agreement was excellent for volume and density, but it varied from poor to good for the radiologic criteria. Volume and root displacement were strong predictors for cyst screening in all analyses. The binary decision tree classifier determined that if the volume of the lesion was >247 mm³, there was an 80% probability of a cyst. If volume was <247 mm³ and root displacement was present, cyst probability was 60% (78% accuracy). The good accuracy and high specificity of the decision tree classifier render it a useful preoperative cyst screening tool that can aid in clinical decision making but not a substitute for definitive histopathological diagnosis after biopsy. Confirmatory studies are required to validate the present findings. Published by Elsevier Inc.
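The two branches of the decision tree reported above can be written out as a small rule function. This is only a transcription of the thresholds and probabilities quoted in the abstract; the function name is hypothetical, and the branch for small lesions without root displacement is not reported there, so the sketch returns `None` for it rather than guessing.

```python
VOLUME_THRESHOLD_MM3 = 247.0  # threshold quoted in the abstract


def cyst_probability(volume_mm3, root_displacement):
    """Return the reported cyst probability for a lesion, or None when the
    abstract does not report a value for that branch."""
    if volume_mm3 > VOLUME_THRESHOLD_MM3:
        return 0.80
    if volume_mm3 < VOLUME_THRESHOLD_MM3 and root_displacement:
        return 0.60
    return None  # branch not reported in the abstract
```

As the abstract stresses, such a rule is a screening aid and not a substitute for histopathological diagnosis.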
Peng, Xiaobang; Thevathasan, Naresh V.; Gordon, Andrew M.; Mohammed, Idris; Gao, Pengxiang
2015-01-01
In order to study the effect of light competition and microclimatic modifications on the net assimilation (NA), growth and yield of soybean (Glycine max L.) as an understory crop, three 26-year-old soybean-tree (Acer saccharinum Marsh., Populus deltoides X nigra, Juglans nigra L.) intercropping systems were examined. Tree competition reduced photosynthetically active radiation (PAR) incident on soybeans and reduced net assimilation, growth and yield of soybean. Soil moisture of 20 cm depth close (< 3 m) to the tree rows was also reduced. Correlation analysis showed that NA and soil water content were highly correlated with growth and yield of soybean. When compared with the monoculture soybean system, the relative humidity (RH) of the poplar-soybean, silver maple-soybean, and black walnut-soybean intercropped systems was increased by 7.1%, 8.0% and 5.9%, soil water content was reduced by 37.8%, 26.3% and 30.9%, ambient temperature was reduced by 1.3°C, 1.4°C and 1.0°C, PAR was reduced by 53.6%, 57.9% and 39.9%, and air CO2 concentration was reduced by 3.7 μmol·mol⁻¹, 4.2 μmol·mol⁻¹ and 2.8 μmol·mol⁻¹, respectively. Compared to the monoculture, the average NA of soybean in poplar, maple and walnut treatments was also reduced by 53.1%, 67.5% and 46.5%, respectively. Multivariate stepwise regression analysis showed that PAR, ambient temperature and CO2 concentration were the dominant factors influencing net photosynthetic rate. PMID:26053375
Möckel, Martin; Muller, Reinhold; Searle, Julia; Slagman, Anna; De Bruyne, Bernard; Serruys, Patrick; Weisz, Giora; Xu, Ke; Holert, Fabian; Müller, Christian; Maehara, Akiko; Stone, Gregg W
2015-10-01
In the Providing Regional Observations to Study Predictors of Events in the Coronary Tree (PROSPECT) study, plaque burden, plaque composition, and minimal luminal area were associated with an increased risk of adverse cardiovascular events arising from untreated atherosclerotic lesions (vulnerable plaques) in patients with acute coronary syndromes (ACS). We sought to evaluate the utility of biomarker profiling and clinical risk factors to predict 3-year all-cause and nonculprit lesion-related major adverse cardiac events (MACEs). Of 697 patients who underwent successful percutaneous coronary intervention (PCI) for ACS, an array of 28 baseline biomarkers was analyzed. Median follow-up was 3.4 years. Beta2-microglobulin displayed the strongest predictive power of all variables assessed for all-cause and nonculprit lesion-related MACE. In a classification and regression tree analysis, patients with beta2-microglobulin >1.92 mg/L had an estimated 28.7% 3-year incidence of all-cause MACE; C-peptide <1.32 ng/ml was associated with a further increase in MACE to 51.2%. In a classification and regression tree analysis for untreated nonculprit lesion-related MACE, beta2-microglobulin >1.92 mg/L identified a cohort with a 3-year rate of 18.5%, and C-peptide <2.22 ng/ml was associated with a further increase to 25.5%. By multivariable analysis, beta2-microglobulin was the strongest predictor of all-cause and nonculprit MACE during follow-up. High-density lipoprotein (HDL), transferrin, and history of angina pectoris were also independent predictors of all-cause MACE, and HDL was an independent predictor of nonculprit MACE. In conclusion, in the PROSPECT study, beta2-microglobulin strongly predicted all-cause and nonculprit lesion-related MACE within 3 years after PCI in ACS. C-peptide and HDL provided further risk stratification to identify angiographically mild nonculprit lesions prone to future MACE. Copyright © 2015 Elsevier Inc. All rights reserved.
Remote sensing of species diversity using Landsat 8 spectral variables
NASA Astrophysics Data System (ADS)
Madonsela, Sabelo; Cho, Moses Azong; Ramoelo, Abel; Mutanga, Onisimo
2017-11-01
The application of remote sensing in biodiversity estimation has largely relied on the Normalized Difference Vegetation Index (NDVI). The NDVI exploits spectral information from the red and near infrared bands of Landsat images and does not consider canopy background conditions; hence it is affected by soil brightness, which lowers its sensitivity to vegetation. As such, NDVI may be insufficient in explaining tree species diversity. Meanwhile, the Landsat program also collects essential spectral information in the shortwave infrared (SWIR) region, which is related to plant properties. The study was intended to: (i) explore the utility of spectral information across the Landsat-8 spectrum using Principal Component Analysis (PCA) and estimate alpha diversity (α-diversity) in the savannah woodland in southern Africa, and (ii) define the species diversity index (Shannon (H′), Simpson (D2) and species richness (S), defined as the number of species in a community) that best relates to spectral variability on the Landsat-8 Operational Land Imager dataset. We designed 90 m × 90 m field plots (n = 71) and identified all trees with a diameter at breast height (DbH) above 10 cm. H′, D2 and S were used to quantify tree species diversity within each plot, and the corresponding spectral information on all Landsat-8 bands was extracted from each field plot. A stepwise linear regression was applied to determine the relationship between species diversity indices (H′, D2 and S) and Principal Components (PCs), vegetation indices and Gray Level Co-occurrence Matrix (GLCM) texture layers with calibration (n = 46) and test (n = 23) datasets. The results of regression analysis showed that the Simple Ratio Index derivative had a higher relationship with H′, D2 and S (r² = 0.36; r² = 0.41; r² = 0.24 respectively) compared to NDVI, EVI, SAVI or their derivatives. 
Moreover, the Landsat-8 derived PCs also had a higher relationship with H′ and D2 (r² of 0.36 and 0.35 respectively) than the frequently used NDVI, and this was attributed to the utilization of the entire spectral content of Landsat-8 data. Our results indicate that: (i) the measurement scales of vegetation indices impact their sensitivity to vegetation characteristics and their ability to explain tree species diversity; (ii) principal components enhance the utility of Landsat-8 spectral data for estimating tree species diversity and (iii) species diversity indices that consider both species richness and abundance (H′ and D2) relate better to Landsat-8 spectral variables.
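The three plot-level diversity measures named above can be computed from a list of species labels, as in the following sketch. It assumes the natural-log form of the Shannon index and the Gini-Simpson form (one minus the sum of squared proportions) for the Simpson index; the abstract does not state which variants were used, and the function name is hypothetical.

```python
import math
from collections import Counter


def diversity_indices(species_labels):
    """Return richness, Shannon index and Gini-Simpson index for one plot.

    species_labels: one label per recorded tree, e.g. ["a", "a", "b"].
    """
    counts = Counter(species_labels)
    total = sum(counts.values())
    proportions = [c / total for c in counts.values()]
    shannon = -sum(p * math.log(p) for p in proportions)   # assumed ln base
    simpson = 1.0 - sum(p * p for p in proportions)        # assumed 1 - sum(p^2)
    return {"richness": len(counts), "shannon": shannon, "simpson": simpson}
```

Richness ignores abundance, whereas the Shannon and Simpson indices weight species by their share of stems, which is the distinction drawn in the abstract's third conclusion.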
Olive Actual "on Year" Yield Forecast Tool Based on the Tree Canopy Geometry Using UAS Imagery.
Sola-Guirado, Rafael R; Castillo-Ruiz, Francisco J; Jiménez-Jiménez, Francisco; Blanco-Roldan, Gregorio L; Castro-Garcia, Sergio; Gil-Ribes, Jesus A
2017-07-30
Olive has a notable importance in countries of the Mediterranean basin and its profitability depends on several factors such as actual yield, production cost or product price. Actual "on year" Yield (AY) is production (kg tree⁻¹) in "on years", and this research attempts to relate it to geometrical parameters of the tree canopy. A regression equation to forecast AY based on manual canopy volume was determined based on data acquired from different orchard categories and cultivars during different harvesting seasons in southern Spain. Orthoimages were acquired with unmanned aerial systems (UAS) to calculate individual crown area and relate it to canopy volume and AY. Yield levels did not vary between orchard categories; however, they did vary between irrigated orchards (7000-17,000 kg ha⁻¹) and rainfed ones (4000-7000 kg ha⁻¹). After that, manual canopy volume was related to the individual crown area of trees calculated from orthoimages acquired with UAS imagery. Finally, AY was forecasted using both manual canopy volume and individual tree crown area as main factors for olive productivity. AY forecast only by using individual crown area made it possible to get a simple and cheap forecast tool for a wide range of olive orchards. Finally, the acquired information was introduced in a thematic map describing spatial AY variability obtained from orthoimage analysis that may be a powerful tool for farmers, insurance systems, market forecasts or to detect agronomical problems.
Soil morphology of canopy and intercanopy sites in a pinon-Juniper woodland
DOE Office of Scientific and Technical Information (OSTI.GOV)
Davenport, D.W.; Wilcox, B.P.; Breshear, D.D.
1996-11-01
Pinon-juniper woodlands in the semiarid western USA have expanded as much as fivefold during the last 150 yr, often accompanied by losses of understory vegetation and increasing soil erosion. We conducted this study to determine the differences in soil morphology between canopy and intercanopy locations within a pinon (Pinus edulis Engelm.)-juniper [Juniperus monosperma (Engelm.) Sarg.] woodland with uniform parent material, topography, and climate. The woodland studied, located near Los Alamos, NM, has a mean tree age of 135 yr. We examined soil morphology by augering 135 profiles in a square grid pattern and comparing soils under pinon and juniper canopies with intercanopy soils. Only two of the 17 morphological properties compared showed significant differences. The B horizons make up a slightly greater proportion of total profile thickness in intercanopy soils, and there are higher percentages of coarse fragments in the lower portions of canopy soil profiles. Canopy soils have lower mean pH and higher mean organic C than intercanopy soils. Regression analysis showed that most soil properties did not closely correspond with tree size, but total soil thickness and B horizon thickness are significantly greater under the largest pinon trees, and soil reaction is lower under the largest juniper trees. Our findings suggest that during the period in which pinon-juniper woodlands have been expanding, the trees have had only minor effects on soil morphology. 36 refs., 4 figs., 4 tabs.
Economic injury level of the psyllid, Agonoscena pistaciae, on Pistachio, Pistacia vera cv. Ohadi.
Reza Hassani, Mohammad; Nouri-Ganbalani, Gadir; Izadi, Hamzeh; Shojai, Mahmoud; Basirat, Mehdi
2009-01-01
The pistachio psylla, Agonoscena pistaciae Burckhardt and Lauterer (Hemiptera: Psyllidae) is a major pest of pistachio trees, Pistacia vera L. (Sapindales: Anacardiaceae) throughout pistachio-producing regions in Iran. Different density levels of A. pistaciae nymphs were maintained on pistachio trees by different insecticide dosages to evaluate the relationship between nymph density and yield loss (weight of 1000 nuts). Psylla nymph densities were monitored weekly by counting nymphs on pistachio terminal leaflets. There was a significant reduction in weight of 1000 nuts as seasonal averages of nymphs increased. Regression analysis was used to determine the relationship between nymph density and weight of 1000 nuts. The economic injury levels varied as a function of market values, management costs, insecticide efficiency and yield loss rate and ranged from 7.7 to 30.7 nymphal days per terminal leaflet, based on weight of 1000 nuts.
Economic Injury Level of the Psyllid, Agonoscena pistaciae, on Pistachio, Pistacia vera cv. Ohadi
Reza Hassani, Mohammad; Nouri-Ganbalani, Gadir; Izadi, Hamzeh; Basirat, Mehdi
2009-01-01
The pistachio psylla, Agonoscena pistaciae Burckhardt and Lauterer (Hemiptera: Psyllidae) is a major pest of pistachio trees, Pistacia vera L. (Sapindales: Anacardiaceae) throughout pistachio-producing regions in Iran. Different density levels of A. pistaciae nymphs were maintained on pistachio trees by different insecticide dosages to evaluate the relationship between nymph density and yield loss (weight of 1000 nuts). Psylla nymph densities were monitored weekly by counting nymphs on pistachio terminal leaflets. There was a significant reduction in weight of 1000 nuts as seasonal averages of nymphs increased. Regression analysis was used to determine the relationship between nymph density and weight of 1000 nuts. The economic injury levels varied as a function of market values, management costs, insecticide efficiency and yield loss rate and ranged from 7.7 to 30.7 nymphal days per terminal leaflet, based on weight of 1000 nuts. PMID:19619034
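The dependence of the economic injury level on the four factors listed above can be illustrated with the general textbook form EIL = C / (V × b × K). Whether the study used exactly this form is an assumption, and the function and parameter names below are hypothetical; no values from the study are used.

```python
def economic_injury_level(control_cost, market_value, loss_per_pest_unit,
                          control_efficacy):
    """Generic economic injury level, EIL = C / (V * b * K).

    control_cost: management cost per unit area (C).
    market_value: value per unit of produce (V).
    loss_per_pest_unit: yield lost per pest unit, e.g. per nymph-day (b),
        typically the slope of a yield-loss regression.
    control_efficacy: proportion of injury prevented by control, 0-1 (K).
    """
    denominator = market_value * loss_per_pest_unit * control_efficacy
    if denominator <= 0:
        raise ValueError("value, loss rate and efficacy must be positive")
    return control_cost / denominator
```

In this form a higher market value or a more effective insecticide lowers the EIL, while a higher management cost raises it, which matches the direction of the dependence described in the abstract.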
Future credible precipitation occurrences in Los Alamos, New Mexico
DOE Office of Scientific and Technical Information (OSTI.GOV)
Abeele, W.V.
1980-09-01
I have studied many factors thought to have influenced past climatic change. Because they might recur, they are possible suspects for future climatic alterations. Most of these factors are totally unpredictable; therefore, they cast a shadow on the validity of derived climatic predictions. Changes in atmospheric conditions and in continental surfaces, variations in solar radiation, and in the earth's orbit around the sun are among the influential mechanisms investigated. Even when models are set up that include the above parameters, their reliability will depend on unpredictable variables totally alien to the model (like volcanic eruptions). Based on climatic records, however, maximum precipitation amounts have been calculated for different probability levels. These seem to correspond well to past precipitation occurrences, derived from tree ring indices. The link between tree ring indices and local climate has been established through regression analysis.
Fabian C.C. Uzoh; Martin W. Ritchie
1996-01-01
The equations presented predict crown area for 13 species of trees and shrubs which may be found growing in competition with commercial conifers during early stages of stand development. The equations express crown area as a function of basal area and height. Parameters were estimated for each species individually using weighted nonlinear least square regression.
Biomass equations for major tree species of the Northeast
Louise M. Tritton; James W. Hornbeck
1982-01-01
Regression equations are used in both forestry and ecosystem studies to estimate tree biomass from field measurements of dbh (diameter at breast height) or a combination of dbh and height. Literature on biomass is reviewed, and 178 sets of published equations for 25 species common to the Northeastern United States are listed. On the basis of these equations, estimates of...
Stand basal-area and tree-diameter growth in red spruce-fir forests in Maine, 1960-80
S.J. Zarnoch; D.A. Gansner; D.S. Powell; T.A. Birch
1990-01-01
Stand basal-area change and individual surviving red spruce d.b.h. growth from 1960 to 1980 were analyzed for red spruce-fir stands in Maine. Regression modeling was used to relate these measures of growth to stand and tree conditions and to compare growth throughout the period. Results indicate a decline in growth.
Examination of the Arborsonic Decay Detector for Detecting Bacterial Wetwood in Red Oaks
Zicai Xu; Theodor D. Leininger; James G. Williams; Frank H. Tainter
2000-01-01
The Arborsonic Decay Detector (ADD; Fujikura Europe Limited, Wiltshire, England) was used to measure the time it took an ultrasound wave to cross 280 diameters in red oak trees with varying degrees of bacterial wetwood or heartwood decay. Linear regressions derived from the ADD readings of trees in Mississippi and South Carolina with wetwood and heartwood decay...
NASA Astrophysics Data System (ADS)
Kisi, Ozgur; Parmar, Kulwinder Singh
2016-03-01
This study investigates the accuracy of least square support vector machine (LSSVM), multivariate adaptive regression splines (MARS) and M5 model tree (M5Tree) in modeling river water pollution. Various combinations of water quality parameters, Free Ammonia (AMM), Total Kjeldahl Nitrogen (TKN), Water Temperature (WT), Total Coliform (TC), Fecal Coliform (FC) and Potential of Hydrogen (pH), monitored at Nizamuddin on the Yamuna River in Delhi, India, were used as inputs to the applied models. Results indicated that the LSSVM and MARS models had almost the same accuracy and performed better than the M5Tree model in modeling monthly chemical oxygen demand (COD). Relative to the LSSVM and M5Tree models, the MARS model decreased the average root mean square error (RMSE) by 1.47% and 19.1%, respectively. Adding the TC input to the models did not increase their accuracy in modeling COD, while adding the FC and pH inputs generally decreased the accuracy. The overall results indicated that the MARS and LSSVM models could be successfully used in estimating monthly river water pollution level by using AMM, TKN and WT parameters as inputs.
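The model comparison above rests on RMSE and on the percentage by which one model lowers it relative to another. A minimal sketch of both calculations follows; the observed and predicted values are hypothetical illustrations, not data from the study, and the function names are assumptions.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def percent_decrease(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical monthly COD observations and two sets of model predictions.
observed = [10.0, 12.0, 14.0, 16.0]
model_a = [11.0, 11.0, 15.0, 15.0]  # every error is 1 unit
model_b = [12.0, 10.0, 16.0, 14.0]  # every error is 2 units

rmse_a = rmse(observed, model_a)
rmse_b = rmse(observed, model_b)
gain = percent_decrease(rmse_b, rmse_a)  # how much model A lowers RMSE versus model B
```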
Ecological Factors of Being Bullied Among Adolescents: a Classification and Regression Tree Approach
Moon, Sung Seek; Kim, Heeyoung; Seay, Kristen; Small, Eusebius; Kim, Youn Kyoung
2015-01-01
Being bullied is a well-recognized trauma for adolescents. Bullying can best be understood through an ecological framework since bullying or being bullied involves risk factors at multiple contextual levels. The purpose of the study was to identify the risk and protective factors that best differentiate groups along with the outcome variable of interest (being bullied) using Classification and Regression Tree (CART) analysis. The study used the Health Behavior in School-Aged Children (HBSC) data collected from a nationally representative sample of students in grades six through ten during the 2005–2006 school year. This study identified that for adolescents 12 and younger, lower parental support is a critical risk factor associated with bullying, and among those aged 13 to 14 with lower parental support, adolescents with higher academic pressure reported experiencing more bullying. For the older group of adolescents (aged 15 and older), school-related factors were identified as increasing the risk of being bullied. There was a critical age (15 years old) for implementing victimization interventions to reduce the damage from being bullied. Service providers working with adolescents aged 14 and younger should focus more on family-oriented interventions, and those working with adolescents aged 15 and older should offer peer- or school-related interventions. PMID:27617043
Ostrand, William D.; Gotthardt, Tracey A.; Howlin, Shay; Robards, Martin D.
2005-01-01
We modeled habitat selection by Pacific sand lance (Ammodytes hexapterus) by examining their distribution in relation to water depth, distance to shore, bottom slope, bottom type, distance from sand bottom, and shoreline type. Through both logistic regression and classification tree models, we compared the characteristics of 29 known sand lance locations to 58 randomly selected sites. The best models indicated a strong selection of shallow water by sand lance, with weaker association between sand lance distribution and beach shorelines, sand bottoms, distance to shore, bottom slope, and distance to the nearest sand bottom. We applied an information-theoretic approach to the interpretation of the logistic regression analysis and determined importance values of 0.99, 0.54, 0.52, 0.44, 0.39, and 0.25 for depth, beach shorelines, sand bottom, distance to shore, gradual bottom slope, and distance to the nearest sand bottom, respectively. The classification tree model indicated that sand lance selected shallow-water habitats and remained near sand bottoms when located in habitats with depths between 40 and 60 m. All sand lance locations were at depths <60 m and 93% occurred at depths <40 m. Probable reasons for the modeled relationships between the distribution of sand lance and the independent variables are discussed.
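The importance values quoted above come from an information-theoretic analysis. A minimal sketch of one standard way such values are obtained follows, assuming they are sums of Akaike weights over the candidate models that contain each variable. The candidate models and AIC scores below are hypothetical and are not the authors' model set.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC scores into weights that sum to 1 across the model set."""
    best = min(aic_values)
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


def importance(model_variables, weights, variable):
    """Sum the weights of all models that include `variable`."""
    return sum(w for variables, w in zip(model_variables, weights) if variable in variables)


# Hypothetical candidate models (sets of predictors) and their AIC scores.
models = [{"depth"}, {"depth", "sand_bottom"}, {"sand_bottom"}]
aics = [100.0, 100.0, 110.0]

weights = akaike_weights(aics)
depth_importance = importance(models, weights, "depth")
sand_importance = importance(models, weights, "sand_bottom")
```

A variable that appears in every well-supported model approaches an importance of 1, which is how a value such as 0.99 for depth would be read under this assumption.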
Avelino, Jacques; Cabut, Sandrine; Barboza, Bernardo; Barquero, Miguel; Alfaro, Ronny; Esquivel, César; Durand, Jean-François; Cilas, Christian
2007-12-01
ABSTRACT We monitored the development of American leaf spot of coffee, a disease caused by the gemmiferous fungus Mycena citricolor, in 57 plots in Costa Rica for 1 or 2 years in order to gain a clearer understanding of conditions conducive to the disease and improve its control. During the investigation, characteristics of the coffee trees, crop management, and the environment were recorded. For the analyses, we used partial least-squares regression via the spline functions (PLSS), which is a nonlinear extension to partial least-squares regression (PLS). The fungus developed well in areas located between approximately 1,100 and 1,550 m above sea level. Slopes were conducive to its development, but eastern-facing slopes were less affected than the others, probably because they were more exposed to sunlight, especially in the rainy season. The distance between planting rows, the shade percentage, coffee tree height, the type of shade, and the pruning system explained disease intensity due to their effects on coffee tree shading and, possibly, on the humidity conditions in the plot. Forest trees and fruit trees intercropped with coffee provided particularly propitious conditions. Apparently, fertilization was unfavorable for the disease, probably due to dilution phenomena associated with faster coffee tree growth. Finally, series of wet spells interspersed with dry spells, which were frequent in the middle of the rainy season, were critical for the disease, probably because they affected the production and release of gemmae and their viability. These results could be used to draw up a map of epidemic risks taking topographical factors into account. To reduce those risks and improve chemical control, our results suggested that farmers should space planting rows further apart, maintain light shading in the plantation, and prune their coffee trees.
NASA Astrophysics Data System (ADS)
Kukunda, Collins B.; Duque-Lazo, Joaquín; González-Ferreiro, Eduardo; Thaden, Hauke; Kleinn, Christoph
2018-03-01
Distinguishing tree species is relevant in many contexts of remote sensing assisted forest inventory. Accurate tree species maps support management and conservation planning, pest and disease control and biomass estimation. This study evaluated the performance of applying ensemble techniques with the goal of automatically distinguishing Pinus sylvestris L. and Pinus uncinata Mill. ex Mirb. within a 1.3 km² mountainous area in Barcelonnette (France). Three modelling schemes were examined, based on: (1) high-density LiDAR data (160 returns m⁻²), (2) Worldview-2 multispectral imagery, and (3) Worldview-2 and LiDAR in combination. Variables related to the crown structure and height of individual trees were extracted from the normalized LiDAR point cloud at individual-tree level, after performing individual tree crown (ITC) delineation. Vegetation indices and the Haralick texture indices were derived from Worldview-2 images and served as independent spectral variables. Selection of the best predictor subset was done after a comparison of three variable selection procedures: (1) Random Forests with cross validation (AUCRFcv), (2) Akaike Information Criterion (AIC) and (3) Bayesian Information Criterion (BIC). To classify the species, 9 regression techniques were combined using ensemble models. Predictions were evaluated using cross validation and an independent dataset. Integration of datasets and models improved individual tree species classification (True Skill Statistic, TSS; from 0.67 to 0.81) over individual techniques and maintained strong predictive power (Relative Operating Characteristic, ROC = 0.91). Assemblage of regression models and integration of the datasets provided more reliable species distribution maps and associated tree-scale mapping uncertainties. Our study highlights the potential of model and data assemblage for improving the species classifications needed in present-day forest planning and management.
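The True Skill Statistic reported above is defined as sensitivity plus specificity minus one. A minimal sketch of the calculation from a two-class confusion matrix follows; the counts are hypothetical and do not come from the study.

```python
def true_skill_statistic(tp, fn, tn, fp):
    """TSS = sensitivity + specificity - 1, ranging from -1 to 1."""
    sensitivity = tp / (tp + fn)  # share of actual positives correctly detected
    specificity = tn / (tn + fp)  # share of actual negatives correctly rejected
    return sensitivity + specificity - 1.0


# Hypothetical counts for a two-species classification of 100 trees.
tss = true_skill_statistic(tp=45, fn=5, tn=40, fp=10)
```

A TSS of 0 means the classifier is no better than chance, while 1 means perfect separation of the two classes.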
Chilling and heat requirements for flowering in temperate fruit trees
NASA Astrophysics Data System (ADS)
Guo, Liang; Dai, Junhu; Ranjitkar, Sailesh; Yu, Haiying; Xu, Jianchu; Luedeling, Eike
2014-08-01
Climate change has affected the rates of chilling and heat accumulation, which are vital for flowering and production, in temperate fruit trees, but few studies have been conducted in the cold-winter climates of East Asia. To evaluate tree responses to variation in chill and heat accumulation rates, partial least squares regression was used to correlate first flowering dates of chestnut (Castanea mollissima Blume) and jujube (Zizyphus jujube Mill.) in Beijing, China, with daily chill and heat accumulation between 1963 and 2008. The Dynamic Model and the Growing Degree Hour Model were used to convert daily records of minimum and maximum temperature into horticulturally meaningful metrics. Regression analyses identified the chilling and forcing periods for chestnut and jujube. The forcing periods started when half the chilling requirements were fulfilled. Over the past 50 years, heat accumulation during tree dormancy increased significantly, while chill accumulation remained relatively stable for both species. Heat accumulation was the main driver of bloom timing, with effects of variation in chill accumulation negligible in Beijing's cold-winter climate. It does not seem likely that reductions in chill will have a major effect on the studied species in Beijing in the near future. Such problems are much more likely for trees grown in locations that are substantially warmer than their native habitats, such as temperate species in the subtropics and tropics.
Chilling and heat requirements for flowering in temperate fruit trees.
Guo, Liang; Dai, Junhu; Ranjitkar, Sailesh; Yu, Haiying; Xu, Jianchu; Luedeling, Eike
2014-08-01
Climate change has affected the rates of chilling and heat accumulation, which are vital for flowering and production, in temperate fruit trees, but few studies have been conducted in the cold-winter climates of East Asia. To evaluate tree responses to variation in chill and heat accumulation rates, partial least squares regression was used to correlate first flowering dates of chestnut (Castanea mollissima Blume) and jujube (Zizyphus jujube Mill.) in Beijing, China, with daily chill and heat accumulation between 1963 and 2008. The Dynamic Model and the Growing Degree Hour Model were used to convert daily records of minimum and maximum temperature into horticulturally meaningful metrics. Regression analyses identified the chilling and forcing periods for chestnut and jujube. The forcing periods started when half the chilling requirements were fulfilled. Over the past 50 years, heat accumulation during tree dormancy increased significantly, while chill accumulation remained relatively stable for both species. Heat accumulation was the main driver of bloom timing, with effects of variation in chill accumulation negligible in Beijing’s cold-winter climate. It does not seem likely that reductions in chill will have a major effect on the studied species in Beijing in the near future. Such problems are much more likely for trees grown in locations that are substantially warmer than their native habitats, such as temperate species in the subtropics and tropics.
NASA Astrophysics Data System (ADS)
Bouffon, T.; Rice, R.; Bales, R.
2006-12-01
The spatial distributions of snow water equivalent (SWE) and snow depth within 1, 4, and 16 km² grid elements around two automated snow pillows in a forested and an open-forested region of the Upper Merced River Basin (2,800 km²) of Yosemite National Park were characterized using field observations and analyzed using binary regression trees. Snow surveys occurred at the forested site during the accumulation and ablation seasons, while at the open-forest site a survey was performed only during the accumulation season. An average of 130 snow depth and 7 snow density measurements were made on each survey, within the 4 km² grid. Snow depth was distributed using binary regression trees and geostatistical methods using the physiographic parameters (e.g. elevation, slope, vegetation, aspect). Results in the forest region indicate that the snow pillow overestimated average SWE within the 1, 4, and 16 km² areas by 34 percent during ablation. During accumulation the snow pillow provided a good estimate of the modeled mean SWE grid value, although it is suspected that the snow pillow was underestimating SWE. At the open-forest site, during accumulation, the snow pillow value was 28 percent greater than the modeled grid-element mean. In addition, the binary regression trees indicate that the independent variables of vegetation, slope, and aspect are the most influential parameters of snow depth distribution. The binary regression tree and multivariate linear regression models explain about 60 percent of the initial variance for snow depth and 80 percent for density, respectively. This short-term study provides motivation and direction for the installation of a distributed snow measurement network to fill the information gap in basin-wide SWE and snow depth measurements.
Guided by these results, a distributed snow measurement network was installed in the Fall 2006 at Gin Flat in the Upper Merced River Basin with the specific objective of measuring accumulation and ablation across topographic variables with the aim of providing guidance for future larger scale observation network designs.
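A binary regression tree of the kind used above is grown by repeatedly choosing the threshold on a predictor that most reduces the within-group sum of squared errors. A minimal sketch of a single such split on one predictor follows; the elevation and snow depth values are hypothetical, and the study's actual software and settings are not stated in the abstract.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, sse_after) for the best binary split of y on x."""
    pairs = sorted(zip(x, y))
    best = (None, sse(y))  # no split: threshold None, SSE of the whole sample
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [value for _, value in pairs[:i]]
        right = [value for _, value in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical survey points: elevation (m) and snow depth (cm).
elevation = [2000, 2100, 2200, 2300, 2400, 2500]
depth = [50, 55, 52, 120, 125, 118]

threshold, sse_after = best_split(elevation, depth)
explained = 1.0 - sse_after / sse(depth)  # share of the initial variance explained
```

The `explained` ratio corresponds to the "percent of the initial variance" figure quoted in the abstract, here for a tree with one split only.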
Sewell, Justin L.; Kushel, Margot B.; Inadomi, John M.; Yee, Hal F.
2009-01-01
Goals We sought to identify factors associated with gastroenterology clinic attendance in an urban safety net healthcare system. Background Missed clinic appointments reduce the efficiency and availability of healthcare, but subspecialty clinic attendance among patients with established healthcare access has not been studied. Study We performed an observational study using secondary data from administrative sources to study patients referred to, and scheduled for an appointment in, the adult gastroenterology clinic serving the safety net healthcare system of San Francisco, California. Our dependent variable was whether subjects attended or missed a scheduled appointment. Analysis included multivariable logistic regression and classification tree analysis. 1,833 patients were referred and scheduled for an appointment between 05/2005 and 08/2006. Prisoners were excluded. All patients had a primary care provider. Results 683 patients (37.3%) missed their appointment; 1,150 (62.7%) attended. Language was highly associated with attendance in the logistic regression; non-English speakers were less likely than English speakers to miss an appointment (adjusted odds ratio 0.42 [0.28,0.63] for Spanish, 0.56 [0.38,0.82] for Asian language, p < 0.001). Other factors were also associated with attendance, but classification tree analysis identified language to be the most highly associated variable. Conclusions In an urban safety net healthcare population, among patients with established healthcare access and a scheduled gastroenterology clinic appointment, not speaking English was most strongly associated with higher attendance rates. Patient related factors associated with not speaking English likely influence subspecialty clinic attendance rates, and these factors may differ from those affecting general healthcare access. PMID:19169147
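The attendance percentages above can be checked directly from the reported counts, and the odds ratios can be read against the basic two-by-two calculation. A minimal sketch follows. The odds ratios in the abstract are adjusted estimates from a multivariable logistic regression, whereas this example computes an unadjusted ratio from hypothetical counts, purely to show what a value below 1 means.

```python
def odds_ratio(group_events, group_nonevents, reference_events, reference_nonevents):
    """Unadjusted odds ratio of an event in a group versus a reference group."""
    return (group_events / group_nonevents) / (reference_events / reference_nonevents)


# Counts reported in the abstract.
missed, attended, total = 683, 1150, 1833
missed_pct = round(100.0 * missed / total, 1)
attended_pct = round(100.0 * attended / total, 1)

# Hypothetical counts (not from the study): missed vs. attended appointments.
unadjusted_or = odds_ratio(
    group_events=30, group_nonevents=70,          # non-English speakers
    reference_events=50, reference_nonevents=50,  # English speakers
)
```

An odds ratio below 1 for missing an appointment, as in this illustration, means the group is less likely to miss than the reference group.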
Mulder, V.L.; Plotze, Michael; de Bruin, Sytze; Schaepman, Michael E.; Mavris, C.; Kokaly, Raymond F.; Egli, Markus
2013-01-01
This paper presents a methodology for assessing mineral abundances of mixtures having more than two constituents using absorption features in the 2.1-2.4 μm wavelength region. In the first step, the absorption behaviour of mineral mixtures is parameterised by exponential Gaussian optimisation. Next, mineral abundances are predicted by regression tree analysis using these parameters as inputs. The approach is demonstrated on a range of prepared samples with known abundances of kaolinite, dioctahedral mica, smectite, calcite and quartz and on a set of field samples from Morocco. The latter contained varying quantities of other minerals, some of which did not have diagnostic absorption features in the 2.1-2.4 μm region. Cross validation showed that the prepared samples of kaolinite, dioctahedral mica, smectite and calcite were predicted with a root mean square error (RMSE) less than 9 wt.%. For the field samples, the RMSE was less than 8 wt.% for calcite, dioctahedral mica and kaolinite abundances. Smectite could not be well predicted, which was attributed to spectral variation of the cations within the dioctahedral layered smectites. Substitution of part of the quartz by chlorite at the prediction phase hardly affected the accuracy of the predicted mineral content; this suggests that the method is robust in handling the omission of minerals during the training phase. The degree of expression of absorption components was different between the field sample and the laboratory mixtures. This demonstrates that the method should be calibrated and trained on local samples. Our method allows the simultaneous quantification of more than two minerals within a complex mixture and thereby enhances the perspectives of spectral analysis for mineral abundances.
Liu, Pei-Yang
2014-01-01
Metabolic syndrome (MetS) in young adults (age 20–39) is often undiagnosed. A simple screening tool using a surrogate measure might be invaluable in the early detection of MetS. Methods. A chi-squared automatic interaction detection (CHAID) decision tree analysis with waist circumference user-specified as the first level was used to detect MetS in young adults using data from the National Health and Nutrition Examination Survey (NHANES) 2009-2010 Cohort as a representative sample of the United States population (n = 745). Results. Twenty percent of the sample met the National Cholesterol Education Program Adult Treatment Panel III (NCEP) classification criteria for MetS. The user-specified CHAID model was compared to both CHAID model with no user-specified first level and logistic regression based model. This analysis identified waist circumference as a strong predictor in the MetS diagnosis. The accuracy of the final model with waist circumference user-specified as the first level was 92.3% with its ability to detect MetS at 71.8% which outperformed comparison models. Conclusions. Preliminary findings suggest that young adults at risk for MetS could be identified for further followup based on their waist circumference. Decision tree methods show promise for the development of a preliminary detection algorithm for MetS. PMID:24817904
Data mining for rapid prediction of facility fit and debottlenecking of biomanufacturing facilities.
Yang, Yang; Farid, Suzanne S; Thornhill, Nina F
2014-06-10
Higher titre processes can pose facility fit challenges in legacy biopharmaceutical purification suites with capacities originally matched to lower titre processes. Bottlenecks caused by mismatches in equipment sizes, combined with process fluctuations upon scale-up, can result in discarding expensive product. This paper describes a data mining decisional tool for rapid prediction of facility fit issues and debottlenecking of biomanufacturing facilities exposed to batch-to-batch variability and higher titres. The predictive tool comprised advanced multivariate analysis techniques to interrogate Monte Carlo stochastic simulation datasets that mimicked batch fluctuations in cell culture titres, step yields and chromatography eluate volumes. A decision tree classification method, CART (classification and regression tree) was introduced to explore the impact of these process fluctuations on product mass loss and reveal the root causes of bottlenecks. The resulting pictorial decision tree determined a series of if-then rules for the critical combinations of factors that lead to different mass loss levels. Three different debottlenecking strategies were investigated involving changes to equipment sizes, using higher capacity chromatography resins and elution buffer optimisation. The analysis compared the impact of each strategy on mass output, direct cost of goods per gram and processing time, as well as consideration of extra capital investment and space requirements. Copyright © 2014 The Authors. Published by Elsevier B.V. All rights reserved.
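The pictorial decision tree described above is read as a set of if-then rules, one per path from the root to a leaf. A minimal sketch of that reading follows, using a nested dictionary as a stand-in for a fitted tree. The feature names, thresholds and labels are hypothetical and are not taken from the paper.

```python
def extract_rules(node, conditions=()):
    """Yield (condition_text, label) for every root-to-leaf path in the tree."""
    if "label" in node:  # leaf: emit the accumulated conditions
        text = " and ".join(conditions) if conditions else "always"
        yield (text, node["label"])
        return
    feature, threshold = node["feature"], node["threshold"]
    yield from extract_rules(node["left"], conditions + (f"{feature} <= {threshold}",))
    yield from extract_rules(node["right"], conditions + (f"{feature} > {threshold}",))


# Hypothetical tree: names and thresholds are illustrative only.
tree = {
    "feature": "titre_g_per_L",
    "threshold": 5.0,
    "left": {"label": "low mass loss"},
    "right": {
        "feature": "eluate_volume_L",
        "threshold": 800,
        "left": {"label": "low mass loss"},
        "right": {"label": "high mass loss"},
    },
}

rules = list(extract_rules(tree))
for condition, label in rules:
    print(f"IF {condition} THEN {label}")
```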
Seera, Manjeevan; Lim, Chee Peng; Ishak, Dahaman; Singh, Harapajan
2012-01-01
In this paper, a novel approach to detect and classify comprehensive fault conditions of induction motors using a hybrid fuzzy min-max (FMM) neural network and classification and regression tree (CART) is proposed. The hybrid model, known as FMM-CART, exploits the advantages of both FMM and CART for undertaking data classification and rule extraction problems. A series of real experiments is conducted, whereby the motor current signature analysis method is applied to form a database comprising stator current signatures under different motor conditions. The signal harmonics from the power spectral density are extracted as discriminative input features for fault detection and classification with FMM-CART. A comprehensive list of induction motor fault conditions, viz., broken rotor bars, unbalanced voltages, stator winding faults, and eccentricity problems, has been successfully classified using FMM-CART with good accuracy rates. The results are comparable, if not better, than those reported in the literature. Useful explanatory rules in the form of a decision tree are also elicited from FMM-CART to analyze and understand different fault conditions of induction motors.
NASA Astrophysics Data System (ADS)
Lawrence, R.; Landenburger, L.; Jewett, J.
2007-12-01
Whitebark pine seeds have long been identified as the most significant vegetative food source for grizzly bears in the Greater Yellowstone Ecosystem (GYE) and, hence, a crucial element of suitable grizzly bear habitat. The overall health and status of whitebark pine in the GYE is currently threatened by mountain pine beetle infestations and the spread of white pine blister rust. Whitebark pine distribution (presence/absence) was mapped for the GYE using Landsat 7 Enhanced Thematic Mapper (ETM+) imagery and topographic data as part of a long-term inter-agency monitoring program. Logistic regression was compared with classification tree analysis (CTA) with and without boosting. Overall comparative classification accuracies for the central portion of the GYE covering three ETM+ images along a single path ranged from 91.6% using logistic regression to 95.8% with See5's CTA algorithm with the maximum 99 boosts. The analysis is being extended to the entire northern Rocky Mountain Ecosystem and extended over decadal time scales.
Carnahan, Brian; Meyer, Gérard; Kuntz, Lois-Ann
2003-01-01
Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches--genetic programming and decision tree induction--were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.
Influence of shifting cultivation practices on soil-plant-beetle interactions.
Ibrahim, Kalibulla Syed; Momin, Marcy D; Lalrotluanga, R; Rosangliana, David; Ghatak, Souvik; Zothansanga, R; Kumar, Nachimuthu Senthil; Gurusubramanian, Guruswami
2016-08-01
Shifting cultivation (jhum) is a major land use practice in Mizoram. It was considered as an eco-friendly and efficient method when the cycle duration was long (15-30 years), but it poses the problem of land degradation and threat to ecology when shortened (4-5 years) due to increased intensification of farming systems. Studying beetle community structure is very helpful in understanding how shifting cultivation affects the biodiversity features compared to natural forest system. The present study examines the beetle species diversity and estimates the effects of shifting cultivation practices on the beetle assemblages in relation to change in tree species composition and soil nutrients. Scarabaeidae and Carabidae were observed to be the dominant families in the land use systems studied. Shifting cultivation practice significantly (P < 0.05) affected the beetle and tree species diversity as well as the soil nutrients as shown by univariate (one-way analysis of variance (ANOVA), correlation and regression, diversity indices) and multivariate (cluster analysis, principal component analysis (PCA), detrended correspondence analysis (DCA), canonical variate analysis (CVA), permutational multivariate analysis of variance (PERMANOVA), permutational multivariate analysis of dispersion (PERMDISP)) statistical analyses. Besides changing the tree species composition and affecting the soil fertility, shifting cultivation provides less suitable habitat conditions for the beetle species. Bioindicator analysis categorized the beetle species into forest specialists, anthropogenic specialists (shifting cultivation habitat specialist), and habitat generalists. Molecular analysis of bioindicator beetle species was done using mitochondrial cytochrome oxidase subunit I (COI) marker to validate the beetle species and describe genetic variation among them in relation to heterogeneity, transition/transversion bias, codon usage bias, evolutionary distance, and substitution pattern. 
The present study revealed the fact that shifting cultivation practice significantly affects the beetle species in terms of biodiversity pattern as well as evolutionary features. Spatiotemporal assessment of soil-plant-beetle interactions in shifting cultivation system and their influence in land degradation and ecology will be helpful in making biodiversity conservation decisions in the near future.
[The Application of the Fault Tree Analysis Method in Medical Equipment Maintenance].
Liu, Hongbin
2015-11-01
This paper presents the traditional fault tree analysis method and gives detailed guidance on its application to medical instrument maintenance. Significant changes are made when the traditional method is introduced into medical instrument maintenance: the logic symbols, the logic analysis and calculation, and the complicated procedures are dropped, and only the intuitive, practical fault tree diagram is kept. The diagram itself also differs: the fault tree is no longer a logic tree but a "thinking tree" for troubleshooting, and both the definition of its nodes and the composition of its branches are different.
Random Forest as a Predictive Analytics Alternative to Regression in Institutional Research
ERIC Educational Resources Information Center
He, Lingjun; Levine, Richard A.; Fan, Juanjuan; Beemer, Joshua; Stronach, Jeanne
2018-01-01
In institutional research, modern data mining approaches are seldom considered to address predictive analytics problems. The goal of this paper is to highlight the advantages of tree-based machine learning algorithms over classic (logistic) regression methods for data-informed decision making in higher education problems, and stress the success of…
Tree allometry and improved estimation of carbon stocks and balance in tropical forests.
Chave, J; Andalo, C; Brown, S; Cairns, M A; Chambers, J Q; Eamus, D; Fölster, H; Fromard, F; Higuchi, N; Kira, T; Lescure, J-P; Nelson, B W; Ogawa, H; Puig, H; Riéra, B; Yamakura, T
2005-08-01
Tropical forests hold large stores of carbon, yet uncertainty remains regarding their quantitative contribution to the global carbon cycle. One approach to quantifying carbon biomass stores consists in inferring changes from long-term forest inventory plots. Regression models are used to convert inventory data into an estimate of aboveground biomass (AGB). We provide a critical reassessment of the quality and the robustness of these models across tropical forest types, using a large dataset of 2,410 trees ≥ 5 cm diameter, directly harvested in 27 study sites across the tropics. Proportional relationships between aboveground biomass and the product of wood density, trunk cross-sectional area, and total height are constructed. We also develop a regression model involving wood density and stem diameter only. Our models were tested for secondary and old-growth forests, for dry, moist and wet forests, for lowland and montane forests, and for mangrove forests. The most important predictors of AGB of a tree were, in decreasing order of importance, its trunk diameter, wood specific gravity, total height, and forest type (dry, moist, or wet). Overestimates prevailed, giving a bias of 0.5-6.5% when errors were averaged across all stands. Our regression models can be used reliably to predict aboveground tree biomass across a broad range of tropical forests. Because they are based on an unprecedented dataset, these models should improve the quality of tropical biomass estimates, and bring consensus about the contribution of the tropical forest biome and tropical deforestation to the global carbon cycle.
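The proportional relationship described above can be sketched as AGB = F × ρ × A × H, where ρ is wood density, A the trunk cross-sectional area and H the total height. A minimal sketch follows, assuming a single proportionality constant F. The value of F used here is hypothetical and is not one of the fitted coefficients from the paper.

```python
import math


def aboveground_biomass_kg(diameter_cm, height_m, wood_density_g_cm3, form_factor=0.6):
    """AGB proportional to wood density x cross-sectional area x height.

    form_factor is a hypothetical proportionality constant, not a fitted value.
    """
    area_cm2 = math.pi * diameter_cm ** 2 / 4.0      # trunk cross-sectional area
    cylinder_cm3 = area_cm2 * height_m * 100.0       # height converted to cm
    mass_g = form_factor * wood_density_g_cm3 * cylinder_cm3
    return mass_g / 1000.0                           # grams to kilograms


# Hypothetical trees: only the diameter differs between them.
small = aboveground_biomass_kg(diameter_cm=20.0, height_m=15.0, wood_density_g_cm3=0.6)
large = aboveground_biomass_kg(diameter_cm=40.0, height_m=15.0, wood_density_g_cm3=0.6)
```

Because cross-sectional area scales with the square of diameter, doubling the diameter quadruples the estimate, which is consistent with trunk diameter being the most important predictor in the abstract.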
Vegetation Continuous Fields--Transitioning from MODIS to VIIRS
NASA Astrophysics Data System (ADS)
DiMiceli, C.; Townshend, J. R.; Sohlberg, R. A.; Kim, D. H.; Kelly, M.
2015-12-01
Measurements of fractional vegetation cover are critical for accurate and consistent monitoring of global deforestation rates. They also provide important parameters for land surface, climate and carbon models and vital background data for research into fire, hydrological and ecosystem processes. MODIS Vegetation Continuous Fields (VCF) products provide four complementary layers of fractional cover: tree cover, non-tree vegetation, bare ground, and surface water. MODIS VCF products are currently produced globally and annually at 250m resolution for 2000 to the present. Additionally, annual VCF products at 1/20° resolution derived from AVHRR and MODIS Long-Term Data Records are in development to provide Earth System Data Records of fractional vegetation cover for 1982 to the present. In order to provide continuity of these valuable products, we are extending the VCF algorithms to create Suomi NPP/VIIRS VCF products. This presentation will highlight the first VIIRS fractional cover product: global percent tree cover at 1 km resolution. To create this product, phenological and physiological metrics were derived from each complete year of VIIRS 8-day surface reflectance products. A supervised regression tree method was applied to the metrics, using training derived from Landsat data supplemented by high-resolution data from Ikonos, RapidEye and QuickBird. The regression tree model was then applied globally to produce fractional tree cover. In our presentation we will detail our methods for creating the VIIRS VCF product. We will compare the new VIIRS VCF product to our current MODIS VCF products and demonstrate continuity between instruments. Finally, we will outline future VIIRS VCF development plans.
Eric J. Gustafson
2014-01-01
Regression models developed in the upper Midwest (United States) to predict drought-induced tree mortality from measures of drought (Palmer Drought Severity Index) were tested in the northeastern United States and found inadequate. The most likely cause of this result is that long drought events were rare in the Northeast during the period when inventory data were...
W. Henry McNab; David L. Loftis; Callie J. Schweitzer; Raymond Sheffield
2004-01-01
We used tree indicator species occurring on 438 plots in the Plateau counties of Tennessee to test the uniqueness of four conterminous ecoregions. Multinomial logistic regression indicated that the presence of 14 tree species allowed classification of sample plots according to ecoregion with an average overall accuracy of 75 percent (range 45 to 94 percent). Additional...
Dating tree mortality using log decay in the White Mountains of New Hampshire
Andrew J. Fast; Mark J. Ducey; Jeffrey H. Gove; William B. Leak
2008-01-01
Coarse woody material (CWM) is an important component of forest ecosystems. To meet specific CWM management objectives, it is important to understand rates of decay. We present results from a silvicultural trial at the Bartlett Experimental Forest, in which time of death is known for a large sample of trees. Either a simple table or regression equations that use...
Development of post-fire crown damage mortality thresholds in ponderosa pine
James F. Fowler; Carolyn Hull Sieg; Joel McMillin; Kurt K. Allen; Jose F. Negron; Linda L. Wadleigh; John A. Anhold; Ken E. Gibson
2010-01-01
Previous research has shown that crown scorch volume and crown consumption volume are the major predictors of post-fire mortality in ponderosa pine. In this study, we use piecewise logistic regression models of crown scorch data from 6633 trees in five wildfires from the Intermountain West to locate a mortality threshold at 88% scorch by volume for trees with no crown...
Scott M. Bretthauer; George Z. Gertner; Gary L. Rolfe; Jeffery O. Dawson
2003-01-01
Tree species diversity increases and dominance decreases with proximity to forest border in two 60-year-old successional forest stands developed on abandoned agricultural land in Piatt County, Illinois. A regression equation allowed us to quantify an increase in diversity with closeness to forest border for one of the forest stands. Shingle oak is the most dominant...
Regeneration of Douglas-fir cutblocks on the Six Rivers National Forest in northwestern California
R. O. Strothmann
1979-01-01
A survey of 61 cutblocks planted since 1964 evaluated stocking of conifers (trees 1 foot tall or taller) on 2-milacre quadrats. Overall stocking percentage averaged 42.2 and ranged from 15 to 81. Overall number of trees per acre averaged 396. In the regression model, based on 36 cutblocks, better stocking was associated with high site class, northerly aspect,...
Goldkorn, Amir; Ely, Benjamin; Quinn, David I.; Tangen, Catherine M.; Fink, Louis M.; Xu, Tong; Twardowski, Przemyslaw; Van Veldhuizen, Peter J.; Agarwal, Neeraj; Carducci, Michael A.; Monk, J. Paul; Datar, Ram H.; Garzotto, Mark; Mack, Philip C.; Lara, Primo; Higano, Celestia S.; Hussain, Maha; Thompson, Ian Murchie; Cote, Richard J.; Vogelzang, Nicholas J.
2014-01-01
Purpose Circulating tumor cell (CTC) enumeration has not been prospectively validated in standard first-line docetaxel treatment for metastatic castration-resistant prostate cancer. We assessed the prognostic value of CTCs for overall survival (OS) and disease response in S0421, a phase III trial of docetaxel plus prednisone with or without atrasentan. Patients and Methods CTCs were enumerated at baseline (day 0) and before cycle two (day 21) using CellSearch. Baseline counts and changes in counts from day 0 to 21 were evaluated for association with OS, prostate-specific antigen (PSA), and RECIST response using Cox regression as well as receiver operator characteristic (ROC) curves, integrated discrimination improvement (IDI) analysis, and regression trees. Results Median day-0 CTC count was five cells per 7.5 mL, and CTCs < versus ≥ five per 7.5 mL were significantly associated with baseline PSA, bone pain, liver disease, hemoglobin, alkaline phosphatase, and subsequent PSA and RECIST response. Median OS was 26 months for < five versus 13 months for ≥ five CTCs per 7.5 mL at day 0 (hazard ratio [HR], 2.74 [adjusting for covariates]). ROC curves had higher areas under the curve for day-0 CTCs than for PSA, and IDI analysis showed that adding day-0 CTCs to baseline PSA and other covariates increased predictive accuracy for survival by 8% to 10%. Regression trees yielded new prognostic subgroups, and rising CTC count from day 0 to 21 was associated with shorter OS (HR, 2.55). Conclusion These data validate the prognostic utility of CTC enumeration in a large docetaxel-based prospective cohort. Baseline CTC counts were prognostic, and rising CTCs at 3 weeks heralded significantly worse OS, potentially serving as an early metric to help redirect and optimize therapy in this clinical setting. PMID:24616308
Goldkorn, Amir; Ely, Benjamin; Quinn, David I; Tangen, Catherine M; Fink, Louis M; Xu, Tong; Twardowski, Przemyslaw; Van Veldhuizen, Peter J; Agarwal, Neeraj; Carducci, Michael A; Monk, J Paul; Datar, Ram H; Garzotto, Mark; Mack, Philip C; Lara, Primo; Higano, Celestia S; Hussain, Maha; Thompson, Ian Murchie; Cote, Richard J; Vogelzang, Nicholas J
2014-04-10
Circulating tumor cell (CTC) enumeration has not been prospectively validated in standard first-line docetaxel treatment for metastatic castration-resistant prostate cancer. We assessed the prognostic value of CTCs for overall survival (OS) and disease response in S0421, a phase III trial of docetaxel plus prednisone with or without atrasentan. CTCs were enumerated at baseline (day 0) and before cycle two (day 21) using CellSearch. Baseline counts and changes in counts from day 0 to 21 were evaluated for association with OS, prostate-specific antigen (PSA), and RECIST response using Cox regression as well as receiver operator characteristic (ROC) curves, integrated discrimination improvement (IDI) analysis, and regression trees. Median day-0 CTC count was five cells per 7.5 mL, and CTCs < versus ≥ five per 7.5 mL were significantly associated with baseline PSA, bone pain, liver disease, hemoglobin, alkaline phosphatase, and subsequent PSA and RECIST response. Median OS was 26 months for < five versus 13 months for ≥ five CTCs per 7.5 mL at day 0 (hazard ratio [HR], 2.74 [adjusting for covariates]). ROC curves had higher areas under the curve for day-0 CTCs than for PSA, and IDI analysis showed that adding day-0 CTCs to baseline PSA and other covariates increased predictive accuracy for survival by 8% to 10%. Regression trees yielded new prognostic subgroups, and rising CTC count from day 0 to 21 was associated with shorter OS (HR, 2.55). These data validate the prognostic utility of CTC enumeration in a large docetaxel-based prospective cohort. Baseline CTC counts were prognostic, and rising CTCs at 3 weeks heralded significantly worse OS, potentially serving as an early metric to help redirect and optimize therapy in this clinical setting.
NASA Astrophysics Data System (ADS)
Kult, J. M.; Fry, L. M.; Gronewold, A. D.
2012-12-01
Methods for predicting streamflow in areas with limited or nonexistent measures of hydrologic response typically invoke the concept of regionalization, whereby knowledge pertaining to gauged catchments is transferred to ungauged catchments. In this study, we identify watershed physical characteristics acting as primary drivers of hydrologic response throughout the US portion of the Great Lakes basin. Relationships between watershed physical characteristics and hydrologic response are generated from 166 catchments spanning a variety of climate, soil, land cover, and land form regimes through regression tree analysis, leading to a grouping of watersheds exhibiting similar hydrologic response characteristics. These groupings are then used to predict response in ungauged watersheds in an uncertainty framework. Results from this method are assessed alongside one historical regionalization approach which, while simple, has served as a cornerstone of Great Lakes regional hydrologic research for several decades. Our approach expands upon previous research by considering multiple temporal characterizations of hydrologic response. Due to the substantial inter-annual and seasonal variability in hydrologic response observed over the Great Lakes basin, results from the regression tree analysis differ considerably depending on the level of temporal aggregation used to define the response. Specifically, higher levels of temporal aggregation for the response metric (for example, indices derived from long-term means of climate and streamflow observations) lead to improved watershed groupings with lower within-group variance. However, this perceived improvement in model skill occurs at the cost of understated uncertainty when applying the regression to time series simulations or as a basis for model calibration. 
In such cases, our results indicate that predictions based on long-term characterizations of hydrologic response can produce misleading conclusions when applied at shorter time steps. This study suggests that measures of hydrologic response quantified at these shorter time steps may provide a more robust basis for making predictions in applications of water resource management, model calibration and simulations, and human health and safety.
Zhong, Buqing; Liang, Tao; Wang, Lingqing; Li, Kexin
2014-08-15
An extensive soil survey was conducted to study pollution sources and delineate contamination of heavy metals in one of the metalliferous industrial bases, in the karst areas of southwest China. A total of 597 topsoil samples were collected and the concentrations of five heavy metals, namely Cd, As (metalloid), Pb, Hg and Cr were analyzed. Stochastic models including a conditional inference tree (CIT) and a finite mixture distribution model (FMDM) were applied to identify the sources and partition the contribution from natural and anthropogenic sources for heavy metal in topsoils of the study area. Regression trees for Cd, As, Pb and Hg were proved to depend mostly on indicators of anthropogenic activities such as industrial type and distance from urban area, while the regression tree for Cr was found to be mainly influenced by the geogenic characteristics. The FMDM analysis showed that the geometric means of modeled background values for Cd, As, Pb, Hg and Cr were close to their background values previously reported in the study area, while the contamination of Cd and Hg were widespread in the study area, imposing potentially detrimental effects on organisms through the food chain. Finally, the probabilities of single and multiple heavy metals exceeding the threshold values derived from the FMDM were estimated using indicator kriging (IK) and multivariate indicator kriging (MVIK). The high probabilities exceeding the thresholds of heavy metals were associated with metalliferous production and atmospheric deposition of heavy metals transported from the urban and industrial areas. Geostatistics coupled with stochastic models provide an effective way to delineate multiple heavy metal pollution to facilitate improved environmental management. Copyright © 2014 Elsevier B.V. All rights reserved.
Zigler, S.J.; Newton, T.J.; Steuer, J.J.; Bartsch, M.R.; Sauer, J.S.
2008-01-01
Interest in understanding physical and hydraulic factors that might drive distribution and abundance of freshwater mussels has been increasing due to their decline throughout North America. We assessed whether the spatial distribution of unionid mussels could be predicted from physical and hydraulic variables in a reach of the Upper Mississippi River. Classification and regression tree (CART) models were constructed using mussel data compiled from various sources and explanatory variables derived from GIS coverages. Prediction success of CART models for presence-absence of mussels ranged from 71 to 76% across three gears (brail, sled-dredge, and dive-quadrat), and the models explained 51% of the deviance in abundance. Models were largely driven by shear stress and substrate stability variables, but interactions with simple physical variables, especially slope, were also important. Geospatial models, which were based on tree model results, predicted few mussels in poorly connected backwater areas (e.g., floodplain lakes) and the navigation channel, whereas main channel border areas with high geomorphic complexity (e.g., river bends, islands, side channel entrances) and small side channels were typically favorable to mussels. Moreover, bootstrap aggregation of discharge-specific regression tree models of dive-quadrat data indicated that variables measured at low discharge were about 25% more predictive (PMSE = 14.8) than variables measured at median discharge (PMSE = 20.4) with high discharge (PMSE = 17.1) variables intermediate. This result suggests that episodic events such as droughts and floods were important in structuring mussel distributions. Although the substantial mussel and ancillary data in our study reach is unusual, our approach to develop exploratory statistical and geospatial models should be useful even when data are more limited. © 2007 Springer Science+Business Media B.V.
Patel, Rita B; Mathur, Maya B; Gould, Michael; Uyeki, Timothy M; Bhattacharya, Jay; Xiao, Yang; Khazeni, Nayer
2014-01-01
Human infections with highly pathogenic avian influenza (HPAI) A (H5N1) viruses have occurred in 15 countries, with high mortality to date. Determining risk factors for morbidity and mortality from HPAI H5N1 can inform preventive and therapeutic interventions. We included all cases of human HPAI H5N1 reported in World Health Organization Global Alert and Response updates and those identified through a systematic search of multiple databases (PubMed, Scopus, and Google Scholar), including articles in all languages. We abstracted predefined clinical and demographic predictors and mortality and used bivariate logistic regression analyses to examine the relationship of each candidate predictor with mortality. We developed and pruned a decision tree using nonparametric Classification and Regression Tree methods to create risk strata for mortality. We identified 617 human cases of HPAI H5N1 occurring between December 1997 and April 2013. The median age of subjects was 18 years (interquartile range 6-29 years) and 54% were female. HPAI H5N1 case-fatality proportion was 59%. The final decision tree for mortality included age, country, per capita government health expenditure, and delay from symptom onset to hospitalization, with an area under the receiver operator characteristic (ROC) curve of 0.81 (95% CI: 0.76-0.86). A model defined by four clinical and demographic predictors successfully estimated the probability of mortality from HPAI H5N1 illness. These parameters highlight the importance of early diagnosis and treatment and may enable early, targeted pharmaceutical therapy and supportive care for symptomatic patients with HPAI H5N1 virus infection.
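The area under the ROC curve quoted in the abstract above can be read as the probability that a randomly chosen fatal case receives a higher predicted risk than a randomly chosen survivor. The following Python sketch computes it that way for a handful of made-up cases; the risk values and outcomes are hypothetical and are not taken from the study.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted mortality risks, as if read off the leaves of a small tree
risk = [0.9, 0.9, 0.6, 0.6, 0.2, 0.2]
died = [1, 1, 1, 0, 0, 0]
auc = roc_auc(risk, died)
```

Because a pruned tree assigns the same risk to every case in a leaf, ties are common, which is why the half-credit rule for tied pairs matters here.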
Identifying pollution sources and predicting urban air quality using ensemble learning methods
NASA Astrophysics Data System (ADS)
Singh, Kunwar P.; Gupta, Shikha; Rai, Premanjali
2013-12-01
In this study, principal components analysis (PCA) was performed to identify air pollution sources and tree-based ensemble learning models were constructed to predict the urban air quality of Lucknow (India) using the air quality and meteorological databases pertaining to a period of five years. PCA identified vehicular emissions and fuel combustion as major air pollution sources. The air quality indices revealed that the air quality was unhealthy during the summer and winter. Ensemble models were constructed to discriminate between the seasonal air qualities, to identify the factors responsible for the discrimination, and to predict the air quality indices. Accordingly, single decision tree (SDT), decision tree forest (DTF), and decision treeboost (DTB) were constructed and their generalization and predictive performance was evaluated in terms of several statistical parameters and compared with a conventional machine learning benchmark, support vector machines (SVM). The DT and SVM models discriminated the seasonal air quality rendering misclassification rate (MR) of 8.32% (SDT); 4.12% (DTF); 5.62% (DTB), and 6.18% (SVM), respectively in complete data. The AQI and CAQI regression models yielded a correlation between measured and predicted values and root mean squared error of 0.901, 6.67 and 0.825, 9.45 (SDT); 0.951, 4.85 and 0.922, 6.56 (DTF); 0.959, 4.38 and 0.929, 6.30 (DTB); 0.890, 7.00 and 0.836, 9.16 (SVR) in complete data. The DTF and DTB models outperformed the SVM in both classification and regression, which could be attributed to the incorporation of the bagging and boosting algorithms in these models. The proposed ensemble models successfully predicted the urban ambient air quality and can be used as effective tools for its management.
Plieninger, Tobias; Levers, Christian; Mantel, Martin; Costa, Augusta; Schaich, Harald; Kuemmerle, Tobias
2015-01-01
Scattered trees support high levels of farmland biodiversity and ecosystem services in agricultural landscapes, but they are threatened by agricultural intensification, urbanization, and land abandonment. This study aimed to map and quantify the decline of orchard meadows (scattered fruit trees of high nature conservation value) for a region in Southwestern Germany for the 1968–2009 period and to identify the driving forces of this decline. We derived orchard meadow loss from 1968 and 2009 aerial images and used a boosted regression trees modelling framework to assess the relative importance of 18 environmental, demographic, and socio-economic variables to test five alternative hypotheses explaining orchard meadow loss. We found that orchard meadow loss occurred in flatter areas, in areas where smaller plot sizes and fragmented orchard meadows prevailed, and in areas near settlements and infrastructure. The analysis did not confirm that orchard meadow loss was higher in areas where agricultural intensification was stronger and in areas of lower implementation levels of conservation policies. Our results demonstrated that the influential drivers of orchard meadow loss were those that reduce economic profitability and increase opportunity costs for orchards, providing incentives for converting orchard meadows to other, more profitable land uses. These insights could be taken up by local- and regional-level conservation policies to identify the sites of persistent orchard meadows in agricultural landscapes that would be prioritized in conservation efforts. PMID:25932914
NASA Astrophysics Data System (ADS)
Togashi, Henrique; Prentice, Colin; Evans, Bradley; Forrester, David; Drake, Paul; Feikema, Paul; Brooksbank, Kim; Eamus, Derek; Taylor, Daniel
2014-05-01
The leaf area to sapwood area ratio (LA:SA) is a key plant trait that links photosynthesis to transpiration. Pipe model theory states that the sapwood cross-sectional area of a stem or branch at any point should scale isometrically with the area of leaves distal to that point. Optimization theory further suggests that LA:SA should decrease towards drier climates. Although acclimation of LA:SA to climate has been reported within species, much less is known about the scaling of this trait with climate among species. We compiled LA:SA measurements from 184 species of Australian evergreen angiosperm trees. The pipe model was broadly confirmed, based on measurements on branches and trunks of trees from one to 27 years old. We found considerable scatter in LA:SA among species. However quantile regression showed strong (0.2
Togashi, Henrique Furstenau; Prentice, Iain Colin; Evans, Bradley John; Forrester, David Ian; Drake, Paul; Feikema, Paul; Brooksbank, Kim; Eamus, Derek; Taylor, Daniel
2015-03-01
The leaf area-to-sapwood area ratio (LA:SA) is a key plant trait that links photosynthesis to transpiration. The pipe model theory states that the sapwood cross-sectional area of a stem or branch at any point should scale isometrically with the area of leaves distal to that point. Optimization theory further suggests that LA:SA should decrease toward drier climates. Although acclimation of LA:SA to climate has been reported within species, much less is known about the scaling of this trait with climate among species. We compiled LA:SA measurements from 184 species of Australian evergreen angiosperm trees. The pipe model was broadly confirmed, based on measurements on branches and trunks of trees from one to 27 years old. Despite considerable scatter in LA:SA among species, quantile regression showed strong (0.2 < R1 < 0.65) positive relationships between two climatic moisture indices and the lowermost (5%) and uppermost (5-15%) quantiles of log LA:SA, suggesting that moisture availability constrains the envelope of minimum and maximum values of LA:SA typical for any given climate. Interspecific differences in plant hydraulic conductivity are probably responsible for the large scatter of values in the mid-quantile range and may be an important determinant of tree morphology.
Togashi, Henrique Furstenau; Prentice, Iain Colin; Evans, Bradley John; Forrester, David Ian; Drake, Paul; Feikema, Paul; Brooksbank, Kim; Eamus, Derek; Taylor, Daniel
2015-01-01
The leaf area-to-sapwood area ratio (LA:SA) is a key plant trait that links photosynthesis to transpiration. The pipe model theory states that the sapwood cross-sectional area of a stem or branch at any point should scale isometrically with the area of leaves distal to that point. Optimization theory further suggests that LA:SA should decrease toward drier climates. Although acclimation of LA:SA to climate has been reported within species, much less is known about the scaling of this trait with climate among species. We compiled LA:SA measurements from 184 species of Australian evergreen angiosperm trees. The pipe model was broadly confirmed, based on measurements on branches and trunks of trees from one to 27 years old. Despite considerable scatter in LA:SA among species, quantile regression showed strong (0.2 < R1 < 0.65) positive relationships between two climatic moisture indices and the lowermost (5%) and uppermost (5–15%) quantiles of log LA:SA, suggesting that moisture availability constrains the envelope of minimum and maximum values of LA:SA typical for any given climate. Interspecific differences in plant hydraulic conductivity are probably responsible for the large scatter of values in the mid-quantile range and may be an important determinant of tree morphology. PMID:25859331
Sidewalk Landscape Structure and Thermal Conditions for Child and Adult Pedestrians
Kim, Young-Jae; Lee, Chanam; Kim, Jun-Hyun
2018-01-01
Walking is being promoted for health and transportation purposes across all climatic regions in the US and beyond. Despite this, an uncomfortable microclimate condition along sidewalks is one of the major deterrents of walking, and more empirical research is needed to determine the risks of heat exposure to pedestrians while walking. This study examined the effect of street trees and grass along sidewalks on air temperatures. A series of thermal images were taken at the average heights of adults and children in the US to objectively measure the air temperatures of 10 sidewalk segments in College Station, TX, USA. After controlling for the other key physical environmental conditions, sidewalks with more trees or wider grass buffer areas had lower air temperatures than those with less vegetation. Children were exposed to higher temperatures due to the greater exposure or proximity to the pavement surface, which tends to have higher radiant heat. Multivariate regression analysis suggested that the configuration of trees and grass buffers along the sidewalks helped to promote pleasant thermal conditions and reduced the differences in ambient air temperatures measured at child and adult heights. This study suggests that street trees and vegetated ground help reduce the air temperatures, leading to more thermally comfortable environments for both child and adult pedestrians in warm climates. The thermal implications of street landscape require further attention by researchers and policy makers who are interested in promoting outdoor walking. PMID:29346312
Rossi, Sergio; Deslauriers, Annie; Anfodillo, Tommaso; Morin, Hubert; Saracino, Antonio; Motta, Renzo; Borghetti, Marco
2006-01-01
Intra-annual radial growth rates and durations in trees are reported to differ greatly in relation to species, site and environmental conditions. However, very similar dynamics of cambial activity and wood formation are observed in temperate and boreal zones. Here, we compared weekly xylem cell production and variation in stem circumference in the main northern hemisphere conifer species (genera Picea, Pinus, Abies and Larix) from 1996 to 2003. Dynamics of radial growth were modeled with a Gompertz function, defining the upper asymptote (A), x-axis placement (beta) and rate of change (kappa). A strong linear relationship was found between the constants beta and kappa for both types of analysis. The slope of the linear regression, which corresponds to the time at which maximum growth rate occurred, appeared to converge towards the summer solstice. The maximum growth rate occurred around the time of maximum day length, and not during the warmest period of the year as previously suggested. The achievements of photoperiod could act as a growth constraint or a limit after which the rate of tree-ring formation tends to decrease, thus allowing plants to safely complete secondary cell wall lignification before winter.
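The link between the Gompertz constants and the timing of maximum growth in the abstract above can be made concrete with a short Python sketch. It assumes the common parameterization y(t) = A·exp(−exp(β − κt)), which the abstract does not spell out; under that assumption the growth rate peaks at t = β/κ, so a line through the origin relating β to κ across trees has that time as its slope. The parameter values below are hypothetical.

```python
import math

def gompertz(t, A, beta, kappa):
    """Cumulative growth y(t) = A * exp(-exp(beta - kappa * t)) (assumed form)."""
    return A * math.exp(-math.exp(beta - kappa * t))

def growth_rate(t, A, beta, kappa):
    """Analytic first derivative dy/dt of the curve above."""
    return kappa * math.exp(beta - kappa * t) * gompertz(t, A, beta, kappa)

def time_of_max_rate(beta, kappa):
    """Inflection point: d2y/dt2 = 0 when exp(beta - kappa * t) = 1, so t = beta / kappa."""
    return beta / kappa

# Hypothetical parameters, with t measured in day of year
A, beta, kappa = 50.0, 8.6, 0.05
t_peak = time_of_max_rate(beta, kappa)

# Brute-force check of the analytic result on a daily grid
t_grid = max(range(1, 366), key=lambda d: growth_rate(d, A, beta, kappa))
```

The grid search and the closed form agree, which is the property that lets the ratio of the two fitted constants be interpreted as a date.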
Prediction of Baseflow Index of Catchments using Machine Learning Algorithms
NASA Astrophysics Data System (ADS)
Yadav, B.; Hatfield, K.
2017-12-01
We present the results of eight machine learning techniques for predicting the baseflow index (BFI) of ungauged basins using a surrogate of catchment scale climate and physiographic data. The tested algorithms include ordinary least squares, ridge regression, least absolute shrinkage and selection operator (lasso), elastic net, support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Our work seeks to identify the dominant controls of BFI that can be readily obtained from ancillary geospatial databases and remote sensing measurements, such that the developed techniques can be extended to ungauged catchments. More than 800 gauged catchments spanning the continental United States were selected to develop the general methodology. The BFI calculation was based on the baseflow separated from the daily streamflow hydrograph using the HYSEP filter. The surrogate catchment attributes were compiled from multiple sources including digital elevation model, soil, landuse, climate data, and other publicly available ancillary and geospatial data. 80% of the catchments were used to train the ML algorithms, and the remaining 20% of the catchments were used as an independent test set to measure the generalization performance of fitted models. A k-fold cross-validation using exhaustive grid search was used to fit the hyperparameters of each model. Initial model development was based on 19 independent variables, but after variable selection and feature ranking, we generated revised sparse models of BFI prediction that are based on only six catchment attributes. These key predictive variables selected after the careful evaluation of bias-variance tradeoff include average catchment elevation, slope, fraction of sand, permeability, temperature, and precipitation. 
The most promising algorithms exceeding an accuracy score (r-square) of 0.7 on test data include support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Considering both the accuracy and the computational complexity of these algorithms, we identify the extremely randomized trees as the best performing algorithm for BFI prediction in ungauged basins.
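The evaluation protocol described in the abstract above (an 80/20 train/test split, with k-fold cross-validated grid search over hyperparameters on the training portion) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' pipeline: a one-variable ridge regression stands in for the eight algorithms, the data are synthetic, mean squared error replaces the r-square score, and all names are hypothetical.

```python
import random

def ridge_slope(xs, ys, lam):
    """Closed-form ridge fit of y ~ w * x (no intercept): w = sum(xy) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(xs, ys, w):
    """Mean squared error of the prediction w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def kfold_indices(n, k):
    """Split range(n) into k contiguous, nearly equal folds."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def grid_search_cv(xs, ys, grid, k=5):
    """Return the penalty in `grid` with the lowest mean validation MSE."""
    folds = kfold_indices(len(xs), k)
    best_lam, best_err = None, float("inf")
    for lam in grid:
        errs = []
        for fold in folds:
            held = set(fold)
            tr = [i for i in range(len(xs)) if i not in held]
            w = ridge_slope([xs[i] for i in tr], [ys[i] for i in tr], lam)
            errs.append(mse([xs[i] for i in fold], [ys[i] for i in fold], w))
        err = sum(errs) / k
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam

random.seed(0)
# Synthetic stand-in for catchment data: one attribute x, one response y
xs = [random.uniform(0, 1) for _ in range(100)]
ys = [0.6 * x + random.gauss(0, 0.05) for x in xs]

idx = list(range(100))
random.shuffle(idx)
train, test = idx[:80], idx[80:]  # 80/20 split; the test set is never used for tuning

grid = [0.0, 0.1, 1.0, 10.0]  # exhaustive grid of candidate penalties
best = grid_search_cv([xs[i] for i in train], [ys[i] for i in train], grid)
w = ridge_slope([xs[i] for i in train], [ys[i] for i in train], best)
test_mse = mse([xs[i] for i in test], [ys[i] for i in test], w)
```

Keeping the 20% test set out of the grid search is the design point: the cross-validation error chooses the hyperparameter, and only the final refit model is scored on the held-out catchments.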
Zong, Shengwei; Wu, Zhengfang; Xu, Jiawei; Li, Ming; Gao, Xiaofeng; He, Hongshi; Du, Haibo; Wang, Lei
2014-01-01
The tree line ecotone in the Changbai Mountains has undergone large changes in the past decades. Tree locations show variations on the four sides of the mountains, especially on the northern and western sides, which have not been fully explained. Previous studies attributed such variations to the variations in temperature. However, in this study, we hypothesized that topographic controls were responsible for causing the variations in the tree locations in the tree line ecotone of the Changbai Mountains. To test the hypothesis, we used IKONOS images and a WorldView-1 image to identify the tree locations and developed a logistic regression model using topographical variables to identify the dominant controls of the tree locations. The results showed that aspect, wetness, and slope were dominant controls for tree locations on the western side of the mountains, whereas altitude, SPI, and aspect were the dominant factors on the northern side. The uppermost altitude a tree can currently reach was 2140 m asl on the northern side and 2060 m asl on the western side. The model-predicted results showed that habitats above the current tree line on both sides were available for trees. Tree recruitments under the current tree line may take advantage of the available habitats at higher elevations based on the current tree location. Our research confirmed the controlling effects of topography on the tree locations in the tree line ecotone of the Changbai Mountains and suggested that it was essential to assess the tree response to topography in the research of tree line ecotone. PMID:25170918
Liu, Rong; Li, Xi; Zhang, Wei; Zhou, Hong-Hao
2015-01-01
Objective Multiple linear regression (MLR) and machine learning techniques in pharmacogenetic algorithm-based warfarin dosing have been reported. However, the performances of these algorithms in racially diverse groups have never been objectively evaluated and compared. In this literature-based study, we compared the performances of eight machine learning techniques with those of MLR in a large, racially diverse cohort. Methods MLR, artificial neural network (ANN), regression tree (RT), multivariate adaptive regression splines (MARS), boosted regression tree (BRT), support vector regression (SVR), random forest regression (RFR), lasso regression (LAR) and Bayesian additive regression trees (BART) were applied in warfarin dose algorithms in a cohort from the International Warfarin Pharmacogenetics Consortium database. Covariates obtained by stepwise regression from 80% of randomly selected patients were used to develop algorithms. To compare the performances of these algorithms, the mean percentage of patients whose predicted dose fell within 20% of the actual dose (mean percentage within 20%) and the mean absolute error (MAE) were calculated in the remaining 20% of patients. The performances of these techniques in different races, as well as across the dose ranges of therapeutic warfarin, were compared. Robust results were obtained after 100 rounds of resampling. Results BART, MARS and SVR were statistically indistinguishable and significantly outperformed all the other approaches in the whole cohort (MAE: 8.84–8.96 mg/week, mean percentage within 20%: 45.88%–46.35%). In the White population, MARS and BART showed higher mean percentage within 20% and lower mean MAE than those of MLR (all p values < 0.05). In the Asian population, SVR, BART, MARS and LAR performed the same as MLR. MLR and LAR performed best among the Black population. 
When patients were grouped in terms of warfarin dose range, all machine learning techniques except ANN and LAR showed significantly higher mean percentage within 20%, and lower MAE (all p values < 0.05) than MLR in the low- and high-dose ranges. Conclusion Overall, the machine learning-based techniques BART, MARS and SVR performed better than MLR in warfarin pharmacogenetic dosing. Differences in the algorithms’ performances exist among the races. Moreover, machine learning-based algorithms tended to perform better than MLR in the low- and high-dose ranges. PMID:26305568
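The two evaluation metrics named in the abstract above (mean absolute error and percentage of patients whose predicted dose falls within 20% of the actual dose) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `dose_metrics`, its `tolerance` parameter, and the sample doses are assumptions made for the example.

```python
def dose_metrics(actual, predicted, tolerance=0.20):
    """Return (MAE, percentage of predictions within `tolerance` of actual).

    `actual` and `predicted` are equal-length sequences of weekly doses.
    The function name and signature are hypothetical, chosen for this sketch.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    # Mean absolute error: average absolute gap between predicted and actual dose.
    mae = sum(abs(p - a) for a, p in zip(actual, predicted)) / n
    # A prediction counts as a hit if it lies within +/- tolerance of the actual dose.
    hits = sum(1 for a, p in zip(actual, predicted) if abs(p - a) <= tolerance * a)
    return mae, 100.0 * hits / n


# Made-up doses in mg/week, for illustration only.
mae, pct_within = dose_metrics([20, 40, 50, 10], [22, 30, 55, 10])
print(mae, pct_within)
```

In a resampling design like the one described, such a function would be called once per held-out split and the results averaged over the rounds.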
NASA Astrophysics Data System (ADS)
Bigdeli, Behnaz; Pahlavani, Parham
2017-01-01
Interpreting synthetic aperture radar (SAR) data is difficult because the geometry and spectral range of SAR differ from those of optical imagery. Consequently, SAR imaging can complement multispectral (MS) optical remote sensing techniques, because it does not depend on solar illumination and weather conditions. This study presents a multisensor fusion of SAR and MS data based on classification and regression tree (CART) and support vector machine (SVM) classifiers combined through a decision fusion system. First, different feature extraction strategies were applied to the SAR and MS data to produce more spectral and textural information. To overcome the redundancy and correlation between features, an intrinsic dimension estimation method based on noise-whitened Harsanyi, Farrand, and Chang determines the proper dimension of the features. Then, principal component analysis and independent component analysis were applied to the stacked feature space of the two data sets. Afterward, SVM and CART classified each reduced feature space. Finally, a fusion strategy was used to fuse the classification results. To show the effectiveness of the proposed methodology, single classification on each data set was compared to the fused results. A coregistered Radarsat-2 and WorldView-2 data set from San Francisco, USA, was used to examine the effectiveness of the proposed method. The results show that combining SAR data with optical sensor data using the proposed methodology improves the classification results for most of the classes. The proposed fusion method provided approximately 93.24% and 95.44% for two different areas of the data.
Erdman, Laura K.; D’Acremont, Valérie; Hayford, Kyla; Kilowoko, Mary; Kyungu, Esther; Hongoa, Philipina; Alamo, Leonor; Streiner, David L.; Genton, Blaise; Kain, Kevin C.
2015-01-01
Background Diagnosing pediatric pneumonia is challenging in low-resource settings. The World Health Organization (WHO) has defined primary end-point radiological pneumonia for use in epidemiological and vaccine studies. However, radiography requires expertise and is often inaccessible. We hypothesized that plasma biomarkers of inflammation and endothelial activation may be useful surrogates for end-point pneumonia, and may provide insight into its biological significance. Methods We studied children with WHO-defined clinical pneumonia (n = 155) within a prospective cohort of 1,005 consecutive febrile children presenting to Tanzanian outpatient clinics. Based on x-ray findings, participants were categorized as primary end-point pneumonia (n = 30), other infiltrates (n = 31), or normal chest x-ray (n = 94). Plasma levels of 7 host response biomarkers at presentation were measured by ELISA. Associations between biomarker levels and radiological findings were assessed by Kruskal-Wallis test and multivariable logistic regression. Biomarker ability to predict radiological findings was evaluated using receiver operating characteristic curve analysis and Classification and Regression Tree analysis. Results Compared to children with normal x-ray, children with end-point pneumonia had significantly higher C-reactive protein, procalcitonin and Chitinase 3-like-1, while those with other infiltrates had elevated procalcitonin and von Willebrand Factor and decreased soluble Tie-2 and endoglin. Clinical variables were not predictive of radiological findings. Classification and Regression Tree analysis generated multi-marker models with improved performance over single markers for discriminating between groups. 
A model based on C-reactive protein and Chitinase 3-like-1 discriminated between end-point pneumonia and non-end-point pneumonia with 93.3% sensitivity (95% confidence interval 76.5–98.8), 80.8% specificity (72.6–87.1), positive likelihood ratio 4.9 (3.4–7.1), negative likelihood ratio 0.083 (0.022–0.32), and misclassification rate 0.20 (standard error 0.038). Conclusions In Tanzanian children with WHO-defined clinical pneumonia, combinations of host biomarkers distinguished between end-point pneumonia, other infiltrates, and normal chest x-ray, whereas clinical variables did not. These findings generate pathophysiological hypotheses and may have potential research and clinical utility. PMID:26366571
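The likelihood ratios reported in the abstract above follow from the reported sensitivity and specificity through the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. A minimal sketch of that arithmetic is given below; the function name `likelihood_ratios` is an assumption made for the example, and the confidence intervals in the abstract are not reproduced.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) from sensitivity and specificity.

    Both inputs are proportions strictly between 0 and 1.
    The function name is hypothetical, chosen for this sketch.
    """
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Point estimates quoted in the abstract: 93.3% sensitivity, 80.8% specificity.
lr_pos, lr_neg = likelihood_ratios(0.933, 0.808)
print(round(lr_pos, 1), round(lr_neg, 3))
```

Rounded to the precision used in the abstract, this gives about 4.9 and 0.083, which agrees with the reported point estimates.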
Chen, Yun; Niu, Shuai; Li, Peikun; Jia, Hongru; Wang, Hailiang; Ye, Yongzhong; Yuan, Zhiliang
2017-01-01
Elucidating the major drivers of bryophyte distribution is the first step to protecting bryophyte diversity. Which of topography, forest, substrates (ground, tree trunks, roots, rocks, and rotten wood), and spatial factors are the major drivers of bryophyte distribution? In this study, 53 plots were set in 400 m² along the elevation gradient in Xiaoqinling, China. All bryophytes in the plots were collected and identified. Regression analysis was used to examine the relationship between bryophyte and substrate diversity. We compared the patterns of overall bryophyte diversity and diversity of bryophytes found on the ground, tree, and rock along elevational gradients. Canonical correspondence analysis was applied to relate species composition to selected environmental variables. The importance of topography, forest, substrates, and spatial factors was determined by variance partitioning. A total of 1378 bryophyte specimens were collected, and 240 species were identified. Bryophyte diversity was closely related to substrate diversity. The overall bryophyte diversity significantly increased with elevation; however, the response varied among ground, tree, and rock bryophytes. Tree diversity and herb layer were considered important environmental factors in determining bryophyte distribution. Species abundance was best explained by stand structure (17%), and species diversity was best explained by stand structure (35%) and substrate (40%). Results directly indicated that substrate diversity can improve bryophyte species diversity. The effects of micro-habitat formed by stand structure and substrate diversity were higher than those of spatial processes and topography factors on bryophyte distribution. This study proved that the determinant factors influencing bryophyte diversity reflect the trends in recent forest management, providing a real opportunity to improve forest biodiversity conservation. PMID:28603535
Application Research of Fault Tree Analysis in Grid Communication System Corrective Maintenance
NASA Astrophysics Data System (ADS)
Wang, Jian; Yang, Zhenwei; Kang, Mei
2018-01-01
This paper applies the fault tree analysis method to corrective maintenance of grid communication systems. A fault tree model of a typical system is established from engineering experience and then analyzed using fault tree analysis theory, covering the structure function, probability importance and so on. The results show that fault tree analysis enables fast fault location and effective repair of the system. The analysis method also offers guidance for researching and upgrading the reliability of the system.
Evaluation and prediction of shrub cover in coastal Oregon forests (USA)
Becky K. Kerns; Janet L. Ohmann
2004-01-01
We used data from regional forest inventories and research programs, coupled with mapped climatic and topographic information, to explore relationships and develop multiple linear regression (MLR) and regression tree models for total and deciduous shrub cover in the Oregon coastal province. Results from both types of models indicate that forest structure variables were...
ERIC Educational Resources Information Center
Strobl, Carolin; Malley, James; Tutz, Gerhard
2009-01-01
Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and…
Weighted linear regression using D²H and D² as the independent variables
Hans T. Schreuder; Michael S. Williams
1998-01-01
Several error structures for weighted regression equations used for predicting volume were examined for 2 large data sets of felled and standing loblolly pine trees (Pinus taeda L.). The generally accepted model with variance of error proportional to the value of the covariate squared (D²H = diameter squared times height or D...
Guo, Jin-Cheng; Wu, Yang; Chen, Yang; Pan, Feng; Wu, Zhi-Yong; Zhang, Jia-Sheng; Wu, Jian-Yi; Xu, Xiu-E; Zhao, Jian-Mei; Li, En-Min; Zhao, Yi; Xu, Li-Yan
2018-04-09
Esophageal squamous cell carcinoma (ESCC) is the predominant subtype of esophageal carcinoma in China. This study aimed to develop a staging model to predict outcomes of patients with ESCC. Using Cox regression analysis, principal component analysis (PCA), partitioning clustering, Kaplan-Meier analysis, receiver operating characteristic (ROC) curve analysis, and classification and regression tree (CART) analysis, we mined the Gene Expression Omnibus database to determine the expression profiles of genes in 179 patients with ESCC from the GSE63624 and GSE63622 datasets. Univariate Cox regression analysis of the GSE63624 dataset revealed that 2404 protein-coding genes (PCGs) and 635 long non-coding RNAs (lncRNAs) were associated with the survival of patients with ESCC. PCA categorized these PCGs and lncRNAs into three principal components (PCs), which were used to cluster the patients into three groups. ROC analysis demonstrated that the predictive ability of PCG-lncRNA PCs when applied to new patients was better than that of tumor-node-metastasis staging (area under ROC curve [AUC]: 0.69 vs. 0.65, P < 0.05). Accordingly, using CART analysis in the GSE63624 dataset, we constructed a molecular disaggregated model comprising one lncRNA and two PCGs, which we designated the LSB staging model. This LSB staging model classified the patients of the GSE63622 dataset into three different groups, and its effectiveness was validated by analysis of another cohort of 105 patients. The LSB staging model has clinical significance for the prognosis prediction of patients with ESCC and may serve as a three-gene staging microarray.
Price, B; Gomez, A; Mathys, L; Gardi, O; Schellenberger, A; Ginzler, C; Thürig, E
2017-03-01
Trees outside forest (TOF) can perform a variety of social, economic and ecological functions including carbon sequestration. However, detailed quantification of tree biomass is usually limited to forest areas. Taking advantage of structural information available from stereo aerial imagery and airborne laser scanning (ALS), this research models tree biomass using national forest inventory data and linear least-square regression and applies the model both inside and outside of forest to create a nationwide model for tree biomass (above ground and below ground). Validation of the tree biomass model against TOF data within settlement areas shows relatively low model performance (R² of 0.44) but still a considerable improvement on current biomass estimates used for greenhouse gas inventory and carbon accounting. We demonstrate an efficient and easily implementable approach to modelling tree biomass across a large heterogeneous nationwide area. The model offers significant opportunity for improved estimates on land use combination categories (CC) where tree biomass has either not been included or only roughly estimated until now. The ALS biomass model also offers the advantage of providing greater spatial resolution and greater within CC spatial variability compared to the current nationwide estimates.
Sah, Jay P.; Ross, Michael S.; Snyder, James R.; Ogurcak, Danielle E.
2010-01-01
In fire-dependent forests, managers are interested in predicting the consequences of prescribed burning on postfire tree mortality. We examined the effects of prescribed fire on tree mortality in Florida Keys pine forests, using a factorial design with understory type, season, and year of burn as factors. We also used logistic regression to model the effects of burn season, fire severity, and tree dimensions on individual tree mortality. Despite limited statistical power due to problems in carrying out the full suite of planned experimental burns, associations with tree and fire variables were observed. Post-fire pine tree mortality was negatively correlated with tree size and positively correlated with char height and percent crown scorch. Unlike post-fire mortality, tree mortality associated with storm surge from Hurricane Wilma was greater in the large size classes. Due to their influence on population structure and fuel dynamics, the size-selective mortality patterns following fire and storm surge have practical importance for using fire as a management tool in Florida Keys pinelands in the future, particularly when the threats to their continued existence from tropical storms and sea level rise are expected to increase.
NASA Astrophysics Data System (ADS)
Berner, Logan T.; Beck, Pieter S. A.; Bunn, Andrew G.; Lloyd, Andrea H.; Goetz, Scott J.
2011-03-01
Vegetation in northern high latitudes affects regional and global climate through energy partitioning and carbon storage. Spaceborne observations of vegetation, largely based on the normalized difference vegetation index (NDVI), suggest decreased productivity during recent decades in many regions of the Eurasian and North American boreal forests. To improve interpretation of NDVI trends over forest regions, we examined the relationship between NDVI from the advanced very high resolution radiometers and tree ring width measurements, a proxy of tree productivity. We collected tree core samples from spruce, pine, and larch at 22 sites in northeast Russia and northwest Canada. Annual growth rings were measured and used to generate site-level ring width index (RWI) chronologies. Correlation analysis was used to assess the association between RWI and summer NDVI from 1982 to 2008, while linear regression was used to examine trends in both measurements. The correlation between NDVI and RWI was highly variable across sites, though consistently positive (r = 0.43, SD = 0.19, n = 27). We observed significant temporal autocorrelation in both NDVI and RWI measurements at sites with evergreen conifers (spruce and pine), though weak autocorrelation at sites with deciduous conifers (larch). No sites exhibited a positive trend in both NDVI and RWI, although five sites showed negative trends in both measurements. While there are technological and physiological limitations to this approach, these findings demonstrate a positive association between NDVI and tree ring measurements, as well as the importance of considering lagged effects when modeling vegetation productivity using satellite data.
Environmental correlates of plant diversity in Korean temperate forests
NASA Astrophysics Data System (ADS)
Černý, Tomáš; Doležal, Jiří; Janeček, Štěpán; Šrůtek, Miroslav; Valachovič, Milan; Petřík, Petr; Altman, Jan; Bartoš, Michael; Song, Jong-Suk
2013-02-01
Mountainous areas of the Korean Peninsula are among the biodiversity hotspots of the world's temperate forests. Understanding patterns in spatial distribution of their species richness requires explicit consideration of different environmental drivers and their effects on functionally differing components. In this study, we assess the impact of both geographical and soil variables on the fine-scale (400 m²) pattern of plant diversity using field data from six national parks, spanning a 1300 m altitudinal gradient. Species richness and the slopes of species-area curves were calculated separately for the tree, shrub and herb layer and used as response variables in regression tree analyses. A cluster analysis distinguished three dominant forest communities with specific patterns in the diversity-environment relationship. The most widespread middle-altitude oak forests had the highest tree richness but the lowest richness of herbaceous plants due to a dense bamboo understory. Total richness was positively associated with soil reaction and negatively associated with soluble phosphorus and solar radiation (site dryness). Tree richness was associated mainly with soil factors, although trees are frequently assumed to be controlled mainly by factors with large-scale impact. A U-shaped relationship was found between herbaceous plant richness and altitude, caused by a distribution pattern of dwarf bamboo in understory. No correlation between the degree of canopy openness and herb layer richness was detected. Slopes of the species-area curves indicated the various origins of forest communities. Variable diversity-environment responses in different layers and communities reinforce the necessity of context-dependent differentiation for the assessment of impacts of climate and land-use changes in these diverse but intensively exploited regions.
Carvalho-Oliveira, Regiani; Amato-Lourenço, Luís F; Moreira, Tiana C L; Silva, Douglas R Rocha; Vieira, Bruna D; Mauad, Thais; Saiki, Mitiko; Saldiva, Paulo H Nascimento
2017-02-01
The majority of epidemiological studies correlate the cardiorespiratory effects of air pollution exposure by considering the concentrations of pollutants measured from conventional monitoring networks. The conventional air quality monitoring methods are expensive, and their data are insufficient for providing good spatial resolution. We hypothesized that bioassays using plants could effectively determine pollutant gradients, thus helping to assess the risks associated with air pollution exposure. The study regions were determined from different prevalent respiratory death distributions in the Sao Paulo municipality. Samples of tree flower buds were collected from twelve sites in four regional districts. The genotoxic effects caused by air pollution were tested through a pollen abortion bioassay. Elements derived from vehicular traffic that accumulated in tree barks were determined using energy-dispersive X-ray fluorescence spectrometry (EDXRF). Mortality data were collected from the mortality information program of Sao Paulo City. Principal component analysis (PCA) was applied to the concentrations of elements accumulated in tree barks. Pearson correlation and exponential regression were performed considering the elements, pollen abortion rates and mortality data. PCA identified five factors, of which four represented elements related to vehicular traffic. The elements Al, S, Fe, Mn, Cu, and Zn showed a strong correlation with mortality rates (R² > 0.87) and pollen abortion rates (R² > 0.82). These results demonstrate that tree barks and pollen abortion rates allow for correlations between vehicular traffic emissions and associated outcomes such as genotoxic effects and mortality data. Copyright © 2016 Elsevier Ltd. All rights reserved.
The influence of tree morphology on stemflow generation in a tropical lowland rainforest
NASA Astrophysics Data System (ADS)
Uber, Magdalena; Levia, Delphis F.; Zimmermann, Beate; Zimmermann, Alexander
2014-05-01
Even though stemflow usually accounts for only a small proportion of rainfall, it is an important point source of water and ion input to forest floors and may, for instance, influence soil moisture patterns and groundwater recharge. Previous studies showed that the generation of stemflow depends on a multitude of meteorological and biological factors. Interestingly, despite the tremendous progress in stemflow research during the last decades it is still largely unknown which combination of tree characteristics determines stemflow volumes in species-rich tropical forests. This knowledge gap motivated us to analyse the influence of tree characteristics on stemflow volumes in a 1 hectare plot located in a Panamanian lowland rainforest. Our study comprised stemflow measurements in six randomly selected 10 m by 10 m subplots. In each subplot we measured stemflow of all trees with a diameter at breast height (DBH) > 5 cm on an event-basis for a period of six weeks. Additionally, we identified all tree species and determined a set of tree characteristics including DBH, crown diameter, bark roughness, bark furrowing, epiphyte coverage, tree architecture, stem inclination, and crown position. During the sampling period, we collected 985 L of stemflow (0.98 % of total rainfall). Based on regression analyses and comparisons among plant functional groups we show that palms were most efficient in yielding stemflow due to their large inclined fronds. Trees with large emergent crowns also produced relatively large amounts of stemflow. Due to their abundance, understory trees contribute much to stemflow yield not on individual but on the plot scale. Even though parameters such as crown diameter, branch inclination and position of the crown influence stemflow generation to some extent, these parameters explain less than 30 % of the variation in stemflow volumes. 
In contrast to published results from temperate forests, we did not detect a negative correlation between bark roughness and stemflow volume. This is because other parameters such as crown diameter obscured this relationship. Due to multicollinearity and poor correlations between single tree characteristics with stemflow volume, an assessment of stemflow volumes based on forest characteristics remains cumbersome in highly diverse ecosystems. Instead of relying on regression relationships, we therefore advocate a total sampling of trees in several plots to determine stand-scale stemflow yield in tropical forests.
Bayesian and Phylogenic Approaches for Studying Relationships among Table Olive Cultivars.
Ben Ayed, Rayda; Ennouri, Karim; Ben Amar, Fathi; Moreau, Fabienne; Triki, Mohamed Ali; Rebai, Ahmed
2017-08-01
To enhance table olive tree authentication, relationship, and productivity, we consider the analysis of 18 worldwide table olive cultivars (Olea europaea L.) based on morphological, biological, and physicochemical markers analyzed by bioinformatic and biostatistic tools. Accordingly, we assess the relationships between the studied varieties, on the one hand, and the potential productivity-quantitative parameter links on the other hand. The bioinformatic analysis based on the graphical representation of the matrix of Euclidean distances, principal component analysis, the unweighted pair group method with arithmetic mean, and principal coordinate analysis (PCoA) revealed three major clusters which were not correlated with the geographic origin. The statistical analysis based on Kendall's and Spearman correlation coefficients suggests two highly significant associations with both fruit color and pollinization and the productivity character. These results are confirmed by the multiple linear regression prediction models. In fact, based on the coefficient of determination (R²) value, the best model demonstrated the power of the pollinization on the tree productivity (R² = 0.846). Moreover, the derived directed acyclic graph showed that only two direct influences are detected: effect of tolerance on fruit and stone symmetry on one side and effect of tolerance on stone form and oil content on the other side. This work provides better understanding of the diversity available in worldwide table olive cultivars and supplies an important contribution for olive breeding and authenticity.
Presence of indicator plant species as a predictor of wetland vegetation integrity
Stapanian, Martin A.; Adams, Jean V.; Gara, Brian
2013-01-01
We fit regression and classification tree models to vegetation data collected from Ohio (USA) wetlands to determine (1) which species best predict Ohio vegetation index of biotic integrity (OVIBI) score and (2) which species best predict high-quality wetlands (OVIBI score >75). The simplest regression tree model predicted OVIBI score based on the occurrence of three plant species: skunk-cabbage (Symplocarpus foetidus), cinnamon fern (Osmunda cinnamomea), and swamp rose (Rosa palustris). The lowest OVIBI scores were best predicted by the absence of the selected plant species rather than by the presence of other species. The simplest classification tree model predicted high-quality wetlands based on the occurrence of two plant species: skunk-cabbage and marsh-fern (Thelypteris palustris). The overall misclassification rate from this tree was 13 %. Again, low-quality wetlands were better predicted than high-quality wetlands by the absence of selected species rather than the presence of other species using the classification tree model. Our results suggest that a species’ wetland status classification and coefficient of conservatism are of little use in predicting wetland quality. A simple, statistically derived species checklist such as the one created in this study could be used by field biologists to quickly and efficiently identify wetland sites likely to be regulated as high-quality, and requiring more intensive field assessments. Alternatively, it can be used for advanced determinations of low-quality wetlands. Agencies can save considerable money by screening wetlands for the presence/absence of such “indicator” species before issuing permits.
Automatic energy expenditure measurement for health science.
Catal, Cagatay; Akbulut, Akhan
2018-04-01
It is crucial to predict human energy expenditure accurately in any sports activity and health science application in order to investigate the impact of the activity. However, measuring the real energy expenditure is not a trivial task and involves complex steps. The objective of this work is to improve the performance of existing estimation models of energy expenditure by using machine learning algorithms and data from different sensors, and to provide this estimation service on a cloud-based platform. In this study, we used input data such as breath rate and heart rate from three sensors. Inputs are received from a web form and sent to the web service, which applies a regression model on the Azure cloud platform. During the experiments, we assessed several machine learning models based on regression methods. Our experimental results showed that our novel model, which applies Boosted Decision Tree Regression in conjunction with the median aggregation technique, provides the best result compared with the five other regression algorithms. This cloud-based energy expenditure system, which uses a web service, showed that cloud computing technology is a great opportunity to develop estimation systems, and that the new model, which applies Boosted Decision Tree Regression with median aggregation, provides remarkable results. Copyright © 2018 Elsevier B.V. All rights reserved.
Preliminary Survey on TRY Forest Traits and Growth Index Relations - New Challenges
NASA Astrophysics Data System (ADS)
Lyubenova, Mariyana; Kattge, Jens; van Bodegom, Peter; Chikalanov, Alexandre; Popova, Silvia; Zlateva, Plamena; Peteva, Simona
2016-04-01
Forest ecosystems provide critical ecosystem goods and services, including food, fodder, water, shelter, nutrient cycling, and cultural and recreational value. Forests also store carbon, provide habitat for a wide range of species and help alleviate land degradation and desertification. Thus they have a potentially significant role to play in climate change adaptation planning through maintaining ecosystem services and providing livelihood options. The study of forest traits is therefore an important issue not just for individual countries but for the planet as a whole. We need to know exactly which functional relations between forest traits the TRY database can express and how significant they will be for global modeling and IPBES. The study of biodiversity characteristics at all levels and of the functional links between them is extremely important for the selection of key indicators for assessing biodiversity and ecosystem services for sustainable natural capital control. By comparing the available information in three databases (TRY, ITR (International Tree Ring) and SP-PAM), 42 tree species were selected for the trait analyses. The dependence between location characteristics (latitude, longitude, altitude, annual precipitation, annual temperature and soil type) and forest traits (specific leaf area, leaf weight ratio, wood density and growth index) is studied by multiple regression analyses (RDA) using the statistical software package Canoco 4.5. The Pearson correlation coefficient (measure of linear correlation), Kendall rank correlation coefficient (non-parametric measure of statistical dependence) and Spearman correlation coefficient (monotonic function relationship between two variables) are calculated for each pair of variables (indexes) and species.
After analysis of above mentioned correlation coefficients the dimensional linear regression models, multidimensional linear and nonlinear regression models and multidimensional neural networks models are built. The strongest dependence between It and WD was obtained. The research will support the work on: Strategic Plan for Biodiversity 2011-2020, modelling and implementation of ecosystem-based approaches to climate change adaptation and disaster risk reduction. Key words: Specific leaf area (SLA), Leaf weight ratio (LWR), Wood density (WD), Growth index (It)
Selective Tree-ring Models: A Novel Method for Reconstructing Streamflow Using Tree Rings
NASA Astrophysics Data System (ADS)
Foard, M. B.; Nelson, A. S.; Harley, G. L.
2017-12-01
Surface water is among the most instrumental and vulnerable resources in the Northwest United States (NW). Recent observations show that overall water quantity is declining in streams across the region, while extreme flooding events occur more frequently. Historical streamflow models inform probabilities of extreme flow events (flood or drought) by describing frequency and duration of past events. There are numerous examples of tree-rings being utilized to reconstruct streamflow in the NW. These models confirm that tree-rings are highly accurate at predicting streamflow; however, there are many nuances that limit their applicability through time and space. For example, most models predict streamflow from hydrologically altered rivers (e.g. dammed, channelized), which may hinder our ability to predict natural prehistoric flow. They also have a tendency to over/under-predict extreme flow events. Moreover, they often neglect to capture the changing relationships between tree-growth and streamflow over time and space. To address these limitations, we utilized national tree-ring and streamflow archives to investigate the relationships between the growth of multiple coniferous species and free-flowing streams across the NW using novel species- and site-specific streamflow models, a term we coined "selective tree-ring models." Correlation function analysis and regression modeling were used to evaluate the strengths and directions of the flow-growth relationships. Species with significant relationships in the same direction were identified as strong candidates for selective models. Temporal and spatial patterns of these relationships were examined using running correlations and inverse distance weighting interpolation, respectively. Our early results indicate that (1) species adapted to extreme climates (e.g. hot-dry, cold-wet) exhibit the most consistent relationships across space, (2) these relationships weaken in locations with mild climatic variability, and (3) some species appear to be strong candidates for predicting high flow events, while others may be better at predicting drought. These findings indicate that selective models may outperform traditional models when reconstructing distinctive aspects of streamflow.
Ricker, Martin; Peña Ramírez, Víctor M; von Rosen, Dietrich
2014-01-01
Growth curves are monotonically increasing functions that measure repeatedly the same subjects over time. The classical growth curve model in the statistical literature is the Generalized Multivariate Analysis of Variance (GMANOVA) model. In order to model the tree trunk radius (r) over time (t) of trees on different sites, GMANOVA is combined here with the adapted PL regression model Q = A · T+E, where for b ≠ 0 : Q = Ei[-b · r]-Ei[-b · r1] and for b = 0 : Q = Ln[r/r1], A = initial relative growth to be estimated, T = t-t1, and E is an error term for each tree and time point. Furthermore, Ei[-b · r] = ∫(Exp[-b · r]/r)dr, b = -1/TPR, with TPR being the turning point radius in a sigmoid curve, and r1 at t1 is an estimated calibrating time-radius point. Advantages of the approach are that growth rates can be compared among growth curves with different turning point radiuses and different starting points, hidden outliers are easily detectable, the method is statistically robust, and heteroscedasticity of the residuals among time points is allowed. The model was implemented with dendrochronological data of 235 Pinus montezumae trees on ten Mexican volcano sites to calculate comparison intervals for the estimated initial relative growth A. One site (at the Popocatépetl volcano) stood out, with A being 3.9 times the value of the site with the slowest-growing trees. Calculating variance components for the initial relative growth, 34% of the growth variation was found among sites, 31% among trees, and 35% over time. Without the Popocatépetl site, the numbers changed to 7%, 42%, and 51%. Further explanation of differences in growth would need to focus on factors that vary within sites and over time.
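The model in the abstract above is written inline in plain text. As a reading aid only, the same definitions can be typeset as follows. This is a transcription of the symbols given in the abstract (Q, A, T, E, b, r, r1, t1, TPR) with nothing added, and the integral is left indefinite because the abstract states it that way.

```latex
\begin{align*}
Q &= A\,T + E, \qquad T = t - t_1,\\[4pt]
Q &=
\begin{cases}
\operatorname{Ei}[-b\,r] - \operatorname{Ei}[-b\,r_1], & b \neq 0,\\[2pt]
\ln\!\left(r / r_1\right), & b = 0,
\end{cases}\\[4pt]
\operatorname{Ei}[-b\,r] &= \int \frac{e^{-b\,r}}{r}\,dr, \qquad b = -\frac{1}{\mathrm{TPR}}.
\end{align*}
```

Here A is the initial relative growth to be estimated, E is the error term for each tree and time point, TPR is the turning point radius of the sigmoid curve, and (t1, r1) is the estimated calibrating time-radius point.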
Aleza, Koutchoukalo; Villamor, Grace B.; Nyarko, Benjamin Kofi; Wala, Kperkouma; Akpagana, Koffi
2018-01-01
Vitellaria paradoxa (Gaertn C. F.), or shea tree, remains one of the most valuable trees for farmers in the Atacora district of northern Benin, where rural communities depend on shea products for both food and income. To optimize productivity and management of shea agroforestry systems, or "parklands," accurate and up-to-date data are needed. For this purpose, we monitored 120 fruiting shea trees for two years under three land-use scenarios and different soil groups in Atacora, coupled with a farm household survey to elicit information on decision making and management practices. To examine the local pattern of shea tree productivity and relationships between morphological factors and yields, we used a randomized branch sampling method and applied a regression analysis to build a shea yield model based on dendrometric, soil and land-use variables. We also compared potential shea yields based on farm household socio-economic characteristics and management practices derived from the survey data. Soil and land-use variables were the most important determinants of shea fruit yield. In terms of land use, shea trees growing on farmland plots exhibited the highest yields (i.e., fruit quantity and mass), while trees growing on Lixisols performed better than those of the other soil group. Contrary to our expectations, dendrometric parameters had weak relationships with fruit yield regardless of land use and soil group. There is inter-annual variability in fruit yield across both soil groups and land-use types. In addition to observed inter-annual yield variability, there was a high degree of variability in production among individual shea trees. Furthermore, household socioeconomic characteristics such as road accessibility, landholding size, and gross annual income influence shea fruit yield. The use of fallow areas is an important land management practice in the study area that influences both conservation and shea yield. PMID:29346406
Wilson, Jordan L.; Samaranayake, V.A.; Limmer, Matthew A.; Schumacher, John G.; Burken, Joel G.
2017-12-19
Contaminated sites pose ecological and human-health risks through exposure to contaminated soil and groundwater. Whereas we can readily locate, monitor, and track contaminants in groundwater, it is harder to perform these tasks in the vadose zone. In this study, tree-core samples were collected at a Superfund site to determine if the sample-collection location around a particular tree could reveal the subsurface location, or direction, of soil and soil-gas contaminant plumes. Contaminant-centroid vectors were calculated from tree-core data to reveal contaminant distributions in directional tree samples at a higher resolution, and vectors were correlated with soil-gas characterization collected using conventional methods. Results clearly demonstrated that directional tree coring around tree trunks can indicate gradients in soil and soil-gas contaminant plumes, and the strength of the correlations was directly proportional to the magnitude of tree-core concentration gradients (Spearman's coefficient of -0.61 and -0.55 in soil and tree-core gradients, respectively). Linear regression indicates that agreement between the concentration-centroid vectors is significantly affected by in planta and soil concentration gradients and when concentration centroids in soil are closer to trees. Given the existing link between soil-gas and vapor intrusion, this study also indicates that directional tree coring might be applicable in vapor intrusion assessment.
Kojima, Gotaro; Iliffe, Steve; Tanabe, Marianne
2017-10-16
A recent controversy in vitamin D research is a "U-shaped association", with elevated disease risks at both high and low 25-hydroxyvitamin D (25(OH)D) levels. This is a cross-sectional study of 238 male nursing home veterans in Hawaii. Classification and regression tree (CART) analysis identified groups based on 25(OH)D and vitamin D supplementation for frailty risk. Characteristics were examined and compared across the groups using logistic regression and receiver operating characteristic (ROC) curve analyses. CART analysis identified three distinct groups: vitamin D supplement users (n = 86), non-users with low vitamin D (n = 55), and non-users with high vitamin D (n = 97). Supplement users were the most frail, but had a high mean 25(OH)D of 26.6 ng/mL, which was comparable to 27.1 ng/mL in non-users with high vitamin D, while mean 25(OH)D of non-users with low vitamin D was 11.7 ng/mL. Supplement users and non-users with low vitamin D were significantly more likely to be frail (odds ratio (OR) = 9.90, 95% CI = 2.18-44.86, p = 0.003; OR = 4.28, 95% CI = 1.44-12.68, p = 0.009, respectively), compared with non-users with high vitamin D. ROC curve analysis showed the three groups significantly predicted frailty (area under the curve = 0.73), with sensitivity of 64.4% and specificity of 76.7%, while 25(OH)D did not predict frailty. In these nursing home veterans, vitamin D supplement users were the most frail but had high 25(OH)D. This can potentially be a cause of U-shaped associations between vitamin D levels and negative health outcomes.
Red-shouldered hawk nesting habitat preference in south Texas
Strobel, Bradley N.; Boal, Clint W.
2010-01-01
We examined nesting habitat preference by red-shouldered hawks Buteo lineatus using conditional logistic regression on characteristics measured at 27 occupied nest sites and 68 unused sites in 2005–2009 in south Texas. We measured vegetation characteristics of individual trees (nest trees and unused trees) and corresponding 0.04-ha plots. We evaluated the importance of tree and plot characteristics to nesting habitat selection by comparing a priori tree-specific and plot-specific models using Akaike's information criterion. Models with only plot variables carried 14% more weight than models with only center tree variables. The model-averaged odds ratios indicated red-shouldered hawks selected to nest in taller trees and in areas with higher average diameter at breast height than randomly available within the forest stand. Relative to randomly selected areas, each 1-m increase in nest tree height and 1-cm increase in the plot average diameter at breast height increased the probability of selection by 85% and 10%, respectively. Our results indicate that red-shouldered hawks select nesting habitat based on vegetation characteristics of individual trees as well as the 0.04-ha area surrounding the tree. Our results indicate forest management practices resulting in tall forest stands with large average diameter at breast height would benefit red-shouldered hawks in south Texas.
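The per-unit effects quoted in the abstract above come from model-averaged odds ratios, and under a logistic model such effects compound multiplicatively. The sketch below illustrates that arithmetic only. It assumes the reported 85% and 10% increases correspond to odds ratios of 1.85 per 1 m of nest tree height and 1.10 per 1 cm of plot average diameter at breast height; the function names are hypothetical and nothing here reproduces the authors' fitted model.

```python
# Assumed odds ratios, read off the abstract's "85%" and "10%" per-unit increases.
HEIGHT_OR_PER_M = 1.85   # per 1-m increase in nest tree height (assumption)
DBH_OR_PER_CM = 1.10     # per 1-cm increase in plot average DBH (assumption)


def relative_odds(odds_ratio_per_unit: float, delta: float) -> float:
    """Multiplicative change in odds for a change of `delta` units.

    In a logistic model the log-odds are linear in the covariate,
    so the odds scale as odds_ratio_per_unit ** delta.
    """
    return odds_ratio_per_unit ** delta


def combined_relative_odds(delta_height_m: float, delta_dbh_cm: float) -> float:
    """Change in odds when both covariates change (no interaction term assumed)."""
    return (relative_odds(HEIGHT_OR_PER_M, delta_height_m)
            * relative_odds(DBH_OR_PER_CM, delta_dbh_cm))


# A tree 2 m taller than a reference tree, on a plot with the same average DBH.
two_metres_taller = relative_odds(HEIGHT_OR_PER_M, 2.0)
```

Note that these are changes in odds, not in probability; the two only coincide approximately when the baseline probability of selection is small.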
Discrimination of rectal cancer through human serum using surface-enhanced Raman spectroscopy
NASA Astrophysics Data System (ADS)
Li, Xiaozhou; Yang, Tianyue; Li, Siqi; Zhang, Su; Jin, Lili
2015-05-01
In this paper, surface-enhanced Raman spectroscopy (SERS) was used to detect the changes in blood serum components that accompany rectal cancer. The differences in serum SERS data between rectal cancer patients and healthy controls were examined. Postoperative rectal cancer patients also participated in the comparison to monitor the effects of cancer treatments. The results show that there are significant variations at certain wavenumbers which indicates alteration of corresponding biological substances. Principal component analysis (PCA) and parameters of intensity ratios were used on the original SERS spectra for the extraction of featured variables. These featured variables then underwent linear discriminant analysis (LDA) and classification and regression tree (CART) for the discrimination analysis. Accuracies of 93.5 and 92.4 % were obtained for PCA-LDA and parameter-CART, respectively.
NASA Astrophysics Data System (ADS)
Pathak, Prasad A.
The Arctic region of Alaska is experiencing severe impacts of climate change. Arctic lake ecosystems are bound to undergo alterations in their trophic structure and other chemical properties. However, the landscape factors controlling lake influxes have not been studied to date. This research has examined the currently existing lake-landscape interactions using Remote Sensing and GIS technology. The statistical modeling was carried out using Regression and CART methods. Remote sensing data were applied to derive the required landscape indices. Remote sensing in Arctic Alaska faces many challenges, including persistent cloud cover, low sun angle and a limited snow-free period. Tundra vegetation types are interspersed and intricate to classify, unlike managed forest stands. Therefore, historical studies have underachieved with respect to thematic accuracies. However, looking at vegetation communities at watershed level and implementing an expert classification system achieved accuracies of up to 90%. The research has highlighted the probable role of interactions between vegetation root zones and nutrient availability within the active zone, as well as the importance of permafrost thawing. Multiple regression analyses and Classification Trees were developed to understand relationships between landscape factors and various chemical parameters as well as chlorophyll readings. Spatial properties of Shrubs and Riparian complexes, such as complexity of individual patches at watershed level and within proximity of water channels, were influential on Chlorophyll production of lakes. Till-age had significant impact on Total Nitrogen contents. Moreover, relatively young tills exhibited significantly positive correlation with concentration of various ions and conductivity of lakes. Similarly, density of patches of Heath complexes was found to be important with respect to Total Phosphorus contents in lakes.
All the regression models developed in this study were significant at 95% confidence level. However, the classification trees could not achieve high predictabilities due to limited number of lakes sampled. Keywords: Landscape factors, Lake primary productivity, Arctic, Climate change, Regression, CART
Multivariate analysis of cytokine profiles in pregnancy complications.
Azizieh, Fawaz; Dingle, Kamaludin; Raghupathy, Raj; Johnson, Kjell; VanderPlas, Jacob; Ansari, Ali
2018-03-01
The immunoregulation to tolerate the semiallogeneic fetus during pregnancy includes a harmonious dynamic balance between anti- and pro-inflammatory cytokines. Several earlier studies reported significantly different levels and/or ratios of several cytokines in complicated pregnancy as compared to normal pregnancy. However, as cytokines operate in networks with potentially complex interactions, it is also interesting to compare groups with multi-cytokine data sets using multivariate analysis. Such analysis will further examine how great the differences are, and which cytokines are more different than others. Various multivariate statistical tools, such as the Cramer test, classification and regression trees, partial least squares regression figures, the 2-dimensional Kolmogorov-Smirnov test, principal component analysis and the gap statistic, were used to compare cytokine data of normal vs anomalous groups of different pregnancy complications. Multivariate analysis assisted in examining if the groups were different, how strongly they differed, and in what ways they differed, and further reported evidence for subgroups in 1 group (pregnancy-induced hypertension), possibly indicating multiple causes for the complication. This work contributes to a better understanding of cytokine interactions and may have important implications for targeting cytokine balance modulation or for the design of future medications or interventions that best direct management or prevention from an immunological approach. © 2018 The Authors. American Journal of Reproductive Immunology Published by John Wiley & Sons Ltd.
Compound analysis via graph kernels incorporating chirality.
Brown, J B; Urata, Takashi; Tamura, Takeyuki; Arai, Midori A; Kawabata, Takeo; Akutsu, Tatsuya
2010-12-01
High accuracy is paramount when predicting biochemical characteristics using Quantitative Structural-Property Relationships (QSPRs). Although existing graph-theoretic kernel methods combined with machine learning techniques are efficient for QSPR model construction, they cannot distinguish topologically identical chiral compounds which often exhibit different biological characteristics. In this paper, we propose a new method that extends the recently developed tree pattern graph kernel to accommodate stereoisomers. We show that Support Vector Regression (SVR) with a chiral graph kernel is useful for target property prediction by demonstrating its application to a set of human vitamin D receptor ligands currently under consideration for their potential anti-cancer effects.
Pashaei, Elnaz; Ozen, Mustafa; Aydin, Nizamettin
2015-08-01
Improving the accuracy of supervised classification algorithms in biomedical applications is an active area of research. In this study, we improve the performance of Particle Swarm Optimization (PSO) combined with the C4.5 decision tree (PSO+C4.5) classifier by applying the Boosted C5.0 decision tree as the fitness function. To evaluate the effectiveness of our proposed method, it is implemented on 1 microarray dataset and 5 different medical data sets obtained from UCI machine learning databases. Moreover, the results of the PSO + Boosted C5.0 implementation are compared to eight well-known benchmark classification methods (PSO+C4.5, support vector machine under the kernel of Radial Basis Function, Classification And Regression Tree (CART), C4.5 decision tree, C5.0 decision tree, Boosted C5.0 decision tree, Naive Bayes and Weighted K-Nearest neighbor). A repeated five-fold cross-validation method was used to assess the performance of the classifiers. Experimental results show that our proposed method not only improves the performance of PSO+C4.5 but also obtains higher classification accuracy compared to the other classification methods.
The relationship between urban forests and race: A meta-analysis
Watkins, Shannon Lea; Gerrish, Ed
2018-01-01
There is ample evidence that urban trees benefit the physical, mental, and social health of urban residents. The environmental justice hypothesis posits that environmental amenities are inequitably low in poor and minority communities, and predicts these communities experience fewer urban environmental benefits. Some previous research has found that urban forest cover is inequitably distributed by race, though other studies have found no relationship or negative inequity. These conflicting results and the single-city nature of the current literature suggest a need for a research synthesis. Using a systematic literature search and meta-analytic techniques, we examined the relationship between urban forest cover and race. First, we estimated the average (unconditional) relationship between urban forest cover and race across studies (studies = 40; effect sizes = 388). We find evidence of significant race-based inequity in urban forest cover. Second, we included characteristics of the original studies and study sites in meta-regressions to illuminate drivers of variation of urban forest cover between studies. Our meta-regressions reveal that the relationship varies across racial groups and by study methodology. Models reveal significant inequity on public land and that environmental and social characteristics of cities help explain variation across studies. As tree planting and other urban forestry programs proliferate, urban forestry professionals are encouraged to consider the equity consequences of urban forestry activities, particularly on public land. PMID:29289843
Determination of colonoscopy indication from administrative claims data.
Ko, Cynthia W; Dominitz, Jason A; Neradilek, Moni; Polissar, Nayak; Green, Pam; Kreuter, William; Baldwin, Laura-Mae
2014-04-01
Colonoscopy outcomes, such as polyp detection or complication rates, may differ by procedure indication. To develop methods to classify colonoscopy indications from administrative data, facilitating study of colonoscopy quality and outcomes. We linked 14,844 colonoscopy reports from the Clinical Outcomes Research Initiative, a national repository of endoscopic reports, to the corresponding Medicare Carrier and Outpatient File claims. Colonoscopy indication was determined from the procedure reports. We developed algorithms using classification and regression trees and linear discriminant analysis (LDA) to classify colonoscopy indication. Predictor variables included ICD-9CM and CPT/HCPCS codes present on the colonoscopy claim or in the 12 months prior, patient demographics, and site of colonoscopy service. Algorithms were developed on a training set of 7515 procedures, then validated using a test set of 7329 procedures. Sensitivity was lowest for identifying average-risk screening colonoscopies, varying between 55% and 86% for the different algorithms, but specificity for this indication was consistently over 95%. Sensitivity for diagnostic colonoscopy varied between 77% and 89%, with specificity between 55% and 87%. Algorithms with classification and regression trees with 7 variables or LDA with 10 variables had similar overall accuracy, and generally lower accuracy than the algorithm using LDA with 30 variables. Algorithms using Medicare claims data have moderate sensitivity and specificity for colonoscopy indication, and will be useful for studying colonoscopy quality in this population. Further validation may be needed before use in alternative populations.
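The abstract above evaluates each classification algorithm by its sensitivity and specificity for a given indication on a held-out test set. A minimal sketch of how those two quantities are computed from confusion-matrix counts is given below. The counts are hypothetical and chosen only to land inside the ranges quoted in the abstract; they are not the study's data, and the function names are illustrative.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of procedures truly of this indication that the algorithm flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of procedures NOT of this indication that the algorithm leaves unflagged."""
    return true_neg / (true_neg + false_pos)


# Hypothetical test-set counts for one indication (not taken from the paper).
tp, fn = 86, 14      # procedures with the indication: flagged / missed
tn, fp = 950, 50     # procedures without the indication: unflagged / wrongly flagged

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
```

With a one-vs-rest setup like this, the same two functions can be applied to each indication in turn, which is how a multi-class classifier ends up with a separate sensitivity and specificity per class.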
ERIC Educational Resources Information Center
Schumacher, Phyllis; Olinsky, Alan; Quinn, John; Smith, Richard
2010-01-01
The authors extended previous research by 2 of the authors who conducted a study designed to predict the successful completion of students enrolled in an actuarial program. They used logistic regression to determine the probability of an actuarial student graduating in the major or dropping out. They compared the results of this study with those…
B. Desta Fekedulegn; J.J. Colbert; R.R., Jr. Hicks; Michael E. Schuckers
2002-01-01
The theory and application of principal components regression, a method for coping with multicollinearity among independent variables in analyzing ecological data, is exhibited in detail. A concrete example of the complex procedures that must be carried out in developing a diagnostic growth-climate model is provided. We use tree radial increment data taken from breast...
Regression methods for spatially correlated data: an example using beetle attacks in a seed orchard
Preisler Haiganoush; Nancy G. Rappaport; David L. Wood
1997-01-01
We present a statistical procedure for studying the simultaneous effects of observed covariates and unmeasured spatial variables on responses of interest. The procedure uses regression type analyses that can be used with existing statistical software packages. An example using the rate of twig beetle attacks on Douglas-fir trees in a seed orchard illustrates the...
J. Stephen Brewer
2010-01-01
Quantifying per capita impacts of invasive species on resident communities requires integrating regression analyses with experiments under natural conditions. Using multivariate and univariate approaches, I regressed the abundance of 105 resident species of groundcover plants and tree seedlings against the abundance and height of an invasive grass, Microstegium...
Marek K. Jakubowski; Qinghua Guo; Brandon Collins; Scott Stephens; Maggi Kelly
2013-01-01
We compared the ability of several classification and regression algorithms to predict forest stand structure metrics and standard surface fuel models. Our study area spans a dense, topographically complex Sierra Nevada mixed-conifer forest. We used clustering, regression trees, and support vector machine algorithms to analyze high density (average 9 pulses/m
Shin, Yoonseok
2015-01-01
Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong boundaries in terms of its generalization performance. However, the boosting approach has yet to be used in regression problems within the construction domain, including cost estimations, but has been actively utilized in other domains. Therefore, a boosting regression tree (BRT) is applied to cost estimations at the early stage of a construction project to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the performance of the BRT model, its performance was compared with that of a neural network (NN) model, which has been proven to have a high performance in cost estimation domains. The BRT model has shown results similar to those of NN model using 234 actual cost datasets of a building construction project. In addition, the BRT model can provide additional information such as the importance plot and structure model, which can support estimators in comprehending the decision making process. Consequently, the boosting approach has potential applicability in preliminary cost estimations in a building construction project. PMID:26339227
NASA Astrophysics Data System (ADS)
Omer, Galal; Mutanga, Onisimo; Abdel-Rahman, Elfatih M.; Peerbhay, Kabir; Adam, Elhadi
2017-09-01
Forest nitrogen (N) and carbon (C) are among the most important biochemical components of tree organic matter, and the estimation of their concentrations can help to monitor the nutrient uptake processes and health of forest trees. Traditionally, these tree biochemical components are estimated using costly, labour intensive, time-consuming and subjective analytical protocols. The use of very high spatial resolution multispectral data and advanced machine learning regression algorithms such as support vector machines (SVM) and artificial neural networks (ANN) provide an opportunity to accurately estimate foliar N and C concentrations over intact and fragmented forest ecosystems. In the present study, the utility of spectral vegetation indices calculated from WorldView-2 (WV-2) imagery for mapping leaf N and C concentrations of fragmented and intact indigenous forest ecosystems was explored. We collected leaf samples from six tree species in the fragmented as well as intact Dukuduku indigenous forest ecosystems. Leaf samples (n = 85 for each of the fragmented and intact forests) were subjected to chemical analysis for estimating the concentrations of N and C. We used 70% of samples for training our models and 30% for validating the accuracy of our predictive empirical models. The study showed that the N concentration was significantly higher (p = 0.03) in the intact forests than in the fragmented forest. There was no significant difference (p = 0.55) in the C concentration between the intact and fragmented forest strata. The results further showed that the foliar N and C concentrations could be more accurately estimated using the fragmented stratum data compared with the intact stratum data. 
Further, SVM achieved relatively more accurate N (maximum validation R² = 0.78 and minimum validation RMSE = 1.07% of the mean) and C (maximum validation R² = 0.67 and minimum validation RMSE = 1.64% of the mean) estimates compared with ANN (maximum validation R² = 0.70 for N and 0.51 for C, and minimum validation RMSE = 5.40% of the mean for N and 2.21% of the mean for C). Overall, SVM regressions achieved more accurate models for estimating forest foliar N and C concentrations in the fragmented and intact indigenous forests compared to the ANN regression method. It is concluded that the successful application of the WV-2 data integrated with SVM can provide an accurate framework for mapping the concentrations of biochemical elements in two indigenous forest ecosystems.
Tree Species with Photosynthetic Stems Have Greater Nighttime Sap Flux
Chen, Xia; Gao, Jianguo; Zhao, Ping; McCarthy, Heather R.; Zhu, Liwei; Ni, Guangyan; Ouyang, Lei
2018-01-01
An increasing body of evidence has shown that nighttime sap flux occurs in most plants, but the physiological implications and regulatory mechanism are poorly known. The significance of corticular photosynthesis has received much attention during the last decade; however, knowledge of the relationship between corticular photosynthesis and nocturnal stem sap flow is limited at present. In this study, we divided seven tree species into two groups according to their photosynthetic capabilities: species with (Castanopsis hystrix, Michelia macclurei, Eucalyptus citriodora, and Eucalyptus grandis × urophylla) and without (Castanopsis fissa, Schima superba, and Acacia auriculiformis) photosynthetic stems, and measured the sap flux (Js) and chlorophyll fluorescence parameters for these species. One-way ANOVA showed that the Fv/Fm (maximum photochemical quantum yield of PSII) and ΦPSII (effective photochemical quantum yield of PSII) values were lower in non-photosynthetic stem species than in photosynthetic stem species. The linear regression analysis showed that Js,d (daytime sap flux) and Js,n (nighttime sap flux) of non-photosynthetic stem species were 87.7% and 60.9%, respectively, of those of the photosynthetic stem species. Furthermore, for a given daytime transpiration water loss, total nighttime sap flux was higher in species with photosynthetic stems (SlopeSMA = 2.680) than in non-photosynthetic stem species (SlopeSMA = 1.943). These results suggest that stem corticular photosynthesis has a possible effect on nighttime water flow, highlighting the important eco-physiological relationship between nighttime sap flux and corticular photosynthesis. PMID:29416547
Can tree species diversity be assessed with Landsat data in a temperate forest?
Arekhi, Maliheh; Yılmaz, Osman Yalçın; Yılmaz, Hatice; Akyüz, Yaşar Feyza
2017-10-28
The diversity of forest trees as an indicator of ecosystem health can be assessed using the spectral characteristics of plant communities through remote sensing data. The objectives of this study were to investigate alpha and beta tree diversity using Landsat data for six dates in the Gönen dam watershed of Turkey. We used richness and the Shannon and Simpson diversity indices to calculate tree alpha diversity. We also represented the relationship between beta diversity and remotely sensed data using species composition similarity and spectral distance similarity of sampling plots via quantile regression. A total of 99 sampling units, each 20 m × 20 m, were selected using a geographically stratified random sampling method. Within each plot, the tree species were identified, and all of the trees with a diameter at breast height (dbh) larger than 7 cm were measured. Presence/absence and abundance data (tree species number and tree species basal area) of tree species were used to determine the relationship between richness and the Shannon and Simpson diversity indices, which were computed with ground field data, and spectral variables derived (2 × 2 pixels and 3 × 3 pixels) from Landsat 8 OLI data. The Shannon-Wiener index had the highest correlation. For all six dates, NDVI (normalized difference vegetation index) was the spectral variable most strongly correlated with the Shannon index and the tree diversity variables. The ratio of green to red (VI) was the spectral variable least correlated with the tree diversity variables and the Shannon basal area. In both beta diversity curves, the slope of the OLS regression was low, while in the upper quantile it was approximately twice that of the lower quantiles. The Jaccard index was close to one, with little difference between the two beta diversity approaches. This result is due to the increasing similarity between sampling plots when they are located close to each other. 
The intercept differences between the two investigated beta diversity approaches were strongly related to the development stage of a number of sampling plots in the tree species basal area method. For obtaining beta diversity, the tree basal area method gave better results than the tree species number method at representing the similarity of regions located close together. In conclusion, NDVI is helpful for estimating the alpha diversity of trees over large areas when the vegetation is at the maximum growing season. Beta diversity could be obtained with the spectral heterogeneity of Landsat data. Future tree diversity studies using remote sensing data should select data sets when vegetation is at the maximum growing season. Also, forest tree diversity can be investigated using higher-resolution remote sensing data such as ESA Sentinel 2 data, which have been freely available since June 2015.
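The alpha diversity measures named in this abstract (richness, Shannon, Simpson) can be illustrated with a minimal sketch. The function names and the plot counts below are hypothetical, and the Simpson index is written in its 1 − Σp² form, which is only one of several conventions; the abstract does not say which form or which logarithm base the authors used.

```python
import math

def richness(counts):
    """Number of species with at least one individual in the plot."""
    return sum(1 for c in counts if c > 0)

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i), skipping absent species."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson_index(counts):
    """Simpson diversity as 1 - sum(p_i^2) (assumed convention)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Hypothetical plot: stem counts per species (illustrative numbers only).
plot_counts = [10, 10, 10, 10]
print(richness(plot_counts), shannon_index(plot_counts), simpson_index(plot_counts))
```

For a perfectly even plot with S species, the Shannon index equals ln S and this form of the Simpson index equals 1 − 1/S, which gives a quick sanity check on any implementation.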
Iturriaga, H; Hirsch, S; Bunout, D; Díaz, M; Kelly, M; Silva, G; de la Maza, M P; Petermann, M; Ugarte, G
1993-04-01
Looking for a noninvasive method to predict liver histologic alterations in alcoholic patients without clinical signs of liver failure, we studied 187 recently abstinent chronic alcoholics, divided into 2 series. In the model series (n = 94), several clinical variables and results of common laboratory tests were compared with the findings of liver biopsies. These were classified into 3 groups: 1. Normal liver; 2. Moderate alterations; 3. Marked alterations, including alcoholic hepatitis and cirrhosis. The multivariate methods used were logistic regression analysis and a classification and regression tree (CART). Both methods entered gamma-glutamyltransferase (GGT), aspartate aminotransferase (AST), weight and age as significant and independent variables. Univariate analyses with GGT and AST at different cutoffs were also performed. To predict the presence of any kind of damage (Groups 2 and 3), CART and AST > 30 IU showed the highest sensitivity, specificity and correct prediction, in both the model and validation series. For prediction of marked liver damage, a score based on logistic regression and GGT > 110 IU had the highest efficiencies. It is concluded that GGT and AST are good markers of alcoholic liver damage and that, using sample cutoffs, histologic diagnosis can be correctly predicted in 80% of recently abstinent asymptomatic alcoholics.
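How a single-marker cutoff such as AST > 30 IU is scored against biopsy findings can be sketched as follows. This is an illustration only: the function name, the AST values, and the labels are hypothetical and are not data from the study, and the sketch assumes both damaged and undamaged cases are present so that neither denominator is zero.

```python
def cutoff_performance(values, has_damage, cutoff):
    """Score the rule 'value > cutoff predicts damage' against reference labels."""
    tp = fp = tn = fn = 0
    for value, damaged in zip(values, has_damage):
        predicted = value > cutoff
        if predicted and damaged:
            tp += 1          # rule fires, biopsy shows damage
        elif predicted and not damaged:
            fp += 1          # rule fires, biopsy normal
        elif damaged:
            fn += 1          # rule silent, biopsy shows damage
        else:
            tn += 1          # rule silent, biopsy normal
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "correct_prediction": (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical AST values (IU) and biopsy findings; not data from the study.
ast_values = [22, 28, 35, 60, 31, 25, 90, 29]
any_damage = [False, False, True, True, False, True, True, False]
result = cutoff_performance(ast_values, any_damage, cutoff=30)
print(result)
```

Repeating the call over a range of cutoffs is one simple way to compare candidate thresholds, which is the kind of univariate analysis the abstract describes.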
Trees grow on money: urban tree canopy cover and environmental justice.
Schwarz, Kirsten; Fragkias, Michail; Boone, Christopher G; Zhou, Weiqi; McHale, Melissa; Grove, J Morgan; O'Neil-Dunne, Jarlath; McFadden, Joseph P; Buckley, Geoffrey L; Childers, Dan; Ogden, Laura; Pincetl, Stephanie; Pataki, Diane; Whitmer, Ali; Cadenasso, Mary L
2015-01-01
This study examines the distributional equity of urban tree canopy (UTC) cover for Baltimore, MD, Los Angeles, CA, New York, NY, Philadelphia, PA, Raleigh, NC, Sacramento, CA, and Washington, D.C. using high spatial resolution land cover data and census data. Data are analyzed at the Census Block Group levels using Spearman's correlation, ordinary least squares regression (OLS), and a spatial autoregressive model (SAR). Across all cities there is a strong positive correlation between UTC cover and median household income. Negative correlations between race and UTC cover exist in bivariate models for some cities, but they are generally not observed using multivariate regressions that include additional variables on income, education, and housing age. SAR models result in higher r-square values compared to the OLS models across all cities, suggesting that spatial autocorrelation is an important feature of our data. Similarities among cities can be found based on shared characteristics of climate, race/ethnicity, and size. Our findings suggest that a suite of variables, including income, contribute to the distribution of UTC cover. These findings can help target simultaneous strategies for UTC goals and environmental justice concerns.
Willke, Richard J; Zheng, Zhiyuan; Subedi, Prasun; Althin, Rikard; Mullins, C Daniel
2012-12-13
Implicit in the growing interest in patient-centered outcomes research is a growing need for better evidence regarding how responses to a given intervention or treatment may vary across patients, referred to as heterogeneity of treatment effect (HTE). A variety of methods are available for exploring HTE, each associated with unique strengths and limitations. This paper reviews a selected set of methodological approaches to understanding HTE, focusing largely but not exclusively on their uses with randomized trial data. It is oriented for the "intermediate" outcomes researcher, who may already be familiar with some methods, but would value a systematic overview of both more and less familiar methods with attention to when and why they may be used. Drawing from the biomedical, statistical, epidemiological and econometrics literature, we describe the steps involved in choosing an HTE approach, focusing on whether the intent of the analysis is for exploratory, initial testing, or confirmatory testing purposes. We also map HTE methodological approaches to data considerations as well as the strengths and limitations of each approach. Methods reviewed include formal subgroup analysis, meta-analysis and meta-regression, various types of predictive risk modeling including classification and regression tree analysis, series of n-of-1 trials, latent growth and growth mixture models, quantile regression, and selected non-parametric methods. In addition to an overview of each HTE method, examples and references are provided for further reading. By guiding the selection of the methods and analysis, this review is meant to better enable outcomes researchers to understand and explore aspects of HTE in the context of patient-centered outcomes research.
Using Time Series Analysis to Predict Cardiac Arrest in a PICU.
Kennedy, Curtis E; Aoki, Noriaki; Mariscalco, Michele; Turley, James P
2015-11-01
Objectives: To build and test cardiac arrest prediction models in a PICU, using time series analysis as input, and to measure changes in prediction accuracy attributable to different classes of time series data. Design: Retrospective cohort study. Setting: Thirty-one bed academic PICU that provides care for medical and general surgical (not congenital heart surgery) patients. Patients: Patients experiencing a cardiac arrest in the PICU and requiring external cardiac massage for at least 2 minutes. Interventions: None. Measurements and Main Results: One hundred three cases of cardiac arrest and 109 control cases were used to prepare a baseline dataset that consisted of 1,025 variables in four data classes: multivariate, raw time series, clinical calculations, and time series trend analysis. We trained 20 arrest prediction models using a matrix of five feature sets (combinations of data classes) with four modeling algorithms: linear regression, decision tree, neural network, and support vector machine. The reference model (multivariate data with regression algorithm) had an accuracy of 78% and 87% area under the receiver operating characteristic curve. The best model (multivariate + trend analysis data with support vector machine algorithm) had an accuracy of 94% and 98% area under the receiver operating characteristic curve. Cardiac arrest predictions based on a traditional model built with multivariate data and a regression algorithm misclassified cases 3.7 times more frequently than predictions that included time series trend analysis and built with a support vector machine algorithm. Conclusions: Although the final model lacks the specificity necessary for clinical application, we have demonstrated how information from time series data can be used to increase the accuracy of clinical prediction models.
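The 3.7-fold figure in this abstract can be checked from the two reported accuracies, assuming the misclassification rate is simply one minus accuracy and that both models were scored on the same cases:

```latex
\[
\frac{1 - 0.78}{1 - 0.94} \;=\; \frac{0.22}{0.06} \;\approx\; 3.7
\]
```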
Ließ, Mareike; Schmidt, Johannes; Glaser, Bruno
2016-01-01
Tropical forests are significant carbon sinks and their soils' carbon storage potential is immense. However, little is known about the soil organic carbon (SOC) stocks of tropical mountain areas whose complex soil-landscape and difficult accessibility pose a challenge to spatial analysis. The choice of methodology for spatial prediction is of high importance to improve the expected poor model results in case of low predictor-response correlations. Four aspects were considered to improve model performance in predicting SOC stocks of the organic layer of a tropical mountain forest landscape: different spatial predictor settings, predictor selection strategies, various machine learning algorithms and model tuning. Five machine learning algorithms (random forests, artificial neural networks, multivariate adaptive regression splines, boosted regression trees and support vector machines) were trained and tuned to predict SOC stocks from predictors derived from a digital elevation model and satellite image. Topographical predictors were calculated with a GIS search radius of 45 to 615 m. Finally, three predictor selection strategies were applied to the total set of 236 predictors. All machine learning algorithms, including the model tuning and predictor selection, were compared via five repetitions of a tenfold cross-validation. The boosted regression tree algorithm resulted in the overall best model. SOC stocks ranged from 0.2 to 17.7 kg m⁻², displaying a huge variability with diffuse insolation and curvatures of different scale guiding the spatial pattern. Predictor selection and model tuning improved the models' predictive performance in all five machine learning algorithms. The rather low number of selected predictors favours forward compared to backward selection procedures. Choosing predictors by their individual performance was outperformed by the two procedures which accounted for predictor interaction. PMID:27128736
Hutton, Eileen K; Simioni, Julia C; Thabane, Lehana
2017-08-01
Among women with a fetus with a non-cephalic presentation, external cephalic version (ECV) has been shown to reduce the rate of breech presentation at birth and cesarean birth. Compared with ECV at term, beginning ECV prior to 37 weeks' gestation decreases the number of infants in a non-cephalic presentation at birth. The purpose of this secondary analysis was to investigate factors associated with a successful ECV procedure and to present this in a clinically useful format. Data were collected as part of the Early ECV Pilot and Early ECV2 Trials, which randomized 1776 women with a fetus in breech presentation to either early ECV (34-36 weeks' gestation) or delayed ECV (at or after 37 weeks). The outcome of interest was successful ECV, defined as the fetus being in a cephalic presentation immediately following the procedure, as well as at the time of birth. The importance of several factors in predicting successful ECV was investigated using two statistical methods: logistic regression and classification and regression tree (CART) analyses. Among nulliparas, non-engagement of the presenting part and an easily palpable fetal head were independently associated with success. Among multiparas, non-engagement of the presenting part, gestation less than 37 weeks and an easily palpable fetal head were found to be independent predictors of success. These findings were consistent with results of the CART analyses. Regardless of parity, descent of the presenting part was the most discriminating factor in predicting successful ECV and cephalic presentation at birth. © 2017 Nordic Federation of Societies of Obstetrics and Gynecology.
Chen, Xuexia; Liu, Shuguang; Zhu, Zhiliang; Vogelmann, James E.; Li, Zhengpeng; Ohlen, Donald O.
2011-01-01
The concentrations of CO2 and other greenhouse gases in the atmosphere have been increasing and greatly affecting global climate and socio-economic systems. Actively growing forests are generally considered to be a major carbon sink, but forest wildfires lead to large releases of biomass carbon into the atmosphere. Aboveground forest biomass carbon (AFBC), an important ecological indicator, and fire-induced carbon emissions at regional scales are highly relevant to forest sustainable management and climate change. It is challenging to accurately estimate the spatial distribution of AFBC across large areas because of the spatial heterogeneity of forest cover types and canopy structure. In this study, Forest Inventory and Analysis (FIA) data, Landsat, and Landscape Fire and Resource Management Planning Tools Project (LANDFIRE) data were integrated in a regression tree model for estimating AFBC at a 30-m resolution in the Utah High Plateaus. AFBC were calculated from 225 FIA field plots and used as the dependent variable in the model. Of these plots, 10% were held out for model evaluation with stratified random sampling, and the other 90% were used as training data to develop the regression tree model. Independent variable layers included Landsat imagery and the derived spectral indicators, digital elevation model (DEM) data and derivatives, biophysical gradient data, existing vegetation cover type and vegetation structure. The cross-validation correlation coefficient (r value) was 0.81 for the training model. Independent validation using withheld plot data was similar with r value of 0.82. This validated regression tree model was applied to map AFBC in the Utah High Plateaus and then combined with burn severity information to estimate loss of AFBC in the Longston fire of Zion National Park in 2001. The final dataset represented 24 forest cover types for a 4 million ha forested area. 
We estimated a total of 353 Tg AFBC with an average of 87 MgC/ha in the Utah High Plateaus. We also estimated that 8054 Mg AFBC were released from 2.24 km² of burned forest area in the Longston fire. These results demonstrate that an AFBC spatial map and estimated biomass carbon consumption can readily be generated using existing databases. The methodology provides a consistent, practical, and inexpensive way of estimating AFBC at 30-m resolution over large areas throughout the United States.
Improving ensemble decision tree performance using Adaboost and Bagging
NASA Astrophysics Data System (ADS)
Hasan, Md. Rajib; Siraj, Fadzilah; Sainin, Mohd Shamrie
2015-12-01
Ensemble classifier systems are considered among the most promising approaches to medical data classification, and the performance of a decision tree classifier can be increased by the ensemble method, as it has been shown to be better than single classifiers. However, in an ensemble setting the performance depends on the selection of a suitable base classifier. This research employed two prominent ensembles, namely Adaboost and Bagging, with base classifiers such as Random Forest, Random Tree, j48, j48grafts and Logistic Model Regression (LMT) that were selected independently. The empirical study shows that the performance varies when different base classifiers are selected, and in some cases an overfitting issue was also noted. The evidence shows that ensemble decision tree classifiers using Adaboost and Bagging improve performance on the selected medical data sets.
Decision Tree Approach for Soil Liquefaction Assessment
Gandomi, Amir H.; Fridline, Mark M.; Roke, David A.
2013-01-01
In the current study, the performances of some decision tree (DT) techniques are evaluated for postearthquake soil liquefaction assessment. A database containing 620 records of seismic parameters and soil properties is used in this study. Three decision tree techniques are used here in two different ways, considering statistical and engineering points of view, to develop decision rules. The DT results are compared to the logistic regression (LR) model. The results of this study indicate that the DTs not only successfully predict liquefaction but they can also outperform the LR model. The best DT models are interpreted and evaluated based on an engineering point of view. PMID:24489498
NASA Astrophysics Data System (ADS)
Dobrowski, S. Z.; Greenberg, J. A.; Schladow, G.
2006-12-01
There is evidence from the Sierra Nevada that sub-alpine and alpine environments are currently experiencing landscape-mediated changes in growth and recruitment due to recent climate change. Understanding the biophysical controls of forest structure, growth, and recruitment in these environments is critical for interpreting and predicting the direction and magnitude of biotic responses to climate shift. We examined the abiotic controls of forest biomass within a 305 km² region of the Carson Range on the eastern shore of Lake Tahoe, CA, USA using estimates of forest structure and biophysical drivers developed continuously over the landscape. The study area ranged from 1900 m to 3400 m a.s.l. and encompassed montane, sub-alpine, and alpine environments. From hyperspatial optical imagery (IKONOS), we derived per-tree positions and crown sizes using a template matching approach applied to a pre-classified image of sunlit and shadowed vegetation pixels. From this remote sensing derived stem map, we calculated plot-level estimates of stem density, tree cover and average crown size. Additionally, we developed high resolution (30 m) estimates of climate variables within the study area using meteorological station data, topographic data, and a combination of empirical and mechanistic modeling approaches. From these climate surfaces, digital elevation data, and soil survey data, we derived estimates of direct and indirect biophysical drivers including heat loading, reference evapotranspiration, water deficit, solar radiation, topographic convergence, soil depth, and soil water holding capacity. Using these data sets, we conducted a regression tree analysis with stem density, tree cover, and average tree size as responses and biophysical drivers as predictors. Trees were fit using a randomly sampled half of the dataset (168,000 samples) and pruned using cost-complexity pruning based on 10-fold cross-validation. 
Predictions from pruned trees were then assessed against the hold-out data. Preliminary results from this analysis suggest that: 1) the relative importance and dependencies of biophysical drivers on forest structure are contingent upon the position of these forests along gradients of a limiting resource, 2) stem density shows a stronger dependence on water availability than tree size, and 3) the predictive power of abiotic variables is limited, with our best models accounting for only 36-40 percent of the variance in the response. These results suggest that the response of forest structure to climate change may be highly idiosyncratic and difficult to predict using abiotic drivers alone.
NASA Astrophysics Data System (ADS)
Kirby, Nicola Frances; Dempster, Edith Roslyn
2014-11-01
The Foundation Programme of the Centre for Science Access at the University of KwaZulu-Natal, South Africa provides access to tertiary science studies for educationally disadvantaged students who do not meet formal faculty entrance requirements. The low number of students proceeding from the programme into mainstream is of concern, particularly given the national imperative to increase participation and levels of performance in tertiary-level science. An attempt was made to understand foundation student performance on one campus of this university, with a view to identifying challenges and opportunities for remediation in the curriculum and processes of selection into the programme. A classification and regression tree analysis was used to identify which variables best described student performance. The explanatory variables included biographical and school-history data, performance in selection tests, and socio-economic data pertaining to their year in the programme. The results illustrate the prognostic reliability of the model used to select students, raise concerns about the inefficiency of school performance indicators as a measure of students' academic potential in the Foundation Programme, and highlight the importance of accommodation arrangements and financial support for student success in their access year.
Adólfsdóttir, Steinunn; Haász, Judit; Wehling, Eike; Ystad, Martin; Lundervold, Arvid; Lundervold, Astri J
2014-11-01
To investigate brain-behavior relationships between morphometric brain measures and salient executive function (EF) measures of inhibition and switching. One hundred participants (49-80 years) performed the Color Word Interference Test (CWIT) from the Delis-Kaplan Executive Function System (D-KEFS). Salient measures of the EF components of inhibition and switching, from which the effects of more fundamental skills were regressed out, were analyzed using linear models and a conditional inference tree analysis taking intercorrelations between predictor variables (brain volumes, age, gender, and education) into account. The conditional inference tree analysis demonstrated a primary role of the middle frontal gyrus (MFG) in explaining variations in the salient EF measures of switching and combined inhibition/switching. Age predicted measures of inhibition. The study highlights the importance of considering fundamental cognitive skills and the use of a statistical method taking possible complex relationships between predictor variables into account when interpreting standard EF test results. Further studies should include MRI measures representing neural networks that may relate to CWIT performance, and longitudinal studies are required to investigate any causal relationships. PsycINFO Database Record (c) 2014 APA, all rights reserved.
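The step of regressing out a more fundamental skill can be illustrated with a short sketch. It assumes a single baseline predictor and ordinary least squares; the function name `residualize`, the variable names, and all scores are hypothetical and are not taken from the study, which may have used several baseline conditions.

```python
from statistics import fmean

def residualize(y, x):
    """Return residuals of y after a least-squares fit on a single predictor x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical completion times in seconds (not data from the study).
color_naming = [28.0, 31.0, 35.0, 40.0, 44.0, 52.0]  # fundamental skill
inhibition = [50.0, 58.0, 61.0, 75.0, 70.0, 95.0]    # raw inhibition condition

# What remains of the inhibition score once naming speed is accounted for.
salient_inhibition = residualize(inhibition, color_naming)
print([round(v, 2) for v in salient_inhibition])
```

By construction the residuals have zero mean and are uncorrelated with the baseline score, so any later association with brain volumes cannot be attributed to that baseline skill alone.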
ECOPASS - a multivariate model used as an index of growth performance of poplar clones
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ceulemans, R.; Impens, I.
The model (ECOlogical PASSport) reported was constructed by principal component analysis from a combination of biochemical, anatomical/morphological and ecophysiological gas exchange parameters measured on 5 fast growing poplar clones. Productivity data were from 10 selected trees in 3 plantations in Belgium and given as m.a.i.(b.a.). The model is shown to be able to reflect not only genetic origin and the relative effects of the different parameters of the clones, but also their production potential. Multiple regression analysis of the 4 principal components showed a high cumulative correlation (96%) between the 3 components related to ecophysiological, biochemical and morphological parameters, and productivity; the ecophysiological component alone correlated 85% with productivity.
Devarajan, Karthik; Parsons, Theodore; Wang, Qiong; O'Neill, Raymond; Solomides, Charalambos; Peiper, Stephen C.; Testa, Joseph R.; Uzzo, Robert; Yang, Haifeng
2017-01-01
Intratumoral heterogeneity (ITH) is a prominent feature of kidney cancer. It is not known whether it has utility in finding associations between protein expression and clinical parameters. We used ITH detected by immunohistochemistry (IHC) to aid the association analysis between the loss of SWI/SNF components and clinical parameters. 160 ccRCC tumors (40 per tumor stage) were used to generate a tissue microarray (TMA). Four foci from different regions of each tumor were selected. IHC was performed against PBRM1, ARID1A, SETD2, SMARCA4, and SMARCA2. Statistical analyses were performed to correlate biomarker losses with patho-clinical parameters. Categorical variables were compared between groups using Fisher's exact tests. Univariate and multivariable analyses were used to correlate biomarker changes and patient survival. Multivariable analyses were performed by constructing decision trees using the classification and regression trees (CART) methodology. IHC detected widespread ITH in ccRCC tumors. The statistical analysis of the “Truncal loss” (root loss) found more correlations between biomarker losses and tumor stages than the traditional “Loss in tumor (total)”. Losses of SMARCA4 or SMARCA2 significantly improved prognosis for overall survival (OS). Losses of PBRM1, ARID1A or SETD2 had the opposite effect. Thus “Truncal Loss” analysis revealed hidden links between protein losses and patient survival in ccRCC. PMID:28445125
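For readers unfamiliar with the Fisher's exact test mentioned above, the following sketch computes a two-sided p-value for a 2x2 table from the hypergeometric distribution. The function name `fisher_exact_two_sided` and the counts are illustrative assumptions; they are not results from the study, and the abstract does not state which software the authors used.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum all tables that are no more probable than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Hypothetical counts: rows are early vs late stage, columns are loss vs no loss.
p_value = fisher_exact_two_sided(12, 68, 30, 50)
print(f"p = {p_value:.4f}")
```

The small tolerance factor in the comparison guards against floating-point ties between equally probable tables; it is a common implementation choice rather than part of the test's definition.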
Sah, Jay P.; Ross, Michael S.; Snyder, James R.; ...
2010-01-01
In fire-dependent forests, managers are interested in predicting the consequences of prescribed burning on postfire tree mortality. We examined the effects of prescribed fire on tree mortality in Florida Keys pine forests, using a factorial design with understory type, season, and year of burn as factors. We also used logistic regression to model the effects of burn season, fire severity, and tree dimensions on individual tree mortality. Despite limited statistical power due to problems in carrying out the full suite of planned experimental burns, associations with tree and fire variables were observed. Post-fire pine tree mortality was negatively correlated with tree size and positively correlated with char height and percent crown scorch. Unlike post-fire mortality, tree mortality associated with storm surge from Hurricane Wilma was greater in the large size classes. Due to their influence on population structure and fuel dynamics, the size-selective mortality patterns following fire and storm surge have practical importance for using fire as a management tool in Florida Keys pinelands in the future, particularly when the threats to their continued existence from tropical storms and sea level rise are expected to increase.