regression tree construction: Topics by Science.gov

Sample records for regression tree construction

Estimating tree biomass regressions and their error, proceedings of the workshop on tree biomass regression functions and their contribution to the error

Treesearch

Eric H. Wharton; Tiberius Cunia

1987-01-01

Proceedings of a workshop co-sponsored by the USDA Forest Service, the State University of New York, and the Society of American Foresters. Presented were papers on the methodology of sample tree selection, tree biomass measurement, construction of biomass tables and estimation of their error, and combining the error of biomass tables with that of the sample plots or...
Application of Boosting Regression Trees to Preliminary Cost Estimation in Building Construction Projects

PubMed Central

Shin, Yoonseok

2015-01-01

Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong boundaries in terms of its generalization performance. However, the boosting approach has yet to be used in regression problems within the construction domain, including cost estimations, but has been actively utilized in other domains. Therefore, a boosting regression tree (BRT) is applied to cost estimations at the early stage of a construction project to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the performance of the BRT model, its performance was compared with that of a neural network (NN) model, which has been proven to have a high performance in cost estimation domains. The BRT model has shown results similar to those of NN model using 234 actual cost datasets of a building construction project. In addition, the BRT model can provide additional information such as the importance plot and structure model, which can support estimators in comprehending the decision making process. Consequently, the boosting approach has potential applicability in preliminary cost estimations in a building construction project. PMID:26339227
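
The boosting idea in this abstract can be sketched as follows. This is a minimal illustration of least-squares boosting with one-split regression trees (stumps), not the model used in the study; the toy data, the function names, and the settings `n_rounds` and `learning_rate` are all assumptions made for the example.

```python
from statistics import mean

def fit_stump(x, residuals):
    """Find the threshold on x that best fits the residuals with two constants."""
    best = None
    for c in sorted(set(x))[:-1]:  # the largest value would leave the right side empty
        left = [r for xi, r in zip(x, residuals) if xi <= c]
        right = [r for xi, r in zip(x, residuals) if xi > c]
        ml, mr = mean(left), mean(right)
        sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, c, ml, mr)
    return best[1:]

def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Return (base, learning_rate, stumps) fitted by least-squares boosting."""
    base = mean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Each round fits a stump to what the current ensemble still gets wrong.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        c, ml, mr = fit_stump(x, residuals)
        stumps.append((c, ml, mr))
        pred = [pi + learning_rate * (ml if xi <= c else mr) for xi, pi in zip(x, pred)]
    return base, learning_rate, stumps

def predict(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(ml if xi <= c else mr for c, ml, mr in stumps)

def mse(model, x, y):
    return mean((yi - predict(model, xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical toy data: floor area and cost in arbitrary units.
area = [1, 2, 3, 4, 5, 6, 7, 8]
cost = [1.0, 1.2, 1.1, 3.0, 3.2, 2.9, 5.1, 5.0]
print(mse(boost(area, cost, n_rounds=0), area, cost))   # error of the constant model
print(mse(boost(area, cost, n_rounds=50), area, cost))  # training error after boosting
```

The small learning rate is a common design choice: each stump corrects only part of the remaining error, so many weak trees share the fit.
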
An Extension of CART's Pruning Algorithm. Program Statistics Research Technical Report No. 91-11.

ERIC Educational Resources Information Center

Kim, Sung-Ho

Among the computer-based methods used for the construction of trees such as AID, THAID, CART, and FACT, the only one that uses an algorithm that first grows a tree and then prunes the tree is CART. The pruning component of CART is analogous in spirit to the backward elimination approach in regression analysis. This idea provides a tool in…
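
For orientation, the grow-then-prune step of standard CART is usually written with the cost-complexity criterion below. This is the textbook form, given here as background; the extension proposed in the report is not shown in the truncated abstract and is not reproduced.

```latex
R_\alpha(T) = R(T) + \alpha\,\lvert \tilde{T} \rvert ,
\qquad
g(t) = \frac{R(t) - R(T_t)}{\lvert \tilde{T}_t \rvert - 1}
```

Here $R(T)$ is the resubstitution error of tree $T$, $\lvert \tilde{T} \rvert$ is its number of terminal nodes, $\alpha \ge 0$ is the complexity penalty, and $T_t$ is the branch rooted at internal node $t$. Weakest-link pruning repeatedly removes the branch with the smallest $g(t)$, which produces a nested sequence of subtrees much as backward elimination produces a nested sequence of regression models.
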
A Hybrid Approach of Stepwise Regression, Logistic Regression, Support Vector Machine, and Decision Tree for Forecasting Fraudulent Financial Statements

PubMed Central

Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De

2014-01-01

As the fraudulent financial statement of an enterprise is increasingly serious with each passing day, establishing a valid forecasting fraudulent financial statement model of an enterprise has become an important question for academic research and financial practice. After screening the important variables using the stepwise regression, the study also matches the logistic regression, support vector machine, and decision tree to construct the classification models to make a comparison. The study adopts financial and nonfinancial variables to assist in establishment of the forecasting fraudulent financial statement model. Research objects are the companies to which the fraudulent and nonfraudulent financial statement happened between years 1998 to 2012. The findings are that financial and nonfinancial information are effectively used to distinguish the fraudulent financial statement, and decision tree C5.0 has the best classification effect 85.71%. PMID:25302338
Blood oxygen level dependent magnetic resonance imaging for detecting pathological patterns in lupus nephritis patients: a preliminary study using a decision tree model.

PubMed

Shi, Huilan; Jia, Junya; Li, Dong; Wei, Li; Shang, Wenya; Zheng, Zhenfeng

2018-02-09

Precise renal histopathological diagnosis guides the therapeutic strategy in patients with lupus nephritis. Blood oxygen level dependent (BOLD) magnetic resonance imaging (MRI) is an applicable noninvasive technique in renal disease. This study was performed to explore whether BOLD MRI could contribute to the diagnosis of renal pathological patterns. Adult patients with a renal pathological diagnosis of lupus nephritis were recruited. Renal biopsy tissues were assessed based on the ISN/RPS 2003 lupus nephritis classification. BOLD MRI was used to obtain the functional magnetic resonance parameter, R2* values. Several functions of R2* values were calculated and used to construct algorithmic models for renal pathological patterns, and the models were compared as to their diagnostic capability. Both histopathology and BOLD MRI were used to examine a total of twelve patients. Renal pathological patterns included five class III cases (including 3 class III + V) and seven class IV cases (including 4 class IV + V). Three algorithmic models, namely decision tree, linear discriminant, and logistic regression, were constructed to distinguish the renal pathological pattern of class III from that of class IV. The sensitivity of the decision tree model was better than that of the linear discriminant model (71.87% vs 59.48%, P < 0.001) and inferior to that of the logistic regression model (71.87% vs 78.71%, P < 0.001). The specificity of the decision tree model was equivalent to that of the linear discriminant model (63.87% vs 63.73%, P = 0.939) and higher than that of the logistic regression model (63.87% vs 38.0%, P < 0.001). The area under the ROC curve (AUROCC) of the decision tree model was greater than that of the linear discriminant model (0.765 vs 0.629, P < 0.001) and the logistic regression model (0.765 vs 0.662, P < 0.001).
BOLD MRI is a useful non-invasive imaging technique for the evaluation of lupus nephritis. Decision tree models constructed using functions of R2* values may facilitate the prediction of renal pathological patterns.
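
The sensitivity and specificity figures compared above are computed from a confusion matrix. The sketch below shows the arithmetic only; the labels, the choice of class IV as the "positive" class, and the function name are assumptions for illustration, and the data are made up rather than taken from the study.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """Return (sensitivity, specificity) treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical biopsy classes and model predictions.
truth = ["IV", "IV", "IV", "III", "III"]
model = ["IV", "IV", "III", "III", "IV"]
sens, spec = sensitivity_specificity(truth, model, positive="IV")
print(sens, spec)
```
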
A survival tree method for the analysis of discrete event times in clinical and epidemiological studies.

PubMed

Schmid, Matthias; Küchenhoff, Helmut; Hoerauf, Achim; Tutz, Gerhard

2016-02-28

Survival trees are a popular alternative to parametric survival modeling when there are interactions between the predictor variables or when the aim is to stratify patients into prognostic subgroups. A limitation of classical survival tree methodology is that most algorithms for tree construction are designed for continuous outcome variables. Hence, classical methods might not be appropriate if failure time data are measured on a discrete time scale (as is often the case in longitudinal studies where data are collected, e.g., quarterly or yearly). To address this issue, we develop a method for discrete survival tree construction. The proposed technique is based on the result that the likelihood of a discrete survival model is equivalent to the likelihood of a regression model for binary outcome data. Hence, we modify tree construction methods for binary outcomes such that they result in optimized partitions for the estimation of discrete hazard functions. By applying the proposed method to data from a randomized trial in patients with filarial lymphedema, we demonstrate how discrete survival trees can be used to identify clinically relevant patient groups with similar survival behavior. Copyright © 2015 John Wiley & Sons, Ltd.
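
The equivalence mentioned in this abstract, between a discrete survival likelihood and a binary-outcome likelihood, rests on expanding each subject into one row per period at risk. The following is a minimal sketch of that expansion under assumed conventions (integer periods starting at 1, event flag 1 for failure and 0 for censoring); the function names and the example records are hypothetical.

```python
def person_period(records):
    """Expand (subject, observed_time, event) into (subject, period, y) rows.

    y is 1 only in the period where a failure occurred; censored subjects
    contribute zeros for every period in which they were observed.
    """
    rows = []
    for subject, time, event in records:
        for t in range(1, time + 1):
            y = 1 if (event == 1 and t == time) else 0
            rows.append((subject, t, y))
    return rows

def hazard_by_period(rows):
    """Share of at-risk subjects failing in each period (life-table estimate)."""
    at_risk, events = {}, {}
    for _, t, y in rows:
        at_risk[t] = at_risk.get(t, 0) + 1
        events[t] = events.get(t, 0) + y
    return {t: events[t] / at_risk[t] for t in sorted(at_risk)}

# Hypothetical follow-up data: subject, last observed period, event indicator.
records = [("a", 3, 1), ("b", 2, 0), ("c", 1, 1), ("d", 3, 0)]
rows = person_period(records)
print(len(rows))               # 9 person-period rows
print(hazard_by_period(rows))  # {1: 0.25, 2: 0.0, 3: 0.5}
```

Once the data are in this form, any tree method for a binary response `y` can be applied to the rows, which is the route the abstract describes.
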
Bayesian additive decision trees of biomarker by treatment interactions for predictive biomarker detection and subgroup identification.

PubMed

Zhao, Yang; Zheng, Wei; Zhuo, Daisy Y; Lu, Yuefeng; Ma, Xiwen; Liu, Hengchang; Zeng, Zhen; Laird, Glen

2017-10-11

Personalized medicine, or tailored therapy, has been an active and important topic in recent medical research. Many methods have been proposed in the literature for predictive biomarker detection and subgroup identification. In this article, we propose a novel decision tree-based approach applicable in randomized clinical trials. We model the prognostic effects of the biomarkers using additive regression trees and the biomarker-by-treatment effect using a single regression tree. A Bayesian approach is utilized to periodically revise the split variables and the split rules of the decision trees, which provides a better overall fit. A Gibbs sampler is implemented in the MCMC procedure, which updates the prognostic trees and the interaction tree separately. We use the posterior distribution of the interaction tree to construct the predictive scores of the biomarkers and to identify the subgroup where the treatment is superior to the control. Numerical simulations show that our proposed method performs well under various settings compared with existing methods. We also demonstrate an application of our method in a real clinical trial.
The use of copulas to practical estimation of multivariate stochastic differential equation mixed effects models

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rupšys, P.

A system of stochastic differential equations (SDE) with mixed-effects parameters and multivariate normal copula density function were used to develop tree height model for Scots pine trees in Lithuania. A two-step maximum likelihood parameter estimation method is used and computational guidelines are given. After fitting the conditional probability density functions to outside bark diameter at breast height, and total tree height, a bivariate normal copula distribution model was constructed. Predictions from the mixed-effects parameters SDE tree height model calculated during this research were compared to the regression tree height equations. The results are implemented in the symbolic computational language MAPLE.
An Intelligent Decision Support System for Workforce Forecast

DTIC Science & Technology

2011-01-01

ARIMA) model to forecast the demand for construction skills in Hong Kong. This model was based... Decision Trees ARIMA Rule Based Forecasting Segmentation Forecasting Regression Analysis Simulation Modeling Input-Output Models LP and NLP Markovian... data • When results are needed as a set of easily interpretable rules 4.1.4 ARIMA Auto-regressive, integrated, moving-average (ARIMA) models
Prediction of strontium bromide laser efficiency using cluster and decision tree analysis

NASA Astrophysics Data System (ADS)

Iliev, Iliycho; Gocheva-Ilieva, Snezhana; Kulin, Chavdar

2018-01-01

The subject of investigation is a new high-powered strontium bromide (SrBr2) vapor laser emitting in a multiline region of wavelengths. The laser is an alternative to atomic strontium lasers and free electron lasers, especially at the 6.45 μm line, which is used in surgery for medical processing of biological tissues and bones with minimal damage. In this paper, experimental data from measurements of the operational and output characteristics of the laser are statistically processed by means of cluster analysis and tree-based regression techniques. The aim is to extract from the available data the more important relationships and dependences that influence the increase of the overall laser efficiency. A set of cluster models is constructed and analyzed. Different cluster methods show that the seven investigated operational characteristics (laser tube diameter, length, supplied electrical power, and others) and laser efficiency are combined in 2 clusters. Regression tree models built with the Classification and Regression Trees (CART) technique yield dependences that predict the values of efficiency, and especially the maximum efficiency, with over 95% accuracy.
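
As a rough illustration of how a CART-style regression tree is constructed, the sketch below grows a small tree by choosing, at each node, the split that minimises the summed squared error of the two children. It is a simplified stand-in with no pruning or cross-validation, not the authors' model; the feature names, the data values, and `max_depth` are assumptions.

```python
from statistics import mean

def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(X, y):
    """Return (feature_index, threshold) that lowers the squared error most, or None."""
    best, best_cost = None, sse(y)
    for j in range(len(X[0])):
        for c in sorted({row[j] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= c]
            right = [yi for row, yi in zip(X, y) if row[j] > c]
            cost = sse(left) + sse(right)
            if cost < best_cost:
                best, best_cost = (j, c), cost
    return best

def grow(X, y, max_depth=2):
    """Return a leaf value (a number) or a node (feature, threshold, left, right)."""
    split = best_split(X, y) if max_depth > 0 and len(y) > 1 else None
    if split is None:
        return mean(y)
    j, c = split
    lo = [(row, yi) for row, yi in zip(X, y) if row[j] <= c]
    hi = [(row, yi) for row, yi in zip(X, y) if row[j] > c]
    return (j, c,
            grow([r for r, _ in lo], [v for _, v in lo], max_depth - 1),
            grow([r for r, _ in hi], [v for _, v in hi], max_depth - 1))

def predict(tree, row):
    while isinstance(tree, tuple):
        j, c, lo, hi = tree
        tree = lo if row[j] <= c else hi
    return tree

# Hypothetical data: (tube diameter, supplied power) -> efficiency, made-up values.
X = [(10, 1.0), (10, 2.0), (20, 1.0), (20, 2.0)]
y = [0.1, 0.2, 0.4, 0.6]
tree = grow(X, y)
print(tree[0], tree[1])          # root splits on feature 0 at threshold 10
print(predict(tree, (20, 2.0)))  # 0.6
```
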
[RS estimation of inventory parameters and carbon storage of moso bamboo forest based on synergistic use of object-based image analysis and decision tree].

PubMed

Du, Hua Qiang; Sun, Xiao Yan; Han, Ning; Mao, Fang Jie

2017-10-01

By synergistically using the object-based image analysis (OBIA) and the classification and regression tree (CART) methods, the distribution information, the inventory indexes (including diameter at breast height, tree height, and crown closure), and the aboveground carbon storage (AGC) of moso bamboo forest in Shanchuan Town, Anji County, Zhejiang Province were investigated. The results showed that the moso bamboo forest could be accurately delineated by integrating the multi-scale image segmentation in the OBIA technique with CART, which connected the image objects at various scales, with a good producer's accuracy of 89.1%. The indexes estimated by the regression tree model, which was constructed from the features extracted from the image objects, reached normal or better accuracy, with the crown closure model achieving the best estimating accuracy of 67.9%. The estimating accuracy of diameter at breast height and tree height was relatively low, which was consistent with the conclusion that estimating diameter at breast height and tree height using optical remote sensing could not achieve satisfactory results. Estimation of AGC reached relatively high accuracy, and the accuracy in the high-value region exceeded 80%.
Ensemble learning with trees and rules: supervised, semi-supervised, unsupervised

USDA-ARS's Scientific Manuscript database

In this article, we propose several new approaches for post-processing a large ensemble of conjunctive rules for supervised and semi-supervised learning problems. We show with various examples that for high-dimensional regression problems the models constructed by post-processing the rules with ...
Is Susceptibility to Prenatal Methylmercury Exposure from Fish Consumption Non-Homogeneous? Tree-Structured Analysis for the Seychelles Child Development Study

PubMed Central

Huang, Li-Shan; Myers, Gary J.; Davidson, Philip W.; Cox, Christopher; Xiao, Fenyuan; Thurston, Sally W.; Cernichiari, Elsa; Shamlaye, Conrad F.; Sloane-Reeves, Jean; Georger, Lesley; Clarkson, Thomas W.

2007-11-01

Studies of the association between prenatal methylmercury exposure from maternal fish consumption during pregnancy and neurodevelopmental test scores in the Seychelles Child Development Study have found no consistent pattern of associations through age 9 years. The analyses for the most recent 9-year data examined the population effects of prenatal exposure, but did not address the possibility of non-homogeneous susceptibility. This paper presents a regression tree approach: covariate effects are treated non-linearly and non-additively and non-homogeneous effects of prenatal methylmercury exposure are permitted among the covariate clusters identified by the regression tree. The approach allows us to address whether children in the lower or higher ends of the developmental spectrum differ in susceptibility to subtle exposure effects. Of 21 endpoints available at age 9 years, we chose the Wechsler Full Scale IQ and its associated covariates to construct the regression tree. The prenatal mercury effect in each of the nine resulting clusters was assessed linearly and non-homogeneously. In addition we reanalyzed five other 9-year endpoints that in the linear analysis had a two-tailed p-value <0.2 for the effect of prenatal exposure. In this analysis, motor proficiency and activity level improved significantly with increasing MeHg for 53% of the children who had an average home environment. Motor proficiency significantly decreased with increasing prenatal MeHg exposure in 7% of the children whose home environment was below average. The regression tree results support previous analyses of outcomes in this cohort. However, this analysis raises the intriguing possibility that an effect may be non-homogeneous among children with different backgrounds and IQ levels. PMID:17942158
Random forests of interaction trees for estimating individualized treatment effects in randomized trials.

PubMed

Su, Xiaogang; Peña, Annette T; Liu, Lei; Levine, Richard A

2018-04-29

Assessing heterogeneous treatment effects is a growing interest in advancing precision medicine. Individualized treatment effects (ITEs) play a critical role in such an endeavor. Concerning experimental data collected from randomized trials, we put forward a method, termed random forests of interaction trees (RFIT), for estimating ITE on the basis of interaction trees. To this end, we propose a smooth sigmoid surrogate method, as an alternative to greedy search, to speed up tree construction. The RFIT outperforms the "separate regression" approach in estimating ITE. Furthermore, standard errors for the estimated ITE via RFIT are obtained with the infinitesimal jackknife method. We assess and illustrate the use of RFIT via both simulation and the analysis of data from an acupuncture headache trial. Copyright © 2018 John Wiley & Sons, Ltd.
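
The smooth sigmoid surrogate named in this abstract can be illustrated in a simplified setting. As we read it, the idea is to replace the hard split indicator 1{x <= c} with a sigmoid so that the split criterion becomes a smooth function of the cutoff c and can be minimised by a one-dimensional optimiser instead of an exhaustive search. The sketch below applies that idea to an ordinary squared-error criterion, not to the interaction criterion of RFIT; the steepness `a`, the golden-section optimiser, and the data are all assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def smooth_sse(x, y, c, a=20.0):
    """Within-child squared error with 1{x <= c} replaced by sigmoid(a * (c - x))."""
    w = [sigmoid(a * (c - xi)) for xi in x]  # soft membership in the left child
    wl = sum(w)
    wr = len(x) - wl
    ml = sum(wi * yi for wi, yi in zip(w, y)) / wl
    mr = sum((1 - wi) * yi for wi, yi in zip(w, y)) / wr
    return sum(wi * (yi - ml) ** 2 + (1 - wi) * (yi - mr) ** 2
               for wi, yi in zip(w, y))

def golden_section(f, lo, hi, tol=1e-4):
    """Minimise f on [lo, hi]; assumes a single local minimum in the interval."""
    ratio = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        m1 = hi - ratio * (hi - lo)
        m2 = lo + ratio * (hi - lo)
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

# Hypothetical data with a jump in the response between x = 0.4 and x = 0.6.
x = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
y = [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0]
c_hat = golden_section(lambda c: smooth_sse(x, y, c), min(x), max(x))
print(c_hat)  # a cutoff between the two groups
```

A larger `a` makes the surrogate closer to the hard split but also less smooth, so the steepness is a tuning choice in this sketch.
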
A Comparison of Two Scoring Methods for an Automated Speech Scoring System

ERIC Educational Resources Information Center

Xi, Xiaoming; Higgins, Derrick; Zechner, Klaus; Williamson, David

2012-01-01

This paper compares two alternative scoring methods--multiple regression and classification trees--for an automated speech scoring system used in a practice environment. The two methods were evaluated on two criteria: construct representation and empirical performance in predicting human scores. The empirical performance of the two scoring models…
Detection of fraudulent financial statements using the hybrid data mining approach.

PubMed

Chen, Suduan

2016-01-01

The purpose of this study is to construct a valid and rigorous fraudulent financial statement detection model. The research objects are companies which experienced both fraudulent and non-fraudulent financial statements between the years 2002 and 2013. In the first stage, two decision tree algorithms, including the classification and regression trees (CART) and the Chi squared automatic interaction detector (CHAID) are applied in the selection of major variables. The second stage combines CART, CHAID, Bayesian belief network, support vector machine and artificial neural network in order to construct fraudulent financial statement detection models. According to the results, the detection performance of the CHAID-CART model is the most effective, with an overall accuracy of 87.97 % (the FFS detection accuracy is 92.69 %).
Extensions and applications of ensemble-of-trees methods in machine learning

NASA Astrophysics Data System (ADS)

Bleich, Justin

Ensemble-of-trees algorithms have emerged to the forefront of machine learning due to their ability to generate high forecasting accuracy for a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) focus on an underlying Bayesian probability model to generate the fits. These new probability model-based approaches show much promise versus their algorithmic counterparts, but also offer substantial room for improvement. The first part of this thesis focuses on methodological advances for ensemble-of-trees techniques with an emphasis on the more recent Bayesian approaches. In particular, we focus on extensions of BART in four distinct ways. First, we develop a more robust implementation of BART for both research and application. We then develop a principled approach to variable selection for BART as well as the ability to naturally incorporate prior information on important covariates into the algorithm. Next, we propose a method for handling missing data that relies on the recursive structure of decision trees and does not require imputation. Last, we relax the assumption of homoskedasticity in the BART model to allow for parametric modeling of heteroskedasticity. The second part of this thesis returns to the classic algorithmic approaches in the context of classification problems with asymmetric costs of forecasting errors. First we consider the performance of RF and SGB more broadly and demonstrate its superiority to logistic regression for applications in criminology with asymmetric costs. Next, we use RF to forecast unplanned hospital readmissions upon patient discharge with asymmetric costs taken into account. 
Finally, we explore the construction of stable decision trees for forecasts of violence during probation hearings in court systems.

Identifying pollution sources and predicting urban air quality using ensemble learning methods

NASA Astrophysics Data System (ADS)

Singh, Kunwar P.; Gupta, Shikha; Rai, Premanjali

2013-12-01

In this study, principal components analysis (PCA) was performed to identify air pollution sources and tree based ensemble learning models were constructed to predict the urban air quality of Lucknow (India) using the air quality and meteorological databases pertaining to a period of five years. PCA identified vehicular emissions and fuel combustion as major air pollution sources. The air quality indices revealed the air quality unhealthy during the summer and winter. Ensemble models were constructed to discriminate between the seasonal air qualities, factors responsible for discrimination, and to predict the air quality indices. Accordingly, single decision tree (SDT), decision tree forest (DTF), and decision treeboost (DTB) were constructed and their generalization and predictive performance was evaluated in terms of several statistical parameters and compared with conventional machine learning benchmark, support vector machines (SVM). The DT and SVM models discriminated the seasonal air quality rendering misclassification rate (MR) of 8.32% (SDT); 4.12% (DTF); 5.62% (DTB), and 6.18% (SVM), respectively in complete data. The AQI and CAQI regression models yielded a correlation between measured and predicted values and root mean squared error of 0.901, 6.67 and 0.825, 9.45 (SDT); 0.951, 4.85 and 0.922, 6.56 (DTF); 0.959, 4.38 and 0.929, 6.30 (DTB); 0.890, 7.00 and 0.836, 9.16 (SVR) in complete data. The DTF and DTB models outperformed the SVM both in classification and regression which could be attributed to the incorporation of the bagging and boosting algorithms in these models. The proposed ensemble models successfully predicted the urban ambient air quality and can be used as effective tools for its management.
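
The regression results in this abstract are reported as a correlation between measured and predicted values together with a root mean squared error. A minimal sketch of both statistics follows; the function names and the numbers are illustrative assumptions, not data from the study.

```python
import math

def rmse(measured, predicted):
    """Root mean squared error between two equally long sequences."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

def pearson_r(measured, predicted):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(measured)
    mm, mp = sum(measured) / n, sum(predicted) / n
    sxy = sum((m - mm) * (p - mp) for m, p in zip(measured, predicted))
    sxx = sum((m - mm) ** 2 for m in measured)
    syy = sum((p - mp) ** 2 for p in predicted)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical air quality index values.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(rmse(measured, predicted), pearson_r(measured, predicted))
```
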
Classification and regression tree analysis of acute-on-chronic hepatitis B liver failure: Seeing the forest for the trees.

PubMed

Shi, K-Q; Zhou, Y-Y; Yan, H-D; Li, H; Wu, F-L; Xie, Y-Y; Braddock, M; Lin, X-Y; Zheng, M-H

2017-02-01

At present, there is no ideal model for predicting the short-term outcome of patients with acute-on-chronic hepatitis B liver failure (ACHBLF). This study aimed to establish and validate a prognostic model by using the classification and regression tree (CART) analysis. A total of 1047 patients from two separate medical centres with suspected ACHBLF were screened in the study, which were recognized as derivation cohort and validation cohort, respectively. CART analysis was applied to predict the 3-month mortality of patients with ACHBLF. The accuracy of the CART model was tested using the area under the receiver operating characteristic curve, which was compared with the model for end-stage liver disease (MELD) score and a new logistic regression model. CART analysis identified four variables as prognostic factors of ACHBLF: total bilirubin, age, serum sodium and INR, and three distinct risk groups: low risk (4.2%), intermediate risk (30.2%-53.2%) and high risk (81.4%-96.9%). The new logistic regression model was constructed with four independent factors, including age, total bilirubin, serum sodium and prothrombin activity by multivariate logistic regression analysis. The performances of the CART model (0.896), similar to the logistic regression model (0.914, P=.382), exceeded that of MELD score (0.667, P<.001). The results were confirmed in the validation cohort. We have developed and validated a novel CART model superior to MELD for predicting three-month mortality of patients with ACHBLF. Thus, the CART model could facilitate medical decision-making and provide clinicians with a validated practical bedside tool for ACHBLF risk stratification. © 2016 John Wiley & Sons Ltd.
A novel dendrochronological approach reveals drivers of carbon sequestration in tree species of riparian forests across spatiotemporal scales.

PubMed

Rieger, Isaak; Kowarik, Ingo; Cherubini, Paolo; Cierjacks, Arne

2017-01-01

Aboveground carbon (C) sequestration in trees is important in global C dynamics, but reliable techniques for its modeling in highly productive and heterogeneous ecosystems are limited. We applied an extended dendrochronological approach to disentangle the functioning of drivers from the atmosphere (temperature, precipitation), the lithosphere (sedimentation rate), the hydrosphere (groundwater table, river water level fluctuation), the biosphere (tree characteristics), and the anthroposphere (dike construction). Carbon sequestration in aboveground biomass of riparian Quercus robur L. and Fraxinus excelsior L. was modeled (1) over time using boosted regression tree analysis (BRT) on cross-datable trees characterized by equal annual growth ring patterns and (2) across space using a subsequent classification and regression tree analysis (CART) on cross-datable and not cross-datable trees. While C sequestration of cross-datable Q. robur responded to precipitation and temperature, cross-datable F. excelsior also responded to a low Danube river water level. However, CART revealed that C sequestration over time is governed by tree height and parameters that vary over space (magnitude of fluctuation in the groundwater table, vertical distance to mean river water level, and longitudinal distance to upstream end of the study area). Thus, a uniform response to climatic drivers of aboveground C sequestration in Q. robur was only detectable in trees of an intermediate height class and in taller trees (>21.8m) on sites where the groundwater table fluctuated little (≤0.9m). The detection of climatic drivers and the river water level in F. excelsior depended on sites at lower altitudes above the mean river water level (≤2.7m) and along a less dynamic downstream section of the study area. Our approach indicates unexploited opportunities of understanding the interplay of different environmental drivers in aboveground C sequestration. 
Results may support species-specific and locally adapted forest management plans to increase carbon dioxide sequestration from the atmosphere in trees. Copyright © 2016 Elsevier B.V. All rights reserved.
A combined M5P tree and hazard-based duration model for predicting urban freeway traffic accident durations.

PubMed

Lin, Lei; Wang, Qian; Sadek, Adel W

2016-06-01

The duration of freeway traffic accidents is an important factor affecting traffic congestion, environmental pollution, and secondary accidents. Among previous studies, the M5P algorithm has been shown to be an effective tool for predicting incident duration. M5P builds a tree-based model, like the traditional classification and regression tree (CART) method, but with multiple linear regression models as its leaves. The problem with M5P for accident duration prediction, however, is that whereas linear regression assumes that the conditional distribution of accident durations is normal, the distribution for a "time-to-event" outcome is almost certainly nonsymmetrical. A hazard-based duration model (HBDM) is a better choice for this kind of "time-to-event" modeling scenario, and HBDMs have accordingly been applied to analyze and predict traffic accident durations. Previous research, however, has not yet applied HBDMs for accident duration prediction in association with clustering or classification of the dataset to minimize data heterogeneity. The current paper proposes a novel approach for accident duration prediction, which improves on the original M5P tree algorithm through the construction of an M5P-HBDM model, in which the leaves of the M5P tree model are HBDMs instead of linear regression models. Such a model offers the advantage of minimizing data heterogeneity through dataset classification, and avoids the need for the incorrect assumption of normality for traffic accident durations. The proposed model was then tested on two freeway accident datasets. For each dataset, the first 500 records were used to train the following three models: (1) an M5P tree; (2) an HBDM; and (3) the proposed M5P-HBDM, and the remainder of the data were used for testing. The results show that the proposed M5P-HBDM managed to identify more significant and meaningful variables than either M5P or HBDMs.
Moreover, the M5P-HBDM had the lowest overall mean absolute percentage error (MAPE). Copyright © 2016 Elsevier Ltd. All rights reserved.
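The comparison metric named in this abstract, mean absolute percentage error (MAPE), can be illustrated with a short sketch. The function name and the duration values below are hypothetical and are not taken from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the absolute error relative to the observed value.
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical accident durations (minutes) and model forecasts.
durations = [30.0, 45.0, 60.0, 120.0]
forecasts = [33.0, 40.5, 66.0, 108.0]
score = mape(durations, forecasts)
print(f"MAPE = {score:.1f}%")
```

Every forecast here is off by 10% of the observed duration, so the score is 10%. Because each error is divided by the observed value, MAPE is undefined when an observed duration is zero.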
Tree allometry and improved estimation of carbon stocks and balance in tropical forests.

PubMed

Chave, J; Andalo, C; Brown, S; Cairns, M A; Chambers, J Q; Eamus, D; Fölster, H; Fromard, F; Higuchi, N; Kira, T; Lescure, J-P; Nelson, B W; Ogawa, H; Puig, H; Riéra, B; Yamakura, T

2005-08-01

Tropical forests hold large stores of carbon, yet uncertainty remains regarding their quantitative contribution to the global carbon cycle. One approach to quantifying carbon biomass stores consists in inferring changes from long-term forest inventory plots. Regression models are used to convert inventory data into an estimate of aboveground biomass (AGB). We provide a critical reassessment of the quality and the robustness of these models across tropical forest types, using a large dataset of 2,410 trees ≥ 5 cm diameter, directly harvested in 27 study sites across the tropics. Proportional relationships between aboveground biomass and the product of wood density, trunk cross-sectional area, and total height are constructed. We also develop a regression model involving wood density and stem diameter only. Our models were tested for secondary and old-growth forests, for dry, moist and wet forests, for lowland and montane forests, and for mangrove forests. The most important predictors of AGB of a tree were, in decreasing order of importance, its trunk diameter, wood specific gravity, total height, and forest type (dry, moist, or wet). Overestimates prevailed, giving a bias of 0.5-6.5% when errors were averaged across all stands. Our regression models can be used reliably to predict aboveground tree biomass across a broad range of tropical forests. Because they are based on an unprecedented dataset, these models should improve the quality of tropical biomass estimates, and bring consensus about the contribution of the tropical forest biome and tropical deforestation to the global carbon cycle.
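The proportional relationship described in this abstract (AGB proportional to the product of wood density, trunk cross-sectional area, and total height) can be sketched as a least-squares fit through the origin. This is only an illustration: the tree measurements, the factor 0.06, and the function names are hypothetical, and the authors' actual fitting procedure is not described in the abstract.

```python
import math


def compound_predictor(wood_density, diameter, height):
    """Wood density times trunk cross-sectional area times total height."""
    area = math.pi * (diameter / 2.0) ** 2
    return wood_density * area * height


def fit_proportionality(xs, ys):
    """Least-squares slope of y = F * x with no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Hypothetical trees: (wood density, diameter, height).
trees = [(0.6, 10.0, 12.0), (0.5, 25.0, 20.0), (0.7, 40.0, 28.0)]
true_factor = 0.06  # made-up constant used to synthesize biomass values
xs = [compound_predictor(*t) for t in trees]
ys = [true_factor * x for x in xs]

factor = fit_proportionality(xs, ys)
print(f"estimated proportionality factor: {factor:.4f}")
```

Because the synthetic biomass values are exactly proportional to the predictor, the fit recovers the made-up factor; with real harvest data the residuals would show how well the proportional form holds.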
Newer classification and regression tree techniques: Bagging and Random Forests for ecological prediction

Treesearch

Anantha M. Prasad; Louis R. Iverson; Andy Liaw

2006-01-01

We evaluated four statistical models - Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS) - for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model.
The process and utility of classification and regression tree methodology in nursing research

PubMed Central

Kuhn, Lisa; Page, Karen; Ward, John; Worrall-Carter, Linda

2014-01-01

Aim: This paper presents a discussion of classification and regression tree analysis and its utility in nursing research. Background: Classification and regression tree analysis is an exploratory research method used to illustrate associations between variables not suited to traditional regression analysis. Complex interactions are demonstrated between covariates and variables of interest in inverted tree diagrams. Design: Discussion paper. Data sources: English language literature was sourced from eBooks, Medline Complete and CINAHL Plus databases, Google and Google Scholar, hard copy research texts and retrieved reference lists for terms including classification and regression tree* and derivatives and recursive partitioning from 1984–2013. Discussion: Classification and regression tree analysis is an important method used to identify previously unknown patterns amongst data. Whilst there are several reasons to embrace this method as a means of exploratory quantitative research, issues regarding quality of data as well as the usefulness and validity of the findings should be considered. Implications for Nursing Research: Classification and regression tree analysis is a valuable tool to guide nurses to reduce gaps in the application of evidence to practice. With the ever-expanding availability of data, it is important that nurses understand the utility and limitations of the research method. Conclusion: Classification and regression tree analysis is an easily interpreted method for modelling interactions between health-related variables that would otherwise remain obscured. Knowledge is presented graphically, providing insightful understanding of complex and hierarchical relationships in an accessible and useful way to nursing and other health professions. PMID:24237048
The process and utility of classification and regression tree methodology in nursing research.

PubMed

Kuhn, Lisa; Page, Karen; Ward, John; Worrall-Carter, Linda

2014-06-01

This paper presents a discussion of classification and regression tree analysis and its utility in nursing research. Classification and regression tree analysis is an exploratory research method used to illustrate associations between variables not suited to traditional regression analysis. Complex interactions are demonstrated between covariates and variables of interest in inverted tree diagrams. Discussion paper. English language literature was sourced from eBooks, Medline Complete and CINAHL Plus databases, Google and Google Scholar, hard copy research texts and retrieved reference lists for terms including classification and regression tree* and derivatives and recursive partitioning from 1984-2013. Classification and regression tree analysis is an important method used to identify previously unknown patterns amongst data. Whilst there are several reasons to embrace this method as a means of exploratory quantitative research, issues regarding quality of data as well as the usefulness and validity of the findings should be considered. Classification and regression tree analysis is a valuable tool to guide nurses to reduce gaps in the application of evidence to practice. With the ever-expanding availability of data, it is important that nurses understand the utility and limitations of the research method. Classification and regression tree analysis is an easily interpreted method for modelling interactions between health-related variables that would otherwise remain obscured. Knowledge is presented graphically, providing insightful understanding of complex and hierarchical relationships in an accessible and useful way to nursing and other health professions. © 2013 The Authors. Journal of Advanced Nursing Published by John Wiley & Sons Ltd.
Regression trees for predicting mortality in patients with cardiovascular disease: What improvement is achieved by using ensemble-based methods?

PubMed Central

Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V

2012-01-01

In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999
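Bootstrap aggregation (bagging) of regression trees, one of the ensemble methods evaluated here, can be sketched with one-split trees (stumps) on a single covariate. The patient ages, outcomes, tree count, and function names are hypothetical; the study itself used full regression trees on many covariates.

```python
import random


def fit_stump(xs, ys):
    """One-split regression tree: pick the threshold with the lowest squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # every x in the sample is identical: predict the mean
        mean = sum(ys) / len(ys)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Fit one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_bagged(stumps, x):
    """Average the predictions of all bootstrap stumps."""
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)


# Hypothetical data: patient age and 30-day death indicator.
ages = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
died = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
stumps = bagged_stumps(ages, died)
low_risk = predict_bagged(stumps, 42)
high_risk = predict_bagged(stumps, 83)
print(low_risk, high_risk)
```

A single stump gives a step function with one jump; averaging stumps fitted to different resamples smooths that step, which is the intuition behind the accuracy gain of ensembles over a single tree.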
Boosted Regression Trees Outperforms Support Vector Machines in Predicting (Regional) Yields of Winter Wheat from Single and Cumulated Dekadal Spot-VGT Derived Normalized Difference Vegetation Indices

NASA Astrophysics Data System (ADS)

Stas, Michiel; Dong, Qinghan; Heremans, Stien; Zhang, Beier; Van Orshoven, Jos

2016-08-01

This paper compares two machine learning techniques to predict regional winter wheat yields. The models, based on Boosted Regression Trees (BRT) and Support Vector Machines (SVM), are constructed of Normalized Difference Vegetation Indices (NDVI) derived from low resolution SPOT VEGETATION satellite imagery. Three types of NDVI-related predictors were used: Single NDVI, Incremental NDVI and Targeted NDVI. BRT and SVM were first used to select features with high relevance for predicting the yield. Although the exact selections differed between the prefectures, certain periods with high influence scores for multiple prefectures could be identified. The same period of high influence stretching from March to June was detected by both machine learning methods. After feature selection, BRT and SVM models were applied to the subset of selected features for actual yield forecasting. Whereas both machine learning methods returned very low prediction errors, BRT seems to slightly but consistently outperform SVM.
Personalized Modeling for Prediction with Decision-Path Models

PubMed Central

Visweswaran, Shyam; Ferreira, Antonio; Ribeiro, Guilherme A.; Oliveira, Alexandre C.; Cooper, Gregory F.

2015-01-01

Deriving predictive models in medicine typically relies on a population approach where a single model is developed from a dataset of individuals. In this paper we describe and evaluate a personalized approach in which we construct a new type of decision tree model called decision-path model that takes advantage of the particular features of a given person of interest. We introduce three personalized methods that derive personalized decision-path models. We compared the performance of these methods to that of Classification And Regression Tree (CART) that is a population decision tree to predict seven different outcomes in five medical datasets. Two of the three personalized methods performed statistically significantly better on area under the ROC curve (AUC) and Brier skill score compared to CART. The personalized approach of learning decision path models is a new approach for predictive modeling that can perform better than a population approach. PMID:26098570
Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project.

PubMed

Alghamdi, Manal; Al-Mallah, Mouaz; Keteyian, Steven; Brawner, Clinton; Ehrman, Jonathan; Sakr, Sherif

2017-01-01

Machine learning is becoming a popular and important approach in the field of medical research. In this study, we investigate the relative performance of various machine learning methods such as Decision Tree, Naïve Bayes, Logistic Regression, Logistic Model Tree and Random Forests for predicting incident diabetes using medical records of cardiorespiratory fitness. In addition, we apply different techniques to uncover potential predictors of diabetes. This FIT project study used data of 32,555 patients who are free of any known coronary artery disease or heart failure who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 5-year follow-up. At the completion of the fifth year, 5,099 of those patients have developed diabetes. The dataset contained 62 attributes classified into four categories: demographic characteristics, disease history, medication use history, and stress test vital signs. We developed an Ensembling-based predictive model using 13 attributes that were selected based on their clinical importance, Multiple Linear Regression, and Information Gain Ranking methods. The negative effect of the imbalance class of the constructed model was handled by Synthetic Minority Oversampling Technique (SMOTE). The overall performance of the predictive model classifier was improved by the Ensemble machine learning approach using the Vote method with three Decision Trees (Naïve Bayes Tree, Random Forest, and Logistic Model Tree) and achieved high accuracy of prediction (AUC = 0.92). The study shows the potential of ensembling and SMOTE approaches for predicting incident diabetes using cardiorespiratory fitness data.
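The core idea of SMOTE, generating synthetic minority-class records by interpolating between a minority record and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration, not the implementation used in the study; the feature values, `k`, and the function name are hypothetical.

```python
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Create synthetic points on segments between minority points and their neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (squared Euclidean distance).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, mate)))
    return synthetic


# Hypothetical minority-class records with two numeric features.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
new_points = smote(minority, n_synthetic=6)
for point in new_points:
    print(point)
```

Each synthetic record lies between two real minority records, so oversampling adds variety rather than exact duplicates. Synthetic records should be added to the training data only, never to the evaluation data.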
Applying data mining techniques to explore factors contributing to occupational injuries in Taiwan's construction industry.

PubMed

Cheng, Ching-Wu; Leu, Sou-Sen; Cheng, Ying-Mei; Wu, Tsung-Chih; Lin, Chen-Chung

2012-09-01

Construction accident research involves the systematic sorting, classification, and encoding of comprehensive databases of injuries and fatalities. The present study explores the causes and distribution of occupational accidents in the Taiwan construction industry by analyzing such a database using the data mining method known as classification and regression tree (CART). Utilizing a database of 1542 accident cases during the period 2000-2009, the study seeks to establish potential cause-and-effect relationships regarding serious occupational accidents in the industry. The results of this study show that the occurrence rules for falls and collapses in both public and private project construction industries serve as key factors to predict the occurrence of occupational injuries. The results of the study provide a framework for improving the safety practices and training programs that are essential to protecting construction workers from occasional or unexpected accidents. Copyright © 2011 Elsevier Ltd. All rights reserved.
Undergraduate Students’ Difficulties in Reading and Constructing Phylogenetic Tree

NASA Astrophysics Data System (ADS)

Sa'adah, S.; Tapilouw, F. S.; Hidayat, T.

2017-02-01

Representation is a very important tool for communicating scientific concepts. Biologists produce phylogenetic representations to express their understanding of evolutionary relationships. The phylogenetic tree is a visual representation that depicts a hypothesis about evolutionary relationships and is widely used in the biological sciences. Phylogenetic trees are increasingly used across many disciplines in biology. Consequently, learning about phylogenetic trees has become an important part of biological education and an interesting area for biology education research. However, research has shown that many students struggle to interpret the information that phylogenetic trees depict. The purpose of this study was to investigate undergraduate students’ difficulties in reading and constructing a phylogenetic tree. The study used a descriptive method. We used questionnaires, interviews, multiple-choice and open-ended questions, reflective journals and observations. The findings showed that students experienced difficulties, especially in constructing a phylogenetic tree. The students’ responses indicated that the main reason for difficulty in constructing a phylogenetic tree is the difficulty of placing taxa in the tree based on the data provided, so that the constructed tree does not describe the actual evolutionary relationships (incorrect relatedness). Students also have difficulty determining the sister group, character synapomorphy, and autapomorphy from the data provided (character table), and comparing phylogenetic trees. According to the students, building a phylogenetic tree is more difficult than reading one. The findings of this study provide information for undergraduate instructors and students to help overcome learning difficulties in reading and constructing phylogenetic trees.
Boosted regression tree, table, and figure data

EPA Pesticide Factsheets

Spreadsheets are included here to support the manuscript Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition. This dataset is associated with the following publication: Golden, H., C. Lane, A. Prues, and E. D'Amico. Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition. JAWRA. American Water Resources Association, Middleburg, VA, USA, 52(5): 1251-1274, (2016).
A review of logistic regression models used to predict post-fire tree mortality of western North American conifers

Treesearch

Travis Woolley; David C. Shaw; Lisa M. Ganio; Stephen Fitzgerald

2012-01-01

Logistic regression models used to predict tree mortality are critical to post-fire management, planning prescribed burns and understanding disturbance ecology. We review literature concerning post-fire mortality prediction using logistic regression models for coniferous tree species in the western USA. We include synthesis and review of: methods to develop, evaluate...
The effect of using genealogy-based haplotypes for genomic prediction

PubMed Central

2013-01-01

Background: Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. Methods: A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method. Results: About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individual markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers. Conclusions: Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy. PMID:23496971
The effect of using genealogy-based haplotypes for genomic prediction.

PubMed

Edriss, Vahid; Fernando, Rohan L; Su, Guosheng; Lund, Mogens S; Guldbrandtsen, Bernt

2013-03-06

Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method. About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individual markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers. Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy.
Using traveling salesman problem algorithms for evolutionary tree construction.

PubMed

Korostensky, C; Gonnet, G H

2000-07-01

The construction of evolutionary trees is one of the major problems in computational biology, mainly due to its complexity. We present a new tree construction method that constructs a tree with minimum score for a given set of sequences, where the score is the amount of evolution measured in PAM distances. To do this, the problem of tree construction is reduced to the Traveling Salesman Problem (TSP). The input for the TSP algorithm are the pairwise distances of the sequences and the output is a circular tour through the optimal, unknown tree plus the minimum score of the tree. The circular order and the score can be used to construct the topology of the optimal tree. Our method can be used for any scoring function that correlates to the amount of changes along the branches of an evolutionary tree, for instance it could also be used for parsimony scores, but it cannot be used for least squares fit of distances. A TSP solution reduces the space of all possible trees to 2n. Using this order, we can guarantee that we reconstruct a correct evolutionary tree if the absolute value of the error for each distance measurement is smaller than [formula], where [symbol] is the length of the shortest edge in the tree. For data sets with large errors, a dynamic programming approach is used to reconstruct the tree. Finally simulations and experiments with real data are shown.
New star on the stage: amount of ray parenchyma in tree rings shows a link to climate.

PubMed

Olano, José Miguel; Arzac, Alberto; García-Cervigón, Ana I; von Arx, Georg; Rozas, Vicente

2013-04-01

Tree-ring anatomy reflects the year-by-year impact of environmental factors on tree growth. Up to now, research in this field has mainly focused on the hydraulic architecture, with ray parenchyma neglected despite the growing recognition of its relevance for xylem function. Our aim was to address this gap by exploring the potential of the annual patterns of xylem parenchyma as a climate proxy. We constructed ring-width and ray-parenchyma chronologies from 1965 to 2004 for 20 Juniperus thurifera trees growing in a Mediterranean continental climate. Chronologies were related to climate records by means of correlation, multiple regression and partial correlation analyses. Ray parenchyma responded to climatic conditions at critical stages during the xylogenetic process; namely, at the end of the previous year's xylogenesis (October) and at the onset of earlywood (May) and latewood formation (August). Ray parenchyma-based chronologies have potential to complement ring-width chronologies as a tool for climate reconstructions. Furthermore, medium- and low-frequency signals in the variation of ray parenchyma may improve our understanding of how trees respond to environmental fluctuations and to global change. © 2013 The Authors. New Phytologist © 2013 New Phytologist Trust.

A spatially explicit approach to the study of socio-demographic inequality in the spatial distribution of trees across Boston neighborhoods.

PubMed

Duncan, Dustin T; Kawachi, Ichiro; Kum, Susan; Aldstadt, Jared; Piras, Gianfranco; Matthews, Stephen A; Arbia, Giuseppe; Castro, Marcia C; White, Kellee; Williams, David R

2014-04-01

The racial/ethnic and income composition of neighborhoods often influences local amenities, including the potential spatial distribution of trees, which are important for population health and community wellbeing, particularly in urban areas. This ecological study used spatial analytical methods to assess the relationship between neighborhood socio-demographic characteristics (i.e. minority racial/ethnic composition and poverty) and tree density at the census tract level in Boston, Massachusetts (US). We examined spatial autocorrelation with the Global Moran's I for all study variables and in the ordinary least squares (OLS) regression residuals as well as computed Spearman correlations non-adjusted and adjusted for spatial autocorrelation between socio-demographic characteristics and tree density. Next, we fit traditional regressions (i.e. OLS regression models) and spatial regressions (i.e. spatial simultaneous autoregressive models), as appropriate. We found significant positive spatial autocorrelation for all neighborhood socio-demographic characteristics (Global Moran's I range from 0.24 to 0.86, all P = 0.001), for tree density (Global Moran's I = 0.452, P = 0.001), and in the OLS regression residuals (Global Moran's I range from 0.32 to 0.38, all P < 0.001). Therefore, we fit the spatial simultaneous autoregressive models. There was a negative correlation between neighborhood percent non-Hispanic Black and tree density (r_S = -0.19; conventional P-value = 0.016; spatially adjusted P-value = 0.299) as well as a negative correlation between predominantly non-Hispanic Black (over 60% Black) neighborhoods and tree density (r_S = -0.18; conventional P-value = 0.019; spatially adjusted P-value = 0.180). While the conventional OLS regression model found a marginally significant inverse relationship between Black neighborhoods and tree density, we found no statistically significant relationship between neighborhood socio-demographic composition and tree density in the spatial regression models. Methodologically, our study suggests the need to take into account spatial autocorrelation as findings/conclusions can change when the spatial autocorrelation is ignored. Substantively, our findings suggest no need for policy intervention vis-à-vis trees in Boston, though we hasten to add that replication studies, and more nuanced data on tree quality, age and diversity are needed.
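The Global Moran's I statistic used in this study can be computed from a vector of values and a spatial weights matrix. The sketch below assumes a simple binary adjacency matrix for four hypothetical tracts arranged in a row; the study's actual weights specification is not given in the abstract.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and spatial weights w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted sum of cross-products of deviations between neighbouring units.
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / total_weight) * cross / sum(d * d for d in dev)


# Four hypothetical tracts in a row; 1 marks tracts that share a border.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1.0, 2.0, 3.0, 4.0], adjacency)
alternating = morans_i([1.0, 4.0, 1.0, 4.0], adjacency)
print(clustered, alternating)
```

A smooth gradient gives a positive value (similar values sit next to each other), while the alternating pattern gives a negative value. Significance testing, for example by permutation, is not shown here.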
Predicting Potential Changes in Suitable Habitat and Distribution by 2100 for Tree Species of the Eastern United States

Treesearch

Louis R Iverson; Anantha M. Prasad; Mark W. Schwartz

2005-01-01

We predict current distribution and abundance for tree species present in eastern North America, and subsequently estimate potential suitable habitat for those species under a changed climate with 2 x CO2. We used a series of statistical models (i.e., Regression Tree Analysis (RTA), Multivariate Adaptive Regression Splines (MARS), Bagging Trees (...
Association between split selection instability and predictive error in survival trees.

PubMed

Radespiel-Tröger, M; Gefeller, O; Rabenstein, T; Hothorn, T

2006-01-01

To evaluate split selection instability in six survival tree algorithms and its relationship with predictive error by means of a bootstrap study. We study the following algorithms: logrank statistic with multivariate p-value adjustment without pruning (LR), Kaplan-Meier distance of survival curves (KM), martingale residuals (MR), Poisson regression for censored data (PR), within-node impurity (WI), and exponential log-likelihood loss (XL). With the exception of LR, initial trees are pruned by using split-complexity, and final trees are selected by means of cross-validation. We employ a real dataset from a clinical study of patients with gallbladder stones. The predictive error is evaluated using the integrated Brier score for censored data. The relationship between split selection instability and predictive error is evaluated by means of box-percentile plots, covariate and cutpoint selection entropy, and cutpoint selection coefficients of variation, respectively, in the root node. We found a positive association between covariate selection instability and predictive error in the root node. LR yields the lowest predictive error, while KM and MR yield the highest predictive error. The predictive error of survival trees is related to split selection instability. Based on the low predictive error of LR, we recommend the use of this algorithm for the construction of survival trees. Unpruned survival trees with multivariate p-value adjustment can perform equally well compared to pruned trees. The analysis of split selection instability can be used to communicate the results of tree-based analyses to clinicians and to support the application of survival trees.
Estimating parameters for tree basal area growth with a system of equations and seemingly unrelated regressions

Treesearch

Charles E. Rose; Thomas B. Lynch

2001-01-01

A method was developed for estimating parameters in an individual tree basal area growth model using a system of equations based on dbh rank classes. The estimation method developed is a compromise between an individual tree and a stand level basal area growth model that accounts for the correlation between trees within a plot by using seemingly unrelated regression (...
Using ROC curves to compare neural networks and logistic regression for modeling individual noncatastrophic tree mortality

Treesearch

Susan L. King

2003-01-01

The performance of two classifiers, logistic regression and neural networks, is compared for modeling noncatastrophic individual tree mortality for 21 species of trees in West Virginia. The output of the classifier is usually a continuous number between 0 and 1. A threshold is selected between 0 and 1 and all of the trees below the threshold are classified as...
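Thresholding a continuous classifier output and summarizing performance across thresholds with a ROC curve can be sketched as below. The source text is cut off before it says which class falls below the threshold, so the sketch assumes that scores at or above the threshold are labeled positive; the scores, labels, and function names are hypothetical.

```python
def roc_point(scores, labels, threshold):
    """Return (false positive rate, true positive rate) at one threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    positives = sum(labels)
    negatives = len(labels) - positives
    return fp / negatives, tp / positives


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs and observed outcomes (1 = positive class).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
fpr, tpr = roc_point(scores, labels, threshold=0.5)
area = auc(scores, labels)
print(fpr, tpr, area)
```

Sweeping the threshold from 1 down to 0 traces the full ROC curve; the area under it does not depend on any single threshold, which makes it convenient for comparing two classifiers.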
Modeling the prediction of business intelligence system effectiveness.

PubMed

Weng, Sung-Shun; Yang, Ming-Hsien; Koo, Tian-Lih; Hsiao, Pei-I

2016-01-01

Although business intelligence (BI) technologies are continually evolving, the capability to apply BI technologies has become an indispensable resource for enterprises running in today's complex, uncertain and dynamic business environment. This study performed pioneering work by constructing models and rules for the prediction of business intelligence system effectiveness (BISE) in relation to the implementation of BI solutions. For enterprises, effectively managing critical attributes that determine BISE to develop prediction models with a set of rules for self-evaluation of the effectiveness of BI solutions is necessary to improve BI implementation and ensure its success. The main study findings identified the critical prediction indicators of BISE that are important to forecasting BI performance and highlighted five classification and prediction rules of BISE derived from decision tree structures, as well as a refined regression prediction model with four critical prediction indicators constructed by logistic regression analysis that can enable enterprises to improve BISE while effectively managing BI solution implementation and catering to academics to whom theory is important.
Relationship between leaf functional traits and productivity in Aquilaria crassna (Thymelaeaceae) plantations: a tool to aid in the early selection of high-yielding trees.

PubMed

López-Sampson, Arlene; Cernusak, Lucas A; Page, Tony

2017-05-01

Physiological traits are frequently used as indicators of tree productivity. Aquilaria species growing in a research planting were studied to investigate relationships between leaf-productivity traits and tree growth. Twenty-eight trees were selected to measure isotopic composition of carbon (δ¹³C) and nitrogen (δ¹⁵N) and monitor six leaf attributes. Trees were sampled randomly within each of four diametric classes (at 150 mm above ground level) ensuring the variability in growth of the whole population was represented. A model averaging technique based on the Akaike's information criterion was computed to identify whether leaf traits could assist in diameter prediction. Regression analysis was performed to test for relationships between carbon isotope values and diameter and leaf traits. Approximately one new leaf per week was produced by a shoot. The rate of leaf expansion was estimated as 1.45 mm day⁻¹. The range of δ¹³C values in leaves of Aquilaria species was from -25.5‰ to -31‰, with an average of -28.4‰ (±1.5‰ SD). A moderate negative correlation (R² = 0.357) between diameter and δ¹³C in leaf dry matter indicated that individuals with high intercellular CO2 concentrations (low δ¹³C) and associated low water-use efficiency sustained rapid growth. Analysis of the 95% confidence of best-ranked regression models indicated that the predictors that could best explain growth in Aquilaria species were δ¹³C, δ¹⁵N, petiole length, number of new leaves produced per week and specific leaf area. The model constructed with these variables explained 55% (R² = 0.55) of the variability in stem diameter. This demonstrates that leaf traits can assist in the early selection of high-productivity trees in Aquilaria species. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
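Model averaging based on Akaike's information criterion typically relies on Akaike weights, which convert AIC differences into relative support for each candidate model. The sketch below shows that standard calculation; the AIC values are hypothetical and the abstract does not state the exact averaging procedure the authors used.

```python
import math


def akaike_weights(aic_values):
    """Relative support for each model: exp(-delta/2), normalized to sum to 1."""
    best = min(aic_values)
    raw = [math.exp(-(aic - best) / 2.0) for aic in aic_values]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical AIC scores for three candidate diameter models.
aics = [100.0, 102.0, 110.0]
weights = akaike_weights(aics)
for aic, w in zip(aics, weights):
    print(f"AIC = {aic:.1f}  weight = {w:.3f}")
```

A model-averaged prediction is then the weighted sum of each model's prediction, so models with much higher AIC contribute almost nothing.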
[Introduction and advantage analysis of the stepwise method for the construction of vascular trees].

PubMed

Zhang, Yan; Xie, Haiwei; Zhu, Kai

2010-08-01

This paper proposes a new method for constructing models of vascular trees. The method grows arterial trees that agree well with the actual structure. All vessels in the vascular tree are divided into two groups, conveying vessels and delivering branches, and each group is built in a different way. First, the distribution rules of the conveying vessels are determined from measurement data, and the conveying vessels are then constructed according to these statistical rules and an optimization criterion. Finally, the delivering branches are modeled by constrained constructive optimization (CCO) on the conveying-vessel trees already generated. To compare the CCO method with the stepwise method proposed here, two 3D arterial trees of the human tongue, whose vascular tree has a special structure, were grown. Using corrosion casts of real arterial trees of the human tongue as reference, the two trees constructed by the different methods were compared and analyzed in terms of the averaged segment diameters at each level and the distribution and diameters of the first-level branches in each direction. The results show that the vascular tree built by the stepwise method is more similar to the true arterial tree of the human tongue than the tree built by the CCO method.
[Satellite remote sensing retrieval of canopy nitrogen nutritional status of apple trees at blossom stage].

PubMed

Wang, Ling; Zhao, Geng-Xing; Zhu, Xi-Cun; Wang, Rui-Yan; Chang, Chun-Yan

2013-10-01

Taking Qixia City of Shandong, China as the study area, and based on Landsat-5 TM and ALOS AVNIR-2 images, the canopy retrieval reflectance of apple trees at blossom stage was acquired. In combination with the measured reflectance of sample trees, nitrogen-sensitive spectral indices were constructed and selected. Using the sensitive spectral indices as the independent variables, nitrogen retrieval models were established, and the model with the best accuracy was used for spatial retrieval. The correlations between the spectral indices and the nitrogen nutritional status were in the order of canopy > leaf > flower. The sensitive indices were mainly composed of green, red, and near infrared bands. The accuracy of the retrieval models was in the order of support vector regression > multi-variable stepwise regression > one-variable regression. The retrieval results based on different images were similar, and showed that the leaf nitrogen content was mainly of grades 3-4 (27-33 g·kg⁻¹), and the canopy nitrogen nutrient indices were mainly of grades 2-4 (TM: 38-47 g·kg⁻¹; ALOS: 32-41 g·kg⁻¹). The spatial distribution of the retrieved nitrogen nutritional status based on different images also showed a similar trend, i.e., the nitrogen nutritional status was higher in the north and south than in the middle part of the study area, and the areas with the high grades of leaf nitrogen and canopy nitrogen were mainly located in Sujiadian Town and Songshan subdistrict in the northwest, Zangjiazhuang Town and Tingkou Town in the northeast, and Shewopo Town in the south, which was consistent with the distribution of the key towns for apple production in Qixia City. This study provides a feasible method for acquiring the nitrogen nutritional status of apple trees on a macroscopic scale, and also provides a reference for other similar remote sensing retrievals.
A spatially explicit approach to the study of socio-demographic inequality in the spatial distribution of trees across Boston neighborhoods

PubMed Central

Duncan, Dustin T.; Kawachi, Ichiro; Kum, Susan; Aldstadt, Jared; Piras, Gianfranco; Matthews, Stephen A.; Arbia, Giuseppe; Castro, Marcia C.; White, Kellee; Williams, David R.

2017-01-01

The racial/ethnic and income composition of neighborhoods often influences local amenities, including the potential spatial distribution of trees, which are important for population health and community wellbeing, particularly in urban areas. This ecological study used spatial analytical methods to assess the relationship between neighborhood socio-demographic characteristics (i.e. minority racial/ethnic composition and poverty) and tree density at the census tract level in Boston, Massachusetts (US). We examined spatial autocorrelation with the Global Moran’s I for all study variables and in the ordinary least squares (OLS) regression residuals, and computed Spearman correlations, non-adjusted and adjusted for spatial autocorrelation, between socio-demographic characteristics and tree density. Next, we fit traditional regressions (i.e. OLS regression models) and spatial regressions (i.e. spatial simultaneous autoregressive models), as appropriate. We found significant positive spatial autocorrelation for all neighborhood socio-demographic characteristics (Global Moran’s I range from 0.24 to 0.86, all P=0.001), for tree density (Global Moran’s I=0.452, P=0.001), and in the OLS regression residuals (Global Moran’s I range from 0.32 to 0.38, all P<0.001). Therefore, we fit the spatial simultaneous autoregressive models. There was a negative correlation between neighborhood percent non-Hispanic Black and tree density (rS=−0.19; conventional P-value=0.016; spatially adjusted P-value=0.299) as well as a negative correlation between predominantly non-Hispanic Black (over 60% Black) neighborhoods and tree density (rS=−0.18; conventional P-value=0.019; spatially adjusted P-value=0.180).
While the conventional OLS regression model found a marginally significant inverse relationship between Black neighborhoods and tree density, we found no statistically significant relationship between neighborhood socio-demographic composition and tree density in the spatial regression models. Methodologically, our study suggests the need to take into account spatial autocorrelation as findings/conclusions can change when the spatial autocorrelation is ignored. Substantively, our findings suggest no need for policy intervention vis-à-vis trees in Boston, though we hasten to add that replication studies, and more nuanced data on tree quality, age and diversity are needed. PMID:29354668
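The Global Moran's I statistic used in this study can be computed directly from its definition. The sketch below is illustrative only: the values and the simple chain-shaped neighbor matrix are made up, the function names are hypothetical, and the study's actual spatial weights for census tracts are not described in the abstract.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Weighted sum of cross-products of deviations between neighboring units.
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    denom = sum(d * d for d in dev)
    return (n / w_sum) * (cross / denom)


def chain_weights(n):
    """Binary weights for n units in a row, where each unit neighbors the next (toy layout)."""
    w = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1
        w[i + 1][i] = 1
    return w


clustered = morans_i([1, 2, 3, 4], chain_weights(4))      # similar values are adjacent
alternating = morans_i([1, -1, 1, -1], chain_weights(4))  # dissimilar values are adjacent
```

Positive values indicate that neighboring units tend to be similar, which is the situation the abstract reports for both the socio-demographic variables and the OLS residuals.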
A stepwise regression tree for nonlinear approximation: applications to estimating subpixel land cover

USGS Publications Warehouse

Huang, C.; Townshend, J.R.G.

2003-01-01

A stepwise regression tree (SRT) algorithm was developed for approximating complex nonlinear relationships. Based on the regression tree of Breiman et al. (BRT) and a stepwise linear regression (SLR) method, this algorithm represents an improvement over SLR in that it can approximate nonlinear relationships and over BRT in that it gives more realistic predictions. The applicability of this method to estimating subpixel forest was demonstrated using three test data sets, on all of which it gave more accurate predictions than SLR and BRT. SRT also generated more compact trees and performed better than or at least as well as BRT at all 10 equal forest proportion intervals ranging from 0 to 100%. This method is appealing for estimating subpixel land cover over large areas.
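For context, the piecewise-constant regression tree that SRT builds on chooses each split to minimize the summed squared error of the two child nodes. The sketch below shows that single split search on made-up data; it is not the SRT algorithm itself, which according to the abstract combines the tree with stepwise linear regression, and the function names are hypothetical.

```python
def sse(ys):
    """Sum of squared errors of ys around their mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(xs, ys):
    """Return (threshold, total_sse) of the best single split on one predictor.

    Returns None when all x values are identical and no split is possible.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between identical predictor values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        total = sse(left) + sse(right)
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


# Toy data with an obvious step between x = 3 and x = 10.
split = best_split([1, 2, 3, 10, 11, 12], [1, 1, 1, 5, 5, 5])
```

Applying the same search recursively to each child node yields a full tree with constant predictions in the leaves.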
Comparing Methodologies for Developing an Early Warning System: Classification and Regression Tree Model versus Logistic Regression. REL 2015-077

ERIC Educational Resources Information Center

Koon, Sharon; Petscher, Yaacov

2015-01-01

The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…
Propensity score estimation: neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression.

PubMed

Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson

2010-08-01

Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this review was to assess machine learning alternatives to logistic regression, which may accomplish the same goals but with fewer assumptions or greater accuracy. We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (classification and regression trees [CART]), and meta-classifiers (in particular, boosting). Although the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and, to a lesser extent, decision trees (particularly CART), appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. Copyright (c) 2010 Elsevier Inc. All rights reserved.
Tree-ring growth of Scots pine, Common beech and Pedunculate oak under future climate in northeastern Germany

NASA Astrophysics Data System (ADS)

Jurasinski, Gerald; Scharnweber, Tobias; Schröder, Christian; Lennartz, Bernd; Bauwe, Andreas

2017-04-01

Tree growth depends, among other factors, largely on the prevailing climatic conditions. Therefore, changes in tree growth patterns are to be expected under climate change. Here, we analyze the tree-ring growth response of three major European tree species to projected future climate across a climatic (mostly precipitation) gradient in northeastern Germany. We used monthly data for temperature, precipitation, and the standardized precipitation evapotranspiration index (SPEI) over multiple time scales (1, 3, 6, 12, and 24 months) to construct models of tree-ring growth for Scots pine (Pinus sylvestris L.) at three pure stands, and for Common beech (Fagus sylvatica L.) and Pedunculate oak (Quercus robur L.) at three mature mixed stands. The regression models were derived using a two-step approach based on partial least squares regression (PLSR) to extract potentially well explaining variables followed by ordinary least squares regression (OLSR) to consolidate the models to the least number of variables while retaining high explanatory power. The stability of the models was tested with a comprehensive calibration-verification scheme. All models were successfully verified with R² values ranging from 0.21 for the western pine stand to 0.62 for the beech stand in the east. For growth prediction, climate data forecasted until 2100 by the regional climate model WETTREG2010 based on the A1B Intergovernmental Panel on Climate Change (IPCC) emission scenario was used. For beech and oak, growth rates will likely decrease until the end of the 21st century. For pine, modeled growth trends vary and range from a slight growth increase to a weak decrease in growth rates depending on the position along the climatic gradient. The climatic gradient across the study area will possibly affect the future growth of oak with larger growth reductions towards the drier east. For beech, site-specific adaptations seem to override the influence of the climatic gradient.
We conclude that in northeastern Germany Scots pine has great potential to remain resilient to projected climate change without any greater impairment, whereas Common beech and Pedunculate oak will likely face reduced growth under the expected warmer and drier climate conditions. The results call for an adaptation of forest management to mitigate the negative effects of climate change for beech and oak in the region.
Comparative study of biodegradability prediction of chemicals using decision trees, functional trees, and logistic regression.

PubMed

Chen, Guangchao; Li, Xuehua; Chen, Jingwen; Zhang, Ya-Nan; Peijnenburg, Willie J G M

2014-12-01

Biodegradation is the principal environmental dissipation process of chemicals. As such, it is a dominant factor determining the persistence and fate of organic chemicals in the environment, and is therefore of critical importance to chemical management and regulation. In the present study, the authors developed in silico methods assessing biodegradability based on a large heterogeneous set of 825 organic compounds, using the techniques of the C4.5 decision tree, the functional inner regression tree, and logistic regression. External validation was subsequently carried out by 2 independent test sets of 777 and 27 chemicals. As a result, the functional inner regression tree exhibited the best predictability with predictive accuracies of 81.5% and 81.0%, respectively, on the training set (825 chemicals) and test set I (777 chemicals). Performance of the developed models on the 2 test sets was subsequently compared with that of the Estimation Program Interface (EPI) Suite Biowin 5 and Biowin 6 models, which also showed a better predictability of the functional inner regression tree model. The model built in the present study exhibits a reasonable predictability compared with existing models while possessing a transparent algorithm. Interpretation of the mechanisms of biodegradation was also carried out based on the models developed. © 2014 SETAC.
SNPhylo: a pipeline to construct a phylogenetic tree from huge SNP data.

PubMed

Lee, Tae-Ho; Guo, Hui; Wang, Xiyin; Kim, Changsoo; Paterson, Andrew H

2014-02-26

Phylogenetic trees are widely used for genetic and evolutionary studies in various organisms. Advanced sequencing technology has dramatically enriched data available for constructing phylogenetic trees based on single nucleotide polymorphisms (SNPs). However, massive SNP data makes it difficult to perform reliable analysis, and there has been no ready-to-use pipeline to generate phylogenetic trees from these data. We developed a new pipeline, SNPhylo, to construct phylogenetic trees based on large SNP datasets. The pipeline may enable users to construct a phylogenetic tree from three representative SNP data file formats. In addition, in order to increase reliability of a tree, the pipeline has steps such as removing low quality data and considering linkage disequilibrium. A maximum likelihood method for the inference of phylogeny is also adopted in generation of a tree in our pipeline. Using SNPhylo, users can easily produce a reliable phylogenetic tree from a large SNP data file. Thus, this pipeline can help a researcher focus more on interpretation of the results of analysis of voluminous data sets, rather than manipulations necessary to accomplish the analysis.
Binary Decision Trees for Preoperative Periapical Cyst Screening Using Cone-beam Computed Tomography.

PubMed

Pitcher, Brandon; Alaqla, Ali; Noujeim, Marcel; Wealleans, James A; Kotsakis, Georgios; Chrepa, Vanessa

2017-03-01

Cone-beam computed tomographic (CBCT) analysis allows for 3-dimensional assessment of periradicular lesions and may facilitate preoperative periapical cyst screening. The purpose of this study was to develop and assess the predictive validity of a cyst screening method based on CBCT volumetric analysis alone or combined with designated radiologic criteria. Three independent examiners evaluated 118 presurgical CBCT scans from cases that underwent apicoectomies and had an accompanying gold standard histopathological diagnosis of either a cyst or granuloma. Lesion volume, density, and specific radiologic characteristics were assessed using specialized software. Logistic regression models with histopathological diagnosis as the dependent variable were constructed for cyst prediction, and receiver operating characteristic curves were used to assess the predictive validity of the models. A conditional inference binary decision tree based on a recursive partitioning algorithm was constructed to facilitate preoperative screening. Interobserver agreement was excellent for volume and density, but it varied from poor to good for the radiologic criteria. Volume and root displacement were strong predictors for cyst screening in all analyses. The binary decision tree classifier determined that if the volume of the lesion was >247 mm³, there was 80% probability of a cyst. If volume was <247 mm³ and root displacement was present, cyst probability was 60% (78% accuracy). The good accuracy and high specificity of the decision tree classifier render it a useful preoperative cyst screening tool that can aid in clinical decision making but not a substitute for definitive histopathological diagnosis after biopsy. Confirmatory studies are required to validate the present findings. Published by Elsevier Inc.
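The two branches of the decision tree reported in this abstract can be written out as a small rule. This is only a restatement of the published numbers for illustration, not a clinical tool; the function name is hypothetical, and the sketch returns None for the cases the abstract does not report (volume below the cutoff without root displacement, or volume exactly at the cutoff).

```python
VOLUME_CUTOFF_MM3 = 247  # lesion volume threshold reported in the abstract


def cyst_probability(volume_mm3, root_displacement):
    """Return the cyst probability reported for a branch, or None if not reported."""
    if volume_mm3 > VOLUME_CUTOFF_MM3:
        return 0.80
    if volume_mm3 < VOLUME_CUTOFF_MM3 and root_displacement:
        return 0.60
    # The abstract gives no probability for the remaining cases.
    return None
```

For example, `cyst_probability(300, False)` follows the first branch, while `cyst_probability(100, True)` follows the second.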
A Model of Desired Performance in Phylogenetic Tree Construction for Teaching Evolution.

ERIC Educational Resources Information Center

Brewer, Steven D.

This research paper examines phylogenetic tree construction-a form of problem solving in biology-by studying the strategies and heuristics used by experts. One result of the research is the development of a model of desired performance for phylogenetic tree construction. A detailed description of the model and the sample problems which illustrate…
Assessment of wastewater treatment facility compliance with decreasing ammonia discharge limits using a regression tree model.

PubMed

Suchetana, Bihu; Rajagopalan, Balaji; Silverstein, JoAnn

2017-11-15

A regression tree-based diagnostic approach is developed to evaluate factors affecting US wastewater treatment plant compliance with ammonia discharge permit limits using Discharge Monthly Report (DMR) data from a sample of 106 municipal treatment plants for the period of 2004-2008. Predictor variables used to fit the regression tree are selected using random forests, and consist of the previous month's effluent ammonia, influent flow rates and plant capacity utilization. The tree models are first used to evaluate compliance with existing ammonia discharge standards at each facility and then applied assuming more stringent discharge limits, under consideration in many states. The model predicts that the ability to meet both current and future limits depends primarily on the previous month's treatment performance. With more stringent discharge limits, the predicted ammonia concentration relative to the discharge limit increases. In-sample validation shows that the regression trees can provide a median classification accuracy of >70%. The regression tree model is validated using ammonia discharge data from an operating wastewater treatment plant and is able to accurately predict the observed ammonia discharge category approximately 80% of the time, indicating that the regression tree model can be applied to predict compliance for individual treatment plants, providing practical guidance for utilities and regulators with an interest in controlling ammonia discharges. The proposed methodology is also used to demonstrate how to delineate reliable sources of demand and supply in a point source-to-point source nutrient credit trading scheme, as well as how planners and decision makers can set reasonable discharge limits in the future. Copyright © 2017 Elsevier B.V. All rights reserved.
Factor complexity of crash occurrence: An empirical demonstration using boosted regression trees.

PubMed

Chung, Yi-Shih

2013-12-01

Factor complexity is a characteristic of traffic crashes. This paper proposes a novel method, namely boosted regression trees (BRT), to investigate the complex and nonlinear relationships in high-variance traffic crash data. The Taiwanese 2004-2005 single-vehicle motorcycle crash data are used to demonstrate the utility of BRT. Traditional logistic regression and classification and regression tree (CART) models are also used to compare their estimation results and external validities. Both the in-sample cross-validation and out-of-sample validation results show that an increase in tree complexity provides improved, although declining, classification performance, indicating a limited factor complexity of single-vehicle motorcycle crashes. The effects of crucial variables including geographical, time, and sociodemographic factors explain some fatal crashes. Relatively unique fatal crashes are better approximated by interactive terms, especially combinations of behavioral factors. BRT models generally provide better transferability than conventional logistic regression and CART models. This study also discusses the implications of the results for devising safety policies. Copyright © 2012 Elsevier Ltd. All rights reserved.

Dynamic travel time estimation using regression trees.

DOT National Transportation Integrated Search

2008-10-01

This report presents a methodology for travel time estimation by using regression trees. The dissemination of travel time information has become crucial for effective traffic management, especially under congested road conditions. In the absence of c...
Probability of infestation and extent of mortality associated with the Douglas-fir beetle in the Colorado Front Range

Treesearch

Jose F. Negron

1998-01-01

Infested and uninfested areas within Douglas-fir, Pseudotsuga menziesii (Mirb.) Franco, stands affected by the Douglas-fir beetle, Dendroctonus pseudotsugae Hopk., were sampled in the Colorado Front Range, CO. Classification tree models were built to predict probabilities of infestation. Regression trees and linear regression analysis were used to model amount of tree...
Using nonlinear quantile regression to estimate the self-thinning boundary curve

Treesearch

Quang V. Cao; Thomas J. Dean

2015-01-01

The relationship between tree size (quadratic mean diameter) and tree density (number of trees per unit area) has been a topic of research and discussion for many decades. Starting with Reineke in 1933, the maximum size-density relationship, on a log-log scale, has been assumed to be linear. Several techniques, including linear quantile regression, have been employed...
The stopping rules for winsorized tree

NASA Astrophysics Data System (ADS)

Ch'ng, Chee Keong; Mahat, Nor Idayu

2017-11-01

Winsorized tree is a modified tree-based classifier that is able to investigate and handle all outliers in all nodes during the process of constructing the tree. It overcomes the tedious process of constructing a classical tree where the splitting of branches and pruning go concurrently so that the constructed tree would not grow bushy. This mechanism is controlled by the proposed algorithm. In a winsorized tree, data are screened to identify outliers. If an outlier is detected, its value is neutralized using the winsorize approach. Both outlier identification and value neutralization are executed recursively in every node until a predetermined stopping criterion is met. The aim of this paper is to search for a significant stopping criterion to stop the tree from further splitting before overfitting. The results of an experiment on the Pima Indian dataset showed that a node could produce the final successor nodes (leaves) once it had achieved the range of 70% in information gain.
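The winsorizing step described here replaces outlying values with a boundary value instead of discarding them. The sketch below assumes Tukey-style fences at 1.5 times the interquartile range as the outlier rule; the abstract does not specify how outliers are identified, so both the rule and the function name are illustrative assumptions.

```python
import statistics


def winsorize_iqr(values, k=1.5):
    """Clamp values lying beyond k * IQR from the quartiles to the nearest fence.

    Assumption: outliers are defined by Tukey fences; the paper may use another rule.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Values inside the fences are kept as they are; values outside are pulled to the fence.
    return [min(max(v, low), high) for v in values]


data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
cleaned = winsorize_iqr(data)
```

In a winsorized tree this kind of clamping would be repeated on the data reaching each node before the node is split.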
Predicting the limits to tree height using statistical regressions of leaf traits.

PubMed

Burgess, Stephen S O; Dawson, Todd E

2007-01-01

Leaf morphology and physiological functioning demonstrate considerable plasticity within tree crowns, with various leaf traits often exhibiting pronounced vertical gradients in very tall trees. It has been proposed that the trajectory of these gradients, as determined by regression methods, could be used in conjunction with theoretical biophysical limits to estimate the maximum height to which trees can grow. Here, we examined this approach using published and new experimental data from tall conifer and angiosperm species. We showed that height predictions were sensitive to tree-to-tree variation in the shape of the regression and to the biophysical endpoints selected. We examined the suitability of proposed endpoints and their theoretical validity. We also noted that site and environment influenced height predictions considerably. Use of leaf mass per unit area or leaf water potential coupled with vulnerability of twigs to cavitation poses a number of difficulties for predicting tree height. Photosynthetic rate and carbon isotope discrimination show more promise, but in the second case, the complex relationship between light, water availability, photosynthetic capacity and internal conductance to CO2 must first be characterized.
Dendroclimatic estimates of a drought index for northern Virginia

USGS Publications Warehouse

Puckett, Larry J.

1981-01-01

A 230-year record of the Palmer drought-severity index (PDSI) was estimated for northern Virginia from variations in widths of tree rings. Increment cores were extracted from eastern hemlock, Tsuga canadensis (L.) Carr., at three locations in northern Virginia. Measurements of annual growth increments were made and converted to standardized indices of growth. A response function was derived for hemlock to determine the growth-climate relationship. Growth was positively correlated with precipitation and negatively correlated with temperature during the May-July growing season. Combined standardized indices of growth were calibrated with the July PDSI. Growth accounted for 20-30 percent of the PDSI variance. Further regressions using factor scores of combined tree growth indices resulted in a small but significant improvement. Greatest improvement was made by using factor scores of growth indices of individual trees, thereby accounting for 64 percent of the July PDSI variance in the regression. Comparison of the results with a 241-year reconstruction from New York showed good agreement between low-frequency climatic trends. Analysis of the estimated Central Mountain climatic division of Virginia PDSI record indicated that, relative to the long-term record (1746-1975), dry years have occurred in disproportionally larger numbers during the last half of the 19th century and the mid-20th century. This trend appears reversed for the last half of the 18th century and the first half of the 19th century. Although these results are considered first-generation products, they are encouraging, suggesting that once additional tree-ring chronologies are constructed and techniques are refined, it will be possible to obtain more accurate estimates of prior climatic conditions in the mid-Atlantic region.
New machine learning tools for predictive vegetation mapping after climate change: Bagging and Random Forest perform better than Regression Tree Analysis

Treesearch

L.R. Iverson; A.M. Prasad; A. Liaw

2004-01-01

More and better machine learning tools are becoming available for landscape ecologists to aid in understanding species-environment relationships and to map probable species occurrence now and potentially into the future. To that end, we evaluated three statistical models: Regression Tree Analysis (RTA), Bagging Trees (BT) and Random Forest (RF) for their utility in...
Equations for predicting biomass in 2- to 6-year-old Eucalyptus saligna in Hawaii

Treesearch

Craig D. Whitesell; Susan C. Miyasaka; Robert F. Strand; Thomas H. Schubert; Katharine E. McDuffie

1988-01-01

Eucalyptus saligna trees grown in short-rotation plantations on the island of Hawaii were measured, harvested, and weighed to provide data for developing regression equations using non-destructive stand measurements. Regression analysis of the data from 190 trees in the 2.0- to 3.5-year range and 96 trees in the 4- to 6-year range related stem-only...
Estimating cavity tree and snag abundance using negative binomial regression models and nearest neighbor imputation methods

Treesearch

Bianca N.I. Eskelson; Hailemariam Temesgen; Tara M. Barrett

2009-01-01

Cavity tree and snag abundance data are highly variable and contain many zero observations. We predict cavity tree and snag abundance from variables that are readily available from forest cover maps or remotely sensed data using negative binomial (NB), zero-inflated NB, and zero-altered NB (ZANB) regression models as well as nearest neighbor (NN) imputation methods....
Analyzing and synthesizing phylogenies using tree alignment graphs.

PubMed

Smith, Stephen A; Brown, Joseph W; Hinchliff, Cody E

2013-01-01

Phylogenetic trees are used to analyze and visualize evolution. However, trees can be imperfect datatypes when summarizing multiple trees. This is especially problematic when accommodating for biological phenomena such as horizontal gene transfer, incomplete lineage sorting, and hybridization, as well as topological conflict between datasets. Additionally, researchers may want to combine information from sets of trees that have partially overlapping taxon sets. To address the problem of analyzing sets of trees with conflicting relationships and partially overlapping taxon sets, we introduce methods for aligning, synthesizing and analyzing rooted phylogenetic trees within a graph, called a tree alignment graph (TAG). The TAG can be queried and analyzed to explore uncertainty and conflict. It can also be synthesized to construct trees, presenting an alternative to supertrees approaches. We demonstrate these methods with two empirical datasets. In order to explore uncertainty, we constructed a TAG of the bootstrap trees from the Angiosperm Tree of Life project. Analysis of the resulting graph demonstrates that areas of the dataset that are unresolved in majority-rule consensus tree analyses can be understood in more detail within the context of a graph structure, using measures incorporating node degree and adjacency support. As an exercise in synthesis (i.e., summarization of a TAG constructed from the alignment trees), we also construct a TAG consisting of the taxonomy and source trees from a recent comprehensive bird study. We synthesized this graph into a tree that can be reconstructed in a repeatable fashion and where the underlying source information can be updated. The methods presented here are tractable for large scale analyses and serve as a basis for an alternative to consensus tree and supertree methods. Furthermore, the exploration of these graphs can expose structures and patterns within the dataset that are otherwise difficult to observe.
Analyzing and Synthesizing Phylogenies Using Tree Alignment Graphs

PubMed Central

Smith, Stephen A.; Brown, Joseph W.; Hinchliff, Cody E.

2013-01-01

Phylogenetic trees are used to analyze and visualize evolution. However, trees can be imperfect datatypes when summarizing multiple trees. This is especially problematic when accommodating for biological phenomena such as horizontal gene transfer, incomplete lineage sorting, and hybridization, as well as topological conflict between datasets. Additionally, researchers may want to combine information from sets of trees that have partially overlapping taxon sets. To address the problem of analyzing sets of trees with conflicting relationships and partially overlapping taxon sets, we introduce methods for aligning, synthesizing and analyzing rooted phylogenetic trees within a graph, called a tree alignment graph (TAG). The TAG can be queried and analyzed to explore uncertainty and conflict. It can also be synthesized to construct trees, presenting an alternative to supertrees approaches. We demonstrate these methods with two empirical datasets. In order to explore uncertainty, we constructed a TAG of the bootstrap trees from the Angiosperm Tree of Life project. Analysis of the resulting graph demonstrates that areas of the dataset that are unresolved in majority-rule consensus tree analyses can be understood in more detail within the context of a graph structure, using measures incorporating node degree and adjacency support. As an exercise in synthesis (i.e., summarization of a TAG constructed from the alignment trees), we also construct a TAG consisting of the taxonomy and source trees from a recent comprehensive bird study. We synthesized this graph into a tree that can be reconstructed in a repeatable fashion and where the underlying source information can be updated. The methods presented here are tractable for large scale analyses and serve as a basis for an alternative to consensus tree and supertree methods. Furthermore, the exploration of these graphs can expose structures and patterns within the dataset that are otherwise difficult to observe. PMID:24086118
Aneurysmal subarachnoid hemorrhage prognostic decision-making algorithm using classification and regression tree analysis.

PubMed

Lo, Benjamin W Y; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A H

2016-01-01

Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56-2.45, P < 0.01). A clinically useful classification tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH.
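The recursive partitioning step described in the abstract above can be illustrated with a minimal sketch. It searches one numeric predictor for the binary split that gives the most homogeneous subgroups, measured here by weighted Gini impurity. The function names (`gini`, `best_split`) and the toy data are hypothetical and are not taken from the study; a full CART procedure would repeat this search over all predictors and recurse into each subgroup.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means fully homogeneous)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best split `value <= threshold`.

    Returns (None, impurity of the unsplit node) when no split improves on it.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data (not from the study): age and a dichotomized outcome.
ages = [34, 41, 45, 52, 63, 67, 71, 78]
outcome = ["good", "good", "good", "good", "poor", "poor", "poor", "poor"]
threshold, impurity = best_split(ages, outcome)
print(threshold, impurity)
```

Gini impurity is only one possible homogeneity criterion; the abstract does not state which criterion the authors used.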
Policy Implications and Suggestions on Administrative Measures of Urban Flood

NASA Astrophysics Data System (ADS)

Lee, S. V.; Lee, M. J.; Lee, C.; Yoon, J. H.; Chae, S. H.

2017-12-01

The frequency and intensity of floods are increasing worldwide as climate change progresses. Flood management in urban municipalities should be policy-oriented because urban areas are prone to heavy damage. The purpose of this study is therefore to prepare a flood susceptibility map using data mining models and to make policy suggestions on administrative measures for urban floods. We constructed a spatial database by collecting relevant factors, including topography, geology, soil and land use data, for the representative city of Seoul, the capital of Korea. The flood susceptibility map was constructed by applying random forest and boosted tree models to the input data and the 2010 flooded-area data. The susceptibility map was validated using the 2011 flooded-area data, which were not used for training. The predictor importance of each factor was calculated in this process. Distance from water, DEM and geology showed high predictor importance, which makes them a high priority for flood preparation policy. In the receiver operating characteristic (ROC) analysis, the random forest model showed 78.78% and 79.18% accuracy for regression and classification, and the boosted tree model showed 77.55% and 77.26% accuracy for regression and classification, respectively. The results show that flood susceptibility maps can be applied to flood prevention and management and can help determine priority areas for flood mitigation policy by providing useful information to policy makers.
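The abstract above reports ROC-based validation figures without describing the computation. As a hedged illustration only, the sketch below computes the area under the ROC curve as the fraction of (flooded, non-flooded) pairs in which the flooded cell receives the higher susceptibility score, with ties counted as one half. The function name `roc_auc` and the scores are hypothetical; the study's own validation procedure may differ.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    `labels` holds 1 for positive cases (e.g. flooded) and 0 for negatives.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical susceptibility scores and observed flooding (1 = flooded).
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.1]
flooded = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, flooded)
print(round(auc, 4))
```

The pairwise form is quadratic in the number of cells, so it suits small validation sets; a rank-based formulation would be the usual choice for full raster maps.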
Prevalence and Determinants of Preterm Birth in Tehran, Iran: A Comparison between Logistic Regression and Decision Tree Methods.

PubMed

Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi

2017-06-01

Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. The logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB (p < 0.05). Identifying and training mothers at risk as well as improving prenatal care may reduce the PTB rate. We also recommend that statisticians utilize the logistic regression model for the classification of risk groups for PTB.
Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition

EPA Science Inventory

Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on ...
Differences in Risk Factors for Rotator Cuff Tears between Elderly Patients and Young Patients.

PubMed

Watanabe, Akihisa; Ono, Qana; Nishigami, Tomohiko; Hirooka, Takahiko; Machida, Hirohisa

2018-02-01

It has been unclear whether the risk factors for rotator cuff tears are the same at all ages or differ between young and older populations. In this study, we examined the risk factors for rotator cuff tears using classification and regression tree analysis as a method of nonlinear regression analysis. There were 65 patients in the rotator cuff tears group and 45 patients in the intact rotator cuff group. Classification and regression tree analysis was performed to predict rotator cuff tears. The target factor was rotator cuff tears; explanatory variables were age, sex, trauma, and critical shoulder angle ≥ 35°. In the results of classification and regression tree analysis, the tree was divided at age 64. For patients aged ≥ 64, the tree was divided at trauma. For patients aged < 64, the tree was divided at critical shoulder angle ≥ 35°. The odds ratio for critical shoulder angle ≥ 35° was significant for all ages (5.89) and for patients aged < 64 (10.3), while trauma was a significant factor only for patients aged ≥ 64 (5.13). Age, trauma, and critical shoulder angle ≥ 35° were related to rotator cuff tears in this study. However, these risk factors showed different trends according to age group, not a linear relationship.
BIMLR: a method for constructing rooted phylogenetic networks from rooted phylogenetic trees.

PubMed

Wang, Juan; Guo, Maozu; Xing, Linlin; Che, Kai; Liu, Xiaoyan; Wang, Chunyu

2013-09-15

Rooted phylogenetic trees constructed from different datasets (e.g. from different genes) are often conflicting with one another, i.e. they cannot be integrated into a single phylogenetic tree. Phylogenetic networks have become an important tool in molecular evolution, and rooted phylogenetic networks are able to represent conflicting rooted phylogenetic trees. Hence, the development of appropriate methods to compute rooted phylogenetic networks from rooted phylogenetic trees has attracted considerable research interest of late. The CASS algorithm proposed by van Iersel et al. is able to construct much simpler networks than other available methods, but it is extremely slow, and the networks it constructs are dependent on the order of the input data. Here, we introduce an improved CASS algorithm, BIMLR. We show that BIMLR is faster than CASS and less dependent on the input data order. Moreover, BIMLR is able to construct much simpler networks than almost all other methods. BIMLR is available at http://nclab.hit.edu.cn/wangjuan/BIMLR/. © 2013 Elsevier B.V. All rights reserved.
Constructing phylogenetic trees using interacting pathways.

PubMed

Wan, Peng; Che, Dongsheng

2013-01-01

Phylogenetic trees are used to represent evolutionary relationships among biological species or organisms. The construction of phylogenetic trees is based on the similarities or differences of their physical or genetic features. Traditional approaches of constructing phylogenetic trees mainly focus on physical features. The recent advancement of high-throughput technologies has led to accumulation of huge amounts of biological data, which in turn changed the way of biological studies in various aspects. In this paper, we report our approach of building phylogenetic trees using the information of interacting pathways. We have applied hierarchical clustering on two domains of organisms: eukaryotes and prokaryotes. Our preliminary results have shown the effectiveness of using the interacting pathways in revealing evolutionary relationships.
MASTtreedist: visualization of tree space based on maximum agreement subtree.

PubMed

Huang, Hong; Li, Yongji

2013-01-01

The phylogenetic tree construction process may produce many candidate trees as the "best estimates." As the number of constructed phylogenetic trees grows, the need to efficiently compare their topological or physical structures arises. One tree comparison software tool, Mesquite's Tree Set Viz module, allows the rapid and efficient visualization of tree comparison distances using multidimensional scaling (MDS). Tree-distance measures, such as Robinson-Foulds (RF), for the topological distance among different trees have been implemented in Tree Set Viz. New and sophisticated measures such as Maximum Agreement Subtree (MAST) can be continuously built upon Tree Set Viz. MAST can detect the common substructures among trees and provide more precise information on the similarity of the trees, but it is NP-hard and difficult to implement. In this article, we present a practical tree-distance metric: MASTtreedist, a MAST-based comparison metric in Mesquite's Tree Set Viz module. In this metric, efficient optimizations for the maximum weight clique problem are applied. The results suggest that the proposed method can efficiently compute the MAST distances among trees, and such tree topological differences can be translated as a scatter of points in two-dimensional (2D) space. We also provide a statistical evaluation of the provided measures with respect to RF using experimental data sets. This new comparison module provides a new tree-tree pairwise comparison metric based on the differences of the number of MAST leaves among constructed phylogenetic trees. Such a new phylogenetic tree comparison metric improves the visualization of taxa differences by discriminating small divergences of subtree structures for phylogenetic tree reconstruction.
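For readers unfamiliar with the Robinson-Foulds (RF) measure mentioned in the abstract above, the following minimal sketch computes a cluster-based RF distance between two rooted trees on the same leaf set. It is not the MASTtreedist metric and is not taken from Mesquite; the nested-tuple tree encoding and the function names (`clusters`, `rf_distance`) are assumptions made for illustration.

```python
def clusters(tree):
    """Return (leaf set, set of clusters) for a rooted tree.

    A tree is either a leaf label or a tuple of subtrees. A cluster is the
    set of leaves below one internal node.
    """
    if not isinstance(tree, tuple):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clusters = clusters(child)
        leaves |= child_leaves
        found |= child_clusters
    found.add(leaves)
    return leaves, found


def rf_distance(tree_a, tree_b):
    """Number of clusters that appear in exactly one of the two trees."""
    leaves_a, clusters_a = clusters(tree_a)
    leaves_b, clusters_b = clusters(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must share the same leaf set")
    return len(clusters_a ^ clusters_b)


# Two hypothetical five-taxon trees that disagree on one grouping.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
d = rf_distance(t1, t2)
print(d)
```

A matrix of such pairwise distances is the kind of input that multidimensional scaling can project into two dimensions.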
Encoding phylogenetic trees in terms of weighted quartets.

PubMed

Grünewald, Stefan; Huber, Katharina T; Moulton, Vincent; Semple, Charles

2008-04-01

One of the main problems in phylogenetics is to develop systematic methods for constructing evolutionary or phylogenetic trees. For a set of species X, an edge-weighted phylogenetic X-tree or phylogenetic tree is a (graph theoretical) tree with leaf set X and no degree 2 vertices, together with a map assigning a non-negative length to each edge of the tree. Within phylogenetics, several methods have been proposed for constructing such trees that work by trying to piece together quartet trees on X, i.e. phylogenetic trees each having four leaves in X. Hence, it is of interest to characterise when a collection of quartet trees corresponds to a (unique) phylogenetic tree. Recently, Dress and Erdös provided such a characterisation for binary phylogenetic trees, that is, phylogenetic trees all of whose internal vertices have degree 3. Here we provide a new characterisation for arbitrary phylogenetic trees.
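As a small worked illustration of a weighted quartet (not the characterisation proved in the paper), the sketch below recovers the quartet topology and its internal edge length from pairwise leaf distances using the standard four-point condition: among the three pairings of four leaves, the true split has the smallest distance sum, and half the gap to the largest sum is the internal edge length. The function name `weighted_quartet` and the distances are hypothetical.

```python
def weighted_quartet(dist, a, b, c, d):
    """Return (split, weight) for four leaves under a tree metric `dist`.

    `dist` maps frozenset({x, y}) to the path length between leaves x and y.
    For an unresolved (star) quartet all three sums are equal, the weight is
    0, and the returned split is arbitrary.
    """
    def get(x, y):
        return dist[frozenset((x, y))]

    sums = {
        ((a, b), (c, d)): get(a, b) + get(c, d),
        ((a, c), (b, d)): get(a, c) + get(b, d),
        ((a, d), (b, c)): get(a, d) + get(b, c),
    }
    split = min(sums, key=sums.get)
    weight = (max(sums.values()) - sums[split]) / 2
    return split, weight


# Hypothetical tree: A (length 1) and B (2) join at one internal vertex,
# C (1) and D (3) at another, and the internal edge has length 1.
dist = {
    frozenset(("A", "B")): 3,
    frozenset(("C", "D")): 4,
    frozenset(("A", "C")): 3,
    frozenset(("A", "D")): 5,
    frozenset(("B", "C")): 4,
    frozenset(("B", "D")): 6,
}
split, weight = weighted_quartet(dist, "A", "B", "C", "D")
print(split, weight)
```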

Bayesian Ensemble Trees (BET) for Clustering and Prediction in Heterogeneous Data

PubMed Central

Duan, Leo L.; Clancy, John P.; Szczesniak, Rhonda D.

2016-01-01

We propose a novel “tree-averaging” model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian Ensemble Trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with the other ensemble methods, BET requires much fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides variable selection criterion and interpretation for each subset. We developed an efficient estimating procedure with improved estimation strategies in both CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplemental materials are available online. PMID:27524872
Individualized Prediction of Heat Stress in Firefighters: A Data-Driven Approach Using Classification and Regression Trees.

PubMed

Mani, Ashutosh; Rao, Marepalli; James, Kelley; Bhattacharya, Amit

2015-01-01

The purpose of this study was to explore data-driven models, based on decision trees, to develop practical and easy to use predictive models for early identification of firefighters who are likely to cross the threshold of hyperthermia during live-fire training. Predictive models were created for three consecutive live-fire training scenarios. The final predicted outcome was a categorical variable: will a firefighter cross the upper threshold of hyperthermia - Yes/No. Two tiers of models were built, one with and one without taking into account the outcome (whether a firefighter crossed hyperthermia or not) from the previous training scenario. First tier of models included age, baseline heart rate and core body temperature, body mass index, and duration of training scenario as predictors. The second tier of models included the outcome of the previous scenario in the prediction space, in addition to all the predictors from the first tier of models. Classification and regression trees were used independently for prediction. The response variable for the regression tree was the quantitative variable: core body temperature at the end of each scenario. The predicted quantitative variable from regression trees was compared to the upper threshold of hyperthermia (38°C) to predict whether a firefighter would enter hyperthermia. The performance of classification and regression tree models was satisfactory for the second (success rate = 79%) and third (success rate = 89%) training scenarios but not for the first (success rate = 43%). Data-driven models based on decision trees can be a useful tool for predicting physiological response without modeling the underlying physiological systems. Early prediction of heat stress coupled with proactive interventions, such as pre-cooling, can help reduce heat stress in firefighters.
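The step in the abstract above where regression tree output is turned into a Yes/No prediction can be sketched as follows. The predicted temperatures and observed outcomes are invented for illustration, and treating "crossing" as strictly greater than 38°C is an assumption; the abstract does not say how a value of exactly 38°C is handled.

```python
HYPERTHERMIA_THRESHOLD_C = 38.0  # upper threshold stated in the abstract


def predict_hyperthermia(predicted_temps, threshold=HYPERTHERMIA_THRESHOLD_C):
    """Convert predicted core temperatures into Yes/No flags.

    Assumption: "crossing" the threshold means strictly greater than it.
    """
    return [temp > threshold for temp in predicted_temps]


def success_rate(predicted_flags, observed_flags):
    """Fraction of firefighters whose predicted flag matches the observation."""
    matches = sum(p == o for p, o in zip(predicted_flags, observed_flags))
    return matches / len(observed_flags)


# Hypothetical regression tree outputs (degrees C) and observed outcomes.
predicted_temps = [37.6, 38.2, 38.5, 37.9, 38.1]
observed = [False, True, True, True, False]
flags = predict_hyperthermia(predicted_temps)
rate = success_rate(flags, observed)
print(flags, rate)
```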
Using methods from the data mining and machine learning literature for disease classification and prediction: A case study examining classification of heart failure sub-types

PubMed Central

Austin, Peter C.; Tu, Jack V.; Ho, Jennifer E.; Levy, Daniel; Lee, Douglas S.

2014-01-01

Objective Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines. Study design and Setting We compared the performance of these classification methods with those of conventional classification trees to classify patients with heart failure according to the following sub-types: heart failure with preserved ejection fraction (HFPEF) vs. heart failure with reduced ejection fraction (HFREF). We also compared the ability of these methods to predict the probability of the presence of HFPEF with that of conventional logistic regression. Results We found that modern, flexible tree-based methods from the data mining literature offer substantial improvement in prediction and classification of heart failure sub-type compared to conventional classification and regression trees. However, conventional logistic regression had superior performance for predicting the probability of the presence of HFPEF compared to the methods proposed in the data mining literature. Conclusion The use of tree-based methods offers superior performance over conventional classification and regression trees for predicting and classifying heart failure subtypes in a population-based sample of patients from Ontario. However, these methods do not offer substantial improvements over logistic regression for predicting the presence of HFPEF. PMID:23384592
Dendrometer bands made easy: using modified cable ties to measure incremental growth of trees

USGS Publications Warehouse

Anemaet, Evelyn R.; Middleton, Beth A.

2013-01-01

Dendrometer bands are a useful way to make sequential repeated measurements of tree growth, but traditional dendrometer bands can be expensive, time consuming, and difficult to construct in the field. An alternative to the traditional method of band construction is to adapt commercially available materials. This paper describes how to construct and install dendrometer bands using smooth-edged, stainless steel, cable tie banding and attachable rollerball heads. As a performance comparison, both traditional and cable tie dendrometer bands were installed on baldcypress trees at the National Wetlands Research Center in Lafayette, Louisiana, by both an experienced and a novice worker. Band installation times were recorded, and growth of the trees as estimated by the two band types was measured after approximately one year, demonstrating equivalence of the two methods. This efficient approach to dendrometer band construction can help advance the knowledge of long-term tree growth in ecological studies.
A self-trained classification technique for producing 30 m percent-water maps from Landsat data

USGS Publications Warehouse

Rover, Jennifer R.; Wylie, Bruce K.; Ji, Lei

2010-01-01

Small bodies of water can be mapped with moderate-resolution satellite data using methods where water is mapped as subpixel fractions using field measurements or high-resolution images as training datasets. A new method, developed from a regression-tree technique, uses a 30 m Landsat image for training the regression tree that, in turn, is applied to the same image to map subpixel water. The self-trained method was evaluated by comparing the percent-water map with three other maps generated from established percent-water mapping methods: (1) a regression-tree model trained with a 5 m SPOT 5 image, (2) a regression-tree model based on endmembers and (3) a linear unmixing classification technique. The results suggest that subpixel water fractions can be accurately estimated when high-resolution satellite data or intensively interpreted training datasets are not available, which increases our ability to map small water bodies or small changes in lake size at a regional scale.
Scalable Regression Tree Learning on Hadoop using OpenPlanet

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yin, Wei; Simmhan, Yogesh; Prasanna, Viktor

As scientific and engineering domains attempt to effectively analyze the deluge of data arriving from sensors and instruments, machine learning is becoming a key data mining tool to build prediction models. The regression tree is a popular learning model that combines decision trees and linear regression to forecast numerical target variables based on a set of input features. MapReduce is well suited for addressing such data-intensive learning applications, and a proprietary regression tree algorithm, PLANET, using MapReduce has been proposed earlier. In this paper, we describe an open source implementation of this algorithm, OpenPlanet, on the Hadoop framework using a hybrid approach. Further, we evaluate the performance of OpenPlanet using real-world datasets from the Smart Power Grid domain to perform energy use forecasting, and propose tuning strategies of Hadoop parameters to improve the performance of the default configuration by 75% for a training dataset of 17 million tuples on a 64-core Hadoop cluster on FutureGrid.
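To give a feel for why regression tree learning fits a map/reduce style, the sketch below chooses one split threshold from data held in separate partitions. Each partition emits only small summary statistics (count, sum, sum of squares of the target), which are merged before the split is scored by squared error. This is a single-process illustration under stated assumptions, not OpenPlanet or PLANET code; the names (`map_partition`, `merge`, `best_threshold`) and the data are hypothetical, and leaves are summarized by their mean rather than by a fitted linear model.

```python
from functools import reduce


def map_partition(rows, thresholds):
    """Mapper: summarize one partition of (x, y) rows.

    Returns per-threshold [count, sum, sum of squares] of y for rows with
    x <= threshold, plus the same triple for the whole partition.
    """
    left = {t: [0, 0.0, 0.0] for t in thresholds}
    total = [0, 0.0, 0.0]
    for x, y in rows:
        triple = (1, y, y * y)
        total = [a + b for a, b in zip(total, triple)]
        for t in thresholds:
            if x <= t:
                left[t] = [a + b for a, b in zip(left[t], triple)]
    return left, total


def merge(stats_a, stats_b):
    """Reducer: add the summaries of two partitions element-wise."""
    left_a, total_a = stats_a
    left_b, total_b = stats_b
    left = {t: [a + b for a, b in zip(left_a[t], left_b[t])] for t in left_a}
    total = [a + b for a, b in zip(total_a, total_b)]
    return left, total


def sse(count, total, total_sq):
    """Sum of squared errors around the mean, from summary statistics."""
    return total_sq - total * total / count if count else 0.0


def best_threshold(partitions, thresholds):
    """Pick the threshold whose two sides have the lowest combined SSE."""
    left, total = reduce(merge, (map_partition(p, thresholds) for p in partitions))
    best_t, best_err = None, float("inf")
    for t in thresholds:
        ln, ls, lss = left[t]
        rn, rs, rss = total[0] - ln, total[1] - ls, total[2] - lss
        if ln == 0 or rn == 0:
            continue  # a split must leave rows on both sides
        err = sse(ln, ls, lss) + sse(rn, rs, rss)
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err


# Hypothetical (feature, target) rows spread over three partitions.
partitions = [
    [(1, 10.0), (2, 11.0)],
    [(8, 30.0), (9, 31.0)],
    [(3, 12.0), (7, 29.0)],
]
best_t, best_err = best_threshold(partitions, thresholds=[2, 5, 8])
print(best_t, best_err)
```

Because only fixed-size summaries cross partition boundaries, the amount of data exchanged depends on the number of candidate thresholds rather than on the number of rows.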
What contributes to perceived stress in later life? A recursive partitioning approach.

PubMed

Scott, Stacey B; Jackson, Brenda R; Bergeman, C S

2011-12-01

One possible explanation for the individual differences in outcomes of stress is the diversity of inputs that produce perceptions of being stressed. The current study examines how combinations of contextual features (e.g., social isolation, neighborhood quality, health problems, age discrimination, financial concerns, and recent life events) of later life contribute to overall feelings of stress. Recursive partitioning techniques (regression trees and random forests) were used to examine unique interrelations between predictors of perceived stress in a sample of 282 community-dwelling adults. Trees provided possible examples of equifinality (i.e., subsets of people with similar levels of perceived stress but different predictors) as well as identification both of contextual combinations that separated participants with very high and very low perceived stress. Random forest analyses aggregated across many trees based on permuted versions of the data and predictors; loneliness, financial strain, neighborhood strain, ageism, and to some extent life events emerged as important predictors. Interviews with a subsample of participants provided both thick description of the complex relationships identified in the trees, as well as additional risks not appearing in the survey results. Together, the analyses highlight what may be missed when stress is used as a simple unidimensional construct and can guide differential intervention efforts.
What contributes to perceived stress in later life? A recursive partitioning approach

PubMed Central

Scott, Stacey B.; Jackson, Brenda R.; Bergeman, C. S.

2011-01-01

One possible explanation for the individual differences in outcomes of stress is the diversity of inputs that produce perceptions of being stressed. The current study examines how combinations of contextual features (e.g., social isolation, neighborhood quality, health problems, age discrimination, financial concerns, and recent life events) of later life contribute to overall feelings of stress. Recursive partitioning techniques (regression trees and random forests) were used to examine unique interrelations between predictors of perceived stress in a sample of 282 community-dwelling adults. Trees provided possible examples of equifinality (i.e., subsets of people with similar levels of perceived stress but different predictors) as well as for the identification both of contextual combinations that separated participants with very high and very low perceived stress. Random forest analyses aggregated across many trees based on permuted versions of the data and predictors; loneliness, financial strain, neighborhood strain, ageism, and to some extent life events emerged as important predictors. Interviews with a subsample of participants provided both thick description of the complex relationships identified in the trees, as well as additional risks not appearing in the survey results. Together, the analyses highlight what may be missed when stress is used as a simple unidimensional construct and can guide differential intervention efforts. PMID:21604885
Compound analysis via graph kernels incorporating chirality.

PubMed

Brown, J B; Urata, Takashi; Tamura, Takeyuki; Arai, Midori A; Kawabata, Takeo; Akutsu, Tatsuya

2010-12-01

High accuracy is paramount when predicting biochemical characteristics using Quantitative Structural-Property Relationships (QSPRs). Although existing graph-theoretic kernel methods combined with machine learning techniques are efficient for QSPR model construction, they cannot distinguish topologically identical chiral compounds which often exhibit different biological characteristics. In this paper, we propose a new method that extends the recently developed tree pattern graph kernel to accommodate stereoisomers. We show that Support Vector Regression (SVR) with a chiral graph kernel is useful for target property prediction by demonstrating its application to a set of human vitamin D receptor ligands currently under consideration for their potential anti-cancer effects.
Method for estimating potential tree-grade distributions for northeastern forest species

Treesearch

Daniel A. Yaussy

1993-01-01

Generalized logistic regression was used to distribute trees into four potential tree grades for 20 northeastern species groups. The potential tree grade is defined as the tree grade based on the length and amount of clear cuttings and defects only, disregarding minimum grading diameter. The algorithms described use site index and tree diameter as the predictive...
Additivity of nonlinear biomass equations

Treesearch

Bernard R. Parresol

2001-01-01

Two procedures that guarantee the property of additivity among the components of tree biomass and total tree biomass utilizing nonlinear functions are developed. Procedure 1 is a simple combination approach, and procedure 2 is based on nonlinear joint-generalized regression (nonlinear seemingly unrelated regressions) with parameter restrictions. Statistical theory is...
Explanatory Power of Multi-scale Physical Descriptors in Modeling Benthic Indices Across Nested Ecoregions of the Pacific Northwest

NASA Astrophysics Data System (ADS)

Holburn, E. R.; Bledsoe, B. P.; Poff, N. L.; Cuhaciyan, C. O.

2005-05-01

Using over 300 R/EMAP sites in OR and WA, we examine the relative explanatory power of watershed, valley, and reach scale descriptors in modeling variation in benthic macroinvertebrate indices. Innovative metrics describing flow regime, geomorphic processes, and hydrologic-distance weighted watershed and valley characteristics are used in multiple regression and regression tree modeling to predict EPT richness, % EPT, EPT/C, and % Plecoptera. A nested design using seven ecoregions is employed to evaluate the influence of geographic scale and environmental heterogeneity on the explanatory power of individual and combined scales. Regression tree models are constructed to explain variability while identifying threshold responses and interactions. Cross-validated models demonstrate differences in the explanatory power associated with single-scale and multi-scale models as environmental heterogeneity is varied. Models explaining the greatest variability in biological indices result from multi-scale combinations of physical descriptors. Results also indicate that substantial variation in benthic macroinvertebrate response can be explained with process-based watershed and valley scale metrics derived exclusively from common geospatial data. This study outlines a general framework for identifying key processes driving macroinvertebrate assemblages across a range of scales and establishing the geographic extent at which various levels of physical description best explain biological variability. Such information can guide process-based stratification to avoid spurious comparison of dissimilar stream types in bioassessments and ensure that key environmental gradients are adequately represented in sampling designs.
A retrospective analysis to identify the factors affecting infection in patients undergoing chemotherapy.

PubMed

Park, Ji Hyun; Kim, Hyeon-Young; Lee, Hanna; Yun, Eun Kyoung

2015-12-01

This study compares the performance of the logistic regression and decision tree analysis methods for assessing the risk factors for infection in cancer patients undergoing chemotherapy. The subjects were 732 cancer patients who were receiving chemotherapy at K university hospital in Seoul, Korea. The data were collected between March 2011 and February 2013 and were processed for descriptive analysis, logistic regression and decision tree analysis using the IBM SPSS Statistics 19 and Modeler 15.1 programs. The most common risk factors for infection in cancer patients receiving chemotherapy were identified as alkylating agents, vinca alkaloid and underlying diabetes mellitus. The logistic regression explained 66.7% of the variation in the data in terms of sensitivity and 88.9% in terms of specificity. The decision tree analysis accounted for 55.0% of the variation in the data in terms of sensitivity and 89.0% in terms of specificity. As for the overall classification accuracy, the logistic regression explained 88.0% and the decision tree analysis explained 87.2%. The logistic regression analysis showed a higher degree of sensitivity and classification accuracy. Therefore, logistic regression analysis is concluded to be the more effective and useful method for establishing an infection prediction model for patients undergoing chemotherapy. Copyright © 2015 Elsevier Ltd. All rights reserved.
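The sensitivity, specificity, and overall accuracy figures compared in the abstract above are all derived from a 2x2 confusion matrix. The sketch below shows the arithmetic with invented counts; the numbers are hypothetical and do not reproduce the study's results.

```python
def classification_indices(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts.

    tp/fn: infected patients predicted infected / not infected.
    tn/fp: non-infected patients predicted not infected / infected.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts for one model on 240 patients.
sens, spec, acc = classification_indices(tp=30, fn=10, tn=180, fp=20)
print(sens, spec, acc)
```

When infections are rare, overall accuracy is dominated by specificity, which is why two models can differ noticeably in sensitivity while their overall accuracies stay close.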
An optimal sample data usage strategy to minimize overfitting and underfitting effects in regression tree models based on remotely-sensed data

USGS Publications Warehouse

Gu, Yingxin; Wylie, Bruce K.; Boyte, Stephen; Picotte, Joshua J.; Howard, Danny; Smith, Kelcy; Nelson, Kurtis

2016-01-01

Regression tree models have been widely used for remote sensing-based ecosystem mapping. Improper use of the sample data (model training and testing data) may cause overfitting and underfitting effects in the model. The goal of this study is to develop an optimal sampling data usage strategy for any dataset and identify an appropriate number of rules in the regression tree model that will improve its accuracy and robustness. Landsat 8 data and Moderate-Resolution Imaging Spectroradiometer-scaled Normalized Difference Vegetation Index (NDVI) were used to develop regression tree models. A Python procedure was designed to generate random replications of model parameter options across a range of model development data sizes and rule number constraints. The mean absolute difference (MAD) between the predicted and actual NDVI (scaled NDVI, value from 0–200) and its variability across the different randomized replications were calculated to assess the accuracy and stability of the models. In our case study, a six-rule regression tree model developed from 80% of the sample data had the lowest MAD (MADtraining = 2.5 and MADtesting = 2.4), which was suggested as the optimal model. This study demonstrates how the training data and rule number selections impact model accuracy and provides important guidance for future remote-sensing-based ecosystem modeling.
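The evaluation loop described in the abstract above (random replications, then mean absolute difference on training and testing data) can be sketched as follows. This is not the authors' Python procedure: the names (`mad`, `replicate`, `fit_mean`) are hypothetical, the samples are invented, and a constant mean predictor stands in for the regression tree model so that the example stays self-contained.

```python
import random
from statistics import mean


def mad(predicted, actual):
    """Mean absolute difference between predicted and actual values."""
    return mean(abs(p - a) for p, a in zip(predicted, actual))


def fit_mean(train):
    """Placeholder model (an assumption): always predict the training mean."""
    center = mean(y for _, y in train)
    return lambda x: center


def replicate(samples, train_fraction, n_reps, fit, seed=0):
    """Return a list of (MAD on training data, MAD on testing data) pairs,
    one per random split of `samples` into training and testing subsets."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_reps):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        train, test = shuffled[:cut], shuffled[cut:]
        predict = fit(train)
        mad_train = mad([predict(x) for x, _ in train], [y for _, y in train])
        mad_test = mad([predict(x) for x, _ in test], [y for _, y in test])
        results.append((mad_train, mad_test))
    return results


# Hypothetical (predictor, scaled NDVI) samples on the 0-200 scale.
samples = [(i, 100 + (i % 7) * 3) for i in range(50)]
results = replicate(samples, train_fraction=0.8, n_reps=5, fit=fit_mean)
print(mean(test for _, test in results))
```

Comparing the spread of the testing MAD across replications for different training fractions is one way to judge stability, in the spirit of the comparison the abstract describes.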
A new algorithm to construct phylogenetic networks from trees.

PubMed

Wang, J

2014-03-06

Developing appropriate methods for constructing phylogenetic networks from tree sets is an important problem, and much research is currently being undertaken in this area. BIMLR is an algorithm that constructs phylogenetic networks from tree sets. The algorithm can construct a much simpler network than other available methods. Here, we introduce an improved version of the BIMLR algorithm, QuickCass. QuickCass changes the selection strategy of the labels of leaves below the reticulate nodes, i.e., the nodes with an indegree of at least 2 in BIMLR. We show that QuickCass can construct simpler phylogenetic networks than BIMLR. Furthermore, we show that QuickCass is a polynomial-time algorithm when the output network that is constructed by QuickCass is binary.
75 FR 5941 - Umatilla National Forest, Walla Walla Ranger District, Walla Walla, WA; Cobbler II Timber Sale...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-02-05

... construction (that will be decommissioned after project use), new road construction, danger tree removal along... increasing population. Late seral tree species have become dominant after long periods without disturbance... and vigor. Timber stands of seral tree species such as western larch and ponderosa pine are infilling...
Reinforcement Learning Trees

PubMed Central

Zhu, Ruoqing; Zeng, Donglin; Kosorok, Michael R.

2015-01-01

In this paper, we introduce a new type of tree-based method, reinforcement learning trees (RLT), which exhibits significantly improved performance over traditional methods such as random forests (Breiman, 2001) under high-dimensional settings. The innovations are three-fold. First, the new method implements reinforcement learning at each selection of a splitting variable during the tree construction processes. By splitting on the variable that brings the greatest future improvement in later splits, rather than choosing the one with largest marginal effect from the immediate split, the constructed tree utilizes the available samples in a more efficient way. Moreover, such an approach enables linear combination cuts at little extra computational cost. Second, we propose a variable muting procedure that progressively eliminates noise variables during the construction of each individual tree. The muting procedure also takes advantage of reinforcement learning and prevents noise variables from being considered in the search for splitting rules, so that towards terminal nodes, where the sample size is small, the splitting rules are still constructed from only strong variables. Last, we investigate asymptotic properties of the proposed method under basic assumptions and discuss rationale in general settings. PMID:26903687
Reconstructing missing daily precipitation data using regression trees and artificial neural networks

USDA-ARS's Scientific Manuscript database

Incomplete meteorological data has been a problem in environmental modeling studies. The objective of this work was to develop a technique to reconstruct missing daily precipitation data in the central part of Chesapeake Bay Watershed using regression trees (RT) and artificial neural networks (ANN)....
Reconstructing missing daily precipitation data using regression trees and artificial neural networks

USDA-ARS's Scientific Manuscript database

Missing meteorological data have to be estimated for agricultural and environmental modeling. The objective of this work was to develop a technique to reconstruct the missing daily precipitation data in the central part of the Chesapeake Bay Watershed using regression trees (RT) and artificial neura...
Alternative construction of graceful symmetric trees

NASA Astrophysics Data System (ADS)

Sandy, I. P.; Rizal, A.; Manurung, E. N.; Sugeng, K. A.

2018-04-01

Graceful labeling is one of the interesting topics in graph theory. Let G = (V, E) be a tree. An injective mapping f: V \to \{0, 1, \ldots, |E|\} is called graceful if the edge weights w(xy) = |f(x) - f(y)| are all different over the edges xy. A famous conjecture in this area is that all trees are graceful. In this paper we give an alternative construction of graceful labelings on symmetric trees using the adjacency matrix.
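The definition of a graceful labeling in this abstract can be verified mechanically. The following is a small sketch of such a check; the function name and the example path are assumptions for illustration and are unrelated to the adjacency-matrix construction of the paper.

```python
def is_graceful(edges, labels):
    """Check whether `labels` is a graceful labeling of the tree given by `edges`."""
    values = list(labels.values())
    n_edges = len(edges)
    # f must be injective with values in {0, 1, ..., |E|}.
    if len(set(values)) != len(values) or not all(0 <= v <= n_edges for v in values):
        return False
    # All edge weights |f(x) - f(y)| must be distinct.
    weights = {abs(labels[x] - labels[y]) for x, y in edges}
    return len(weights) == n_edges

# Path a-b-c-d labeled 0, 3, 1, 2 has edge weights 3, 2, 1.
path = [("a", "b"), ("b", "c"), ("c", "d")]
print(is_graceful(path, {"a": 0, "b": 3, "c": 1, "d": 2}))  # True
print(is_graceful(path, {"a": 0, "b": 1, "c": 2, "d": 3}))  # False
```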

Prediction of cadmium enrichment in reclaimed coastal soils by classification and regression tree

NASA Astrophysics Data System (ADS)

Ru, Feng; Yin, Aijing; Jin, Jiaxin; Zhang, Xiuying; Yang, Xiaohui; Zhang, Ming; Gao, Chao

2016-08-01

Reclamation of coastal land is one of the most common ways to obtain land resources in China. However, it has long been acknowledged that the artificial interference with coastal land has disadvantageous effects, such as heavy metal contamination. This study aimed to develop a prediction model for cadmium enrichment levels and assess the importance of affecting factors in typical reclaimed land in Eastern China (DFCL: Dafeng Coastal Land). Two hundred and twenty seven surficial soil/sediment samples were collected and analyzed to identify the enrichment levels of cadmium and the possible affecting factors in soils and sediments. The classification and regression tree (CART) model was applied in this study to predict cadmium enrichment levels. The prediction results showed that cadmium enrichment levels assessed by the CART model had an accuracy of 78.0%. The CART model could extract more information on factors affecting the environmental behavior of cadmium than correlation analysis. The integration of correlation analysis and the CART model showed that fertilizer application and organic carbon accumulation were the most important factors affecting soil/sediment cadmium enrichment levels, followed by particle size effects (Al2O3, TFe2O3 and SiO2), contents of Cl and S, surrounding construction areas and reclamation history.
Weighing risk factors associated with bee colony collapse disorder by classification and regression tree analysis.

PubMed

VanEngelsdorp, Dennis; Speybroeck, Niko; Evans, Jay D; Nguyen, Bach Kim; Mullin, Chris; Frazier, Maryann; Frazier, Jim; Cox-Foster, Diana; Chen, Yanping; Tarpy, David R; Haubruge, Eric; Pettis, Jeffrey S; Saegerman, Claude

2010-10-01

Colony collapse disorder (CCD), a syndrome whose defining trait is the rapid loss of adult worker honey bees, Apis mellifera L., is thought to be responsible for a minority of the large overwintering losses experienced by U.S. beekeepers since the winter 2006-2007. Using the same data set developed to perform a monofactorial analysis (PloS ONE 4: e6481, 2009), we conducted a classification and regression tree (CART) analysis in an attempt to better understand the relative importance and interrelations among different risk variables in explaining CCD. Fifty-five exploratory variables were used to construct two CART models: one model with and one model without a cost of misclassifying a CCD-diagnosed colony as a non-CCD colony. The resulting model tree that permitted for misclassification had a sensitivity and specificity of 85 and 74%, respectively. Although factors measuring colony stress (e.g., adult bee physiological measures, such as fluctuating asymmetry or mass of head) were important discriminating values, six of the 19 variables having the greatest discriminatory value were pesticide levels in different hive matrices. Notably, coumaphos levels in brood (a miticide commonly used by beekeepers) had the highest discriminatory value and were highest in control (healthy) colonies. Our CART analysis provides evidence that CCD is probably the result of several factors acting in concert, making afflicted colonies more susceptible to disease. This analysis highlights several areas that warrant further attention, including the effect of sublethal pesticide exposure on pathogen prevalence and the role of variability in bee tolerance to pesticides on colony survivorship.
Application of Machine-Learning Models to Predict Tacrolimus Stable Dose in Renal Transplant Recipients

NASA Astrophysics Data System (ADS)

Tang, Jie; Liu, Rong; Zhang, Yue-Li; Liu, Mou-Ze; Hu, Yong-Fang; Shao, Ming-Jie; Zhu, Li-Jun; Xin, Hua-Wen; Feng, Gui-Wen; Shang, Wen-Jun; Meng, Xiang-Guang; Zhang, Li-Rong; Ming, Ying-Zi; Zhang, Wei

2017-02-01

Tacrolimus has a narrow therapeutic window and considerable variability in clinical use. Our goal was to compare the performance of multiple linear regression (MLR) and eight machine learning techniques in pharmacogenetic algorithm-based prediction of tacrolimus stable dose (TSD) in a large Chinese cohort. A total of 1,045 renal transplant patients were recruited, 80% of which were randomly selected as the “derivation cohort” to develop dose-prediction algorithm, while the remaining 20% constituted the “validation cohort” to test the final selected algorithm. MLR, artificial neural network (ANN), regression tree (RT), multivariate adaptive regression splines (MARS), boosted regression tree (BRT), support vector regression (SVR), random forest regression (RFR), lasso regression (LAR) and Bayesian additive regression trees (BART) were applied and their performances were compared in this work. Among all the machine learning models, RT performed best in both derivation [0.71 (0.67-0.76)] and validation cohorts [0.73 (0.63-0.82)]. In addition, the ideal rate of RT was 4% higher than that of MLR. To our knowledge, this is the first study to use machine learning models to predict TSD, which will further facilitate personalized medicine in tacrolimus administration in the future.
Tree Stem and Canopy Biomass Estimates from Terrestrial Laser Scanning Data

NASA Astrophysics Data System (ADS)

Olofsson, K.; Holmgren, J.

2017-10-01

In this study an automatic method for estimating both the tree stem and the tree canopy biomass is presented. The point cloud tree extraction techniques operate on terrestrial laser scanning (TLS) data and model the biomass using the estimated stem and canopy volume as independent variables. The regression model fit error is of the order of 5 kg or less, which gives a relative model error of about 5 % for the stem estimate and 10-15 % for the spruce and pine canopy biomass estimates. The canopy biomass estimate was improved by separating the models by tree species, which indicates that the method is allometry dependent and that the regression models need to be recomputed for different areas with different climate and different vegetation.
Development of Interpretable Predictive Models for BPH and Prostate Cancer.

PubMed

Bermejo, Pablo; Vivo, Alicia; Tárraga, Pedro J; Rodríguez-Montes, J A

2015-01-01

Traditional methods for deciding whether to recommend a patient for a prostate biopsy are based on cut-off levels of stand-alone markers such as prostate-specific antigen (PSA) or any of its derivatives. However, in the last decade we have seen the increasing use of predictive models that combine, in a non-linear manner, several predictors that are better able to predict prostate cancer (PC), but these fail to help the clinician to distinguish between PC and benign prostate hyperplasia (BPH) patients. We construct two new models that are capable of predicting both PC and BPH. An observational study was performed on 150 patients with PSA ≥3 ng/mL and age >50 years. We built a decision tree and a logistic regression model, validated with the leave-one-out methodology, in order to predict PC or BPH, or reject both. Statistical dependence with PC and BPH was found for prostate volume (P-value < 0.001), PSA (P-value < 0.001), international prostate symptom score (IPSS; P-value < 0.001), digital rectal examination (DRE; P-value < 0.001), age (P-value < 0.002), antecedents (P-value < 0.006), and meat consumption (P-value < 0.08). The two predictive models that were constructed selected a subset of these, namely, volume, PSA, DRE, and IPSS, obtaining an area under the ROC curve (AUC) between 72% and 80% for both PC and BPH prediction. PSA and volume together help to build predictive models that accurately distinguish among PC, BPH, and patients without any of these pathologies. Our decision tree and logistic regression models outperform the AUC obtained in the compared studies. Using these models as decision support, the number of unnecessary biopsies might be significantly reduced.
Dendrometer bands made easy: Using modified cable ties to measure incremental growth of trees

PubMed Central

Anemaet, Evelyn R.; Middleton, Beth A.

2013-01-01

• Premise of the study: Dendrometer bands are a useful way to make sequential repeated measurements of tree growth, but traditional dendrometer bands can be expensive, time consuming, and difficult to construct in the field. An alternative to the traditional method of band construction is to adapt commercially available materials. This paper describes how to construct and install dendrometer bands using smooth-edged, stainless steel, cable tie banding and attachable rollerball heads. • Methods and Results: As a performance comparison, both traditional and cable tie dendrometer bands were installed on baldcypress trees at the National Wetlands Research Center in Lafayette, Louisiana, by both an experienced and a novice worker. Band installation times were recorded, and growth of the trees as estimated by the two band types was measured after approximately one year, demonstrating equivalence of the two methods. • Conclusions: This efficient approach to dendrometer band construction can help advance the knowledge of long-term tree growth in ecological studies. PMID:25202589
Constructing Student Problems in Phylogenetic Tree Construction.

ERIC Educational Resources Information Center

Brewer, Steven D.

Evolution is often equated with natural selection and is taught from a primarily functional perspective while comparative and historical approaches, which are critical for developing an appreciation of the power of evolutionary theory, are often neglected. This report describes a study of expert problem-solving in phylogenetic tree construction.…
Methods for Estimating Population Density in Data-Limited Areas: Evaluating Regression and Tree-Based Models in Peru

PubMed Central

Anderson, Weston; Guikema, Seth; Zaitchik, Ben; Pan, William

2014-01-01

Obtaining accurate small area estimates of population is essential for policy and health planning but is often difficult in countries with limited data. In lieu of available population data, small area estimate models draw information from previous time periods or from similar areas. This study focuses on model-based methods for estimating population when no direct samples are available in the area of interest. To explore the efficacy of tree-based models for estimating population density, we compare six different model structures including Random Forest and Bayesian Additive Regression Trees. Results demonstrate that without information from prior time periods, non-parametric tree-based models produced more accurate predictions than did conventional regression methods. Improving estimates of population density in non-sampled areas is important for regions with incomplete census data and has implications for economic, health and development policies. PMID:24992657
Identifying Risk Factors for Drug Use in an Iranian Treatment Sample: A Prediction Approach Using Decision Trees.

PubMed

Amirabadizadeh, Alireza; Nezami, Hossein; Vaughn, Michael G; Nakhaee, Samaneh; Mehrpour, Omid

2018-05-12

Substance abuse exacts considerable social and health care burdens throughout the world. The aim of this study was to create a prediction model to better identify risk factors for drug use. A prospective cross-sectional study was conducted in South Khorasan Province, Iran. Of the total of 678 eligible subjects, 70% (n: 474) were randomly selected to provide a training set for constructing decision tree and multiple logistic regression (MLR) models. The remaining 30% (n: 204) were employed in a holdout sample to test the performance of the decision tree and MLR models. Predictive performance of different models was analyzed by the receiver operating characteristic (ROC) curve using the testing set. Independent variables were selected from demographic characteristics and history of drug use. For the decision tree model, the sensitivity and specificity for identifying people at risk for drug abuse were 66% and 75%, respectively, while the MLR model was somewhat less effective at 60% and 73%. Key independent variables in the analyses included first substance experience, age at first drug use, age, place of residence, history of cigarette use, and occupational and marital status. While study findings are exploratory and lack generalizability they do suggest that the decision tree model holds promise as an effective classification approach for identifying risk factors for drug use. Convergent with prior research in Western contexts is that age of drug use initiation was a critical factor predicting a substance use disorder.
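Sensitivity and specificity, the measures used here to compare the decision tree and MLR models on the holdout sample, can be computed from paired labels as sketched below. The labels are made up for illustration and do not come from the study.

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for binary labels coded 1 (at risk) and 0."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical holdout labels and model predictions.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
print(sensitivity_specificity(actual, predicted))  # (0.75, 0.75)
```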
Generalized and synthetic regression estimators for randomized branch sampling

Treesearch

David L. R. Affleck; Timothy G. Gregoire

2015-01-01

In felled-tree studies, ratio and regression estimators are commonly used to convert more readily measured branch characteristics to dry crown mass estimates. In some cases, data from multiple trees are pooled to form these estimates. This research evaluates the utility of both tactics in the estimation of crown biomass following randomized branch sampling (...
Cloud-Free Satellite Image Mosaics with Regression Trees and Histogram Matching.

Treesearch

E.H. Helmer; B. Ruefenacht

2005-01-01

Cloud-free optical satellite imagery simplifies remote sensing, but land-cover phenology limits existing solutions to persistent cloudiness to compositing temporally resolute, spatially coarser imagery. Here, a new strategy for developing cloud-free imagery at finer resolution permits simple automatic change detection. The strategy uses regression trees to predict...
Regression estimators for late-instar gypsy moth larvae at low population densities

Treesearch

W.E. Wallner; A.S. Devito; Stanley J. Zarnoch

1989-01-01

Two regression estimators were developed for determining densities of late-instar gypsy moth, Lymantria dispar (Lepidoptera: Lymantriidae), larvae from burlap band and pyrethrin spray counts on oak trees in Vermont, Massachusetts, Connecticut, and New York. Studies were conducted by marking larvae on individual burlap banded trees within 15...
What Satisfies Students?: Mining Student-Opinion Data with Regression and Decision Tree Analysis

ERIC Educational Resources Information Center

Thomas, Emily H.; Galambos, Nora

2004-01-01

To investigate how students' characteristics and experiences affect satisfaction, this study uses regression and decision tree analysis with the CHAID algorithm to analyze student-opinion data. A data mining approach identifies the specific aspects of students' university experience that most influence three measures of general satisfaction. The…
VCFtoTree: a user-friendly tool to construct locus-specific alignments and phylogenies from thousands of anthropologically relevant genome sequences.

PubMed

Xu, Duo; Jaber, Yousef; Pavlidis, Pavlos; Gokcumen, Omer

2017-09-26

Constructing alignments and phylogenies for a given locus from large genome sequencing studies with relevant outgroups allows novel evolutionary and anthropological insights. However, no user-friendly tool has been developed to integrate thousands of recently available and anthropologically relevant genome sequences to construct complete sequence alignments and phylogenies. Here, we provide VCFtoTree, a user-friendly tool with a graphical user interface that directly accesses online databases to download, parse and analyze genome variation data for regions of interest. Our pipeline combines popular sequence datasets and tree building algorithms with custom data parsing to generate accurate alignments and phylogenies using all the individuals from the 1000 Genomes Project, Neanderthal and Denisovan genomes, as well as reference genomes of Chimpanzee and Rhesus Macaque. It can also be applied to other phased human genomes, as well as genomes from other species. The output of our pipeline includes an alignment in FASTA format and a tree file in Newick format. VCFtoTree fulfills the increasing demand for constructing alignments and phylogenies for a given locus from thousands of available genomes. Our software provides a user-friendly interface for a wider audience without prerequisite knowledge in programming. VCFtoTree can be accessed from https://github.com/duoduoo/VCFtoTree_3.0.0 .
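For readers unfamiliar with the Newick tree format mentioned in this abstract, the sketch below serializes a nested-tuple tree into Newick text. The topology and the function name are illustrative assumptions; this is not VCFtoTree code or output.

```python
def to_newick(node):
    """Serialize a nested-tuple tree into Newick text (without the trailing ';')."""
    if isinstance(node, str):
        return node  # a leaf is written as its name
    return "(" + ",".join(to_newick(child) for child in node) + ")"

# Hypothetical topology, for illustration only.
tree = ((("Human", "Neanderthal"), "Chimpanzee"), "Rhesus")
print(to_newick(tree) + ";")  # (((Human,Neanderthal),Chimpanzee),Rhesus);
```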
Predicting volume of distribution with decision tree-based regression methods using predicted tissue:plasma partition coefficients.

PubMed

Freitas, Alex A; Limbu, Kriti; Ghafourian, Taravat

2015-01-01

Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). Hence, the pros and cons of several different types of decision tree-based regression methods have been discussed. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p) - often used in physiologically-based pharmacokinetics. Therefore, this work has assessed whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values. Comparison of the models that used only molecular descriptors, in particular, the Bagging decision tree (mean fold error of 2.33), with those employing predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that the use of predicted Kt:p values as descriptors may be beneficial for accurate prediction of Vss using decision trees if prior feature selection is applied. Decision tree based models presented in this work have an accuracy that is reasonable and similar to the accuracy of reported Vss inter-species extrapolations in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that are able to integrate large and varied sources of data and flexible non-linear data mining methods such as decision trees, which can produce interpretable models. Graphical abstract: Decision trees for the prediction of tissue partition coefficient and volume of distribution of drugs.
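The abstract reports accuracy as a mean fold error but does not give the formula. One common definition, assumed here and not confirmed by the source, is ten raised to the mean absolute log10 ratio of predicted to observed values:

```python
import math

def mean_fold_error(predicted, observed):
    """Assumed definition: 10 ** mean(|log10(predicted / observed)|)."""
    logs = [abs(math.log10(p / o)) for p, o in zip(predicted, observed)]
    return 10 ** (sum(logs) / len(logs))

# Two hypothetical predictions, each off by a factor of 2 (one high, one low),
# give a mean fold error of approximately 2.
print(mean_fold_error([2.0, 0.5], [1.0, 1.0]))
```

Under this definition a value of 1 means perfect prediction, and over- and under-prediction by the same factor are penalized equally.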
Regression analysis using dependent Polya trees.

PubMed

Schörgendorfer, Angela; Branscum, Adam J

2013-11-30

Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.
Comparison of modeling methods to predict the spatial distribution of deep-sea coral and sponge in the Gulf of Alaska

NASA Astrophysics Data System (ADS)

Rooper, Christopher N.; Zimmermann, Mark; Prescott, Megan M.

2017-08-01

Deep-sea coral and sponge ecosystems are widespread throughout most of Alaska's marine waters, and are associated with many different species of fishes and invertebrates. These ecosystems are vulnerable to the effects of commercial fishing activities and climate change. We compared four commonly used species distribution models (general linear models, generalized additive models, boosted regression trees and random forest models) and an ensemble model to predict the presence or absence and abundance of six groups of benthic invertebrate taxa in the Gulf of Alaska. All four model types performed adequately on training data for predicting presence and absence, with regression forest models having the best overall performance measured by the area under the receiver-operating-curve (AUC). The models also performed well on the test data for presence and absence with average AUCs ranging from 0.66 to 0.82. For the test data, ensemble models performed the best. For abundance data, there was an obvious demarcation in performance between the two regression-based methods (general linear models and generalized additive models), and the tree-based models. The boosted regression tree and random forest models out-performed the other models by a wide margin on both the training and testing data. However, there was a significant drop-off in performance for all models of invertebrate abundance ( 50%) when moving from the training data to the testing data. Ensemble model performance was between the tree-based and regression-based methods. The maps of predictions from the models for both presence and abundance agreed very well across model types, with an increase in variability in predictions for the abundance data. 
We conclude that where data conforms well to the modeled distribution (such as the presence-absence data and binomial distribution in this study), the four types of models will provide similar results, although the regression-type models may be more consistent with biological theory. For data with highly zero-inflated distributions and non-normal distributions such as the abundance data from this study, the tree-based methods performed better. Ensemble models that averaged predictions across the four model types, performed better than the GLM or GAM models but slightly poorer than the tree-based methods, suggesting ensemble models might be more robust to overfitting than tree methods, while mitigating some of the disadvantages in predictive performance of regression methods.
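The ensemble described here averages predictions across the four model types. A minimal sketch of that averaging step follows; the model names are labels only, the prediction values are invented, and equal weighting is an assumption.

```python
def ensemble_average(predictions_by_model):
    """Average per-site predictions across models (equal weights assumed)."""
    per_model = list(predictions_by_model.values())
    return [sum(values) / len(values) for values in zip(*per_model)]

# Hypothetical presence probabilities at two sites from four model types.
predictions = {
    "glm": [0.25, 1.0],
    "gam": [0.5, 0.5],
    "brt": [0.75, 0.75],
    "rf": [0.5, 0.75],
}
print(ensemble_average(predictions))  # [0.5, 0.75]
```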
DIF Trees: Using Classification Trees to Detect Differential Item Functioning

ERIC Educational Resources Information Center

Vaughn, Brandon K.; Wang, Qiu

2010-01-01

A nonparametric tree classification procedure is used to detect differential item functioning for items that are dichotomously scored. Classification trees are shown to be an alternative procedure to detect differential item functioning other than the use of traditional Mantel-Haenszel and logistic regression analysis. A nonparametric…
Design of Probabilistic Random Forests with Applications to Anticancer Drug Sensitivity Prediction

PubMed Central

Rahman, Raziur; Haider, Saad; Ghosh, Souparno; Pal, Ranadip

2015-01-01

Random forests consisting of an ensemble of regression trees with equal weights are frequently used for design of predictive models. In this article, we consider an extension of the methodology by representing the regression trees in the form of probabilistic trees and analyzing the nature of heteroscedasticity. The probabilistic tree representation allows for analytical computation of confidence intervals (CIs), and the tree weight optimization is expected to provide stricter CIs with comparable performance in mean error. We approached the ensemble of probabilistic trees’ prediction from the perspectives of a mixture distribution and as a weighted sum of correlated random variables. We applied our methodology to the drug sensitivity prediction problem on synthetic and cancer cell line encyclopedia dataset and illustrated that tree weights can be selected to reduce the average length of the CI without increase in mean error. PMID:27081304
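Viewing the ensemble prediction as a mixture distribution, as this abstract describes, gives a closed form for its mean and variance from each tree's mean, variance, and weight. The sketch below shows only that standard mixture identity; the numbers are invented and the paper's weight optimization and the correlated-sum view are not reproduced.

```python
def mixture_mean_variance(weights, means, variances):
    """Mean and variance of a mixture with the given component weights."""
    mean = sum(w * m for w, m in zip(weights, means))
    # E[Y^2] of the mixture is the weighted sum of each component's E[Y^2].
    second_moment = sum(w * (v + m * m)
                        for w, m, v in zip(weights, means, variances))
    return mean, second_moment - mean * mean

# Two hypothetical probabilistic trees with equal weights.
print(mixture_mean_variance([0.5, 0.5], [1.0, 3.0], [1.0, 1.0]))  # (2.0, 2.0)
```

The mixture variance exceeds the average tree variance whenever the tree means disagree, which is one reason changing the tree weights can change the width of the resulting interval.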

Automatic creation of object hierarchies for ray tracing

NASA Technical Reports Server (NTRS)

Goldsmith, Jeffrey; Salmon, John

1987-01-01

Various methods for evaluating generated trees are proposed. The use of the hierarchical extent method of Rubin and Whitted (1980) to find the objects that will be hit by a ray is examined. This method employs tree searching; the construction of a tree of bounding volumes in order to determine the number of objects that will be hit by a ray is discussed. A tree generation algorithm, which uses a heuristic tree search strategy, is described. The effects of shuffling and sorting on the input data are investigated. The cost of inserting an object into the hierarchy during the construction of a tree algorithm is estimated. The steps involved in estimating the number of intersection calculations are presented.
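The abstract mentions estimating the cost of inserting an object into the hierarchy but gives no formula. A plausible sketch, assuming axis-aligned bounding boxes and using the growth in surface area of a node's bounding volume as the cost proxy, is shown below; both choices and all names are assumptions of this example.

```python
def union(a, b):
    """Smallest axis-aligned box enclosing boxes a and b, each (min_xyz, max_xyz)."""
    (amin, amax), (bmin, bmax) = a, b
    return (tuple(map(min, amin, bmin)), tuple(map(max, amax, bmax)))

def surface_area(box):
    """Surface area of an axis-aligned box."""
    lo, hi = box
    dx, dy, dz = (h - l for l, h in zip(lo, hi))
    return 2 * (dx * dy + dy * dz + dz * dx)

def insertion_cost(node_box, new_box):
    """Assumed cost proxy: growth in the node's surface area after insertion."""
    return surface_area(union(node_box, new_box)) - surface_area(node_box)

node = ((0, 0, 0), (1, 1, 1))
print(insertion_cost(node, ((1, 0, 0), (2, 1, 1))))  # 4
print(insertion_cost(node, ((0, 0, 0), (1, 1, 1))))  # 0
```

A greedy builder could descend into the child with the smallest such increase at each level, which is one way to realize a heuristic tree search.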
Regression Trees Identify Relevant Interactions: Can This Improve the Predictive Performance of Risk Adjustment?

PubMed

Buchner, Florian; Wasem, Jürgen; Schillo, Sonja

2017-01-01

Risk equalization formulas have been refined since their introduction about two decades ago. Because of the complexity and the abundance of possible interactions between the variables used, hardly any interactions are considered. A regression tree is used to systematically search for interactions, a methodologically new approach in risk equalization. Analyses are based on a data set of nearly 2.9 million individuals from a major German social health insurer. A two-step approach is applied: In the first step a regression tree is built on the basis of the learning data set. Terminal nodes characterized by more than one morbidity-group-split represent interaction effects of different morbidity groups. In the second step the 'traditional' weighted least squares regression equation is expanded by adding interaction terms for all interactions detected by the tree, and regression coefficients are recalculated. The resulting risk adjustment formula shows an improvement in the adjusted R² from 25.43% to 25.81% on the evaluation data set. Predictive ratios are calculated for subgroups affected by the interactions. The R² improvement detected is only marginal. According to the sample level performance measures used, omitting a considerable number of morbidity interactions causes no relevant loss in accuracy. Copyright © 2015 John Wiley & Sons, Ltd.
Mortality predictions of fire-injured large Douglas-fir and ponderosa pine in Oregon and Washington, USA

Treesearch

Lisa M. Ganio; Robert A. Progar

2017-01-01

Wild and prescribed fire-induced injury to forest trees can produce immediate or delayed tree mortality but fire-injured trees can also survive. Land managers use logistic regression models that incorporate tree-injury variables to discriminate between fatally injured trees and those that will survive. We used data from 4024 ponderosa pine (Pinus ponderosa...
Effects of Phylogenetic Tree Style on Student Comprehension

NASA Astrophysics Data System (ADS)

Dees, Jonathan Andrew

Phylogenetic trees are powerful tools of evolutionary biology that have become prominent across the life sciences. Consequently, learning to interpret and reason from phylogenetic trees is now an essential component of biology education. However, students often struggle to understand these diagrams, even after explicit instruction. One factor that has been observed to affect student understanding of phylogenetic trees is style (i.e., diagonal or bracket). The goal of this dissertation research was to systematically explore effects of style on student interpretations and construction of phylogenetic trees in the context of an introductory biology course. Before instruction, students were significantly more accurate with bracket phylogenetic trees for a variety of interpretation and construction tasks. Explicit instruction that balanced the use of diagonal and bracket phylogenetic trees mitigated some, but not all, style effects. After instruction, students were significantly more accurate for interpretation tasks involving taxa relatedness and construction exercises when using the bracket style. Based on this dissertation research and prior studies on style effects, I advocate for introductory biology instructors to use only the bracket style. Future research should examine causes of style effects and variables other than style to inform the development of research-based instruction that best supports student understanding of phylogenetic trees.
Using Evidence-Based Decision Trees Instead of Formulas to Identify At-Risk Readers. REL 2014-036

ERIC Educational Resources Information Center

Koon, Sharon; Petscher, Yaacov; Foorman, Barbara R.

2014-01-01

This study examines whether the classification and regression tree (CART) model improves the early identification of students at risk for reading comprehension difficulties compared with the more difficult to interpret logistic regression model. CART is a type of predictive modeling that relies on nonparametric techniques. It presents results in…
Identification of Sexually Abused Female Adolescents at Risk for Suicidal Ideations: A Classification and Regression Tree Analysis

ERIC Educational Resources Information Center

Brabant, Marie-Eve; Hebert, Martine; Chagnon, Francois

2013-01-01

This study explored the clinical profiles of 77 female teenager survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideations. Analyses revealed that 64% of participants experienced suicidal ideations. Findings from classification and regression tree analysis indicated that depression,…
Forest type mapping of the Interior West

Treesearch

Bonnie Ruefenacht; Gretchen G. Moisen; Jock A. Blackard

2004-01-01

This paper develops techniques for the mapping of forest types in Arizona, New Mexico, and Wyoming. The methods involve regression-tree modeling using a variety of remote sensing and GIS layers along with Forest Inventory Analysis (FIA) point data. Regression-tree modeling is a fast and efficient technique of estimating variables for large data sets with high accuracy...
What Satisfies Students? Mining Student-Opinion Data with Regression and Decision-Tree Analysis. AIR 2002 Forum Paper.

ERIC Educational Resources Information Center

Thomas, Emily H.; Galambos, Nora

To investigate how students' characteristics and experiences affect satisfaction, this study used regression and decision-tree analysis with the CHAID algorithm to analyze student opinion data from a sample of 1,783 college students. A data-mining approach identifies the specific aspects of students' university experience that most influence three…
Using the PDD Behavior Inventory as a Level 2 Screener: A Classification and Regression Trees Analysis

ERIC Educational Resources Information Center

Cohen, Ira L.; Liu, Xudong; Hudson, Melissa; Gillis, Jennifer; Cavalari, Rachel N. S.; Romanczyk, Raymond G.; Karmel, Bernard Z.; Gardner, Judith M.

2016-01-01

In order to improve discrimination accuracy between Autism Spectrum Disorder (ASD) and similar neurodevelopmental disorders, a data mining procedure, Classification and Regression Trees (CART), was used on a large multi-site sample of PDD Behavior Inventory (PDDBI) forms on children with and without ASD. Discrimination accuracy exceeded 80%,…
Determination of the chemical parameters and manufacturer of divins from their broadband transmission spectra

NASA Astrophysics Data System (ADS)

Khodasevich, M. A.; Sinitsyn, G. V.; Skorbanova, E. A.; Rogovaya, M. V.; Kambur, E. I.; Aseev, V. A.

2016-06-01

Analysis of multiparametric data on transmission spectra of 24 divins (Moldovan cognacs) in the 190-2600 nm range allows identification of outliers and their removal from a sample under study in the following consideration. The principal component analysis and classification tree with a single-rank predictor constructed in the 2D space of principal components allow classification of divin manufacturers. It is shown that the accuracy of syringaldehyde, ethyl acetate, vanillin, and gallic acid concentrations in divins calculated with the regression to latent structures depends on the sample volume and is 3, 6, 16, and 20%, respectively, which is acceptable for the application.
Obesity as a risk factor for developing functional limitation among older adults: A conditional inference tree analysis.

PubMed

Cheng, Feon W; Gao, Xiang; Bao, Le; Mitchell, Diane C; Wood, Craig; Sliwinski, Martin J; Smiciklas-Wright, Helen; Still, Christopher D; Rolston, David D K; Jensen, Gordon L

2017-07-01

To examine the risk factors of developing functional decline and make probabilistic predictions by using a tree-based method that allows higher order polynomials and interactions of the risk factors. The conditional inference tree analysis, a data mining approach, was used to construct a risk stratification algorithm for developing functional limitation based on BMI and other potential risk factors for disability in 1,951 older adults without functional limitations at baseline (baseline age 73.1 ± 4.2 y). We also analyzed the data with multivariate stepwise logistic regression and compared the two approaches (e.g., cross-validation). Over a mean of 9.2 ± 1.7 years of follow-up, 221 individuals developed functional limitation. Higher BMI, age, and comorbidity were consistently identified as significant risk factors for functional decline using both methods. Based on these factors, individuals were stratified into four risk groups via the conditional inference tree analysis. Compared to the low-risk group, all other groups had a significantly higher risk of developing functional limitation. The odds ratio comparing two extreme categories was 9.09 (95% confidence interval: 4.68, 17.6). Higher BMI, age, and comorbid disease were consistently identified as significant risk factors for functional decline among older individuals across all approaches and analyses. © 2017 The Obesity Society.
In silico prediction of toxicity of phenols to Tetrahymena pyriformis by using genetic algorithm and decision tree-based modeling approach.

PubMed

Abbasitabar, Fatemeh; Zare-Shahabadi, Vahid

2017-04-01

Risk assessment of chemicals is an important issue in environmental protection; however, experimental data are lacking for a large number of end-points. The experimental determination of the toxicity of chemicals is costly and time-consuming. In silico tools such as quantitative structure-toxicity relationship (QSTR) models, which are constructed on the basis of computational molecular descriptors, can predict missing data for toxic end-points for existing or even not yet synthesized chemicals. Phenol derivatives are known to be aquatic pollutants. With this background, we aimed to develop an accurate and reliable QSTR model for the prediction of toxicity of 206 phenols to Tetrahymena pyriformis. A multiple linear regression (MLR)-based QSTR was obtained using a powerful descriptor selection tool named the Memorized_ACO algorithm. Statistical parameters of the model were 0.72 and 0.68 for R² (training) and R² (test), respectively. To develop a high-quality QSTR model, classification and regression tree (CART) was employed. Two approaches were considered: (1) phenols were classified into different modes of action using CART and (2) the phenols in the training set were partitioned into several subsets by a tree in such a manner that in each subset, a high-quality MLR could be developed. For the first approach, the statistical parameters of the resultant QSTR model were improved to 0.83 and 0.75 for R² (training) and R² (test), respectively. A genetic algorithm was employed in the second approach to obtain an optimal tree, and it was shown that the final QSTR model provided excellent prediction accuracy for the training and test sets (R² (training) and R² (test) were 0.91 and 0.93, respectively). The mean absolute error for the test set was computed as 0.1615. Copyright © 2016 Elsevier Ltd. All rights reserved.
Parallel Continuous Flow: A Parallel Suffix Tree Construction Tool for Whole Genomes

PubMed Central

Farreras, Montse

2014-01-01

The construction of suffix trees for very long sequences is essential for many applications, and it plays a central role in the bioinformatic domain. With the advent of modern sequencing technologies, biological sequence databases have grown dramatically. The methodologies required to analyze these data have also become more complex every day, requiring fast queries to multiple genomes. In this article, we present parallel continuous flow (PCF), a parallel suffix tree construction method that is suitable for very long genomes. We tested our method for the suffix tree construction of the entire human genome, about 3 GB. We showed that PCF can scale gracefully as the size of the input genome grows. Our method can work with an efficiency of 90% with 36 processors and 55% with 172 processors. We can index the human genome in 7 minutes using 172 processes. PMID:24597675
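The efficiency figures quoted in this abstract can be converted into speedups with one line of arithmetic. The sketch below assumes the conventional definition, efficiency = speedup / number of processors, which the abstract does not state explicitly; the function name is illustrative.

```python
def speedup(efficiency: float, processors: int) -> float:
    """Speedup implied by a parallel efficiency, assuming E = S / p."""
    return efficiency * processors


# Efficiency and processor counts as quoted in the abstract.
for eff, procs in [(0.90, 36), (0.55, 172)]:
    s = speedup(eff, procs)
    print(f"{procs} processors at {eff:.0%} efficiency: speedup of about {s:.1f}x")
```

Under that assumption the two reported configurations correspond to speedups of roughly 32x and 95x over a single processor.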
Fast Construction of Near Parsimonious Hybridization Networks for Multiple Phylogenetic Trees.

PubMed

Mirzaei, Sajad; Wu, Yufeng

2016-01-01

Hybridization networks represent plausible evolutionary histories of species that are affected by reticulate evolutionary processes. An established computational problem on hybridization networks is constructing the most parsimonious hybridization network such that each of the given phylogenetic trees (called gene trees) is "displayed" in the network. There have been several previous approaches, including an exact method and several heuristics, for this NP-hard problem. However, the exact method is only applicable to a limited range of data, and heuristic methods can be less accurate and also slow sometimes. In this paper, we develop a new algorithm for constructing near parsimonious networks for multiple binary gene trees. This method is more efficient for large numbers of gene trees than previous heuristics. This new method also produces more parsimonious results on many simulated datasets as well as a real biological dataset than a previous method. We also show that our method produces topologically more accurate networks for many datasets.
Graphical fault tree analysis for fatal falls in the construction industry.

PubMed

Chi, Chia-Fen; Lin, Syuan-Zih; Dewi, Ratna Sari

2014-11-01

The current study applied a fault tree analysis to represent the causal relationships among events and causes that contributed to fatal falls in the construction industry. Four hundred and eleven work-related fatalities in the Taiwanese construction industry were analyzed in terms of age, gender, experience, falling site, falling height, company size, and the causes for each fatality. Given that most fatal accidents involve multiple events, the current study coded up to a maximum of three causes for each fall fatality. After the Boolean algebra and minimal cut set analyses, accident causes associated with each falling site can be presented as a fault tree to provide an overview of the basic causes, which could trigger fall fatalities in the construction industry. Graphical icons were designed for each falling site along with the associated accident causes to illustrate the fault tree in a graphical manner. A graphical fault tree can improve inter-disciplinary discussion of risk management and the communication of accident causation to first line supervisors. Copyright © 2014 Elsevier Ltd. All rights reserved.
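The minimal cut set step mentioned in this abstract can be sketched in a few lines. This is only an illustration of the general Boolean technique, not the authors' procedure: the gate encoding and the event names (no_harness, unguarded_opening, and so on) are hypothetical and are not taken from the study's data.

```python
from itertools import product


def minimize(sets):
    """Drop every cut set that strictly contains another cut set (absorption)."""
    return {s for s in sets if not any(other < s for other in sets)}


def cut_sets(node):
    """Return the minimal cut sets of a fault tree.

    A node is either a basic event (a string) or a tuple
    (gate, children) with gate equal to "AND" or "OR".
    """
    if isinstance(node, str):
        return {frozenset([node])}
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any child's cut set triggers the gate.
        combined = set().union(*child_sets)
    elif gate == "AND":
        # One cut set from every child must occur together.
        combined = {frozenset().union(*combo) for combo in product(*child_sets)}
    else:
        raise ValueError(f"unknown gate: {gate}")
    return minimize(combined)


# Hypothetical fault tree for a fatal fall (illustrative events only).
top = ("OR", [
    ("AND", ["no_harness", ("OR", ["unguarded_opening", "scaffold_collapse"])]),
    ("AND", ["no_harness", "unguarded_opening", "bad_weather"]),
    "ladder_failure",
])

for cs in sorted(sorted(s) for s in cut_sets(top)):
    print(cs)
```

The three-event combination is absorbed because it contains the smaller set {no_harness, unguarded_opening}, which is what makes the remaining sets minimal.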
Classification and regression tree analysis vs. multivariable linear and logistic regression methods as statistical tools for studying haemophilia.

PubMed

Henrard, S; Speybroeck, N; Hermans, C

2015-11-01

Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have been published to date. Two previously published examples of CART analysis in this field are didactically explained in detail. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
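The repeated partitioning described in this abstract can be illustrated with a small regression tree. The sketch below is a simplification under stated assumptions: a single numeric predictor, the sum of squared errors as the splitting criterion (the abstract only says "a certain criterion"), and no pruning. The function names and the toy data are illustrative and do not come from the paper.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty sequence)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the best split x <= threshold.

    The threshold is None when no split reduces the squared error.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, sse(ys))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal x values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


def grow(xs, ys, max_depth=2, min_leaf=2):
    """Recursively partition the sample; each leaf predicts its mean outcome."""
    leaf = {"predict": mean(ys)}
    if max_depth == 0 or len(ys) < 2 * min_leaf:
        return leaf
    threshold, _ = best_split(xs, ys)
    if threshold is None:
        return leaf
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    if len(left) < min_leaf or len(right) < min_leaf:
        return leaf
    lx, ly = zip(*left)
    rx, ry = zip(*right)
    return {
        "threshold": threshold,
        "left": grow(lx, ly, max_depth - 1, min_leaf),
        "right": grow(rx, ry, max_depth - 1, min_leaf),
    }


def predict(tree, x):
    """Follow the splits down to a leaf and return its mean."""
    while "threshold" in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["predict"]


# Toy data: two clearly separated groups of a continuous outcome.
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
tree = grow(xs, ys, max_depth=1)
print(tree["threshold"], predict(tree, 2), predict(tree, 11))
```

A classification tree follows the same recursion but replaces the squared-error criterion with an impurity measure for categorical outcomes, and leaves predict a class rather than a mean.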
Application of the Feynman-tree theorem together with BCFW recursion relations

NASA Astrophysics Data System (ADS)

Maniatis, M.

2018-03-01

Recently, it has been shown that on-shell scattering amplitudes can be constructed by the Feynman-tree theorem combined with the BCFW recursion relations. Since the BCFW relations are restricted to tree diagrams, the preceding application of the Feynman-tree theorem is essential. In this way, amplitudes can be constructed by on-shell and gauge-invariant tree amplitudes. Here, we want to apply this method to the electron-photon vertex correction. We present all the single, double, and triple phase-space tensor integrals explicitly and show that the sum of amplitudes coincides with the result of the conventional calculation of a virtual loop correction.
Hyperspectral Analysis of Soil Nitrogen, Carbon, Carbonate, and Organic Matter Using Regression Trees

PubMed Central

Gmur, Stephan; Vogt, Daniel; Zabowski, Darlene; Moskal, L. Monika

2012-01-01

The characterization of soil attributes using hyperspectral sensors has revealed patterns in soil spectra that are known to respond to mineral composition, organic matter, soil moisture and particle size distribution. Soil samples from different soil horizons of replicated soil series from sites located within Washington and Oregon were analyzed with the FieldSpec Spectroradiometer to measure their spectral signatures across the electromagnetic range of 400 to 1,000 nm. Similarity rankings of individual soil samples reveal differences between replicate series as well as samples within the same replicate series. Using classification and regression tree statistical methods, regression trees were fitted to each spectral response using concentrations of nitrogen, carbon, carbonate and organic matter as the response variables. Statistics resulting from fitted trees were: nitrogen R² 0.91 (p < 0.01) at 403, 470, 687, and 846 nm spectral band widths, carbonate R² 0.95 (p < 0.01) at 531 and 898 nm band widths, total carbon R² 0.93 (p < 0.01) at 400, 409, 441 and 907 nm band widths, and organic matter R² 0.98 (p < 0.01) at 300, 400, 441, 832 and 907 nm band widths. Use of the 400 to 1,000 nm electromagnetic range utilizing regression trees provided a powerful, rapid and inexpensive method for assessing nitrogen, carbon, carbonate and organic matter for upper soil horizons in a nondestructive method. PMID:23112620
Risk Factors of Falls in Community-Dwelling Older Adults: Logistic Regression Tree Analysis

ERIC Educational Resources Information Center

Yamashita, Takashi; Noe, Douglas A.; Bailer, A. John

2012-01-01

Purpose of the Study: A novel logistic regression tree-based method was applied to identify fall risk factors and possible interaction effects of those risk factors. Design and Methods: A nationally representative sample of American older adults aged 65 years and older (N = 9,592) in the Health and Retirement Study 2004 and 2006 modules was used.…
Analytical framework for reconstructing heterogeneous environmental variables from mammal community structure.

PubMed

Louys, Julien; Meloro, Carlo; Elton, Sarah; Ditchfield, Peter; Bishop, Laura C

2015-01-01

We test the performance of two models that use mammalian communities to reconstruct multivariate palaeoenvironments. While both models exploit the correlation between mammal communities (defined in terms of functional groups) and arboreal heterogeneity, the first uses a multiple multivariate regression of community structure and arboreal heterogeneity, while the second uses a linear regression of the principal components of each ecospace. The success of these methods means the palaeoenvironment of a particular locality can be reconstructed in terms of the proportions of heavy, moderate, light, and absent tree canopy cover. The linear regression is less biased, and more precisely and accurately reconstructs heavy tree canopy cover than the multiple multivariate model. However, the multiple multivariate model performs better than the linear regression for all other canopy cover categories. Both models consistently perform better than randomly generated reconstructions. We apply both models to the palaeocommunity of the Upper Laetolil Beds, Tanzania. Our reconstructions indicate that there was very little heavy tree cover at this site (likely less than 10%), with the palaeo-landscape instead comprising a mixture of light and absent tree cover. These reconstructions help resolve the previous conflicting palaeoecological reconstructions made for this site. Copyright © 2014 Elsevier Ltd. All rights reserved.

Modeling vertebrate diversity in Oregon using satellite imagery

NASA Astrophysics Data System (ADS)

Cablk, Mary Elizabeth

Vertebrate diversity was modeled for the state of Oregon using a parametric approach to regression tree analysis. This exploratory data analysis effectively modeled the non-linear relationships between vertebrate richness and phenology, terrain, and climate. Phenology was derived from time-series NOAA-AVHRR satellite imagery for the year 1992 using two methods: principal component analysis and derivation of EROS data center greenness metrics. These two measures of spatial and temporal vegetation condition incorporated the critical temporal element in this analysis. The first three principal components were shown to contain spatial and temporal information about the landscape and discriminated phenologically distinct regions in Oregon. Principal components 2 and 3, 6 greenness metrics, elevation, slope, aspect, annual precipitation, and annual seasonal temperature difference were investigated as correlates to amphibians, birds, all vertebrates, reptiles, and mammals. Variation explained by the regression tree for each taxon was: amphibians (91%), birds (67%), all vertebrates (66%), reptiles (57%), and mammals (55%). Spatial statistics were used to quantify the pattern of each taxon and assess the validity of the resulting predictions from regression tree models. Regression tree analysis was relatively robust against spatial autocorrelation in the response data and graphical results indicated models were well fit to the data.
The scientific dating of standing buildings.

PubMed

Alcock, Nathaniel W

2017-11-17

The techniques of dendrochronology (tree-ring dating) and radiocarbon (14C) dating are described, as they are applied to historic buildings. Both rely on determining the felling dates of the trees used in their construction. For dendrochronology, the construction of master chronologies and the matching of individual ring-width sequences to them is described and, for radiocarbon dating, the use of tree-ring results in calibration. Results of dating are discussed, ranging from the cathedrals of Peterborough and Beauvais and the development of crown-post roof structures, to the dating and identification of standing medieval peasant houses, particularly those built using cruck construction.
A Deliberate Practice Approach to Teaching Phylogenetic Analysis

PubMed Central

Hobbs, F. Collin; Johnson, Daniel J.; Kearns, Katherine D.

2013-01-01

One goal of postsecondary education is to assist students in developing expert-level understanding. Previous attempts to encourage expert-level understanding of phylogenetic analysis in college science classrooms have largely focused on isolated, or “one-shot,” in-class activities. Using a deliberate practice instructional approach, we designed a set of five assignments for a 300-level plant systematics course that incrementally introduces the concepts and skills used in phylogenetic analysis. In our assignments, students learned the process of constructing phylogenetic trees through a series of increasingly difficult tasks; thus, skill development served as a framework for building content knowledge. We present results from 5 yr of final exam scores, pre- and postconcept assessments, and student surveys to assess the impact of our new pedagogical materials on student performance related to constructing and interpreting phylogenetic trees. Students improved in their ability to interpret relationships within trees and improved in several aspects related to between-tree comparisons and tree construction skills. Student feedback indicated that most students believed our approach prepared them to engage in tree construction and gave them confidence in their abilities. Overall, our data confirm that instructional approaches implementing deliberate practice address student misconceptions, improve student experiences, and foster deeper understanding of difficult scientific concepts. PMID:24297294
Importance of physical and hydraulic characteristics to unionid mussels: A retrospective analysis in a reach of large river

USGS Publications Warehouse

Zigler, S.J.; Newton, T.J.; Steuer, J.J.; Bartsch, M.R.; Sauer, J.S.

2008-01-01

Interest in understanding physical and hydraulic factors that might drive distribution and abundance of freshwater mussels has been increasing due to their decline throughout North America. We assessed whether the spatial distribution of unionid mussels could be predicted from physical and hydraulic variables in a reach of the Upper Mississippi River. Classification and regression tree (CART) models were constructed using mussel data compiled from various sources and explanatory variables derived from GIS coverages. Prediction success of CART models for presence-absence of mussels ranged from 71 to 76% across three gears (brail, sled-dredge, and dive-quadrat), and the models explained 51% of the deviance in abundance. Models were largely driven by shear stress and substrate stability variables, but interactions with simple physical variables, especially slope, were also important. Geospatial models, which were based on tree model results, predicted few mussels in poorly connected backwater areas (e.g., floodplain lakes) and the navigation channel, whereas main channel border areas with high geomorphic complexity (e.g., river bends, islands, side channel entrances) and small side channels were typically favorable to mussels. Moreover, bootstrap aggregation of discharge-specific regression tree models of dive-quadrat data indicated that variables measured at low discharge were about 25% more predictive (PMSE = 14.8) than variables measured at median discharge (PMSE = 20.4), with high discharge (PMSE = 17.1) variables intermediate. This result suggests that episodic events such as droughts and floods were important in structuring mussel distributions. Although the substantial mussel and ancillary data in our study reach is unusual, our approach to develop exploratory statistical and geospatial models should be useful even when data are more limited. © 2007 Springer Science+Business Media B.V.
snpTree--a web-server to identify and construct SNP trees from whole genome sequence data.

PubMed

Leekitcharoenphon, Pimlapas; Kaas, Rolf S; Thomsen, Martin Christen Frølund; Friis, Carsten; Rasmussen, Simon; Aarestrup, Frank M

2012-01-01

Advances in, and the decreasing cost of, whole genome sequencing (WGS) will soon make this technology available for routine infectious disease epidemiology. In epidemiological studies, outbreak isolates have very little diversity and require extensive genomic analysis to differentiate and classify isolates. One of the successfully and broadly used methods is analysis of single nucleotide polymorphisms (SNPs). Currently, there are different tools and methods to identify SNPs, including various options and cut-off values. Furthermore, all current methods require bioinformatic skills. Thus, we lack a standard and simple automatic tool to determine SNPs and construct a phylogenetic tree from WGS data. Here we introduce snpTree, a server for automatic online SNP analysis. This tool is composed of different SNP analysis suites and Perl and Python scripts. snpTree can identify SNPs and construct phylogenetic trees from WGS as well as from assembled genomes or contigs. WGS data in fastq format are aligned to reference genomes by BWA while contigs in fasta format are processed by Nucmer. SNPs are concatenated based on position on the reference genome and a tree is constructed from the concatenated SNPs using FastTree and a Perl script. The online server was implemented with HTML, Java and Python scripts. The server was evaluated using four published bacterial WGS data sets (V. cholerae, S. aureus CC398, S. Typhimurium and M. tuberculosis). The evaluation results for the first three cases were consistent and concordant for both raw reads and assembled genomes. In the latter case the original publication involved extensive filtering of SNPs, which could not be repeated using snpTree. The snpTree server is an easy-to-use option for rapid, standardised and automatic SNP analysis in epidemiological studies, also for users with limited bioinformatic experience. The web server is freely accessible at http://www.cbs.dtu.dk/services/snpTree-1.0/.
Distribution of cavity trees in midwestern old-growth and second-growth forests

Treesearch

Zhaofei Fan; Stephen R. Shifley; Martin A. Spetich; Frank R. Thompson; David R. Larsen

2003-01-01

We used classification and regression tree analysis to determine the primary variables associated with the occurrence of cavity trees and the hierarchical structure among those variables. We applied that information to develop logistic models predicting cavity tree probability as a function of diameter, species group, and decay class. Inventories of cavity abundance in...
Distribution of cavity trees in midwestern old-growth and second-growth forests

Treesearch

Zhaofei Fan; Stephen R. Shifley; Martin A. Spetich; Frank R., III Thompson; David R. Larsen

2003-01-01

We used classification and regression tree analysis to determine the primary variables associated with the occurrence of cavity trees and the hierarchical structure among those variables. We applied that information to develop logistic models predicting cavity tree probability as a function of diameter, species group, and decay class. Inventories of cavity abundance in...
A hierarchical linear model for tree height prediction.

Treesearch

Vicente J. Monleon

2003-01-01

Measuring tree height is a time-consuming process. Often, tree diameter is measured and height is estimated from a published regression model. Trees used to develop these models are clustered into stands, but this structure is ignored and independence is assumed. In this study, hierarchical linear models that account explicitly for the clustered structure of the data...
Modeling individual tree survival

Treesearch

Quang V. Cao

2016-01-01

Information provided by growth and yield models is the basis for forest managers to make decisions on how to manage their forests. Among different types of growth models, whole-stand models offer predictions at stand level, whereas individual-tree models give detailed information at tree level. The well-known logistic regression is commonly used to predict tree...
Construction of a Species-Level Tree of Life for the Insects and Utility in Taxonomic Profiling

PubMed Central

Chesters, Douglas

2017-01-01

Although comprehensive phylogenies have proven an invaluable tool in ecology and evolution, their construction is made increasingly challenging both by the scale and structure of publicly available sequences. The distinct partition between gene-rich (genomic) and species-rich (DNA barcode) data is a feature of data that has been largely overlooked, yet presents a key obstacle to scaling supermatrix analysis. I present a phyloinformatics framework for draft construction of a species-level phylogeny of insects (Class Insecta). Matrix-building requires separately optimized pipelines for nuclear transcriptomic, mitochondrial genomic, and species-rich markers, whereas tree-building requires hierarchical inference in order to capture species-breadth while retaining deep-level resolution. The phylogeny of insects contains 49,358 species, 13,865 genera, and 760 families. Deep-level splits largely reflected previous findings for sections of the tree that are data rich or unambiguous, such as inter-ordinal Endopterygota and Dictyoptera, the recently evolved and relatively homogeneous Lepidoptera, Hymenoptera, Brachycera (Diptera), and Cucujiformia (Coleoptera). However, analysis of bias, matrix construction and gene-tree variation suggests confidence in some relationships (such as in Polyneoptera) is less than has been indicated by the matrix bootstrap method. To assess the utility of the insect tree as a tool in query profiling, several tree-based taxonomic assignment methods are compared. Using test data sets with existing taxonomic annotations, a tendency is observed for greater accuracy of species-level assignments when using a fixed comprehensive tree of life, in contrast to methods generating smaller de novo reference trees. Described herein is a solution to the discrepancy in the way data are fit into supermatrices. The resulting tree facilitates wider studies of insect diversification and application of advanced descriptions of diversity in community studies, among other presumed applications. PMID:27798407
Binary space partitioning trees and their uses

NASA Technical Reports Server (NTRS)

Bell, Bradley N.

1989-01-01

Binary Space Partitioning (BSP) trees have some qualities that make them useful in solving many graphics related problems. The purpose is to describe what a BSP tree is, and how it can be used to solve the problem of hidden surface removal, and constructive solid geometry. The BSP tree is based on the idea that a plane acting as a divider subdivides space into two parts with one being on the positive side and the other on the negative. A polygonal solid is then represented as the volume defined by the collective interior half spaces of the solid's bounding surfaces. The nature of how the tree is organized lends itself well for sorting polygons relative to an arbitrary point in 3 space. The speed at which the tree can be traversed for depth sorting is fast enough to provide hidden surface removal at interactive speeds. The fact that a BSP tree actually represents a polygonal solid as a bounded volume also makes it quite useful in performing the boolean operations used in constructive solid geometry. Due to the nature of the BSP tree, polygons can be classified as they are subdivided. The ability to classify polygons as they are subdivided can enhance the simplicity of implementing constructive solid geometry.
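The divide-and-sort idea in this abstract can be sketched in two dimensions, where line segments play the role of polygons and dividing lines play the role of planes. This is a simplified analogue under that assumption, not the report's implementation; all names and the sample walls are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def side(divider: Segment, p: Point) -> float:
    """Signed test: > 0 if p lies on the positive (left) side of the divider line."""
    (ax, ay), (bx, by) = divider
    return (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)


def split(seg: Segment, divider: Segment):
    """Split seg along the divider line; return (front_part, back_part)."""
    a, b = seg
    sa, sb = side(divider, a), side(divider, b)
    if sa >= 0 and sb >= 0:
        return seg, None  # entirely in front (or on the line)
    if sa <= 0 and sb <= 0:
        return None, seg  # entirely behind
    t = sa / (sa - sb)  # parameter of the crossing point along seg
    m = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
    return ((a, m), (m, b)) if sa > 0 else ((m, b), (a, m))


@dataclass
class Node:
    splitter: Segment
    front: Optional["Node"]
    back: Optional["Node"]


def build(segments: List[Segment]) -> Optional[Node]:
    """Use the first segment as divider and recurse on each half-space."""
    if not segments:
        return None
    splitter, rest = segments[0], segments[1:]
    front, back = [], []
    for seg in rest:
        f, b = split(seg, splitter)
        if f is not None:
            front.append(f)
        if b is not None:
            back.append(b)
    return Node(splitter, build(front), build(back))


def back_to_front(node: Optional[Node], eye: Point) -> List[Segment]:
    """List segments farthest-first relative to the eye (painter's order)."""
    if node is None:
        return []
    if side(node.splitter, eye) >= 0:  # eye is in front of the divider
        far, near = node.back, node.front
    else:
        far, near = node.front, node.back
    return back_to_front(far, eye) + [node.splitter] + back_to_front(near, eye)


# Three parallel walls at x = 0, x = 1 and x = -1.
A = ((0, -1), (0, 1))
B = ((1, -1), (1, 1))
C = ((-1, -1), (-1, 1))
tree = build([A, B, C])
print(back_to_front(tree, (5, 0)))
print(back_to_front(tree, (-5, 0)))
```

Because the tree is built once and only the traversal depends on the eye position, the same structure serves any viewpoint, which is the property the abstract credits for hidden surface removal at interactive speeds.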
ANTLR Tree Grammar Generator and Extensions

NASA Technical Reports Server (NTRS)

Craymer, Loring

2005-01-01

A computer program implements two extensions of ANTLR (Another Tool for Language Recognition), which is a set of software tools for translating source codes between different computing languages. ANTLR supports predicated-LL(k) lexer and parser grammars, a notation for annotating parser grammars to direct tree construction, and predicated tree grammars. [LL(k) signifies left-right, leftmost derivation with k tokens of look-ahead, referring to certain characteristics of a grammar.] One of the extensions is a syntax for tree transformations. The other extension is the generation of tree grammars from annotated parser or input tree grammars. These extensions can simplify the process of generating source-to-source language translators and they make possible an approach, called "polyphase parsing," to translation between computing languages. The typical approach to translator development is to identify high-level semantic constructs such as "expressions," "declarations," and "definitions" as fundamental building blocks in the grammar specification used for language recognition. The polyphase approach is to lump ambiguous syntactic constructs during parsing and then disambiguate the alternatives in subsequent tree transformation passes. Polyphase parsing is believed to be useful for generating efficient recognizers for C++ and other languages that, like C++, have significant ambiguities.
[Hyperspectral Estimation of Apple Tree Canopy LAI Based on SVM and RF Regression].

PubMed

Han, Zhao-ying; Zhu, Xi-cun; Fang, Xian-yi; Wang, Zhuo-yuan; Wang, Ling; Zhao, Geng-Xing; Jiang, Yuan-mao

2016-03-01

Leaf area index (LAI) is a dynamic index of crop population size. Hyperspectral technology can be used to estimate apple canopy LAI rapidly and nondestructively, and can provide a reference for monitoring tree growth and estimating yield. The study objects were Red Fuji apple trees in the full fruit-bearing period. Canopy spectral reflectance and LAI values of ninety apple trees were measured with the ASD Fieldspec3 spectrometer and the LAI-2200 in thirty orchards over two consecutive years in the Qixia research area of Shandong Province. The optimal vegetation indices were selected by correlation analysis of the original spectral reflectance and the vegetation indices. Models for predicting LAI were built with two multivariate regression methods, support vector machine (SVM) and random forest (RF). The new vegetation indices GNDVI527, NDVI676, RVI682, FD-NVI656 and GRVI517, together with the two main existing vegetation indices NDVI670 and NDVI705, correspond well with LAI. In the RF regression model, the calibration-set coefficient of determination (C-R2) of 0.920 and the validation-set coefficient of determination (V-R2) of 0.889 are higher than those of the SVM regression model by 0.045 and 0.033, respectively. The calibration-set root mean square error (C-RMSE) of 0.249 and the validation-set root mean square error (V-RMSE) of 0.236 are lower than those of the SVM regression model by 0.054 and 0.058, respectively. The relative analysis errors of the calibration set (C-RPD) and the validation set (V-RPD) reached 3.363 and 2.520, higher than those of the SVM regression model by 0.598 and 0.262, respectively. The trend-line slopes of the measured-versus-predicted scatterplots for the calibration and validation sets (C-S and V-S) are close to 1. The RF regression model gives better estimates than the SVM model and can be used to estimate the LAI of Red Fuji apple trees in the full fruit period.
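The three goodness-of-fit figures quoted in this abstract (R2, RMSE and RPD) can be computed as in the sketch below. The abstract does not give its formulas, so the definitions here are assumptions: R2 is taken as one minus the ratio of residual to total sum of squares, and RPD as the sample standard deviation of the observations divided by the RMSE, which is a common convention in spectroscopy. The numbers are invented toy values, not data from the study.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rpd(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Assumed definition: sample standard deviation of observations over RMSE."""
    mean_obs = sum(obs) / len(obs)
    sd = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / (len(obs) - 1))
    return sd / rmse(obs, pred)


# Invented toy LAI values for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
print(rmse(observed, predicted), r_squared(observed, predicted), rpd(observed, predicted))
```

Under these definitions a higher R2 and RPD and a lower RMSE all point the same way, which matches how the abstract ranks the RF model above the SVM model.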
Log and tree sawing times for hardwood mills

Treesearch

Everette D. Rast

1974-01-01

Data on 6,850 logs and 1,181 trees were analyzed to predict sawing times. For both logs and trees, regression equations were derived that express (in minutes) sawing time per log or tree and per Mbf. For trees, merchantable height is expressed in number of logs as well as in feet. One of the major uses for the tables of average sawing times is as a bench mark against...
An empirical evaluation of two-stage species tree inference strategies using a multilocus dataset from North American pines

PubMed Central

2014-01-01

Background As it becomes increasingly possible to obtain DNA sequences of orthologous genes from diverse sets of taxa, species trees are frequently being inferred from multilocus data. However, the behavior of many methods for performing this inference has remained largely unexplored. Some methods have been proven to be consistent given certain evolutionary models, whereas others rely on criteria that, although appropriate for many parameter values, have peculiar zones of the parameter space in which they fail to converge on the correct estimate as data sets increase in size. Results Here, using North American pines, we empirically evaluate the behavior of 24 strategies for species tree inference using three alternative outgroups (72 strategies total). The data consist of 120 individuals sampled in eight ingroup species from subsection Strobus and three outgroup species from subsection Gerardianae, spanning ∼47 kilobases of sequence at 121 loci. Each “strategy” for inferring species trees consists of three features: a species tree construction method, a gene tree inference method, and a choice of outgroup. We use multivariate analysis techniques such as principal components analysis and hierarchical clustering to identify tree characteristics that are robustly observed across strategies, as well as to identify groups of strategies that produce trees with similar features. We find that strategies that construct species trees using only topological information cluster together and that strategies that use additional non-topological information (e.g., branch lengths) also cluster together. Strategies that utilize more than one individual within a species to infer gene trees tend to produce estimates of species trees that contain clades present in trees estimated by other strategies. 
Strategies that use the minimize-deep-coalescences criterion to construct species trees tend to produce species tree estimates that contain clades that are not present in trees estimated by the Concatenation, RTC, SMRT, STAR, and STEAC methods, and that in general are more balanced than those inferred by these other strategies. Conclusions When constructing a species tree from a multilocus set of sequences, our observations provide a basis for interpreting differences in species tree estimates obtained via different approaches that have a two-stage structure in common, one step for gene tree estimation and a second step for species tree estimation. The methods explored here employ a number of distinct features of the data, and our analysis suggests that recovery of the same results from multiple methods that tend to differ in their patterns of inference can be a valuable tool for obtaining reliable estimates. PMID:24678701
An empirical evaluation of two-stage species tree inference strategies using a multilocus dataset from North American pines.

PubMed

DeGiorgio, Michael; Syring, John; Eckert, Andrew J; Liston, Aaron; Cronn, Richard; Neale, David B; Rosenberg, Noah A

2014-03-29

As it becomes increasingly possible to obtain DNA sequences of orthologous genes from diverse sets of taxa, species trees are frequently being inferred from multilocus data. However, the behavior of many methods for performing this inference has remained largely unexplored. Some methods have been proven to be consistent given certain evolutionary models, whereas others rely on criteria that, although appropriate for many parameter values, have peculiar zones of the parameter space in which they fail to converge on the correct estimate as data sets increase in size. Here, using North American pines, we empirically evaluate the behavior of 24 strategies for species tree inference using three alternative outgroups (72 strategies total). The data consist of 120 individuals sampled in eight ingroup species from subsection Strobus and three outgroup species from subsection Gerardianae, spanning ∼47 kilobases of sequence at 121 loci. Each "strategy" for inferring species trees consists of three features: a species tree construction method, a gene tree inference method, and a choice of outgroup. We use multivariate analysis techniques such as principal components analysis and hierarchical clustering to identify tree characteristics that are robustly observed across strategies, as well as to identify groups of strategies that produce trees with similar features. We find that strategies that construct species trees using only topological information cluster together and that strategies that use additional non-topological information (e.g., branch lengths) also cluster together. Strategies that utilize more than one individual within a species to infer gene trees tend to produce estimates of species trees that contain clades present in trees estimated by other strategies. 
Strategies that use the minimize-deep-coalescences criterion to construct species trees tend to produce species tree estimates that contain clades that are not present in trees estimated by the Concatenation, RTC, SMRT, STAR, and STEAC methods, and that in general are more balanced than those inferred by these other strategies. When constructing a species tree from a multilocus set of sequences, our observations provide a basis for interpreting differences in species tree estimates obtained via different approaches that have a two-stage structure in common, one step for gene tree estimation and a second step for species tree estimation. The methods explored here employ a number of distinct features of the data, and our analysis suggests that recovery of the same results from multiple methods that tend to differ in their patterns of inference can be a valuable tool for obtaining reliable estimates.
Big cat phylogenies, consensus trees, and computational thinking.

PubMed

Sul, Seung-Jin; Williams, Tiffani L

2011-07-01

Phylogenetics seeks to deduce the pattern of relatedness between organisms by using a phylogeny or evolutionary tree. For a given set of organisms or taxa, there may be many evolutionary trees depicting how these organisms evolved from a common ancestor. As a result, consensus trees are a popular approach for summarizing the shared evolutionary relationships in a group of trees. We examine these consensus techniques by studying how the pantherine lineage of cats (clouded leopard, jaguar, leopard, lion, snow leopard, and tiger) evolved, which is hotly debated. While there are many phylogenetic resources that describe consensus trees, there is very little information, written for biologists, regarding the underlying computational techniques for building them. The pantherine cats provide us with a small, relevant example to explore the computational techniques (such as sorting numbers, hashing functions, and traversing trees) for constructing consensus trees. Our hope is that life scientists enjoy peeking under the computational hood of consensus tree construction and share their positive experiences with others in their community.
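A rough sketch of how hashing and tree traversal combine to build a majority-rule consensus is given below. It is not the authors' code: trees are represented as nested tuples (an assumed encoding), every internal node is reduced to the set of taxa beneath it, and a dictionary counts how many input trees contain each set. The three topologies are made up for illustration and are not results from the paper.

```python
from collections import Counter
from typing import FrozenSet, List, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]  # a leaf name or a tuple of subtrees


def collect_clades(tree: Tree, out: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Post-order traversal: record the taxon set under each internal node."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.add(leaves)
    return leaves


def majority_clades(trees: List[Tree]) -> Set[FrozenSet[str]]:
    """Clades present in more than half of the input trees."""
    counts: Counter = Counter()
    for tree in trees:
        found: Set[FrozenSet[str]] = set()
        collect_clades(tree, found)
        counts.update(found)  # frozensets are hashable, so they serve as dictionary keys
    return {clade for clade, k in counts.items() if k > len(trees) / 2}


# Hypothetical topologies for the six pantherine cats (illustration only).
t1 = (((("lion", "leopard"), "jaguar"), ("tiger", "snow_leopard")), "clouded_leopard")
t2 = (((("lion", "jaguar"), "leopard"), ("tiger", "snow_leopard")), "clouded_leopard")
t3 = ((((("lion", "leopard"), "jaguar"), "tiger"), "snow_leopard"), "clouded_leopard")

consensus = majority_clades([t1, t2, t3])
for clade in sorted(consensus, key=len):
    print(sorted(clade))
```

The surviving clades are mutually compatible because any two groups that each occur in more than half of the trees must share at least one tree. The trivial clade containing all taxa is kept here for simplicity.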
Logistic regression trees for initial selection of interesting loci in case-control studies

PubMed Central

Nickolov, Radoslav Z; Milanov, Valentin B

2007-01-01

Modern genetic epidemiology faces the challenge of dealing with hundreds of thousands of genetic markers. The selection of a small initial subset of interesting markers for further investigation can greatly facilitate genetic studies. In this contribution we suggest the use of a logistic regression tree algorithm known as logistic tree with unbiased selection. Using the simulated data provided for Genetic Analysis Workshop 15, we show how this algorithm, with incorporation of multifactor dimensionality reduction method, can reduce an initial large pool of markers to a small set that includes the interesting markers with high probability. PMID:18466557
Rapid Leaf Deployment Strategies in a Deciduous Savanna

PubMed Central

2016-01-01

Deciduous plants avoid the costs of maintaining leaves in the unfavourable season, but carry the costs of constructing new leaves every year. Deciduousness is therefore expected in ecological situations with pronounced seasonality and low costs of leaf construction. In our study system, a seasonally dry tropical savanna, many trees are deciduous, suggesting that leaf construction costs must be low. Previous studies have, however, shown that nitrogen is limiting in this system, suggesting that leaf construction costs are high. Here we examine this conundrum using a time series of soil moisture availability, leaf phenology and nitrogen distribution in the tree canopy to illustrate how trees resorb nitrogen before leaf abscission and use stored reserves of nitrogen and carbon to construct new leaves at the onset of the growing season. Our results show that trees deployed leaves shortly before and in anticipation of the first rains with its associated pulse of nitrogen mineralisation. Our results also show that trees rapidly constructed a full canopy of leaves within two weeks of the first rains. We detected an increase in leaf nitrogen content that corresponded with the first rains and with the movement of nitrogen to more distal branches, suggesting that stored nitrogen reserves are used to construct leaves. Furthermore the stable carbon isotope ratios (δ13C) of these leaves suggest the use of stored carbon for leaf construction. Our findings suggest that the early deployment of leaves using stored nitrogen and carbon reserves is a strategy that is integrally linked with the onset of the first rains. This strategy may confer a competitive advantage over species that deploy leaves at or after the onset of the rains. PMID:27310398
Regression modeling and mapping of coniferous forest basal area and tree density from discrete-return lidar and multispectral data

Treesearch

Andrew T. Hudak; Nicholas L. Crookston; Jeffrey S. Evans; Michael K. Falkowski; Alistair M. S. Smith; Paul E. Gessler; Penelope Morgan

2006-01-01

We compared the utility of discrete-return light detection and ranging (lidar) data and multispectral satellite imagery, and their integration, for modeling and mapping basal area and tree density across two diverse coniferous forest landscapes in north-central Idaho. We applied multiple linear regression models subset from a suite of 26 predictor variables derived...

Assessing College Student Interest in Math and/or Computer Science in a Cross-National Sample Using Classification and Regression Trees

ERIC Educational Resources Information Center

Kitsantas, Anastasia; Kitsantas, Panagiota; Kitsantas, Thomas

2012-01-01

The purpose of this exploratory study was to assess the relative importance of a number of variables in predicting students' interest in math and/or computer science. Classification and regression trees (CART) were employed in the analysis of survey data collected from 276 college students enrolled in two U.S. and Greek universities. The results…
Automated rule-base creation via CLIPS-Induce

NASA Technical Reports Server (NTRS)

Murphy, Patrick M.

1994-01-01

Many CLIPS rule-bases contain one or more rule groups that perform classification. In this paper we describe CLIPS-Induce, an automated system for the creation of a CLIPS classification rule-base from a set of test cases. CLIPS-Induce consists of two components, a decision tree induction component and a CLIPS production extraction component. ID3, a popular decision tree induction algorithm, is used to induce a decision tree from the test cases. CLIPS production extraction is accomplished through a top-down traversal of the decision tree. Nodes of the tree are used to construct query rules, and branches of the tree are used to construct classification rules. The learned CLIPS productions may easily be incorporated into a large CLIPS system that performs tasks such as accessing a database or displaying information.
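The two-stage pipeline described here (induce a tree with ID3, then walk it top-down to emit rules) might look roughly like the following. This is a simplified sketch under stated assumptions: the cases are dictionaries of categorical attributes, the rules are returned as plain condition lists rather than CLIPS productions, and only root-to-leaf classification rules are extracted, not the query rules the abstract also mentions. The fruit data are invented.

```python
import math
from collections import Counter


def entropy(rows, target):
    """Shannon entropy of the class labels in rows."""
    counts = Counter(r[target] for r in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())


def id3(rows, attrs, target):
    """Return a class label (leaf) or a tuple (attribute, {value: subtree})."""
    labels = {r[target] for r in rows}
    if len(labels) == 1:
        return next(iter(labels))
    if not attrs:
        return Counter(r[target] for r in rows).most_common(1)[0][0]

    def remainder(a):
        # Weighted entropy left after splitting on attribute a.
        parts = Counter(r[a] for r in rows)
        return sum(n / len(rows) * entropy([r for r in rows if r[a] == v], target)
                   for v, n in parts.items())

    best = min(attrs, key=remainder)  # lowest remainder means highest information gain
    rest = [a for a in attrs if a != best]
    return (best, {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in sorted({r[best] for r in rows})})


def extract_rules(tree, conditions=()):
    """Top-down traversal: each root-to-leaf path becomes one classification rule."""
    if not isinstance(tree, tuple):
        return [(list(conditions), tree)]
    attr, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules.extend(extract_rules(subtree, conditions + ((attr, value),)))
    return rules


# Invented test cases for illustration.
cases = [
    {"shape": "round", "color": "red", "class": "apple"},
    {"shape": "round", "color": "green", "class": "apple"},
    {"shape": "long", "color": "yellow", "class": "banana"},
    {"shape": "long", "color": "green", "class": "banana"},
]
rules = extract_rules(id3(cases, ["color", "shape"], "class"))
for conds, label in rules:
    print("IF", " AND ".join(f"{a} = {v}" for a, v in conds), "THEN class =", label)
```

In this toy data `shape` separates the classes perfectly while `color` does not, so the induced tree tests only `shape` and the rule set has two entries.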
Identification of extremely premature infants at high risk of rehospitalization.

PubMed

Ambalavanan, Namasivayam; Carlo, Waldemar A; McDonald, Scott A; Yao, Qing; Das, Abhik; Higgins, Rosemary D

2011-11-01

Extremely low birth weight infants often require rehospitalization during infancy. Our objective was to identify at the time of discharge which extremely low birth weight infants are at higher risk for rehospitalization. Data from extremely low birth weight infants in Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network centers from 2002-2005 were analyzed. The primary outcome was rehospitalization by the 18- to 22-month follow-up, and secondary outcome was rehospitalization for respiratory causes in the first year. Using variables and odds ratios identified by stepwise logistic regression, scoring systems were developed with scores proportional to odds ratios. Classification and regression-tree analysis was performed by recursive partitioning and automatic selection of optimal cutoff points of variables. A total of 3787 infants were evaluated (mean ± SD birth weight: 787 ± 136 g; gestational age: 26 ± 2 weeks; 48% male, 42% black). Forty-five percent of the infants were rehospitalized by 18 to 22 months; 14.7% were rehospitalized for respiratory causes in the first year. Both regression models (area under the curve: 0.63) and classification and regression-tree models (mean misclassification rate: 40%-42%) were moderately accurate. Predictors for the primary outcome by regression were shunt surgery for hydrocephalus, hospital stay of >120 days for pulmonary reasons, necrotizing enterocolitis stage II or higher or spontaneous gastrointestinal perforation, higher fraction of inspired oxygen at 36 weeks, and male gender. By classification and regression-tree analysis, infants with hospital stays of >120 days for pulmonary reasons had a 66% rehospitalization rate compared with 42% without such a stay. The scoring systems and classification and regression-tree analysis models identified infants at higher risk of rehospitalization and might assist planning for care after discharge.
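The "automatic selection of optimal cutoff points" used in recursive partitioning can be illustrated with a single split on one continuous variable. The sketch below assumes a binary outcome and the Gini impurity criterion, which is a common choice in CART but is not confirmed by the abstract. The values and outcomes are invented and unrelated to the study data.

```python
from typing import List, Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of 0/1 outcomes."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_cutoff(values: Sequence[float],
                labels: Sequence[int]) -> Tuple[Optional[float], float]:
    """Scan midpoints between sorted values; return (cutoff, weighted impurity)."""
    pairs = sorted(zip(values, labels))
    best: Tuple[Optional[float], float] = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cutoff can separate identical values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left: List[int] = [y for _, y in pairs[:i]]
        right: List[int] = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (cut, score)
    return best


# Invented predictor values and binary outcomes.
x = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
y = [0, 0, 0, 1, 0, 1, 1]
cutoff, impurity = best_cutoff(x, y)
print(cutoff, impurity)
```

A full tree repeats this search over every candidate variable, splits the data at the best cutoff, and recurses on each side until a stopping rule is met.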
Identification of Extremely Premature Infants at High Risk of Rehospitalization

PubMed Central

Carlo, Waldemar A.; McDonald, Scott A.; Yao, Qing; Das, Abhik; Higgins, Rosemary D.

2011-01-01

OBJECTIVE: Extremely low birth weight infants often require rehospitalization during infancy. Our objective was to identify at the time of discharge which extremely low birth weight infants are at higher risk for rehospitalization. METHODS: Data from extremely low birth weight infants in Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network centers from 2002–2005 were analyzed. The primary outcome was rehospitalization by the 18- to 22-month follow-up, and secondary outcome was rehospitalization for respiratory causes in the first year. Using variables and odds ratios identified by stepwise logistic regression, scoring systems were developed with scores proportional to odds ratios. Classification and regression-tree analysis was performed by recursive partitioning and automatic selection of optimal cutoff points of variables. RESULTS: A total of 3787 infants were evaluated (mean ± SD birth weight: 787 ± 136 g; gestational age: 26 ± 2 weeks; 48% male, 42% black). Forty-five percent of the infants were rehospitalized by 18 to 22 months; 14.7% were rehospitalized for respiratory causes in the first year. Both regression models (area under the curve: 0.63) and classification and regression-tree models (mean misclassification rate: 40%–42%) were moderately accurate. Predictors for the primary outcome by regression were shunt surgery for hydrocephalus, hospital stay of >120 days for pulmonary reasons, necrotizing enterocolitis stage II or higher or spontaneous gastrointestinal perforation, higher fraction of inspired oxygen at 36 weeks, and male gender. By classification and regression-tree analysis, infants with hospital stays of >120 days for pulmonary reasons had a 66% rehospitalization rate compared with 42% without such a stay. 
CONCLUSIONS: The scoring systems and classification and regression-tree analysis models identified infants at higher risk of rehospitalization and might assist planning for care after discharge. PMID:22007016
Modeling Tree Mortality Following Wildfire in Pinus ponderosa Forests in the Central Sierra Nevada of California

Treesearch

Jon C. Regelbrugge

1993-01-01

Abstract. We modeled tree mortality occurring two years following wildfire in Pinus ponderosa forests using data from 1275 trees in 25 stands burned during the 1987 Stanislaus Complex fires. We used logistic regression analysis to develop models relating the probability of wildfire-induced mortality with tree size and fire severity for Pinus ponderosa, Calocedrus...
Inferring patterns in mitochondrial DNA sequences through hypercube independent spanning trees.

PubMed

Silva, Eduardo Sant Ana da; Pedrini, Helio

2016-03-01

Given a graph G, a set of spanning trees rooted at a vertex r of G is said to be vertex/edge independent if, for each vertex v of G, v≠r, the paths from r to v in any pair of trees are vertex/edge disjoint. Independent spanning trees (ISTs) provide a number of advantages in data broadcasting due to their fault-tolerant properties. For this reason, some studies have addressed the issue by providing mechanisms for constructing independent spanning trees efficiently. In this work, we investigate how to construct independent spanning trees on hypercubes, which are generated based upon spanning binomial trees, and how to use them to predict mitochondrial DNA sequence parts through paths on the hypercube. The prediction works both for inferring mitochondrial DNA sequences comprised of six bases and for inferring anomalies that probably should not belong to the mitochondrial DNA standard. Copyright © 2016 Elsevier Ltd. All rights reserved.
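The building block named in this abstract, a spanning binomial tree of a hypercube, has a compact bitwise description. The sketch below shows one standard construction, rooted at vertex 0, in which the parent of a vertex is obtained by clearing its lowest set bit. It does not reproduce the paper's construction of multiple independent trees or its sequence-prediction step, and the function names are illustrative.

```python
from typing import Dict, List


def spanning_binomial_tree(n: int) -> Dict[int, int]:
    """Map each non-root vertex of the n-cube to its parent (root is vertex 0)."""
    # v & (v - 1) clears the lowest set bit, so parent and child differ in one bit.
    return {v: v & (v - 1) for v in range(1, 2 ** n)}


def path_to_root(v: int, parent: Dict[int, int]) -> List[int]:
    """Follow parent links from v up to the root."""
    path = [v]
    while path[-1] != 0:
        path.append(parent[path[-1]])
    return path


n = 3
parent = spanning_binomial_tree(n)
print([format(v, "03b") for v in path_to_root(7, parent)])
```

Every tree edge is a hypercube edge because parent and child differ in exactly one bit. The root has n children whose subtrees contain 1, 2, 4, ... vertices, which is the shape of a binomial tree.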
Finding structure in data using multivariate tree boosting

PubMed Central

Miller, Patrick J.; Lubke, Gitta H.; McArtor, Daniel B.; Bergeman, C. S.

2016-01-01

Technology and collaboration enable dramatic increases in the size of psychological and psychiatric data collections, but finding structure in these large data sets with many collected variables is challenging. Decision tree ensembles such as random forests (Strobl, Malley, & Tutz, 2009) are a useful tool for finding structure, but are difficult to interpret with multiple outcome variables which are often of interest in psychology. To find and interpret structure in data sets with multiple outcomes and many predictors (possibly exceeding the sample size), we introduce a multivariate extension to a decision tree ensemble method called gradient boosted regression trees (Friedman, 2001). Our extension, multivariate tree boosting, is a method for nonparametric regression that is useful for identifying important predictors, detecting predictors with nonlinear effects and interactions without specification of such effects, and for identifying predictors that cause two or more outcome variables to covary. We provide the R package ‘mvtboost’ to estimate, tune, and interpret the resulting model, which extends the implementation of univariate boosting in the R package ‘gbm’ (Ridgeway et al., 2015) to continuous, multivariate outcomes. To illustrate the approach, we analyze predictors of psychological well-being (Ryff & Keyes, 1995). Simulations verify that our approach identifies predictors with nonlinear effects and achieves high prediction accuracy, exceeding or matching the performance of (penalized) multivariate multiple regression and multivariate decision trees over a wide range of conditions. PMID:27918183
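For readers unfamiliar with the base method that this abstract extends, the following is a minimal sketch of univariate gradient boosted regression trees with squared-error loss. It uses depth-one trees (stumps) on a single predictor and plain Python rather than the R packages named above. The multivariate extension introduced in the paper is not shown, and the function names, learning rate and data are assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def fit_stump(x: Sequence[float], r: Sequence[float]) -> Callable[[float], float]:
    """Fit a one-split regression tree to residuals r (assumes x is not constant)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = order[:k], order[k:]
        if x[lo[-1]] == x[hi[0]]:
            continue  # cannot split between equal predictor values
        left_mean = sum(r[i] for i in lo) / len(lo)
        right_mean = sum(r[i] for i in hi) / len(hi)
        sse = (sum((r[i] - left_mean) ** 2 for i in lo)
               + sum((r[i] - right_mean) ** 2 for i in hi))
        if best is None or sse < best[0]:
            best = (sse, (x[lo[-1]] + x[hi[0]]) / 2.0, left_mean, right_mean)
    _, cut, left_mean, right_mean = best
    return lambda v: left_mean if v <= cut else right_mean


def boost(x: Sequence[float], y: Sequence[float],
          rounds: int = 100, rate: float = 0.1) -> Callable[[float], float]:
    """Each round fits a stump to the current residuals and adds a shrunken copy."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps: List[Callable[[float], float]] = []
    for _ in range(rounds):
        # For squared-error loss the negative gradient is the ordinary residual.
        resid = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, resid)
        stumps.append(stump)
        pred = [pi + rate * stump(xi) for pi, xi in zip(pred, x)]
    return lambda v: base + rate * sum(s(v) for s in stumps)


# Toy nonlinear relationship.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [v ** 2 for v in xs]
model = boost(xs, ys)
mean_y = sum(ys) / len(ys)
mse_mean = sum((v - mean_y) ** 2 for v in ys) / len(ys)
mse_model = sum((model(a) - b) ** 2 for a, b in zip(xs, ys)) / len(ys)
print(mse_mean, mse_model)
```

No functional form is specified anywhere in the code, yet the fitted model tracks the quadratic trend. This is the sense in which the abstract says nonlinear effects are detected without specifying them.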
Modeling time-to-event (survival) data using classification tree analysis.

PubMed

Linden, Ariel; Yarnold, Paul R

2017-12-01

Time to the occurrence of an event is often studied in health research. Survival analysis differs from other designs in that follow-up times for individuals who do not experience the event by the end of the study (called censored) are accounted for in the analysis. Cox regression is the standard method for analysing censored data, but the assumptions required of these models are easily violated. In this paper, we introduce classification tree analysis (CTA) as a flexible alternative for modelling censored data. Classification tree analysis is a "decision-tree"-like classification model that provides parsimonious, transparent (ie, easy to visually display and interpret) decision rules that maximize predictive accuracy, derives exact P values via permutation tests, and evaluates model cross-generalizability. Using empirical data, we identify all statistically valid, reproducible, longitudinally consistent, and cross-generalizable CTA survival models and then compare their predictive accuracy to estimates derived via Cox regression and an unadjusted naïve model. Model performance is assessed using integrated Brier scores and a comparison between estimated survival curves. The Cox regression model best predicts average incidence of the outcome over time, whereas CTA survival models best predict either relatively high, or low, incidence of the outcome over time. Classification tree analysis survival models offer many advantages over Cox regression, such as explicit maximization of predictive accuracy, parsimony, statistical robustness, and transparency. Therefore, researchers interested in accurate prognoses and clear decision rules should consider developing models using the CTA-survival framework. © 2017 John Wiley & Sons, Ltd.
Construction of a Species-Level Tree of Life for the Insects and Utility in Taxonomic Profiling.

PubMed

Chesters, Douglas

2017-05-01

Although comprehensive phylogenies have proven an invaluable tool in ecology and evolution, their construction is made increasingly challenging both by the scale and structure of publicly available sequences. The distinct partition between gene-rich (genomic) and species-rich (DNA barcode) data is a feature of data that has been largely overlooked, yet presents a key obstacle to scaling supermatrix analysis. I present a phyloinformatics framework for draft construction of a species-level phylogeny of insects (Class Insecta). Matrix-building requires separately optimized pipelines for nuclear transcriptomic, mitochondrial genomic, and species-rich markers, whereas tree-building requires hierarchical inference in order to capture species-breadth while retaining deep-level resolution. The phylogeny of insects contains 49,358 species, 13,865 genera, and 760 families. Deep-level splits largely reflected previous findings for sections of the tree that are data rich or unambiguous, such as inter-ordinal Endopterygota and Dictyoptera, the recently evolved and relatively homogeneous Lepidoptera, Hymenoptera, Brachycera (Diptera), and Cucujiformia (Coleoptera). However, analysis of bias, matrix construction and gene-tree variation suggests confidence in some relationships (such as in Polyneoptera) is less than has been indicated by the matrix bootstrap method. To assess the utility of the insect tree as a tool in query profiling, several tree-based taxonomic assignment methods are compared. Using test data sets with existing taxonomic annotations, a tendency is observed for greater accuracy of species-level assignments when using a fixed comprehensive tree of life in contrast to methods generating smaller de novo reference trees. Described herein is a solution to the discrepancy in the way data are fit into supermatrices.
The resulting tree facilitates wider studies of insect diversification and application of advanced descriptions of diversity in community studies, among other presumed applications. [Data integration; data mining; insects; phylogenomics; phyloinformatics; tree of life.] © The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved.
[Application of regression tree in analyzing the effects of climate factors on NDVI in loess hilly area of Shaanxi Province].

PubMed

Liu, Yang; Lü, Yi-he; Zheng, Hai-feng; Chen, Li-ding

2010-05-01

Based on the 10-day SPOT VEGETATION NDVI data and the daily meteorological data from 1998 to 2007 in Yan'an City, the main meteorological variables affecting the annual and interannual variations of NDVI were determined using a regression tree. It was found that the effects of the tested meteorological variables on the variability of NDVI differed with seasons and time lags. Temperature and precipitation were the most important meteorological variables affecting the annual variation of NDVI, and the average highest temperature was the most important meteorological variable affecting the interannual variation of NDVI. The regression tree was very powerful in determining the key meteorological variables affecting NDVI variation, but could not build quantitative relations between NDVI and meteorological variables, which limited its further and wider application.
Partitioning sources of variation in vertebrate species richness

USGS Publications Warehouse

Boone, R.B.; Krohn, W.B.

2000-01-01

Aim: To explore biogeographic patterns of terrestrial vertebrates in Maine, USA using techniques that would describe local and spatial correlations with the environment. Location: Maine, USA. Methods: We delineated the ranges within Maine (86,156 km2) of 275 species using literature and expert review. Ranges were combined into species richness maps, and compared to geomorphology, climate, and woody plant distributions. Methods were adapted that compared richness of all vertebrate classes to each environmental correlate, rather than assessing a single explanatory theory. We partitioned variation in species richness into components using tree and multiple linear regression. Methods were used that allowed for useful comparisons between tree and linear regression results. For both methods we partitioned variation into broad-scale (spatially autocorrelated) and fine-scale (spatially uncorrelated) explained and unexplained components. By partitioning variance, and using both tree and linear regression in analyses, we explored the degree of variation in species richness for each vertebrate group that could be explained by the relative contribution of each environmental variable. Results: In tree regression, climate variation explained richness better (92% of mean deviance explained for all species) than woody plant variation (87%) and geomorphology (86%). Reptiles were highly correlated with environmental variation (93%), followed by mammals, amphibians, and birds (each with 84-82% deviance explained). In multiple linear regression, climate was most closely associated with total vertebrate richness (78%), followed by woody plants (67%) and geomorphology (56%). Again, reptiles were closely correlated with the environment (95%), followed by mammals (73%), amphibians (63%) and birds (57%).
Main conclusions: Comparing variation explained using tree and multiple linear regression quantified the importance of nonlinear relationships and local interactions between species richness and environmental variation, identifying the importance of linear relationships between reptiles and the environment, and nonlinear relationships between birds and woody plants, for example. Conservation planners should capture climatic variation in broad-scale designs; temperatures may shift during climate change, but the underlying correlations between the environment and species richness will presumably remain.
Ensemble of trees approaches to risk adjustment for evaluating a hospital's performance.

PubMed

Liu, Yang; Traskin, Mikhail; Lorch, Scott A; George, Edward I; Small, Dylan

2015-03-01

A commonly used method for evaluating a hospital's performance on an outcome is to compare the hospital's observed outcome rate to the hospital's expected outcome rate given its patient (case) mix and service. The process of calculating the hospital's expected outcome rate given its patient mix and service is called risk adjustment (Iezzoni 1997). Risk adjustment is critical for accurately evaluating and comparing hospitals' performances since we would not want to unfairly penalize a hospital just because it treats sicker patients. The key to risk adjustment is accurately estimating the probability of an outcome given patient characteristics. For cases with binary outcomes, the method that is commonly used in risk adjustment is logistic regression. In this paper, we consider ensemble of trees methods as alternatives for risk adjustment, including random forests and Bayesian additive regression trees (BART). Both random forests and BART are modern machine learning methods that have been shown recently to have excellent performance for prediction of outcomes in many settings. We apply these methods to carry out risk adjustment for the performance of neonatal intensive care units (NICU). We show that these ensemble of trees methods outperform logistic regression in predicting mortality among babies treated in NICU, and provide a superior method of risk adjustment compared to logistic regression.
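The observed-versus-expected comparison described above can be sketched as follows. The per-patient probabilities are assumed to come from some already-fitted risk model (logistic regression, random forest, or BART); the function name and the numbers are hypothetical and are not taken from the paper.

```python
def observed_to_expected(outcomes, predicted_probs):
    """Compare a hospital's observed event count with its risk-adjusted expectation.

    outcomes        -- 0/1 outcome for each patient treated at the hospital
    predicted_probs -- model-estimated outcome probability for each patient
    """
    if len(outcomes) != len(predicted_probs):
        raise ValueError("one predicted probability is needed per patient")
    observed = sum(outcomes)
    expected = sum(predicted_probs)
    if expected == 0:
        raise ValueError("expected count is zero; ratio is undefined")
    return observed, expected, observed / expected


# Hypothetical hospital with five patients.
outcomes = [1, 0, 0, 1, 0]
predicted_probs = [0.5, 0.1, 0.2, 0.6, 0.1]
observed, expected, ratio = observed_to_expected(outcomes, predicted_probs)
print(f"observed={observed}, expected={expected:.2f}, O/E={ratio:.2f}")
```

A ratio above 1 would suggest more events than the patient mix predicts. How trustworthy that reading is depends on how well the underlying model estimates the probabilities, which is the point of comparing logistic regression with tree ensembles.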
Tree Biomass Allocation and Its Model Additivity for Casuarina equisetifolia in a Tropical Forest of Hainan Island, China.

PubMed

Xue, Yang; Yang, Zhongyang; Wang, Xiaoyan; Lin, Zhipan; Li, Dunxi; Su, Shaofeng

2016-01-01

Casuarina equisetifolia is commonly planted and used in the construction of coastal shelterbelt protection in Hainan Island. Thus, it is critical to accurately estimate the tree biomass of Casuarina equisetifolia L. for forest managers to evaluate the biomass stock in Hainan. The data for this work consisted of 72 trees, which were divided into three age groups: young forest, middle-aged forest, and mature forest. The proportion of biomass from the trunk significantly increased with age (P<0.05). However, the biomass of the branch and leaf decreased, and the biomass of the root did not change. To test whether the crown radius (CR) can improve biomass estimates of C. equisetifolia, we introduced CR into the biomass models. Here, six models were used to estimate the biomass of each component, including the trunk, the branch, the leaf, and the root. In each group, we selected one model among these six models for each component. The results showed that including the CR greatly improved the model performance and reduced the error, especially for the young and mature forests. In addition, to ensure biomass additivity, the selected equation for each component was fitted as a system of equations using seemingly unrelated regression (SUR). The SUR method not only gave efficient and accurate estimates but also achieved the logical additivity. The results in this study provide a robust estimation of tree biomass components and total biomass over three groups of C. equisetifolia.
Nine Hundred Years of Weekly Streamflows: Stochastic Downscaling of Ensemble Tree-Ring Reconstructions

NASA Astrophysics Data System (ADS)

Sauchyn, David; Ilich, Nesa

2017-11-01

We combined the methods and advantages of stochastic hydrology and paleohydrology to estimate 900 years of weekly flows for the North and South Saskatchewan Rivers at Edmonton and Medicine Hat, Alberta, respectively. Regression models of water-year streamflow were constructed using historical naturalized flow data and a pool of 196 tree-ring (earlywood, latewood, and annual) ring-width chronologies from 76 sites. The tree-ring models accounted for up to 80% of the interannual variability in historical naturalized flows. We developed a new algorithm for generating stochastic time series of weekly flows constrained by the statistical properties of both the historical record and proxy streamflow data, and by the necessary condition that weekly flows correlate between the end of a year and the start of the next. A second innovation, enabled by the density of our tree-ring network, is to derive the paleohydrology from an ensemble of 100 statistically significant reconstructions at each gauge. Using paleoclimatic data to generate long series of weekly flow estimates augments the short historical record with an expanded range of hydrologic variability, including sequences of wet and dry years of greater length and severity. This unique hydrometric time series will enable evaluation of the reliability of current water supply and management systems given the range of hydroclimatic variability and extremes contained in the stochastic paleohydrology. It also could inform evaluation of the uncertainty in climate model projections, given that internal hydroclimatic variability is the dominant source of uncertainty.
Tree Biomass Allocation and Its Model Additivity for Casuarina equisetifolia in a Tropical Forest of Hainan Island, China

PubMed Central

Xue, Yang; Yang, Zhongyang; Wang, Xiaoyan; Lin, Zhipan; Li, Dunxi; Su, Shaofeng

2016-01-01

Casuarina equisetifolia is commonly planted and used in the construction of coastal shelterbelt protection in Hainan Island. Thus, it is critical to accurately estimate the tree biomass of Casuarina equisetifolia L. for forest managers to evaluate the biomass stock in Hainan. The data for this work consisted of 72 trees, which were divided into three age groups: young forest, middle-aged forest, and mature forest. The proportion of biomass from the trunk significantly increased with age (P<0.05). However, the biomass of the branch and leaf decreased, and the biomass of the root did not change. To test whether the crown radius (CR) can improve biomass estimates of C. equisetifolia, we introduced CR into the biomass models. Here, six models were used to estimate the biomass of each component, including the trunk, the branch, the leaf, and the root. In each group, we selected one model among these six models for each component. The results showed that including the CR greatly improved the model performance and reduced the error, especially for the young and mature forests. In addition, to ensure biomass additivity, the selected equation for each component was fitted as a system of equations using seemingly unrelated regression (SUR). The SUR method not only gave efficient and accurate estimates but also achieved the logical additivity. The results in this study provide a robust estimation of tree biomass components and total biomass over three groups of C. equisetifolia. PMID:27002822
Environmental impacts of forest road construction on mountainous terrain.

PubMed

Caliskan, Erhan

2013-03-15

Forest roads are the base infrastructure foundation of forestry operations. These roads entail a complex engineering effort because they can cause substantial environmental damage to forests and include a high-cost construction. This study was carried out in four sample sites of Giresun, Trabzon(2) and Artvin Forest Directorate, which is in the Black Sea region of Turkey. The areas have both steep terrain (30-50% gradient) and very steep terrain (51-80% gradient). Bulldozers and hydraulic excavators were determined to be the main machines for forest road construction, causing environmental damage and cross sections in mountainous areas. As a result of this study, the percent damage to forests was determined as follows: on steep terrain, 21% of trees were damaged by excavators and 33% of trees were damaged by bulldozers during forest road construction, and on very steep terrain, 27% of trees were damaged by excavators and 44% of trees were damaged by bulldozers during forest road construction. It was also determined that on steep terrain, when excavators were used, 12.23% less forest area was destroyed compared with when bulldozers were used and 16.13% less area was destroyed by excavators on very steep terrain. In order to reduce the environmental damage on the forest ecosystem, especially in steep terrains, hydraulic excavators should replace bulldozers in forest road construction activities.
Modeling Caribbean tree stem diameters from tree height and crown width measurements

Treesearch

Thomas Brandeis; KaDonna Randolph; Mike Strub

2009-01-01

Regression models to predict diameter at breast height (DBH) as a function of tree height and maximum crown radius were developed for Caribbean forests based on data collected by the U.S. Forest Service in the Commonwealth of Puerto Rico and Territory of the U.S. Virgin Islands. The model predicting DBH from tree height fit reasonably well (R2 = 0.7110), with...
Technology transfer by means of fault tree synthesis

NASA Astrophysics Data System (ADS)

Batzias, Dimitris F.

2012-12-01

Since Fault Tree Analysis (FTA) attempts to model and analyze failure processes of engineering, it forms a common technique for good industrial practice. On the contrary, fault tree synthesis (FTS) refers to the methodology of constructing complex trees either from dendritic modules built ad hoc or from fault trees already used and stored in a Knowledge Base. In both cases, technology transfer takes place in a quasi-inductive mode, from partial to holistic knowledge. In this work, an algorithmic procedure, including 9 activity steps and 3 decision nodes is developed for performing effectively this transfer when the fault under investigation occurs within one of the latter stages of an industrial procedure with several stages in series. The main parts of the algorithmic procedure are: (i) the construction of a local fault tree within the corresponding production stage, where the fault has been detected, (ii) the formation of an interface made of input faults that might occur upstream, (iii) the fuzzy (to account for uncertainty) multicriteria ranking of these faults according to their significance, and (iv) the synthesis of an extended fault tree based on the construction of part (i) and on the local fault tree of the first-ranked fault in part (iii). An implementation is presented, referring to 'uneven sealing of Al anodic film', thus proving the functionality of the developed methodology.
OCTGRAV: Sparse Octree Gravitational N-body Code on Graphics Processing Units

NASA Astrophysics Data System (ADS)

Gaburov, Evghenii; Bédorf, Jeroen; Portegies Zwart, Simon

2010-10-01

Octgrav is a very fast tree-code which runs on massively parallel Graphical Processing Units (GPU) with NVIDIA CUDA architecture. The algorithms are based on parallel-scan and sort methods. The tree-construction and calculation of multipole moments is carried out on the host CPU, while the force calculation which consists of tree walks and evaluation of interaction list is carried out on the GPU. In this way, a sustained performance of about 100GFLOP/s and data transfer rates of about 50GB/s are achieved. It takes about a second to compute forces on a million particles with an opening angle of θ ≈ 0.5. To test the performance and feasibility, we implemented the algorithms in CUDA in the form of a gravitational tree-code which completely runs on the GPU. The tree construction and traverse algorithms are portable to many-core devices which have support for CUDA or OpenCL programming languages. The gravitational tree-code outperforms tuned CPU code during the tree-construction and shows a performance improvement of more than a factor 20 overall, resulting in a processing rate of more than 2.8 million particles per second. The code has a convenient user interface and is freely available for use.
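The role of the opening angle in a tree walk can be sketched with the textbook Barnes-Hut acceptance test. This is an assumption for illustration only: Octgrav's actual criterion and data layout are not given in the abstract and may differ, and the function name here is hypothetical.

```python
import math


def cell_is_far_enough(cell_size, cell_center, particle_pos, theta=0.5):
    """Textbook Barnes-Hut test: accept a cell as one multipole if size / distance < theta.

    If the test fails, a tree walk would open the cell and examine its children.
    """
    distance = math.dist(cell_center, particle_pos)
    return distance > 0 and cell_size / distance < theta


# A unit-size cell at the origin, seen from a distant and from a nearby particle.
far = cell_is_far_enough(1.0, (0.0, 0.0, 0.0), (10.0, 0.0, 0.0))
near = cell_is_far_enough(1.0, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
print(far, near)
```

A smaller opening angle forces more cells to be opened, which costs more interactions per particle but reduces the force error.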
Perceived Organizational Support for Enhancing Welfare at Work: A Regression Tree Model

PubMed Central

Giorgi, Gabriele; Dubin, David; Perez, Javier Fiz

2016-01-01

When trying to examine outcomes such as welfare and well-being, research tends to focus on main effects and take into account limited numbers of variables at a time. There are a number of techniques that may help address this problem. For example, many statistical packages available in R provide easy-to-use methods of modeling complicated analysis such as classification and tree regression (i.e., recursive partitioning). The present research illustrates the value of recursive partitioning in the prediction of perceived organizational support in a sample of more than 6000 Italian bankers. Utilizing the tree function party package in R, we estimated a regression tree model predicting perceived organizational support from a multitude of job characteristics including job demand, lack of job control, lack of supervisor support, training, etc. The resulting model appears particularly helpful in pointing out several interactions in the prediction of perceived organizational support. In particular, training is the dominant factor. Another dimension that seems to influence organizational support is reporting (perceived communication about safety and stress concerns). Results are discussed from a theoretical and methodological point of view. PMID:28082924

Digression and Value Concatenation to Enable Privacy-Preserving Regression.

PubMed

Li, Xiao-Bai; Sarkar, Sumit

2014-09-01

Regression techniques can be used not only for legitimate data analysis, but also to infer private information about individuals. In this paper, we demonstrate that regression trees, a popular data-analysis and data-mining technique, can be used to effectively reveal individuals' sensitive data. This problem, which we call a "regression attack," has not been addressed in the data privacy literature, and existing privacy-preserving techniques are not appropriate in coping with this problem. We propose a new approach to counter regression attacks. To protect against privacy disclosure, our approach introduces a novel measure, called digression, which assesses the sensitive value disclosure risk in the process of building a regression tree model. Specifically, we develop an algorithm that uses the measure for pruning the tree to limit disclosure of sensitive data. We also propose a dynamic value-concatenation method for anonymizing data, which better preserves data utility than a user-defined generalization scheme commonly used in existing approaches. Our approach can be used for anonymizing both numeric and categorical data. An experimental study is conducted using real-world financial, economic and healthcare data. The results of the experiments demonstrate that the proposed approach is very effective in protecting data privacy while preserving data quality for research and analysis.
Exploration on Construction of Hospital "Talent Tree" Project.

PubMed

Yi, Lihua; Wei, Lei; Hao, Aimin; Hu, Minmin; Xu, Xinzhou

2015-05-01

Talent is the core competitive force of a hospital's development. Wuxi No. 2 People's Hospital followed the characteristics that medical talents mature slowly and their growth requires a long period. The innovated "talent tree" project, trained classified talents corresponding to "base-trunk-crown" of a tree, formed an individualized professional training plan with different levels and at different periods. We carried out a relay of the "talent tree" to bring their initiative into play. In practice, we gradually found this as a unique way of the talent construction, which conforms to our hospital's condition. This guarantees sustained development and innovative force of the hospital.
Decision trees in epidemiological research.

PubMed

Venkatasubramaniam, Ashwini; Wolfson, Julian; Mitchell, Nathan; Barnes, Timothy; JaKa, Meghan; French, Simone

2017-01-01

In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.
Indicators of Terrorism Vulnerability in Africa

DTIC Science & Technology

2015-03-26

...the terror threat and vulnerabilities across Africa. Key words: Terrorism, Africa, Negative Binomial Regression, Classification Tree...
The effect of different distance measures in detecting outliers using clustering-based algorithm for circular regression model

NASA Astrophysics Data System (ADS)

Di, Nur Faraidah Muhammad; Satari, Siti Zanariah

2017-05-01

Outlier detection in linear data sets has been studied extensively, but only a small amount of work has been done on outlier detection in circular data. In this study, we propose a method for detecting multiple outliers in circular regression models based on a clustering algorithm. Clustering techniques use a distance measure to define the distance between data points. Here, we introduce a similarity distance based on Euclidean distance for the circular model and obtain a cluster tree using the single linkage clustering algorithm. Then, a stopping rule for the cluster tree based on the mean direction and circular standard deviation of the tree height is proposed. We classify the cluster group that exceeds the stopping rule as a potential outlier. Our aim is to demonstrate the effectiveness of the proposed algorithms with the similarity distances in detecting outliers. The proposed methods are found to perform well and to be applicable to circular regression models.
Development of hybrid genetic-algorithm-based neural networks using regression trees for modeling air quality inside a public transportation bus.

PubMed

Kadiyala, Akhil; Kaur, Devinder; Kumar, Ashok

2013-02-01

The present study developed a novel approach to modeling indoor air quality (IAQ) of a public transportation bus by the development of hybrid genetic-algorithm-based neural networks (also known as evolutionary neural networks) with input variables optimized from using the regression trees, referred to as the GART approach. This study validated the applicability of the GART modeling approach in solving complex nonlinear systems by accurately predicting the monitored contaminants of carbon dioxide (CO2), carbon monoxide (CO), nitric oxide (NO), sulfur dioxide (SO2), 0.3-0.4 µm sized particle numbers, 0.4-0.5 µm sized particle numbers, particulate matter (PM) concentrations less than 1.0 µm (PM1.0), and PM concentrations less than 2.5 µm (PM2.5) inside a public transportation bus operating on 20% grade biodiesel in Toledo, OH. First, the important variables affecting each monitored in-bus contaminant were determined using regression trees. Second, the analysis of variance was used as a complementary sensitivity analysis to the regression tree results to determine a subset of statistically significant variables affecting each monitored in-bus contaminant. Finally, the identified subsets of statistically significant variables were used as inputs to develop three artificial neural network (ANN) models. The models developed were regression tree-based back-propagation network (BPN-RT), regression tree-based radial basis function network (RBFN-RT), and GART models. Performance measures were used to validate the predictive capacity of the developed IAQ models. The results from this approach were compared with the results obtained from using a theoretical approach and a generalized practicable approach to modeling IAQ that included the consideration of additional independent variables when developing the aforementioned ANN models. The hybrid GART models were able to capture the majority of the variance in the monitored in-bus contaminants.
The genetic-algorithm-based neural network IAQ models outperformed the traditional ANN methods of the back-propagation and the radial basis function networks. The novelty of this research is the development of a novel approach to modeling vehicular indoor air quality by integration of the advanced methods of genetic algorithms, regression trees, and the analysis of variance for the monitored in-vehicle gaseous and particulate matter contaminants, and comparing the results obtained from using the developed approach with conventional artificial intelligence techniques of back propagation networks and radial basis function networks. This study validated the newly developed approach using holdout and threefold cross-validation methods. These results are of great interest to scientists, researchers, and the public in understanding the various aspects of modeling an indoor microenvironment. This methodology can easily be extended to other fields of study also.
Comparing statistical and machine learning classifiers: alternatives for predictive modeling in human factors research.

PubMed

Carnahan, Brian; Meyer, Gérard; Kuntz, Lois-Ann

2003-01-01

Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches--genetic programming and decision tree induction--were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.
Developing Models to Forecast Sales of Natural Christmas Trees

Treesearch

Lawrence D. Garrett; Thomas H. Pendleton

1977-01-01

A study of practices for marketing Christmas trees in Winston-Salem, North Carolina, and Denver, Colorado, revealed that such factors as retail lot competition, tree price, consumer traffic, and consumer income were very important in determining a particular retailer's sales. Analyses of 4 years of market data were used in developing regression models for...
Comprehensive database of diameter-based biomass regressions for North American tree species

Treesearch

Jennifer C. Jenkins; David C. Chojnacky; Linda S. Heath; Richard A. Birdsey

2004-01-01

A database consisting of 2,640 equations compiled from the literature for predicting the biomass of trees and tree components from diameter measurements of species found in North America. Bibliographic information, geographic locations, diameter limits, diameter and biomass units, equation forms, statistical errors, and coefficients are provided for each equation,...
A method to study response of large trees to different amounts of available soil water

Treesearch

D.H. Marx; Shi-Jean S. Sung; J.S. Cunningham; M.D. Thompson; L.M. White

1995-01-01

A method was developed to manipulate available soil water on large trees by intercepting thrufall with gutters placed under tree canopies and irrigating the intercepted thrufall onto other trees. With this design, trees were exposed for 2 years to either 25% less thrufall, normal thrufall, or 25% additional thrufall. Undercanopy construction in these plots moderately...
A Method to Study Response of Large Trees to Different Amounts of Available Soil Water

Treesearch

Donald H. Marx; Shi-jean S. Sung; James S. Cunningham; Michael D. Thompson; Linda M. White

1995-01-01

A method was developed to manipulate available soil water on large trees by intercepting thrufall with gutters placed under tree canopies and irrigating the intercepted thrufall onto other trees. With this design, trees were exposed for 2 years to either 25 percent less thrufall, normal thrufall, or 25 percent additional thrufall. Undercanopy construction in these plots...
Analysis of occlusal variables, dental attrition, and age for distinguishing healthy controls from female patients with intracapsular temporomandibular disorders.

PubMed

Seligman, D A; Pullinger, A G

2000-01-01

Confusion about the relationship of occlusion to temporomandibular disorders (TMD) persists. This study attempted to identify occlusal and attrition factors plus age that would characterize asymptomatic normal female subjects. A total of 124 female patients with intracapsular TMD were compared with 47 asymptomatic female controls for associations to 9 occlusal factors, 3 attrition severity measures, and age using classification tree, multiple stepwise logistic regression, and univariate analyses. Models were tested for accuracy (sensitivity and specificity) and total contribution to the variance. The classification tree model had 4 terminal nodes that used only anterior attrition and age. "Normals" were mainly characterized by low attrition levels, whereas patients had higher attrition and tended to be younger. The tree model was only moderately useful (sensitivity 63%, specificity 94%) in predicting normals. The logistic regression model incorporated unilateral posterior crossbite and mediotrusive attrition severity in addition to the 2 factors in the tree, but was slightly less accurate than the tree (sensitivity 51%, specificity 90%). When only occlusal factors were considered in the analysis, normals were additionally characterized by a lack of anterior open bite, smaller overjet, and smaller RCP-ICP slides. The log likelihood accounted for was similar for both the tree (pseudo R(2) = 29.38%; mean deviance = 0.95) and the multiple logistic regression (Cox Snell R(2) = 30.3%, mean deviance = 0.84) models. The occlusal and attrition factors studied were only moderately useful in differentiating normals from TMD patients.
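The accuracy measures quoted for the tree and logistic models follow from a confusion matrix. The sketch below shows the arithmetic only: the counts are hypothetical and are not the study's data, and "positive" stands for whichever class the model is asked to detect (asymptomatic normals in this abstract).

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = share of true positives that the model labels positive
    specificity = share of true negatives that the model labels negative
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts for illustration.
sens, spec = sensitivity_specificity(tp=30, fn=20, tn=90, fp=10)
print(f"sensitivity={sens:.0%}, specificity={spec:.0%}")
```

A model with high specificity but moderate sensitivity, as reported above, rarely mislabels a negative case but misses a sizeable share of the positives.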
Towards lidar-based mapping of tree age at the Arctic forest tundra ecotone.

NASA Astrophysics Data System (ADS)

Jensen, J.; Maguire, A.; Oelkers, R.; Andreu-Hayles, L.; Boelman, N.; D'Arrigo, R.; Griffin, K. L.; Jennewein, J. S.; Hiers, E.; Meddens, A. J.; Russell, M.; Vierling, L. A.; Eitel, J.

2017-12-01

Climate change may cause spatial shifts in the forest-tundra ecotone (FTE). To improve our ability to study these spatial shifts, information on tree demography along the FTE is needed. The objective of this study was to assess the suitability of lidar derived tree heights as a surrogate for tree age. We calculated individual tree age from 48 tree cores collected at basal height from white spruce (Picea glauca) within the FTE in northern Alaska. Tree height was obtained from terrestrial lidar scans (<1 cm spatial resolution). The relationship between age and height was examined using a linear regression model forced through the origin. We found a very strong predictive relationship between tree height and age (R2 = 0.90, RMSE = 19.34 years) for trees that ranged from 14 to 230 years. Separate regression models were also developed for small (height < 3 m) and large trees (height >= 3 m), yielding strong predictive relationships between height and age (R2 = 0.86, RMSE = 12.21 years, and R2 = 0.93, RMSE = 25.16 years, respectively). The slope coefficients for small and large tree models (16.83 and 12.98 years/m, respectively) indicate that small trees grow 1.3 times faster than large trees at these FTE study sites. Although a strong, predictive relationship between age and height is uncommon in light-limited forest environments, our findings suggest that the sparseness of trees within the FTE may explain the strong tree height-age relationships found herein. Further analysis of 36 additional tree cores recently collected within the FTE near Inuvik, Canada will be performed. Our preliminary analysis suggests that lidar derived tree height could be a reliable proxy for tree age at the FTE, thereby establishing a new technique for scaling tree structure and demographics across larger portions of this sensitive ecotone.
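A linear regression forced through the origin has a closed-form slope, b = Σxy / Σx², which the following sketch computes. The heights and ages are hypothetical values chosen for illustration, not the study's measurements, and the function name is an assumption.

```python
def slope_through_origin(x, y):
    """Least-squares slope b for the no-intercept model y = b * x."""
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("all predictor values are zero; slope is undefined")
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical tree heights (m) and ages (years).
heights_m = [1.0, 2.0, 3.0]
ages_yr = [15.0, 32.0, 44.0]
slope = slope_through_origin(heights_m, ages_yr)  # years per metre of height
predicted_age = slope * 2.5                       # age estimate for a 2.5 m tree
print(f"slope = {slope:.2f} years/m, predicted age at 2.5 m = {predicted_age:.1f} years")
```

Dropping the intercept encodes the assumption that a tree of zero height has zero age. It also changes how R2 is computed, so R2 values from such models are not directly comparable with those from models that include an intercept.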
Landscape-scale consequences of differential tree mortality from catastrophic wind disturbance in the Amazon.

PubMed

Rifai, Sami W; Urquiza Muñoz, José D; Negrón-Juárez, Robinson I; Ramírez Arévalo, Fredy R; Tello-Espinoza, Rodil; Vanderwel, Mark C; Lichstein, Jeremy W; Chambers, Jeffrey Q; Bohlman, Stephanie A

2016-10-01

Wind disturbance can create large forest blowdowns, which greatly reduces live biomass and adds uncertainty to the strength of the Amazon carbon sink. Observational studies from within the central Amazon have quantified blowdown size and estimated total mortality but have not determined which trees are most likely to die from a catastrophic wind disturbance. Also, the impact of spatial dependence upon tree mortality from wind disturbance has seldom been quantified, which is important because wind disturbance often kills clusters of trees due to large treefalls killing surrounding neighbors. We examine (1) the causes of differential mortality between adult trees from a 300-ha blowdown event in the Peruvian region of the northwestern Amazon, (2) how accounting for spatial dependence affects mortality predictions, and (3) how incorporating both differential mortality and spatial dependence affect the landscape level estimation of necromass produced from the blowdown. Standard regression and spatial regression models were used to estimate how stem diameter, wood density, elevation, and a satellite-derived disturbance metric influenced the probability of tree death from the blowdown event. The model parameters regarding tree characteristics, topography, and spatial autocorrelation of the field data were then used to determine the consequences of non-random mortality for landscape production of necromass through a simulation model. Tree mortality was highly non-random within the blowdown, where tree mortality rates were highest for trees that were large, had low wood density, and were located at high elevation. Of the differential mortality models, the non-spatial models overpredicted necromass, whereas the spatial model slightly underpredicted necromass. 
When parameterized from the same field data, the spatial regression model with differential mortality estimated only 7.5% more dead trees across the entire blowdown than the random mortality model, yet it estimated 51% greater necromass. We suggest that predictions of forest carbon loss from wind disturbance are sensitive to not only the underlying spatial dependence of observations, but also the biological differences between individuals that promote differential levels of mortality. © 2016 by the Ecological Society of America.
Tree Morphologic Plasticity Explains Deviation from Metabolic Scaling Theory in Semi-Arid Conifer Forests, Southwestern USA

PubMed Central

Swetnam, Tyson L.; O’Connor, Christopher D.; Lynch, Ann M.

2016-01-01

A significant concern about Metabolic Scaling Theory (MST) in real forests relates to consistent differences between the values of power law scaling exponents of tree primary size measures used to estimate mass and those predicted by MST. Here we consider why observed scaling exponents for diameter and height relationships deviate from MST predictions across three semi-arid conifer forests in relation to: (1) tree condition and physical form, (2) the level of inter-tree competition (e.g. open vs closed stand structure), (3) increasing tree age, and (4) differences in site productivity. Scaling exponent values derived from non-linear least-squares regression for trees in excellent condition (n = 381) were above the MST prediction at the 95% confidence level, while the exponent for trees in good condition was no different from the MST prediction (n = 926). Trees that were in fair or poor condition, characterized as diseased, leaning, or sparsely crowned, had exponent values below MST predictions (n = 2,058), as did recently dead standing trees (n = 375). The exponent value of the mean-tree model that disregarded tree condition (n = 3,740) was consistent with other studies that reject MST scaling. Ostensibly, as stand density and competition increased, trees exhibited greater morphological plasticity whereby the majority had characteristically fair or poor growth forms. Fitting by least-squares regression biases the mean-tree model scaling exponent toward values that are below MST idealized predictions. For 368 trees from Arizona with known establishment dates, increasing age had no significant impact on expected scaling. We further suggest height to diameter ratios below MST relate to vertical truncation caused by limitation in plant water availability. Even with environmentally imposed height limitation, proportionality between height and diameter scaling exponents was consistent with the predictions of MST. PMID:27391084
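As a rough illustration of what estimating a power law scaling exponent involves, the sketch below fits height = a * diameter^b to synthetic data. It is only a sketch under stated assumptions: it uses ordinary least squares on log-transformed values as a simpler stand-in for the non-linear least-squares regression named in the abstract, and the function name `fit_power_law`, the data, and the exponent 2/3 used to generate the data are all hypothetical choices made for illustration.

```python
import math


def fit_power_law(x, y):
    """Fit y = a * x**b by least squares on log-transformed data.

    Returns (a, b). This log-log fit is a simplified stand-in for a
    non-linear least-squares fit and is not the study's procedure.
    """
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx                       # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)   # intercept in log-log space = log(a)
    return a, b


# Hypothetical, noise-free data generated with a = 3.0 and b = 2/3.
diameters = [5.0, 10.0, 20.0, 40.0, 80.0]
heights = [3.0 * d ** (2.0 / 3.0) for d in diameters]

a, b = fit_power_law(diameters, heights)
print(f"a = {a:.3f}, b = {b:.3f}")
```

With real field data the two fitting approaches can give different exponents, because the log transform changes how errors are weighted.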
The Efficacy of Consensus Tree Methods for Summarizing Phylogenetic Relationships from a Posterior Sample of Trees Estimated from Morphological Data.

PubMed

O'Reilly, Joseph E; Donoghue, Philip C J

2018-03-01

Consensus trees are required to summarize trees obtained through MCMC sampling of a posterior distribution, providing an overview of the distribution of estimated parameters such as topology, branch lengths, and divergence times. Numerous consensus tree construction methods are available, each presenting a different interpretation of the tree sample. The rise of morphological clock and sampled-ancestor methods of divergence time estimation, in which times and topology are coestimated, has increased the popularity of the maximum clade credibility (MCC) consensus tree method. The MCC method assumes that the sampled, fully resolved topology with the highest clade credibility is an adequate summary of the most probable clades, with parameter estimates from compatible sampled trees used to obtain the marginal distributions of parameters such as clade ages and branch lengths. Using both simulated and empirical data, we demonstrate that MCC trees, and trees constructed using the similar maximum a posteriori (MAP) method, often include poorly supported and incorrect clades when summarizing diffuse posterior samples of trees. We demonstrate that the paucity of information in morphological data sets contributes to the inability of MCC and MAP trees to accurately summarise the posterior distribution. Conversely, majority-rule consensus (MRC) trees represent a lower proportion of incorrect nodes when summarizing the same posterior samples of trees. Thus, we advocate the use of MRC trees, in place of MCC or MAP trees, in attempts to summarize the results of Bayesian phylogenetic analyses of morphological data. PMID:29106675
A supertree pipeline for summarizing phylogenetic and taxonomic information for millions of species

PubMed Central

Redelings, Benjamin D.

2017-01-01

We present a new supertree method that enables rapid estimation of a summary tree on the scale of millions of leaves. This supertree method summarizes a collection of input phylogenies and an input taxonomy. We introduce formal goals and criteria for such a supertree to satisfy in order to transparently and justifiably represent the input trees. In addition to producing a supertree, our method computes annotations that describe which groupings in the input trees support and conflict with each group in the supertree. We compare our supertree construction method to a previously published supertree construction method by assessing their performance on input trees used to construct the Open Tree of Life version 4, and find that our method increases the number of displayed input splits from 35,518 to 39,639 and decreases the number of conflicting input splits from 2,760 to 1,357. The new supertree method also improves on the previous supertree construction method in that it produces no unsupported branches and avoids unnecessary polytomies. This pipeline is currently used by the Open Tree of Life project to produce all of the versions of the project’s “synthetic tree” starting at version 5. This software pipeline is called “propinquity”. It relies heavily on “otcetera”, a set of C++ tools that perform most of the steps of the pipeline. All of the components are free software and are available on GitHub. PMID:28265520
Niche construction within riparian corridors. Part II: The unexplored role of positive intraspecific interactions in Salicaceae species

NASA Astrophysics Data System (ADS)

Corenblit, Dov; Garófano-Gómez, Virginia; González, Eduardo; Hortobágyi, Borbála; Julien, Frédéric; Lambs, Luc; Otto, Thierry; Roussel, Erwan; Steiger, Johannes; Tabacchi, Eric; Till-Bottraud, Irène

2018-03-01

Within riparian corridors, Salicaceae trees and shrubs affect hydrogeomorphic processes and lead to the formation of wooded fluvial landforms. These trees form dense stands and enhance plant anchorage, as grouped plants are less prone to be uprooted than free-standing individuals. This also enhances their role as ecosystem engineers through the trapping of sediment, organic matter, and nutrients. The landform formation caused by these wooded biogeomorphic landforms probably represents a positive niche construction, which ultimately leads, through facilitative processes, to an improved capacity of the individual trees to survive, exploit resources, and reach sexual maturity in the interval between destructive floods. The facilitative effects of riparian vegetation are well established; however, the nature and intensity of biotic interactions among trees of the same species forming dense woody stands and constructing the niche remain unclear. Our hypothesis is that the niche construction process also comprises more direct intraspecific interactions, such as cooperation or altruism. Our aim in this paper is to propose an original theoretical framework for positive intraspecific interactions among riparian Salicaceae species operating from establishment to sexual maturity. 
Within this framework, we speculate that (i) positive intraspecific interactions among trees are maximized in dynamic river reaches; (ii) during establishment, intraspecific facilitation (or helping) occurs among trees and this leads to the maintenance of a dense stand that improves survival and growth because saplings protect each other from shear stress and scour; (iii) in addition to the improved capacity to trap mineral and organic matter, individuals that constitute the dense stand can cooperate to mutually support a mycorrhizal network that will connect plants, soil, and groundwater and influence nutrient transfer, cycling, and storage within the shared constructed niche; (iv) during post-establishment, roots form functional grafts between neighbouring trees to increase biomechanical and physiological anchorage as well as nutrient acquisition and exchange; and (v) these stands remain dense on alluvial bars until a threshold of landform construction and hydrogeomorphic disconnection is reached. At this last stage, intraspecific competition for resources (light and nutrients) increases, inducing a density reduction in the aerial stand (i.e., self-thinning), but root systems of altruistic individuals could remain functional via root grafting. Finally, we suggest new methodological perspectives for testing our hypotheses related to the occurrence of positive intraspecific interactions among Salicaceae trees in fluvial landform and niche construction through in situ and ex situ experiments.

Category of trees in representation theory of quantum algebras

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moskaliuk, N. M.; Moskaliuk, S. S.

2013-10-15

New applications of categorical methods are connected with new additional structures on categories. One such structure in the representation theory of quantum algebras, the category of Kuznetsov-Smorodinsky-Vilenkin-Smirnov (KSVS) trees, is constructed; its objects are finite rooted KSVS trees and its morphisms are generated by the transition from one KSVS tree to another.
Supersonic air jets preserve tree roots in underground pipeline installation

Treesearch

Rob Gross; Michelle Julene

2002-01-01

Tree roots are often damaged during construction projects, particularly during trenching operations for pipeline installation. Although mechanical soil excavation using heavy equipment, such as an excavator or backhoe, is considered the fastest and most economical method, it damages and destroys tree roots and can lead to unintentional tree loss, poor public relations,...
A review of tree root conflicts with sidewalks, curbs, and roads

Treesearch

T.B. Randrup; E.G. McPherson; L.R. Costello

2003-01-01

Literature relevant to tree root and urban infrastructure conflicts is reviewed. Although tree roots can conflict with many infrastructure elements, sidewalk and curb conflicts are the focus of this review. Construction protocols, urban soils, root growth, and causal factors (soil conditions, limited planting space, tree size, variation in root architecture, management...
Reconciliation of Gene and Species Trees

PubMed Central

Rusin, L. Y.; Lyubetskaya, E. V.; Gorbunov, K. Y.; Lyubetsky, V. A.

2014-01-01

The first part of the paper briefly overviews the problem of gene and species trees reconciliation with the focus on defining and algorithmic construction of the evolutionary scenario. Basic ideas are discussed for the aspects of mapping definitions, costs of the mapping and evolutionary scenario, imposing time scales on a scenario, incorporating horizontal gene transfers, binarization and reconciliation of polytomous trees, and construction of species trees and scenarios. The review does not intend to cover the vast diversity of literature published on these subjects. Instead, the authors strived to overview the problem of the evolutionary scenario as a central concept in many areas of evolutionary research. The second part provides detailed mathematical proofs for the solutions of two problems: (i) inferring a gene evolution along a species tree accounting for various types of evolutionary events and (ii) trees reconciliation into a single species tree when only gene duplications and losses are allowed. All proposed algorithms have a cubic time complexity and are mathematically proved to find exact solutions. Solving algorithms for problem (ii) can be naturally extended to incorporate horizontal transfers, other evolutionary events, and time scales on the species tree. PMID:24800245
Environmental impacts of forest road construction on mountainous terrain

PubMed Central

2013-01-01

Forest roads are the basic infrastructure of forestry operations. These roads entail a complex engineering effort because they can cause substantial environmental damage to forests and are costly to construct. This study was carried out at four sample sites in the Giresun, Trabzon (2) and Artvin Forest Directorates, which are in the Black Sea region of Turkey. The areas have both steep terrain (30-50% gradient) and very steep terrain (51-80% gradient). Bulldozers and hydraulic excavators were determined to be the main machines for forest road construction, causing environmental damage and cross sections in mountainous areas. As a result of this study, the percent damage to forests was determined as follows: on steep terrain, 21% of trees were damaged by excavators and 33% of trees were damaged by bulldozers during forest road construction, and on very steep terrain, 27% of trees were damaged by excavators and 44% of trees were damaged by bulldozers during forest road construction. It was also determined that on steep terrain, when excavators were used, 12.23% less forest area was destroyed compared with when bulldozers were used, and 16.13% less area was destroyed by excavators on very steep terrain. In order to reduce the environmental damage to the forest ecosystem, especially on steep terrain, hydraulic excavators should replace bulldozers in forest road construction activities. PMID:23497078
Unravelling the limits to tree height: a major role for water and nutrient trade-offs.

PubMed

Cramer, Michael D

2012-05-01

Competition for light has driven forest trees to grow exceedingly tall, but the lack of a single universal limit to tree height indicates multiple interacting environmental limitations. Because soil nutrient availability is determined by both nutrient concentrations and soil water, water and nutrient availabilities may interact in determining realised nutrient availability and consequently tree height. In SW Australia, which is characterised by nutrient impoverished soils that support some of the world's tallest forests, total [P] and water availability were independently correlated with tree height (r = 0.42 and 0.39, respectively). However, interactions between water availability and each of total [P], pH and [Mg] contributed to a multiple linear regression model of tree height (r = 0.72). A boosted regression tree model showed that maximum tree height was correlated with water availability (24%), followed by soil properties including total P (11%), Mg (10%) and total N (9%), amongst others, and that there was an interaction between water availability and total [P] in determining maximum tree height. These interactions indicated a trade-off between water and P availability in determining maximum tree height in SW Australia. This is enabled by a species assemblage capable of growing tall and surviving (some) disturbances. The mechanism for this trade-off is suggested to be through water enabling mass-flow and diffusive mobility of P, particularly of relatively mobile organic P, although water interactions with microbial activity could also play a role.
Perturbative Quantum Gravity from Gauge Theory

NASA Astrophysics Data System (ADS)

Carrasco, John Joseph

In this dissertation we present the graphical techniques recently developed in the construction of multi-loop scattering amplitudes using the method of generalized unitarity. We construct the three-loop and four-loop four-point amplitudes of N = 8 supergravity using these methods and the Kawai, Lewellen and Tye tree-level relations, which map tree-level gauge theory amplitudes to tree-level gravity theory amplitudes. We conclude by extending a tree-level duality between color and kinematics, generic to gauge theories, to a loop-level conjecture, allowing the easy relation between loop-level gauge and gravity kinematics. We provide non-trivial evidence for this conjecture at three loops in the particular case of maximal supersymmetry.
Multivariate regression model for partitioning tree volume of white oak into round-product classes

Treesearch

Daniel A. Yaussy; David L. Sonderman

1984-01-01

Describes the development of multivariate equations that predict the expected cubic volume of four round-product classes from independent variables composed of individual tree-quality characteristics. Although the model has limited application at this time, it does demonstrate the feasibility of partitioning total tree cubic volume into round-product classes based on...
A way forward for fire-caused tree mortality prediction: Modeling a physiological consequence of fire

Treesearch

Kathleen L. Kavanaugh; Matthew B. Dickinson; Anthony S. Bova

2010-01-01

Current operational methods for predicting tree mortality from fire injury are regression-based models that only indirectly consider underlying causes and, thus, have limited generality. A better understanding of the physiological consequences of tree heating and injury is needed to develop biophysical process models that can make predictions under changing or novel...
Height-age relationships for regeneration-size trees in the northern Rocky Mountains, USA

Treesearch

Dennis E. Ferguson; Clinton E. Carlson

2010-01-01

Regression equations were developed to predict heights of 10 conifer species in regenerating stands in central and northern Idaho, western Montana, and eastern Washington. Most sample trees were natural regeneration that became established after conventional harvest and site preparation methods. Heights are predicted as a function of tree age, residual overstory density...
Potential redistribution of tree species habitat under five climate change scenarios in the eastern US

Treesearch

Louis R. Iverson; Anantha M. Prasad

2002-01-01

Global climate change could have profound effects on the Earth's biota, including large redistributions of tree species and forest types. We used DISTRIB, a deterministic regression tree analysis model, to examine environmental drivers related to current forest-species distributions and then model potential suitable habitat under five climate change scenarios...
DOE Office of Scientific and Technical Information (OSTI.GOV)

Singh, Kunwar P.; Gupta, Shikha

Ensemble learning approach based decision treeboost (DTB) and decision tree forest (DTF) models are introduced in order to establish a quantitative structure–toxicity relationship (QSTR) for the prediction of toxicity of 1450 diverse chemicals. Eight non-quantum mechanical molecular descriptors were derived. Structural diversity of the chemicals was evaluated using the Tanimoto similarity index. DTB and DTF models supplemented with stochastic gradient boosting and bagging algorithms were constructed for classification and function optimization problems using the toxicity end-point in T. pyriformis. Special attention was paid to the prediction ability and robustness of the models, investigated both in external and 10-fold cross validation processes. In complete data, optimal DTB and DTF models rendered accuracies of 98.90%, 98.83% in two-category and 98.14%, 98.14% in four-category toxicity classifications. Both models further yielded classification accuracies of 100% in external toxicity data of T. pyriformis. The constructed regression models (DTB and DTF) using five descriptors yielded correlation coefficients (R²) of 0.945, 0.944 between the measured and predicted toxicities with mean squared errors (MSEs) of 0.059 and 0.064 in complete T. pyriformis data. The T. pyriformis regression models (DTB and DTF) applied to the external toxicity data sets yielded R² and MSE values of 0.637, 0.655; 0.534, 0.507 (marine bacteria) and 0.741, 0.691; 0.155, 0.173 (algae). The results suggest wide applicability of the inter-species models in predicting toxicity of new chemicals for regulatory purposes. These approaches provide a useful strategy and robust tools in the screening of ecotoxicological risk or environmental hazard potential of chemicals. - Graphical abstract: Importance of input variables in DTB and DTF classification models for (a) two-category, and (b) four-category toxicity intervals in T. pyriformis data. Generalization and predictive abilities of the constructed (c) DTB and (d) DTF regression models to predict the T. pyriformis toxicity of diverse chemicals. - Highlights: • Ensemble learning (EL) based models constructed for toxicity prediction of chemicals. • Predictive models used a few simple non-quantum mechanical molecular descriptors. • EL-based DTB/DTF models successfully discriminated toxic and non-toxic chemicals. • DTB/DTF regression models precisely predicted toxicity of chemicals in multi-species. • Proposed EL-based models can be used as a tool to predict toxicity of new chemicals.
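The two regression metrics quoted in this abstract, R² between measured and predicted values and mean squared error, can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes R² means the squared Pearson correlation, the function names are hypothetical, and the toxicity values are made up for the example.

```python
def mean_squared_error(measured, predicted):
    """Average of squared differences between measured and predicted values."""
    n = len(measured)
    return sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n


def squared_correlation(measured, predicted):
    """Squared Pearson correlation between measured and predicted values.

    Assumption: this is the sense of R² intended; a coefficient of
    determination computed from residuals can give a different number.
    """
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_m * var_p)


# Hypothetical measured and predicted toxicity values (not from the study).
measured = [0.2, 0.9, 1.4, 2.1, 2.8]
predicted = [0.3, 0.8, 1.5, 2.0, 2.9]

print("MSE:", round(mean_squared_error(measured, predicted), 4))
print("R^2:", round(squared_correlation(measured, predicted), 4))
```

A high R² with a large MSE is possible when predictions are well correlated but systematically offset, which is one reason to report the two metrics together.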
Regionalization of meso-scale physically based nitrogen modeling outputs to the macro-scale by the use of regression trees

NASA Astrophysics Data System (ADS)

Künne, A.; Fink, M.; Kipka, H.; Krause, P.; Flügel, W.-A.

2012-06-01

In this paper, a method is presented to estimate excess nitrogen on large scales considering single field processes. The approach was implemented by using the physically based model J2000-S to simulate the nitrogen balance as well as the hydrological dynamics within meso-scale test catchments. The model input data, the parameterization, the results and a detailed system understanding were used to generate the regression tree models with GUIDE (Loh, 2002). For each landscape type in the federal state of Thuringia, a regression tree was calibrated and validated using the model data and results of excess nitrogen from the test catchments. Hydrological parameters such as precipitation and evapotranspiration were also used to predict excess nitrogen by the regression tree model. Hence they had to be calculated and regionalized as well for the state of Thuringia. Here the model J2000g was used to simulate the water balance on the macro scale. With the regression trees, the excess nitrogen was regionalized for each landscape type of Thuringia. The approach allows calculating the potential nitrogen input into the streams of the drainage area. The results show that the applied methodology was able to transfer the detailed model results of the meso-scale catchments to the entire state of Thuringia with low computing time and without losing the detailed knowledge from the nitrogen transport modeling. This was validated with modeling results from Fink (2004) in a catchment lying in the regionalization area. The regionalized and modeled excess nitrogen values show 94% agreement. The study was conducted within the framework of a project in collaboration with the Thuringian Environmental Ministry, whose overall aim was to assess the effect of agro-environmental measures regarding load reduction in the water bodies of Thuringia to fulfill the requirements of the European Water Framework Directive (Bäse et al., 2007; Fink, 2006; Fink et al., 2007).
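The core step in constructing a regression tree is choosing a split that makes the response more homogeneous within each branch. The sketch below shows a plain CART-style criterion on one predictor, picking the threshold that minimizes the summed squared error of the two branches. It is an assumption-laden illustration only: GUIDE, the software cited in the abstract, uses its own split-selection procedure that this sketch does not reproduce, and the function names and the precipitation and excess-nitrogen values are hypothetical.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Find the threshold t so that splitting on x <= t minimizes total SSE.

    Returns (threshold, total_sse). If no split improves on the unsplit
    node, threshold is None and total_sse is the SSE of all of y.
    """
    pairs = sorted(zip(x, y))
    best_threshold, best_sse = None, sum_squared_error(list(y))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical predictor values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sum_squared_error(left) + sum_squared_error(right)
        if total < best_sse:
            best_threshold, best_sse = threshold, total
    return best_threshold, best_sse


# Hypothetical data: precipitation (mm/yr) and excess nitrogen (kg/ha).
precipitation = [500, 550, 600, 800, 850, 900]
excess_nitrogen = [10.0, 12.0, 11.0, 30.0, 32.0, 31.0]

threshold, error = best_split(precipitation, excess_nitrogen)
print("split at precipitation <=", threshold, "with total SSE", error)
```

A full tree repeats this search over every predictor and then recursively within each branch until a stopping rule is met; each leaf predicts the mean response of its observations.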
ECOPASS - a multivariate model used as an index of growth performance of poplar clones

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ceulemans, R.; Impens, I.

The model (ECOlogical PASSport) reported was constructed by principal component analysis from a combination of biochemical, anatomical/morphological and ecophysiological gas exchange parameters measured on 5 fast growing poplar clones. Productivity data were from 10 selected trees in 3 plantations in Belgium and given as m.a.i.(b.a.). The model is shown to be able to reflect not only genetic origin and the relative effects of the different parameters of the clones, but also their production potential. Multiple regression analysis of the 4 principal components showed a high cumulative correlation (96%) between the 3 components related to ecophysiological, biochemical and morphological parameters, and productivity; the ecophysiological component alone correlated 85% with productivity.
Predicting Diameter at Breast Height from Stump Diameters for Northeastern Tree Species

Treesearch

Eric H. Wharton

1984-01-01

Presents equations to predict diameter at breast height from stump diameter measurements for 17 northeastern tree species. Simple linear regression was used to develop the equations. Application of the equations is discussed.
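A simple linear regression of the kind described here can be sketched in a few lines. The example is illustrative only: the function name `fit_line`, the stump diameters, and the diameters at breast height are hypothetical and do not reproduce any of the published equations.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical measurements in centimetres (dbh is exactly 0.8 * stump here).
stump_diameter = [20.0, 30.0, 40.0, 50.0]
dbh = [16.0, 24.0, 32.0, 40.0]

intercept, slope = fit_line(stump_diameter, dbh)

# Predict dbh for a stump of 35 cm using the fitted equation.
predicted_dbh = intercept + slope * 35.0
print(f"dbh = {intercept:.2f} + {slope:.2f} * stump; prediction: {predicted_dbh:.1f} cm")
```

In practice a separate equation would be fitted for each species, since the relationship between stump diameter and diameter at breast height depends on stem form.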
Efficient FPT Algorithms for (Strict) Compatibility of Unrooted Phylogenetic Trees.

PubMed

Baste, Julien; Paul, Christophe; Sau, Ignasi; Scornavacca, Celine

2017-04-01

In phylogenetics, a central problem is to infer the evolutionary relationships between a set of species X; these relationships are often depicted via a phylogenetic tree (a tree having its leaves labeled bijectively by elements of X and without degree-2 nodes) called the "species tree." One common approach for reconstructing a species tree consists in first constructing several phylogenetic trees from primary data (e.g., DNA sequences originating from some species in X), and then constructing a single phylogenetic tree maximizing the "concordance" with the input trees. The obtained tree is our estimation of the species tree and, when the input trees are defined on overlapping (but not identical) sets of labels, is called a "supertree." In this paper, we focus on two problems that are central when combining phylogenetic trees into a supertree: the compatibility and the strict compatibility problems for unrooted phylogenetic trees. These problems are strongly related, respectively, to the notions of "containing as a minor" and "containing as a topological minor" in the graph community. Both problems are known to be fixed parameter tractable in the number of input trees k, by using their expressibility in monadic second-order logic and a reduction to graphs of bounded treewidth. Motivated by the fact that the dependency on k of these algorithms is prohibitively large, we give the first explicit dynamic programming algorithms for solving these problems, both running in time [Formula: see text], where n is the total size of the input.
Graph-associated entanglement cost of a multipartite state in exact and finite-block-length approximate constructions

NASA Astrophysics Data System (ADS)

Yamasaki, Hayata; Soeda, Akihito; Murao, Mio

2017-09-01

We introduce and analyze graph-associated entanglement cost, a generalization of the entanglement cost of quantum states to multipartite settings. We identify a necessary and sufficient condition for any multipartite entangled state to be constructible when quantum communication between the multiple parties is restricted to a quantum network represented by a tree. The condition for exact state construction is expressed in terms of the Schmidt ranks of the state defined with respect to edges of the tree. We also study approximate state construction and provide a second-order asymptotic analysis.
Combining logistic regression with classification and regression tree to predict quality of care in a home health nursing data set.

PubMed

Guo, Huey-Ming; Shyu, Yea-Ing Lotus; Chang, Her-Kun

2006-01-01

In this article, the authors provide an overview of a research method to predict quality of care in a home health nursing data set. The results of this study can be visualized through classification and regression tree (CART) graphs. The analysis was more effective and the results were more informative when the home health nursing data set was analyzed with a combination of logistic regression and CART, because the two techniques complement each other. The results were also more informative in that more patient characteristics were found to be related to quality of care in home care. The results help home health nurses predict patient outcomes in case management. Improved prediction is needed for interventions to be appropriately targeted for improved patient outcome and quality of care.
Evaluation of acoustic tomography for tree decay detection

Treesearch

Shanquing Liang; Xiping Wang; Janice Wiedenbeck; Zhiyong Cai; Feng Fu

2008-01-01

In this study, the acoustic tomography technique was used to detect internal decay in high value black cherry (Prunus serotina) trees. Two-dimensional images of the cross sections of the tree samples were constructed using PiCUS Q70 software. The trees were felled following the field test, and a disc from each testing elevation was subsequently cut...
The balance of planting and mortality in a street tree population

Treesearch

Lara A. Roman; John J. Battles; Joe R. McBride

2013-01-01

Street trees have aesthetic, environmental, human health, and economic benefits in urban ecosystems. Street tree populations are constructed by cycles of planting, growth, death, removal and replacement. The goals of this study were to understand how tree mortality and planting rates affect net population growth, evaluate the shape of the mortality curve, and assess...

Detection of Single Tree Stems in Forested Areas from High Density ALS Point Clouds Using 3d Shape Descriptors

NASA Astrophysics Data System (ADS)

Amiri, N.; Polewski, P.; Yao, W.; Krzystek, P.; Skidmore, A. K.

2017-09-01

Airborne Laser Scanning (ALS) is a widespread method for forest mapping and management purposes. While common ALS techniques provide valuable information about the forest canopy and intermediate layers, the point density near the ground may be poor due to dense overstory conditions. The current study highlights a new method for detecting stems of single trees in 3D point clouds obtained from high density ALS with a density of 300 points/m². Compared to standard ALS data, due to lower flight height (150-200 m) this elevated point density leads to more laser reflections from tree stems. In this work, we propose a three-tiered method which works on the point, segment and object levels. First, for each point we calculate the likelihood that it belongs to a tree stem, derived from the radiometric and geometric features of its neighboring points. In the next step, we construct short stem segments based on high-probability stem points, and classify the segments by considering the distribution of points around them as well as their spatial orientation, which encodes the prior knowledge that trees are mainly vertically aligned due to gravity. Finally, we apply hierarchical clustering on the positively classified segments to obtain point sets corresponding to single stems, and perform ℓ1-based orthogonal distance regression to robustly fit lines through each stem point set. The ℓ1-based method is less sensitive to outliers than least-squares approaches. From the fitted lines, the planimetric tree positions can then be derived. Experiments were performed on two plots from the Hochficht forest in the Oberösterreich region of Austria. We marked a total of 196 reference stems in the point clouds of both plots by visual interpretation. The evaluation of the automatically detected stems showed a classification precision of 0.86 and 0.85, respectively for Plot 1 and 2, with recall values of 0.7 and 0.67.
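The robust line-fitting step described above can be illustrated with a small sketch. It is a 2D simplification (the study works on 3D point clouds) and uses iteratively reweighted least squares (IRLS), one common way to approximate an ℓ1 orthogonal fit; it is not necessarily the solver the authors used. The function names and the sample points are hypothetical.

```python
import math

def weighted_tls_line(points, weights):
    """Weighted total-least-squares line: returns (centroid, unit direction)."""
    sw = sum(weights)
    cx = sum(w * x for w, (x, y) in zip(weights, points)) / sw
    cy = sum(w * y for w, (x, y) in zip(weights, points)) / sw
    sxx = sum(w * (x - cx) ** 2 for w, (x, y) in zip(weights, points))
    syy = sum(w * (y - cy) ** 2 for w, (x, y) in zip(weights, points))
    sxy = sum(w * (x - cx) * (y - cy) for w, (x, y) in zip(weights, points))
    # Orientation of the major axis of the weighted 2x2 scatter matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (cx, cy), (math.cos(theta), math.sin(theta))

def orthogonal_distances(points, centroid, direction):
    """Perpendicular distance of each point to the line."""
    cx, cy = centroid
    dx, dy = direction
    return [abs((x - cx) * dy - (y - cy) * dx) for x, y in points]

def l1_orthogonal_line(points, iterations=50, eps=1e-9):
    """IRLS approximation of the line minimizing the sum of orthogonal distances."""
    weights = [1.0] * len(points)
    for _ in range(iterations):
        centroid, direction = weighted_tls_line(points, weights)
        dists = orthogonal_distances(points, centroid, direction)
        # Weight 1/d turns the squared-distance objective into a sum of distances.
        weights = [1.0 / max(d, eps) for d in dists]
    return weighted_tls_line(points, weights)

# Hypothetical data: a vertical "stem" along x = 0 plus one outlier.
stem = [(0.0, float(h)) for h in range(7)] + [(4.0, 6.0)]
_, ls_dir = weighted_tls_line(stem, [1.0] * len(stem))  # ordinary least squares
_, l1_dir = l1_orthogonal_line(stem)                    # robust fit
print("least-squares direction:", ls_dir)
print("l1 direction:", l1_dir)
```

In this toy case the least-squares line is pulled toward the outlier, while the reweighted fit stays close to the vertical stem axis.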
Estimation of carbon storage based on individual tree detection in Pinus densiflora stands using a fusion of aerial photography and LiDAR data.

PubMed

Kim, So-Ra; Kwak, Doo-Ahn; Lee, Woo-Kyun; Son, Yowhan; Bae, Sang-Won; Kim, Choonsig; Yoo, Seongjin

2010-07-01

The objective of this study was to estimate the carbon storage capacity of Pinus densiflora stands using remotely sensed data by combining digital aerial photography with light detection and ranging (LiDAR) data. A digital canopy model (DCM), generated from the LiDAR data, was combined with aerial photography for segmenting crowns of individual trees. To eliminate errors in over and under-segmentation, the combined image was smoothed using a Gaussian filtering method. The processed image was then segmented into individual trees using a marker-controlled watershed segmentation method. After measuring the crown area from the segmented individual trees, the individual tree diameter at breast height (DBH) was estimated using a regression function developed from the relationship observed between the field-measured DBH and crown area. The above ground biomass of individual trees could be calculated by an image-derived DBH using a regression function developed by the Korea Forest Research Institute. The carbon storage, based on individual trees, was estimated by simple multiplication using the carbon conversion index (0.5), as suggested in guidelines from the Intergovernmental Panel on Climate Change. The mean carbon storage per individual tree was estimated and then compared with the field-measured value. This study suggested that the biomass and carbon storage in a large forest area can be effectively estimated using aerial photographs and LiDAR data.
Application of decision tree for prediction of cutaneous leishmaniasis incidence based on environmental and topographic factors in Isfahan Province, Iran.

PubMed

Ramezankhani, Roghieh; Sajjadi, Nooshin; Nezakati Esmaeilzadeh, Roya; Jozi, Seyed Ali; Shirzadi, Mohammad Reza

2018-05-08

Cutaneous Leishmaniasis (CL) is a neglected tropical disease that continues to be a health problem in Iran. Nearly 350 million people are thought to be at risk. We investigated the impact of the environmental factors on CL incidence during the period 2007-2015 in a known endemic area for this disease in Isfahan Province, Iran. After collecting data with regard to the climatic, topographic, vegetation coverage and CL cases in the study area, a decision tree model was built using the classification and regression tree algorithm. CL data for the years 2007 until 2012 were used for model construction and the data for the years 2013 until 2015 were used for testing the model. The root mean square error and the correlation factor were used to evaluate the predictive performance of the decision tree model. We found that wind speeds less than 14 m/s, altitudes between 1234 and 1810 m above the mean sea level, vegetation coverage according to the normalized difference vegetation index (NDVI) less than 0.12, rainfall less than 1.6 mm and air temperatures higher than 30°C would correspond to a seasonal incidence of 163.28 per 100,000 persons, while if wind speed is less than 14 m/s, altitude less than 1,810 m and NDVI higher than 0.12, then the mean seasonal incidence of the disease would be 2.27 per 100,000 persons. Environmental factors were found to be important predictive variables for CL incidence and should be considered in surveillance and prevention programmes for CL control.
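The two tree paths reported in this abstract can be written out as explicit rules, together with the root mean square error used for evaluation. This is only a sketch: the function and parameter names are hypothetical, the handling of boundary values (strict or inclusive) is an assumption, and any combination of conditions not reported in the abstract returns `None` rather than a guessed value.

```python
import math

def predicted_seasonal_incidence(wind_ms, altitude_m, ndvi, rainfall_mm, temp_c):
    """Seasonal CL incidence per 100,000 for the two leaves quoted in the abstract."""
    if wind_ms < 14 and altitude_m < 1810 and ndvi > 0.12:
        return 2.27
    if (wind_ms < 14 and 1234 <= altitude_m <= 1810 and ndvi < 0.12
            and rainfall_mm < 1.6 and temp_c > 30):
        return 163.28
    return None  # path not described in the abstract

def rmse(predicted, observed):
    """Root mean square error between paired predictions and observations."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

print(predicted_seasonal_incidence(10, 1500, 0.08, 1.0, 32))
print(predicted_seasonal_incidence(10, 1500, 0.20, 5.0, 25))
```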
Modeling potential future individual tree-species distributions in the eastern United States under a climate change scenario: a case study with Pinus virginiana

Treesearch

Louis R. Iverson; Anantha Prasad; Mark W. Schwartz

1999-01-01

We are using a deterministic regression tree analysis model (DISTRIB) and a stochastic migration model (SHIFT) to examine potential distributions of ~66 individual species of eastern US trees under a 2 x CO2 climate change scenario. This process is demonstrated for Virginia pine (Pinus virginiana).
Potential Changes in Tree Species Richness and Forest Community Types following Climate Change

Treesearch

Louis R. Iverson; Anantha M. Prasad

2001-01-01

Potential changes in tree species richness and forest community types were evaluated for the eastern United States according to five scenarios of future climate change resulting from a doubling of atmospheric carbon dioxide (CO2). DISTRIB, an empirical model that uses a regression tree analysis approach, was used to generate suitable habitat, or potential future...
Estimating tree crown widths for the primary Acadian species in Maine

Treesearch

Matthew B. Russell; Aaron R. Weiskittel

2012-01-01

In this analysis, data for seven conifer and eight hardwood species were gathered from across the state of Maine for estimating tree crown widths. Maximum and largest crown width equations were developed using tree diameter at breast height as the primary predicting variable. Quantile regression techniques were used to estimate the maximum crown width and a constrained...
[A site index model for Larix principis-rupprechtii plantation in Saihanba, north China].

PubMed

Wang, Dong-zhi; Zhang, Dong-yan; Jiang, Feng-ling; Bai, Ye; Zhang, Zhi-dong; Huang, Xuan-rui

2015-11-01

It is often difficult to estimate site indices for different types of plantation by using an ordinary site index model. The objective of this paper was to establish a site index model for plantations in varied site conditions, and assess the site qualities. In this study, a nonlinear mixed site index model was constructed based on data from the second class forest resources inventory and 173 temporary sample plots. The results showed that the main limiting factors for height growth of Larix principis-rupprechtii were elevation, slope, soil thickness and soil type. A linear regression model was constructed for the main constraining site factors and dominant tree height, with the coefficient of determination being 0.912, and the baseline age of Larix principis-rupprechtii determined as 20 years. The nonlinear mixed site index model parameters for the main site types were estimated (R2 > 0.85, the error between the predicted value and the actual value was in the range of -0.43 to 0.45, with an average root mean squared error (RMSE) in the range of 0.907 to 1.148). The estimation error between the predicted value and the actual value of dominant tree height for the main site types was in the confidence interval of [-0.95, 0.95]. The site quality of the high altitude-shady-sandy loam-medium soil layer was the highest and that of low altitude-sunny-sandy loam-medium soil layer was the lowest, while the other two sites were moderate.
A Comparison of Two Tree Construction Methods for Obtaining Proximity Measures among Words. Number 47.

ERIC Educational Resources Information Center

Rapoport, Amnon

The prediction that two different methods of constructing linear, tree graphs will yield the same formal structure of semantic space and measurement of word proximity was tested by comparing the distribution of node degree, the distribution of the number of pairs of nodes connected y times, and the distribution of adjective degree in trees…
78 FR 5797 - Missisquoi Associates; Notice of Application for Amendment of License and Soliciting Comments...

Federal Register 2010, 2011, 2012, 2013, 2014

2013-01-28

... to allow its affiliate, EGP Solar 1, LLC, to construct and maintain a 2.2 megawatt solar photovoltaic... solar array would be constructed on both sides of Heather Lane (the project's access road), but public... be used for the solar array is currently devoid of trees, although some grading and tree cutting is...
Further Effects of Phylogenetic Tree Style on Student Comprehension in an Introductory Biology Course.

PubMed

Dees, Jonathan; Bussard, Caitlin; Momsen, Jennifer L

2018-06-01

Phylogenetic trees have become increasingly important across the life sciences, and as a result, learning to interpret and reason from these diagrams is now an essential component of biology education. Unfortunately, students often struggle to understand phylogenetic trees. Style (i.e., diagonal or bracket) is one factor that has been observed to impact how students interpret phylogenetic trees, and one goal of this research was to investigate these style effects across an introductory biology course. In addition, we investigated the impact of instruction that integrated diagonal and bracket phylogenetic trees equally. Before instruction, students were significantly more accurate with the bracket style for a variety of interpretation and construction tasks. After instruction, however, students were significantly more accurate only for construction tasks and interpretations involving taxa relatedness when using the bracket style. Thus, instruction that used both styles equally mitigated some, but not all, style effects. These results inform the development of research-based instruction that best supports student understanding of phylogenetic trees.
Automated construction of arterial and venous trees in retinal images.

PubMed

Hu, Qiao; Abràmoff, Michael D; Garvin, Mona K

2015-10-01

While many approaches exist to segment retinal vessels in fundus photographs, only a limited number focus on the construction and disambiguation of arterial and venous trees. Previous approaches are local and/or greedy in nature, making them susceptible to errors or limiting their applicability to large vessels. We propose a more global framework to generate arteriovenous trees in retinal images, given a vessel segmentation. In particular, our approach consists of three stages. The first stage is to generate an overconnected vessel network, named the vessel potential connectivity map (VPCM), consisting of vessel segments and the potential connectivity between them. The second stage is to disambiguate the VPCM into multiple anatomical trees, using a graph-based metaheuristic algorithm. The third stage is to classify these trees into arterial or venous (A/V) trees. We evaluated our approach with a ground truth built based on a public database, showing a pixel-wise classification accuracy of 88.15% using a manual vessel segmentation as input, and 86.11% using an automatic vessel segmentation as input.
Disentangling Environmental and Anthropogenic Impacts on the Distribution of Unintentionally Introduced Invasive Alien Insects in Mainland China

PubMed Central

Zhao, Cai-Yun; Xu, Jing; Liu, Xiao-Yan

2017-01-01

Globalization increases the opportunities for unintentionally introduced invasive alien species, especially for insects, and most of these species could damage ecosystems and cause economic loss in China. In this study, we analyzed drivers of the distribution of unintentionally introduced invasive alien insects. Based on the number of unintentionally introduced invasive alien insects and their presence/absence records in each province in mainland China, regression trees were built to elucidate the roles of environmental and anthropogenic factors on the number distribution and similarity of species composition of these insects. Classification and regression trees indicated climatic suitability (the mean temperature in January) and human economic activity (sum of total freight) are primary drivers for the number distribution pattern of unintentionally introduced invasive alien insects at provincial scale, while only environmental factors (the mean January temperature, the annual precipitation and the areas of provinces) significantly affect the similarity of their species composition based on the multivariate regression trees. PMID:28973576
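Regression trees of the kind mentioned above are typically grown by repeatedly choosing the split that most reduces the within-node sum of squared errors. The sketch below shows that single step for one predictor; it is a generic illustration, not the authors' software, and the temperature and species-count values are invented for the example.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, sse_after_split) minimizing the summed SSE of both children."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best

# Hypothetical provinces: mean January temperature (°C) vs. number of species.
january_temp = [-5, -3, -1, 2, 4, 6]
species_count = [2, 3, 2, 10, 11, 12]
print(best_split(january_temp, species_count))
```

A full tree applies the same search to every predictor at every node and stops according to a size or pruning rule.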
Tree tensor network approach to simulating Shor's algorithm

NASA Astrophysics Data System (ADS)

Dumitrescu, Eugene

2017-12-01

Constructively simulating quantum systems furthers our understanding of qualitative and quantitative features which may be analytically intractable. In this paper, we directly simulate and explore the entanglement structure present in the paradigmatic example for exponential quantum speedups: Shor's algorithm. To perform our simulation, we construct a dynamic tree tensor network which manifestly captures two salient circuit features for modular exponentiation. These are the natural two-register bipartition and the invariance of entanglement with respect to permutations of the top-register qubits. Our construction helps identify the entanglement entropy properties, which we summarize by a scaling relation. Further, the tree network is efficiently projected onto a matrix product state from which we efficiently execute the quantum Fourier transform. Future simulation of quantum information states with tensor networks exploiting circuit symmetries is discussed.
Wnt signal transduction pathways: modules, development and evolution.

PubMed

Nayak, Losiana; Bhattacharyya, Nitai P; De, Rajat K

2016-08-01

The Wnt signal transduction pathway (Wnt STP) is a crucial intracellular pathway, mainly due to its participation in important biological processes, functions, and diseases, e.g., embryonic development, stem-cell management, and human cancers. This is why Wnt STP is one of the most heavily researched signal transduction pathways. Study and analysis of its origin, expansion and gradual development to the present state as found in humans is one aspect of Wnt research. The pattern of development and evolution of the Wnt STP among various species is not clear to date. A phylogenetic tree created from Wnt STPs of multiple species may address this issue. In this respect, we construct a phylogenetic tree from modules of Wnt STPs of diverse species. We term it the 'Module Tree'. A module is a self-sufficient, minimally dependent subset of the original Wnt STP. The authenticity of the module tree is tested by comparing it with two reference trees. The module tree performs better than an alternative phylogenetic tree constructed from pathway topology of Wnt STPs. Moreover, an evolutionary emergence pattern of the Wnt gene family is created and the module tree is compared with it to show the significant resemblances.
Industrial and occupational ergonomics in the petrochemical process industry: a regression trees approach.

PubMed

Bevilacqua, M; Ciarapica, F E; Giacchetta, G

2008-07-01

This work is an attempt to apply classification tree methods to data regarding accidents in a medium-sized refinery, so as to identify the important relationships between the variables, which can be considered as decision-making rules when adopting any measures for improvement. The results obtained using the CART (Classification And Regression Trees) method proved to be the most precise and, in general, they are encouraging concerning the use of tree diagrams as preliminary explorative techniques for the assessment of the ergonomic, management and operational parameters which influence high accident risk situations. The Occupational Injury analysis carried out in this paper was planned as a dynamic process and can be repeated systematically. The CART technique, which considers a very wide set of objective and predictive variables, shows new cause-effect correlations in occupational safety which had never been previously described, highlighting possible injury risk groups and supporting decision-making in these areas. The use of classification trees must not, however, be seen as an attempt to supplant other techniques, but as a complementary method which can be integrated into traditional types of analysis.
Acid rain, air pollution, and tree growth in southeastern New York

USGS Publications Warehouse

Puckett, L.J.

1982-01-01

Whether dendroecological analyses could be used to detect changes in the relationship of tree growth to climate that might have resulted from chronic exposure to components of the acid rain-air pollution complex was determined. Tree-ring indices of white pine (Pinus strobus L.), eastern hemlock (Tsuga canadensis (L.) Carr.), pitch pine (Pinus rigida Mill.), and chestnut oak (Quercus prinus L.) were regressed against orthogonally transformed values of temperature and precipitation in order to derive a response-function relationship. Results of the regression analyses for three time periods, 1901–1920, 1926–1945, and 1954–1973 suggest that the relationship of tree growth to climate has been altered. Statistical tests of the temperature and precipitation data suggest that this change was nonclimatic. Temporally, the shift in growth response appears to correspond with the suspected increase in acid rain and air pollution in the Shawangunk Mountain area of southeastern New York in the early 1950's. This change could be the result of physiological stress induced by components of the acid rain-air pollution complex, causing climatic conditions to be more limiting to tree growth.
Application of classification tree and logistic regression for the management and health intervention plans in a community-based study.

PubMed

Teng, Ju-Hsi; Lin, Kuan-Chia; Ho, Bin-Shenq

2007-10-01

A community-based aboriginal study was conducted and analysed to explore the application of classification tree and logistic regression. A total of 1066 aboriginal residents in Yilan County were screened during 2003-2004. The independent variables include demographic characteristics, physical examinations, geographic location, health behaviours, dietary habits and family hereditary diseases history. Risk factors of cardiovascular diseases were selected as the dependent variables in further analysis. The completion rate for the health interview was 88.9%. The classification tree results show that if body mass index is higher than 25.72 kg/m² and the age is above 51 years, the predicted probability for number of cardiovascular risk factors ≥3 is 73.6% and the population is 322. If body mass index is higher than 26.35 kg/m² and geographical latitude of the village is lower than 24°22.8', the predicted probability for number of cardiovascular risk factors ≥4 is 60.8% and the population is 74. The logistic regression results indicate that body mass index, drinking habit and menopause are the top three significant independent variables. The classification tree model specifically shows the discrimination paths and interactions between the risk groups. The logistic regression model presents and analyses the statistically independent factors of cardiovascular risks. Applying both models to specific situations will provide a different angle for the design and management of future health intervention plans after community-based study.
GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran.

PubMed

Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali

2016-01-01

Groundwater is considered one of the most valuable fresh water resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence locations of springs were considered in this research. These factors include slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Subsequently, groundwater spring potential was modeled and mapped using CART, RF, and BRT algorithms. The predicted results from the three models were validated using the receiver operating characteristics curve (ROC). From 864 springs identified, 605 (≈70 %) locations were used for the spring potential mapping, while the remaining 259 (≈30 %) springs were used for the model validation. The area under the curve (AUC) for the BRT model was calculated as 0.8103 and for CART and RF the AUC were 0.7870 and 0.7119, respectively. Therefore, it was concluded that the BRT model produced the best prediction results while predicting locations of springs followed by CART and RF models, respectively. Geospatially integrated BRT, CART, and RF methods proved to be useful in generating the spring potential map (SPM) with reasonable accuracy.
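The validation procedure in this abstract rests on a roughly 70/30 split of the 864 springs and on the area under the ROC curve. The sketch below computes AUC as the fraction of positive/negative pairs ranked correctly, which is equivalent to the area under the ROC curve. The scores and labels are invented, and the use of non-spring locations as negatives is an assumption, since the abstract does not describe them.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Split sizes reported in the abstract.
total_springs, training, validation = 864, 605, 259
print(round(training / total_springs, 2), round(validation / total_springs, 2))

# Hypothetical model scores for two spring (1) and two non-spring (0) locations.
print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```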
Propensity score estimation: machine learning and classification methods as alternatives to logistic regression

PubMed Central

Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson

2010-01-01

Objective: Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this Review was to assess machine learning alternatives to logistic regression which may accomplish the same goals but with fewer assumptions or greater accuracy. Study Design and Setting: We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. Results: We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (CART), and meta-classifiers (in particular, boosting). Conclusion: While the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and to a lesser extent decision trees (particularly CART) appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. PMID:20630332
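For orientation, the logistic-regression baseline that this review compares against can be sketched in a few lines: a propensity score is the modelled probability of receiving treatment given covariates. The example below fits a one-covariate logistic model by plain gradient ascent. The data, learning rate, and function names are hypothetical, and a real analysis would use an established statistics library rather than this hand-rolled fit.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_propensity(xs, treated, lr=0.1, steps=5000):
    """Fit P(treated | x) = sigmoid(b0 + b1 * x) by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, t in zip(xs, treated):
            err = t - sigmoid(b0 + b1 * x)
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def propensity_scores(xs, b0, b1):
    """Estimated probability of treatment for each covariate value."""
    return [sigmoid(b0 + b1 * x) for x in xs]

# Hypothetical covariate and treatment indicator (1 = treated).
covariate = [0, 1, 2, 3, 4, 5]
treatment = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_propensity(covariate, treatment)
scores = propensity_scores(covariate, b0, b1)
print([round(s, 3) for s in scores])
```

The alternatives named in the abstract (neural networks, support vector machines, CART, boosting) would replace `fit_propensity` while still producing one estimated treatment probability per subject.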
Improving estimation of tree carbon stocks by harvesting aboveground woody biomass within airborne LiDAR flight areas

NASA Astrophysics Data System (ADS)

Colgan, M.; Asner, G. P.; Swemmer, A. M.

2011-12-01

The accurate estimation of carbon stored in a tree is essential to accounting for the carbon emissions due to deforestation and degradation. Airborne LiDAR (Light Detection and Ranging) has been successful in estimating aboveground carbon density (ACD) by correlating airborne metrics, such as canopy height, to field-estimated biomass. This latter step is reliant on field allometry which is applied to forest inventory quantities, such as stem diameter and height, to predict the biomass of a given tree stem. Constructing such allometry is expensive, time consuming, and requires destructive sampling. Consequently, the sample sizes used to construct such allometry are often small, and the largest tree sampled is often much smaller than the largest in the forest population. The uncertainty resulting from these sampling errors can lead to severe biases when the allometry is applied to stems larger than those harvested to construct the allometry, which is then subsequently propagated to airborne ACD estimates. The Kruger National Park (KNP) mission of maintaining biodiversity coincides with preserving ecosystem carbon stocks. However, one hurdle to accurately quantifying carbon density in savannas is that small stems are typically harvested to construct woody biomass allometry, yet they are not representative of Kruger's distribution of biomass. Consequently, these equations inadequately capture large tree variation in sapwood/hardwood composition, root/shoot/leaf allocation, branch fall, and stem rot. This study eliminates the "middleman" of field allometry by directly measuring, or harvesting, tree biomass within the extent of airborne LiDAR. This enables comparisons of field and airborne ACD estimates, and also enables creation of new airborne algorithms to estimate biomass at the scale of individual trees. A field campaign was conducted at Pompey Silica Mine 5 km outside Kruger National Park, South Africa, in Mar-Aug 2010 to harvest and weigh tree mass.
Since harvesting of trees is not possible within KNP, this was a unique opportunity to fell trees already scheduled to be cleared for mining operations. The area was first flown by the Carnegie Airborne Observatory in early May, prior to harvest, to enable correlation of LiDAR-measured tree height and crown diameter to harvested tree mass. Results include over 4,000 harvested stems and 13 species-specific biomass equations, including seven Kruger woody species previously without allometry. We found existing biomass stem allometry over-estimates ACD in the field, whereas airborne estimates based on harvest data avoid this bias while maintaining similar precision to field-based estimates. Lastly, a new airborne algorithm estimating biomass at the tree-level reduced error from tree canopies "leaning" into field plots but whose stems are outside plot boundaries. These advances pave the way to better understanding of savanna and forest carbon density at landscape and regional scales.

A new serial pooling method of shifted tree ring blocks to construct millennia long tree ring isotope chronologies with annual resolution.

PubMed

Boettger, Tatjana; Friedrich, Michael

2009-03-01

The study presents a new serial pooling method of shifted tree ring blocks for the building of isotope chronologies. This method combines the advantages of traditional 'serial' and 'intertree' pooling, and can be recommended for the construction of sub-regional long isotope chronologies with sufficient replication, and at annual resolution, especially for the case of extremely narrow tree rings. For Scots pines (Pinus sylvestris L., Khibiny Low Mountains, NW Russia) and Silver firs (Abies alba Mill., Franconia, Southern Germany), serial pooling of five consecutive tree rings seems appropriate because the species- and site-specific particularities blur the climate linkages in their tree rings for the period up to ca. five years back. An equivalent to a five-year running mean curve, obtained from annual data sets of single trees, can be derived from the analysis of yearly shifted five-year blocks of consecutive tree rings and, therefore, with approximately 20% of the expense. Good coherence of δ13C and δ18O values between calculated means of annual total rings or late wood data and means of five-year blocks of consecutive total tree rings analysed experimentally on most similar material confirms this assumption.
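One reading of the shifted-block idea can be sketched as follows: each of five trees is cut into five-year blocks, the block boundaries are offset by one year from tree to tree, and each pooled block yields a single measurement. This interpretation, the function names, and the synthetic data are assumptions made for illustration. The example uses five trees with identical annual values, in which case the shifted-block series coincides exactly with the five-year running mean.

```python
def running_mean(series, window=5):
    """Running mean over every run of `window` consecutive years."""
    return [sum(series[s:s + window]) / window
            for s in range(len(series) - window + 1)]

def shifted_block_series(trees, window=5):
    """One pooled value per start year s, taken from tree (s % window).

    trees[k][year] is the annual value of tree k, so tree k supplies the
    blocks starting at years k, k + window, k + 2 * window, ...
    Requires at least `window` trees.
    """
    n_years = len(trees[0])
    pooled = []
    for s in range(n_years - window + 1):
        tree = trees[s % window]
        pooled.append(sum(tree[s:s + window]) / window)
    return pooled

# Synthetic example: five trees sharing the same 20-year annual signal.
signal = [float((3 * year) % 7) for year in range(20)]
trees = [signal[:] for _ in range(5)]
pooled = shifted_block_series(trees)
print(pooled == running_mean(signal))
print(len(pooled), "pooled measurements vs", 5 * len(signal), "annual measurements")
```

With real trees the annual values differ between individuals, so the pooled series only approximates the running mean. The saving comes from needing roughly one pooled measurement per year instead of one annual measurement per tree per year.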
Predicting tree species presence and basal area in Utah: A comparison of stochastic gradient boosting, generalized additive models, and tree-based methods

Treesearch

Gretchen G. Moisen; Elizabeth A. Freeman; Jock A. Blackard; Tracey S. Frescino; Niklaus E. Zimmermann; Thomas C. Edwards

2006-01-01

Many efforts are underway to produce broad-scale forest attribute maps by modelling forest class and structure variables collected in forest inventories as functions of satellite-based and biophysical information. Typically, variants of classification and regression trees implemented in Rulequest's© See5 and Cubist (for binary and continuous responses,...
Random forests and stochastic gradient boosting for predicting tree canopy cover: Comparing tuning processes and model performance

Treesearch

E. Freeman; G. Moisen; J. Coulston; B. Wilson

2014-01-01

Random forests (RF) and stochastic gradient boosting (SGB), both involving an ensemble of classification and regression trees, are compared for modeling tree canopy cover for the 2011 National Land Cover Database (NLCD). The objectives of this study were twofold. First, sensitivity of RF and SGB to choices in tuning parameters was explored. Second, performance of the...
Estimating probabilities of infestation and extent of damage by the roundheaded pine beetle in ponderosa pine in the Sacramento Mountains, New Mexico

Treesearch

Jose Negron

1997-01-01

Classification trees and linear regression analysis were used to build models to predict probabilities of infestation and amount of tree mortality in terms of basal area resulting from roundheaded pine beetle, Dendroctonus adjunctus Blandford, activity in ponderosa pine, Pinus ponderosa Laws., in the Sacramento Mountains, New Mexico. Classification trees were built for...
STX--Fortran-4 program for estimates of tree populations from 3P sample-tree-measurements

Treesearch

L. R. Grosenbaugh

1967-01-01

Describes how to use an improved and greatly expanded version of an earlier computer program (1964) that converts dendrometer measurements of 3P-sample trees to population values in terms of whatever units user desires. Many new options are available, including that of obtaining a product-yield and appraisal report based on regression coefficients supplied by user....
Portable Language-Independent Adaptive Translation from OCR. Phase 1

DTIC Science & Technology

2009-04-01

including brute-force k-Nearest Neighbors (kNN), fast approximate kNN using hashed k-d trees, classification and regression trees, and locality...achieved by refinements in ground-truthing protocols. Recent algorithmic improvements to our approximate kNN classifier using hashed k-d trees allows...recent years discriminative training has been shown to outperform phonetic HMMs estimated using ML for speech recognition. Standard ML estimation
Alterations of chemical composition, construction cost and payback time in needles of Masson pine (Pinus massoniana L.) trees grown under pollution.

PubMed

Liu, Nan; Guan, Lan-Lan; Sun, Fang-Fang; Wen, Da-Zhi

2014-07-01

Previous studies show that Masson pine (Pinus massoniana L.) stands grown at the industrially-polluted site have experienced unprecedented growth decline, but the causal mechanisms are poorly understood. In this study, to understand the mechanisms of growth decline of Masson pine stands under pollution stress, we determined the reactive oxygen species levels and chemical composition of the current-year (C) and one-year-old (C + 1) needles, and calculated the needle construction costs (CCmass) of Masson pine trees grown at an industrially-polluted site and an unpolluted remote site. Pine trees grown at the polluted site had significantly higher levels of hydroxyl radical and superoxide anion in their needles than those grown at the unpolluted site, and the former trees eventually exhibited early needle senescence. The contents of lipids, soluble phenolics and lignins in C and C + 1 needles were significantly higher at the polluted site than at the unpolluted site, but the total amounts of non-construction carbohydrates were lower in non-polluted needles than in polluted needles. Elevated levels of reactive oxygen species and early senescence in polluted needles together led to significant increases in CCmass and a longer payback time. We infer that the lengthened payback time and early needle senescence under pollution stress may reduce Masson pine tree growth and consequently accelerate tree decline.
Tools for valuing tree and park services

Treesearch

E.G. McPherson

2010-01-01

Arborists and urban foresters plan, design, construct, and manage trees and parks in cities throughout the world. These civic improvements create walkable, cool environments, save energy, reduce stormwater runoff, sequester carbon dioxide, and absorb air pollutants. The presence of trees and green spaces in cities is associated with increases in property values,...
Improving Cluster Analysis with Automatic Variable Selection Based on Trees

DTIC Science & Technology

2014-12-01

regression trees Daisy DISsimilAritY PAM partitioning around medoids PMA penalized multivariate analysis SPC sparse principal components UPGMA unweighted...unweighted pair-group average method ( UPGMA ). This method measures dissimilarities between all objects in two clusters and takes the average value
Tree growth and recruitment in a leveed floodplain forest in the Mississippi River Alluvial Valley, USA

USGS Publications Warehouse

Gee, Hugo K.W.; King, Sammy L.; Keim, Richard F.

2014-01-01

Flooding is a defining disturbance in floodplain forests affecting seed germination, seedling establishment, and tree growth. Globally, flood control, including artificial levees, dams, and channelization, has altered flood regimes in floodplains. However, few data are available regarding the long-term effects of levees on stand establishment and tree growth in floodplain forests. In this study, we used dendrochronological techniques to reconstruct tree recruitment and tree growth over a 90-year period at three stands within a ring levee in the Mississippi River Alluvial Valley (MAV) and to evaluate whether recruitment patterns and tree growth changed following levee construction. We hypothesized that: (1) sugarberry is increasing in dominance and overcup oak (Quercus lyrata) is becoming less dominant since the levee, and that changes in hydrology are playing a greater role than canopy disturbance in these changes in species dominance; and (2) that overcup oak growth has declined following construction of the levee and cessation of overbank flooding whereas that of sugarberry has increased. Recruitment patterns shifted from flood-tolerant overcup oak to flood-intolerant sugarberry (Celtis laevigata) after levee construction. None of the 122 sugarberry trees cored in this study established prior to the levee, but it was the most common species established after the levee. The mechanisms behind the compositional change are unknown; however, the cosmopolitan distribution of overcup oak during the pre-levee period and sugarberry during the post-levee period, the lack of sugarberry establishment in the pre-levee period, and the confinement of overcup oak regeneration to the lowest areas in each stand after harvest in the post-levee period indicate that species-specific responses to flooding and light availability are driving recruitment patterns.
Overcup oak growth was also affected by levee construction, but in contrast to our hypothesis, growth actually increased for several decades before declining during a drought in the late 1990s. We interpret this result as removal of flood stress following levee construction. This finding emphasizes the fact that flooding can be stressful to trees regardless of their flood tolerance and that growth in floodplain trees can be sustained provided adequate soil moisture is present, regardless of the source of soil moisture. However, future research efforts should focus on the long-term effect of hydrologic modification on stand development and on how hydrologic modifications, such as elimination of surface flooding and groundwater declines, affect the vulnerability of floodplain forests to drought.
Predicting U.S. Army Reserve Unit Manning Using Market Demographics

DTIC Science & Technology

2015-06-01

develops linear regression, classification tree, and logistic regression models to determine the ability of the location to support manning requirements...logistic regression model delivers predictive results that allow decision-makers to identify locations with a high probability of meeting unit...manning requirements. The recommendation of this thesis is that the USAR implement the logistic regression model.
Undergraduate Students’ Initial Ability in Understanding Phylogenetic Tree

NASA Astrophysics Data System (ADS)

Sa'adah, S.; Hidayat, T.; Sudargo, Fransisca

2017-04-01

The phylogenetic tree is a visual representation that depicts a hypothesis about the evolutionary relationships among taxa. Evolutionary experts use this representation to evaluate the evidence for evolution. Use of the phylogenetic tree is currently growing in many disciplines of biology. Consequently, learning about the phylogenetic tree has become an important part of biological education and an interesting area of biology education research. The skill of understanding and reasoning about the phylogenetic tree (called tree thinking) is important for biology students. However, research has shown that many students have difficulty interpreting, constructing, and comparing phylogenetic trees, and hold misconceptions about them. Students are often not taught how to reason about the evolutionary relationships depicted in the diagram, nor are they given information about the underlying theory and process of phylogenetics. This study aims to investigate the initial ability of undergraduate students in understanding and reasoning about the phylogenetic tree. The research method is descriptive. Students were given multiple-choice questions and an essay representing tree-thinking elements. Correct answers were converted to percentages. Each student was also given a questionnaire. The results showed that the undergraduate students' initial ability in understanding and reasoning about the phylogenetic tree is low. Many students were not able to answer questions about the phylogenetic tree. Only 19% of undergraduate students answered correctly on the indicator of evaluating the evolutionary relationship among taxa, 25% on the indicator of applying the concept of the clade, and 17% on the indicator of determining character evolution, and only a few undergraduate students could construct a phylogenetic tree.
Simulation of land use change in the Three Gorges Reservoir area based on CART-CA

NASA Astrophysics Data System (ADS)

Yuan, Min

2018-05-01

This study proposes a new method to simulate complex spatiotemporal changes among multiple land uses using a cellular automata (CA) model based on the classification and regression tree (CART) algorithm. In this model, the CART algorithm is used to calculate land class conversion probabilities, which are combined with a neighborhood factor and a random factor to extract cellular transformation rules. In the land dynamic simulation of the Three Gorges Reservoir area from 2000 to 2010, the overall Kappa coefficient is 0.8014 and the overall accuracy is 0.8821, and the simulation results are satisfactory.
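The two agreement measures quoted in this abstract, overall accuracy and the Kappa coefficient, can both be derived from a confusion matrix of observed versus simulated land classes. The following is a minimal Python sketch of the standard Cohen's kappa calculation; the function name and the 3-class matrix are made up for illustration and are not taken from the study.

```python
def accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] = number of cells with observed class i and simulated class j.
    """
    total = sum(sum(row) for row in confusion)
    k = len(confusion)
    # Observed agreement: share of cells on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement computed from the row and column marginals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(k)
    ) / (total * total)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical 3-class example (NOT data from the study).
example = [
    [50, 5, 0],
    [4, 30, 1],
    [1, 2, 7],
]
acc, kappa = accuracy_and_kappa(example)
print(f"overall accuracy = {acc:.4f}, kappa = {kappa:.4f}")
```

Kappa is lower than overall accuracy because it discounts the agreement expected by chance from the class frequencies alone.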
CADDIS Volume 4. Data Analysis: Basic Analyses

EPA Pesticide Factsheets

Use of statistical tests to determine if an observation is outside the normal range of expected values. Details of CART, regression analysis, use of quantile regression analysis, CART in causal analysis, simplifying or pruning resulting trees.
Modifiable risk factors predicting major depressive disorder at four year follow-up: a decision tree approach.

PubMed

Batterham, Philip J; Christensen, Helen; Mackinnon, Andrew J

2009-11-22

Relative to physical health conditions such as cardiovascular disease, little is known about risk factors that predict the prevalence of depression. The present study investigates the expected effects of a reduction of these risks over time, using the decision tree method favoured in assessing cardiovascular disease risk. The PATH through Life cohort was used for the study, comprising 2,105 20-24 year olds, 2,323 40-44 year olds and 2,177 60-64 year olds sampled from the community in the Canberra region, Australia. A decision tree methodology was used to predict the presence of major depressive disorder after four years of follow-up. The decision tree was compared with a logistic regression analysis using ROC curves. The decision tree was found to distinguish and delineate a wide range of risk profiles. Previous depressive symptoms were most highly predictive of depression after four years; however, modifiable risk factors such as substance use and employment status played significant roles in assessing the risk of depression. The decision tree was found to have better sensitivity and specificity than a logistic regression using identical predictors. The decision tree method was useful in assessing the risk of major depressive disorder over four years. Application of the model to the development of a predictive tool for tailored interventions is discussed.
A study of Solar-ENSO correlation with southern Brazil tree ring index (1955-1991)

NASA Astrophysics Data System (ADS)

Rigozo, N.; Nordemann, D.; Vieira, L.; Echer, E.

The effects of solar activity and El Niño-Southern Oscillation (ENSO) on tree growth in southern Brazil were studied by correlation analysis. Trees for this study were native Araucaria (Araucaria angustifolia) from four locations in Rio Grande do Sul State, in southern Brazil: Canela (29°18'S, 50°51'W, 790 m asl), Nova Petropolis (29°2'S, 51°10'W, 579 m asl), Sao Francisco de Paula (29°25'S, 50°24'W, 930 m asl) and Sao Martinho da Serra (29°30'S, 53°53'W, 484 m asl). From these four sites, an average tree ring Index for this region was derived for the period 1955-1991. Linear correlations were computed on annual and 10-year running averages of this tree ring Index, of the sunspot number Rz, and of the SOI. For annual averages, the correlation coefficients were low, and the multiple regression between tree ring and SOI and Rz indicates that 20% of the variance in tree rings was explained by solar activity and ENSO variability. However, when the correlations were computed on 10-year running averages, the correlation coefficients were much higher. A clear anticorrelation is observed between SOI and Index (r=-0.81), whereas Rz and Index show a positive correlation (r=0.67). The multiple regression of 10-year running averages indicates that 76% of the variance in the tree ring Index was explained by solar activity and ENSO. These results indicate that the effects of solar activity and ENSO on tree rings are better seen on long timescales.
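The contrast this abstract draws between annual and 10-year running-average correlations can be illustrated with a small Python sketch. The two series below are synthetic (a shared slow trend plus unrelated year-to-year fluctuations) and are only an assumption for illustration; they are not the tree ring Index, Rz, or SOI data.

```python
from math import sqrt


def running_mean(values, window):
    """Running (moving) averages over `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Synthetic series for 1955-1991: a shared slow signal plus independent
# short-period fluctuations (hypothetical, for illustration only).
years = list(range(1955, 1992))
slow = [0.1 * i for i in range(len(years))]
ring_index = [s + (1 if i % 2 == 0 else -1) for i, s in enumerate(slow)]
driver = [s + (1 if (i // 2) % 2 == 0 else -1) for i, s in enumerate(slow)]

r_annual = pearson_r(ring_index, driver)
r_smooth = pearson_r(running_mean(ring_index, 10), running_mean(driver, 10))
print(f"annual r = {r_annual:.2f}, 10-year running-average r = {r_smooth:.2f}")
```

Smoothing suppresses the short-period fluctuations, so a shared slow signal dominates the correlation. Running averages also make neighbouring values strongly dependent, so the significance of such correlations needs separate care.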
Forest inventory predictions from individual tree crowns: regression modeling within a sample framework

Treesearch

James W. Flewelling

2009-01-01

Remotely sensed data can be used to make digital maps showing individual tree crowns (ITC) for entire forests. Attributes of the ITCs may include area, shape, height, and color. The crown map is sampled in a way that provides an unbiased linkage between ITCs and identifiable trees measured on the ground. Methods of avoiding edge bias are given. In an example from a...
The relationship between tree canopy and crime rates across an urban-rural gradient in the greater Baltimore region

Treesearch

Austin Troy; J. Morgan Grove; Jarlath O'Neill-Dunne

2012-01-01

The extent to which urban tree cover influences crime is in debate in the literature. This research took advantage of geocoded crime point data and high resolution tree canopy data to address this question in Baltimore City and County, MD, an area that includes a significant urban-rural gradient. Using ordinary least squares and spatially adjusted regression and...
[Predicting very early rebleeding after acute variceal bleeding based on classification and regression tree analysis (CRTA).].

PubMed

Altamirano, J; Augustin, S; Muntaner, L; Zapata, L; González-Angulo, A; Martínez, B; Flores-Arroyo, A; Camargo, L; Genescá, J

2010-01-01

Variceal bleeding (VB) is the main cause of death among cirrhotic patients. About 30-50% of early rebleeding occurs within a few days after the acute episode of VB. It is necessary to identify patients at high risk of very early rebleeding (VER) for more aggressive therapies. However, there are few prognostic models for this purpose, and they are incompletely understood. The aims were to determine the risk factors associated with VER after an acute VB, and to assess and compare a novel prognostic model generated by Classification and Regression Tree Analysis (CART) with classically used models (MELD and Child-Pugh [CP]). Sixty consecutive cirrhotic patients with acute variceal bleeding were studied. CART analysis, MELD and Child-Pugh scores were performed at admission. Receiver operating characteristic (ROC) curves were constructed to evaluate the predictive performance of the models. The very early rebleeding rate was 13%. Variables associated with VER were: serum albumin (p = 0.027), creatinine (p = 0.021) and transfused blood units in the first 24 hrs (p = 0.05). The areas under the ROC curve for MELD, Child-Pugh and CART were 0.46, 0.50 and 0.82, respectively. The cutoff values identified by CART for the significant variables were: 1) albumin 2.85 mg/dL, 2) packed red cells 2 units and 3) creatinine 1.65 mg/dL. Serum albumin, creatinine and number of transfused blood units were associated with VER. A simple CART algorithm combining these variables allows an accurate predictive assessment of VER after acute variceal bleeding. Key words: cirrhosis, variceal bleeding, esophageal varices, prognosis, portal hypertension.
Weight management behaviors in a sample of Iranian adolescent girls.

PubMed

Garousi, S; Garrusi, B; Baneshi, Mohammad Reza; Sharifi, Z

2016-09-01

Attempts to obtain the ideal body shape portrayed in advertising can result in behaviors that lead to an unhealthy reduction in weight. This study was designed to identify contributing factors that may be effective in changing the behavior of a sample of Iranian adolescents. Three hundred fifty adolescent girls from high schools in Kerman, Iran participated in a cross-sectional study based on a self-administered questionnaire. Multifactorial logistic regression modeling was used to identify the factors influencing each of the contributing factors for body management methods, and a decision tree model was constructed to identify individuals who were more or less likely to change their body shape. Approximately one-third of the adolescent girls had attempted dieting, and 37 % of them had exercised to lose weight. The logistic regression model showed that pressure from their mother and the media; father's education level; and body mass index (BMI) were important factors in dieting. BMI and perceived pressure from the media were risk factors for attempting exercise. BMI and perceived pressure from relatives, particularly mothers, and the media were important factors in attempts by adolescent girls to lose weight.

Spatial Assessment of Model Errors from Four Regression Techniques

Treesearch

Lianjun Zhang; Jeffrey H. Gove

2005-01-01

Forest modelers have attempted to account for the spatial autocorrelations among trees in growth and yield models by applying alternative regression techniques such as linear mixed models (LMM), generalized additive models (GAM), and geographically weighted regression (GWR). However, the model errors are commonly assessed using average errors across the entire study...
Your Tree's Trouble May Be You

Treesearch

USDA Forest Service

1974-01-01

People spend much time, effort and money to plant and maintain trees around their homes, businesses, public buildings and parks. People are attracted by the scenic and recreational qualities of forest environments. Yet people who love trees the most may unknowingly cause them injury, directly or indirectly, as a result of: building and road construction, flooding, soil...
Modeling brook trout presence and absence from landscape variables using four different analytical methods

USGS Publications Warehouse

Steen, Paul J.; Passino-Reader, Dora R.; Wiley, Michael J.

2006-01-01

As a part of the Great Lakes Regional Aquatic Gap Analysis Project, we evaluated methodologies for modeling associations between fish species and habitat characteristics at a landscape scale. To do this, we created brook trout Salvelinus fontinalis presence and absence models based on four different techniques: multiple linear regression, logistic regression, neural networks, and classification trees. The models were tested in two ways: by application to an independent validation database and cross-validation using the training data, and by visual comparison of statewide distribution maps with historically recorded occurrences from the Michigan Fish Atlas. Although differences in the accuracy of our models were slight, the logistic regression model predicted with the least error, followed by multiple regression, then classification trees, then the neural networks. These models will provide natural resource managers a way to identify habitats requiring protection for the conservation of fish species.
Methods for identifying SNP interactions: a review on variations of Logic Regression, Random Forest and Bayesian logistic regression.

PubMed

Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula

2011-01-01

Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets come the inherent challenges of new methods of statistical analysis and modeling. Considering that a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.
Automated construction of arterial and venous trees in retinal images

PubMed Central

Hu, Qiao; Abràmoff, Michael D.; Garvin, Mona K.

2015-01-01

While many approaches exist to segment retinal vessels in fundus photographs, only a limited number focus on the construction and disambiguation of arterial and venous trees. Previous approaches are local and/or greedy in nature, making them susceptible to errors or limiting their applicability to large vessels. We propose a more global framework to generate arteriovenous trees in retinal images, given a vessel segmentation. In particular, our approach consists of three stages. The first stage is to generate an overconnected vessel network, named the vessel potential connectivity map (VPCM), consisting of vessel segments and the potential connectivity between them. The second stage is to disambiguate the VPCM into multiple anatomical trees, using a graph-based metaheuristic algorithm. The third stage is to classify these trees into arterial or venous (A/V) trees. We evaluated our approach with a ground truth built based on a public database, showing a pixel-wise classification accuracy of 88.15% using a manual vessel segmentation as input, and 86.11% using an automatic vessel segmentation as input. PMID:26636114
[Prediction and spatial distribution of recruitment trees of natural secondary forest based on geographically weighted Poisson model].

PubMed

Zhang, Ling Yu; Liu, Zhao Gang

2017-12-01

Based on data collected from 108 permanent plots of the forest resources survey in Maoershan Experimental Forest Farm during 2004-2016, this study investigated the spatial distribution of recruitment trees in natural secondary forest using global Poisson regression and geographically weighted Poisson regression (GWPR) with four bandwidths of 2.5, 5, 10 and 15 km. The simulation effects of the five regressions and the factors influencing recruitment trees in stands were analyzed, and the spatial autocorrelation of the regression residuals was described at global and local levels using Moran's I. The results showed that the spatial distribution of the number of recruitment trees in natural secondary forest was significantly influenced by stand and topographic factors, especially average DBH. The GWPR model at small scale (2.5 km) had high fitting accuracy, generated a large range of model parameter estimates, and captured the localized spatial distribution of the model parameters. The GWPR models at small scales (2.5 and 5 km) produced a small range of model residuals, and model stability was improved. The global spatial autocorrelation of the GWPR model residuals was lowest at the small scale (2.5 km), and the local spatial autocorrelation was significantly reduced, forming an ideal spatial distribution pattern of small clusters with different observations. The local model at small scale (2.5 km) was much better than the global model in simulating the spatial distribution of the number of recruitment trees.
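The residual diagnostic named in this abstract, global Moran's I, has a compact standard definition: I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of all spatial weights. A minimal Python sketch follows; the four-plot chain and its residual values are hypothetical, and the study's actual weight matrix is not described in the abstract.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` given a spatial weight matrix.

    weights[i][j] is the spatial weight between locations i and j,
    with zeros on the diagonal.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted sum of cross-products of deviations between location pairs.
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / total_weight) * cross / sum(d * d for d in dev)


# Hypothetical example: four plots along a line, neighbours share weight 1.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [1.0, 1.0, -1.0, -1.0]    # similar residuals sit next to each other
alternating = [1.0, -1.0, 1.0, -1.0]  # neighbouring residuals have opposite sign
print(morans_i(clustered, chain), morans_i(alternating, chain))
```

Positive values indicate that similar residuals cluster in space and negative values indicate that neighbours tend to differ; under no spatial autocorrelation the expected value is -1/(n - 1), which approaches zero for large n.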
Observed Methods for Felling Hardwood Trees with Chain Saws

Treesearch

Jerry L. Koger

1983-01-01

The angles and lengths of the cutting surfaces made by chain saw operators on hardwood tree stumps are described by means, standard deviations, ranges, and regression equations. Recommended felling guidelines are compared with observed felling methods used by experienced timber cutters in the southern Appalachian Mountains.
Grassland and cropland net ecosystem production of the U.S. Great Plains: Regression tree model development and comparative analysis

USGS Publications Warehouse

Wylie, Bruce K.; Howard, Daniel; Dahal, Devendra; Gilmanov, Tagir; Ji, Lei; Zhang, Li; Smith, Kelcy

2016-01-01

This paper presents the methodology and results of two ecological-based net ecosystem production (NEP) regression tree models capable of upscaling measurements made at various flux tower sites throughout the U.S. Great Plains. Separate grassland and cropland NEP regression tree models were trained using various remote sensing data and other biogeophysical data, along with 15 flux towers contributing to the grassland model and 15 flux towers for the cropland model. The models yielded weekly mean daily grassland and cropland NEP maps of the U.S. Great Plains at 250 m resolution for 2000–2008. The grassland and cropland NEP maps were spatially summarized and statistically compared. The results of this study indicate that grassland and cropland ecosystems generally performed as weak net carbon (C) sinks, absorbing more C from the atmosphere than they released from 2000 to 2008. Grasslands demonstrated higher carbon sink potential (139 g C·m−2·year−1) than non-irrigated croplands. A closer look into the weekly time series reveals the C fluctuation through time and space for each land cover type.
Disentangling Environmental and Anthropogenic Impacts on the Distribution of Unintentionally Introduced Invasive Alien Insects in Mainland China.

PubMed

Zhao, Cai-Yun; Li, Jun-Sheng; Xu, Jing; Liu, Xiao-Yan

2017-05-01

Globalization increases the opportunities for unintentionally introduced invasive alien species, especially for insects, and most of these species could damage ecosystems and cause economic loss in China. In this study, we analyzed drivers of the distribution of unintentionally introduced invasive alien insects. Based on the number of unintentionally introduced invasive alien insects and their presence/absence records in each province in mainland China, regression trees were built to elucidate the roles of environmental and anthropogenic factors on the number distribution and similarity of species composition of these insects. Classification and regression trees indicated that climatic suitability (the mean temperature in January) and human economic activity (sum of total freight) are primary drivers of the number distribution pattern of unintentionally introduced invasive alien insects at the provincial scale, while only environmental factors (the mean January temperature, the annual precipitation and the areas of provinces) significantly affect the similarity of their species composition, based on the multivariate regression trees. © The Authors 2017. Published by Oxford University Press on behalf of Entomological Society of America.
Influence of meteorological variables on rainfall partitioning for deciduous and coniferous tree species in urban area

NASA Astrophysics Data System (ADS)

Zabret, Katarina; Rakovec, Jože; Šraj, Mojca

2018-03-01

Rainfall partitioning is an important part of the ecohydrological cycle, influenced by numerous variables. Rainfall partitioning for pine (Pinus nigra Arnold) and birch (Betula pendula Roth.) trees was measured from January 2014 to June 2017 in an urban area of Ljubljana, Slovenia. 180 events from more than three years of observations were analyzed, focusing on 13 meteorological variables, including the number of raindrops, their diameter, and velocity. Regression tree and boosted regression tree analyses were performed to evaluate the influence of the variables on rainfall interception loss, throughfall, and stemflow in different phenoseasons. The amount of rainfall was recognized as the most influential variable, followed by rainfall intensity and the number of raindrops. Higher rainfall amount, intensity, and the number of drops decreased percentage of rainfall interception loss. Rainfall amount and intensity were the most influential on interception loss by birch and pine trees during the leafed and leafless periods, respectively. Lower wind speed was found to increase throughfall, whereas wind direction had no significant influence. Consideration of drop size spectrum properties proved to be important, since the number of drops, drop diameter, and median volume diameter were often recognized as important influential variables.
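For readers unfamiliar with boosted regression trees, the core idea is to fit many small trees in sequence, each to the residuals left by the trees before it, and to add their shrunken predictions together. The Python sketch below uses single-split trees (stumps) and squared error; it is a simplified illustration under those assumptions, not the software or settings used in the study, and the data (rainfall amount, wind speed, interception loss) are made up.

```python
def fit_stump(rows, targets):
    """Best single-split regression stump by squared error.

    Returns (feature index, threshold, left mean, right mean).
    Assumes at least one feature takes more than one value.
    """
    best = None
    for f in range(len(rows[0])):
        # Candidate thresholds: every observed value except the largest.
        for threshold in sorted({r[f] for r in rows})[:-1]:
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((t - lm) ** 2 for t in left)
                   + sum((t - rm) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, lm, rm = stump
    return lm if row[f] <= threshold else rm


def fit_boosted(rows, targets, n_trees=50, learning_rate=0.1):
    """Fit stumps sequentially, each to the current residuals."""
    base = sum(targets) / len(targets)
    preds = [base] * len(rows)
    stumps = []
    for _ in range(n_trees):
        residuals = [t - p for t, p in zip(targets, preds)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, r)
                 for p, r in zip(preds, rows)]
    return base, learning_rate, stumps


def predict_boosted(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


# Hypothetical events: (rainfall amount in mm, wind speed in m/s) and
# interception loss in percent. NOT data from the study.
rows = [(1, 0.5), (2, 1.5), (3, 0.5), (4, 1.5),
        (5, 0.5), (6, 1.5), (7, 0.5), (8, 1.5)]
loss = [60, 60, 60, 60, 20, 20, 20, 20]
model = fit_boosted(rows, loss)
print(predict_boosted(model, (2, 1.0)), predict_boosted(model, (7, 1.0)))
```

In this toy data set only the first variable carries signal, so every stump splits on it. Tallying which variables the fitted trees split on, weighted by error reduction, is the usual basis for the variable-influence rankings that boosted regression tree analyses report.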
Regression models for estimating leaf area of seedlings and adult individuals of Neotropical rainforest tree species.

PubMed

Brito-Rocha, E; Schilling, A C; Dos Anjos, L; Piotto, D; Dalmolin, A C; Mielke, M S

2016-01-01

Individual leaf area (LA) is a key variable in studies of tree ecophysiology because it directly influences light interception, photosynthesis and evapotranspiration of adult trees and seedlings. We analyzed the leaf dimensions (length - L and width - W) of seedlings and adults of seven Neotropical rainforest tree species (Brosimum rubescens, Manilkara maxima, Pouteria caimito, Pouteria torta, Psidium cattleyanum, Symphonia globulifera and Tabebuia stenocalyx) with the objective to test the feasibility of single regression models to estimate LA of both adults and seedlings. In southern Bahia, Brazil, a first set of data was collected between March and October 2012. From the seven species analyzed, only two (P. cattleyanum and T. stenocalyx) had very similar relationships between LW and LA in both ontogenetic stages. For these two species, a second set of data was collected in August 2014, in order to validate the single models encompassing adult and seedlings. Our results show the possibility of development of models for predicting individual leaf area encompassing different ontogenetic stages for tropical tree species. The development of these models was more dependent on the species than the differences in leaf size between seedlings and adults.
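Models of this kind typically regress measured leaf area on the product of leaf length and width. The Python sketch below fits an ordinary least-squares line LA = a + b * (L * W); the linear form with an intercept is an assumption for illustration (the abstract does not give the model equations), and the leaf measurements are made up.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical leaves: length (cm), width (cm), measured area (cm^2).
leaves = [
    (5.0, 2.0, 7.1),
    (8.0, 3.0, 16.5),
    (10.0, 4.0, 28.3),
    (12.0, 5.0, 41.8),
    (15.0, 6.0, 63.5),
]
lw = [length * width for length, width, _ in leaves]
area = [a for _, _, a in leaves]
intercept, slope = fit_line(lw, area)
print(f"LA = {intercept:.2f} + {slope:.3f} * (L * W)")
```

Fitting the same function separately to seedling and adult leaves and comparing the coefficients is one simple way to judge whether a single model can serve both ontogenetic stages.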
Phylogenetic tree construction based on 2D graphical representation

NASA Astrophysics Data System (ADS)

Liao, Bo; Shan, Xinzhou; Zhu, Wen; Li, Renfa

2006-04-01

A new approach based on the two-dimensional (2D) graphical representation of the whole genome sequence [Bo Liao, Chem. Phys. Lett., 401(2005) 196.] is proposed to analyze the phylogenetic relationships of genomes. The evolutionary distances are obtained by measuring the differences among the 2D curves. Fuzzy theory is used to construct the phylogenetic tree. The phylogenetic relationships of H5N1 avian influenza virus illustrate the utility of our approach.
Deciphering factors controlling groundwater arsenic spatial variability in Bangladesh

NASA Astrophysics Data System (ADS)

Tan, Z.; Yang, Q.; Zheng, C.; Zheng, Y.

2017-12-01

Elevated concentrations of geogenic arsenic in groundwater have been found in many countries to exceed 10 μg/L, the WHO's guideline value for drinking water. A common yet unexplained characteristic of groundwater arsenic spatial distribution is the extensive variability at various spatial scales. This study investigates factors influencing the spatial variability of groundwater arsenic in Bangladesh to improve the accuracy of models predicting arsenic exceedance rate spatially. A novel boosted regression tree method is used to establish a weak-learning ensemble model, which is compared to a linear model using a conventional stepwise logistic regression method. The boosted regression tree models offer the advantage of parametric interaction when big datasets are analyzed in comparison to the logistic regression. The point data set (n=3,538) of groundwater hydrochemistry with 19 parameters was obtained by the British Geological Survey in 2001. The spatial data sets of geological parameters (n=13) were from the Consortium for Spatial Information, Technical University of Denmark, University of East Anglia and the FAO, while the soil parameters (n=42) were from the Harmonized World Soil Database. The aforementioned parameters were regressed to categorical groundwater arsenic concentrations below or above three thresholds: 5 μg/L, 10 μg/L and 50 μg/L to identify respective controlling factors. The boosted regression tree method outperformed the logistic regression method at all three threshold levels in terms of accuracy, specificity and sensitivity, resulting in an improved spatial distribution map of the probability of groundwater arsenic exceeding all three thresholds when compared to the disjunctive-kriging interpolated spatial arsenic map using the same groundwater arsenic dataset. Boosted regression tree models also show that the most important controlling factors of groundwater arsenic distribution include groundwater iron content and well depth for all three thresholds.
The probability that a well with iron content higher than 5 mg/L contains more than 5 μg/L, 10 μg/L and 50 μg/L As is estimated to be more than 91%, 85% and 51%, respectively, while the probability that a well deeper than 160 m contains more than 5 μg/L, 10 μg/L and 50 μg/L As is estimated to be less than 38%, 25% and 14%, respectively.
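The three comparison criteria named in this abstract (accuracy, specificity and sensitivity) can be computed from binary exceedance labels. The following is a minimal Python sketch; the function name and the eight toy labels are illustrative assumptions and are not data from the study.

```python
def exceedance_metrics(observed, predicted):
    """Return accuracy, sensitivity and specificity for 0/1 exceedance labels.

    1 means the well exceeds the chosen arsenic threshold, 0 means it does not.
    """
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)  # true positives
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)  # true negatives
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)  # false positives
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical labels for eight wells (assumption, for illustration only).
observed = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = exceedance_metrics(observed, predicted)
```

Running the same function once per threshold (5, 10 and 50 μg/L) would give the kind of per-threshold comparison the abstract describes.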
The photosynthesis - leaf nitrogen relationship at ambient and elevated atmospheric carbon dioxide: a meta-analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Andrew G. Peterson; J. Timothy Ball; Yiqi Luo

1998-09-25

Estimation of leaf photosynthetic rate (A) from leaf nitrogen content (N) is both conceptually and numerically important in models of plant, ecosystem and biosphere responses to global change. The relationship between A and N has been studied extensively at ambient CO₂ but much less at elevated CO₂. This study was designed to (1) assess whether the A-N relationship was more similar for species within than between community and vegetation types, and (2) examine how growth at elevated CO₂ affects the A-N relationship. Data were obtained for 39 C₃ species grown at ambient CO₂ and 10 C₃ species grown at ambient and elevated CO₂. A regression model was applied to each species as well as to species pooled within different community and vegetation types. Cluster analysis of the regression coefficients indicated that species measured at ambient CO₂ did not separate into distinct groups matching community or vegetation type. Instead, most community and vegetation types shared the same general parameter space for regression coefficients. Growth at elevated CO₂ increased photosynthetic nitrogen use efficiency for pines and deciduous trees. When species were pooled by vegetation type, the A-N relationship for deciduous trees expressed on a leaf-mass basis was not altered by elevated CO₂, while the intercept increased for pines. When regression coefficients were averaged to give mean responses for different vegetation types, elevated CO₂ increased the intercept and the slope for deciduous trees but increased only the intercept for pines. There were no statistical differences between the pines and deciduous trees for the effect of CO₂. Generalizations about the effect of elevated CO₂ on the A-N relationship, and differences between pines and deciduous trees will be enhanced as more data become available.
Using LiDAR to Estimate Total Aboveground Biomass of Redwood Stands in the Jackson Demonstration State Forest, Mendocino, California

NASA Astrophysics Data System (ADS)

Rao, M.; Vuong, H.

2013-12-01

The overall objective of this study is to develop a method for estimating total aboveground biomass of redwood stands in Jackson Demonstration State Forest, Mendocino, California using airborne LiDAR data. LiDAR data, owing to their vertical and horizontal accuracy, are increasingly being used to characterize landscape features including ground surface elevation and canopy height. These LiDAR-derived metrics involving structural signatures at higher precision and accuracy can help better understand ecological processes at various spatial scales. Our study is focused on two major species of the forest: redwood (Sequoia sempervirens [D.Don] Engl.) and Douglas-fir (Pseudotsuga menziesii [Mirb.] Franco). Specifically, the objectives included fitting linear regression models of tree diameter at breast height (dbh) on LiDAR-derived height for each species. From 23 random points on the study area, field measurements (dbh and tree coordinates) were collected for more than 500 redwood and Douglas-fir trees over 0.2-ha plots. The USFS-FUSION application software, along with its LiDAR Data Viewer (LDV), was used to extract a Canopy Height Model (CHM) from which tree heights were derived. Based on the LiDAR-derived height and ground-based dbh, a linear regression model was developed to predict dbh. The predicted dbh was used to estimate the biomass at the single-tree level using the Jenkins formula (Jenkins et al. 2003). The linear regression models were able to explain 65% of the variability associated with redwood dbh and 80% of that associated with Douglas-fir dbh.
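The chain described in this abstract (LiDAR-derived height, then predicted dbh, then single-tree biomass) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the heights, diameters and the coefficients B0 and B1 are hypothetical placeholders, and the log-linear form exp(b0 + b1 ln dbh) is assumed here as the shape of the allometric biomass formula cited in the abstract.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, 1.0 - ss_res / ss_tot


def biomass_kg(dbh_cm, b0, b1):
    """Assumed allometric form: biomass = exp(b0 + b1 * ln(dbh))."""
    return math.exp(b0 + b1 * math.log(dbh_cm))


# Hypothetical field data: LiDAR-derived height (m) and measured dbh (cm).
heights_m = [20.0, 25.0, 30.0, 35.0, 40.0]
dbh_cm = [40.0, 52.0, 58.0, 71.0, 79.0]
a, b, r2 = fit_line(heights_m, dbh_cm)

# Predict dbh for a tree seen only by LiDAR, then convert it to biomass.
predicted_dbh = a + b * 32.0
B0, B1 = -2.0, 2.4  # placeholder coefficients, NOT published values
tree_biomass = biomass_kg(predicted_dbh, B0, B1)
```

In practice one model would be fitted per species, and the r_squared value returned by fit_line corresponds to the "explained variability" figures quoted in the abstract.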
Dispersion patterns and sampling plans for Diaphorina citri (Hemiptera: Psyllidae) in citrus.

PubMed

Sétamou, Mamoudou; Flores, Daniel; French, J Victor; Hall, David G

2008-08-01

The abundance and spatial dispersion of Diaphorina citri Kuwayama (Hemiptera: Psyllidae) were studied in 34 grapefruit (Citrus paradisi Macfad.) and six sweet orange [Citrus sinensis (L.) Osbeck] orchards from March to August 2006 when the pest is more abundant in southern Texas. Although flush shoot infestation levels did not vary with host plant species, densities of D. citri eggs, nymphs, and adults were significantly higher on sweet orange than on grapefruit. D. citri immatures also were found in significantly higher numbers in the southeastern quadrant of trees than other parts of the canopy. The spatial distribution of D. citri nymphs and adults was analyzed using Iwao's patchiness regression and Taylor's power law. Taylor's power law fitted the data better than Iwao's model. Based on both regression models, the field dispersion patterns of D. citri nymphs and adults were aggregated among flush shoots in individual trees as indicated by the regression slopes that were significantly >1. For the average density of each life stage obtained during our surveys, the minimum number of flush shoots per tree needed to estimate D. citri densities varied from eight for eggs to four flush shoots for adults. Projections indicated that a sampling plan consisting of 10 trees and eight flush shoots per tree would provide density estimates of the three developmental stages of D. citri acceptable enough for population studies and management decisions. A presence-absence sampling plan with a fixed precision level was developed and can be used to provide a quick estimation of D. citri populations in citrus orchards.
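Both dispersion models mentioned in this abstract reduce to a straight-line fit whose slope is compared with 1. The Python sketch below fits Taylor's power law (s² = a·m^b, fitted on log-log axes) and a patchiness regression of Lloyd's mean crowding (m* = m + s²/m − 1) on the mean. The sample means and variances are synthetic values constructed for illustration, not survey data.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - slope * mean_x, slope


def taylor_power_law(means, variances):
    """Fit log10(s^2) = log10(a) + b*log10(m); return (a, b)."""
    log_a, b = fit_line([math.log10(m) for m in means],
                        [math.log10(v) for v in variances])
    return 10 ** log_a, b


def patchiness_regression(means, variances):
    """Fit mean crowding m* = alpha + beta*m; return (alpha, beta)."""
    crowding = [m + v / m - 1.0 for m, v in zip(means, variances)]
    return fit_line(means, crowding)


# Synthetic per-orchard means and variances (assumption): s^2 = 2 * m^1.5.
means = [0.5, 1.0, 2.0, 4.0, 8.0]
variances = [2.0 * m ** 1.5 for m in means]

a, b = taylor_power_law(means, variances)
alpha, beta = patchiness_regression(means, variances)
aggregated = b > 1 and beta > 1  # slopes above 1 indicate aggregation
```

A real analysis would also test whether each slope differs significantly from 1, which this sketch omits.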
Adapting and Evaluating a Tree of Life Group for Women with Learning Disabilities

ERIC Educational Resources Information Center

Randle-Phillips, Cathy; Farquhar, Sarah; Thomas, Sally

2016-01-01

Background: This study describes how a specific narrative therapy approach called 'the tree of life' was adapted to run a group for women with learning disabilities. The group consisted of four participants and ran for five consecutive weeks. Materials and Methods: Participants each constructed a tree to represent their lives and presented their…
Tree Owner's Manual for the Northeastern Midwestern United States

Treesearch

Jill Johnson; Gary Johnson; Maureen McDonough; Lisa Burban; Janette Monear

2008-01-01

One common issue facing our urban forests is the fact that trees are dying prematurely. Many are planted improperly, setting them up for failure. Many do not receive regular maintenance. And few are adequately protected during construction projects. To help remedy this issue, the Forest Service has created this Tree Owner's Manual. Just like the owner...
CART (Classification and Regression Trees) Program: The Implementation of the CART Program and Its Application to Estimating Attrition Rates.

DTIC Science & Technology

1985-12-01

consists of the node t and all descendants of t in T. (3) Definition 3. Pruning a branch Tt from a tree T consists of deleting from T all...The default is 1.0 so that actually, this keyword did not need to appear in the above file. (5) DELETE. This keyword does not appear in our example, but...when it is used associated with some variable names, it indicates that we want to delete these variables from the regression. If this keyword is
Building interpretable predictive models for pediatric hospital readmission using Tree-Lasso logistic regression.

PubMed

Jovanovic, Milos; Radovanovic, Sandro; Vukicevic, Milan; Van Poucke, Sven; Delibasic, Boris

2016-09-01

Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, high dimensionality, sparsity, and class imbalance of electronic health data and the complexity of risk quantification, challenge the development of accurate predictive models. Predictive models require a certain level of interpretability in order to be applicable in real settings and create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population, by integrating a data-driven model (sparse logistic regression) and domain knowledge based on the international classification of diseases 9th-revision clinical modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and inspect the stability of alternative solutions. The analysis was conducted on >66,000 pediatric hospital discharge records from California, State Inpatient Databases, Healthcare Cost and Utilization Project between 2009 and 2011. We incorporated domain knowledge based on the ICD-9-CM hierarchy in a data driven, Tree-Lasso regularized logistic regression model, providing the framework for model interpretation. This approach was compared with traditional Lasso logistic regression resulting in models that are easier to interpret by fewer high-level diagnoses, with comparable prediction accuracy. The results revealed that the use of a Tree-Lasso model was as competitive in terms of accuracy (measured by area under the receiver operating characteristic curve-AUC) as the traditional Lasso logistic regression, but integration with the ICD-9-CM hierarchy of diseases provided more interpretable models in terms of high-level diagnoses. Additionally, interpretations of models are in accordance with existing medical understanding of pediatric readmission. 
Best performing models have similar performances, reaching AUC values of 0.783 and 0.779 for traditional Lasso and Tree-Lasso, respectively. However, the information loss of Lasso models is 0.35 bits higher than that of the Tree-Lasso model. We propose a method for building predictive models applicable for the detection of readmission risk based on Electronic Health records. Integration of domain knowledge (in the form of ICD-9-CM taxonomy) and a data-driven, sparse predictive algorithm (Tree-Lasso Logistic Regression) resulted in an increase of interpretability of the resulting model. The models are interpreted for the readmission prediction problem in general pediatric population in California, as well as several important subpopulations, and the interpretations of models comply with existing medical understanding of pediatric readmission. Finally, a quantitative assessment of the interpretability of the models is given that goes beyond simple counts of selected low-level features. Copyright © 2016 Elsevier B.V. All rights reserved.

How Fast Do Trees Grow? Using Tables and Graphs to Explore Slope

ERIC Educational Resources Information Center

Joram, Elana; Oleson, Vicki

2007-01-01

This article describes a lesson unit in which students constructed tables and graphs to represent the growth of different trees. Students then compared the graphs to develop an understanding of slope.
Error analysis of leaf area estimates made from allometric regression models

NASA Technical Reports Server (NTRS)

Feiveson, A. H.; Chhikara, R. S.

1986-01-01

Biological net productivity, measured in terms of the change in biomass with time, affects global productivity and the quality of life through biochemical and hydrological cycles and by its effect on the overall energy balance. Estimating leaf area for large ecosystems is one of the more important means of monitoring this productivity. For a particular forest plot, the leaf area is often estimated by a two-stage process. In the first stage, known as dimension analysis, a small number of trees are felled so that their areas can be measured as accurately as possible. These leaf areas are then related to non-destructive, easily-measured features such as bole diameter and tree height, by using a regression model. In the second stage, the non-destructive features are measured for all or for a sample of trees in the plots and then used as input into the regression model to estimate the total leaf area. Because both stages of the estimation process are subject to error, it is difficult to evaluate the accuracy of the final plot leaf area estimates. This paper illustrates how a complete error analysis can be made, using an example from a study made on aspen trees in northern Minnesota. The study was a joint effort by NASA and the University of California at Santa Barbara known as COVER (Characterization of Vegetation with Remote Sensing).
Memory-Scalable GPU Spatial Hierarchy Construction.

PubMed

Qiming Hou; Xin Sun; Kun Zhou; Lauterbach, C; Manocha, D

2011-04-01

Recent GPU algorithms for constructing spatial hierarchies have achieved promising performance for moderately complex models by using the breadth-first search (BFS) construction order. While being able to exploit the massive parallelism on the GPU, the BFS order also consumes excessive GPU memory, which becomes a serious issue for interactive applications involving very complex models with more than a few million triangles. In this paper, we propose to use the partial breadth-first search (PBFS) construction order to control memory consumption while maximizing performance. We apply the PBFS order to two hierarchy construction algorithms. The first algorithm is for kd-trees that automatically balances between the level of parallelism and intermediate memory usage. With PBFS, peak memory consumption during construction can be efficiently controlled without costly CPU-GPU data transfer. We also develop memory allocation strategies to effectively limit memory fragmentation. The resulting algorithm scales well with GPU memory and constructs kd-trees of models with millions of triangles at interactive rates on GPUs with 1 GB memory. Compared with existing algorithms, our algorithm is an order of magnitude more scalable for a given GPU memory bound. The second algorithm is for out-of-core bounding volume hierarchy (BVH) construction for very large scenes based on the PBFS construction order. At each iteration, all constructed nodes are dumped to the CPU memory, and the GPU memory is freed for the next iteration's use. In this way, the algorithm is able to build trees that are too large to be stored in the GPU memory. Experiments show that our algorithm can construct BVHs for scenes with up to 20 M triangles, several times larger than previous GPU algorithms.
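The memory argument behind a partial breadth-first order can be illustrated with a CPU-side toy in Python. This sketch is one plausible reading of the idea and not the authors' GPU algorithm: pending nodes sit on a stack, each step expands at most max_active of them (the batch a GPU would process in parallel), and the peak number of pending nodes is tracked. The 1-D median split, the function name and all parameters are assumptions made for illustration.

```python
def build_pbfs(points, leaf_size=2, max_active=4):
    """Build a 1-D median-split tree, expanding at most max_active nodes per step.

    Returns (leaves, peak_pending), where peak_pending is the largest number
    of not-yet-expanded nodes held at any time (a proxy for working memory).
    """
    pending = [sorted(points)]  # stack of point lists waiting to be split
    leaves = []
    peak_pending = 1
    while pending:
        # Take a bounded batch from the top of the stack: the partial front.
        batch = [pending.pop() for _ in range(min(max_active, len(pending)))]
        for pts in batch:  # on a GPU this loop would run in parallel
            if len(pts) <= leaf_size:
                leaves.append(pts)
            else:
                mid = len(pts) // 2
                pending.append(pts[:mid])
                pending.append(pts[mid:])
        peak_pending = max(peak_pending, len(pending))
    return leaves, peak_pending


points = list(range(64))
leaves_partial, peak_partial = build_pbfs(points, leaf_size=2, max_active=4)
# An unbounded batch size reproduces a plain level-by-level (BFS) build.
leaves_full, peak_full = build_pbfs(points, leaf_size=2, max_active=10**9)
```

Both runs produce the same set of leaves; the bounded batch trades some parallelism per step for a smaller peak of pending nodes, which is the trade-off the abstract describes.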
Association Analysis with One Scan of Databases

DTIC Science & Technology

2006-01-01

frequency list. 2. After the first and only scan of the database, we sort according to item supports. The restructure of the P-tree consists of similar...tree can be created in two steps: Step 1: Construct a P-tree and obtain the item frequency list. (1) Root (2) (3) For each transaction in...those infrequent items from item frequency list. Next, we prune the P-tree to exclude the infrequent nodes by checking the frequency of each node
Inferring gene regression networks with model trees

PubMed Central

2010-01-01

Background: Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities. Results: We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: Saccharomyces cerevisiae and E. coli. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the Regulon database) is used as control to compare the results to those of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions: REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions.
They are very often more precise than linear regression models because they can fit different linear regressions to separate areas of the search space, favoring the inference of localized similarities over a more global similarity. Furthermore, experimental results show the good performance of REGNET. PMID:20950452
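The abstract states that a procedure controlling the false discovery rate is applied to the gene pairs but does not name it. As an assumption for illustration, the Python sketch below uses the Benjamini-Hochberg step-up procedure, a common choice; the p-values are made up and do not come from REGNET.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: find the largest rank k (1-based, ascending p-values) with
    p_(k) <= (k / m) * alpha, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for eight candidate gene-gene edges (assumption).
edge_p_values = [0.039, 0.001, 0.205, 0.008, 0.041, 0.074, 0.042, 0.06]
keep_edge = benjamini_hochberg(edge_p_values, alpha=0.05)
```

Only the edges flagged True would be added to the association graph under this assumed procedure.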
Comparing Phylogenetic Trees by Matching Nodes Using the Transfer Distance Between Partitions.

PubMed

Bogdanowicz, Damian; Giaro, Krzysztof

2017-05-01

The ability to quantify the dissimilarity of different phylogenetic trees describing the relationship between the same group of taxa is required in various types of phylogenetic studies. For example, such metrics are used to assess the quality of phylogeny construction methods, to define optimization criteria in supertree building algorithms, or to find horizontal gene transfer (HGT) events. Among the set of metrics described so far in the literature, the most commonly used seems to be the Robinson-Foulds distance. In this article, we define a new metric for rooted trees: the Matching Pair (MP) distance. The MP metric uses the concept of the minimum-weight perfect matching in a complete bipartite graph constructed from partitions of all pairs of leaves of the compared phylogenetic trees. We analyze the properties of the MP metric and present computational experiments showing its potential applicability in tasks related to finding the HGT events.
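For context, the baseline metric named in this abstract, the Robinson-Foulds distance, is easy to sketch for rooted trees. The Python example below is not the MP distance: it counts the non-trivial leaf clusters that appear in exactly one of two trees (some authors halve this count). The nested-tuple tree encoding and the taxon names are assumptions chosen for illustration.

```python
def clusters(tree):
    """Return the non-trivial leaf clusters of a rooted tree.

    A tree is a nested tuple whose leaves are strings. Singleton clusters and
    the cluster containing every leaf are excluded as trivial.
    """
    found = set()

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        found.add(below)
        return below

    all_leaves = leaves_below(tree)
    found.discard(all_leaves)
    return found


def robinson_foulds(tree_a, tree_b):
    """Number of clusters present in exactly one of the two rooted trees."""
    return len(clusters(tree_a) ^ clusters(tree_b))


# Two hypothetical rooted trees over the same five taxa.
t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("d", "e"))
distance = robinson_foulds(t1, t2)
```

Here the trees differ only in the clusters {a, b} and {a, c}, so the count is 2. The MP distance in the abstract instead scores a minimum-weight matching between partitions, which is designed to be more sensitive than this all-or-nothing cluster comparison.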
More Trees, More Poverty? The Socioeconomic Effects of Tree Plantations in Chile, 2001-2011

NASA Astrophysics Data System (ADS)

Andersson, Krister; Lawrence, Duncan; Zavaleta, Jennifer; Guariguata, Manuel R.

2016-01-01

Tree plantations play a controversial role in many nations' efforts to balance goals for economic development, ecological conservation, and social justice. This paper seeks to contribute to this debate by analyzing the socioeconomic impact of such plantations. We focus our study on Chile, a country that has experienced extraordinary growth of industrial tree plantations. Our analysis draws on a unique dataset with longitudinal observations collected in 180 municipal territories during 2001-2011. Employing panel data regression techniques, we find that growth in plantation area is associated with higher than average rates of poverty during this period.
More Trees, More Poverty? The Socioeconomic Effects of Tree Plantations in Chile, 2001-2011.

PubMed

Andersson, Krister; Lawrence, Duncan; Zavaleta, Jennifer; Guariguata, Manuel R

2016-01-01

Tree plantations play a controversial role in many nations' efforts to balance goals for economic development, ecological conservation, and social justice. This paper seeks to contribute to this debate by analyzing the socioeconomic impact of such plantations. We focus our study on Chile, a country that has experienced extraordinary growth of industrial tree plantations. Our analysis draws on a unique dataset with longitudinal observations collected in 180 municipal territories during 2001-2011. Employing panel data regression techniques, we find that growth in plantation area is associated with higher than average rates of poverty during this period.
The wisdom of the commons: ensemble tree classifiers for prostate cancer prognosis.

PubMed

Koziol, James A; Feng, Anne C; Jia, Zhenyu; Wang, Yipeng; Goodison, Seven; McClelland, Michael; Mercola, Dan

2009-01-01

Classification and regression trees have long been used for cancer diagnosis and prognosis. Nevertheless, instability and variable selection bias, as well as overfitting, are well-known problems of tree-based methods. In this article, we investigate whether ensemble tree classifiers can ameliorate these difficulties, using data from two recent studies of radical prostatectomy in prostate cancer. Using time to progression following prostatectomy as the relevant clinical endpoint, we found that ensemble tree classifiers robustly and reproducibly identified three subgroups of patients in the two clinical datasets: non-progressors, early progressors and late progressors. Moreover, the consensus classifications were independent predictors of time to progression compared to known clinical prognostic factors.
Sectional Pole for Measuring Tree Heights

Treesearch

R. H. Brendemuehl; James B. Baker

1965-01-01

A sectional aluminum pole designed by the Silviculture Laboratory at Marianna, Florida, has proved useful for measuring tree heights. It is more convenient than a sectional bamboo pole or a telescoping fiberglass pole. A tree 5 to 30 feet in height can be measured to the nearest tenth of a foot in 30 seconds. The pole is constructed of low-cost, readily available...
Stress Wave Propagation in Larch Plantation Trees-Numerical Simulation

Treesearch

Fenglu Liu; Fang Jiang; Xiping Wang; Houjiang Zhang; Wenhua Yu

2015-01-01

In this paper, we attempted to simulate stress wave propagation in virtual tree trunks and construct two dimensional (2D) wave-front maps in the longitudinal-radial section of the trunk. A tree trunk was modeled as an orthotropic cylinder in which wood properties along the fiber and in each of the two perpendicular directions were different. We used the COMSOL...
HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

PubMed

Wan, Shixiang; Zou, Quan

2017-01-01

Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The extreme increase in next-generation sequencing has resulted in a shortage of efficient ultra-large biological sequence alignment approaches for coping with different sequence types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files more than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. The experiments on large-scale DNA and protein data sets, with files larger than 1 GB, showed that HAlign-II could save time and space. It outperformed the current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resources. HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.
Towards organ printing: engineering an intra-organ branched vascular tree.

PubMed

Visconti, Richard P; Kasyanov, Vladimir; Gentile, Carmine; Zhang, Jing; Markwald, Roger R; Mironov, Vladimir

2010-03-01

Effective vascularization of thick three-dimensional engineered tissue constructs is a problem in tissue engineering. As in native organs, a tissue-engineered intra-organ vascular tree must be comprised of a network of hierarchically branched vascular segments. Despite this requirement, current tissue-engineering efforts are still focused predominantly on engineering either large-diameter macrovessels or microvascular networks. We present the emerging concept of organ printing or robotic additive biofabrication of an intra-organ branched vascular tree, based on the ability of vascular tissue spheroids to undergo self-assembly. The feasibility and challenges of this robotic biofabrication approach to intra-organ vascularization for tissue engineering based on organ-printing technology using self-assembling vascular tissue spheroids, including clinically relevant vascular cell sources, are analyzed. It is not possible to engineer 3D thick tissue or organ constructs without effective vascularization. An effective intra-organ vascular system cannot be built by the simple connection of large-diameter vessels and microvessels. Successful engineering of functional human organs suitable for surgical implantation will require concomitant engineering of a 'built in' intra-organ branched vascular system. Organ printing enables biofabrication of human organ constructs with a 'built in' intra-organ branched vascular tree.
Beating the Odds: Trees to Success in Different Countries

ERIC Educational Resources Information Center

Finch, W. Holmes; Marchant, Gregory J.

2017-01-01

A recursive partitioning model approach in the form of classification and regression trees (CART) was used with 2012 PISA data for five countries (Canada, Finland, Germany, Singapore-China, and the United States). The objective of the study was to determine demographic and educational variables that differentiated between low SES students that were…
Geospatial relationships of tree species damage caused by Hurricane Katrina in south Mississippi

Treesearch

Mark W. Garrigues; Zhaofei Fan; David L. Evans; Scott D. Roberts; William H. Cooke III

2012-01-01

Hurricane Katrina generated substantial impacts on the forests and biological resources of the affected area in Mississippi. This study seeks to use classification tree analysis (CTA) to determine which variables are significant in predicting hurricane damage (shear or windthrow) in the Southeast Mississippi Institute for Forest Inventory District. Logistic regressions...
Using Classification Trees to Predict Alumni Giving for Higher Education

ERIC Educational Resources Information Center

Weerts, David J.; Ronca, Justin M.

2009-01-01

As the relative level of public support for higher education declines, colleges and universities aim to maximize alumni-giving to keep their programs competitive. Anchored in a utility maximization framework, this study employs the classification and regression tree methodology to examine characteristics of alumni donors and non-donors at a…
Updated generalized biomass equations for North American tree species

Treesearch

David C. Chojnacky; Linda S. Heath; Jennifer C. Jenkins

2014-01-01

Historically, tree biomass at large scales has been estimated by applying dimensional analysis techniques and field measurements such as diameter at breast height (dbh) in allometric regression equations. Equations often have been developed using differing methods and applied only to certain species or isolated areas. We previously had compiled and combined (in meta-...
The microcomputer scientific software series 5: the BIOMASS user's guide.

Treesearch

George E. Host; Stephen C. Westin; William G. Cole; Kurt S. Pregitzer

1989-01-01

BIOMASS is an interactive microcomputer program that uses allometric regression equations to calculate aboveground biomass of common tree species of the Lake States. The equations are species-specific and most use both diameter and height as independent variables. The program accommodates fixed area and variable radius sample designs and produces both individual tree...
Northern Arkansas Spring Precipitation Reconstructed from Tree Rings, 1023-1992 A.D.

Treesearch

Malcolm K. Cleaveland

2001-01-01

Three baldcypress (Taxodium distichum (L.) Rich.) tree-ring chronologies in northeastern Arkansas and southeastern Missouri respond strongly to April-June (spring) rainfall in northern Arkansas. I used regression to reconstruct an average of spring rainfall in the three climatic divisions of northern Arkansas since 1023 A.D. The reconstruction was...
Construction of phylogenetic trees by kernel-based comparative analysis of metabolic networks.

PubMed

Oh, S June; Joung, Je-Gun; Chang, Jeong-Ho; Zhang, Byoung-Tak

2006-06-06

To infer the tree of life requires knowledge of the common characteristics of each species descended from a common ancestor as the measuring criteria and a method to calculate the distance between the resulting values of each measure. Conventional phylogenetic analysis based on genomic sequences provides information about the genetic relationships between different organisms. In contrast, comparative analysis of metabolic pathways in different organisms can yield insights into their functional relationships under different physiological conditions. However, evaluating the similarities or differences between metabolic networks is a computationally challenging problem, and systematic methods of doing this are desirable. Here we introduce a graph-kernel method for computing the similarity between metabolic networks in polynomial time, and use it to profile metabolic pathways and to construct phylogenetic trees. To compare the structures of metabolic networks in organisms, we adopted the exponential graph kernel, which is a kernel-based approach with a labeled graph that includes a label matrix and an adjacency matrix. To construct the phylogenetic trees, we used an unweighted pair-group method with arithmetic mean, i.e., a hierarchical clustering algorithm. We applied the kernel-based network profiling method in a comparative analysis of nine carbohydrate metabolic networks from 81 biological species encompassing Archaea, Eukaryota, and Eubacteria. The resulting phylogenetic hierarchies generally support the tripartite scheme of three domains rather than the two domains of prokaryotes and eukaryotes. By combining the kernel machines with metabolic information, the method infers the context of biosphere development that covers physiological events required for adaptation by genetic reconstruction. 
The results show that one may obtain a global view of the tree of life by comparing the metabolic pathway structures using meta-level information rather than sequence information. This method may yield further information about biological evolution, such as the history of horizontal transfer of each gene, by studying the detailed structure of the phylogenetic tree constructed by the kernel-based method.
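The tree-building step named in this abstract, the unweighted pair-group method with arithmetic mean (UPGMA), can be sketched in plain Python. The sketch starts from a distance matrix; how a graph-kernel similarity is turned into a distance is not specified in the abstract, so the four-taxon matrix below is simply a hypothetical input.

```python
def upgma(names, dist):
    """Cluster taxa with UPGMA and return the tree as nested tuples.

    names: list of taxon labels; dist: symmetric matrix (list of lists).
    """
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}  # id -> (subtree, size)
    d = {frozenset((i, j)): dist[i][j]
         for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of clusters
        i, j = sorted(pair)
        (tree_i, size_i), (tree_j, size_j) = clusters.pop(i), clusters.pop(j)
        for k in clusters:
            # Size-weighted mean keeps every original taxon equally weighted.
            d_ik = d.pop(frozenset((i, k)))
            d_jk = d.pop(frozenset((j, k)))
            d[frozenset((next_id, k))] = (
                (size_i * d_ik + size_j * d_jk) / (size_i + size_j))
        del d[pair]
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree


# Hypothetical distances between four organisms (assumption).
taxa = ["A", "B", "C", "D"]
distances = [
    [0.0, 2.0, 6.0, 10.0],
    [2.0, 0.0, 6.0, 10.0],
    [6.0, 6.0, 0.0, 10.0],
    [10.0, 10.0, 10.0, 0.0],
]
tree = upgma(taxa, distances)
```

A and B are joined first, C is attached next, and D is the outgroup. The left-to-right order inside each tuple carries no meaning.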

Weather Impact on Airport Arrival Meter Fix Throughput

NASA Technical Reports Server (NTRS)

Wang, Yao

2017-01-01

Time-based flow management provides arrival aircraft schedules based on arrival airport conditions, airport capacity, required spacing, and weather conditions. In order to meet a scheduled time at which arrival aircraft can cross an airport arrival meter fix prior to entering the airport terminal airspace, air traffic controllers impose regulations on air traffic. Severe weather may create an airport arrival bottleneck if one or more of the airport arrival meter fixes are partially or completely blocked by the weather and the arrival demand has not been reduced accordingly. Under these conditions, aircraft are frequently put in holding patterns until they can be rerouted. A model that predicts the weather impacted meter fix throughput may help air traffic controllers direct arrival flows into the airport more efficiently, minimizing arrival meter fix congestion. This paper presents an analysis of air traffic flows across arrival meter fixes at the Newark Liberty International Airport (EWR). Several scenarios of weather impacted EWR arrival fix flows are described. Furthermore, multiple linear regression and regression tree ensemble learning approaches for translating multiple sector Weather Impacted Traffic Indexes (WITI) to EWR arrival meter fix throughputs are examined. These weather translation models are developed and validated using the EWR arrival flight and weather data for the period of April-September in 2014. This study also compares the performance of the regression tree ensemble with traditional multiple linear regression models for estimating the weather impacted throughputs at each of the EWR arrival meter fixes. For all meter fixes investigated, the results from the regression tree ensemble weather translation models show a stronger correlation between model outputs and observed meter fix throughputs than those produced by the multiple linear regression method.
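The building block of any regression tree, and therefore of a regression tree ensemble like the one this abstract evaluates, is the choice of a split that minimizes the squared error of the two resulting groups. The Python sketch below shows that single step for one predictor. It is not the study's model: the WITI-like index values and throughputs are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) for the split x <= threshold that
    minimizes the summed squared error of the two child nodes."""
    pairs = sorted(zip(x, y))
    best = (None, sse(y))  # no split: error of the parent node
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yi for _, yi in pairs[:i]]
        right = [yi for _, yi in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical weather index vs. hourly meter fix throughput (assumption).
weather_index = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8]
throughput = [40, 42, 41, 20, 22, 21]
threshold, error = best_split(weather_index, throughput)
```

A full tree repeats this search recursively over all predictors, and an ensemble averages many such trees fitted to resampled or reweighted data.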
Harmonic regression of Landsat time series for modeling attributes from national forest inventory data

NASA Astrophysics Data System (ADS)

Wilson, Barry T.; Knight, Joseph F.; McRoberts, Ronald E.

2018-03-01

Imagery from the Landsat Program has been used frequently as a source of auxiliary data for modeling land cover, as well as a variety of attributes associated with tree cover. With ready access to all scenes in the archive since 2008 due to the USGS Landsat Data Policy, new approaches to deriving such auxiliary data from dense Landsat time series are required. Several methods have previously been developed for use with finer temporal resolution imagery (e.g. AVHRR and MODIS), including image compositing and harmonic regression using Fourier series. The manuscript presents a study, using Minnesota, USA during the years 2009-2013 as the study area and timeframe. The study examined the relative predictive power of land cover models, in particular those related to tree cover, using predictor variables based solely on composite imagery versus those using estimated harmonic regression coefficients. The study used two common non-parametric modeling approaches (i.e. k-nearest neighbors and random forests) for fitting classification and regression models of multiple attributes measured on USFS Forest Inventory and Analysis plots using all available Landsat imagery for the study area and timeframe. The estimated Fourier coefficients developed by harmonic regression of tasseled cap transformation time series data were shown to be correlated with land cover, including tree cover. Regression models using estimated Fourier coefficients as predictor variables showed a two- to threefold increase in explained variance for a small set of continuous response variables, relative to comparable models using monthly image composites. Similarly, the overall accuracies of classification models using the estimated Fourier coefficients were approximately 10-20 percentage points higher than the models using the image composites, with corresponding individual class accuracies between six and 45 percentage points higher.
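Harmonic regression of the kind described here can be sketched as an ordinary least-squares fit of a first-order Fourier series to irregularly timed observations. The block below is an illustration under assumptions, not the authors' procedure: the function name `fit_harmonic`, the single harmonic, and the 365.25-day period are all choices made for the example.

```python
import math


def fit_harmonic(times, values, period=365.25):
    """Least-squares fit of y = a0 + a1*cos(w*t) + b1*sin(w*t).

    Returns (a0, a1, b1). `times` are in the same unit as `period`.
    """
    w = 2.0 * math.pi / period
    rows = [(1.0, math.cos(w * t), math.sin(w * t)) for t in times]
    n = 3
    # Build the normal equations (X^T X) beta = X^T y.
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(n)] for i in range(n)]
    xty = [sum(row[i] * y for row, y in zip(rows, values)) for i in range(n)]
    aug = [xtx[i] + [xty[i]] for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(col + 1, n):
            factor = aug[k][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[k][c] -= factor * aug[col][c]
    # Back substitution.
    beta = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (aug[i][n] - tail) / aug[i][i]
    return tuple(beta)
```

The three fitted coefficients per pixel and band could then serve as predictor variables in place of monthly composites, which is the substitution the study evaluates.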
Development of a Prognostic Marker for Lung Cancer Using Analysis of Tumor Evolution

DTIC Science & Technology

2016-08-01

construct evolutionary trees, the characteristics of which will be used to predict whether a tumor will metastasize or not. We established a procedure for...of populations, the evolution of tumor cells within a tumor can be diagrammed on a phylogenetic tree. The more diverse a tumor’s phylogenetic tree...individual tumor cells from the tumors of a training set of patients (half early stage, half late stage). We will reconstruct each tumor’s phylogenetic tree
Personal Database Management System I TRIAS

NASA Astrophysics Data System (ADS)

Yamamoto, Yoneo; Kashihara, Akihiro; Kawagishi, Keisuke

The current paper presents TRIAS (TRIple Associative System), a database management system for personal use. To implement TRIAS, we have developed an associative database whose format is (e,a,v): e for entity, a for attribute, v for value. ML-TREE is used to construct (e,a,v). ML-TREE is a revision of the B+-tree, which is a multiway balanced tree. The paper focuses mainly on the usage of the associative database, demonstrating how to use basic commands, primary functions and applications.
A novel prediction approach for antimalarial activities of Trimethoprim, Pyrimethamine, and Cycloguanil analogues using extremely randomized trees.

PubMed

Nattee, Cholwich; Khamsemanan, Nirattaya; Lawtrakul, Luckhana; Toochinda, Pisanu; Hannongbua, Supa

2017-01-01

Malaria is still one of the most serious diseases in tropical regions. This is due in part to the high resistance against available drugs for the inhibition of parasites, Plasmodium, the cause of the disease. New potent compounds with high clinical utility are urgently needed. In this work, we created a novel model using a regression tree to study structure-activity relationships and predict the inhibition constant, Ki, of three different antimalarial analogues (Trimethoprim, Pyrimethamine, and Cycloguanil) based on their molecular descriptors. To the best of our knowledge, this work is the first attempt to study the structure-activity relationships of all three analogues combined. The most relevant descriptors and appropriate parameters of the regression tree are harvested using extremely randomized trees. These descriptors are water accessible surface area, Log of the aqueous solubility, total hydrophobic van der Waals surface area, and molecular refractivity. Out of all possible combinations of these selected parameters and descriptors, the tree with the strongest coefficient of determination is selected to be our prediction model. Predicted Ki values from the proposed model show a strong coefficient of determination, R² = 0.996, to experimental Ki values. From the structure of the regression tree, compounds with high accessible surface area of all hydrophobic atoms (ASA_H) and low aqueous solubility of inhibitors (Log S) generally possess low Ki values. Our prediction model can also be utilized as a screening test for new antimalarial drug compounds which may reduce the time and expenses for new drug development. New compounds with high predicted Ki should be excluded from further drug development. It is also our inference that a threshold of ASA_H greater than 575.80 and Log S less than or equal to -4.36 is a sufficient condition for a new compound to possess a low Ki. Copyright © 2016 Elsevier Inc. All rights reserved.
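The screening threshold stated at the end of this abstract can be written as a single rule. The sketch below encodes only the two cut-offs quoted above; the function name `meets_low_ki_rule` is hypothetical, and the rest of the authors' regression tree is not reproduced.

```python
def meets_low_ki_rule(asa_h: float, log_s: float) -> bool:
    """True when a compound satisfies the abstract's sufficient condition.

    The abstract gives ASA_H > 575.80 and Log S <= -4.36 as sufficient for
    a low inhibition constant. It is not stated to be necessary, so a False
    result here does not imply a high value.
    """
    return asa_h > 575.80 and log_s <= -4.36
```

Because the condition is only sufficient, a rule like this could flag promising candidates but should not be used on its own to exclude compounds.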
The limits to tree height.

PubMed

Koch, George W; Sillett, Stephen C; Jennings, Gregory M; Davis, Stephen D

2004-04-22

Trees grow tall where resources are abundant, stresses are minor, and competition for light places a premium on height growth. The height to which trees can grow and the biophysical determinants of maximum height are poorly understood. Some models predict heights of up to 120 m in the absence of mechanical damage, but there are historical accounts of taller trees. Current hypotheses of height limitation focus on increasing water transport constraints in taller trees and the resulting reductions in leaf photosynthesis. We studied redwoods (Sequoia sempervirens), including the tallest known tree on Earth (112.7 m), in wet temperate forests of northern California. Our regression analyses of height gradients in leaf functional characteristics estimate a maximum tree height of 122-130 m barring mechanical damage, similar to the tallest recorded trees of the past. As trees grow taller, increasing leaf water stress due to gravity and path length resistance may ultimately limit leaf expansion and photosynthesis for further height growth, even with ample soil moisture.
Bialgebra deformations and algebras of trees

NASA Technical Reports Server (NTRS)

Grossman, Robert; Radford, David

1991-01-01

Let A denote a bialgebra over a field k and let A_t = A((t)) denote the ring of formal power series with coefficients in A. Assume that A is also isomorphic to a free, associative algebra over k. A simple construction is given which makes A_t a bialgebra deformation of A. In typical applications, A_t is neither commutative nor cocommutative. In the terminology of Drinfeld (1987), A_t is a quantum group. This construction yields quantum groups associated with families of trees.
Comparison of Sub-Pixel Classification Approaches for Crop-Specific Mapping

EPA Science Inventory

This paper examined two non-linear models, Multilayer Perceptron (MLP) regression and Regression Tree (RT), for estimating sub-pixel crop proportions using time-series MODIS-NDVI data. The sub-pixel proportions were estimated for three major crop types including corn, soybean, a...
"Mad or bad?": burden on caregivers of patients with personality disorders.

PubMed

Bauer, Rita; Döring, Antje; Schmidt, Tanja; Spießl, Hermann

2012-12-01

The burden on caregivers of patients with personality disorders is often greatly underestimated or completely disregarded. Possibilities for caregiver support have rarely been assessed. Thirty interviews were conducted with caregivers of such patients to assess illness-related burden. Responses were analyzed with a mixed method of qualitative and quantitative analysis in a sequential design. Patient and caregiver data, including sociodemographic and disease-related variables, were evaluated with regression analysis and regression trees. Caregiver statements (n = 404) were summarized into 44 global statements. The most frequent global statements were worries about the burden on other family members (70.0%), poor cooperation with clinical centers and other institutions (60.0%), financial burden (56.7%), worry about the patient's future (53.3%), and dissatisfaction with the patient's treatment and rehabilitation (53.3%). Linear regression and regression tree analysis identified predictors for more burdened caregivers. Caregivers of patients with personality disorders experience a variety of burdens, some disorder specific. Yet these caregivers often receive little attention or support.
Automatic localization of bifurcations and vessel crossings in digital fundus photographs using location regression

NASA Astrophysics Data System (ADS)

Niemeijer, Meindert; Dumitrescu, Alina V.; van Ginneken, Bram; Abrámoff, Michael D.

2011-03-01

Parameters extracted from the vasculature on the retina are correlated with various conditions such as diabetic retinopathy and cardiovascular diseases such as stroke. Segmentation of the vasculature on the retina has been a topic that has received much attention in the literature over the past decade. Analysis of the segmentation result, however, has only received limited attention with most works describing methods to accurately measure the width of the vessels. Analyzing the connectedness of the vascular network is an important step towards the characterization of the complete vascular tree. The retinal vascular tree, from an image interpretation point of view, originates at the optic disc and spreads out over the retina. The tree bifurcates and the vessels also cross each other. The points where this happens form the key to determining the connectedness of the complete tree. We present a supervised method to detect the bifurcations and crossing points of the vasculature of the retina. The method uses features extracted from the vasculature as well as the image in a location regression approach to find those locations of the segmented vascular tree where the bifurcation or crossing occurs (from here, POI, points of interest). We evaluate the method on the publicly available DRIVE database in which an ophthalmologist has marked the POI.
A Metric on Phylogenetic Tree Shapes

PubMed Central

Plazzotta, G.

2018-01-01

The shapes of evolutionary trees are influenced by the nature of the evolutionary process but comparisons of trees from different processes are hindered by the challenge of completely describing tree shape. We present a full characterization of the shapes of rooted branching trees in a form that lends itself to natural tree comparisons. We use this characterization to define a metric, in the sense of a true distance function, on tree shapes. The metric distinguishes trees from random models known to produce different tree shapes. It separates trees derived from tropical versus USA influenza A sequences, which reflect the differing epidemiology of tropical and seasonal flu. We describe several metrics based on the same core characterization, and illustrate how to extend the metric to incorporate trees’ branch lengths or other features such as overall imbalance. Our approach allows us to construct addition and multiplication on trees, and to create a convex metric on tree shapes which formally allows computation of average tree shapes. PMID:28472435
SATCHMO-JS: a webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction.

PubMed

Hagopian, Raffi; Davidson, John R; Datta, Ruchira S; Samad, Bushra; Jarvis, Glen R; Sjölander, Kimmen

2010-07-01

We present the jump-start simultaneous alignment and tree construction using hidden Markov models (SATCHMO-JS) web server for simultaneous estimation of protein multiple sequence alignments (MSAs) and phylogenetic trees. The server takes as input a set of sequences in FASTA format, and outputs a phylogenetic tree and MSA; these can be viewed online or downloaded from the website. SATCHMO-JS is an extension of the SATCHMO algorithm, and employs a divide-and-conquer strategy to jump-start SATCHMO at a higher point in the phylogenetic tree, reducing the computational complexity of the progressive all-versus-all HMM-HMM scoring and alignment. Results on a benchmark dataset of 983 structurally aligned pairs from the PREFAB benchmark dataset show that SATCHMO-JS provides a statistically significant improvement in alignment accuracy over MUSCLE, Multiple Alignment using Fast Fourier Transform (MAFFT), ClustalW and the original SATCHMO algorithm. The SATCHMO-JS webserver is available at http://phylogenomics.berkeley.edu/satchmo-js. The datasets used in these experiments are available for download at http://phylogenomics.berkeley.edu/satchmo-js/supplementary/.
Hardwood tree growth on amended mine soils in West Virginia.

PubMed

Wilson-Kokes, Lindsay; Delong, Curtis; Thomas, Calene; Emerson, Paul; O'Dell, Keith; Skousen, Jeff

2013-09-01

Each year surface mining in Appalachia disrupts large areas of forested land. The Surface Mining Control and Reclamation Act requires coal mine operators to establish a permanent vegetative cover after mining, and current practice emphasizes soil compaction and planting of competitive forage grasses to stabilize the site and control erosion. These practices hinder recolonization of native hardwood trees on these reclaimed sites. Recently reclamation scientists and regulators have encouraged re-establishment of hardwood forests on surface mined land through careful selection and placement of rooting media and proper selection and planting of herbaceous and tree species. To evaluate the effect of rooting media and soil amendments, a 2.8-ha experimental plot was established, with half of the plot being constructed of weathered brown sandstone and half constructed of unweathered gray sandstone. Bark mulch was applied to an area covering both sandstone types, and the ends of the plot were hydroseeded with a tree-compatible herbaceous seed mix, resulting in eight soil treatments. Twelve hardwood tree species were planted, and soil chemical properties and tree growth were measured annually from 2007 to 2012. After six growing seasons, average tree volume index was higher for trees grown on brown sandstone (5333 cm) compared with gray sandstone (3031 cm). Trees planted in mulch outperformed trees on nonmulched treatments (volume index of 6187 cm vs. 4194 cm). Hydroseeding with a tree-compatible mix produced greater ground cover (35 vs. 15%) and resulted in greater tree volume index than nonhydroseed areas (5809 vs. 3403 cm). Soil chemical properties were improved by mulch and improved tree growth, especially on gray sandstone. The average pH of brown sandstone was 5.0 to 5.4, and gray sandstone averaged pH 6.9 to 7.7. The mulch treatment on gray sandstone resulted in tree growth similar to brown sandstone alone and with mulch. After 6 yr, tree growth on brown sandstone was about double the tree growth on gray sandstone, and mulch was a successful amendment to improve tree growth. Copyright © by the American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America, Inc.
The Sooner the Better? How Symptom Interval Correlates With Outcome in Children and Adolescents With Solid Tumors: Regression Tree Analysis of the Findings of a Prospective Study.

PubMed

Ferrari, Andrea; Lo Vullo, Salvatore; Giardiello, Daniele; Veneroni, Laura; Magni, Chiara; Clerici, Carlo Alfredo; Chiaravalli, Stefano; Casanova, Michela; Luksch, Roberto; Terenziani, Monica; Spreafico, Filippo; Meazza, Cristina; Catania, Serena; Schiavello, Elisabetta; Biassoni, Veronica; Podda, Marta; Bergamaschi, Luca; Puma, Nadia; Massimino, Maura; Mariani, Luigi

2016-03-01

The potential impact of diagnostic delays on patients' outcomes is a debated issue in pediatric oncology and discordant results have been published so far. We attempted to tackle this issue by analyzing a prospective series of 351 consecutive children and adolescents with solid malignancies using innovative statistical tools. To address the nonlinear complexity of the association between symptom interval and overall survival (OS), a regression tree algorithm was constructed with sequential binary splitting rules and used to identify homogeneous patient groups vis-à-vis functional relationship between diagnostic delay and OS. Three different groups were identified: group A, with localized disease and good prognosis (5-year OS 85.4%); group B, with locally or regionally advanced, or metastatic disease and intermediate prognosis (5-year OS 72.9%), including neuroblastoma, Wilms tumor, non-rhabdomyosarcoma soft tissue sarcoma, and germ cell tumor; and group C, with locally or regionally advanced, or metastatic disease and poor prognosis (5-year OS 45%), including brain tumors, rhabdomyosarcoma, and bone sarcoma. The functional relationship between symptom interval and mortality risk differed between the three subgroups, there being no association in group A (hazard ratio [HR]: 0.96), a positive linear association in group B (HR: 1.48), and a negative linear association in group C (HR: 0.61). Our analysis suggests that at least a subset of patients can benefit from an earlier diagnosis in terms of survival. For others, intrinsic aggressiveness may mask the potential effect of diagnostic delays. Based on these findings, early diagnosis should remain a goal for pediatric cancer patients. © 2015 Wiley Periodicals, Inc.
Predicting Rotator Cuff Tears Using Data Mining and Bayesian Likelihood Ratios

PubMed Central

Lu, Hsueh-Yi; Huang, Chen-Yuan; Su, Chwen-Tzeng; Lin, Chen-Chiang

2014-01-01

Objectives: Rotator cuff tear is a common cause of shoulder diseases. Correct diagnosis of rotator cuff tears can save patients from further invasive, costly and painful tests. This study used predictive data mining and Bayesian theory to improve the accuracy of diagnosing rotator cuff tears by clinical examination alone. Methods: In this retrospective study, 169 patients who had a preliminary diagnosis of rotator cuff tear on the basis of clinical evaluation followed by confirmatory MRI between 2007 and 2011 were identified. MRI was used as a reference standard to classify rotator cuff tears. The predictor variable was the clinical assessment results, which consisted of 16 attributes. This study employed 2 data mining methods (ANN and the decision tree) and a statistical method (logistic regression) to classify the rotator cuff diagnosis into “tear” and “no tear” groups. Likelihood ratio and Bayesian theory were applied to estimate the probability of rotator cuff tears based on the results of the prediction models. Results: Our proposed data mining procedures outperformed the classic statistical method. The correction rate, sensitivity, specificity and area under the ROC curve of predicting a rotator cuff tear were statistically better in the ANN and decision tree models compared to logistic regression. Based on likelihood ratios derived from our prediction models, Fagan's nomogram could be constructed to assess the probability of a patient who has a rotator cuff tear using a pretest probability and a prediction result (tear or no tear). Conclusions: Our predictive data mining models, combined with likelihood ratios and Bayesian theory, appear to be good tools to classify rotator cuff tears as well as determine the probability of the presence of the disease to enhance diagnostic decision making for rotator cuff tears. PMID:24733553
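The step from a pretest probability and a likelihood ratio to a post-test probability, which Fagan's nomogram performs graphically, is the odds form of Bayes' theorem. The sketch below shows that arithmetic; the function names are illustrative and no values from the study are assumed.

```python
def likelihood_ratios(sensitivity: float, specificity: float):
    """Return (LR+, LR-) for a binary test.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    return sensitivity / (1.0 - specificity), (1.0 - sensitivity) / specificity


def post_test_probability(pretest: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a post-test probability."""
    if not 0.0 < pretest < 1.0:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    pre_odds = pretest / (1.0 - pretest)
    post_odds = pre_odds * likelihood_ratio  # Bayes' theorem in odds form
    return post_odds / (1.0 + post_odds)
```

A "tear" prediction would be combined with LR+ and a "no tear" prediction with LR-, each applied to the same pretest probability.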
GIS-based groundwater potential analysis using novel ensemble weights-of-evidence with logistic regression and functional tree models.

PubMed

Chen, Wei; Li, Hui; Hou, Enke; Wang, Shengquan; Wang, Guirong; Panahi, Mahdi; Li, Tao; Peng, Tao; Guo, Chen; Niu, Chao; Xiao, Lele; Wang, Jiale; Xie, Xiaoshen; Ahmad, Baharin Bin

2018-09-01

The aim of the current study was to produce groundwater spring potential maps using novel ensemble weights-of-evidence (WoE) with logistic regression (LR) and functional tree (FT) models. First, a total of 66 springs were identified by field surveys, out of which 70% of the spring locations were used for training the models and 30% of the spring locations were employed for the validation process. Second, a total of 14 affecting factors including aspect, altitude, slope, plan curvature, profile curvature, stream power index (SPI), topographic wetness index (TWI), sediment transport index (STI), lithology, normalized difference vegetation index (NDVI), land use, soil, distance to roads, and distance to streams was used to analyze the spatial relationship between these affecting factors and spring occurrences. Multicollinearity analysis and feature selection of the correlation attribute evaluation (CAE) method were employed to optimize the affecting factors. Subsequently, the novel ensembles of the WoE, LR, and FT models were constructed using the training dataset. Finally, the receiver operating characteristic (ROC) curves, standard error, confidence interval (CI) at 95%, and significance level P were employed to validate and compare the performance of three models. Overall, all three models performed well for groundwater spring potential evaluation. The prediction capability of the FT model, with the highest AUC values, the smallest standard errors, the narrowest CIs, and the smallest P values for the training and validation datasets, is better compared to those of other models. The groundwater spring potential maps can be adopted for the management of water resources and land use by planners and engineers. Copyright © 2018 Elsevier B.V. All rights reserved.
Using artificial intelligence to predict the risk for posterior capsule opacification after phacoemulsification.

PubMed

Mohammadi, Seyed-Farzad; Sabbaghi, Mostafa; Z-Mehrjardi, Hadi; Hashemi, Hassan; Alizadeh, Somayeh; Majdi, Mercede; Taee, Farough

2012-03-01

To apply artificial intelligence models to predict the occurrence of posterior capsule opacification (PCO) after phacoemulsification. Farabi Eye Hospital, Tehran, Iran. Clinical-based cross-sectional study. The posterior capsule status of eyes operated on for age-related cataract and the need for laser capsulotomy were determined. After a literature review, data polishing, and expert consultation, 10 input variables were selected. The QUEST algorithm was used to develop a decision tree. Three back-propagation artificial neural networks were constructed with 4, 20, and 40 neurons in 2 hidden layers and trained with the same transfer functions (log-sigmoid and linear transfer) and training protocol with randomly selected eyes. They were then tested on the remaining eyes and the networks compared for their performance. Performance indices were used to compare resultant models with the results of logistic regression analysis. The models were trained using 282 randomly selected eyes and then tested using 70 eyes. Laser capsulotomy for clinically significant PCO was indicated or had been performed 2 years postoperatively in 40 eyes. A sample decision tree was produced with accuracy of 50% (likelihood ratio 0.8). The best artificial neural network, which showed 87% accuracy and a positive likelihood ratio of 8, was achieved with 40 neurons. The area under the receiver-operating-characteristic curve was 0.71. In comparison, logistic regression reached accuracy of 80%; however, the likelihood ratio was not measurable because the sensitivity was zero. A prototype artificial neural network was developed that predicted posterior capsule status (requiring capsulotomy) with reasonable accuracy. No author has a financial or proprietary interest in any material or method mentioned. Copyright © 2012 ASCRS and ESCRS. Published by Elsevier Inc. All rights reserved.
Constructing Phylogenies.

ERIC Educational Resources Information Center

Bilardello, Nicholas; Valdes, Linda

1998-01-01

Introduces a method for constructing phylogenies using molecular traits and elementary graph theory. Discusses analyzing molecular data and using weighted graphs, minimum-weight spanning trees, and rooted cube phylogenies to display the data. (DDR)
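A minimum-weight spanning tree over a weighted graph of pairwise differences can be found with Prim's algorithm. The sketch below is a generic illustration, not the article's classroom procedure; the function name and the four-taxon distance matrix are invented for the example.

```python
def minimum_spanning_tree(weights):
    """Prim's algorithm on a symmetric weight matrix.

    Returns a list of (u, v, weight) edges connecting all vertices.
    """
    n = len(weights)
    in_tree = {0}  # start from the first vertex
    edges = []
    while len(in_tree) < n:
        # Cheapest edge leaving the current tree.
        u, v = min(
            ((i, j) for i in in_tree for j in range(n) if j not in in_tree),
            key=lambda e: weights[e[0]][e[1]],
        )
        edges.append((u, v, weights[u][v]))
        in_tree.add(v)
    return edges


# Hypothetical pairwise differences (e.g. counts of differing molecular
# traits) among four taxa; the values are made up for illustration.
distances = [
    [0, 2, 6, 7],
    [2, 0, 5, 6],
    [6, 5, 0, 1],
    [7, 6, 1, 0],
]
tree_edges = minimum_spanning_tree(distances)
```

The resulting edges link each taxon to its closest already-connected neighbor, which is one simple way to display molecular similarity as a tree.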
Ultrasonographic Diagnosis of Biliary Atresia Based on a Decision-Making Tree Model.

PubMed

Lee, So Mi; Cheon, Jung-Eun; Choi, Young Hun; Kim, Woo Sun; Cho, Hyun-Hae; Cho, Hyun-Hye; Kim, In-One; You, Sun Kyoung

2015-01-01

To assess the diagnostic value of various ultrasound (US) findings and to make a decision-tree model for US diagnosis of biliary atresia (BA). From March 2008 to January 2014, the following US findings were retrospectively evaluated in 100 infants with cholestatic jaundice (BA, n = 46; non-BA, n = 54): length and morphology of the gallbladder, triangular cord thickness, hepatic artery and portal vein diameters, and visualization of the common bile duct. Logistic regression analyses were performed to determine the features that would be useful in predicting BA. Conditional inference tree analysis was used to generate a decision-making tree for classifying patients into the BA or non-BA groups. Multivariate logistic regression analysis showed that abnormal gallbladder morphology and greater triangular cord thickness were significant predictors of BA (p = 0.003 and 0.001; adjusted odds ratio: 345.6 and 65.6, respectively). In the decision-making tree using conditional inference tree analysis, gallbladder morphology and triangular cord thickness (optimal cutoff value of triangular cord thickness, 3.4 mm) were also selected as significant discriminators for differential diagnosis of BA, and gallbladder morphology was the first discriminator. The diagnostic performance of the decision-making tree was excellent, with sensitivity of 100% (46/46), specificity of 94.4% (51/54), and overall accuracy of 97% (97/100). Abnormal gallbladder morphology and greater triangular cord thickness (> 3.4 mm) were the most useful predictors of BA on US. We suggest that the gallbladder morphology should be evaluated first and that triangular cord thickness should be evaluated subsequently in cases with normal gallbladder morphology.
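The reported performance figures follow directly from the counts in the abstract: 46 of 46 BA cases and 51 of 54 non-BA cases were classified correctly. A short sketch of the arithmetic, with an illustrative function name:

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Counts implied by the abstract: 46/46 BA detected, 51/54 non-BA correct.
metrics = diagnostic_metrics(tp=46, fn=0, tn=51, fp=3)
# sensitivity = 46/46 = 1.0, specificity = 51/54 (about 0.944), accuracy = 97/100
```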
Merchantable height of trees in Oregon: a comparison of current logging practice and volume table specifications.

Treesearch

Don Minore; Donald R. Gedney

1960-01-01

A large proportion of present-day timber cruising is done by measuring or estimating three tree dimensions: diameter at breast height, form class, and merchantable height. Tree volumes are then determined from tables which equate volume to the varying combinations of height, d.b.h., and form class. Assumptions concerning merchantable height were made in constructing...

Probability of Damage to Sidewalks and Curbs by Street Trees in the Tropics

Treesearch

John K. Francis; Bernard R. Parresol; Juana Marin de Patino

1996-01-01

For 75 trees each of 12 species growing along streets in San Juan, Puerto Rico and Merida, Mexico, diameter at breast height and distance to sidewalk or curb was measured and damage (cracking or raising) was evaluated. Logistic analysis was used to construct a model to predict probability of damage to sidewalk or curb. Distance to the pavement, diameter of the tree,...
Wood quality for longleaf pines: a spacing, thinning and pruning study on the Kisatchie National Forest

Treesearch

Chi-Leung So; Thomas L. Eberhardt; Daniel J. Leduc; Leslie H. Groom; Jeffery C. G. Goelz

2010-01-01

Twenty 70-year-old longleaf pine (Pinus palustris Mill.) trees were harvested from a spacing, thinning, and pruning study on the Kisatchie National Forest, LA. Tree property mapping was used to show the property variation within and between three of the trees. The construction of such maps is both time consuming and cost prohibitive using traditional...
Estimating forest crown area removed by selection cutting: a linked regression-GIS approach based on stump diameters

USGS Publications Warehouse

Anderson, S.C.; Kupfer, J.A.; Wilson, R.R.; Cooper, R.J.

2000-01-01

The purpose of this research was to develop a model that could be used to provide a spatial representation of uneven-aged silvicultural treatments on forest crown area. We began by developing species-specific linear regression equations relating tree DBH to crown area for eight bottomland tree species at White River National Wildlife Refuge, Arkansas, USA. The relationships were highly significant for all species, with coefficients of determination (r²) ranging from 0.37 for Ulmus crassifolia to nearly 0.80 for Quercus nuttallii and Taxodium distichum. We next located and measured the diameters of more than 4000 stumps from a single tree-group selection timber harvest. Stump locations were recorded with respect to an established grid point system and entered into a Geographic Information System (ARC/INFO). The area occupied by the crown of each logged individual was then estimated by using the stump dimensions (adjusted to DBHs) and the regression equations relating tree DBH to crown area. Our model projected that the selection cuts removed roughly 300 m² of basal area from the logged sites, resulting in the loss of approximately 55 000 m² of crown area. The model developed in this research represents a tool that can be used in conjunction with remote sensing applications to assist in forest inventory and management, as well as to estimate the impacts of selective timber harvest on wildlife.
A cross-sectional study for predicting tail biting risk in pig farms using classification and regression tree analysis.

PubMed

Scollo, Annalisa; Gottardo, Flaviana; Contiero, Barbara; Edwards, Sandra A

2017-10-01

Tail biting in pigs has been an identified behavioural, welfare and economic problem for decades, and requires appropriate but sometimes difficult on-farm interventions. The aim of the paper is to introduce the Classification and Regression Tree (CRT) methodologies to develop a tool for prevention of acute tail biting lesions in pigs on-farm. A sample of 60 commercial farms rearing heavy pigs were involved; an on-farm visit and an interview with the farmer collected data on general management, herd health, disease prevention, climate control, feeding and production traits. Results suggest a value for the CRT analysis in managing the risk factors behind tail biting on a farm-specific level, showing 86.7% sensitivity for the Classification Tree and a correlation of 0.7 between observed and predicted prevalence of tail biting obtained with the Regression Tree. CRT analysis showed five main variables (stocking density, ammonia levels, number of pigs per stockman, type of floor and timeliness in feed supply) as critical predictors of acute tail biting lesions, which demonstrate different importance in different farm subgroups. The model might have reliable and practical applications for the support and implementation of tail biting prevention interventions, especially in case of subgroups of pigs with higher risk, helping farmers and veterinarians to assess the risk in their own farm and to manage their predisposing variables in order to reduce acute tail biting lesions. Copyright © 2017 Elsevier B.V. All rights reserved.
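Regression trees of the kind used here are grown by repeatedly choosing the binary split of one predictor that best separates the response. The sketch below shows a single such step using the sum of squared errors, a standard CART criterion; it is an assumption that the study's software used this criterion, and the function name `best_split` is illustrative.

```python
def best_split(xs, ys):
    """Find the threshold on one predictor that minimises squared error.

    Returns (threshold, sse) for the split `x <= threshold`, or None when
    all predictor values are identical and no split is possible.
    """
    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best
```

Applying the same search to each predictor, keeping the best one, and recursing on the two resulting subsets yields the full tree; stopping rules and pruning are omitted from this sketch.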
Annual Tree Growth Predictions From Periodic Measurements

Treesearch

Quang V. Cao

2004-01-01

Data from annual measurements of a loblolly pine (Pinus taeda L.) plantation were available for this study. Regression techniques were employed to model annual changes of individual trees in terms of diameters, heights, and survival probabilities. Subsets of the data that include measurements every 2, 3, 4, 5, and 6 years were used to fit the same...
Understory response following varying levels of overstory removal in mixed conifer stands

Treesearch

Fabian C.C. Uzoh; Leroy K. Dolph; John R. Anstead

1997-01-01

Diameter growth rates of understory trees were measured for periods both before and after overstory removal on six study areas in northern California. All the species responded with increased diameter growth after adjusting to their new environments. Linear regression equations that predict post treatment diameter growth increment of the residual trees are presented...
Delayed conifer tree mortality following fire in California

Treesearch

Sharon M. Hood; Sheri L. Smith; Daniel R. Cluck

2007-01-01

Fire injury was characterized and survival monitored for 5,246 trees from five wildfires in California that occurred between 1999 and 2002. Logistic regression models for predicting the probability of mortality were developed for incense-cedar, Jeffrey pine, ponderosa pine, red fir and white fir. Two-year post-fire preliminary models were developed for incense-cedar,...
Estimating leaf area and leaf biomass of open-grown deciduous urban trees

Treesearch

David J. Nowak

1996-01-01

Logarithmic regression equations were developed to predict leaf area and leaf biomass for open-grown deciduous urban trees based on stem diameter and crown parameters. Equations based on crown parameters produced more reliable estimates. The equations can be used to help quantify forest structure and functions, particularly in urbanizing and urban/suburban areas.
Post-fire tree establishment patterns at the alpine treeline ecotone: Mount Rainier National Park, Washington, USA

Treesearch

Kirk M. Stueve; Dawna L. Cerney; Regina M. Rochefort; Laurie L. Kurth

2009-01-01

We performed classification analysis of 1970 satellite imagery and 2003 aerial photography to delineate establishment. Local site conditions were calculated from a LIDAR-based DEM, ancillary climate data, and 1970 tree locations in a GIS. We used logistic regression on a spatially weighted landscape matrix to rank variables.
Biomass of Yellow-Poplar in Natural Stands in Western North Carolina

Treesearch

Alexander Clark; James G. Schroeder

1977-01-01

Aboveground biomass was determined for yellow-poplar (Liriodendron tulipifera L.) trees 6 to 28 inches d.b.h. growing in natural, uneven-aged mountain cove stands in western North Carolina. Specific gravity, moisture content, and green weight per cubic foot are presented for the total tree and its components. Tables developed from regression equations show weight and...
The wisdom of the commons: ensemble tree classifiers for prostate cancer prognosis

PubMed Central

Koziol, James A.; Feng, Anne C.; Jia, Zhenyu; Wang, Yipeng; Goodison, Seven; McClelland, Michael; Mercola, Dan

2009-01-01

Motivation: Classification and regression trees have long been used for cancer diagnosis and prognosis. Nevertheless, instability and variable selection bias, as well as overfitting, are well-known problems of tree-based methods. In this article, we investigate whether ensemble tree classifiers can ameliorate these difficulties, using data from two recent studies of radical prostatectomy in prostate cancer. Results: Using time to progression following prostatectomy as the relevant clinical endpoint, we found that ensemble tree classifiers robustly and reproducibly identified three subgroups of patients in the two clinical datasets: non-progressors, early progressors and late progressors. Moreover, the consensus classifications were independent predictors of time to progression compared to known clinical prognostic factors. Contact: dmercola@uci.edu PMID:18628288
Decision tree modeling using R.

PubMed

Zhang, Zhongheng

2016-08-01

In machine learning, the decision tree learner is powerful and easy to interpret. It employs a recursive binary partitioning algorithm that splits the sample on the partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on the conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. Because a single tree is sensitive to small changes in the training data, the random forests procedure is introduced to address this problem. The diversity among the trees of a random forest comes from random sampling of the data and from restricting the set of input variables available for selection. Finally, I introduce R functions to perform model-based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building.
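The recursive binary partitioning described in this abstract can be sketched in a few lines. The following Python sketch is illustrative only: it uses a CART-style sum-of-squares criterion on a single numeric predictor, not the conditional inference tests or the R functions discussed in the article, and the names `best_split`, `grow`, `predict`, `min_size` and `max_depth` are assumptions made for this example.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) of the best binary split, or None."""
    best = None
    # The largest distinct value cannot separate anything, so skip it.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (t, total)
    return best


def grow(xs, ys, min_size=2, depth=0, max_depth=3):
    """Recursively partition the sample until a stopping criterion is met."""
    split = best_split(xs, ys)
    # Stopping criteria (hypothetical choices): depth limit, small node,
    # or no usable split because all predictor values are equal.
    if depth >= max_depth or len(ys) < 2 * min_size or split is None:
        return {"leaf": True, "prediction": mean(ys)}
    t = split[0]
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    if len(left) < min_size or len(right) < min_size:
        return {"leaf": True, "prediction": mean(ys)}
    return {
        "leaf": False,
        "threshold": t,
        "left": grow([x for x, _ in left], [y for _, y in left],
                     min_size, depth + 1, max_depth),
        "right": grow([x for x, _ in right], [y for _, y in right],
                      min_size, depth + 1, max_depth),
    }


def predict(tree, x):
    """Follow the splits down to a leaf and return its mean response."""
    while not tree["leaf"]:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["prediction"]
```

Calling `grow` on a predictor list and a response list returns a nested dictionary; `predict` then walks that dictionary for a new predictor value. A random forest would repeat `grow` on bootstrap samples with a random subset of predictors at each split and average the predictions.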
Estimating extent of mortality associated with the Douglas-fir beetle in the Central and Northern Rockies

Treesearch

Jose F. Negron; Willis C. Schaupp; Kenneth E. Gibson; John Anhold; Dawn Hansen; Ralph Thier; Phil Mocettini

1999-01-01

Data collected from Douglas-fir stands infected by the Douglas-fir beetle in Wyoming, Montana, Idaho, and Utah, were used to develop models to estimate amount of mortality in terms of basal area killed. Models were built using stepwise linear regression and regression tree approaches. Linear regression models using initial Douglas-fir basal area were built for all...
Polynomial-Time Algorithms for Building a Consensus MUL-Tree

PubMed Central

Cui, Yun; Jansson, Jesper

2012-01-01

A multi-labeled phylogenetic tree, or MUL-tree, is a generalization of a phylogenetic tree that allows each leaf label to be used many times. MUL-trees have applications in biogeography, the study of host–parasite cospeciation, gene evolution studies, and computer science. Here, we consider the problem of inferring a consensus MUL-tree that summarizes a given set of conflicting MUL-trees, and present the first polynomial-time algorithms for solving it. In particular, we give a straightforward, fast algorithm for building a strict consensus MUL-tree for any input set of MUL-trees with identical leaf label multisets, as well as a polynomial-time algorithm for building a majority rule consensus MUL-tree for the special case where every leaf label occurs at most twice. We also show that, although it is NP-hard to find a majority rule consensus MUL-tree in general, the variant that we call the singular majority rule consensus MUL-tree can be constructed efficiently whenever it exists. PMID:22963134
Polynomial-time algorithms for building a consensus MUL-tree.

PubMed

Cui, Yun; Jansson, Jesper; Sung, Wing-Kin

2012-09-01

A multi-labeled phylogenetic tree, or MUL-tree, is a generalization of a phylogenetic tree that allows each leaf label to be used many times. MUL-trees have applications in biogeography, the study of host-parasite cospeciation, gene evolution studies, and computer science. Here, we consider the problem of inferring a consensus MUL-tree that summarizes a given set of conflicting MUL-trees, and present the first polynomial-time algorithms for solving it. In particular, we give a straightforward, fast algorithm for building a strict consensus MUL-tree for any input set of MUL-trees with identical leaf label multisets, as well as a polynomial-time algorithm for building a majority rule consensus MUL-tree for the special case where every leaf label occurs at most twice. We also show that, although it is NP-hard to find a majority rule consensus MUL-tree in general, the variant that we call the singular majority rule consensus MUL-tree can be constructed efficiently whenever it exists.
Standards for Standardized Logistic Regression Coefficients

ERIC Educational Resources Information Center

Menard, Scott

2011-01-01

Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…
Complexity of major UK companies between 2006 and 2010: Hierarchical structure method approach

NASA Astrophysics Data System (ADS)

Ulusoy, Tolga; Keskin, Mustafa; Shirvani, Ayoub; Deviren, Bayram; Kantar, Ersin; Çaǧrı Dönmez, Cem

2012-11-01

This study reports on topology of the top 40 UK companies that have been analysed for predictive verification of markets for the period 2006-2010, applying the concept of minimal spanning tree and hierarchical tree (HT) analysis. Construction of the minimal spanning tree (MST) and the hierarchical tree (HT) is confined to a brief description of the methodology and a definition of the correlation function between a pair of companies based on the London Stock Exchange (LSE) index in order to quantify synchronization between the companies. A derivation of hierarchical organization and the construction of minimal-spanning and hierarchical trees for the 2006-2008 and 2008-2010 periods have been used and the results validate the predictive verification of applied semantics. The trees are known as useful tools to perceive and detect the global structure, taxonomy and hierarchy in financial data. From these trees, two different clusters of companies in 2006 were detected. They also show three clusters in 2008 and two between 2008 and 2010, according to their proximity. The clusters match each other as regards their common production activities or their strong interrelationship. The key companies are generally given by major economic activities as expected. This work gives a comparative approach between MST and HT methods from statistical physics and information theory with analysis of financial markets that may give new valuable and useful information of the financial market dynamics.
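As a rough illustration of how a minimal spanning tree can be built from pairwise correlations, the Python sketch below converts each correlation coefficient rho into a distance d = sqrt(2(1 - rho)) and then applies Kruskal's algorithm. The abstract does not state the exact correlation or distance definition used, so that distance, the function names, and the input format (a dictionary of equal-length return series) are all assumptions made for this example.

```python
import math


def correlation(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def minimal_spanning_tree(series):
    """Return MST edges (name_a, name_b, distance) via Kruskal's algorithm."""
    names = sorted(series)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rho = correlation(series[a], series[b])
            # Assumed distance: sqrt(2 * (1 - rho)); clamp guards rounding.
            edges.append((math.sqrt(max(0.0, 2.0 * (1.0 - rho))), a, b))
    edges.sort()

    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    tree = []
    for dist, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two separate components
            parent[root_a] = root_b
            tree.append((a, b, dist))
    return tree
```

For N series the result has N - 1 edges. Strongly correlated pairs receive short distances and are therefore linked first, which is what makes clusters of related companies visible in the tree.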
Biomass expansion factor and root-to-shoot ratio for Pinus in Brazil.

PubMed

Sanquetta, Carlos R; Corte, Ana Pd; da Silva, Fernando

2011-09-24

The Biomass Expansion Factor (BEF) and the Root-to-Shoot Ratio (R) are variables used to quantify carbon stock in forests. They are often treated as constants or as species- or area-specific values in most studies. This study aimed to show the dependence of BEF and R on tree size and age, and proposed equations to improve estimates of forest biomass and carbon stock. Data from 70 sample Pinus spp. trees grown in southern Brazil, in different diameter classes and ages, were used to demonstrate the correlation between BEF and R and forest inventory data such as DBH, tree height and age. Total dry biomass, carbon stock and CO2 equivalent were simulated using the IPCC default values of BEF and R, the corresponding averages calculated from the data used in this study, and the values estimated by regression equations. The mean values of BEF and R calculated in this study were 1.47 and 0.17, respectively. BEF and R were inversely related to the tree measurement variables, with negative exponential behavior. Simulations indicated that use of fixed values of BEF and R, either IPCC default or current average data, may lead to unreliable estimates of carbon stock inventories and CDM projects. It was concluded that accounting for the variations in BEF and R, and using regression equations to relate them to DBH, tree height and age, is fundamental in obtaining reliable estimates of forest tree biomass, carbon sink and CO2 equivalent.
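To make the role of the two factors concrete, the Python sketch below expands a stem biomass figure into total biomass and CO2 equivalent using fixed values of BEF and R. It is a sketch under stated assumptions: BEF is taken as the ratio of aboveground biomass to stem biomass (definitions vary between studies), the carbon fraction of 0.5 is an assumed default, and the 100 kg stem biomass in the usage note is a made-up input, not a value from the study. The factor 44/12 is the molar mass ratio of CO2 to carbon.

```python
def total_biomass(stem_biomass_kg, bef, root_to_shoot):
    """Expand stem biomass to total (aboveground plus root) dry biomass."""
    aboveground = stem_biomass_kg * bef          # assumed BEF definition
    return aboveground * (1.0 + root_to_shoot)   # add roots via R


def co2_equivalent(biomass_kg, carbon_fraction=0.5):
    """Convert dry biomass to CO2 equivalent (carbon_fraction is assumed)."""
    carbon_kg = biomass_kg * carbon_fraction
    return carbon_kg * 44.0 / 12.0               # molar mass ratio CO2 : C
```

With the mean values reported in the abstract (BEF = 1.47, R = 0.17), a hypothetical 100 kg of stem biomass expands to 100 × 1.47 × 1.17 = 171.99 kg of total biomass. The study's point is that replacing the fixed `bef` and `root_to_shoot` arguments with values predicted from DBH, height and age changes such totals appreciably.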
A Distance Measure for Genome Phylogenetic Analysis

NASA Astrophysics Data System (ADS)

Cao, Minh Duc; Allison, Lloyd; Dix, Trevor

Phylogenetic analyses of species based on single genes or parts of the genomes are often inconsistent because of factors such as variable rates of evolution and horizontal gene transfer. The availability of more and more sequenced genomes allows phylogeny construction from complete genomes that is less sensitive to such inconsistency. For such long sequences, construction methods like maximum parsimony and maximum likelihood are often not possible due to their intensive computational requirements. Another class of tree construction methods, namely distance-based methods, requires a measure of distance between any two genomes. Some measures, such as evolutionary edit distance of gene order and gene content, are computationally expensive or do not perform well when the gene content of the organisms is similar. This study presents an information theoretic measure of genetic distances between genomes based on the biological compression algorithm expert model. We demonstrate that our distance measure can be applied to reconstruct the consensus phylogenetic tree of a number of Plasmodium parasites from their genomes, the statistical bias of which would mislead conventional analysis methods. Our approach is also used to successfully construct a plausible evolutionary tree for the γ-Proteobacteria group whose genomes are known to contain many horizontally transferred genes.
Standard errors and confidence intervals for variable importance in random forest regression, classification, and survival.

PubMed

Ishwaran, Hemant; Lu, Min

2018-06-04

Random forests are a popular nonparametric tree ensemble procedure with broad applications to data analysis. While its widespread popularity stems from its prediction performance, an equally important feature is that it provides a fully nonparametric measure of variable importance (VIMP). A current limitation of VIMP, however, is that no systematic method exists for estimating its variance. As a solution, we propose a subsampling approach that can be used to estimate the variance of VIMP and for constructing confidence intervals. The method is general enough that it can be applied to many useful settings, including regression, classification, and survival problems. Using extensive simulations, we demonstrate the effectiveness of the subsampling estimator and in particular find that the delete-d jackknife variance estimator, a close cousin, is especially effective under low subsampling rates due to its bias correction properties. These 2 estimators are highly competitive when compared with the .164 bootstrap estimator, a modified bootstrap procedure designed to deal with ties in out-of-sample data. Most importantly, subsampling is computationally fast, thus making it especially attractive for big data settings. Copyright © 2018 John Wiley & Sons, Ltd.

Modeling of quantitative relationships between physicochemical properties of active pharmaceutical ingredients and tensile strength of tablets using a boosted tree.

PubMed

Hayashi, Yoshihiro; Oishi, Takuya; Shirotori, Kaede; Marumo, Yuki; Kosugi, Atsushi; Kumada, Shungo; Hirai, Daijiro; Takayama, Kozo; Onuki, Yoshinori

2018-07-01

The aim of this study was to explore the potential of boosted tree (BT) to develop a correlation model between active pharmaceutical ingredient (API) characteristics and a tensile strength (TS) of tablets as critical quality attributes. First, we evaluated 81 kinds of API characteristics, such as particle size distribution, bulk density, tapped density, Hausner ratio, moisture content, elastic recovery, molecular weight, and partition coefficient. Next, we prepared tablets containing 50% API, 49% microcrystalline cellulose, and 1% magnesium stearate using direct compression at 6, 8, and 10 kN, and measured TS. Then, we applied BT to our dataset to develop a correlation model. Finally, the constructed BT model was validated using k-fold cross-validation. Results showed that the BT model achieved high-performance statistics, whereas multiple regression analysis resulted in poor estimations. Sensitivity analysis of the BT model revealed that diameter of powder particles at the 10th percentile of the cumulative percentage size distribution was the most crucial factor for TS. In addition, the influences of moisture content, partition coefficients, and modal diameter were appreciably meaningful factors. This study demonstrates that BT model could provide comprehensive understanding of the latent structure underlying APIs and TS of tablets.
Modeling potential distribution of Oligoryzomys longicaudatus, the Andes virus (Genus: Hantavirus) reservoir, in Argentina.

PubMed

Andreo, Verónica; Glass, Gregory; Shields, Timothy; Provensal, Cecilia; Polop, Jaime

2011-09-01

We constructed a model to predict the potential distribution of Oligoryzomys longicaudatus, the reservoir of Andes virus (Genus: Hantavirus), in Argentina. We developed an extensive database of occurrence records from published studies and our own surveys and compared two methods to model the probability of O. longicaudatus presence: logistic regression and the MaxEnt algorithm. The environmental variables used were tree, grass and bare soil cover from MODIS imagery, and altitude and 19 bioclimatic variables from the WorldClim database. The models' performances were evaluated and compared by both threshold-dependent and threshold-independent measures. The best models included tree and grass cover, mean diurnal temperature range, and precipitation of the warmest and coldest seasons. The potential distribution maps for O. longicaudatus predicted the highest occurrence probabilities along the Andes range, from 32°S and narrowing southwards. They also predicted high probabilities for the south-central area of Argentina, reaching the Atlantic coast. The Hantavirus Pulmonary Syndrome (HPS) cases coincided with mean occurrence probabilities of 95 and 77% for logistic and MaxEnt models, respectively. HPS transmission zones in Argentine Patagonia matched the areas with the highest probability of presence. Therefore, colilargos presence probability may provide an approximate risk of transmission and act as an early tool to guide control and prevention plans.
Fire frequency in the Interior Columbia River Basin: Building regional models from fire history data

USGS Publications Warehouse

McKenzie, D.; Peterson, D.L.; Agee, James K.

2000-01-01

Fire frequency affects vegetation composition and successional pathways; thus it is essential to understand fire regimes in order to manage natural resources at broad spatial scales. Fire history data are lacking for many regions for which fire management decisions are being made, so models are needed to estimate past fire frequency where local data are not yet available. We developed multiple regression models and tree-based (classification and regression tree, or CART) models to predict fire return intervals across the interior Columbia River basin at 1-km resolution, using georeferenced fire history, potential vegetation, cover type, and precipitation databases. The models combined semiqualitative methods and rigorous statistics. The fire history data are of uneven quality; some estimates are based on only one tree, and many are not cross-dated. Therefore, we weighted the models based on data quality and performed a sensitivity analysis of the effects on the models of estimation errors that are due to lack of cross-dating. The regression models predict fire return intervals from 1 to 375 yr for forested areas, whereas the tree-based models predict a range of 8 to 150 yr. Both types of models predict latitudinal and elevational gradients of increasing fire return intervals. Examination of regional-scale output suggests that, although the tree-based models explain more of the variation in the original data, the regression models are less likely to produce extrapolation errors. Thus, the models serve complementary purposes in elucidating the relationships among fire frequency, the predictor variables, and spatial scale. The models can provide local managers with quantitative information and provide data to initialize coarse-scale fire-effects models, although predictions for individual sites should be treated with caution because of the varying quality and uneven spatial coverage of the fire history database. 
The models also demonstrate the integration of qualitative and quantitative methods when requisite data for fully quantitative models are unavailable. They can be tested by comparing new, independent fire history reconstructions against their predictions and can be continually updated, as better fire history data become available.
Live phylogeny with polytomies: Finding the most compact parsimonious trees.

PubMed

Papamichail, D; Huang, A; Kennedy, E; Ott, J-L; Miller, A; Papamichail, G

2017-08-01

Construction of phylogenetic trees has traditionally focused on binary trees where all species appear on leaves, a problem for which numerous efficient solutions have been developed. Certain application domains though, such as viral evolution and transmission, paleontology, linguistics, and phylogenetic stemmatics, often require phylogeny inference that involves placing input species on ancestral tree nodes (live phylogeny), and polytomies. These requirements, despite their prevalence, lead to computationally harder algorithmic solutions and have been sparsely examined in the literature to date. In this article we prove some unique properties of most parsimonious live phylogenetic trees with polytomies, and their mapping to traditional binary phylogenetic trees. We show that our problem reduces to finding the most compact parsimonious tree for n species, and describe a novel efficient algorithm to find such trees without resorting to exhaustive enumeration of all possible tree topologies. Copyright © 2017 Elsevier Ltd. All rights reserved.
Developing a dengue forecast model using machine learning: A case study in China.

PubMed

Guo, Pi; Liu, Tao; Zhang, Qin; Wang, Li; Xiao, Jianpeng; Zhang, Qingying; Luo, Ganfeng; Li, Zhihao; He, Jianfeng; Zhang, Yonghui; Ma, Wenjun

2017-10-01

In China, dengue remains an important public health issue with expanded areas and increased incidence recently. Accurate and timely forecasts of dengue incidence in China are still lacking. We aimed to use the state-of-the-art machine learning algorithms to develop an accurate predictive model of dengue. Weekly dengue cases, Baidu search queries and climate factors (mean temperature, relative humidity and rainfall) during 2011-2014 in Guangdong were gathered. A dengue search index was constructed for developing the predictive models in combination with climate factors. The observed year and week were also included in the models to control for the long-term trend and seasonality. Several machine learning algorithms, including the support vector regression (SVR) algorithm, step-down linear regression model, gradient boosted regression tree algorithm (GBM), negative binomial regression model (NBM), least absolute shrinkage and selection operator (LASSO) linear regression model and generalized additive model (GAM), were used as candidate models to predict dengue incidence. Performance and goodness of fit of the models were assessed using the root-mean-square error (RMSE) and R-squared measures. The residuals of the models were examined using the autocorrelation and partial autocorrelation function analyses to check the validity of the models. The models were further validated using dengue surveillance data from five other provinces. The epidemics during the last 12 weeks and the peak of the 2014 large outbreak were accurately forecasted by the SVR model selected by a cross-validation technique. Moreover, the SVR model had the consistently smallest prediction error rates for tracking the dynamics of dengue and forecasting the outbreaks in other areas in China. The proposed SVR model achieved a superior performance in comparison with other forecasting techniques assessed in this study. The findings can help the government and community respond early to dengue epidemics.
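The two goodness-of-fit measures named in this abstract are straightforward to compute. The Python sketch below shows one common formulation of each; the function names are chosen for this example, and the abstract does not say which variant of R-squared the authors used, so the coefficient-of-determination form here is an assumption.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

For made-up weekly counts of 10, 20, 30 and 40 with predictions of 12, 18, 33 and 37, the squared errors sum to 26, giving an RMSE of sqrt(6.5) ≈ 2.55 and an R-squared of 1 - 26/500 = 0.948. A lower RMSE and a higher R-squared both indicate a closer fit.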
Can Predictive Modeling Identify Head and Neck Oncology Patients at Risk for Readmission?

PubMed

Manning, Amy M; Casper, Keith A; Peter, Kay St; Wilson, Keith M; Mark, Jonathan R; Collar, Ryan M

2018-05-01

Objective Unplanned readmission within 30 days is a contributor to health care costs in the United States. The use of predictive modeling during hospitalization to identify patients at risk for readmission offers a novel approach to quality improvement and cost reduction. Study Design Two-phase study including retrospective analysis of prospectively collected data followed by prospective longitudinal study. Setting Tertiary academic medical center. Subjects and Methods Prospectively collected data for patients undergoing surgical treatment for head and neck cancer from January 2013 to January 2015 were used to build predictive models for readmission within 30 days of discharge using logistic regression, classification and regression tree (CART) analysis, and random forests. One model (logistic regression) was then placed prospectively into the discharge workflow from March 2016 to May 2016 to determine the model's ability to predict which patients would be readmitted within 30 days. Results In total, 174 admissions had descriptive data. Thirty-two were excluded due to incomplete data. Logistic regression, CART, and random forest predictive models were constructed using the remaining 142 admissions. When applied to 106 consecutive prospective head and neck oncology patients at the time of discharge, the logistic regression model predicted readmissions with a specificity of 94%, a sensitivity of 47%, a negative predictive value of 90%, and a positive predictive value of 62% (odds ratio, 14.9; 95% confidence interval, 4.02-55.45). Conclusion Prospectively collected head and neck cancer databases can be used to develop predictive models that can accurately predict which patients will be readmitted. This offers valuable support for quality improvement initiatives and readmission-related cost reduction in head and neck cancer care.
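The four accuracy measures and the odds ratio reported above all derive from a 2×2 table of predicted versus actual readmissions. The Python sketch below shows the arithmetic. The counts in the usage note (8 true positives, 9 false negatives, 5 false positives, 84 true negatives) are an assumption: they were back-calculated to sum to 106 and to be consistent with the reported percentages, and are not taken from the paper.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Summary measures for a binary predictor from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),   # readmitted patients flagged
        "specificity": tn / (tn + fp),   # non-readmitted patients cleared
        "ppv": tp / (tp + fp),           # flagged patients who were readmitted
        "npv": tn / (tn + fn),           # cleared patients not readmitted
        "odds_ratio": (tp * tn) / (fn * fp),
    }
```

Calling `diagnostic_metrics(8, 9, 5, 84)` gives a sensitivity of 8/17 ≈ 0.47, a specificity of 84/89 ≈ 0.94, a positive predictive value of 8/13 ≈ 0.62, a negative predictive value of 84/93 ≈ 0.90 and an odds ratio of 672/45 ≈ 14.9. The low sensitivity shows that such a model misses about half of the patients who are readmitted, even though most of its negative predictions are correct.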
A Metric on Phylogenetic Tree Shapes.

PubMed

Colijn, C; Plazzotta, G

2018-01-01

The shapes of evolutionary trees are influenced by the nature of the evolutionary process but comparisons of trees from different processes are hindered by the challenge of completely describing tree shape. We present a full characterization of the shapes of rooted branching trees in a form that lends itself to natural tree comparisons. We use this characterization to define a metric, in the sense of a true distance function, on tree shapes. The metric distinguishes trees from random models known to produce different tree shapes. It separates trees derived from tropical versus USA influenza A sequences, which reflect the differing epidemiology of tropical and seasonal flu. We describe several metrics based on the same core characterization, and illustrate how to extend the metric to incorporate trees' branch lengths or other features such as overall imbalance. Our approach allows us to construct addition and multiplication on trees, and to create a convex metric on tree shapes which formally allows computation of average tree shapes. © The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
National scale biomass estimators for United States tree species

Treesearch

Jennifer C. Jenkins; David C. Chojnacky; Linda S. Heath; Richard A. Birdsey

2003-01-01

Estimates of national-scale forest carbon (C) stocks and fluxes are typically based on allometric regression equations developed using dimensional analysis techniques. However, the literature is inconsistent and incomplete with respect to large-scale forest C estimation. We compiled all available diameter-based allometric regression equations for estimating total...
Accurate Phylogenetic Tree Reconstruction from Quartets: A Heuristic Approach

PubMed Central

Reaz, Rezwana; Bayzid, Md. Shamsuzzoha; Rahman, M. Sohel

2014-01-01

Supertree methods construct trees on a set of taxa (species) by combining many smaller trees on overlapping subsets of the entire set of taxa. A ‘quartet’ is an unrooted tree over 4 taxa, hence quartet-based supertree methods combine many 4-taxon unrooted trees into a single, coherent tree over the complete set of taxa. Quartet-based phylogeny reconstruction methods have been receiving considerable attention in recent years. An accurate and efficient quartet-based method might be competitive with the current best phylogenetic tree reconstruction methods (such as maximum likelihood or Bayesian MCMC analyses), without being as computationally intensive. In this paper, we present a novel and highly accurate quartet-based phylogenetic tree reconstruction method. We performed an extensive experimental study to evaluate the accuracy and scalability of our approach on both simulated and biological datasets. PMID:25117474
A Trap For Capturing Arthropods Crawling up Tree Boles

Treesearch

James L. Hanula; Kirsten C.P. New

1996-01-01

A simple trap is described that captures arthropods as they crawl up tree boles. Constructed from metal funnels, plastic sandwich containers, and specimen cups, the traps can be assembled by one person at a rate of 5 to 6 per hour and installed in 2 to 3 minutes. Specimen collection required 15 to 20 seconds per trap. In 1993, three traps were placed on each tree. In...
An improved spatial contour tree constructed method

NASA Astrophysics Data System (ADS)

Zheng, Yi; Zhang, Ling; Guilbert, Eric; Long, Yi

2018-05-01

Contours are important data for delineating the landform on a map. A contour tree provides an object-oriented description of landforms and can be used to enrich the topological information. The traditional contour tree is used to store topological relationships between contours in a hierarchical structure and allows for the identification of eminences and depressions as sets of nested contours. This research proposes an improved contour tree, the so-called spatial contour tree, that contains not only the topological but also the geometric information. It can be regarded as a terrain skeleton in three dimensions, and it is established based on the spatial nodes of contours, which carry latitude, longitude and elevation information. The spatial contour tree is built by connecting spatial nodes from low to high elevation for a positive landform, and from high to low elevation for a negative landform, to form a hierarchical structure. The connection between two spatial nodes can provide the real distance and direction as a Euclidean vector in three dimensions. In this paper, the construction method is tested experimentally, and the results are discussed. The proposed hierarchical structure is three-dimensional and can show the skeleton inside a terrain. The structure, in which all nodes carry geo-information, can be used to distinguish different landforms and applied to contour generalization with consideration of geographic characteristics.
Predictors of Gleason Score (GS) upgrading on subsequent prostatectomy: a single Institution study in a cohort of patients with GS 6

PubMed Central

Mehta, Vikas; Rycyna, Kevin; Baesens, Bart MM; Barkan, Güliz A; Paner, Gladell P; Flanigan, Robert C; Wojcik, Eva M; Venkataraman, Girish

2012-01-01

Background Biopsy Gleason score (bGS) remains an important prognostic indicator for adverse outcomes in Prostate Cancer (PCA). In the light of recent studies purporting difference in prognostic outcomes for the subgroups of GS7 group (primary Gleason pattern 4 vs. 3), upgrading of a bGS of 6 to a GS≥7 has serious implications. We sought to identify pre-operative factors associated with upgrading in a cohort of GS6 patients who underwent prostatectomy. Design We identified 281 cases of GS6 PCA on biopsy with subsequent prostatectomies. Using data on pre-operative variables (age, PSA, biopsy pathology parameters), logistic regression models (LRM) were developed to identify factors that could be used to predict upgrading to GS≥7 on subsequent prostatectomy. A decision tree (DT) was constructed. Results 92 of 281 cases (32.7%) were upgraded on subsequent prostatectomy. LRM identified a model with two variables with statistically significant ability to predict upgrading: pre-biopsy PSA (Odds Ratio 8.66; 95% CI 2.03-37.49) and highest percentage of cancer at any single biopsy site (Odds Ratio 1.03; 95% CI 1.01-1.05). This two-parameter model yielded an area under curve of 0.67. The decision tree was constructed using only 3 leaf nodes, with a test set classification accuracy of 70%. Conclusions A simplistic model using clinical and biopsy data is able to predict the likelihood of upgrading of GS with an acceptable level of certainty. External validation of these findings along with development of a nomogram will aid in better stratifying the cohort of low risk patients as based on the GS. PMID:22949931
Towards organ printing: engineering an intra-organ branched vascular tree

PubMed Central

Visconti, Richard P; Kasyanov, Vladimir; Gentile, Carmine; Zhang, Jing; Markwald, Roger R; Mironov, Vladimir

2013-01-01

Importance of the field: Effective vascularization of thick three-dimensional engineered tissue constructs is a problem in tissue engineering. As in native organs, a tissue-engineered intra-organ vascular tree must be comprised of a network of hierarchically branched vascular segments. Despite this requirement, current tissue-engineering efforts are still focused predominantly on engineering either large-diameter macrovessels or microvascular networks. Areas covered in this review: We present the emerging concept of organ printing or robotic additive biofabrication of an intra-organ branched vascular tree, based on the ability of vascular tissue spheroids to undergo self-assembly. What the reader will gain: The feasibility and challenges of this robotic biofabrication approach to intra-organ vascularization for tissue engineering, based on organ-printing technology using self-assembling vascular tissue spheroids and including clinically relevant vascular cell sources, are analyzed. Take home message: It is not possible to engineer 3D thick tissue or organ constructs without effective vascularization. An effective intra-organ vascular system cannot be built by the simple connection of large-diameter vessels and microvessels. Successful engineering of functional human organs suitable for surgical implantation will require concomitant engineering of a ‘built in’ intra-organ branched vascular system. Organ printing enables biofabrication of human organ constructs with a ‘built in’ intra-organ branched vascular tree. PMID:20132061
A hydraulic-photosynthetic model based on extended HLH and its application to Coast redwood (Sequoia sempervirens).

PubMed

Du, Ning; Fan, Jintu; Chen, Shuo; Liu, Yang

2008-07-21

Although recent investigations [Ryan, M.G., Yoder, B.J., 1997. Hydraulic limits to tree height and tree growth. Bioscience 47, 235-242; Koch, G.W., Sillett, S.C.,Jennings, G.M.,Davis, S.D., 2004. The limits to tree height. Nature 428, 851-854; Niklas, K.J., Spatz, H., 2004. Growth and hydraulic (not mechanical) constraints govern the scaling of tree height and mass. Proc. Natl Acad. Sci. 101, 15661-15663; Ryan, M.G., Phillips, N., Bond, B.J., 2006. Hydraulic limitation hypothesis revisited. Plant Cell Environ. 29, 367-381; Niklas, K.J., 2007. Maximum plant height and the biophysical factors that limit it. Tree Physiol. 27, 433-440; Burgess, S.S.O., Dawson, T.E., 2007. Predicting the limits to tree height using statistical regressions of leaf traits. New Phytol. 174, 626-636] suggested that the hydraulic limitation hypothesis (HLH) is the most plausible theory to explain the biophysical limits to maximum tree height and the decline in tree growth rate with age, the analysis is largely qualitative or based on statistical regression. Here we present an integrated biophysical model based on the principle that trees develop physiological compensations (e.g. the declined leaf water potential and the tapering of conduits with heights [West, G.B., Brown, J.H., Enquist, B.J., 1999. A general model for the structure and allometry of plant vascular systems. Nature 400, 664-667]) to resist the increasing water stress with height, the classical HLH and the biochemical limitations on photosynthesis [von Caemmerer, S., 2000. Biochemical Models of Leaf Photosynthesis. CSIRO Publishing, Australia]. The model has been applied to the tallest trees in the world (viz. Coast redwood (Sequoia sempervirens)). Xylem water potential, leaf carbon isotope composition, leaf mass to area ratio at different heights derived from the model show good agreements with the experimental measurements of Koch et al. [2004. The limits to tree height. Nature 428, 851-854]. 
The model also explains well the universal trend of declining growth rate with age.
Estimating tree species diversity in the savannah using NDVI and woody canopy cover

NASA Astrophysics Data System (ADS)

Madonsela, Sabelo; Cho, Moses Azong; Ramoelo, Abel; Mutanga, Onisimo; Naidoo, Laven

2018-04-01

Remote sensing applications in biodiversity research often rely on the establishment of relationships between spectral information from the image and tree species diversity measured in the field. Most studies have used normalized difference vegetation index (NDVI) to estimate tree species diversity on the basis that it is sensitive to primary productivity which defines spatial variation in plant diversity. The NDVI signal is influenced by photosynthetically active vegetation which, in the savannah, includes woody canopy foliage and grasses. The question is whether the relationship between NDVI and tree species diversity in the savanna depends on the woody cover percentage. This study explored the relationship between woody canopy cover (WCC) and tree species diversity in the savannah woodland of southern Africa and also investigated whether there is a significant interaction between seasonal NDVI and WCC in the factorial model when estimating tree species diversity. To fulfil our aim, we followed stratified random sampling approach and surveyed tree species in 68 plots of 90 m × 90 m across the study area. Within each plot, all trees with diameter at breast height of >10 cm were sampled and Shannon index - a common measure of species diversity which considers both species richness and abundance - was used to quantify tree species diversity. We then extracted WCC in each plot from existing fractional woody cover product produced from Synthetic Aperture Radar (SAR) data. Factorial regression model was used to determine the interaction effect between NDVI and WCC when estimating tree species diversity. 
Results from regression analysis showed that (i) WCC has a highly significant relationship with tree species diversity (r² = 0.21; p < 0.01), and (ii) the interaction between NDVI and WCC is not significant; however, the factorial model significantly reduced the error of prediction (RMSE = 0.47, p < 0.05) compared to the NDVI (RMSE = 0.49) or WCC (RMSE = 0.49) model during the senescence period. The result justifies our assertion that combining NDVI with WCC will be optimal for biodiversity estimation during the senescence period.
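The Shannon index mentioned in this record can be illustrated with a minimal sketch. The function name and the two example plots below are hypothetical and are not taken from the study; the sketch only shows how the index responds to both richness and evenness of abundance.

```python
import math
from collections import Counter

def shannon_index(species_labels):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over species proportions p_i."""
    counts = Counter(species_labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty plot has no diversity to measure
    return -sum((n / total) * math.log(n / total) for n in counts.values())

# Hypothetical plots: same richness (2 species), different evenness.
plot_even = ["acacia"] * 5 + ["marula"] * 5
plot_uneven = ["acacia"] * 9 + ["marula"] * 1

h_even = shannon_index(plot_even)      # equals ln(2) for two equally common species
h_uneven = shannon_index(plot_uneven)  # lower, because one species dominates
```

With equal richness, the evenly mixed plot scores higher, which is why the index is described as considering abundance as well as richness.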
A fully resolved consensus between fully resolved phylogenetic trees.

PubMed

Quitzau, José Augusto Amgarten; Meidanis, João

2006-03-31

Nowadays, there are many phylogeny reconstruction methods, each with advantages and disadvantages. We explored the advantages of each method, putting together the common parts of trees constructed by several methods, by means of a consensus computation. A number of phylogenetic consensus methods are already known. Unfortunately, there is also a taboo concerning consensus methods, because most biologists see them mainly as comparators and not as phylogenetic tree constructors. We challenged this taboo by defining a consensus method that builds a fully resolved phylogenetic tree based on the most common parts of fully resolved trees in a given collection. We also generated results showing that this consensus is in a way a kind of "median" of the input trees; as such it can be closer to the correct tree in many situations.
Fault Tree Analysis: An Emerging Methodology for Instructional Science.

ERIC Educational Resources Information Center

Wood, R. Kent; And Others

1979-01-01

Describes Fault Tree Analysis, a tool for systems analysis which attempts to identify possible modes of failure in systems to increase the probability of success. The article defines the technique and presents the steps of FTA construction, focusing on its application to education. (RAO)
Risk Management in Complex Construction Projects that Apply Renewable Energy Sources: A Case Study of the Realization Phase of the Energis Educational and Research Intelligent Building

NASA Astrophysics Data System (ADS)

Krechowicz, Maria

2017-10-01

Nowadays, a characteristic feature of the construction industry is the increasing complexity of a growing number of projects. Almost every construction project is unique: it has its own purpose, structural complexity, owner’s expectations, ground conditions unique to a certain location, and dynamics. Failure costs and costs resulting from unforeseen problems in complex construction projects are very high. Project complexity drivers create many vulnerabilities that threaten the successful completion of projects. This paper discusses the process of effective risk management in complex construction projects in which renewable energy sources were used, using the example of the realization phase of the ENERGIS teaching-laboratory building, from the point of view of DORBUD S.A., its general contractor. This paper suggests a new approach to risk management for complex construction projects in which renewable energy sources were applied. The risk management process was divided into six stages: gathering information, identification of the top critical project risks resulting from the project complexity, construction of the fault tree for each top critical risk, logical analysis of the fault tree, quantitative risk assessment applying fuzzy logic, and development of a risk response strategy. A new methodology for the qualitative and quantitative assessment of top critical risks in complex construction projects was developed. Risk assessment was carried out by applying Fuzzy Fault Tree analysis to one top critical risk as an example. Applying fuzzy set theory to the proposed model made it possible to decrease uncertainty and to eliminate the problems of obtaining crisp values for basic event probabilities, which are common when experts are asked to give an exact risk score for the probability of each unwanted event.
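The quantitative step of a fault tree can be sketched for the classical (crisp) case. The record above uses fuzzy probabilities, which this sketch does not reproduce; it assumes independent basic events, and the gate functions and probability values are hypothetical illustrations only.

```python
from math import prod

def and_gate(probabilities):
    """Output event occurs only if all independent input events occur."""
    return prod(probabilities)

def or_gate(probabilities):
    """Output event occurs if at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)

# Hypothetical tree: the top event occurs if (A and B) occur, or if C occurs.
p_a, p_b, p_c = 0.1, 0.2, 0.05
p_intermediate = and_gate([p_a, p_b])    # 0.1 * 0.2
p_top = or_gate([p_intermediate, p_c])   # 1 - (1 - 0.02) * (1 - 0.05)
```

In a fuzzy variant, each crisp probability would be replaced by a fuzzy number and the same gate logic applied with fuzzy arithmetic.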
Crown area equations for 13 species of trees and shrubs in northern California and southwestern Oregon

Treesearch

Fabian C.C. Uzoh; Martin W. Ritchie

1996-01-01

The equations presented predict crown area for 13 species of trees and shrubs which may be found growing in competition with commercial conifers during early stages of stand development. The equations express crown area as a function of basal area and height. Parameters were estimated for each species individually using weighted nonlinear least square regression.
Biomass equations for major tree species of the Northeast

Treesearch

Louise M. Tritton; James W. Hornbeck

1982-01-01

Regression equations are used in both forestry and ecosystem studies to estimate tree biomass from field measurements of dbh (diameter at breast height) or a combination of dbh and height. Literature on biomass is reviewed, and 178 sets of published equations for 25 species common to the Northeastern United States are listed. On the basis of these equations, estimates of...

Stand basal-area and tree-diameter growth in red spruce-fir forests in Maine, 1960-80

Treesearch

S.J. Zarnoch; D.A. Gansner; D.S. Powell; T.A. Birch

1990-01-01

Stand basal-area change and individual surviving red spruce d.b.h. growth from 1960 to 1980 were analyzed for red spruce-fir stands in Maine. Regression modeling was used to relate these measures of growth to stand and tree conditions and to compare growth throughout the period. Results indicate a decline in growth.
Examination of the Arborsonic Decay Detector for Detecting Bacterial Wetwood in Red Oaks

Treesearch

Zicai Xu; Theodor D. Leininger; James G. Williams; Frank H. Tainter

2000-01-01

The Arborsonic Decay Detector (ADD; Fujikura Europe Limited, Wiltshire, England) was used to measure the time it took an ultrasound wave to cross 280 diameters in red oak trees with varying degrees of bacterial wetwood or heartwood decay. Linear regressions derived from the ADD readings of trees in Mississippi and South Carolina with wetwood and heartwood decay...
Application of least square support vector machine and multivariate adaptive regression spline models in long term prediction of river water pollution

NASA Astrophysics Data System (ADS)

Kisi, Ozgur; Parmar, Kulwinder Singh

2016-03-01

This study investigates the accuracy of least square support vector machine (LSSVM), multivariate adaptive regression splines (MARS) and M5 model tree (M5Tree) in modeling river water pollution. Various combinations of water quality parameters, Free Ammonia (AMM), Total Kjeldahl Nitrogen (TKN), Water Temperature (WT), Total Coliform (TC), Fecal Coliform (FC) and Potential of Hydrogen (pH), monitored at Nizamuddin, Delhi, on the Yamuna River in India, were used as inputs to the applied models. Results indicated that the LSSVM and MARS models had almost the same accuracy and that they performed better than the M5Tree model in modeling monthly chemical oxygen demand (COD). Relative to the LSSVM and M5Tree models, the MARS model decreased the average root mean square error (RMSE) by 1.47% and 19.1%, respectively. Adding the TC input to the models did not increase their accuracy in modeling COD, while adding the FC and pH inputs generally decreased the accuracy. The overall results indicated that the MARS and LSSVM models could be successfully used in estimating the monthly river water pollution level by using the AMM, TKN and WT parameters as inputs.
Novel forecasting approaches using combination of machine learning and statistical models for flood susceptibility mapping.

PubMed

Shafizadeh-Moghadam, Hossein; Valavi, Roozbeh; Shahabi, Himan; Chapi, Kamran; Shirzadi, Ataollah

2018-07-01

In this research, eight individual machine learning and statistical models are implemented and compared, and based on their results, seven ensemble models for flood susceptibility assessment are introduced. The individual models included artificial neural networks, classification and regression trees, flexible discriminant analysis, generalized linear model, generalized additive model, boosted regression trees, multivariate adaptive regression splines, and maximum entropy, and the ensemble models were Ensemble Model committee averaging (EMca), Ensemble Model confidence interval Inferior (EMciInf), Ensemble Model confidence interval Superior (EMciSup), Ensemble Model to estimate the coefficient of variation (EMcv), Ensemble Model to estimate the mean (EMmean), Ensemble Model to estimate the median (EMmedian), and Ensemble Model based on weighted mean (EMwmean). The data set covered 201 flood events in the Haraz watershed (Mazandaran province in Iran) and 10,000 randomly selected non-occurrence points. Among the individual models, the highest Area Under the Receiver Operating Characteristic (AUROC) value belonged to boosted regression trees (0.975) and the lowest value was recorded for the generalized linear model (0.642). On the other hand, the proposed EMmedian resulted in the highest accuracy (0.976) among all models. In spite of the outstanding performance of some models, variability among the predictions of individual models was considerable. Therefore, to reduce uncertainty and create more generalizable, more stable, and less sensitive models, ensemble forecasting approaches, and in particular the EMmedian, are recommended for flood susceptibility assessment.
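Three of the ensemble rules named in this record (mean, median and committee averaging) can be sketched for a single location. This is a simplified illustration and not the authors' implementation; the function name, the 0.5 vote threshold and the probabilities are assumptions made for the example.

```python
from statistics import mean, median

def ensemble_scores(model_probabilities, threshold=0.5):
    """Combine per-model flood probabilities for one location.

    'mean' and 'median' summarise the probabilities directly; 'committee'
    is the share of models whose probability reaches the (assumed) threshold.
    """
    votes = [1 if p >= threshold else 0 for p in model_probabilities]
    return {
        "mean": mean(model_probabilities),
        "median": median(model_probabilities),
        "committee": mean(votes),
    }

# Hypothetical outputs of four individual models at one grid cell.
scores = ensemble_scores([0.9, 0.8, 0.2, 0.7])
```

The median is less affected than the mean by the single dissenting model (0.2), which is one intuition for why a median ensemble can be more stable.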
Topography and crop management are key factors for the development of American leaf spot epidemics on coffee in Costa Rica.

PubMed

Avelino, Jacques; Cabut, Sandrine; Barboza, Bernardo; Barquero, Miguel; Alfaro, Ronny; Esquivel, César; Durand, Jean-François; Cilas, Christian

2007-12-01

We monitored the development of American leaf spot of coffee, a disease caused by the gemmiferous fungus Mycena citricolor, in 57 plots in Costa Rica for 1 or 2 years in order to gain a clearer understanding of conditions conducive to the disease and improve its control. During the investigation, characteristics of the coffee trees, crop management, and the environment were recorded. For the analyses, we used partial least-squares regression via the spline functions (PLSS), which is a nonlinear extension to partial least-squares regression (PLS). The fungus developed well in areas located between approximately 1,100 and 1,550 m above sea level. Slopes were conducive to its development, but eastern-facing slopes were less affected than the others, probably because they were more exposed to sunlight, especially in the rainy season. The distance between planting rows, the shade percentage, coffee tree height, the type of shade, and the pruning system explained disease intensity due to their effects on coffee tree shading and, possibly, on the humidity conditions in the plot. Forest trees and fruit trees intercropped with coffee provided particularly propitious conditions. Apparently, fertilization was unfavorable for the disease, probably due to dilution phenomena associated with faster coffee tree growth. Finally, series of wet spells interspersed with dry spells, which were frequent in the middle of the rainy season, were critical for the disease, probably because they affected the production and release of gemmae and their viability. These results could be used to draw up a map of epidemic risks taking topographical factors into account. To reduce those risks and improve chemical control, our results suggested that farmers should space planting rows further apart, maintain light shading in the plantation, and prune their coffee trees.
Ensemble classification of individual Pinus crowns from multispectral satellite imagery and airborne LiDAR

NASA Astrophysics Data System (ADS)

Kukunda, Collins B.; Duque-Lazo, Joaquín; González-Ferreiro, Eduardo; Thaden, Hauke; Kleinn, Christoph

2018-03-01

Distinguishing tree species is relevant in many contexts of remote sensing assisted forest inventory. Accurate tree species maps support management and conservation planning, pest and disease control and biomass estimation. This study evaluated the performance of applying ensemble techniques with the goal of automatically distinguishing Pinus sylvestris L. and Pinus uncinata Mill. ex Mirb. within a 1.3 km² mountainous area in Barcelonnette (France). Three modelling schemes were examined, based on: (1) high-density LiDAR data (160 returns m⁻²), (2) Worldview-2 multispectral imagery, and (3) Worldview-2 and LiDAR in combination. Variables related to the crown structure and height of individual trees were extracted from the normalized LiDAR point cloud at individual-tree level, after performing individual tree crown (ITC) delineation. Vegetation indices and the Haralick texture indices were derived from Worldview-2 images and served as independent spectral variables. Selection of the best predictor subset was done after a comparison of three variable selection procedures: (1) Random Forests with cross validation (AUCRFcv), (2) Akaike Information Criterion (AIC) and (3) Bayesian Information Criterion (BIC). To classify the species, 9 regression techniques were combined using ensemble models. Predictions were evaluated using cross validation and an independent dataset. Integration of datasets and models improved individual tree species classification (True Skill Statistic, TSS; from 0.67 to 0.81) over individual techniques and maintained strong predictive power (Relative Operating Characteristic, ROC = 0.91). Assemblage of regression models and integration of the datasets provided more reliable species distribution maps and associated tree-scale mapping uncertainties. Our study highlights the potential of model and data assemblage for improving the species classifications needed in present-day forest planning and management.
Mapping and spatial-temporal modeling of Bromus tectorum invasion in central Utah

NASA Astrophysics Data System (ADS)

Jin, Zhenyu

Cheatgrass, or Downy Brome, is an exotic winter annual weed native to the Mediterranean region. Since its introduction to the U.S., it has become a significant weed and aggressive invader of sagebrush, pinion-juniper, and other shrub communities, where it can completely out-compete native grasses and shrubs. In this research, remotely sensed data combined with field collected data are used to investigate the distribution of the cheatgrass in Central Utah, to characterize the trend of the NDVI time-series of cheatgrass, and to construct a spatially explicit population-based model to simulate the spatial-temporal dynamics of the cheatgrass. This research proposes a method for mapping the canopy closure of invasive species using remotely sensed data acquired at different dates. Different invasive species have their own distinguished phenologies and the satellite images in different dates could be used to capture the phenology. The results of cheatgrass abundance prediction have a good fit with the field data for both linear regression and regression tree models, although the regression tree model has better performance than the linear regression model. To characterize the trend of NDVI time-series of cheatgrass, a novel smoothing algorithm named RMMEH is presented in this research to overcome some drawbacks of many other algorithms. By comparing the performance of RMMEH in smoothing a 16-day composite of the MODIS NDVI time-series with that of two other methods, which are the 4253EH, twice and the MVI, we have found that RMMEH not only keeps the original valid NDVI points, but also effectively removes the spurious spikes. The reconstructed NDVI time-series of different land covers are of higher quality and have smoother temporal trend. To simulate the spatial-temporal dynamics of cheatgrass, a spatially explicit population-based model is built applying remotely sensed data. 
The comparison between the model output and the ground truth of cheatgrass closure demonstrates that the model could successfully simulate the spatial-temporal dynamics of cheatgrass in a simple cheatgrass-dominant environment. The simulation of the functional response of different prescribed fire rates also shows that this model is helpful to answer management questions like, "What are the effects of prescribed fire to invasive species?" It demonstrates that a medium fire rate of 10% can successfully prevent cheatgrass invasion.
A transportable magnetic resonance imaging system for in situ measurements of living trees: the Tree Hugger.

PubMed

Jones, M; Aptaker, P S; Cox, J; Gardiner, B A; McDonald, P J

2012-05-01

This paper presents the design of the 'Tree Hugger', an open access, transportable, 1.1 MHz ¹H nuclear magnetic resonance imaging system for the in situ analysis of living trees in the forest. A unique construction employing NdFeB blocks embedded in a reinforced carbon fibre frame is used to achieve access up to 210 mm and to allow the magnet to be transported. The magnet weighs 55 kg. The feasibility of imaging living trees in situ using the 'Tree Hugger' is demonstrated. Correlations are drawn between NMR/MRI measurements and other indicators such as relative humidity, soil moisture and net solar radiation.
Chilling and heat requirements for flowering in temperate fruit trees

NASA Astrophysics Data System (ADS)

Guo, Liang; Dai, Junhu; Ranjitkar, Sailesh; Yu, Haiying; Xu, Jianchu; Luedeling, Eike

2014-08-01

Climate change has affected the rates of chilling and heat accumulation, which are vital for flowering and production, in temperate fruit trees, but few studies have been conducted in the cold-winter climates of East Asia. To evaluate tree responses to variation in chill and heat accumulation rates, partial least squares regression was used to correlate first flowering dates of chestnut ( Castanea mollissima Blume) and jujube ( Zizyphus jujube Mill.) in Beijing, China, with daily chill and heat accumulation between 1963 and 2008. The Dynamic Model and the Growing Degree Hour Model were used to convert daily records of minimum and maximum temperature into horticulturally meaningful metrics. Regression analyses identified the chilling and forcing periods for chestnut and jujube. The forcing periods started when half the chilling requirements were fulfilled. Over the past 50 years, heat accumulation during tree dormancy increased significantly, while chill accumulation remained relatively stable for both species. Heat accumulation was the main driver of bloom timing, with effects of variation in chill accumulation negligible in Beijing's cold-winter climate. It does not seem likely that reductions in chill will have a major effect on the studied species in Beijing in the near future. Such problems are much more likely for trees grown in locations that are substantially warmer than their native habitats, such as temperate species in the subtropics and tropics.
Chilling and heat requirements for flowering in temperate fruit trees.

PubMed

Guo, Liang; Dai, Junhu; Ranjitkar, Sailesh; Yu, Haiying; Xu, Jianchu; Luedeling, Eike

2014-08-01

Climate change has affected the rates of chilling and heat accumulation, which are vital for flowering and production, in temperate fruit trees, but few studies have been conducted in the cold-winter climates of East Asia. To evaluate tree responses to variation in chill and heat accumulation rates, partial least squares regression was used to correlate first flowering dates of chestnut (Castanea mollissima Blume) and jujube (Zizyphus jujube Mill.) in Beijing, China, with daily chill and heat accumulation between 1963 and 2008. The Dynamic Model and the Growing Degree Hour Model were used to convert daily records of minimum and maximum temperature into horticulturally meaningful metrics. Regression analyses identified the chilling and forcing periods for chestnut and jujube. The forcing periods started when half the chilling requirements were fulfilled. Over the past 50 years, heat accumulation during tree dormancy increased significantly, while chill accumulation remained relatively stable for both species. Heat accumulation was the main driver of bloom timing, with effects of variation in chill accumulation negligible in Beijing’s cold-winter climate. It does not seem likely that reductions in chill will have a major effect on the studied species in Beijing in the near future. Such problems are much more likely for trees grown in locations that are substantially warmer than their native habitats, such as temperate species in the subtropics and tropics.
A financial network perspective of financial institutions' systemic risk contributions

NASA Astrophysics Data System (ADS)

Huang, Wei-Qiang; Zhuang, Xin-Tian; Yao, Shuang; Uryasev, Stan

2016-08-01

This study considers the effects of the financial institutions' local topology structure in the financial network on their systemic risk contribution using data from the Chinese stock market. We first measure the systemic risk contribution with the Conditional Value-at-Risk (CoVaR) which is estimated by applying dynamic conditional correlation multivariate GARCH model (DCC-MVGARCH). Financial networks are constructed from dynamic conditional correlations (DCC) with graph filtering method of minimum spanning trees (MSTs). Then we investigate dynamics of systemic risk contributions of financial institution. Also we study dynamics of financial institution's local topology structure in the financial network. Finally, we analyze the quantitative relationships between the local topology structure and systemic risk contribution with panel data regression analysis. We find that financial institutions with greater node strength, larger node betweenness centrality, larger node closeness centrality and larger node clustering coefficient tend to be associated with larger systemic risk contributions.
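The minimum spanning tree filtering step can be sketched as follows. The record does not state which distance was used, so this sketch assumes the common mapping d = sqrt(2(1 - ρ)) from correlation to distance and uses Kruskal's algorithm; the function name, institution labels and correlation values are hypothetical.

```python
import math

def correlation_mst(names, corr):
    """Return MST edges (name_i, name_j, distance) from a correlation matrix.

    Assumes the distance d = sqrt(2 * (1 - rho)); higher correlation means
    a shorter edge, so the tree keeps the strongest links.
    """
    n = len(names)
    edges = sorted(
        (math.sqrt(2.0 * (1.0 - corr[i][j])), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))  # union-find forest for cycle detection

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding this edge does not create a cycle
            parent[root_i] = root_j
            tree.append((names[i], names[j], dist))
    return tree

# Hypothetical correlations between three institutions.
names = ["A", "B", "C"]
corr = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.5],
    [0.2, 0.5, 1.0],
]
mst_edges = correlation_mst(names, corr)
```

Node measures such as strength or betweenness centrality would then be computed on the resulting tree rather than on the full correlation matrix.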
Learning Instance-Specific Predictive Models

PubMed Central

Visweswaran, Shyam; Cooper, Gregory F.

2013-01-01

This paper introduces a Bayesian algorithm for constructing predictive models from data that are optimized to predict a target variable well for a particular instance. This algorithm learns Markov blanket models, carries out Bayesian model averaging over a set of models to predict a target variable of the instance at hand, and employs an instance-specific heuristic to locate a set of suitable models to average over. We call this method the instance-specific Markov blanket (ISMB) algorithm. The ISMB algorithm was evaluated on 21 UCI data sets using five different performance measures and its performance was compared to that of several commonly used predictive algorithms, including naive Bayes, C4.5 decision tree, logistic regression, neural networks, k-Nearest Neighbor, Lazy Bayesian Rules, and AdaBoost. Over all the data sets, the ISMB algorithm performed better on average on all performance measures against all the comparison algorithms. PMID:25045325
Investigating Tree Thinking & Ancestry with Cladograms

ERIC Educational Resources Information Center

Davenport, K. D.; Milks, Kirstin Jane; Van Tassell, Rebecca

2015-01-01

Interpreting cladograms is a key skill for biological literacy. In this lesson, students interpret cladograms based on familial relationships and language relationships to build their understanding of tree thinking and to construct a definition of "common ancestor." These skills can then be applied to a true biological cladogram.
Spatial properties of snow cover in the Upper Merced River Basin: implications for a distributed snow measurement network

NASA Astrophysics Data System (ADS)

Bouffon, T.; Rice, R.; Bales, R.

2006-12-01

The spatial distributions of snow water equivalent (SWE) and snow depth within 1, 4, and 16 km² grid elements around two automated snow pillows in a forested and an open-forested region of the Upper Merced River Basin (2,800 km²) of Yosemite National Park were characterized using field observations and analyzed using binary regression trees. Snow surveys occurred at the forested site during the accumulation and ablation seasons, while at the open-forest site a survey was performed only during the accumulation season. An average of 130 snow depth and 7 snow density measurements were made on each survey, within the 4 km² grid. Snow depth was distributed using binary regression trees and geostatistical methods using the physiographic parameters (e.g. elevation, slope, vegetation, aspect). Results in the forest region indicate that the snow pillow overestimated average SWE within the 1, 4, and 16 km² areas by 34 percent during ablation. During accumulation the snow pillow provided a good estimate of the modeled mean SWE grid value, although it is suspected that the snow pillow was underestimating SWE. At the open forest site, during accumulation, the snow pillow was 28 percent greater than the mean modeled grid element. In addition, the binary regression trees indicate that the independent variables of vegetation, slope, and aspect are the most influential parameters of snow depth distribution. The binary regression tree and multivariate linear regression models explain about 60 percent of the initial variance for snow depth and 80 percent for density, respectively. This short-term study provides motivation and direction for the installation of a distributed snow measurement network to fill the information gap in basin-wide SWE and snow depth measurements. 
Guided by these results, a distributed snow measurement network was installed in the Fall 2006 at Gin Flat in the Upper Merced River Basin with the specific objective of measuring accumulation and ablation across topographic variables with the aim of providing guidance for future larger scale observation network designs.
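The core step in building a binary regression tree, choosing the split that most reduces the squared error of the response, can be sketched as below. This is a generic illustration and not the authors' procedure; the function name and the elevation and snow depth values are hypothetical.

```python
def best_split(x, y):
    """Find the threshold on predictor x that minimises the summed squared
    error of response y around the mean of each side of the split.

    Returns (threshold, total_sse); threshold is None if no split helps.
    """
    def sse(values):
        if not values:
            return 0.0
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    pairs = sorted(zip(x, y))
    best = (None, sse(list(y)))  # baseline: no split at all
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between identical predictor values
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2.0
        left = [v for _, v in pairs[:k]]
        right = [v for _, v in pairs[k:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best

# Hypothetical survey points: elevation (m) and snow depth (cm).
elevation = [2000, 2100, 2200, 2300]
snow_depth = [50, 52, 90, 94]
threshold, error = best_split(elevation, snow_depth)
```

A full tree repeats this search over every predictor (elevation, slope, aspect, vegetation) and recurses into each side until a stopping rule is met.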
Environmental Impact Research Program. Restoration of Problem Soil Materials at Corps of Engineers Construction Sites.

DTIC Science & Technology

1985-05-01

the total weight of a given population of organisms. Browse: Twigs or shoots, with or without attached leaves, of shrubs, trees, or woody vines ...volunteer woody plants, or the successful establishment, later on, of planted shrubs, trees, and ground covers. 184. Some problem soils absolutely...properly prepared seedbed. Woody plants, such as shrubs and trees, are established by seedling transplants. However, some woody species can be seeded
TreePOD: Sensitivity-Aware Selection of Pareto-Optimal Decision Trees.

PubMed

Muhlbacher, Thomas; Linhardt, Lorenz; Moller, Torsten; Piringer, Harald

2018-01-01

Balancing accuracy gains with other objectives such as interpretability is a key challenge when building decision trees. However, this process is difficult to automate because it involves know-how about the domain as well as the purpose of the model. This paper presents TreePOD, a new approach for sensitivity-aware model selection along trade-offs. TreePOD is based on exploring a large set of candidate trees generated by sampling the parameters of tree construction algorithms. Based on this set, visualizations of quantitative and qualitative tree aspects provide a comprehensive overview of possible tree characteristics. Along trade-offs between two objectives, TreePOD provides efficient selection guidance by focusing on Pareto-optimal tree candidates. TreePOD also conveys the sensitivities of tree characteristics on variations of selected parameters by extending the tree generation process with a full-factorial sampling. We demonstrate how TreePOD supports a variety of tasks involved in decision tree selection and describe its integration in a holistic workflow for building and selecting decision trees. For evaluation, we illustrate a case study for predicting critical power grid states, and we report qualitative feedback from domain experts in the energy sector. This feedback suggests that TreePOD enables users with and without statistical background a confident and efficient identification of suitable decision trees.
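The idea of focusing on Pareto-optimal tree candidates along a trade-off between two objectives can be sketched as follows. This is only an assumed illustration, not TreePOD's implementation: the two objectives (higher accuracy, fewer leaves as a stand-in for interpretability), the dictionary layout, and the candidate values are all hypothetical.

```python
def pareto_front(candidates):
    """Keep candidates that no other candidate dominates.

    A candidate is dominated if another one is at least as good on both
    objectives (accuracy up, leaves down) and strictly better on one.
    """
    front = []
    for c in candidates:
        dominated = any(
            o["accuracy"] >= c["accuracy"]
            and o["leaves"] <= c["leaves"]
            and (o["accuracy"] > c["accuracy"] or o["leaves"] < c["leaves"])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front

# Hypothetical candidate trees produced by sampling construction parameters.
candidates = [
    {"name": "A", "accuracy": 0.90, "leaves": 40},
    {"name": "B", "accuracy": 0.88, "leaves": 12},
    {"name": "C", "accuracy": 0.85, "leaves": 15},  # dominated by B
    {"name": "D", "accuracy": 0.80, "leaves": 4},
]
front_names = [c["name"] for c in pareto_front(candidates)]
print(front_names)
```

The quadratic scan is adequate for a small candidate set; a sort-based sweep would be a natural choice for larger ones.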
Predictors of adherence with self-care guidelines among persons with type 2 diabetes: results from a logistic regression tree analysis.

PubMed

Yamashita, Takashi; Kart, Cary S; Noe, Douglas A

2012-12-01

Type 2 diabetes is known to contribute to health disparities in the U.S. and failure to adhere to recommended self-care behaviors is a contributing factor. Intervention programs face difficulties as a result of patient diversity and limited resources. With data from the 2005 Behavioral Risk Factor Surveillance System, this study employs a logistic regression tree algorithm to identify characteristics of sub-populations with type 2 diabetes according to their reported frequency of adherence to four recommended diabetes self-care behaviors including blood glucose monitoring, foot examination, eye examination and HbA1c testing. Using Andersen's health behavior model, need factors appear to dominate the definition of which sub-groups were at greatest risk for low as well as high adherence. Findings demonstrate the utility of easily interpreted tree diagrams to design specific culturally appropriate intervention programs targeting sub-populations of diabetes patients who need to improve their self-care behaviors. Limitations and contributions of the study are discussed.
Spatial prediction of landslides using a hybrid machine learning approach based on Random Subspace and Classification and Regression Trees

NASA Astrophysics Data System (ADS)

Pham, Binh Thai; Prakash, Indra; Tien Bui, Dieu

2018-02-01

A hybrid machine learning approach of Random Subspace (RSS) and Classification And Regression Trees (CART) is proposed to develop a model named RSSCART for spatial prediction of landslides. This model combines the RSS method, which is known as an efficient ensemble technique, with CART, which is a state-of-the-art classifier. The Luc Yen district of Yen Bai province, a prominent landslide-prone area of Viet Nam, was selected for the model development. Performance of the RSSCART model was evaluated through the Receiver Operating Characteristic (ROC) curve, statistical analysis methods, and the Chi Square test. Results were compared with other benchmark landslide models, namely Support Vector Machines (SVM), single CART, Naïve Bayes Trees (NBT), and Logistic Regression (LR). In the development of the model, ten important landslide-affecting factors related to geomorphology, geology and geo-environment were considered, namely slope angles, elevation, slope aspect, curvature, lithology, distance to faults, distance to rivers, distance to roads, and rainfall. Performance of the RSSCART model (AUC = 0.841) is the best compared with other popular landslide models, namely SVM (0.835), single CART (0.822), NBT (0.821), and LR (0.723). These results indicate that RSSCART is a promising method for spatial landslide prediction.
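The AUC values quoted above summarize the ROC curve in one number. A minimal sketch of one standard way to compute it is the pairwise formulation: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The scores and labels below are hypothetical, and nothing here is claimed to be the evaluation code used by the authors.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels use 1 for the positive class (e.g. landslide) and 0 otherwise.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical susceptibility scores and observed labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
value = auc(scores, labels)
print(round(value, 3))
```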
Random Forest as a Predictive Analytics Alternative to Regression in Institutional Research

ERIC Educational Resources Information Center

He, Lingjun; Levine, Richard A.; Fan, Juanjuan; Beemer, Joshua; Stronach, Jeanne

2018-01-01

In institutional research, modern data mining approaches are seldom considered to address predictive analytics problems. The goal of this paper is to highlight the advantages of tree-based machine learning algorithms over classic (logistic) regression methods for data-informed decision making in higher education problems, and stress the success of…
A systematic risk management approach employed on the CloudSat project

NASA Technical Reports Server (NTRS)

Basilio, R. R.; Plourde, K. S.; Lam, T.

2000-01-01

The CloudSat Project has developed a simplified approach for fault tree analysis and probabilistic risk assessment. A system-level fault tree has been constructed to identify credible fault scenarios and failure modes leading up to a potential failure to meet the nominal mission success criteria.

Soy protein adhesives

Treesearch

Charles R. Frihart

2010-01-01

In the quest to manufacture and use building materials that are more environmentally friendly, soy adhesives can be an important component. Trees fix and store carbon dioxide in the atmosphere. After the trees are harvested, machinery converts the wood into strands, which are then bonded together with adhesives to form strandboard, used in constructing long-lasting...
Growing trees in child brains: graph theoretical analysis of electroencephalography-derived minimum spanning tree in 5- and 7-year-old children reflects brain maturation.

PubMed

Boersma, Maria; Smit, Dirk J A; Boomsma, Dorret I; De Geus, Eco J C; Delemarre-van de Waal, Henriette A; Stam, Cornelis J

2013-01-01

The child brain is a small-world network, which is hypothesized to change toward more ordered configurations with development. In graph theoretical studies, comparing network topologies under different conditions remains a critical point. Constructing a minimum spanning tree (MST) might present a solution, since it does not require setting a threshold and uses a fixed number of nodes and edges. In this study, the MST method is introduced to examine developmental changes in functional brain network topology in young children. Resting-state electroencephalography was recorded from 227 children twice at 5 and 7 years of age. Synchronization likelihood (SL) weighted matrices were calculated in three different frequency bands from which MSTs were constructed, which represent constructs of the most important routes for information flow in a network. From these trees, several parameters were calculated to characterize developmental change in network organization. The MST diameter and eccentricity significantly increased, while the leaf number and hierarchy significantly decreased in the alpha band with development. Boys showed significantly higher leaf number, betweenness, degree and hierarchy and significantly lower SL, diameter, and eccentricity than girls in the theta band. The developmental changes indicate a shift toward more decentralized line-like trees, which supports the previously hypothesized increase toward regularity of brain networks with development. Additionally, girls showed more line-like decentralized configurations, which is consistent with the view that girls are ahead of boys in brain development. MST provides an elegant method sensitive enough to capture subtle developmental changes in network organization without the bias of network comparison.
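To make the MST measures concrete, the sketch below builds a minimum spanning tree with Kruskal's algorithm and then computes two of the parameters named above, leaf number and diameter (counted in edges). It assumes the edge weights are distances, for example one minus the synchronization likelihood; that conversion, the node count, and the weights are illustrative assumptions, not details confirmed by the abstract.

```python
from collections import defaultdict, deque

def kruskal_mst(n, edges):
    """edges: (weight, u, v) tuples over nodes 0..n-1. Returns MST edges."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # adding this edge does not create a cycle
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

def leaf_number_and_diameter(n, tree):
    """Leaf count and longest path length (in edges) of a spanning tree."""
    adj = defaultdict(list)
    for _, u, v in tree:
        adj[u].append(v)
        adj[v].append(u)
    leaves = sum(1 for node in range(n) if len(adj[node]) == 1)

    def farthest(start):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            a = queue.popleft()
            for b in adj[a]:
                if b not in dist:
                    dist[b] = dist[a] + 1
                    queue.append(b)
        node = max(dist, key=dist.get)
        return node, dist[node]

    # Two breadth-first passes give the diameter of a tree.
    end, _ = farthest(0)
    _, diameter = farthest(end)
    return leaves, diameter

# Hypothetical 5-node network with distance-like weights.
edges = [(0.1, 0, 1), (0.2, 1, 2), (0.3, 2, 3), (0.4, 3, 4),
         (0.9, 0, 4), (0.8, 1, 3), (0.7, 0, 2)]
tree = kruskal_mst(5, edges)
leaves, diameter = leaf_number_and_diameter(5, tree)
print(leaves, diameter)
```

In this toy case the tree is a simple path, the extreme "line-like" configuration: it has the minimum possible leaf number (2) and the maximum possible diameter (4) for five nodes.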
Vegetation Continuous Fields--Transitioning from MODIS to VIIRS

NASA Astrophysics Data System (ADS)

DiMiceli, C.; Townshend, J. R.; Sohlberg, R. A.; Kim, D. H.; Kelly, M.

2015-12-01

Measurements of fractional vegetation cover are critical for accurate and consistent monitoring of global deforestation rates. They also provide important parameters for land surface, climate and carbon models and vital background data for research into fire, hydrological and ecosystem processes. MODIS Vegetation Continuous Fields (VCF) products provide four complementary layers of fractional cover: tree cover, non-tree vegetation, bare ground, and surface water. MODIS VCF products are currently produced globally and annually at 250m resolution for 2000 to the present. Additionally, annual VCF products at 1/20° resolution derived from AVHRR and MODIS Long-Term Data Records are in development to provide Earth System Data Records of fractional vegetation cover for 1982 to the present. In order to provide continuity of these valuable products, we are extending the VCF algorithms to create Suomi NPP/VIIRS VCF products. This presentation will highlight the first VIIRS fractional cover product: global percent tree cover at 1 km resolution. To create this product, phenological and physiological metrics were derived from each complete year of VIIRS 8-day surface reflectance products. A supervised regression tree method was applied to the metrics, using training derived from Landsat data supplemented by high-resolution data from Ikonos, RapidEye and QuickBird. The regression tree model was then applied globally to produce fractional tree cover. In our presentation we will detail our methods for creating the VIIRS VCF product. We will compare the new VIIRS VCF product to our current MODIS VCF products and demonstrate continuity between instruments. Finally, we will outline future VIIRS VCF development plans.
Decision tree analysis to stratify risk of de novo non-melanoma skin cancer following liver transplantation.

PubMed

Tanaka, Tomohiro; Voigt, Michael D

2018-03-01

Non-melanoma skin cancer (NMSC) is the most common de novo malignancy in liver transplant (LT) recipients; it behaves more aggressively and it increases mortality. We used decision tree analysis to develop a tool to stratify and quantify risk of NMSC in LT recipients. We performed Cox regression analysis to identify which predictive variables to enter into the decision tree analysis. Data were from the Organ Procurement Transplant Network (OPTN) STAR files of September 2016 (n = 102984). NMSC developed in 4556 of the 105984 recipients, a mean of 5.6 years after transplant. The 5/10/20-year rates of NMSC were 2.9/6.3/13.5%, respectively. Cox regression identified male gender, Caucasian race, age, body mass index (BMI) at LT, and sirolimus use as key predictive or protective factors for NMSC. These factors were entered into a decision tree analysis. The final tree stratified non-Caucasians as low risk (0.8%), and Caucasian males > 47 years, BMI < 40 who did not receive sirolimus, as high risk (7.3% cumulative incidence of NMSC). The predictions in the derivation set were almost identical to those in the validation set (r² = 0.971, p < 0.0001). Cumulative incidence of NMSC in low, moderate and high risk groups at 5/10/20 year was 0.5/1.2/3.3, 2.1/4.8/11.7 and 5.6/11.6/23.1% (p < 0.0001). The decision tree model accurately stratifies the risk of developing NMSC in the long-term after LT.
Drought impact functions as intermediate step towards drought damage assessment

NASA Astrophysics Data System (ADS)

Bachmair, Sophie; Svensson, Cecilia; Prosdocimi, Ilaria; Hannaford, Jamie; Helm Smith, Kelly; Svoboda, Mark; Stahl, Kerstin

2016-04-01

While damage or vulnerability functions for floods and seismic hazards have gained considerable attention, there is comparably little knowledge on drought damage or loss. On the one hand this is due to the complexity of the drought hazard affecting different domains of the hydrological cycle and different sectors of human activity. Hence, a single hazard indicator is likely not able to fully capture this multifaceted hazard. On the other hand, drought impacts are often non-structural and hard to quantify or monetize. Examples are impaired navigability of streams, restrictions on domestic water use, reduced hydropower production, reduced tree growth, and irreversible deterioration/loss of wetlands. Apart from reduced crop yield, data about drought damage or loss with adequate spatial and temporal resolution is scarce, making the development of drought damage functions difficult. As an intermediate step towards drought damage functions we exploit text-based reports on drought impacts from the European Drought Impact report Inventory and the US Drought Impact Reporter to derive surrogate information for drought damage or loss. First, text-based information on drought impacts is converted into timeseries of absence versus presence of impacts, or number of impact occurrences. Second, meaningful hydro-meteorological indicators characterizing drought intensity are identified. Third, different statistical models are tested as link functions relating drought hazard indicators with drought impacts: 1) logistic regression for drought impacts coded as binary response variable; and 2) mixture/hurdle models (zero-inflated/zero-altered negative binomial regression) and an ensemble regression tree approach for modeling the number of drought impact occurrences. 
Testing the predictability of (number of) drought impact occurrences based on cross-validation revealed a good agreement between observed and modeled (number of) impacts for regions at the scale of federal states or provinces with good data availability. Impact functions representing localized drought impacts are more challenging to construct given that less data is available, yet may provide information that more directly addresses stakeholders' needs. Overall, our study contributes insights into how drought intensity translates into ecological and socioeconomic impacts, and how such information may be used for enhancing drought monitoring and early warning.
Downscaling soil moisture over East Asia through multi-sensor data fusion and optimization of regression trees

NASA Astrophysics Data System (ADS)

Park, Seonyoung; Im, Jungho; Park, Sumin; Rhee, Jinyoung

2017-04-01

Soil moisture is one of the most important keys for understanding regional and global climate systems. Soil moisture is directly related to agricultural processes as well as hydrological processes because soil moisture highly influences vegetation growth and determines water supply in the agroecosystem. Accurate monitoring of the spatiotemporal pattern of soil moisture is important. Soil moisture has been generally provided through in situ measurements at stations. Although field survey from in situ measurements provides accurate soil moisture with high temporal resolution, it requires high cost and does not provide the spatial distribution of soil moisture over large areas. Microwave satellite-based approaches (e.g., the Advanced Microwave Scanning Radiometer on the Earth Observing System (AMSR2), the Advanced Scatterometer (ASCAT), and Soil Moisture Active Passive (SMAP)) and numerical models such as the Global Land Data Assimilation System (GLDAS) and Modern-Era Retrospective Analysis for Research and Applications (MERRA) provide spatiotemporally continuous soil moisture products at global scale. However, since those global soil moisture products have coarse spatial resolution (25-40 km), their applications for agriculture and water resources at local and regional scales are very limited. Thus, soil moisture downscaling is needed to overcome the limitation of the spatial resolution of soil moisture products. In this study, GLDAS soil moisture data were downscaled to 1 km spatial resolution through the integration of AMSR2 and ASCAT soil moisture data, Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM), and Moderate Resolution Imaging Spectroradiometer (MODIS) data (Land Surface Temperature, Normalized Difference Vegetation Index, and Land cover) using modified regression trees over East Asia from 2013 to 2015. Modified regression trees were implemented using Cubist, a commercial software tool based on machine learning.
An optimization based on pruning of rules derived from the modified regression trees was conducted. Root Mean Square Error (RMSE) and Correlation coefficients (r) were used to optimize the rules, and finally 59 rules from modified regression trees were produced. The results show high validation r (0.79) and low validation RMSE (0.0556 m3/m3). The 1 km downscaled soil moisture was evaluated using ground soil moisture data at 14 stations, and both soil moisture data showed similar temporal patterns (average r=0.51 and average RMSE=0.041). The spatial distribution of the 1 km downscaled soil moisture corresponded well with GLDAS soil moisture, which captured both extremely dry and wet regions. Correlation between GLDAS and the 1 km downscaled soil moisture during growing season was positive (mean r=0.35) in most regions.
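The two validation statistics reported above, RMSE and the correlation coefficient r, can be computed as in the following minimal sketch. The observed and predicted soil moisture values are hypothetical (in m3/m3) and only serve to show the arithmetic; r is taken here to be the Pearson correlation, which is an assumption.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical station observations and downscaled estimates (m3/m3).
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.37]
error = rmse(observed, predicted)
r = pearson_r(observed, predicted)
print(round(error, 4), round(r, 3))
```

A low RMSE with a high r indicates that the estimates track both the level and the pattern of the observations; a high r alone does not rule out a systematic bias.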
Gene selection for the reconstruction of stem cell differentiation trees: a linear programming approach.

PubMed

Ghadie, Mohamed A; Japkowicz, Nathalie; Perkins, Theodore J

2015-08-15

Stem cell differentiation is largely guided by master transcriptional regulators, but it also depends on the expression of other types of genes, such as cell cycle genes, signaling genes, metabolic genes, trafficking genes, etc. Traditional approaches to understanding gene expression patterns across multiple conditions, such as principal components analysis or K-means clustering, can group cell types based on gene expression, but they do so without knowledge of the differentiation hierarchy. Hierarchical clustering can organize cell types into a tree, but in general this tree is different from the differentiation hierarchy itself. Given the differentiation hierarchy and gene expression data at each node, we construct a weighted Euclidean distance metric such that the minimum spanning tree with respect to that metric is precisely the given differentiation hierarchy. We provide a set of linear constraints that are provably sufficient for the desired construction and a linear programming approach to identify sparse sets of weights, effectively identifying genes that are most relevant for discriminating different parts of the tree. We apply our method to microarray gene expression data describing 38 cell types in the hematopoiesis hierarchy, constructing a weighted Euclidean metric that uses just 175 genes. However, we find that there are many alternative sets of weights that satisfy the linear constraints. Thus, in the style of random-forest training, we also construct metrics based on random subsets of the genes and compare them to the metric of 175 genes. We then report on the selected genes and their biological functions. Our approach offers a new way to identify genes that may have important roles in stem cell differentiation.
Vegetation succession and impacts of biointrusion on covers used to limit acid mine drainage.

PubMed

Smirnova, Evgeniya; Bussière, Bruno; Tremblay, Francine; Bergeron, Yves

2011-01-01

A cover with capillary barrier effects (CCBE) was constructed in 1998 on the abandoned Lorraine mine tailings impoundment to limit the generation of acid mine drainage. The Ministry of Natural Resources and Fauna of Quebec (MRNF) is responsible for the site and for all restoration works on it, including CCBE construction. The CCBE is made up of three layers: a 0.3-m layer of sand used as a support and capillary break layer; a moisture-retaining layer with a thickness of 0.5 m (this layer is constructed of a nonplastic silt); and a 0.3-m sand and gravel layer on top. The main objective of the CCBE is to maintain one (or more) of the layers at a high degree of water saturation to impede oxygen migration and acid generation. Vegetation succession on the Lorraine CCBE results in an improvement in soil conditions, leading to the installation of deep-rooted species, which could represent a risk to CCBE long-term performance. Hence, the characterization of vegetation succession is an important aspect of the monitoring strategy for the Lorraine CCBE. Species occurrence was documented, and depth of tree roots was measured by excavation on a regular basis. Eight functional groups of plants were identified; herbaceous plants were the most abundant ecological plant groups. Tree ring counts confirmed that tree colonization started the year of CCBE construction (1999). Of the 11 tree species identified, the most abundant were poplar (Populus spp.), paper birch (Betula papyrifera Marsh.), black spruce (Picea mariana Mill.), and willow (Salix spp.). Significant differences in occurrence related to environmental conditions were observed for most functional groups. Root excavation showed that tree roots exceeded the depth of the protective layer and started to reach the moisture-retaining layer; in 2008, root average depth was 0.4 m and the maximal root depth was 1.7 m.
Diameter-growth model across shortleaf pine range using regression tree analysis

Treesearch

Daniel Yaussy; Louis Iverson; Anantha Prasad

1999-01-01

Diameter growth of a tree in most gap-phase models is limited by light, nutrients, moisture, and temperature. Growing-season temperature is represented by growing degree days (gdd), which is the sum of the average daily temperatures above a baseline temperature. Gap-phase models determine the north-south range of a species by the gdd limits at the north and south...
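The growing degree day definition above can be written as a short calculation. This sketch assumes the common convention that each day contributes only the amount by which its mean temperature exceeds the baseline, and nothing on days at or below it; the 5 °C baseline and the temperature series are hypothetical, not values from the paper.

```python
def growing_degree_days(daily_mean_temps, base_temp):
    """Sum of daily mean temperature excess over a baseline (never negative)."""
    return sum(max(0.0, t - base_temp) for t in daily_mean_temps)

# Hypothetical daily mean temperatures (deg C) and an assumed 5 deg C baseline.
temps = [3.0, 6.5, 10.0, 12.5, 4.0]
gdd = growing_degree_days(temps, base_temp=5.0)
print(gdd)
```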
Applicability of predictive models of drought-induced tree mortality between the midwest and northeast United States

Treesearch

Eric J. Gustafson

2014-01-01

Regression models developed in the upper Midwest (United States) to predict drought-induced tree mortality from measures of drought (Palmer Drought Severity Index) were tested in the northeastern United States and found inadequate. The most likely cause of this result is that long drought events were rare in the Northeast during the period when inventory data were...
A Pilot Test of Indicator Species to Assess Uniqueness of Oak-Dominated Ecoregions in Central Tennessee

Treesearch

W. Henry McNab; David L. Loftis; Callie J. Schweitzer; Raymond Sheffield

2004-01-01

We used tree indicator species occurring on 438 plots in the Plateau counties of Tennessee to test the uniqueness of four conterminous ecoregions. Multinomial logistic regression indicated that the presence of 14 tree species allowed classification of sample plots according to ecoregion with an average overall accuracy of 75 percent (range 45 to 94 percent). Additional...
Dating tree mortality using log decay in the White Mountains of New Hampshire

Treesearch

Andrew J. Fast; Mark J. Ducey; Jeffrey H. Gove; William B. Leak

2008-01-01

Coarse woody material (CWM) is an important component of forest ecosystems. To meet specific CWM management objectives, it is important to understand rates of decay. We present results from a silvicultural trial at the Bartlett Experimental Forest, in which time of death is known for a large sample of trees. Either a simple table or regression equations that use...
Development of post-fire crown damage mortality thresholds in ponderosa pine

Treesearch

James F. Fowler; Carolyn Hull Sieg; Joel McMillin; Kurt K. Allen; Jose F. Negron; Linda L. Wadleigh; John A. Anhold; Ken E. Gibson

2010-01-01

Previous research has shown that crown scorch volume and crown consumption volume are the major predictors of post-fire mortality in ponderosa pine. In this study, we use piecewise logistic regression models of crown scorch data from 6633 trees in five wildfires from the Intermountain West to locate a mortality threshold at 88% scorch by volume for trees with no crown...
Equations relating compacted and uncompacted live crown ratio for common tree species in the South

Treesearch

KaDonna C. Randolph

2010-01-01

Species-specific equations to predict uncompacted crown ratio (UNCR) from compacted live crown ratio (CCR), tree length, and stem diameter were developed for 24 species and 12 genera in the southern United States. Using data from the US Forest Service Forest Inventory and Analysis program, nonlinear regression was used to model UNCR with a logistic function. Model...
Patterns of tree species diversity and composition in old-field successional forests in central Illinois

Treesearch

Scott M. Bretthauer; George Z. Gertner; Gary L. Rolfe; Jeffery O. Dawson

2003-01-01

Tree species diversity increases and dominance decreases with proximity to forest border in two 60-year-old successional forest stands developed on abandoned agricultural land in Piatt County, Illinois. A regression equation allowed us to quantify an increase in diversity with closeness to forest border for one of the forest stands. Shingle oak is the most dominant...
Regeneration of Douglas-fir cutblocks on the Six Rivers National Forest in northwestern California

Treesearch

R. O. Strothmann

1979-01-01

A survey of 61 cutblocks planted since 1964 evaluated stocking of conifers (trees 1 foot tall or taller) on 2-milacre quadrats. Overall stocking percentage averaged 42.2 and ranged from 15 to 81. Overall number of trees per acre averaged 396. In the regression model, based on 36 cutblocks, better stocking was associated with high site class, northerly aspect,...
Finding Minimum-Power Broadcast Trees for Wireless Networks

NASA Technical Reports Server (NTRS)

Arabshahi, Payman; Gray, Andrew; Das, Arindam; El-Sharkawi, Mohamed; Marks, Robert, II

2004-01-01

Some algorithms have been devised for use in a method of constructing tree graphs that represent connections among the nodes of a wireless communication network. These algorithms provide for determining the viability of any given candidate connection tree and for generating an initial set of viable trees that can be used in any of a variety of search algorithms (e.g., a genetic algorithm) to find a tree that enables the network to broadcast from a source node to all other nodes while consuming the minimum amount of total power. The method yields solutions better than those of a prior algorithm known as the broadcast incremental power algorithm, albeit at a slightly greater computational cost.
Genomewide Function Conservation and Phylogeny in the Herpesviridae

PubMed Central

Albà, M. Mar; Das, Rhiju; Orengo, Christine A.; Kellam, Paul

2001-01-01

The Herpesviridae are a large group of well-characterized double-stranded DNA viruses for which many complete genome sequences have been determined. We have extracted protein sequences from all predicted open reading frames of 19 herpesvirus genomes. Sequence comparison and protein sequence clustering methods have been used to construct herpesvirus protein homologous families. This resulted in 1692 proteins being clustered into 243 multiprotein families and 196 singleton proteins. Predicted functions were assigned to each homologous family based on genome annotation and published data and each family classified into seven broad functional groups. Phylogenetic profiles were constructed for each herpesvirus from the homologous protein families and used to determine conserved functions and genomewide phylogenetic trees. These trees agreed with molecular-sequence-derived trees and allowed greater insight into the phylogeny of ungulate and murine gammaherpesviruses. PMID:11156614
Protection of individual ash trees from emerald ash borer (Coleoptera: Buprestidae) with basal soil applications of imidacloprid.

PubMed

Smitley, D R; Rebek, E J; Royalty, R N; Davis, T W; Newhouse, K F

2010-02-01

We conducted field trials at five different locations over a period of 6 yr to investigate the efficacy of imidacloprid applied each spring as a basal soil drench for protection against emerald ash borer, Agrilus planipennis Fairmaire (Coleoptera: Buprestidae). Canopy thinning and emerald ash borer larval density were used to evaluate efficacy for 3-4 yr at each location while treatments continued. Test sites included small urban trees (5-15 cm diameter at breast height [dbh]), medium to large (15-65 cm dbh) trees at golf courses, and medium to large street trees. Annual basal drenches with imidacloprid gave complete protection of small ash trees for three years. At three sites where the size of trees ranged from 23 to 37 cm dbh, we successfully protected all ash trees beginning the test with <60% canopy thinning. Regression analysis of data from two sites reveals that tree size explains 46% of the variation in efficacy of imidacloprid drenches. The smallest trees (<30 cm dbh) remained in excellent condition for 3 yr, whereas most of the largest trees (>38 cm dbh) declined to a weakened state and undesirable appearance. The five-fold increase in trunk and branch surface area of ash trees as the tree dbh doubles may account for reduced efficacy on larger trees, and suggests a need to increase treatment rates for larger trees.
Construction and Maintenance of the Optimal Photosynthetic Systems of the Leaf, Herbaceous Plant and Tree: an Eco-developmental Treatise

PubMed Central

TERASHIMA, ICHIRO; ARAYA, TAKAO; MIYAZAWA, SHIN-ICHI; SONE, KOSEI; YANO, SATOSHI

2004-01-01

• Background and Aims The paper by Monsi and Saeki in 1953 (Japanese Journal of Botany 14: 22–52) was pioneering not only in mathematical modelling of canopy photosynthesis but also in eco-developmental studies of seasonal changes in leaf canopies. • Scope Construction and maintenance mechanisms of efficient photosynthetic systems at three different scaling levels—single leaves, herbaceous plants and trees—are reviewed mainly based on the nitrogen optimization theory. First, the nitrogen optimization theory with respect to the canopy and the single leaf is briefly introduced. Secondly, significance of leaf thickness in CO2 diffusion in the leaf and in leaf photosynthesis is discussed. Thirdly, mechanisms of adjustment of photosynthetic properties of the leaf within the herbaceous plant individual throughout its life are discussed. In particular, roles of sugar sensing, redox control and of cytokinin are highlighted. Finally, the development of a tree is considered. • Conclusions Various mechanisms contribute to construction and maintenance of efficient photosynthetic systems. Molecular backgrounds of these ecologically important mechanisms should be clarified. The construction mechanisms of the tree cannot be explained solely by the nitrogen optimization theory. It is proposed that the pipe model theory in its differential form could be a potential tool in future studies in this research area. PMID:15598701

Development of a Prognostic Marker for Lung Cancer Using Analysis of Tumor Evolution

DTIC Science & Technology

2017-08-01

The goal of this project is to sequence the exomes of single tumor cells from tumors in order to construct evolutionary trees...dissociation, tumor cell isolation, whole genome amplification, and exome sequencing. We have begun to sequence the exomes of single cells and to...of populations, the evolution of tumor cells within a tumor can be diagrammed on a phylogenetic tree. The more diverse a tumor’s phylogenetic tree
A Trichosporonales genome tree based on 27 haploid and three evolutionarily conserved 'natural' hybrid genomes.

PubMed

Takashima, Masako; Sriswasdi, Sira; Manabe, Ri-Ichiroh; Ohkuma, Moriya; Sugita, Takashi; Iwasaki, Wataru

2018-01-01

To construct a backbone tree consisting of basidiomycetous yeasts, draft genome sequences from 25 species of Trichosporonales (Tremellomycetes, Basidiomycota) were generated. In addition to the hybrid genomes of Trichosporon coremiiforme and Trichosporon ovoides that we described previously, we identified an interspecies hybrid genome in Cutaneotrichosporon mucoides (formerly Trichosporon mucoides). This hybrid genome had a gene retention rate of ~55%, and its closest haploid relative was Cutaneotrichosporon dermatis. After constructing the C. mucoides subgenomes, we generated a phylogenetic tree using genome data from the 27 haploid species and the subgenome data from the three hybrid genome species. It was a high-quality tree with 100% bootstrap support for all of the branches. The genome-based tree provided superior resolution compared with previous multi-gene analyses. Although our backbone tree does not include all Trichosporonales genera (e.g. Cryptotrichosporon), it will be valuable for future analyses of genome data. Interest in interspecies hybrid fungal genomes has recently increased because they may provide a basis for new technologies. The three Trichosporonales hybrid genomes described in this study are different from well-characterized hybrid genomes (e.g. those of Saccharomyces pastorianus and Saccharomyces bayanus) because these hybridization events probably occurred in the distant evolutionary past. Hence, they will be useful for studying genome stability following hybridization and speciation events. Copyright © 2017 John Wiley & Sons, Ltd.
RNA-Seq based phylogeny recapitulates previous phylogeny of the genus Flaveria (Asteraceae) with some modifications.

PubMed

Lyu, Ming-Ju Amy; Gowik, Udo; Kelly, Steve; Covshoff, Sarah; Mallmann, Julia; Westhoff, Peter; Hibberd, Julian M; Stata, Matt; Sage, Rowan F; Lu, Haorong; Wei, Xiaofeng; Wong, Gane Ka-Shu; Zhu, Xin-Guang

2015-06-18

The genus Flaveria has been extensively used as a model to study the evolution of C4 photosynthesis as it contains C3 and C4 species as well as a number of species that exhibit intermediate types of photosynthesis. The current phylogenetic tree of the genus Flaveria contains 21 of the 23 known Flaveria species and has been previously constructed using a combination of morphological data and three non-coding DNA sequences (nuclear encoded ETS, ITS and chloroplast encoded trnL-F). Here we developed a new strategy to update the phylogenetic tree of 16 Flaveria species based on RNA-Seq data. The updated phylogeny is largely congruent with the previously published tree but with some modifications. We propose that the data collection method provided in this study can be used as a generic method for phylogenetic tree reconstruction if the target species has no genomic information. We also showed that a "F. pringlei" genotype recently used in a number of labs may be a hybrid between F. pringlei (C3) and F. angustifolia (C3-C4). We propose that the new strategy of obtaining phylogenetic sequences outlined in this study can be used to construct robust trees in a larger number of taxa. The updated Flaveria phylogenetic tree also supports a hypothesis of stepwise and parallel evolution of C4 photosynthesis in the Flaveria clade.
Fuzzy tree automata and syntactic pattern recognition.

PubMed

Lee, E T

1982-04-01

An approach of representing patterns by trees and processing these trees by fuzzy tree automata is described. Fuzzy tree automata are defined and investigated. The results include that the class of fuzzy root-to-frontier recognizable Σ-trees is closed under intersection, union, and complementation. Thus, the class of fuzzy root-to-frontier recognizable Σ-trees forms a Boolean algebra. Fuzzy tree automata are applied to processing fuzzy tree representation of patterns based on syntactic pattern recognition. The grade of acceptance is defined and investigated. Quantitative measures of "approximate isosceles triangle," "approximate elongated isosceles triangle," "approximate rectangle," and "approximate cross" are defined and used in the illustrative examples of this approach. By using these quantitative measures, a house, a house with high roof, and a church are also presented as illustrative examples. In addition, three fuzzy tree automata are constructed which have the capability of processing the fuzzy tree representations of "fuzzy houses," "houses with high roofs," and "fuzzy churches," respectively. The results may have useful applications in pattern recognition, image processing, artificial intelligence, pattern database design and processing, image science, and pictorial information systems.
A Millennial-length Reconstruction of the Western Pacific Pattern with Associated Paleoclimate

NASA Astrophysics Data System (ADS)

Wright, W. E.; Guan, B. T.; Wei, K.

2010-12-01

The Western Pacific Pattern (WP) is a lesser-known 500 hPa pressure pattern similar to the NAO or PNA. As defined, the poles of the WP index are centered on 60°N over the Kamchatka peninsula and the neighboring Pacific and on 32.5°N over the western north Pacific. However, the area of influence for the southern half of the dipole includes a wide swath from East Asia, across Taiwan, through the Philippine Sea, to the western north Pacific. Tree rings of Taiwanese Chamaecyparis obtusa var. formosana in this extended region show significant correlation with the WP, and with local temperature. The WP is also significantly correlated with atmospheric temperatures over Taiwan, especially at 850 hPa and 700 hPa, pressure levels that bracket the tree site. Spectral analysis indicates that variations in the WP occur at relatively high frequency, with most power at periods of less than 5 years. Simple linear regression against high-frequency variants of the tree-ring chronology yielded the most significant correlation coefficients. Two reconstructions are presented. The first uses a tree-ring time series produced as the first intrinsic mode function (IMF) from an Ensemble Empirical Mode Decomposition (EEMD), based on the Hilbert-Huang Transform. The regression using the EEMD-derived time series was much more significant than regressions using time series produced by traditional high-pass filtering. The second also uses the first IMF of a tree-ring time series, but the dataset was first sorted and partitioned at a specified quantile prior to EEMD decomposition, with the mean of the partitioned data forming the input to the EEMD. The partitioning was done to filter out the less climatically sensitive tree rings, a common problem with shade-tolerant trees. Time series statistics indicate that the first reconstruction is reliable to 1241 of the Common Era.
Reliability of the second reconstruction is dependent on the development of statistics related to the quantile partitioning, and the consequent reduction in sample depth. However, the correlation coefficients from regressions over the instrumental period greatly exceed those from any other method of chronology generation, and so the technique holds promise. Additional atmospheric parameters having significant correlations against the WPO and tree ring time series with similar spatial patterns are also presented. These include vertical wind shear (850hPa-700hPa) over the northern Philippines and the Philippine Sea, surface Omega and 850hPa v-winds over the East China Sea, Japan and Taiwan. Possible links to changes in the subtropical jet stream will also be discussed.
Merging Multi-model CMIP5/PMIP3 Past-1000 Ensemble Simulations with Tree Ring Proxy Data by Optimal Interpolation Approach

NASA Astrophysics Data System (ADS)

Chen, Xin; Luo, Yong; Xing, Pei; Nie, Suping; Tian, Qinhua

2015-04-01

Two sets of gridded annual mean surface air temperature over the Northern Hemisphere for the past millennium were constructed using the optimal interpolation (OI) method to merge tree-ring proxy records with simulations from CMIP5 (the fifth phase of the Coupled Model Intercomparison Project). The OI algorithm takes into account the uncertainties in both the proxy reconstruction and the model simulations. To better preserve physically coordinated features and the spatial-temporal completeness of climate variability in the 7 sets of model results, we apply Empirical Orthogonal Function (EOF) analysis to truncate the ensemble mean field, which serves as the first guess (background field) for OI. A total of 681 temperature-sensitive tree-ring chronologies were collected and screened from the International Tree Ring Data Bank (ITRDB) and the Past Global Changes (PAGES-2k) project. First, two methods (variance matching and linear regression) are employed to calibrate the tree-ring chronologies individually against instrumental data (CRUTEM4v). In addition, we remove the bias of both the background field and the proxy records relative to the instrumental dataset. Second, a time-varying background error covariance matrix (B) and a static "observation" error covariance matrix (R) are calculated for the OI framework. In our scheme, matrix B is calculated locally, and "observation" error covariances are partially considered in the R matrix (the covariance between pairs of tree-ring sites that are very close to each other is counted), which differs from the traditional assumption that the R matrix is diagonal. Comparing our results, we find that the regionally averaged series are not sensitive to the choice of calibration method. The quantile-quantile plots indicate that regional climatologies based on both methods tend to agree better with the regional reconstructions of PAGES-2k in the 20th-century warming period than in the Little Ice Age (LIA). 
Larger volcanic cooling responses over Asia and Europe in the context of the recent millennium are detected in our datasets than are revealed in the regional reconstructions from the PAGES-2k network. Verification experiments have shown that the merging approach reconciles the proxy data and model ensemble simulations in an optimal way (with smaller errors than both of them). Further research is needed to improve the error estimation.
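The merging step described in this abstract can be illustrated with a one-variable version of optimal interpolation. This is a minimal sketch, not the authors' gridded scheme: it assumes a single background value and a single proxy value for the same quantity, with scalar error variances standing in for the B and R matrices, and all numbers are hypothetical.

```python
def optimal_interpolation(background, obs, var_b, var_r):
    """Scalar OI update: blend a model first guess with one observation.

    background, obs: two estimates of the same quantity
    var_b, var_r:    their error variances (scalar stand-ins for B and R)
    """
    gain = var_b / (var_b + var_r)               # weight given to the observation
    analysis = background + gain * (obs - background)
    var_a = (1.0 - gain) * var_b                 # error variance of the merged value
    return analysis, var_a


# Hypothetical values: model anomaly 0.2 K (variance 0.09),
# proxy-based anomaly 0.5 K (variance 0.04).
analysis, var_a = optimal_interpolation(0.2, 0.5, 0.09, 0.04)
print(analysis, var_a)
```

In this scalar case the analysis variance is always smaller than both input variances, which is the sense in which the merged product can have "smaller errors than both of them".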
Prediction of Baseflow Index of Catchments using Machine Learning Algorithms

NASA Astrophysics Data System (ADS)

Yadav, B.; Hatfield, K.

2017-12-01

We present the results of eight machine learning techniques for predicting the baseflow index (BFI) of ungauged basins using a surrogate of catchment scale climate and physiographic data. The tested algorithms include ordinary least squares, ridge regression, least absolute shrinkage and selection operator (lasso), elastic net, support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Our work seeks to identify the dominant controls of BFI that can be readily obtained from ancillary geospatial databases and remote sensing measurements, such that the developed techniques can be extended to ungauged catchments. More than 800 gauged catchments spanning the continental United States were selected to develop the general methodology. The BFI calculation was based on the baseflow separated from the daily streamflow hydrograph using the HYSEP filter. The surrogate catchment attributes were compiled from multiple sources including digital elevation model, soil, land use, and climate data, and other publicly available ancillary and geospatial data. 80% of the catchments were used to train the ML algorithms, and the remaining 20% were used as an independent test set to measure the generalization performance of the fitted models. A k-fold cross-validation using exhaustive grid search was used to fit the hyperparameters of each model. Initial model development was based on 19 independent variables, but after variable selection and feature ranking, we generated revised sparse models of BFI prediction that are based on only six catchment attributes. These key predictive variables, selected after careful evaluation of the bias-variance tradeoff, include average catchment elevation, slope, fraction of sand, permeability, temperature, and precipitation. 
The most promising algorithms exceeding an accuracy score (r-square) of 0.7 on test data include support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Considering both the accuracy and the computational complexity of these algorithms, we identify the extremely randomized trees as the best performing algorithm for BFI prediction in ungauged basins.
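The evaluation protocol in this abstract (an 80/20 split, k-fold cross-validation on the training portion, and an r-square score on held-out data) can be sketched in plain Python. This is an illustrative outline only: the function names, the fixed seed, the choice of five folds, and the use of integer IDs in place of real catchment records are all assumptions, and no model fitting or grid search is shown.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle items and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds over range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical catchment IDs standing in for the real records.
train, test = train_test_split(range(800))
folds = list(k_fold_indices(len(train), 5))
print(len(train), len(test), len(folds))
```

Each candidate hyperparameter setting would be scored on the validation folds, and only the final chosen model would be scored once on the 20% test set.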
Estimating sources of Valley Fever pathogen propagation in southern Arizona: A remote sensing approach

NASA Astrophysics Data System (ADS)

Pianalto, Frederick S.

Coccidioidomycosis (Valley Fever) is an environmentally-mediated respiratory disease caused by the inhalation of airborne spores from the fungi Coccidioides spp. The fungi reside in arid and semi-arid soils of the Americas. The disease has increased epidemically in Arizona and other areas within the last two decades. Despite this increase, the ecology of the fungi remains obscure, and environmental antecedents of the disease are largely unstudied. Two sources of soil disturbance, hypothesized to affect soil ecology and initiate spore dissemination, are investigated. Nocturnal desert rodents interact substantially with the soil substrate. Rodents are hypothesized to act as a reservoir of coccidioidomycosis, a mediator of soil properties, and a disseminator of fungal spores. Rodent distributions are poorly mapped for the study area. We build automated multi-linear regression models and decision tree models for ten rodent species using rodent trapping data from the Organ Pipe Cactus National Monument (ORPI) in southwest Arizona with a combination of surface temperature, a vegetation index and its texture, and a suite of topographic rasters. Surface temperature, derived from Landsat TM thermal images, is the most widely selected predictive variable in both automated methods. Construction-related soil disturbance (e.g. road construction, trenching, land stripping, and earthmoving) is a significant source of fugitive dust, which decreases air quality and may carry soil pathogens. Annual differencing of Landsat Thematic Mapper (TM) mid-infrared images is used to create change images, and thresholded change areas are associated with coordinates of local dust inspections. The output metric identifies source areas of soil disturbance, and it estimates the annual amount of dust-producing surface area for eastern Pima County spanning 1994 through 2009. 
Spatially explicit construction-related soil disturbance and rodent abundance data are compared with coccidioidomycosis incidence data using rank order correlation and regression methods. Construction-related soil disturbance correlates strongly with annual county-wide incidence. It also correlates with Tucson periphery incidence aggregated to zip codes. Abundance values for the desert pocket mouse (Chaetodipus penicillatus), derived from a soil-adjusted vegetation index, aspect (northing) and thermal radiance, correlate with total study period incidence aggregated to zip code.
Coalescent histories for caterpillar-like families.

PubMed

Rosenberg, Noah A

2013-01-01

A coalescent history is an assignment of branches of a gene tree to branches of a species tree on which coalescences in the gene tree occur. The number of coalescent histories for a pair consisting of a labeled gene tree topology and a labeled species tree topology is important in gene tree probability computations, and more generally, in studying evolutionary possibilities for gene trees on species trees. Defining the Tr-caterpillar-like family as a sequence of n-taxon trees constructed by replacing the r-taxon subtree of n-taxon caterpillars by a specific r-taxon labeled topology Tr, we examine the number of coalescent histories for caterpillar-like families with matching gene tree and species tree labeled topologies. For each Tr with size r≤8, we compute the number of coalescent histories for n-taxon trees in the Tr-caterpillar-like family. Next, as n→∞, we find that the limiting ratio of the numbers of coalescent histories for the Tr family and caterpillars themselves is correlated with the number of labeled histories for Tr. The results support a view that large numbers of coalescent histories occur when a tree has both a relatively balanced subtree and a high tree depth, contributing to deeper understanding of the combinatorics of gene trees and species trees.
Current and Potential Tree Locations in Tree Line Ecotone of Changbai Mountains, Northeast China: The Controlling Effects of Topography

PubMed Central

Zong, Shengwei; Wu, Zhengfang; Xu, Jiawei; Li, Ming; Gao, Xiaofeng; He, Hongshi; Du, Haibo; Wang, Lei

2014-01-01

The tree line ecotone in the Changbai Mountains has undergone large changes in the past decades. Tree locations show variations on the four sides of the mountains, especially on the northern and western sides, which have not been fully explained. Previous studies attributed such variations to variations in temperature. However, in this study, we hypothesized that topographic controls were responsible for the variations in tree locations in the tree line ecotone of the Changbai Mountains. To test the hypothesis, we used IKONOS images and a WorldView-1 image to identify the tree locations and developed a logistic regression model using topographical variables to identify the dominant controls of the tree locations. The results showed that aspect, wetness, and slope were the dominant controls for tree locations on the western side of the mountains, whereas altitude, SPI, and aspect were the dominant factors on the northern side. The uppermost altitude a tree can currently reach was 2140 m asl on the northern side and 2060 m asl on the western side. The model predictions showed that habitats above the current tree line on both sides were available for trees. Tree recruitment under the current tree line may take advantage of the available habitats at higher elevations based on the current tree locations. Our research confirmed the controlling effects of topography on tree locations in the tree line ecotone of the Changbai Mountains and suggested that it is essential to assess the tree response to topography in research on the tree line ecotone. PMID:25170918
Comparison of Nine Statistical Model Based Warfarin Pharmacogenetic Dosing Algorithms Using the Racially Diverse International Warfarin Pharmacogenetic Consortium Cohort Database

PubMed Central

Liu, Rong; Li, Xi; Zhang, Wei; Zhou, Hong-Hao

2015-01-01

Objective Multiple linear regression (MLR) and machine learning techniques in pharmacogenetic algorithm-based warfarin dosing have been reported. However, the performance of these algorithms in racially diverse groups has never been objectively evaluated and compared. In this literature-based study, we compared the performance of eight machine learning techniques with that of MLR in a large, racially diverse cohort. Methods MLR, artificial neural network (ANN), regression tree (RT), multivariate adaptive regression splines (MARS), boosted regression tree (BRT), support vector regression (SVR), random forest regression (RFR), lasso regression (LAR) and Bayesian additive regression trees (BART) were applied in warfarin dose algorithms in a cohort from the International Warfarin Pharmacogenetics Consortium database. Covariates obtained by stepwise regression from 80% of randomly selected patients were used to develop algorithms. To compare the performance of these algorithms, the mean percentage of patients whose predicted dose fell within 20% of the actual dose (mean percentage within 20%) and the mean absolute error (MAE) were calculated in the remaining 20% of patients. The performance of these techniques in different races, as well as across the dose ranges of therapeutic warfarin, was compared. Robust results were obtained after 100 rounds of resampling. Results BART, MARS and SVR were statistically indistinguishable and significantly outperformed all the other approaches in the whole cohort (MAE: 8.84–8.96 mg/week, mean percentage within 20%: 45.88%–46.35%). In the White population, MARS and BART showed higher mean percentage within 20% and lower mean MAE than those of MLR (all p values < 0.05). In the Asian population, SVR, BART, MARS and LAR performed the same as MLR. MLR and LAR performed best among the Black population. 
When patients were grouped in terms of warfarin dose range, all machine learning techniques except ANN and LAR showed significantly higher mean percentage within 20%, and lower MAE (all p values < 0.05) than MLR in the low- and high-dose ranges. Conclusion Overall, the machine learning-based techniques BART, MARS and SVR performed better than MLR in warfarin pharmacogenetic dosing. Algorithm performance differs among races. Moreover, machine learning-based algorithms tended to perform better than MLR in the low- and high-dose ranges. PMID:26305568
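The two evaluation metrics named in this abstract, mean absolute error and the percentage of patients whose predicted dose falls within 20% of the actual dose, are simple to compute. The sketch below assumes doses in mg/week and uses made-up values; the function names are illustrative and are not taken from the study.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted doses."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def percent_within(actual, predicted, tolerance=0.20):
    """Percentage of cases where the prediction is within tolerance * actual."""
    hits = sum(1 for a, p in zip(actual, predicted)
               if abs(p - a) <= tolerance * a)
    return 100.0 * hits / len(actual)


# Hypothetical weekly doses (mg/week) for four patients.
actual = [35.0, 21.0, 52.5, 28.0]
predicted = [30.0, 28.0, 49.0, 33.0]

print(mean_absolute_error(actual, predicted))  # 5.125
print(percent_within(actual, predicted))       # 75.0
```

In the study design described above, these metrics would be computed on the held-out 20% of patients and averaged over the 100 resampling rounds.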
Classification and regression tree (CART) analyses of genomic signatures reveal sets of tetramers that discriminate temperature optima of archaea and bacteria

PubMed Central

Dyer, Betsey D.; Kahn, Michael J.; LeBlanc, Mark D.

2008-01-01

Classification and regression tree (CART) analysis was applied to genome-wide tetranucleotide frequencies (genomic signatures) of 195 archaea and bacteria. Although genomic signatures have typically been used to classify evolutionary divergence, in this study, convergent evolution was the focus. Temperature optima for most of the organisms examined could be distinguished by CART analyses of tetranucleotide frequencies. This suggests that pervasive (nonlinear) qualities of genomes may reflect certain environmental conditions (such as temperature) in which those genomes evolved. The predominant use of GAGA and AGGA as the discriminating tetramers in CART models suggests that purine-loading and codon biases of thermophiles may explain some of the results. PMID:19054742
Identification of sexually abused female adolescents at risk for suicidal ideations: a classification and regression tree analysis.

PubMed

Brabant, Marie-Eve; Hébert, Martine; Chagnon, François

2013-01-01

This study explored the clinical profiles of 77 female teenage survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideations. Analyses revealed that 64% of participants experienced suicidal ideations. Findings from classification and regression tree analysis indicated that depression, posttraumatic stress symptoms, and hopelessness discriminated profiles of suicidal and nonsuicidal survivors. The elevated prevalence of suicidal ideations among adolescent survivors of sexual abuse underscores the importance of investigating the presence of suicidal ideations in sexual abuse survivors. However, suicidal ideation is not the sole variable that needs to be investigated; depression, hopelessness and posttraumatic stress symptoms are also related to suicidal ideations in survivors and could therefore guide interventions.
Use of Bayesian event trees in semi-quantitative volcano eruption forecasting and hazard analysis

NASA Astrophysics Data System (ADS)

Wright, Heather; Pallister, John; Newhall, Chris

2015-04-01

Use of Bayesian event trees to forecast eruptive activity during volcano crises is an increasingly common practice for the USGS-USAID Volcano Disaster Assistance Program (VDAP) in collaboration with foreign counterparts. This semi-quantitative approach combines conceptual models of volcanic processes with current monitoring data and patterns of occurrence to reach consensus probabilities. This approach allows a response team to draw upon global datasets, local observations, and expert judgment, where the relative influence of these data depends upon the availability and quality of monitoring data and the degree to which the volcanic history is known. The construction of such event trees additionally relies upon existence and use of relevant global databases and documented past periods of unrest. Because relevant global databases may be underpopulated or nonexistent, uncertainty in probability estimations may be large. Our 'hybrid' approach of combining local and global monitoring data and expert judgment facilitates discussion and constructive debate between disciplines: including seismology, gas geochemistry, geodesy, petrology, physical volcanology and technology/engineering, where difference in opinion between response team members contributes to definition of the uncertainty in the probability estimations. In collaboration with foreign colleagues, we have created event trees for numerous areas experiencing volcanic unrest. Event trees are created for a specified time frame and are updated, revised, or replaced as the crisis proceeds. Creation of an initial tree is often prompted by a change in monitoring data, such that rapid assessment of probability is needed. These trees are intended as a vehicle for discussion and a way to document relevant data and models, where the target audience is the scientists themselves. 
However, the probabilities derived through the event-tree analysis can also be used to help inform communications with emergency managers and the public. VDAP trees evaluate probabilities of: magmatic intrusion, likelihood of eruption, magnitude of eruption, and types of associated hazardous events and their extents. In a few cases, trees have been extended to also assess and communicate vulnerability and relative risk.
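The arithmetic behind an event tree of the kind described here is the chaining of conditional probabilities along a branch. The sketch below shows only that step, with hypothetical node names and probabilities; it does not represent the Bayesian updating, the expert elicitation, or any actual VDAP tree.

```python
def path_probability(conditional_probs):
    """Multiply conditional probabilities along one branch of an event tree."""
    p = 1.0
    for q in conditional_probs:
        if not 0.0 <= q <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
        p *= q
    return p


# Hypothetical branch: unrest is magmatic -> eruption occurs -> eruption is large.
branch = [
    0.6,  # P(magmatic intrusion | unrest)
    0.5,  # P(eruption | magmatic intrusion)
    0.2,  # P(large eruption | eruption)
]
p_large = path_probability(branch)
print(p_large)
```

Because each node is conditional on the one before it, revising a single node as monitoring data change updates every outcome downstream of it.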
Evaluation and prediction of shrub cover in coastal Oregon forests (USA)

Treesearch

Becky K. Kerns; Janet L. Ohmann

2004-01-01

We used data from regional forest inventories and research programs, coupled with mapped climatic and topographic information, to explore relationships and develop multiple linear regression (MLR) and regression tree models for total and deciduous shrub cover in the Oregon coastal province. Results from both types of models indicate that forest structure variables were...
An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests

ERIC Educational Resources Information Center

Strobl, Carolin; Malley, James; Tutz, Gerhard

2009-01-01

Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and…
Weighted linear regression using D2H and D2 as the independent variables

Treesearch

Hans T. Schreuder; Michael S. Williams

1998-01-01

Several error structures for weighted regression equations used for predicting volume were examined for 2 large data sets of felled and standing loblolly pine trees (Pinus taeda L.). The generally accepted model with variance of error proportional to the value of the covariate squared ( D2H = diameter squared times height or D...
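The error structure mentioned in this abstract, with error variance proportional to the square of the covariate, corresponds to weighted least squares with weights of 1/x². The sketch below fits a straight line under that assumption. The straight-line model form, the function name, and the data are illustrative assumptions; the truncated abstract does not give the equations actually examined.

```python
def weighted_line_fit(x, y, weights):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(weights)
    swx = sum(w * xi for w, xi in zip(weights, x))
    swy = sum(w * yi for w, yi in zip(weights, y))
    swxx = sum(w * xi * xi for w, xi in zip(weights, x))
    swxy = sum(w * xi * yi for w, xi, yi in zip(weights, x, y))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    return intercept, slope


# Hypothetical D2H values (diameter squared times height) and tree volumes.
d2h = [100.0, 400.0, 900.0, 1600.0]
volume = [0.5 + 0.002 * v for v in d2h]

# Variance proportional to the covariate squared implies weights of 1 / x**2.
weights = [1.0 / v ** 2 for v in d2h]
intercept, slope = weighted_line_fit(d2h, volume, weights)
print(intercept, slope)
```

With these weights, large trees, whose volumes vary more, influence the fitted coefficients less than they would in an unweighted fit.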
Decision tree methods: applications for classification and prediction.

PubMed

Song, Yan-Yan; Lu, Ying

2015-04-25

Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets. The training dataset is used to build a decision tree model, and the validation dataset is used to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms for developing decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.
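The basic step in growing such a tree is choosing, at each node, the split that best separates the target values. The sketch below shows a CART-style search for a regression tree on a single numeric covariate, minimizing the summed squared error on both sides of the split. It is an illustration under those assumptions, not code from the paper, and the data are invented.

```python
def best_split(xs, ys):
    """Return (threshold, sse) for the split x <= threshold that minimizes
    the total squared error around each side's mean.

    threshold is None when no split improves on the unsplit node.
    """
    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(xs, ys))
    best = (None, sse(list(ys)))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical data with an obvious break between x = 3 and x = 10.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 5.0, 20.0, 21.0, 19.0]
threshold, total = best_split(xs, ys)
print(threshold, total)
```

Applying the same search recursively to each side produces the internal nodes, and stopping (or pruning against a validation dataset) leaves the leaf nodes, whose mean target values serve as predictions.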
Simple chained guide trees give high-quality protein multiple sequence alignments

PubMed Central

Boyce, Kieran; Sievers, Fabian; Higgins, Desmond G.

2014-01-01

Guide trees are used to decide the order of sequence alignment in the progressive multiple sequence alignment heuristic. These guide trees are often the limiting factor in making large alignments, and considerable effort has been expended over the years in making these quickly or accurately. In this article we show that, at least for protein families with large numbers of sequences that can be benchmarked with known structures, simple chained guide trees give the most accurate alignments. These also happen to be the fastest and simplest guide trees to construct, computationally. Such guide trees have a striking effect on the accuracy of alignments produced by some of the most widely used alignment packages. There is a marked increase in accuracy and a marked decrease in computational time, once the number of sequences goes much above a few hundred. This is true, even if the order of sequences in the guide tree is random. PMID:25002495
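A chained guide tree adds one sequence at a time to a growing alignment, so its structure is a fully unbalanced "caterpillar". The sketch below writes such a tree in Newick format from a list of sequence names. The function name and the sequence names are hypothetical, and how a particular alignment package accepts a user-supplied guide tree is not covered here.

```python
import random


def chained_guide_tree(names):
    """Build a fully chained guide tree in Newick format.

    Sequences are joined one at a time in the order given,
    e.g. (((A,B),C),D);
    """
    if len(names) < 2:
        raise ValueError("need at least two sequences")
    tree = "({},{})".format(names[0], names[1])
    for name in names[2:]:
        tree = "({},{})".format(tree, name)
    return tree + ";"


names = ["seq1", "seq2", "seq3", "seq4"]
print(chained_guide_tree(names))  # (((seq1,seq2),seq3),seq4);

# The abstract reports that even a random sequence order works well.
shuffled = list(names)
random.Random(0).shuffle(shuffled)
print(chained_guide_tree(shuffled))
```

Building the tree needs no pairwise distance computation, which is why chained trees are the fastest kind of guide tree to construct.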

Which sociodemographic factors are important on smoking behaviour of high school students? The contribution of classification and regression tree methodology in a broad epidemiological survey.

PubMed

Ozge, C; Toros, F; Bayramkaya, E; Camdeviren, H; Sasmaz, T

2006-08-01

The purpose of this study was to evaluate the most important sociodemographic factors affecting the smoking status of high school students using a broad randomised epidemiological survey. Using an in-class, self-administered questionnaire on sociodemographic variables and smoking behaviour, a representative sample of 3304 students in preparatory, 9th, 10th, and 11th grades from 22 randomly selected schools in Mersin was evaluated, and discriminative factors were determined using appropriate statistics. In addition to binary logistic regression analysis, the study evaluated the combined effects of these factors using classification and regression tree methodology as a new statistical method. The data showed that 38% of the students reported lifetime smoking and 16.9% reported current smoking, with a male predominance and increasing prevalence with age. Second-hand smoking was reported at a frequency of 74.3%, predominantly from fathers (56.6%). The significant factors affecting current smoking in these age groups were increased household size, late birth rank, certain school types, low academic performance, increased second-hand smoking, and stress (especially reported as separation from a close friend or violence at home). Classification and regression tree methodology showed the importance of some neglected sociodemographic factors, with a good classification capacity. It was concluded that smoking, being closely related to sociocultural factors, was a common problem in this young population and generated an important academic and social burden in youth life, and that with more data about this behaviour and the use of new statistical methods, effective coping strategies could be developed.
Tree biomass in the Swiss landscape: nationwide modelling for improved accounting for forest and non-forest trees.

PubMed

Price, B; Gomez, A; Mathys, L; Gardi, O; Schellenberger, A; Ginzler, C; Thürig, E

2017-03-01

Trees outside forest (TOF) can perform a variety of social, economic and ecological functions including carbon sequestration. However, detailed quantification of tree biomass is usually limited to forest areas. Taking advantage of structural information available from stereo aerial imagery and airborne laser scanning (ALS), this research models tree biomass using national forest inventory data and linear least-square regression and applies the model both inside and outside of forest to create a nationwide model for tree biomass (above ground and below ground). Validation of the tree biomass model against TOF data within settlement areas shows relatively low model performance (R² of 0.44) but still a considerable improvement on current biomass estimates used for greenhouse gas inventory and carbon accounting. We demonstrate an efficient and easily implementable approach to modelling tree biomass across a large heterogeneous nationwide area. The model offers significant opportunity for improved estimates on land use combination categories (CC) where tree biomass has either not been included or only roughly estimated until now. The ALS biomass model also offers the advantage of providing greater spatial resolution and greater within CC spatial variability compared to the current nationwide estimates.
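As a rough illustration of the least-squares fit and the R² statistic mentioned in this abstract, here is a minimal sketch. It assumes a single hypothetical predictor (canopy height) instead of the structural metrics the authors derived from imagery and ALS, and every data value is invented for the example.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # intercept
    return a, b

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_bar) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical plot data: canopy height (m) and biomass (t/ha).
height = [5.0, 10.0, 15.0, 20.0, 25.0]
biomass = [12.0, 18.0, 33.0, 37.0, 50.0]

a, b = fit_line(height, biomass)
predicted = [a + b * h for h in height]
r2 = r_squared(biomass, predicted)
print(f"biomass = {a:.2f} + {b:.2f} * height, R^2 = {r2:.3f}")
```

In a validation setting such as the one described, R² would be computed on data held out from the fit (here, TOF plots), which is why it can be much lower than the in-sample value.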
Tree mortality following prescribed fire and a storm surge event in Slash Pine (Pinus elliottii var. densa) forests in the Florida Keys, USA

USGS Publications Warehouse

Sah, Jay P.; Ross, Michael S.; Snyder, James R.; Ogurcak, Danielle E.

2010-01-01

In fire-dependent forests, managers are interested in predicting the consequences of prescribed burning on postfire tree mortality. We examined the effects of prescribed fire on tree mortality in Florida Keys pine forests, using a factorial design with understory type, season, and year of burn as factors. We also used logistic regression to model the effects of burn season, fire severity, and tree dimensions on individual tree mortality. Despite limited statistical power due to problems in carrying out the full suite of planned experimental burns, associations with tree and fire variables were observed. Post-fire pine tree mortality was negatively correlated with tree size and positively correlated with char height and percent crown scorch. Unlike post-fire mortality, tree mortality associated with storm surge from Hurricane Wilma was greater in the large size classes. Due to their influence on population structure and fuel dynamics, the size-selective mortality patterns following fire and storm surge have practical importance for using fire as a management tool in Florida Keys pinelands in the future, particularly when the threats to their continued existence from tropical storms and sea level rise are expected to increase.
Nuttall Oak Volume and Weight Tables

Treesearch

Bryce E. Schlaegel; Regan B. Willson

1983-01-01

Volume and weight tables were constructed from a 62-tree sample of Nuttall oak (Quercus nuttallii Palmer) taken in the Mississippi Delta. The tables present volume, green weight, and dry weight of bole wood, bole wood plus bark, and total tree above a one-foot stump as predicted from the nonlinear model Y = 0Db
An Apparatus for Pressure Injection of Solutions into Trees

Treesearch

Thomas W. Jones; Garold F. Gregory

1971-01-01

The construction and use of an apparatus for injecting solutions under pressure into trees is described. A unique and important feature of the apparatus is that it permits injection of solutions into the outermost sapwood. It has been used to inject dye solutions and solubilized benomyl into elm, oak, and maple.
Parks, Trees, and Environmental Justice: Field Notes from Washington, DC

ERIC Educational Resources Information Center

Buckley, Geoffrey L.; Whitmer, Ali; Grove, J. Morgan

2013-01-01

Students enrolled in a graduate seminar benefited in multiple ways from an intensive 3-day field trip to Washington, DC. Constructed around the theme of environmental justice, the trip gave students a chance to learn about street tree distribution, park quality, and racial segregation "up close." Working with personnel from the United…
Radiocarbon content in the annual tree rings during last 150 years and time variation of cosmic rays

NASA Technical Reports Server (NTRS)

Kocharov, G. E.; Metskvarishvili, R. Y.; Tsereteli, S. L.

1985-01-01

The results of high-accuracy measurements of radiocarbon abundance in precisely dated tree rings for the years 1800 to 1950 are discussed. The radiocarbon content caused by solar activity is established. The temporal dependence of cosmic rays is constructed using the radiocarbon abundance data.
The influence of tree morphology on stemflow generation in a tropical lowland rainforest

NASA Astrophysics Data System (ADS)

Uber, Magdalena; Levia, Delphis F.; Zimmermann, Beate; Zimmermann, Alexander

2014-05-01

Even though stemflow usually accounts for only a small proportion of rainfall, it is an important point source of water and ion input to forest floors and may, for instance, influence soil moisture patterns and groundwater recharge. Previous studies showed that the generation of stemflow depends on a multitude of meteorological and biological factors. Interestingly, despite the tremendous progress in stemflow research during the last decades it is still largely unknown which combination of tree characteristics determines stemflow volumes in species-rich tropical forests. This knowledge gap motivated us to analyse the influence of tree characteristics on stemflow volumes in a 1 hectare plot located in a Panamanian lowland rainforest. Our study comprised stemflow measurements in six randomly selected 10 m by 10 m subplots. In each subplot we measured stemflow of all trees with a diameter at breast height (DBH) > 5 cm on an event-basis for a period of six weeks. Additionally, we identified all tree species and determined a set of tree characteristics including DBH, crown diameter, bark roughness, bark furrowing, epiphyte coverage, tree architecture, stem inclination, and crown position. During the sampling period, we collected 985 L of stemflow (0.98 % of total rainfall). Based on regression analyses and comparisons among plant functional groups we show that palms were most efficient in yielding stemflow due to their large inclined fronds. Trees with large emergent crowns also produced relatively large amounts of stemflow. Due to their abundance, understory trees contribute much to stemflow yield not on individual but on the plot scale. Even though parameters such as crown diameter, branch inclination and position of the crown influence stemflow generation to some extent, these parameters explain less than 30 % of the variation in stemflow volumes. 
In contrast to published results from temperate forests, we did not detect a negative correlation between bark roughness and stemflow volume. This is because other parameters such as crown diameter obscured this relationship. Due to multicollinearity and poor correlations between single tree characteristics with stemflow volume, an assessment of stemflow volumes based on forest characteristics remains cumbersome in highly diverse ecosystems. Instead of relying on regression relationships, we therefore advocate a total sampling of trees in several plots to determine stand-scale stemflow yield in tropical forests.
If BZ medium did spanning trees these would be the same trees as Physarum built

NASA Astrophysics Data System (ADS)

Adamatzky, Andrew

2009-03-01

A sub-excitable Belousov-Zhabotinsky (BZ) medium exhibits self-localized wave-fragments which may travel for a relatively long time while preserving their shape. Using the Oregonator model of the BZ medium, we imitate the foraging behavior of a true slime mold, Physarum polycephalum, on a nutrient-poor substrate. We show that, given erosion post-processing operations, the BZ medium can approximate a spanning tree of a planar set and thus is computationally equivalent to Physarum in the domain of proximity graph construction.
Structural system reliability calculation using a probabilistic fault tree analysis method

NASA Technical Reports Server (NTRS)

Torng, T. Y.; Wu, Y.-T.; Millwater, H. R.

1992-01-01

The development of a new probabilistic fault tree analysis (PFTA) method for calculating structural system reliability is summarized. The proposed PFTA procedure includes: developing a fault tree to represent the complex structural system, constructing an approximation function for each bottom event, determining a dominant sampling sequence for all bottom events, and calculating the system reliability using an adaptive importance sampling method. PFTA is suitable for complicated structural problems that require computer-intensive calculations. A computer program has been developed to implement the PFTA.
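To make the fault-tree idea concrete, the sketch below evaluates the top-event probability of a small AND/OR tree. It is not the PFTA procedure summarized above (it uses no approximation functions and no adaptive importance sampling); it assumes statistically independent bottom events that each appear only once, and the tree layout and probabilities are hypothetical.

```python
def failure_probability(node, p_basic):
    """Top-event probability of an AND/OR fault tree.

    node is either a bottom-event name (str) or a tuple
    (gate, child, child, ...) with gate "AND" or "OR".
    Assumes independent bottom events, each used once in the tree.
    """
    if isinstance(node, str):
        return p_basic[node]
    gate, *children = node
    probs = [failure_probability(child, p_basic) for child in children]
    if gate == "AND":
        # All children must fail: multiply the probabilities.
        result = 1.0
        for p in probs:
            result *= p
        return result
    if gate == "OR":
        # At least one child fails: complement of "none fail".
        none_fail = 1.0
        for p in probs:
            none_fail *= 1.0 - p
        return 1.0 - none_fail
    raise ValueError(f"unknown gate: {gate!r}")

# Hypothetical system: fails if (member_a AND member_b) fail, OR the joint fails.
tree = ("OR", ("AND", "member_a", "member_b"), "joint")
p_basic = {"member_a": 0.1, "member_b": 0.2, "joint": 0.05}
p_top = failure_probability(tree, p_basic)
print(f"top-event probability: {p_top:.4f}")
```

When bottom events are correlated or defined by limit-state functions, this closed-form product rule no longer applies, which is the kind of case where sampling-based methods become necessary.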
Infinite tension limit of the pure spinor superstring

NASA Astrophysics Data System (ADS)

Berkovits, Nathan

2014-03-01

Mason and Skinner recently constructed a chiral infinite tension limit of the Ramond-Neveu-Schwarz superstring which was shown to compute the Cachazo-He-Yuan formulae for tree-level d = 10 Yang-Mills amplitudes and the NS-NS sector of tree-level d = 10 supergravity amplitudes. In this letter, their chiral infinite tension limit is generalized to the pure spinor superstring which computes a d = 10 superspace version of the Cachazo-He-Yuan formulae for tree-level d = 10 super-Yang-Mills and supergravity amplitudes.
M5 model tree based predictive modeling of road accidents on non-urban sections of highways in India.

PubMed

Singh, Gyanendra; Sachdeva, S N; Pal, Mahesh

2016-11-01

This work examines the application of the M5 model tree and the conventionally used fixed/random effect negative binomial (FENB/RENB) regression models for accident prediction on non-urban sections of highway in Haryana (India). Road accident data for a period of 2-6 years on different sections of 8 National and State Highways in Haryana were collected from police records. Data on road geometry, traffic and road environment variables were collected through field studies. A total of 222 data points were gathered by dividing highways into sections with certain uniform geometric characteristics. For prediction of accident frequencies using fifteen input parameters, two modeling approaches were used: FENB/RENB regression and the M5 model tree. Results suggest that both models perform comparably well in terms of correlation coefficient and root mean square error values. The M5 model tree provides simple linear equations that are easy to interpret and provide better insight, indicating that this approach can effectively be used as an alternative to the RENB approach if the sole purpose is to predict motor vehicle crashes. Sensitivity analysis using the M5 model tree also suggests that its results reflect the physical conditions. Both models clearly indicate that, to improve safety on Indian highways, minor accesses to the highways need to be properly designed and controlled, service roads need to be made functional, and the dispersion of speeds needs to be brought down. Copyright © 2016 Elsevier Ltd. All rights reserved.
Presence of indicator plant species as a predictor of wetland vegetation integrity

USGS Publications Warehouse

Stapanian, Martin A.; Adams, Jean V.; Gara, Brian

2013-01-01

We fit regression and classification tree models to vegetation data collected from Ohio (USA) wetlands to determine (1) which species best predict Ohio vegetation index of biotic integrity (OVIBI) score and (2) which species best predict high-quality wetlands (OVIBI score >75). The simplest regression tree model predicted OVIBI score based on the occurrence of three plant species: skunk-cabbage (Symplocarpus foetidus), cinnamon fern (Osmunda cinnamomea), and swamp rose (Rosa palustris). The lowest OVIBI scores were best predicted by the absence of the selected plant species rather than by the presence of other species. The simplest classification tree model predicted high-quality wetlands based on the occurrence of two plant species: skunk-cabbage and marsh-fern (Thelypteris palustris). The overall misclassification rate from this tree was 13 %. Again, low-quality wetlands were better predicted than high-quality wetlands by the absence of selected species rather than the presence of other species using the classification tree model. Our results suggest that a species’ wetland status classification and coefficient of conservatism are of little use in predicting wetland quality. A simple, statistically derived species checklist such as the one created in this study could be used by field biologists to quickly and efficiently identify wetland sites likely to be regulated as high-quality, and requiring more intensive field assessments. Alternatively, it can be used for advanced determinations of low-quality wetlands. Agencies can save considerable money by screening wetlands for the presence/absence of such “indicator” species before issuing permits.
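The core step in building a regression tree on presence/absence data can be sketched as follows: for each species, split the sites into "present" and "absent" groups and keep the species whose split most reduces the sum of squared errors of the score. This is a generic CART-style split search, not the authors' fitted model; the site data and scores are invented, and the species keys are used only as illustrative column names.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_presence_split(rows, scores):
    """Find the species whose presence/absence split most reduces SSE.

    rows   : list of dicts mapping species name -> 1 (present) or 0 (absent)
    scores : response value for each row (e.g. an integrity score)
    Returns (species, sse_reduction); species is None if no split helps.
    """
    parent = sse(scores)
    best = (None, 0.0)
    for species in rows[0]:
        present = [s for r, s in zip(rows, scores) if r[species] == 1]
        absent = [s for r, s in zip(rows, scores) if r[species] == 0]
        if not present or not absent:
            continue  # a split must leave sites on both sides
        gain = parent - (sse(present) + sse(absent))
        if gain > best[1]:
            best = (species, gain)
    return best

# Hypothetical wetland sites and scores.
sites = [
    {"skunk_cabbage": 1, "swamp_rose": 1},
    {"skunk_cabbage": 1, "swamp_rose": 0},
    {"skunk_cabbage": 0, "swamp_rose": 1},
    {"skunk_cabbage": 0, "swamp_rose": 0},
    {"skunk_cabbage": 1, "swamp_rose": 1},
    {"skunk_cabbage": 0, "swamp_rose": 0},
]
scores = [80, 75, 40, 30, 85, 35]

species, gain = best_presence_split(sites, scores)
print(species, gain)
```

A full tree would apply the same search recursively to each side of the chosen split and then prune, which is how a "simplest" tree with only a few species can emerge.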
Automatic energy expenditure measurement for health science.

PubMed

Catal, Cagatay; Akbulut, Akhan

2018-04-01

It is crucial to accurately predict human energy expenditure in any sports activity and health science application in order to investigate the impact of the activity. However, measuring real energy expenditure is not a trivial task and involves complex steps. The objective of this work is to improve the performance of existing energy expenditure estimation models by using machine learning algorithms and data from several sensors, and to provide this estimation service on a cloud-based platform. In this study, we used input data such as breathing rate and heart rate from three sensors. Inputs are received from a web form and sent to a web service that applies a regression model on the Azure cloud platform. During the experiments, we assessed several machine learning models based on regression methods. Our experimental results showed that our novel model, which applies Boosted Decision Tree Regression in conjunction with the median aggregation technique, provides the best result compared with five other regression algorithms. This cloud-based energy expenditure system, which uses a web service, showed that cloud computing technology is a great opportunity for developing estimation systems, and that the new model, which applies Boosted Decision Tree Regression with median aggregation, provides remarkable results. Copyright © 2018 Elsevier B.V. All rights reserved.
Phenological models to predict the main flowering phases of olive (Olea europaea L.) along a latitudinal and longitudinal gradient across the Mediterranean region

NASA Astrophysics Data System (ADS)

Aguilera, Fátima; Fornaciari, Marco; Ruiz-Valenzuela, Luis; Galán, Carmen; Msallem, Monji; Dhiab, Ali Ben; la Guardia, Consuelo Díaz-de; del Mar Trigo, María; Bonofiglio, Tommaso; Orlandi, Fabio

2015-05-01

The aim of the present study was to develop pheno-meteorological models to explain and forecast the main olive flowering phenological phases within the Mediterranean basin, across a latitudinal and longitudinal gradient that includes Tunisia, Spain, and Italy. To analyze the aerobiological sampling points, study periods from 13 years (1999-2011) to 19 years (1993-2011) were used. The forecasting models were constructed using partial least-squares regression, considering both the flowering start and full-flowering dates as dependent variables. The percentages of variance explained by the full-flowering models (mean 84 %) were greater than those explained by the flowering start models (mean 77 %). Moreover, given the time lag from the North African areas to the central Mediterranean areas in the main olive flowering dates, the regional full-flowering predictive models are proposed as the most useful to improve the knowledge of the influence of climate on the olive tree floral phenology. The meteorological parameters related to the previous autumn and both the winter and the spring seasons, and above all the temperatures, regulate the reproductive phenology of olive trees in the Mediterranean area. The mean anticipation of flowering start and full flowering for the future period from 2081 to 2100 was estimated at 10 and 12 days, respectively. One question can be raised: Will the olive trees located in the warmest areas be northward displaced or will they be able to adapt their physiology in response to the higher temperatures? The present study can be considered as an approach to design more detailed future bioclimate research.
Use of GLM approach to assess the responses of tropical trees to urban air pollution in relation to leaf functional traits and tree characteristics.

PubMed

Mukherjee, Arideep; Agrawal, Madhoolika

2018-05-15

Responses of urban vegetation to air pollution stress in relation to tolerance and sensitivity have been extensively studied; however, studies of air pollution responses based on different leaf functional traits and tree characteristics are limited. In this paper, we assess the combined and individual effects of the major air pollutants PM10 (particulate matter ≤ 10 µm), TSP (total suspended particulate matter), SO2 (sulphur dioxide), NO2 (nitrogen dioxide) and O3 (ozone) on thirteen tropical tree species in relation to fifteen leaf functional traits and different tree characteristics. Stepwise linear regression, a general linear modelling approach, was used to quantify the response of trees to air pollutants. The study was performed over six successive seasons across two years in three distinct urban areas (traffic, industrial and residential) of Varanasi city in India. At all the study sites, concentrations of air pollutants, specifically PM (particulate matter) and NO2, were above the specified standards. Distinct variations were recorded in all fifteen leaf functional traits with pollution load. Caesalpinia sappan was identified as the most tolerant species, followed by Psidium guajava, Dalbergia sissoo and Albizia lebbeck. Stepwise regression analysis identified the maximum response to air pollutants in Eucalyptus citriodora and P. guajava, explaining overall 59% and 58% of the variability in leaf functional traits, respectively. Among leaf functional traits, the maximum effect of air pollutants was observed on non-enzymatic antioxidants, followed by photosynthetic pigments and leaf water status. Among the pollutants, PM was identified as the major stress factor, followed by O3, explaining 47% and 33% of the variability in leaf functional traits, respectively. Tolerance and pollution response were regulated by different tree characteristics such as height, canopy size, leaf form, texture and nature of tree. 
The outcomes of this study will help in urban forest development through the selection of specific pollutant-tolerant tree species and leaf traits suitable as an air pollution mitigation measure. Copyright © 2018 Elsevier Inc. All rights reserved.
Constructing bald eagle nests with natural materials

Treesearch

T. G. Grubb

1995-01-01

A technique for using natural materials to build artificial nests for bald eagles (Haliaeetus leucocephalus) and other raptors is detailed. Properly constructed nests are as permanently secured to the nest tree or cliff substrate as any eagle-built nest or human-made platform. Construction normally requires about three hours and at least two people. This technique is...
Japanese flowering cherry tree as a woody plant candidate grown in space

NASA Astrophysics Data System (ADS)

Tomita-Yokotani, K.; Yoshida, S.; Hashimoto, H.; Nyunoya, H.; Funada, R.; Katayama, T.; Suzuki, T.; Honma, T.; Nagatomo, M.; Nakamura, T.

We are proposing to raise woody plants in space for several applications. The Japanese flowering cherry tree is a candidate for wood science in space. The mechanism of sensing gravity and controlling the shape of the tree has been studied quite extensively. Cherry mutants associated with gravity reveal the plant hormones and molecular machinery responsible for plant adaptation to the action of gravity. Space experiments using our wood model will contribute to understanding the molecular and cellular processes of gravitropism in plants. Trees are considered an important member of space agriculture: they produce excess oxygen and wooden materials for constructing a living environment, and provide biomass for cultivating mushrooms and insects. Furthermore, trees and their flowers improve quality of life in the stressful environment of outer space.
Pattern Matcher for Trees Constructed from Lists

NASA Technical Reports Server (NTRS)

James, Mark

2007-01-01

A software library has been developed that takes a high-level description of a pattern to be satisfied and applies it to a target. If the two match, it returns success; otherwise, it indicates a failure. The target is semantically a tree that is constructed from elements of terminal and non-terminal nodes represented through lists and symbols. Additionally, functionality is provided for finding the element in a set that satisfies a given pattern and doing a tree search, finding all occurrences of leaf nodes that match a given pattern. This process is valuable because it is a new algorithmic approach that significantly improves the productivity of the programmers and has the potential of making their resulting code more efficient by the introduction of a novel semantic representation language. This software has been used in many applications delivered to NASA and private industry, and the cost savings that have resulted from it are significant.
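A minimal sketch of this kind of matcher, written in Python with nested lists standing in for the list-built tree, is shown below. The library's actual pattern language is not described in the abstract, so the conventions here are assumptions: strings starting with "?" are pattern variables, everything else must match literally, and the function names are hypothetical.

```python
def match(pattern, target, bindings=None):
    """Return a dict of variable bindings if pattern matches target, else None.

    Strings beginning with "?" are variables; a variable that occurs twice
    must bind to equal subtrees both times.
    """
    if bindings is None:
        bindings = {}
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings if bindings[pattern] == target else None
        return {**bindings, pattern: target}
    if isinstance(pattern, list) and isinstance(target, list):
        if len(pattern) != len(target):
            return None
        for sub_pattern, sub_target in zip(pattern, target):
            bindings = match(sub_pattern, sub_target, bindings)
            if bindings is None:
                return None
        return bindings
    # Terminal nodes (symbols, numbers) must be equal.
    return bindings if pattern == target else None

def find_leaves(pattern, tree):
    """All leaf (non-list) nodes of tree that match pattern, depth-first."""
    if isinstance(tree, list):
        found = []
        for child in tree:
            found.extend(find_leaves(pattern, child))
        return found
    return [tree] if match(pattern, tree) is not None else []

# Target tree: (add (mul x 2) (mul x 3)) written as nested lists.
tree = ["add", ["mul", "x", 2], ["mul", "x", 3]]

ok = match(["add", ["mul", "?v", "?a"], ["mul", "?v", "?b"]], tree)
fail = match(["add", "?same", "?same"], tree)
leaves = find_leaves("x", tree)
print(ok, fail, leaves)
```

Note that a successful match with no variables returns an empty dict, so callers should test the result with `is not None` and not by truthiness.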
Boolean logic tree of graphene-based chemical system for molecular computation and intelligent molecular search query.

PubMed

Huang, Wei Tao; Luo, Hong Qun; Li, Nian Bing

2014-05-06

The most serious, and yet unsolved, problem in constructing molecular computing devices is connecting all of these molecular events into a usable device. This report demonstrates the use of a Boolean logic tree for analyzing the chemical event network based on graphene, organic dye, thrombin aptamer, and the Fenton reaction, and for organizing and connecting these basic chemical events. This chemical event network can be utilized to implement fluorescent combinatorial logic (including basic logic gates and complex integrated logic circuits) and fuzzy logic computing. On the basis of the Boolean logic tree analysis and logic computing, these basic chemical events can be considered as programmable "words" and chemical interactions as "syntax" logic rules to construct a molecular search engine for performing intelligent molecular search queries. Our approach is helpful in developing advanced logic programs based on molecules for application in biosensing, nanotechnology, and drug delivery.

The origin of parasitism gene in nematodes: evolutionary analysis through the construction of domain trees.

PubMed

Yang, Yizi; Luo, Damin

2013-01-01

Inferring evolutionary history of parasitism genes is important to understand how evolutionary mechanisms affect the occurrences of parasitism genes. In this study, we constructed multiple domain trees for parasitism genes and genes under free-living conditions. Further analyses of horizontal gene transfer (HGT)-like phylogenetic incongruences, duplications, and speciations were performed based on these trees. By comparing these analyses, the contributions of pre-adaptations were found to be more important to the evolution of parasitism genes than those of duplications, and pre-adaptations are as crucial as previously reported HGTs to parasitism. Furthermore, speciation may also affect the evolution of parasitism genes. In addition, Pristionchus pacificus was suggested to be a common model organism for studies of parasitic nematodes, including root-knot species. These analyses provided information regarding mechanisms that may have contributed to the evolution of parasitism genes.
Comparing Phylogenetic Trees by Matching Nodes Using the Transfer Distance Between Partitions

PubMed Central

Giaro, Krzysztof

2017-01-01

The ability to quantify the dissimilarity of different phylogenetic trees describing the relationship between the same group of taxa is required in various types of phylogenetic studies. For example, such metrics are used to assess the quality of phylogeny construction methods, to define optimization criteria in supertree building algorithms, or to find horizontal gene transfer (HGT) events. Among the metrics described so far in the literature, the most commonly used seems to be the Robinson–Foulds distance. In this article, we define a new metric for rooted trees, the Matching Pair (MP) distance. The MP metric uses the concept of the minimum-weight perfect matching in a complete bipartite graph constructed from partitions of all pairs of leaves of the compared phylogenetic trees. We analyze the properties of the MP metric and present computational experiments showing its potential applicability in tasks related to finding the HGT events. PMID:28177699
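For orientation, the Robinson–Foulds baseline mentioned in this abstract can be sketched for rooted trees as the number of clusters (leaf sets below an internal node) found in one tree but not the other. This is the standard cluster-based form, not the MP metric the article introduces; the nested-tuple tree encoding and the example taxa are assumptions made for illustration.

```python
def clusters(tree):
    """Set of leaf sets (frozensets) below each internal node of a rooted tree.

    A tree is a leaf name (str) or a tuple of subtrees.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    leaves(tree)
    return found

def rf_distance(tree_a, tree_b):
    """Rooted Robinson-Foulds distance: size of the symmetric difference
    of the two cluster sets (some authors halve this value)."""
    return len(clusters(tree_a) ^ clusters(tree_b))

# Two hypothetical rooted trees on the same four taxa.
t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "c"), "b"), "d")

print(rf_distance(t1, t2))
```

Because it only counts exact cluster agreement, moving a single leaf can change many clusters at once, which is one motivation for matching-based metrics such as the one proposed here.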
Contaminant Gradients in Trees: Directional Tree Coring Reveals Boundaries of Soil and Soil-Gas Contamination with Potential Applications in Vapor Intrusion Assessment.

PubMed

Wilson, Jordan L; Samaranayake, V A; Limmer, Matthew A; Schumacher, John G; Burken, Joel G

2017-12-19

Contaminated sites pose ecological and human-health risks through exposure to contaminated soil and groundwater. Whereas we can readily locate, monitor, and track contaminants in groundwater, it is harder to perform these tasks in the vadose zone. In this study, tree-core samples were collected at a Superfund site to determine if the sample-collection location around a particular tree could reveal the subsurface location, or direction, of soil and soil-gas contaminant plumes. Contaminant-centroid vectors were calculated from tree-core data to reveal contaminant distributions in directional tree samples at a higher resolution, and vectors were correlated with soil-gas characterization collected using conventional methods. Results clearly demonstrated that directional tree coring around tree trunks can indicate gradients in soil and soil-gas contaminant plumes, and the strength of the correlations was directly proportional to the magnitude of tree-core concentration gradients (Spearman's coefficient of -0.61 and -0.55 in soil and tree-core gradients, respectively). Linear regression indicates agreement between the concentration-centroid vectors is significantly affected by in planta and soil concentration gradients and when concentration centroids in soil are closer to trees. Given the existing link between soil-gas and vapor intrusion, this study also indicates that directional tree coring might be applicable in vapor intrusion assessment.
Contaminant gradients in trees: Directional tree coring reveals boundaries of soil and soil-gas contamination with potential applications in vapor intrusion assessment

USGS Publications Warehouse

Wilson, Jordan L.; Samaranayake, V.A.; Limmer, Matthew A.; Schumacher, John G.; Burken, Joel G.

2017-01-01

Contaminated sites pose ecological and human-health risks through exposure to contaminated soil and groundwater. Whereas we can readily locate, monitor, and track contaminants in groundwater, it is harder to perform these tasks in the vadose zone. In this study, tree-core samples were collected at a Superfund site to determine if the sample-collection location around a particular tree could reveal the subsurface location, or direction, of soil and soil-gas contaminant plumes. Contaminant-centroid vectors were calculated from tree-core data to reveal contaminant distributions in directional tree samples at a higher resolution, and vectors were correlated with soil-gas characterization collected using conventional methods. Results clearly demonstrated that directional tree coring around tree trunks can indicate gradients in soil and soil-gas contaminant plumes, and the strength of the correlations was directly proportional to the magnitude of tree-core concentration gradients (Spearman’s coefficient of -0.61 and -0.55 in soil and tree-core gradients, respectively). Linear regression indicates agreement between the concentration-centroid vectors is significantly affected by in-planta and soil concentration gradients and when concentration centroids in soil are closer to trees. Given the existing link between soil-gas and vapor intrusion, this study also indicates that directional tree coring might be applicable in vapor intrusion assessment.
Above ground biomass and tree species richness estimation with airborne lidar in tropical Ghana forests

NASA Astrophysics Data System (ADS)

Vaglio Laurin, Gaia; Puletti, Nicola; Chen, Qi; Corona, Piermaria; Papale, Dario; Valentini, Riccardo

2016-10-01

Estimates of forest aboveground biomass are fundamental for carbon monitoring and accounting; delivering information at very high spatial resolution is especially valuable for local management, conservation and selective logging purposes. In tropical areas, hosting large biomass and biodiversity resources which are often threatened by unsustainable anthropogenic pressures, frequent forest resources monitoring is needed. Lidar is a powerful tool to estimate aboveground biomass at fine resolution; however, its application in tropical forests has been limited, with high variability in the accuracy of results. Lidar pulses scan the forest vertical profile, and can provide structure information which is also linked to biodiversity. In the last decade the remote sensing of biodiversity has received great attention, but few studies have focused on the use of lidar for assessing tree species richness in tropical forests. This research aims at estimating aboveground biomass and tree species richness using discrete return airborne lidar in Ghana forests. We tested an advanced statistical technique, Multivariate Adaptive Regression Splines (MARS), which does not require assumptions on data distribution or on the relationships between variables, making it suitable for studying ecological variables. We compared the MARS regression results with those obtained by multilinear regression and found that both algorithms were effective, but MARS provided higher accuracy for both biomass (R² = 0.72) and species richness (R² = 0.64). We also noted a strong correlation between biodiversity and biomass field values. Even if the forest areas under analysis are limited in extent and represent peculiar ecosystems, the preliminary indications produced by our study suggest that instruments such as lidar, specifically useful for pinpointing forest structure, can also be exploited as a support for tree species richness assessment.
Node degree distribution in spanning trees

NASA Astrophysics Data System (ADS)

Pozrikidis, C.

2016-03-01

A method is presented for computing the number of spanning trees involving one link or a specified group of links, and excluding another link or a specified group of links, in a network described by a simple graph in terms of derivatives of the spanning-tree generating function defined with respect to the eigenvalues of the Kirchhoff (weighted Laplacian) matrix. The method is applied to deduce the node degree distribution in a complete or randomized set of spanning trees of an arbitrary network. An important feature of the proposed method is that the explicit construction of spanning trees is not required. It is shown that the node degree distribution in the spanning trees of the complete network is described by the binomial distribution. Numerical results are presented for the node degree distribution in square, triangular, and honeycomb lattices.
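The counting problem described here can be illustrated with Kirchhoff's matrix-tree theorem: the number of spanning trees equals any cofactor of the graph Laplacian. The sketch below is not the generating-function method of the paper; it uses exact determinants over fractions, and it counts trees containing a given link by the simple deletion identity (total minus the count for the graph without that link). Function names and the example graph are assumptions.

```python
from fractions import Fraction

def count_spanning_trees(n, edges):
    """Number of spanning trees of a simple graph on nodes 0..n-1."""
    # Laplacian (Kirchhoff) matrix L = D - A.
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    # Delete the last row and column, then take the determinant
    # by exact Gaussian elimination.
    m = [row[:-1] for row in lap[:-1]]
    size = n - 1
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return 0  # singular minor: the graph is disconnected
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return int(det)

def trees_containing(n, edges, edge):
    """Spanning trees that use `edge`: total minus those avoiding it."""
    rest = [e for e in edges if set(e) != set(edge)]
    return count_spanning_trees(n, edges) - count_spanning_trees(n, rest)

# Complete graph on 4 nodes and a 4-cycle.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]

print(count_spanning_trees(4, k4), trees_containing(4, k4, (0, 1)))
print(count_spanning_trees(4, cycle4))
```

As in the abstract, no spanning tree is ever constructed explicitly; only determinants of the Laplacian are evaluated.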
Red-shouldered hawk nesting habitat preference in south Texas

USGS Publications Warehouse

Strobel, Bradley N.; Boal, Clint W.

2010-01-01

We examined nesting habitat preference by red-shouldered hawks Buteo lineatus using conditional logistic regression on characteristics measured at 27 occupied nest sites and 68 unused sites in 2005–2009 in south Texas. We measured vegetation characteristics of individual trees (nest trees and unused trees) and corresponding 0.04-ha plots. We evaluated the importance of tree and plot characteristics to nesting habitat selection by comparing a priori tree-specific and plot-specific models using Akaike's information criterion. Models with only plot variables carried 14% more weight than models with only center tree variables. The model-averaged odds ratios indicated red-shouldered hawks selected to nest in taller trees and in areas with higher average diameter at breast height than randomly available within the forest stand. Relative to randomly selected areas, each 1-m increase in nest tree height and 1-cm increase in the plot average diameter at breast height increased the probability of selection by 85% and 10%, respectively. Our results indicate that red-shouldered hawks select nesting habitat based on vegetation characteristics of individual trees as well as the 0.04-ha area surrounding the tree. Our results indicate forest management practices resulting in tall forest stands with large average diameter at breast height would benefit red-shouldered hawks in south Texas.
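The model comparison described here relies on weights derived from Akaike's information criterion. A minimal sketch of how such weights are usually computed follows; the AIC values are invented for illustration and are not taken from the study.

```python
import math

def akaike_weights(aic_values):
    """Convert AIC scores into relative model weights that sum to 1."""
    best = min(aic_values)
    # Relative likelihood of each model given its AIC difference from the best.
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical AIC scores for a plot-only and a tree-only model.
weights = akaike_weights([100.0, 102.0])
print([round(w, 3) for w in weights])
```

A lower AIC yields a larger weight, so statements such as "carried more weight" compare these normalized values across candidate models.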
Improved FTA methodology and application to subsea pipeline reliability design.

PubMed

Lin, Jing; Yuan, Yongbo; Zhang, Mingyuan

2014-01-01

An innovative logic tree, Failure Expansion Tree (FET), is proposed in this paper, which improves on traditional Fault Tree Analysis (FTA). It describes a different thinking approach for risk factor identification and reliability risk assessment. By providing a more comprehensive and objective methodology, the rather subjective nature of FTA node discovery is significantly reduced and the resulting mathematical calculations for quantitative analysis are greatly simplified. Applied to the Useful Life phase of a subsea pipeline engineering project, the approach provides a more structured analysis by constructing a tree following the laws of physics and geometry. Resulting improvements are summarized in comparison table form.
Improved FTA Methodology and Application to Subsea Pipeline Reliability Design

PubMed Central

Lin, Jing; Yuan, Yongbo; Zhang, Mingyuan

2014-01-01

An innovative logic tree, Failure Expansion Tree (FET), is proposed in this paper, which improves on traditional Fault Tree Analysis (FTA). It describes a different thinking approach for risk factor identification and reliability risk assessment. By providing a more comprehensive and objective methodology, the rather subjective nature of FTA node discovery is significantly reduced and the resulting mathematical calculations for quantitative analysis are greatly simplified. Applied to the Useful Life phase of a subsea pipeline engineering project, the approach provides a more structured analysis by constructing a tree following the laws of physics and geometry. Resulting improvements are summarized in comparison table form. PMID:24667681
Intratumoral heterogeneity analysis reveals hidden associations between protein expression losses and patient survival in clear cell renal cell carcinoma

PubMed Central

Devarajan, Karthik; Parsons, Theodore; Wang, Qiong; O'Neill, Raymond; Solomides, Charalambos; Peiper, Stephen C.; Testa, Joseph R.; Uzzo, Robert; Yang, Haifeng

2017-01-01

Intratumoral heterogeneity (ITH) is a prominent feature of kidney cancer. It is not known whether it has utility in finding associations between protein expression and clinical parameters. We used ITH that is detected by immunohistochemistry (IHC) to aid the association analysis between the loss of SWI/SNF components and clinical parameters. 160 ccRCC tumors (40 per tumor stage) were used to generate a tissue microarray (TMA). Four foci from different regions of each tumor were selected. IHC was performed against PBRM1, ARID1A, SETD2, SMARCA4, and SMARCA2. Statistical analyses were performed to correlate biomarker losses with patho-clinical parameters. Categorical variables were compared between groups using Fisher's exact tests. Univariate and multivariable analyses were used to correlate biomarker changes and patient survival. Multivariable analyses were performed by constructing decision trees using the classification and regression trees (CART) methodology. IHC detected widespread ITH in ccRCC tumors. The statistical analysis of the “Truncal loss” (root loss) found more correlations between biomarker losses and tumor stages than the traditional “Loss in tumor (total)”. Losses of SMARCA4 or SMARCA2 significantly improved prognosis for overall survival (OS). Losses of PBRM1, ARID1A or SETD2 had the opposite effect. Thus “Truncal Loss” analysis revealed hidden links between protein losses and patient survival in ccRCC. PMID:28445125
Static terrestrial laser scanning of juvenile understory trees for field phenotyping

NASA Astrophysics Data System (ADS)

Wang, Huanhuan; Lin, Yi

2014-11-01

This study tested the cutting-edge 3D remote sensing technique of static terrestrial laser scanning (TLS) for parametric 3D reconstruction of juvenile understory trees. The test data were collected with a Leica HDS6100 TLS system in single-scan mode. The geometrical structures of juvenile understory trees are extracted by model fitting. Cones are used to model trunks and branches. Principal component analysis (PCA) is adopted to calculate their major axes. Coordinate transformation and orthogonal projection are used to estimate the parameters of the cones. Then, AutoCAD is utilized to simulate the morphological characteristics of the understory trees, and to add secondary branches and leaves in a random way. Comparison of the reference values and the estimated values gives the regression equation and shows that the proposed parameter-extraction algorithm is credible. The results broadly verify the applicability of TLS for field phenotyping of juvenile understory trees.
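The PCA step mentioned above (finding the major axis of a trunk or branch from its scan points) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: it uses power iteration on the 3 × 3 covariance matrix, and the function name and sample points are hypothetical.

```python
import math

def major_axis(points, iterations=200):
    """Return a unit vector along the direction of greatest variance."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    centered = [[p[i] - mean[i] for i in range(3)] for p in points]
    # 3 x 3 covariance matrix of the centered point cloud.
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(3)]
           for i in range(3)]
    # Power iteration; the start vector is an arbitrary choice and would
    # fail only if it were exactly orthogonal to the dominant axis.
    v = [1.0, 1.0, 1.0]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

# Hypothetical points scattered tightly around a vertical stem.
stem = [(0.01 * (i % 3 - 1), 0.02 * ((i * 7) % 5 - 2), float(i)) for i in range(20)]
print(major_axis(stem))
```

The returned axis could then serve as the new z-direction for the coordinate transformation and orthogonal projection that the abstract describes.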
Spectral analysis of white ash response to emerald ash borer infestations

NASA Astrophysics Data System (ADS)

Calandra, Laura

The emerald ash borer (EAB) (Agrilus planipennis Fairmaire) is an invasive insect that has killed over 50 million ash trees in the US. The goal of this research was to establish a method to identify ash trees infested with EAB using remote sensing techniques at the leaf level and tree crown level. First, a field-based study at the leaf level used the range of spectral bands from the WorldView-2 sensor to determine if there was a significant difference between EAB-infested white ash (Fraxinus americana) leaves and healthy leaves. Binary logistic regression models were developed using individual wavelengths and combinations of wavelengths; the most successful model included the 545 and 950 nm bands. The second half of this research employed imagery to identify healthy and EAB-infested trees, comparing pixel- and object-based methods by applying an unsupervised classification approach and a tree crown delineation algorithm, respectively. The pixel-based models attained the highest overall accuracies.
Improving medical diagnosis reliability using Boosted C5.0 decision tree empowered by Particle Swarm Optimization.

PubMed

Pashaei, Elnaz; Ozen, Mustafa; Aydin, Nizamettin

2015-08-01

Improving the accuracy of supervised classification algorithms in biomedical applications is an active area of research. In this study, we improve the performance of Particle Swarm Optimization (PSO) combined with the C4.5 decision tree (PSO+C4.5) classifier by applying a Boosted C5.0 decision tree as the fitness function. To evaluate the effectiveness of our proposed method, it is implemented on 1 microarray dataset and 5 different medical data sets obtained from UCI machine learning databases. Moreover, the results of the PSO + Boosted C5.0 implementation are compared to eight well-known benchmark classification methods (PSO+C4.5, support vector machine under the kernel of Radial Basis Function, Classification And Regression Tree (CART), C4.5 decision tree, C5.0 decision tree, Boosted C5.0 decision tree, Naive Bayes and Weighted K-Nearest neighbor). Repeated five-fold cross-validation was used to assess the performance of the classifiers. Experimental results show that our proposed method not only improves the performance of PSO+C4.5 but also obtains higher classification accuracy than the other classification methods.
Technology Tips: Using the Iterate Command to Construct Recursive Geometric Sketches

ERIC Educational Resources Information Center

Harper, Suzanne R.; Driskell, Shannon

2006-01-01

How to iterate geometric shapes to construct Baravelle spirals and Pythagorean trees is demonstrated in this article. The "Surfing Note" sends readers to a site with applets that will generate fractals such as the Sierpinski gasket or the Koch snowflake.
A Comparison of Logistic Regression, Neural Networks, and Classification Trees Predicting Success of Actuarial Students

ERIC Educational Resources Information Center

Schumacher, Phyllis; Olinsky, Alan; Quinn, John; Smith, Richard

2010-01-01

The authors extended previous research by 2 of the authors who conducted a study designed to predict the successful completion of students enrolled in an actuarial program. They used logistic regression to determine the probability of an actuarial student graduating in the major or dropping out. They compared the results of this study with those…
Coping with Multicollinearity: An Example on Application of Principal Components Regression in Dendroecology

Treesearch

B. Desta Fekedulegn; J.J. Colbert; R.R., Jr. Hicks; Michael E. Schuckers

2002-01-01

The theory and application of principal components regression, a method for coping with multicollinearity among independent variables in analyzing ecological data, is exhibited in detail. A concrete example of the complex procedures that must be carried out in developing a diagnostic growth-climate model is provided. We use tree radial increment data taken from breast...
Regression methods for spatially correlated data: an example using beetle attacks in a seed orchard

Treesearch

Preisler Haiganoush; Nancy G. Rappaport; David L. Wood

1997-01-01

We present a statistical procedure for studying the simultaneous effects of observed covariates and unmeasured spatial variables on responses of interest. The procedure uses regression type analyses that can be used with existing statistical software packages. An example using the rate of twig beetle attacks on Douglas-fir trees in a seed orchard illustrates the...
Per capita community-level effects of an invasive grass, Microstegium vimineum, on vegetation in mesic forests in northern Mississippi (USA)

Treesearch

J. Stephen Brewer

2010-01-01

Quantifying per capita impacts of invasive species on resident communities requires integrating regression analyses with experiments under natural conditions. Using multivariate and univariate approaches, I regressed the abundance of 105 resident species of groundcover plants and tree seedlings against the abundance and height of an invasive grass, Microstegium...
Predicting surface fuel models and fuel metrics using lidar and CIR imagery in a dense mixed conifer forest

Treesearch

Marek K. Jakubowski; Qinghua Guo; Brandon Collins; Scott Stephens; Maggi Kelly

2013-01-01

We compared the ability of several classification and regression algorithms to predict forest stand structure metrics and standard surface fuel models. Our study area spans a dense, topographically complex Sierra Nevada mixed-conifer forest. We used clustering, regression trees, and support vector machine algorithms to analyze high density (average 9 pulses/m
A Tree Based Broadcast Scheme for (m, k)-firm Real-Time Stream in Wireless Sensor Networks.

PubMed

Park, HoSung; Kim, Beom-Su; Kim, Kyong Hoon; Shah, Babar; Kim, Ki-Il

2017-11-09

Recently, various unicast routing protocols have been proposed to deliver measured data from the sensor node to the sink node within a predetermined deadline in wireless sensor networks (WSNs). In parallel with these approaches, some applications demand a specific service based on broadcast to all nodes within the deadline, a feasible real-time traffic model, and improvements in energy efficiency. However, current protocols based on either flooding or one-to-one unicast cannot meet these requirements entirely. Moreover, as far as the authors know, no study has yet addressed a real-time broadcast protocol that supports an application-specific traffic model in WSNs. Based on the above analysis, in this paper, we propose a new (m, k)-firm-based Real-time Broadcast Protocol (FRBP) that constructs a broadcast tree to satisfy the (m, k)-firm constraint, which is applicable to the real-time model in resource-constrained WSNs. The broadcast tree in FRBP is constructed by a distance-based priority scheme, whereas energy efficiency is improved by selecting as few nodes on the tree as possible. To overcome the unstable network environment, the recovery scheme invokes rapid partial tree reconstruction in order to designate another node as the parent on a tree according to the measured (m, k)-firm real-time condition and local state monitoring. Finally, simulation results are given to demonstrate the superiority of FRBP over the existing schemes in terms of average deadline missing ratio, average throughput and energy consumption.
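The abstract does not spell out the (m, k)-firm condition. Under the usual definition (at least m of any k consecutive messages must meet their deadline), a node could monitor the condition with a sliding window, as in this sketch. The function name and the sample delivery histories are hypothetical.

```python
def satisfies_mk_firm(met_deadline, m, k):
    """True if every window of k consecutive messages has at least m on time."""
    if len(met_deadline) < k:
        return True  # no complete window has been observed yet
    hits = sum(met_deadline[:k])
    if hits < m:
        return False
    for i in range(k, len(met_deadline)):
        # Slide the window by one message: add the newest, drop the oldest.
        hits += met_deadline[i] - met_deadline[i - k]
        if hits < m:
            return False
    return True

# Hypothetical delivery histories (True means the deadline was met).
print(satisfies_mk_firm([True, True, False, True, True, False], m=2, k=3))
print(satisfies_mk_firm([True, False, False, True], m=2, k=3))
```

A check of this kind is one plausible trigger for the partial tree reconstruction mentioned in the abstract, although the paper's actual monitoring rule is not given here.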

Contact Trees: Network Visualization beyond Nodes and Edges

PubMed Central

Sallaberry, Arnaud; Fu, Yang-chih; Ho, Hwai-Chung; Ma, Kwan-Liu

2016-01-01

Node-Link diagrams make it possible to take a quick glance at how nodes (or actors) in a network are connected by edges (or ties). A conventional network diagram of a “contact tree” maps out a root and branches that represent the structure of nodes and edges, often without further specifying leaves or fruits that would have grown from small branches. By furnishing such a network structure with leaves and fruits, we reveal details about “contacts” in our ContactTrees upon which ties and relationships are constructed. Our elegant design employs a bottom-up approach that resembles a recent attempt to understand subjective well-being by means of a series of emotions. Such a bottom-up approach to social-network studies decomposes each tie into a series of interactions or contacts, which can help deepen our understanding of the complexity embedded in a network structure. Unlike previous network visualizations, ContactTrees highlight how relationships form and change based upon interactions among actors, as well as how relationships and networks vary by contact attributes. Based on a botanical tree metaphor, the design is easy to construct and the resulting tree-like visualization can display many properties at both tie and contact levels, thus recapturing a key ingredient missing from conventional techniques of network visualization. We demonstrate ContactTrees using data sets consisting of up to three waves of 3-month contact diaries over the 2004-2012 period, and discuss how this design can be applied to other types of datasets. PMID:26784350
Can tree species diversity be assessed with Landsat data in a temperate forest?

PubMed

Arekhi, Maliheh; Yılmaz, Osman Yalçın; Yılmaz, Hatice; Akyüz, Yaşar Feyza

2017-10-28

The diversity of forest trees as an indicator of ecosystem health can be assessed using the spectral characteristics of plant communities through remote sensing data. The objectives of this study were to investigate alpha and beta tree diversity using Landsat data for six dates in the Gönen dam watershed of Turkey. We used richness and the Shannon and Simpson diversity indices to calculate tree alpha diversity. We also represented the relationship between beta diversity and remotely sensed data using species composition similarity and spectral distance similarity of sampling plots via quantile regression. A total of 99 sampling units, each 20 m × 20 m, were selected using a geographically stratified random sampling method. Within each plot, the tree species were identified, and all of the trees with a diameter at breast height (dbh) larger than 7 cm were measured. Presence/absence and abundance data (tree species number and tree species basal area) of tree species were used to determine the relationship between richness and the Shannon and Simpson diversity indices, which were computed with ground field data, and spectral variables derived (2 × 2 pixels and 3 × 3 pixels) from Landsat 8 OLI data. The Shannon-Wiener index had the highest correlation. For all six dates, NDVI (normalized difference vegetation index) was the spectral variable most strongly correlated with the Shannon index and the tree diversity variables. The ratio of green to red (VI) was the spectral variable least correlated with the tree diversity variables and the Shannon basal area. In both beta diversity curves, the slope of the OLS regression was low, while in the upper quantile it was approximately twice that of the lower quantiles. The Jaccard index is close to one, with little difference between the two beta diversity approaches. This result is due to the increasing similarity between sampling plots when they are located close to each other. The intercept differences between the two beta diversity approaches were strongly related to the development stage of a number of sampling plots in the tree species basal area method. For obtaining beta diversity, the tree basal area method gives better results than the tree species number method at representing the similarity of regions that are located close together. In conclusion, NDVI is helpful for estimating the alpha diversity of trees over large areas when the vegetation is at the maximum growing season. Beta diversity could be obtained with the spectral heterogeneity of Landsat data. Future tree diversity studies using remote sensing data should select data sets acquired when vegetation is at the maximum growing season. Also, forest tree diversity could be investigated using higher-resolution remote sensing data such as ESA Sentinel 2 data, which have been freely available since June 2015.
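The alpha diversity measures named in this abstract have standard textbook forms, sketched below for a single plot. The abundance values are invented, and the Simpson index is assumed to be the Gini-Simpson form (1 minus the sum of squared proportions), which the abstract does not specify.

```python
import math

def richness(abundances):
    """Number of species present in the plot."""
    return sum(1 for a in abundances if a > 0)

def shannon_index(abundances):
    """Shannon index H' = -sum(p_i * ln p_i) over species present."""
    total = sum(abundances)
    props = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in props)

def simpson_index(abundances):
    """Gini-Simpson diversity, 1 - sum(p_i^2); an assumed variant."""
    total = sum(abundances)
    return 1.0 - sum((a / total) ** 2 for a in abundances)

# Hypothetical stem counts (or basal areas) per species in one plot.
plot = [12, 7, 1]
print(richness(plot), round(shannon_index(plot), 3), round(simpson_index(plot), 3))
```

The same functions apply whether abundance is measured as tree species number or as basal area, the two abundance measures used in the study.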
Propagation of dry tropical forest trees in Mexico

Treesearch

Martha A. Cervantes Sanchez

2002-01-01

There is a distinct lack of technical information on the propagation of native tree species from the dry tropical forest ecosystem in Mexico. This ecosystem has come under heavy human pressures to obtain several products such as specialty woods for fuel, posts for fences and construction, forage, edible fruits, stakes for horticulture crops, and medicinal products. The...
Genetic linkage maps of white birches (Betula platyphylla Suk. and B. pendula Roth) based on RAPD and AFLP markers

USDA-ARS's Scientific Manuscript database

Genetic linkage maps in plants are usually constructed using segregating populations obtained from crosses between two inbred lines such as rice, maize, or soybean. Such populations are generally not available for forest trees because of time constraints. But tree species have the property of outcro...
A Comparison of Height-Accumulation and Volume-Equation Methods for Estimating Tree and Stand Volumes

Treesearch

R.B. Ferguson; V. Clark Baldwin

1995-01-01

Estimating tree and stand volume in mature plantations is time consuming, involving much manpower and equipment; however, several sampling and volume-prediction techniques are available. This study showed that a well-constructed, volume-equation method yields estimates comparable to those of the often more time-consuming, height-accumulation method, even though the...
TESTING TREE-CLASSIFIER VARIANTS AND ALTERNATE MODELING METHODOLOGIES IN THE EAST GREAT BASIN MAPPING UNIT OF THE SOUTHWEST REGIONAL GAP ANALYSIS PROJECT (SW REGAP)

EPA Science Inventory

We tested two methods for dataset generation and model construction, and three tree-classifier variants to identify the most parsimonious and thematically accurate mapping methodology for the SW ReGAP project. Competing methodologies were tested in the East Great Basin mapping un...
Volume tables for red alder.

Treesearch

Floyd A. Johnson; R. M. Kallander; Paul G. Lauterbach

1949-01-01

The increasing importance of red alder as a commercial species in the Pacific Northwest has prompted the three agencies listed above to pool their tree measurement data for the construction of standard regional red alder volume tables. The tables included here were based on trees from a variety of sites and form classes. Approximately one quarter of the total number of...
Choosing appropriate subpopulations for modeling tree canopy cover nationwide

Treesearch

Gretchen G. Moisen; John W. Coulston; Barry T. Wilson; Warren B. Cohen; Mark V. Finco

2012-01-01

In prior national mapping efforts, the country has been divided into numerous ecologically similar mapping zones, and individual models have been constructed for each zone. Additionally, a hierarchical approach has been taken within zones to first mask out areas of nonforest, then target models of tree attributes within forested areas only. This results in many models...
Phylogeny and evolutionary histories of Pyrus L. revealed by phylogenetic trees and networks based on data from multiple DNA sequences

USDA-ARS's Scientific Manuscript database

Reconstructing the phylogeny of Pyrus has been difficult due to the wide distribution of the genus and lack of informative data. In this study, we collected 110 accessions representing 25 Pyrus species and constructed both phylogenetic trees and phylogenetic networks based on multiple DNA sequence d...
Technical Manual for the Conceptual Learning and Development Assessment Series IV: Tree. Technical Report No. 437. Reprinted December 1977.

ERIC Educational Resources Information Center

DiLuzio, Geneva J.; And Others

This document accompanies the Conceptual Learning and Development Assessment Series III: Tree, a test constructed to chart the conceptual development of individuals. As a technical manual, it contains information on the rationale, development, standardization, and reliability of the test, as well as essential information and statistical data for…
Predicting acute aquatic toxicity of structurally diverse chemicals in fish using artificial intelligence approaches.

PubMed

Singh, Kunwar P; Gupta, Shikha; Rai, Premanjali

2013-09-01

The research aims to develop global modeling tools capable of categorizing structurally diverse chemicals in various toxicity classes according to the EEC and European Community directives, and to predict their acute toxicity in fathead minnow using a set of selected molecular descriptors. Accordingly, artificial-intelligence-based classification and regression models, such as probabilistic neural networks (PNN), generalized regression neural networks (GRNN), multilayer perceptron neural network (MLPN), radial basis function neural network (RBFN), support vector machines (SVM), gene expression programming (GEP), and decision tree (DT), were constructed using the experimental toxicity data. Diversity and non-linearity in the chemicals' data were tested using the Tanimoto similarity index and Brock-Dechert-Scheinkman statistics. Predictive and generalization abilities of the various models constructed here were compared using several statistical parameters. PNN and GRNN models performed relatively better than MLPN, RBFN, SVM, GEP, and DT. In both two- and four-category classifications, PNN yielded a considerably high accuracy of classification in training (95.85 percent and 90.07 percent) and validation data (91.30 percent and 86.96 percent), respectively. GRNN rendered a high correlation between the measured and model-predicted -log LC50 values for both the training (0.929) and validation (0.910) data, and low prediction errors (RMSE) of 0.52 and 0.49 for the two sets. Efficiency of the selected PNN and GRNN models in predicting acute toxicity of new chemicals was adequately validated using external datasets of different fish species (fathead minnow, bluegill, trout, and guppy). The PNN and GRNN models showed good predictive and generalization abilities and can be used as tools for predicting toxicities of structurally diverse chemical compounds.
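The Tanimoto similarity index used to test structural diversity has a simple set-based form, sketched here. The fingerprints are represented as sets of hypothetical feature identifiers; how the study actually encoded its molecular descriptors is not stated in the abstract.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity: shared features / features present in either."""
    a, b = set(fp_a), set(fp_b)
    shared = len(a & b)
    denom = len(a) + len(b) - shared
    return shared / denom if denom else 1.0

# Hypothetical fingerprints given as sets of "on" bit positions.
print(tanimoto({1, 2, 3}, {2, 3, 4}))
```

Low pairwise values across a data set indicate structurally diverse chemicals, which is the property the authors tested before fitting their models.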
Trees grow on money: urban tree canopy cover and environmental justice.

PubMed

Schwarz, Kirsten; Fragkias, Michail; Boone, Christopher G; Zhou, Weiqi; McHale, Melissa; Grove, J Morgan; O'Neil-Dunne, Jarlath; McFadden, Joseph P; Buckley, Geoffrey L; Childers, Dan; Ogden, Laura; Pincetl, Stephanie; Pataki, Diane; Whitmer, Ali; Cadenasso, Mary L

2015-01-01

This study examines the distributional equity of urban tree canopy (UTC) cover for Baltimore, MD, Los Angeles, CA, New York, NY, Philadelphia, PA, Raleigh, NC, Sacramento, CA, and Washington, D.C. using high spatial resolution land cover data and census data. Data are analyzed at the Census Block Group levels using Spearman's correlation, ordinary least squares regression (OLS), and a spatial autoregressive model (SAR). Across all cities there is a strong positive correlation between UTC cover and median household income. Negative correlations between race and UTC cover exist in bivariate models for some cities, but they are generally not observed using multivariate regressions that include additional variables on income, education, and housing age. SAR models result in higher r-square values compared to the OLS models across all cities, suggesting that spatial autocorrelation is an important feature of our data. Similarities among cities can be found based on shared characteristics of climate, race/ethnicity, and size. Our findings suggest that a suite of variables, including income, contribute to the distribution of UTC cover. These findings can help target simultaneous strategies for UTC goals and environmental justice concerns.
Constructing high-quality bounding volume hierarchies for N-body computation using the acceptance volume heuristic

NASA Astrophysics Data System (ADS)

Olsson, O.

2018-01-01

We present a novel heuristic derived from a probabilistic cost model for approximate N-body simulations. We show that this new heuristic can be used to guide tree construction towards higher quality trees with improved performance over current N-body codes. This represents an important step beyond the current practice of using spatial partitioning for N-body simulations, and enables adoption of a range of state-of-the-art algorithms developed for computer graphics applications to yield further improvements in N-body simulation performance. We outline directions for further developments and review the most promising such algorithms.
Tree-oriented interactive processing with an application to theorem-proving, appendix E

NASA Technical Reports Server (NTRS)

Hammerslag, David; Kamin, Samuel N.; Campbell, Roy H.

1985-01-01

The concept of unstructured structure editing and ted, an editor for unstructured trees, is described. Ted is used to manipulate hierarchies of information in an unrestricted manner. The tool was implemented and applied to the problem of organizing formal proofs. As a proof management tool, it maintains the validity of a proof and its constituent lemmas independently from the methods used to validate the proof. It includes an adaptable interface which may be used to invoke theorem provers and other aids to proof construction. Using ted, a user may construct, maintain, and verify formal proofs using a variety of theorem provers, proof checkers, and formatters.
Using Classification and Regression Trees (CART) and random forests to analyze attrition: Results from two simulations.

PubMed

Hayes, Timothy; Usami, Satoshi; Jacobucci, Ross; McArdle, John J

2015-12-01

In this article, we describe a recent development in the analysis of attrition: using classification and regression trees (CART) and random forest methods to generate inverse sampling weights. These flexible machine learning techniques have the potential to capture complex nonlinear, interactive selection models, yet to our knowledge, their performance in the missing data analysis context has never been evaluated. To assess the potential benefits of these methods, we compare their performance with commonly employed multiple imputation and complete case techniques in 2 simulations. These initial results suggest that weights computed from pruned CART analyses performed well in terms of both bias and efficiency when compared with other methods. We discuss the implications of these findings for applied researchers.
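The idea of deriving inverse sampling weights from a tree can be sketched with a single CART-style split. This is a deliberately reduced illustration and not the authors' procedure: it assumes one covariate, one split chosen by Gini impurity, and no pruning. All names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 retention indicators."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(x, stayed):
    """Threshold on x that minimizes the weighted Gini impurity of `stayed`."""
    best_t, best_score = None, float("inf")
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [s for xi, s in zip(x, stayed) if xi <= t]
        right = [s for xi, s in zip(x, stayed) if xi > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(x)
        if score < best_score:
            best_t, best_score = t, score
    return best_t

def inverse_weights(x, stayed):
    """Weight each retained case by 1 / (retention rate in its leaf)."""
    t = best_split(x, stayed)
    # With no usable split, every case falls into a single leaf.
    side = (lambda v: True) if t is None else (lambda v: v <= t)
    weights = []
    for xi, s in zip(x, stayed):
        if not s:
            weights.append(0.0)  # dropouts contribute no outcome data
            continue
        leaf = [sj for xj, sj in zip(x, stayed) if side(xj) == side(xi)]
        weights.append(len(leaf) / sum(leaf))
    return weights

# Hypothetical covariate and retention indicator (1 = stayed in the study).
x = [1, 2, 3, 4, 5, 6, 7, 8]
stayed = [1, 1, 1, 1, 1, 0, 1, 0]
print(best_split(x, stayed), inverse_weights(x, stayed))
```

Cases that remain in a leaf with heavy dropout receive larger weights, so the retained sample is rebalanced toward the original one.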
Using Classification and Regression Trees (CART) and Random Forests to Analyze Attrition: Results From Two Simulations

PubMed Central

Hayes, Timothy; Usami, Satoshi; Jacobucci, Ross; McArdle, John J.

2016-01-01

In this article, we describe a recent development in the analysis of attrition: using classification and regression trees (CART) and random forest methods to generate inverse sampling weights. These flexible machine learning techniques have the potential to capture complex nonlinear, interactive selection models, yet to our knowledge, their performance in the missing data analysis context has never been evaluated. To assess the potential benefits of these methods, we compare their performance with commonly employed multiple imputation and complete case techniques in 2 simulations. These initial results suggest that weights computed from pruned CART analyses performed well in terms of both bias and efficiency when compared with other methods. We discuss the implications of these findings for applied researchers. PMID:26389526
Mummy Lake: An unroofed ceremonial structure within a large-scale ritual landscape

USGS Publications Warehouse

Benson, Larry V.; Griffin, Eleanor R.; Stein, J.R.; Friedman, R. A.; Andrae, S. W.

2014-01-01

The structure at Mesa Verde National Park known historically as Mummy Lake and more recently as Far View Reservoir is not part of a water collection, impoundment, or redistribution system. We offer an alternative explanation for the function of Mummy Lake. We suggest that it is an unroofed ceremonial structure, and that it serves as an essential component of a Chacoan ritual landscape. A wide constructed avenue articulates Mummy Lake with Far View House and Pipe Shrine House. The avenue continues southward for approximately 6 km where it apparently divides connecting with Spruce Tree House and Sun Temple/Cliff Palace. The avenue has previously been interpreted as an irrigation ditch fed by water impounded at Mummy Lake; however, it conforms in every respect to alignments described as Chacoan roads. Tree-ring dates indicate that the construction of Spruce Tree House and Cliff Palace began about A.D. 1225, roughly coincident with the abandonment of the Far View community. This pattern of periodically relocating the focus of an Anasazi community by retiring existing ritual structures and linking them to newly constructed facilities by means of broad avenues was first documented by Fowler and Stein (1992) in Manuelito Canyon, New Mexico. Periods of intense drought appear to have contributed to the relocation of prehistoric Native Americans from the Far View group to Cliff Palace/Spruce Tree House in the mid-13th century and eventually to the abandonment of all Anasazi communities in southwestern Colorado in the late-13th century.
Improving ensemble decision tree performance using Adaboost and Bagging

NASA Astrophysics Data System (ADS)

Hasan, Md. Rajib; Siraj, Fadzilah; Sainin, Mohd Shamrie

2015-12-01

Ensemble classifier systems are considered among the most promising approaches in medical data classification, and the performance of a decision tree classifier can be increased by the ensemble method, as it has been shown to outperform single classifiers. However, in an ensemble setting, the performance depends on the selection of a suitable base classifier. This research employed two prominent ensembles, namely Adaboost and Bagging, with base classifiers such as Random Forest, Random Tree, j48, j48grafts and Logistic Model Regression (LMT) that have been selected independently. The empirical study shows that the performance varies when different base classifiers are selected, and in some cases overfitting was also noted. The evidence shows that ensemble decision tree classifiers using Adaboost and Bagging improve the performance on the selected medical data sets.
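Bagging, one of the two ensembles named above, can be sketched in a few lines: train the base classifier on bootstrap resamples and take a majority vote. The base learner here is a one-nearest-neighbour rule chosen only to keep the sketch short; it stands in for the tree classifiers used in the study, and all names and data are hypothetical.

```python
import random
from collections import Counter

def fit_nearest(sample):
    """Hypothetical base learner: predict the label of the nearest point."""
    def predict(x):
        return min(sample, key=lambda row: abs(row[0] - x))[1]
    return predict

def bagging_predict(train, x, fit, n_models=25, seed=0):
    """Majority vote over models trained on bootstrap resamples of `train`."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        # Bootstrap: draw len(train) rows with replacement.
        sample = [rng.choice(train) for _ in train]
        votes.append(fit(sample)(x))
    return Counter(votes).most_common(1)[0][0]

# Hypothetical one-feature training data as (value, label) pairs.
train = [(0.0, "a"), (0.1, "a"), (0.2, "a"), (1.0, "b"), (1.1, "b"), (1.2, "b")]
print(bagging_predict(train, 0.05, fit_nearest))
```

Swapping `fit_nearest` for a different base learner changes the ensemble's behavior, which is the dependence on base classifier choice that the abstract reports.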
Decision Tree Approach for Soil Liquefaction Assessment

PubMed Central

Gandomi, Amir H.; Fridline, Mark M.; Roke, David A.

2013-01-01

In the current study, the performances of some decision tree (DT) techniques are evaluated for postearthquake soil liquefaction assessment. A database containing 620 records of seismic parameters and soil properties is used in this study. Three decision tree techniques are used here in two different ways, considering statistical and engineering points of view, to develop decision rules. The DT results are compared to the logistic regression (LR) model. The results of this study indicate that the DTs not only successfully predict liquefaction but they can also outperform the LR model. The best DT models are interpreted and evaluated based on an engineering point of view. PMID:24489498
Dynamics of investor spanning trees around dot-com bubble.

PubMed

Ranganathan, Sindhuja; Kivelä, Mikko; Kanniainen, Juho

2018-01-01

We identify temporal investor networks for Nokia stock by constructing networks from correlations between investor-specific net-volumes and analyze changes in the networks around dot-com bubble. The analysis is conducted separately for households, financial, and non-financial institutions. Our results indicate that spanning tree measures for households reflected the boom and crisis: the maximum spanning tree measures had a clear upward tendency in the bull markets when the bubble was building up, and, even more importantly, the minimum spanning tree measures pre-reacted the burst of the bubble. At the same time, we find less clear reactions in the minimal and maximal spanning trees of non-financial and financial institutions around the bubble, which suggests that household investors can have a greater herding tendency around bubbles.
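To make the spanning tree construction mentioned above concrete, here is a minimal Python sketch of Kruskal's algorithm applied to pairwise correlations. The four investor labels and the correlation values are hypothetical, and using the raw correlation as the edge weight is an assumption; the study may well transform correlations into distances first.

```python
def spanning_tree(weights, maximum=False):
    """Kruskal's algorithm on an undirected graph given as {(u, v): weight}."""
    nodes = sorted({n for edge in weights for n in edge})
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    ordered = sorted(weights.items(), key=lambda kv: kv[1], reverse=maximum)
    for (u, v), w in ordered:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components, so no cycle
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree

# Hypothetical correlations between the net-volumes of four investors.
corr = {("A", "B"): 0.9, ("A", "C"): 0.2, ("B", "C"): 0.5,
        ("A", "D"): 0.1, ("B", "D"): 0.3, ("C", "D"): 0.8}
max_tree = spanning_tree(corr, maximum=True)
min_tree = spanning_tree(corr)
```

Summary measures such as the total tree weight can then be tracked over time windows, which is one plausible reading of the "spanning tree measures" in the abstract.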
Dynamics of investor spanning trees around dot-com bubble

PubMed Central

Kivelä, Mikko; Kanniainen, Juho

2018-01-01

We identify temporal investor networks for Nokia stock by constructing networks from correlations between investor-specific net-volumes and analyze changes in the networks around dot-com bubble. The analysis is conducted separately for households, financial, and non-financial institutions. Our results indicate that spanning tree measures for households reflected the boom and crisis: the maximum spanning tree measures had a clear upward tendency in the bull markets when the bubble was building up, and, even more importantly, the minimum spanning tree measures pre-reacted the burst of the bubble. At the same time, we find less clear reactions in the minimal and maximal spanning trees of non-financial and financial institutions around the bubble, which suggests that household investors can have a greater herding tendency around bubbles. PMID:29897973
Tree Mortality following Prescribed Fire and a Storm Surge Event in Slash Pine ( Pinus elliottii var. densa ) Forests in the Florida Keys, USA

DOE PAGES

Sah, Jay P.; Ross, Michael S.; Snyder, James R.; ...

2010-01-01

In fire-dependent forests, managers are interested in predicting the consequences of prescribed burning on postfire tree mortality. We examined the effects of prescribed fire on tree mortality in Florida Keys pine forests, using a factorial design with understory type, season, and year of burn as factors. We also used logistic regression to model the effects of burn season, fire severity, and tree dimensions on individual tree mortality. Despite limited statistical power due to problems in carrying out the full suite of planned experimental burns, associations with tree and fire variables were observed. Post-fire pine tree mortality was negatively correlated with tree size and positively correlated with char height and percent crown scorch. Unlike post-fire mortality, tree mortality associated with storm surge from Hurricane Wilma was greater in the large size classes. Due to their influence on population structure and fuel dynamics, the size-selective mortality patterns following fire and storm surge have practical importance for using fire as a management tool in Florida Keys pinelands in the future, particularly when the threats to their continued existence from tropical storms and sea level rise are expected to increase.
Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling

NASA Astrophysics Data System (ADS)

Galelli, S.; Castelletti, A.

2013-02-01

Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modeling. In this paper we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modeling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalization property and tendency to overfit of traditional standalone decision trees (e.g. CART); (ii) is computationally very efficient; and (iii) allows the relative importance of the input variables to be inferred, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analyzed on two real-world case studies, Marina catchment (Singapore) and Canning River (Western Australia), representing two different morphoclimatic contexts, in comparison with other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparably to the best of the benchmarks (i.e. M5) in both watersheds, while outperforming the other approaches in terms of computational requirement when adopted on large datasets. In addition, the ranking of the input variables provided can be given a physically meaningful interpretation.
Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling

NASA Astrophysics Data System (ADS)

Galelli, S.; Castelletti, A.

2013-07-01

Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modelling. In this paper, we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modelling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalisation property and tendency to overfit of traditional standalone decision trees (e.g. CART); (ii) is computationally efficient; and (iii) allows the relative importance of the input variables to be inferred, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analysed on two real-world case studies - Marina catchment (Singapore) and Canning River (Western Australia) - representing two different morphoclimatic contexts. The evaluation is performed against other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparably to the best of the benchmarks (i.e. M5) in both watersheds, while outperforming the other approaches in terms of computational requirement when adopted on large datasets. In addition, the ranking of the input variables provided can be given a physically meaningful interpretation.
Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees.

PubMed

Fokkema, M; Smits, N; Zeileis, A; Hothorn, T; Kelderman, H

2017-10-25

Identification of subgroups of patients for whom treatment A is more effective than treatment B, and vice versa, is of key importance to the development of personalized medicine. Tree-based algorithms are helpful tools for the detection of such interactions, but none of the available algorithms allow for taking into account clustered or nested dataset structures, which are particularly common in psychological research. Therefore, we propose the generalized linear mixed-effects model tree (GLMM tree) algorithm, which allows for the detection of treatment-subgroup interactions, while accounting for the clustered structure of a dataset. The algorithm uses model-based recursive partitioning to detect treatment-subgroup interactions, and a GLMM to estimate the random-effects parameters. In a simulation study, GLMM trees show higher accuracy in recovering treatment-subgroup interactions, higher predictive accuracy, and lower type II error rates than linear-model-based recursive partitioning and mixed-effects regression trees. Also, GLMM trees show somewhat higher predictive accuracy than linear mixed-effects models with pre-specified interaction effects, on average. We illustrate the application of GLMM trees on an individual patient-level data meta-analysis on treatments for depression. We conclude that GLMM trees are a promising exploratory tool for the detection of treatment-subgroup interactions in clustered datasets.
Which sociodemographic factors are important on smoking behaviour of high school students? The contribution of classification and regression tree methodology in a broad epidemiological survey

PubMed Central

Özge, C; Toros, F; Bayramkaya, E; Çamdeviren, H; Şaşmaz, T

2006-01-01

Background: The purpose of this study is to evaluate the most important sociodemographic factors on smoking status of high school students using a broad randomised epidemiological survey. Methods: Using an in-class, self-administered questionnaire about their sociodemographic variables and smoking behaviour, a representative sample of 3304 students in total of preparatory, 9th, 10th, and 11th grades, from 22 randomly selected schools of Mersin, were evaluated and discriminative factors have been determined using appropriate statistics. In addition to binary logistic regression analysis, the study evaluated combined effects of these factors using classification and regression tree methodology, as a new statistical method. Results: The data showed that 38% of the students reported lifetime smoking and 16.9% of them reported current smoking, with a male predominance and increasing prevalence by age. Second hand smoking was reported at a 74.3% frequency with father predominance (56.6%). The significantly important factors that affect current smoking in these age groups were increased household size, late birth rank, certain school types, low academic performance, increased second hand smoking, and stress (especially reported as separation from a close friend or because of violence at home). Classification and regression tree methodology showed the importance of some neglected sociodemographic factors with a good classification capacity. Conclusions: It was concluded that, as closely related with sociocultural factors, smoking was a common problem in this young population, generating important academic and social burden in youth life, and that with increasing data about this behaviour and the use of new statistical methods, effective coping strategies could be composed. PMID:16891446
Designing Predictive Models for Beta-Lactam Allergy Using the Drug Allergy and Hypersensitivity Database.

PubMed

Chiriac, Anca Mirela; Wang, Youna; Schrijvers, Rik; Bousquet, Philippe Jean; Mura, Thibault; Molinari, Nicolas; Demoly, Pascal

Beta-lactam antibiotics represent the main cause of allergic reactions to drugs, inducing both immediate and nonimmediate allergies. The diagnosis is well established, usually based on skin tests and drug provocation tests, but cumbersome. The objective was to design predictive models for the diagnosis of beta-lactam allergy, based on the clinical history of patients with suspicions of allergic reactions to beta-lactams. The study included a retrospective phase, in which records of patients explored for a suspicion of beta-lactam allergy (in the Allergy Unit of the University Hospital of Montpellier between September 1996 and September 2012) were used to construct predictive models based on a logistic regression and decision tree method, and a prospective phase, in which we performed an external validation of the chosen models in patients with suspicion of beta-lactam allergy recruited from 3 allergy centers (Montpellier, Nîmes, Narbonne) between March and November 2013. Data related to clinical history and allergy evaluation results were retrieved and analyzed. The retrospective and prospective phases included 1991 and 200 patients, respectively, with a different prevalence of confirmed beta-lactam allergy (23.6% vs 31%, P = .02). For the logistic regression method, performances of the models were similar in both samples: sensitivity was 51% (vs 60%), specificity 75% (vs 80%), positive predictive value 40% (vs 57%), and negative predictive value 83% (vs 82%). The decision tree method reached a sensitivity of 29.5% (vs 43.5%), specificity of 96.4% (vs 94.9%), positive predictive value of 71.6% (vs 79.4%), and negative predictive value of 81.6% (vs 81.3%). Two different independent methods using clinical history predictors were unable to accurately predict beta-lactam allergy and replace a conventional allergy evaluation for suspected beta-lactam allergy. Copyright © 2017 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
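The predictive values quoted above are tied to sensitivity, specificity and prevalence through Bayes' rule. The Python sketch below (the function name `predictive_values` is illustrative and not from the study) recomputes the positive and negative predictive values for the retrospective sample from the percentages given in the abstract. It approximately reproduces the reported figures; small differences are to be expected because the inputs are rounded.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) from prevalence, sensitivity and specificity."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Retrospective phase, prevalence 23.6% (values taken from the abstract).
lr_ppv, lr_npv = predictive_values(0.236, 0.51, 0.75)    # logistic regression
dt_ppv, dt_npv = predictive_values(0.236, 0.295, 0.964)  # decision tree
```

The decision tree trades sensitivity for specificity, which is why its positive predictive value is much higher than that of the logistic regression while the negative predictive values are nearly the same.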
Mapping groundwater contamination risk of multiple aquifers using multi-model ensemble of machine learning algorithms.

PubMed

Barzegar, Rahim; Moghaddam, Asghar Asghari; Deo, Ravinesh; Fijani, Elham; Tziritis, Evangelos

2018-04-15

Constructing accurate and reliable groundwater risk maps provides scientifically prudent and strategic measures for the protection and management of groundwater. The objectives of this paper are to design and validate machine learning based risk maps using ensemble-based modelling with an integrative approach. We employ the extreme learning machines (ELM), multivariate regression splines (MARS), M5 Tree and support vector regression (SVR) applied in multiple aquifer systems (e.g. unconfined, semi-confined and confined) in the Marand plain, North West Iran, to encapsulate the merits of individual learning algorithms in a final committee-based ANN model. The DRASTIC Vulnerability Index (VI) ranged from 56.7 to 128.1, categorized with no risk, low and moderate vulnerability thresholds. The correlation coefficient (r) and Willmott's Index (d) between NO3 concentrations and VI were 0.64 and 0.314, respectively. To introduce improvements in the original DRASTIC method, the vulnerability indices were adjusted by NO3 concentrations, termed the groundwater contamination risk (GCR). Seven DRASTIC parameters were used as the model inputs and GCR values as the outputs of the individual machine learning models, whose results were then fed into the fully optimized committee-based ANN predictive model. The correlation indicators demonstrated that the ELM and SVR models outperformed the MARS and M5 Tree models, by virtue of larger d and r values. Subsequently, the r and d metrics for the ANN-committee based multi-model in the testing phase were 0.8889 and 0.7913, respectively, revealing the superiority of the integrated (or ensemble) machine learning models when compared with the original DRASTIC approach. The newly designed multi-model ensemble-based approach can be considered a pragmatic step for mapping groundwater contamination risks of multiple aquifer systems with multi-model techniques, yielding the high accuracy of the ANN committee-based model. Copyright © 2017 Elsevier B.V. All rights reserved.
Reliability database development for use with an object-oriented fault tree evaluation program

NASA Technical Reports Server (NTRS)

Heger, A. Sharif; Harringtton, Robert J.; Koen, Billy V.; Patterson-Hine, F. Ann

1989-01-01

A description is given of the development of a fault-tree analysis method using object-oriented programming. In addition, the authors discuss the programs that have been developed or are under development to connect a fault-tree analysis routine to a reliability database. To assess the performance of the routines, a relational database simulating one of the nuclear power industry databases has been constructed. For a realistic assessment of the results of this project, the use of one of existing nuclear power reliability databases is planned.
Tree-Based Unrooted Phylogenetic Networks.

PubMed

Francis, A; Huber, K T; Moulton, V

2018-02-01

Phylogenetic networks are a generalization of phylogenetic trees that are used to represent non-tree-like evolutionary histories that arise in organisms such as plants and bacteria, or uncertainty in evolutionary histories. An unrooted phylogenetic network on a non-empty, finite set X of taxa, or network, is a connected, simple graph in which every vertex has degree 1 or 3 and whose leaf set is X. It is called a phylogenetic tree if the underlying graph is a tree. In this paper we consider properties of tree-based networks, that is, networks that can be constructed by adding edges into a phylogenetic tree. We show that although they have some properties in common with their rooted analogues which have recently drawn much attention in the literature, they have some striking differences in terms of both their structural and computational properties. We expect that our results could eventually have applications to, for example, detecting horizontal gene transfer or hybridization which are important factors in the evolution of many organisms.
Mastectomy or breast conserving surgery? Factors affecting type of surgical treatment for breast cancer--a classification tree approach.

PubMed

Martin, Michael A; Meyricke, Ramona; O'Neill, Terry; Roberts, Steven

2006-04-20

A critical choice facing breast cancer patients is which surgical treatment--mastectomy or breast conserving surgery (BCS)--is most appropriate. Several studies have investigated factors that impact the type of surgery chosen, identifying features such as place of residence, age at diagnosis, tumor size, socio-economic and racial/ethnic elements as relevant. Such assessment of "propensity" is important in understanding issues such as a reported under-utilisation of BCS among women for whom such treatment was not contraindicated. Using Western Australian (WA) data, we further examine the factors associated with the type of surgical treatment for breast cancer using a classification tree approach. This approach deals naturally with complicated interactions between factors, and so allows flexible and interpretable models for treatment choice to be built that add to the current understanding of this complex decision process. Data was extracted from the WA Cancer Registry on women diagnosed with breast cancer in WA from 1990 to 2000. Subjects' treatment preferences were predicted from covariates using both classification trees and logistic regression. Tumor size was the primary determinant of patient choice, subjects with tumors smaller than 20 mm in diameter preferring BCS. For subjects with tumors greater than 20 mm in diameter factors such as patient age, nodal status, and tumor histology become relevant as predictors of patient choice. Classification trees perform as well as logistic regression for predicting patient choice, but are much easier to interpret for clinical use. The selected tree can inform clinicians' advice to patients.
Optimizing a basal bark spray of dinotefuran to manage armored scales (Hemiptera: Diaspididae) in Christmas tree plantations.

PubMed

Cowles, Richard S

2010-10-01

The armored scales Fiorinia externa Ferris and Aspidiotus cryptomeriae Kuwana (Hemiptera: Diaspididae) are increasingly damaging to Christmas tree plantings in southern New England. The systemic insecticide dinotefuran was investigated for selectively suppressing armored scale populations relative to their natural enemies in cooperating growers' fields in 2008 and 2009. Banded soil application of dinotefuran resulted in poor control. However, a dinotefuran spray applied to the basal 25 cm of trunk resulted in its absorption through the bark, translocation to the foliage, and good efficacy. The basal bark spray did not significantly impact the activity of predators Chilocorus stigma (Say) or Cybocephalus nipponicus Enrody-Younga and in 2009 showed a dosage-dependent improvement in the percentage of scales parasitized by Encarsia citrina Craw. A field dosage-response factorial experiment revealed that a 0.25% (vol:vol) addition of a surfactant with dinotefuran did not enhance insecticidal effect. Probit-transformed scale population reduction relative to the untreated check was subjected to linear regression analysis; reduction of scale populations was proportional to the log of insecticide dosage, whereas basal bark spray efficacy declined in proportion to the cube of tree height. The regression equation can be used to optimize dosage relative to tree height. Excellent efficacy resulted from basal bark spray application dates of 28 April (prebud break) to mid-June, but earlier spray timing within that treatment window had fewer crawlers discoloring new growth with their short-lived feeding. A basal bark spray of dinotefuran is well suited for integration with natural enemies to manage armored scales in Christmas tree plantations.
External heart deformities in passerine birds exposed to environmental mixtures of polychlorinated biphenyls during development.

PubMed

DeWitt, Jamie C; Millsap, Deborah S; Yeager, Ronnie L; Heise, Steve S; Sparks, Daniel W; Henshel, Diane S

2006-02-01

Necropsy-observable cardiac deformities were evaluated from 283 nestling passerines collected from one reference site and five polychlorinated biphenyl (PCB)-contaminated sites around Bloomington and Bedford, Indiana, USA. Hearts were weighed and assessed on relative scales in three dimensions (height, length, and width) and for externally visible deformities. Heart weights normalized to body weight (heart somatic index) were decreased significantly at the more contaminated sites in both house wren (Troglodytes aedon) and tree swallow (Tachycineta bicolor). Heart somatic indices significantly correlated with log PCB concentrations in Carolina chickadee (Parus carolinesis) and tree swallow and with log 2,3,7,8-tetrachlorodibenzo-p-dioxin toxic equivalent values in tree swallow alone. Ventricular length was increased significantly in eastern bluebirds (Sialia sialis) and decreased significantly in Carolina chickadee and tree swallow from contaminated sites versus the reference site. Heart length regressed significantly against the log PCB concentrations (Carolina chickadee and tree swallow) or the square of the PCB concentrations (red-winged blackbird [Agelaius phoeniceus]) in a sibling bird. The deformities that were observed most at the contaminated sites included abnormal tips (pointed, rounded, or flattened), center rolls, macro- and microsurface roughness, ventricular indentations on the ventral or dorsal surface, lateral ventricular notches, visibly thin ventricular walls, and changes in overall heart shape. A pooled heart deformity index regressed significantly against the logged contaminant concentrations for all species except red-winged blackbird. These results indicate that developmental changes in heart morphometrics and shape abnormalities are quantifiable and may be sensitive and useful indicators of PCB-related developmental impacts across many avian species.
Plasticity in dendroclimatic response across the distribution range of Aleppo Pine (Pinus halepensis)

Treesearch

Martin de Luis; Katarina Cufar; Alfredo Di Filippo; Klemen Novak; Andreas Papadopoulos; Gianluca Piovesan; Cyrille B. K. Rathgeber; José Raventós; Miguel Angel Saz; Kevin T. Smith

2013-01-01

We investigated the variability of the climate-growth relationship of Aleppo pine across its distribution range in the Mediterranean Basin. We constructed a network of tree-ring index chronologies from 63 sites across the region. Correlation function analysis identified the relationships of tree-ring index to climate factors for each site. We also estimated the...
Predicting 30-day Hospital Readmission with Publicly Available Administrative Database. A Conditional Logistic Regression Modeling Approach.

PubMed

Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P

2015-01-01

This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand. The objective was to explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to achieve better sensitivity of more than 10% over the standard classification models, which can be translated to correct labeling of an additional 400-500 readmissions for heart failure patients in the state of California over a year. Lastly, several key predictors identified from the HCUP data include the disposition location from discharge, the number of chronic conditions, and the number of acute procedures. It would be beneficial to apply simple decision rules obtained from the decision tree in an ad-hoc manner to guide the cohort stratification. It could be potentially beneficial to explore the effect of pairwise interactions between influential predictors when building the logistic regression models for different data strata. Judicious use of the ad-hoc CLR models developed offers insights into future development of prediction models for hospital readmissions, which can lead to better intuition in identifying high-risk patients and developing effective post-discharge care strategies. Lastly, this paper is expected to raise the awareness of collecting data on additional markers and developing necessary database infrastructure for larger-scale exploratory studies on readmission risk prediction.
A short note on the use of the red-black tree in Cartesian adaptive mesh refinement algorithms

NASA Astrophysics Data System (ADS)

Hasbestan, Jaber J.; Senocak, Inanc

2017-12-01

Mesh adaptivity is an indispensable capability to tackle multiphysics problems with large disparity in time and length scales. With the availability of powerful supercomputers, there is a pressing need to extend time-proven computational techniques to extreme-scale problems. Cartesian adaptive mesh refinement (AMR) is one such method that enables simulation of multiscale, multiphysics problems. AMR is based on construction of octrees. Originally, an explicit tree data structure was used to generate and manipulate an adaptive Cartesian mesh. At least eight pointers are required in an explicit approach to construct an octree. Parent-child relationships are then used to traverse the tree. An explicit octree, however, is expensive in terms of memory usage and the time it takes to traverse the tree to access a specific node. For these reasons, implicit pointerless methods have been pioneered within the computer graphics community, motivated by applications requiring interactivity and realistic three dimensional visualization. Lewiner et al. [1] provides a concise review of pointerless approaches to generate an octree. Use of a hash table and Z-order curve are two key concepts in pointerless methods that we briefly discuss next.
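The two pointerless concepts named at the end of the abstract, a Z-order (Morton) curve and a hash table, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, a built-in `dict` stands in for the hash table, and the red-black tree that the note itself advocates is not shown.

```python
def morton_encode(x, y, z, bits=10):
    """Interleave the bits of integer cell coordinates into one Z-order key."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

def morton_decode(code, bits=10):
    """Invert morton_encode and return the (x, y, z) cell coordinates."""
    x = y = z = 0
    for i in range(bits):
        x |= ((code >> (3 * i)) & 1) << i
        y |= ((code >> (3 * i + 1)) & 1) << i
        z |= ((code >> (3 * i + 2)) & 1) << i
    return x, y, z

def children(level, code):
    """Keys of the eight children: append three bits to the parent code."""
    return [(level + 1, (code << 3) | k) for k in range(8)]

def parent(level, code):
    """Key of the parent: drop the three lowest bits of the child code."""
    return (level - 1, code >> 3)

# The mesh is a dictionary keyed by (level, code); no pointers are stored.
cells = {(0, 0): "root"}
for key in children(0, 0):
    cells[key] = "leaf"
```

Parent and child lookups become bit shifts on the key followed by a dictionary access, so no tree traversal from the root is needed to reach a given cell.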
Data mining of tree-based models to analyze freeway accident frequency.

PubMed

Chang, Li-Yen; Chen, Wen-Chieh

2005-01-01

Statistical models, such as Poisson or negative binomial regression models, have been employed to analyze vehicle accident frequency for many years. However, these models have their own model assumptions and pre-defined underlying relationship between dependent and independent variables. If these assumptions are violated, the model could lead to erroneous estimation of accident likelihood. Classification and Regression Tree (CART), one of the most widely applied data mining techniques, has been commonly employed in business administration, industry, and engineering. CART does not require any pre-defined underlying relationship between target (dependent) variable and predictors (independent variables) and has been shown to be a powerful tool, particularly for dealing with prediction and classification problems. This study collected the 2001-2002 accident data of National Freeway 1 in Taiwan. A CART model and a negative binomial regression model were developed to establish the empirical relationship between traffic accidents and highway geometric variables, traffic characteristics, and environmental factors. The CART findings indicated that the average daily traffic volume and precipitation variables were the key determinants for freeway accident frequencies. By comparing the prediction performance between the CART and the negative binomial regression models, this study demonstrates that CART is a good alternative method for analyzing freeway accident frequencies.
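The core step in constructing a regression tree of the CART kind is choosing the split that most reduces the squared error of the response. The Python sketch below shows that step for a single predictor. The data are invented for illustration (daily traffic volume in thousands of vehicles against accident counts) and are not taken from the study, and the function names are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, cost) minimising SSE(left) + SSE(right)."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical predictor values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best

# Hypothetical data: traffic volume (thousand vehicles/day) vs accident count.
volume = [10, 20, 30, 40, 50, 60]
accidents = [2, 3, 2, 9, 10, 11]
threshold, cost = best_split(volume, accidents)
```

A full tree repeats this search over every predictor and then recurses into the two resulting subsets until a stopping rule is met. Because no functional form is assumed, the method needs no pre-defined relationship between the response and the predictors.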
Computer aided diagnosis system for the Alzheimer's disease based on partial least squares and random forest SPECT image classification.

PubMed

Ramírez, J; Górriz, J M; Segovia, F; Chaves, R; Salas-Gonzalez, D; López, M; Alvarez, I; Padilla, P

2010-03-19

This letter presents a computer aided diagnosis (CAD) technique for the early detection of Alzheimer's disease (AD) by means of single photon emission computed tomography (SPECT) image classification. The proposed method is based on a partial least squares (PLS) regression model and a random forest (RF) predictor. The challenge of the curse of dimensionality is addressed by reducing the large dimensionality of the input data by downscaling the SPECT images and extracting score features using PLS. An RF predictor then forms an ensemble of classification and regression tree (CART)-like classifiers, its output being determined by a majority vote of the trees in the forest. A baseline principal component analysis (PCA) system is also developed for reference. The experimental results show that the combined PLS-RF system yields a generalization error that converges to a limit when increasing the number of trees in the forest. Thus, the generalization error is reduced when using PLS and depends on the strength of the individual trees in the forest and the correlation between them. Moreover, PLS feature extraction is found to be more effective than PCA for extracting discriminative information from the data, yielding peak sensitivity, specificity and accuracy values of 100%, 92.7%, and 96.9%, respectively. In addition, the proposed CAD system outperformed several other recently developed AD CAD systems. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.
Predicting membrane protein types using various decision tree classifiers based on various modes of general PseAAC for imbalanced datasets.

PubMed

Sankari, E Siva; Manimegalai, D

2017-12-21

Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types. Due to the large number of uncharacterized protein sequences in databases, traditional methods are very time consuming, expensive and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced datasets and large datasets are often handled well by decision tree classifiers. Since imbalanced datasets are used, the performance of various decision tree classifiers such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree and REP (Reduced Error Pruning) tree, and of ensemble methods such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest, is analysed. Among the various decision tree classifiers, Random forest performs well in less time, with a good accuracy of 96.35%. Another inference is that the RUS boost decision tree classifier is able to classify one or two samples in the class with very few samples, while the other classifiers such as DT, Adaboost, Rotation forest and Random forest are not sensitive to the classes with fewer samples. The performance of the decision tree classifiers is also compared with SVM (Support Vector Machine) and Naive Bayes classifiers. Copyright © 2017 Elsevier Ltd. All rights reserved.
