Science.gov

Sample records for additive regression trees

  1. Nonparametric survival analysis using Bayesian Additive Regression Trees (BART).

    PubMed

    Sparapani, Rodney A; Logan, Brent R; McCulloch, Robert E; Laud, Purushottam W

    2016-07-20

    Bayesian additive regression trees (BART) provide a framework for flexible nonparametric modeling of relationships of covariates to outcomes. Recently, BART models have been shown to provide excellent predictive performance for both continuous and binary outcomes, exceeding that of their competitors. Software is also readily available for such outcomes. In this article, we introduce modeling that extends the usefulness of BART in medical applications by addressing needs arising in survival analysis. Simulation studies of one-sample and two-sample scenarios, in comparison with long-standing traditional methods, establish face validity of the new approach. We then demonstrate the model's ability to accommodate data from complex regression models with a simulation study of a nonproportional hazards scenario with crossing survival functions and survival function estimation in a scenario where hazards are multiplicatively modified by a highly nonlinear function of the covariates. Using data from a recently published study of patients undergoing hematopoietic stem cell transplantation, we illustrate the use and some advantages of the proposed method in medical investigations. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26854022

  2. Self-Adaptive Induction of Regression Trees.

    PubMed

    Fidalgo-Merino, Raúl; Núñez, Marlon

    2011-08-01

    A new algorithm for incremental construction of binary regression trees is presented. This algorithm, called SAIRT, adapts the induced model when facing data streams involving unknown dynamics, like gradual and abrupt function drift, changes in certain regions of the function, noise, and virtual drift. It also handles both symbolic and numeric attributes. The proposed algorithm can automatically adapt its internal parameters and model structure to obtain new patterns, depending on the current dynamics of the data stream. SAIRT can monitor the usefulness of nodes and can forget examples from selected regions, storing the remaining ones in local windows associated to the leaves of the tree. On these conditions, current regression methods need a careful configuration depending on the dynamics of the problem. Experimentation suggests that the proposed algorithm obtains better results than current algorithms when dealing with data streams that involve changes with different speeds, noise levels, sampling distribution of examples, and partial or complete changes of the underlying function. PMID:21263164

  3. Growth in Mathematics Achievement: Analysis with Classification and Regression Trees

    ERIC Educational Resources Information Center

    Ma, Xin

    2005-01-01

    A recently developed statistical technique, often referred to as classification and regression trees (CART), holds great potential for researchers to discover how student-level (and school-level) characteristics interactively affect growth in mathematics achievement. CART is a host of advanced statistical methods that statistically cluster…

  4. Aneurysmal subarachnoid hemorrhage prognostic decision-making algorithm using classification and regression tree analysis

    PubMed Central

    Lo, Benjamin W. Y.; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R. Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A. H.

    2016-01-01

    Background: Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). Methods: The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Results: Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56–2.45, P < 0.01). Conclusions: A clinically useful classification tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH. PMID:27512607
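
    The recursive partitioning step described above can be sketched as a search for the single split that yields the most homogeneous subgroups. This is a minimal illustration, not the study's procedure: the Gini impurity criterion, the function names `gini` and `best_split`, and the toy data are all assumptions made here. A full tree would apply the same search again within each resulting subgroup.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly homogeneous)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best binary split
    'value <= threshold' on one numeric variable."""
    best = (None, gini(labels))
    # The largest value is skipped because it would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (t, weighted)
    return best


# Hypothetical toy data: a severity grade (1-5) and a dichotomized outcome (1 = poor).
grade = [1, 1, 2, 2, 3, 4, 4, 5, 5, 5]
poor = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
threshold, impurity = best_split(grade, poor)
print(threshold, impurity)
```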

  5. Capacitance Regression Modelling Analysis on Latex from Selected Rubber Tree Clones

    NASA Astrophysics Data System (ADS)

    Rosli, A. D.; Hashim, H.; Khairuzzaman, N. A.; Mohd Sampian, A. F.; Baharudin, R.; Abdullah, N. E.; Sulaiman, M. S.; Kamaru'zzaman, M.

    2015-11-01

    This paper investigates the capacitance regression modelling performance of latex for various rubber tree clones, namely clones 2002, 2008, 2014 and 3001. Conventionally, rubber tree clones are identified by observing tree features such as the shape of the leaf, the trunk, the branching habit and the pattern of seed texture. This conventional method requires experts and is very time-consuming. Currently, there is no sensing device based on electrical properties that can be employed to distinguish clones from latex samples. Hence, with the hypothesis that the dielectric constant of each clone varies, this paper discusses the development of a capacitance sensor based on a Capacitance Comparison Bridge to measure the output voltage of different latex samples. The proposed sensor is initially tested with 30 ml of latex sample before dilution water is gradually added. The output voltage and capacitance obtained from the test are recorded and analyzed using a Simple Linear Regression (SLR) model. The results indicate that latex from clone 2002 produced the best and most reliable linear regression line, with a determination coefficient of 91.24%. In addition, the study found that the capacitive elements in latex samples deteriorate when the samples are diluted with a higher volume of water.
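
    As a rough illustration of the simple linear regression and determination coefficient mentioned above, the following sketch fits a least-squares line and reports R². The function name `fit_line` and the data values are hypothetical and are not measurements from the paper.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical readings: volume of water added (ml) and sensor output voltage (V).
water_ml = [0, 5, 10, 15, 20]
voltage = [2.00, 1.82, 1.71, 1.49, 1.30]
slope, intercept, r_squared = fit_line(water_ml, voltage)
print(f"slope={slope:.4f} intercept={intercept:.4f} R^2={r_squared:.2%}")
```

    A determination coefficient reported as a percentage, as in the abstract, corresponds to `r_squared * 100` here.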

  6. Comparing Methodologies for Developing an Early Warning System: Classification and Regression Tree Model versus Logistic Regression. REL 2015-077

    ERIC Educational Resources Information Center

    Koon, Sharon; Petscher, Yaacov

    2015-01-01

    The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules…

  7. Applications of tree-structured regression for regional precipitation prediction

    NASA Astrophysics Data System (ADS)

    Li, Xiangshang

    2000-11-01

    This thesis presents a Tree-Structured Regression (TSR) method to relate daily precipitation with a variety of free atmosphere variables. Historical data were used to identify distinct weather patterns associated with differing types of precipitation events. Models were developed using 67% of the data for training and the remaining data for model validation. Seasonal models were built for each of four U.S. sites; New Orleans Louisiana, San Antonio and Amarillo of Texas as well as San Francisco California. The average correlation by site between observed and simulated daily precipitation data series range from 0.69 to 0.79 for the training set, and 0.64 to 0.79 for the validation set. Relative humidity related variables were found to be the dominant variables in these TSR models. Output from an NCAR Climate System Model (CSM) transient simulation of climate change were then used to drive the TSR models for predicting precipitation characteristics under climate change. A preliminary screening of the GCM output variables for current climate, however, revealed significant problems for the New Orleans, San Antonio and Amarillo sites. Specifically, the CSM missed the annual trends in humidity for the grid cells containing these sites. CSM output for the San Francisco site was found to be much more reliable. Therefore, we present future precipitation estimates only for the San Francisco site. While both GCM and TSR predict very small change in overall annual precipitation, they differ significantly from season to season.
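
    The evaluation scheme described above (67% of the data for training, the remainder for validation, scored by the correlation between observed and simulated series) might be sketched as follows. The thesis abstract does not say how the split was drawn; an ordered, non-random split is assumed here, and the helper names and data pairs are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def split_train_validation(records, train_fraction=0.67):
    """Split records in their given order: first part for training, rest for validation."""
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]


# Hypothetical (observed, simulated) daily precipitation pairs in mm.
pairs = [(0.0, 0.1), (2.5, 2.0), (0.0, 0.4), (10.2, 7.9), (1.1, 1.5),
         (0.0, 0.0), (5.4, 6.1), (0.3, 0.2), (7.7, 5.0)]
train, validation = split_train_validation(pairs)
r_validation = pearson([obs for obs, _ in validation],
                       [sim for _, sim in validation])
print(len(train), len(validation), round(r_validation, 2))
```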

  8. Prediction of fishing effort distributions using boosted regression trees.

    PubMed

    Soykan, Candan U; Eguchi, Tomoharu; Kohin, Suzanne; Dewar, Heidi

    2014-01-01

    Concerns about bycatch of protected species have become a dominant factor shaping fisheries management. However, efforts to mitigate bycatch are often hindered by a lack of data on the distributions of fishing effort and protected species. One approach to overcoming this problem has been to overlay the distribution of past fishing effort with known locations of protected species, often obtained through satellite telemetry and occurrence data, to identify potential bycatch hotspots. This approach, however, generates static bycatch risk maps, calling into question their ability to forecast into the future, particularly when dealing with spatiotemporally dynamic fisheries and highly migratory bycatch species. In this study, we use boosted regression trees to model the spatiotemporal distribution of fishing effort for two distinct fisheries in the North Pacific Ocean, the albacore (Thunnus alalunga) troll fishery and the California drift gillnet fishery that targets swordfish (Xiphias gladius). Our results suggest that it is possible to accurately predict fishing effort using < 10 readily available predictor variables (cross-validated correlations between model predictions and observed data ~0.6). Although the two fisheries are quite different in their gears and fishing areas, their respective models had high predictive ability, even when input data sets were restricted to a fraction of the full time series. The implications for conservation and management are encouraging: Across a range of target species, fishing methods, and spatial scales, even a relatively short time series of fisheries data may suffice to accurately predict the location of fishing effort into the future. In combination with species distribution modeling of bycatch species, this approach holds promise as a mitigation tool when observer data are limited. Even in data-rich regions, modeling fishing effort and bycatch may provide more accurate estimates of bycatch risk than partial observer coverage.

  9. Estimation of adjusted rate differences using additive negative binomial regression.

    PubMed

    Donoghoe, Mark W; Marschner, Ian C

    2016-08-15

    Rate differences are an important effect measure in biostatistics and provide an alternative perspective to rate ratios. When the data are event counts observed during an exposure period, adjusted rate differences may be estimated using an identity-link Poisson generalised linear model, also known as additive Poisson regression. A problem with this approach is that the assumption of equality of mean and variance rarely holds in real data, which often show overdispersion. An additive negative binomial model is the natural alternative to account for this; however, standard model-fitting methods are often unable to cope with the constrained parameter space arising from the non-negativity restrictions of the additive model. In this paper, we propose a novel solution to this problem using a variant of the expectation-conditional maximisation-either algorithm. Our method provides a reliable way to fit an additive negative binomial regression model and also permits flexible generalisations using semi-parametric regression functions. We illustrate the method using a placebo-controlled clinical trial of fenofibrate treatment in patients with type II diabetes, where the outcome is the number of laser therapy courses administered to treat diabetic retinopathy. An R package is available that implements the proposed method. Copyright © 2016 John Wiley & Sons, Ltd. PMID:27073156
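
    For orientation, the additive (identity-link) count model described above can be written out as below. The notation ($Y_i$ for the event count, $t_i$ for the exposure period, $x_{ij}$ for covariates, $\beta_j$ for coefficients, $\phi$ for the dispersion parameter) is assumed here for illustration and is not taken from the paper; the negative binomial variance is shown in one common parameterisation.

```latex
\begin{align}
  \mathrm{E}[Y_i] &= \mu_i = t_i \lambda_i, \qquad
  \lambda_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} \ge 0, \\
  \text{Poisson:}\quad \mathrm{Var}[Y_i] &= \mu_i, \qquad
  \text{negative binomial:}\quad \mathrm{Var}[Y_i] = \mu_i + \mu_i^{2}/\phi .
\end{align}
```

    In this form, $\beta_j$ for a binary covariate is the adjusted rate difference, and the constraint $\lambda_i \ge 0$ is the non-negativity restriction that makes standard fitting methods difficult.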

  10. Building optimal regression tree by ant colony system-genetic algorithm: application to modeling of melting points.

    PubMed

    Hemmateenejad, Bahram; Shamsipur, Mojtaba; Zare-Shahabadi, Vali; Akhond, Morteza

    2011-10-17

    The classification and regression trees (CART) possess the advantage of being able to handle large data sets and yield readily interpretable models. A conventional method of building a regression tree is recursive partitioning, which results in a good but not optimal tree. Ant colony system (ACS), which is a meta-heuristic algorithm derived from the observation of real ants, can be used to overcome this problem. The purpose of this study was to explore the use of CART and its combination with ACS for modeling of melting points of a large variety of chemical compounds. Genetic algorithm (GA) operators (e.g., crossover and mutation operators) were combined with the ACS algorithm to select the best solution model. In addition, at each terminal node of the resulting tree, variable selection was done by the ACS-GA algorithm to build an appropriate partial least squares (PLS) model. To test the ability of the resulting tree, a set of approximately 4173 structures and their melting points was used (3000 compounds as training set and 1173 as validation set). Further, an external test set consisting of 277 drugs was used to validate the prediction ability of the tree. Comparison of the results obtained from both trees showed that the tree constructed by the ACS-GA algorithm performs better than that produced by the recursive partitioning procedure. PMID:21907021

  11. Response of the regression tree model to high resolution remote sensing data for predicting percent tree cover in a Mediterranean ecosystem.

    PubMed

    Donmez, Cenk; Berberoglu, Suha; Erdogan, Mehmet Akif; Tanriover, Anil Akin; Cilek, Ahmet

    2015-02-01

    Percent tree cover is the percentage of the ground surface area covered by a vertical projection of the outermost perimeter of the plants. It is an important indicator to reveal the condition of forest systems and has a significant importance for ecosystem models as a main input. The aim of this study is to estimate the percent tree cover of various forest stands in a Mediterranean environment based on an empirical relationship between tree coverage and remotely sensed data in Goksu Watershed located at the Eastern Mediterranean coast of Turkey. A regression tree algorithm was used to simulate spatial fractions of Pinus nigra, Cedrus libani, Pinus brutia, Juniperus excelsa and Quercus cerris using multi-temporal LANDSAT TM/ETM data as predictor variables and land cover information. Two scenes of high resolution GeoEye-1 images were employed for training and testing the model. The predictor variables were incorporated in addition to biophysical variables estimated from the LANDSAT TM/ETM data. Additionally, normalised difference vegetation index (NDVI) was incorporated to LANDSAT TM/ETM band settings as a biophysical variable. Stepwise linear regression (SLR) was applied for selecting the relevant bands to employ in regression tree process. SLR-selected variables produced accurate results in the model with a high correlation coefficient of 0.80. The output values ranged from 0 to 100 %. The different tree species were mapped in 30 m resolution in respect to elevation. Percent tree cover map as a final output was derived using LANDSAT TM/ETM image over Goksu Watershed and the biophysical variables. The results were tested using high spatial resolution GeoEye-1 images. Thus, the combination of the RT algorithm and higher resolution data for percent tree cover mapping were tested and examined in a complex Mediterranean environment. PMID:25604062

  12. The identification of complex interactions in epidemiology and toxicology: a simulation study of boosted regression trees

    PubMed Central

    2014-01-01

    Background: There is a need to evaluate complex interaction effects on human health, such as those induced by mixtures of environmental contaminants. The usual approach is to formulate an additive statistical model and check for departures using product terms between the variables of interest. In this paper, we present an approach to search for interaction effects among several variables using boosted regression trees. Methods: We simulate a continuous outcome from real data on 27 environmental contaminants, some of which are correlated, and test the method’s ability to uncover the simulated interactions. The simulated outcome contains one four-way interaction, one non-linear effect and one interaction between a continuous variable and a binary variable. Four scenarios reflecting different strengths of association are simulated. We illustrate the method using real data. Results: The method succeeded in identifying the true interactions in all scenarios except where the association was weakest. Some spurious interactions were also found, however. The method was also able to identify interactions in the real data set. Conclusions: We conclude that boosted regression trees can be used to uncover complex interaction effects in epidemiological studies. PMID:24993424

  13. Application of Boosting Regression Trees to Preliminary Cost Estimation in Building Construction Projects

    PubMed Central

    Shin, Yoonseok

    2015-01-01

    Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong boundaries in terms of its generalization performance. However, the boosting approach has yet to be used in regression problems within the construction domain, including cost estimations, but has been actively utilized in other domains. Therefore, a boosting regression tree (BRT) is applied to cost estimations at the early stage of a construction project to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the performance of the BRT model, its performance was compared with that of a neural network (NN) model, which has been proven to have a high performance in cost estimation domains. The BRT model has shown results similar to those of NN model using 234 actual cost datasets of a building construction project. In addition, the BRT model can provide additional information such as the importance plot and structure model, which can support estimators in comprehending the decision making process. Consequently, the boosting approach has potential applicability in preliminary cost estimations in a building construction project. PMID:26339227
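
    The boosting approach referred to above can be illustrated with a small sketch in which one-split regression trees (stumps) are fitted repeatedly to the current residuals. The abstract does not state the loss function, tree depth, or learning rate used in the study, so squared-error loss, stumps, 50 rounds, and a learning rate of 0.1 are assumptions here; `fit_stump`, `boost`, and the toy data are hypothetical.

```python
def fit_stump(x, residuals):
    """Least-squares regression stump on one numeric predictor.

    Returns (threshold, left_mean, right_mean) for the rule 'x <= threshold'."""
    best = None
    for t in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]


def boost(x, y, n_trees=50, learning_rate=0.1):
    """Gradient boosting for squared error: each stump is fitted to the residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((t, left_mean, right_mean))
        pred = [pi + learning_rate * (left_mean if xi <= t else right_mean)
                for xi, pi in zip(x, pred)]
    return base, stumps, pred


def mse(y, pred):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)


# Hypothetical nonlinear toy data.
x = list(range(10))
y = [float(xi * xi) for xi in x]
base, stumps, pred = boost(x, y)
mse_mean = mse(y, [base] * len(y))
mse_boost = mse(y, pred)
print(round(mse_mean, 1), round(mse_boost, 1))
```

    A small learning rate shrinks each stump's contribution, so the fit improves gradually over many rounds rather than in one step.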

  14. Prediction of radiation levels in residences: A methodological comparison of CART (Classification and Regression Tree Analysis) and conventional regression

    SciTech Connect

    Janssen, I.; Stebbings, J.H.

    1990-01-01

    In environmental epidemiology, trace and toxic substance concentrations frequently have very highly skewed distributions ranging over one or more orders of magnitude, and prediction by conventional regression is often poor. Classification and Regression Tree Analysis (CART) is an alternative in such contexts. To compare the techniques, two Pennsylvania data sets and three independent variables are used: house radon progeny (RnD) and gamma levels as predicted by construction characteristics in 1330 houses; and approximately 200 house radon (Rn) measurements as predicted by topographic parameters. CART may identify structural variables of interest not identified by conventional regression, and vice versa, but in general the regression models are similar. CART has major advantages in dealing with other common characteristics of environmental data sets, such as missing values, continuous variables requiring transformations, and large sets of potential independent variables. CART is most useful in the identification and screening of independent variables, greatly reducing the need for cross-tabulations and nested breakdown analyses. There is no need to discard cases with missing values for the independent variables because surrogate variables are intrinsic to CART. The tree-structured approach is also independent of the scale on which the independent variables are measured, so that transformations are unnecessary. CART identifies important interactions as well as main effects. The major advantages of CART appear to be in exploring data. Once the important variables are identified, conventional regressions seem to lead to results similar but more interpretable by most audiences. 12 refs., 8 figs., 10 tabs.

  15. A hybrid approach of stepwise regression, logistic regression, support vector machine, and decision tree for forecasting fraudulent financial statements.

    PubMed

    Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De

    2014-01-01

    As the fraudulent financial statement of an enterprise is increasingly serious with each passing day, establishing a valid forecasting fraudulent financial statement model of an enterprise has become an important question for academic research and financial practice. After screening the important variables using the stepwise regression, the study also matches the logistic regression, support vector machine, and decision tree to construct the classification models to make a comparison. The study adopts financial and nonfinancial variables to assist in establishment of the forecasting fraudulent financial statement model. Research objects are the companies to which the fraudulent and nonfraudulent financial statement happened between years 1998 to 2012. The findings are that financial and nonfinancial information are effectively used to distinguish the fraudulent financial statement, and decision tree C5.0 has the best classification effect 85.71%. PMID:25302338

  16. A Hybrid Approach of Stepwise Regression, Logistic Regression, Support Vector Machine, and Decision Tree for Forecasting Fraudulent Financial Statements

    PubMed Central

    Goo, Yeong-Jia James; Shen, Zone-De

    2014-01-01

    As the fraudulent financial statement of an enterprise is increasingly serious with each passing day, establishing a valid forecasting fraudulent financial statement model of an enterprise has become an important question for academic research and financial practice. After screening the important variables using the stepwise regression, the study also matches the logistic regression, support vector machine, and decision tree to construct the classification models to make a comparison. The study adopts financial and nonfinancial variables to assist in establishment of the forecasting fraudulent financial statement model. Research objects are the companies to which the fraudulent and nonfraudulent financial statement happened between years 1998 to 2012. The findings are that financial and nonfinancial information are effectively used to distinguish the fraudulent financial statement, and decision tree C5.0 has the best classification effect 85.71%. PMID:25302338

  17. Which Phylogenetic Networks are Merely Trees with Additional Arcs?

    PubMed

    Francis, Andrew R; Steel, Mike

    2015-09-01

    A binary phylogenetic network may or may not be obtainable from a tree by the addition of directed edges (arcs) between tree arcs. Here, we establish a precise and easily tested criterion (based on "2-SAT") that efficiently determines whether or not any given network can be realized in this way. Moreover, the proof provides a polynomial-time algorithm for finding one or more trees (when they exist) on which the network can be based. A number of interesting consequences are presented as corollaries; these lead to some further relevant questions and observations, which we outline in the conclusion. PMID:26070685

  18. Evaluating multimedia chemical persistence: Classification and regression tree analysis

    SciTech Connect

    Bennett, D.H.; McKone, T.E.; Kastenberg, W.E.

    2000-04-01

    For the thousands of chemicals continuously released into the environment, it is desirable to make prospective assessments of those likely to be persistent. Widely distributed persistent chemicals are impossible to remove from the environment and remediation by natural processes may take decades, which is problematic if adverse health or ecological effects are discovered after prolonged release into the environment. A tiered approach using a classification scheme and a multimedia model for determining persistence is presented. Using specific criteria for persistence, a classification tree is developed to classify a chemical as persistent or nonpersistent based on the chemical properties. In this approach, the classification is derived from the results of a standardized unit world multimedia model. Thus, the classifications are more robust for multimedia pollutants than classifications using a single medium half-life. The method can be readily implemented and provides insight without requiring extensive and often unavailable data. This method can be used to classify chemicals when only a few properties are known and can be used to direct further data collection. Case studies are presented to demonstrate the advantages of the approach.

  19. A stepwise regression tree for nonlinear approximation: applications to estimating subpixel land cover

    USGS Publications Warehouse

    Huang, C.; Townshend, J.R.G.

    2003-01-01

    A stepwise regression tree (SRT) algorithm was developed for approximating complex nonlinear relationships. Based on the regression tree of Breiman et al. (BRT) and a stepwise linear regression (SLR) method, this algorithm represents an improvement over SLR in that it can approximate nonlinear relationships and over BRT in that it gives more realistic predictions. The applicability of this method to estimating subpixel forest was demonstrated using three test data sets, on all of which it gave more accurate predictions than SLR and BRT. SRT also generated more compact trees and performed better than or at least as well as BRT at all 10 equal forest proportion intervals ranging from 0 to 100%. This method is appealing for estimating subpixel land cover over large areas.

  20. Data mining in psychological treatment research: a primer on classification and regression trees.

    PubMed

    King, Matthew W; Resick, Patricia A

    2014-10-01

    Data mining of treatment study results can reveal unforeseen but critical insights, such as who receives the most benefit from treatment and under what circumstances. The usefulness and legitimacy of exploratory data analysis have received relatively little recognition, however, and analytic methods well suited to the task are not widely known in psychology. With roots in computer science and statistics, statistical learning approaches offer a credible option: These methods take a more inductive approach to building a model than is done in traditional regression, allowing the data greater role in suggesting the correct relationships between variables rather than imposing them a priori. Classification and regression trees are presented as a powerful, flexible exemplar of statistical learning methods. Trees allow researchers to efficiently identify useful predictors of an outcome and discover interactions between predictors without the need to anticipate and specify these in advance, making them ideal for revealing patterns that inform hypotheses about treatment effects. Trees can also provide a predictive model for forecasting outcomes as an aid to clinical decision making. This primer describes how tree models are constructed, how the results are interpreted and evaluated, and how trees overcome some of the complexities of traditional regression. Examples are drawn from randomized clinical trial data and highlight some interpretations of particular interest to treatment researchers. The limitations of tree models are discussed, and suggestions for further reading and choices in software are offered. PMID:24588404

  1. Reconstructing missing daily precipitation data using regression trees and artificial neural networks

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Missing meteorological data have to be estimated for agricultural and environmental modeling. The objective of this work was to develop a technique to reconstruct the missing daily precipitation data in the central part of the Chesapeake Bay Watershed using regression trees (RT) and artificial neura...

  2. What Satisfies Students?: Mining Student-Opinion Data with Regression and Decision Tree Analysis

    ERIC Educational Resources Information Center

    Thomas, Emily H.; Galambos, Nora

    2004-01-01

    To investigate how students' characteristics and experiences affect satisfaction, this study uses regression and decision tree analysis with the CHAID algorithm to analyze student-opinion data. A data mining approach identifies the specific aspects of students' university experience that most influence three measures of general satisfaction. The…

  3. Reconstructing missing daily precipitation data using regression trees and artificial neural networks

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Incomplete meteorological data has been a problem in environmental modeling studies. The objective of this work was to develop a technique to reconstruct missing daily precipitation data in the central part of Chesapeake Bay Watershed using regression trees (RT) and artificial neural networks (ANN)....

  4. Risk Factors of Falls in Community-Dwelling Older Adults: Logistic Regression Tree Analysis

    ERIC Educational Resources Information Center

    Yamashita, Takashi; Noe, Douglas A.; Bailer, A. John

    2012-01-01

    Purpose of the Study: A novel logistic regression tree-based method was applied to identify fall risk factors and possible interaction effects of those risk factors. Design and Methods: A nationally representative sample of American older adults aged 65 years and older (N = 9,592) in the Health and Retirement Study 2004 and 2006 modules was used.…

  5. Use of Tree-Based Regression in the Analyses of L2 Reading Test Items

    ERIC Educational Resources Information Center

    Gao, Lingyun; Rogers, W. Todd

    2011-01-01

    The purpose of this study was to explore whether the results of Tree Based Regression (TBR) analyses, informed by a validated cognitive model, would enhance the interpretation of item difficulties in terms of the cognitive processes involved in answering the reading items included in two forms of the Michigan English Language Assessment Battery…

  6. Predicting the limits to tree height using statistical regressions of leaf traits.

    PubMed

    Burgess, Stephen S O; Dawson, Todd E

    2007-01-01

    Leaf morphology and physiological functioning demonstrate considerable plasticity within tree crowns, with various leaf traits often exhibiting pronounced vertical gradients in very tall trees. It has been proposed that the trajectory of these gradients, as determined by regression methods, could be used in conjunction with theoretical biophysical limits to estimate the maximum height to which trees can grow. Here, we examined this approach using published and new experimental data from tall conifer and angiosperm species. We showed that height predictions were sensitive to tree-to-tree variation in the shape of the regression and to the biophysical endpoints selected. We examined the suitability of proposed endpoints and their theoretical validity. We also noted that site and environment influenced height predictions considerably. Use of leaf mass per unit area or leaf water potential coupled with vulnerability of twigs to cavitation poses a number of difficulties for predicting tree height. Photosynthetic rate and carbon isotope discrimination show more promise, but in the second case, the complex relationship between light, water availability, photosynthetic capacity and internal conductance to CO2 must first be characterized. PMID:17447917

  7. Nitrogen Addition Enhances Drought Sensitivity of Young Deciduous Tree Species

    PubMed Central

    Dziedek, Christoph; Härdtle, Werner; von Oheimb, Goddert; Fichtner, Andreas

    2016-01-01

    Understanding how trees respond to global change drivers is central to predicting changes in forest structure and functions. Although there is evidence on the mode of nitrogen (N) and drought (D) effects on tree growth, our understanding of the interplay of these factors is still limited. Simultaneously, as mixtures are expected to be less sensitive to global change as compared to monocultures, we aimed to investigate the combined effects of N addition and D on the productivity of three tree species (Fagus sylvatica, Quercus petraea, Pseudotsuga menziesii) in relation to functionally diverse species mixtures using data from a 4-year field experiment in Northwest Germany. Here we show that species mixing can mitigate the negative effects of combined N fertilization and D events, but the community response is mainly driven by the combination of certain traits rather than the tree species richness of a community. For beech, we found that negative effects of D on growth rates were amplified by N fertilization (i.e., combined treatment effects were non-additive), while for oak and fir, the simultaneous effects of N and D were additive. Beech and oak were identified as most sensitive to combined N+D effects with a strong size-dependency observed for beech, suggesting that the negative impact of N+D becomes stronger with time as beech grows larger. As a consequence, the net biodiversity effect declined at the community level, which can be mainly assigned to a distinct loss of complementarity in beech-oak mixtures. This pattern, however, was not evident in the other species-mixtures, indicating that neighborhood composition (i.e., trait combination), but not tree species richness mediated the relationship between tree diversity and treatment effects on tree growth. Our findings point to the importance of the qualitative role (‘trait portfolio’) that biodiversity plays in determining resistance of diverse tree communities to environmental changes. As such, they provide

  8. Nitrogen Addition Enhances Drought Sensitivity of Young Deciduous Tree Species.

    PubMed

    Dziedek, Christoph; Härdtle, Werner; von Oheimb, Goddert; Fichtner, Andreas

    2016-01-01

    Understanding how trees respond to global change drivers is central to predicting changes in forest structure and functions. Although there is evidence on the mode of nitrogen (N) and drought (D) effects on tree growth, our understanding of the interplay of these factors is still limited. Simultaneously, as mixtures are expected to be less sensitive to global change as compared to monocultures, we aimed to investigate the combined effects of N addition and D on the productivity of three tree species (Fagus sylvatica, Quercus petraea, Pseudotsuga menziesii) in relation to functionally diverse species mixtures using data from a 4-year field experiment in Northwest Germany. Here we show that species mixing can mitigate the negative effects of combined N fertilization and D events, but the community response is mainly driven by the combination of certain traits rather than the tree species richness of a community. For beech, we found that negative effects of D on growth rates were amplified by N fertilization (i.e., combined treatment effects were non-additive), while for oak and fir, the simultaneous effects of N and D were additive. Beech and oak were identified as most sensitive to combined N+D effects with a strong size-dependency observed for beech, suggesting that the negative impact of N+D becomes stronger with time as beech grows larger. As a consequence, the net biodiversity effect declined at the community level, which can be mainly assigned to a distinct loss of complementarity in beech-oak mixtures. This pattern, however, was not evident in the other species-mixtures, indicating that neighborhood composition (i.e., trait combination), but not tree species richness mediated the relationship between tree diversity and treatment effects on tree growth. Our findings point to the importance of the qualitative role ('trait portfolio') that biodiversity plays in determining resistance of diverse tree communities to environmental changes. As such, they provide further

  9. GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran.

    PubMed

    Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali

    2016-01-01

    Groundwater is considered one of the most valuable fresh water resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence locations of springs were considered in this research. These factors include slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Subsequently, groundwater spring potential was modeled and mapped using CART, RF, and BRT algorithms. The predicted results from the three models were validated using the receiver operating characteristics curve (ROC). From 864 springs identified, 605 (≈70 %) locations were used for the spring potential mapping, while the remaining 259 (≈30 %) springs were used for the model validation. The area under the curve (AUC) for the BRT model was calculated as 0.8103 and for CART and RF the AUC were 0.7870 and 0.7119, respectively. Therefore, it was concluded that the BRT model produced the best prediction results while predicting locations of springs followed by CART and RF models, respectively. Geospatially integrated BRT, CART, and RF methods proved to be useful in generating the spring potential map (SPM) with reasonable accuracy. PMID:26687087
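
The validation step described in this abstract (a roughly 70/30 split of the 864 springs, then the area under the ROC curve) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names `split_counts` and `roc_auc` are hypothetical, the toy labels and scores are invented, and the AUC is computed with the pairwise rank definition rather than any particular GIS or statistics package.

```python
def split_counts(n_total, train_fraction):
    """Return (n_train, n_validation) for a simple fractional split."""
    n_train = round(n_total * train_fraction)
    return n_train, n_total - n_train


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case is scored above
    a random negative case; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# The spring counts come from the abstract; the labels and scores are toy values.
n_train, n_valid = split_counts(864, 0.70)
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values (0.8103, 0.7870, 0.7119) are compared.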

  10. Hyperspectral Analysis of Soil Nitrogen, Carbon, Carbonate, and Organic Matter Using Regression Trees

    PubMed Central

    Gmur, Stephan; Vogt, Daniel; Zabowski, Darlene; Moskal, L. Monika

    2012-01-01

    The characterization of soil attributes using hyperspectral sensors has revealed patterns in soil spectra that are known to respond to mineral composition, organic matter, soil moisture and particle size distribution. Soil samples from different soil horizons of replicated soil series from sites located within Washington and Oregon were analyzed with the FieldSpec Spectroradiometer to measure their spectral signatures across the electromagnetic range of 400 to 1,000 nm. Similarity rankings of individual soil samples reveal differences between replicate series as well as samples within the same replicate series. Using classification and regression tree statistical methods, regression trees were fitted to each spectral response using concentrations of nitrogen, carbon, carbonate and organic matter as the response variables. Statistics resulting from fitted trees were: nitrogen R² 0.91 (p < 0.01) at 403, 470, 687, and 846 nm spectral band widths, carbonate R² 0.95 (p < 0.01) at 531 and 898 nm band widths, total carbon R² 0.93 (p < 0.01) at 400, 409, 441 and 907 nm band widths, and organic matter R² 0.98 (p < 0.01) at 300, 400, 441, 832 and 907 nm band widths. Use of the 400 to 1,000 nm electromagnetic range utilizing regression trees provided a powerful, rapid and inexpensive method for assessing nitrogen, carbon, carbonate and organic matter for upper soil horizons in a nondestructive method. PMID:23112620

  11. Hyperspectral analysis of soil nitrogen, carbon, carbonate, and organic matter using regression trees.

    PubMed

    Gmur, Stephan; Vogt, Daniel; Zabowski, Darlene; Moskal, L Monika

    2012-01-01

    The characterization of soil attributes using hyperspectral sensors has revealed patterns in soil spectra that are known to respond to mineral composition, organic matter, soil moisture and particle size distribution. Soil samples from different soil horizons of replicated soil series from sites located within Washington and Oregon were analyzed with the FieldSpec Spectroradiometer to measure their spectral signatures across the electromagnetic range of 400 to 1,000 nm. Similarity rankings of individual soil samples reveal differences between replicate series as well as samples within the same replicate series. Using classification and regression tree statistical methods, regression trees were fitted to each spectral response using concentrations of nitrogen, carbon, carbonate and organic matter as the response variables. Statistics resulting from fitted trees were: nitrogen R² 0.91 (p < 0.01) at 403, 470, 687, and 846 nm spectral band widths, carbonate R² 0.95 (p < 0.01) at 531 and 898 nm band widths, total carbon R² 0.93 (p < 0.01) at 400, 409, 441 and 907 nm band widths, and organic matter R² 0.98 (p < 0.01) at 300, 400, 441, 832 and 907 nm band widths. Use of the 400 to 1,000 nm electromagnetic range utilizing regression trees provided a powerful, rapid and inexpensive method for assessing nitrogen, carbon, carbonate and organic matter for upper soil horizons in a nondestructive method. PMID:23112620

  12. Regression trees for modeling geochemical data-An application to Late Jurassic carbonates (Ammonitico Rosso)

    NASA Astrophysics Data System (ADS)

    Coimbra, Rute; Rodriguez-Galiano, Victor; Olóriz, Federico; Chica-Olmo, Mario

    2014-12-01

    Research based on ancient carbonate geochemical records is often assisted by multivariate statistical analysis, among others, used for data mining. This contribution reports a complementary approach that can be applied to paleoenvironmental research. The choice to use a machine learning method, here regression trees (RT), relied on its ability to learn complex patterns, integrating multiple types of data with different statistical distributions to obtain a knowledge model of geochemical behavior along a paleo-platform. The Late Jurassic epioceanic deposits under study are represented by six stratigraphic sections located in SE Spain and on the island of Majorca. The database used comprises a total of 1960 data points corresponding to eight variables (stable C and O isotopes, the elements Ca, Mg, Sr, Fe, Mn and skeletal content). This study uses RT models in which the predictive variables are the geochemical proxies, whilst skeletal content is used as a target variable. The resulting model is data driven, explaining variations in the target variable and providing additional information on the relative importance of each variable to each prediction, as well as its corresponding threshold values. The obtained RT revealed a structured distribution of samples, organized either by stratigraphic section or sets of nearby sections. Averaged estimated skeletal abundance confirmed the initial observations of higher skeletal content for the most distal sections with estimated values from 18% to 27%. In contrast, lower skeletal abundance from 5% to 15% is proposed for the remaining sections. The geochemical variable that best discriminates this major trend is δ18O, at a threshold value of -0.2‰, interpreted as evidence for separation of water-mass properties across the studied areas. Four other variables were considered relevant by the obtained decision tree: C isotopes, Ca, Sr and Mn, providing new insights for further differentiation between sets of samples.

  13. Modelling dissimilarity: generalizing ultrametric and additive tree representations.

    PubMed

    Hubert, L; Arabie, P; Meulman, J

    2001-05-01

    Methods for the hierarchical clustering of an object set produce a sequence of nested partitions such that object classes within each successive partition are constructed from the union of object classes present at the previous level. Any such sequence of nested partitions can in turn be characterized by an ultrametric. An approach to generalizing an (ultrametric) representation is proposed in which the nested character of the partition sequence is relaxed and replaced by the weaker requirement that the classes within each partition contain objects consecutive with respect to a fixed ordering of the objects. A method for fitting such a structure to a given proximity matrix is discussed, along with several alternative strategies for graphical representation. Using this same ultrametric extension, additive tree representations can also be generalized by replacing the ultrametric component in the decomposition of an additive tree (into an ultrametric and a centroid metric). A common numerical illustration is developed and maintained throughout the paper. PMID:11393895

  14. Prioritizing Highway Safety Manual's crash prediction variables using boosted regression trees.

    PubMed

    Saha, Dibakar; Alluri, Priyanka; Gan, Albert

    2015-06-01

    The Highway Safety Manual (HSM) recommends using the empirical Bayes (EB) method with locally derived calibration factors to predict an agency's safety performance. However, the data needs for deriving these local calibration factors are significant, requiring very detailed roadway characteristics information. Many of the data variables identified in the HSM are currently unavailable in the states' databases. Moreover, the process of collecting and maintaining all the HSM data variables is cost-prohibitive. Prioritization of the variables based on their impact on crash predictions would, therefore, help to identify influential variables for which data could be collected and maintained for continued updates. This study aims to determine the impact of each independent variable identified in the HSM on crash predictions. A relatively recent data mining approach called boosted regression trees (BRT) is used to investigate the association between the variables and crash predictions. The BRT method can effectively handle different types of predictor variables, identify very complex and non-linear association among variables, and compute variable importance. Five years of crash data from 2008 to 2012 on two urban and suburban facility types, two-lane undivided arterials and four-lane divided arterials, were analyzed for estimating the influence of variables on crash predictions. Variables were found to exhibit non-linear and sometimes complex relationship to predicted crash counts. In addition, only a few variables were found to explain most of the variation in the crash data. PMID:25823903

  15. Analysis of Maryland Poisoning Deaths Using Classification And Regression Tree (CART) Analysis

    PubMed Central

    Pamer, Carol; Serpi, Tracey; Finkelstein, Joseph

    2008-01-01

    Our study is a cross-sectional analysis of Maryland poisoning deaths for years 2003 and 2004. We used Classification and Regression Tree (CART) methodology to classify 1,204 Maryland undetermined intent poisoning deaths as either unintentional or suicidal poisonings. The predictive ability of the selected set of variables (i.e., poisoned in the home or workplace, location type where poisoned, place of death, poison type, victim race and age, year of death) was extremely good. Of the 301 test cases, only eight were misclassified by the CART regression tree. Of 1,204 undetermined intent poisoning deaths, CART classified 903 as suicides and 301 as unintentional deaths. The major strength of our study is the use of CART to differentiate with a high degree of accuracy between unintentional and suicidal poisoning deaths among Maryland undetermined intent poisoning deaths. PMID:18999168

  16. Estimating Basin Snow Volume Using Aerial LiDAR and Binary Regression Trees (Invited)

    NASA Astrophysics Data System (ADS)

    Shallcross, A. T.; McNamara, J. P.; Flores, A. N.; Marshall, H.; Marks, D. G.; Glenn, N. F.

    2010-12-01

    Snow cover derived from airborne LiDAR (Light Detection And Ranging) is combined with binary regression trees to improve the prediction of total basin snow volume for the Dry Creek Experimental Watershed (DCEW), ID. These methods are used to identify site-specific topographic controls on the spatial distribution of snow so that future point measurements of snow depth can be distributed through space efficiently. LiDAR is used to map snow cover by differencing the digital elevation models (DEMs) obtained from a snow-covered overflight and a snow-free overflight. Topographic parameters known to control snow distribution are calculated from the snow free LiDAR dataset. Here, mean vegetation height, slope, aspect, solar radiation, and elevation are used to predict snow depth via a binary regression tree using ten-fold cross-validation. The branches leading to the terminal nodes of the regression tree are used to segment the watershed into homogeneous snow distribution units. Preliminary results indicate that 23 statistically significant discrete units exist. Thus, during future field campaigns, point measurements of snow depth can be gathered and distributed throughout these units. Mean measured SWE/depth of each unit can be summed to determine the total basin snow volume. This method should decrease field time and improve the accuracy of basin snow volume estimates for watershed analyses.
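
The core step of a binary regression tree, as used in this abstract to relate topographic parameters to snow depth, is choosing the split of one predictor that minimizes the squared error around the two child means. The sketch below illustrates that single step under stated assumptions: `best_split` is a hypothetical helper, only one predictor is considered, and the elevation and depth values are invented toy data, not values from the study.

```python
def best_split(x, y):
    """Return (threshold, sse) for the binary split on one predictor that
    minimizes the summed squared error around the two child means.
    Returns None when the predictor has fewer than two distinct values."""

    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    distinct = sorted(set(x))
    best = None
    # Candidate thresholds are midpoints between consecutive distinct values,
    # so both children are always non-empty.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2.0
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical toy data: elevation (m) versus snow depth (m).
elevation = [1100, 1200, 1300, 1700, 1800, 1900]
depth = [0.2, 0.3, 0.25, 1.1, 1.0, 1.2]
threshold, error = best_split(elevation, depth)
```

Applying the same search recursively to each child, over all predictors, yields the terminal nodes that the abstract uses to segment the watershed into snow distribution units.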

  17. [Hyperspectral Estimation of Apple Tree Canopy LAI Based on SVM and RF Regression].

    PubMed

    Han, Zhao-ying; Zhu, Xi-cun; Fang, Xian-yi; Wang, Zhuo-yuan; Wang, Ling; Zhao, Geng-Xing; Jiang, Yuan-mao

    2016-03-01

    Leaf area index (LAI) is a dynamic index of crop population size. Hyperspectral technology can be used to estimate apple canopy LAI rapidly and nondestructively, and it can provide a reference for monitoring tree growth and estimating yield. Red Fuji apple trees in the full fruit-bearing period were the research objects. Canopy spectral reflectance and LAI values of ninety apple trees were measured with the ASD Fieldspec3 spectrometer and the LAI-2200 in thirty orchards over two consecutive years in the Qixia research area of Shandong Province. The optimal vegetation indices were selected by correlation analysis of the original spectral reflectance and vegetation indices. Models for predicting LAI were built with the multivariate regression methods of support vector machine (SVM) and random forest (RF). The new vegetation indices GNDVI527, NDVI676, RVI682, FD-NVI656 and GRVI517 and the two main existing vegetation indices, NDVI670 and NDVI705, are in accordance with LAI. In the RF regression model, the calibration set coefficient of determination C-R2 of 0.920 and the validation set coefficient of determination V-R2 of 0.889 are higher than those of the SVM regression model by 0.045 and 0.033, respectively. The calibration set root mean square error C-RMSE of 0.249 and the validation set root mean square error V-RMSE of 0.236 are lower than those of the SVM regression model by 0.054 and 0.058, respectively. The relative analysis error of the calibration set, C-RPD, and of the validation set, V-RPD, reached 3.363 and 2.520, higher than those of the SVM regression model by 0.598 and 0.262, respectively. The slopes of the trend lines in the scatterplots of measured versus predicted values for the calibration and validation sets, C-S and V-S, are close to 1. The estimation result of the RF regression model is better than that of the SVM. The RF regression model can be used to estimate the LAI of Red Fuji apple trees in the full fruit period. PMID:27400527

  18. Stressor-response modeling using the 2D water quality model and regression trees to predict chlorophyll-a in a reservoir system

    NASA Astrophysics Data System (ADS)

    Park, Yongeun; Pachepsky, Yakov A.; Cho, Kyung Hwa; Jeon, Dong Jin; Kim, Joon Ha

    2015-10-01

    To control algal blooms, the stressor-response relationships between water quality metrics, environmental variables, and algal growth need to be better understood and modeled. Machine-learning methods have been suggested as means to express the stressor-response relationships that are found when applying mechanistic water quality models. The objective of this work was to evaluate the efficiency of regression trees in the development of a stressor-response model for chlorophyll-a (Chl-a) concentrations, using the results from site-specific mechanistic water quality modeling. The 2-dimensional hydrodynamic and water quality model (CE-QUAL-W2) was applied to simulate water quality using four-year observational data and additional scenarios of air temperature increases for the Yeongsan Reservoir in South Korea. Regression tree modeling was applied to the results of these simulations. Given the well-expressed seasonality in the simulated Chl-a dynamics, separate regression trees were developed for months from May to September. The regression trees provided a reasonably accurate representation of the stressor-response dependence generated by the CE-QUAL-W2 model. Different stressors were then selected as split variables for different months, and, in most cases, splits by the same stressor variable yielded the same correlation sign between the variable and the Chl-a concentration. Compared to physical variables, nutrient content appeared to better predict Chl-a responses. The highest Chl-a temperature sensitivities were found for May and June. Regression tree splits based on ammonium concentration resulted in a consistent trend of greater sensitivity in the groups of samples with higher ammonium concentrations. Regression tree models provided a transparent visual representation of the stressor-response relationships for Chl-a and its sensitivity. Overall, the representation of relationships using classification and regression tools can be considered a useful

  19. A regression tree approach to identifying subgroups with differential treatment effects.

    PubMed

    Loh, Wei-Yin; He, Xu; Man, Michael

    2015-05-20

    In the fight against hard-to-treat diseases such as cancer, it is often difficult to discover new treatments that benefit all subjects. For regulatory agency approval, it is more practical to identify subgroups of subjects for whom the treatment has an enhanced effect. Regression trees are natural for this task because they partition the data space. We briefly review existing regression tree algorithms. Then, we introduce three new ones that are practically free of selection bias and are applicable to data from randomized trials with two or more treatments, censored response variables, and missing values in the predictor variables. The algorithms extend the generalized unbiased interaction detection and estimation (GUIDE) approach by using three key ideas: (i) treatment as a linear predictor, (ii) chi-squared tests to detect residual patterns and lack of fit, and (iii) proportional hazards modeling via Poisson regression. Importance scores with thresholds for identifying influential variables are obtained as by-products. A bootstrap technique is used to construct confidence intervals for the treatment effects in each node. The methods are compared using real and simulated data. PMID:25656439

  20. A regression tree approach to identifying subgroups with differential treatment effects

    PubMed Central

    Loh, Wei-Yin; He, Xu; Man, Michael

    2015-01-01

    In the fight against hard-to-treat diseases such as cancer, it is often difficult to discover new treatments that benefit all subjects. For regulatory agency approval, it is more practical to identify subgroups of subjects for whom the treatment has an enhanced effect. Regression trees are natural for this task because they partition the data space. We briefly review existing regression tree algorithms. Then we introduce three new ones that are practically free of selection bias and are applicable to data from randomized trials with two or more treatments, censored response variables, and missing values in the predictor variables. The algorithms extend the GUIDE approach by using three key ideas: (i) treatment as a linear predictor, (ii) chi-squared tests to detect residual patterns and lack of fit, and (iii) proportional hazards modeling via Poisson regression. Importance scores with thresholds for identifying influential variables are obtained as by-products. A bootstrap technique is used to construct confidence intervals for the treatment effects in each node. The methods are compared using real and simulated data. PMID:25656439

  1. Identifying Population Groups with Low Palliative Care Program Enrolment Using Classification and Regression Tree Analysis

    PubMed Central

    Gao, Jun; Lavergne, M. Ruth; McIntyre, Paul

    2013-01-01

    Classification and regression tree (CART) analysis was used to identify subpopulations with lower palliative care program (PCP) enrolment rates. CART analysis uses recursive partitioning to group predictors. The PCP enrolment rate was 72 percent for the 6,892 adults who died of cancer between 2000 and 2005 in two counties in Nova Scotia, Canada. The lowest PCP enrolment rates were for nursing home residents over 82 years (27 percent), a group residing more than 43 kilometres from the PCP (31 percent), and another group living less than two weeks after their cancer diagnosis (37 percent). The highest rate (86 percent) was for the 2,118 persons who received palliative radiation. Findings from multiple logistic regression (MLR) were provided for comparison. CART findings identified low PCP enrolment subpopulations that were defined by interactions among demographic, social, medical, and health system predictors. PMID:21805944

  2. Delaware River Streamflow Reconstruction using Tree Rings: Exploration of Hierarchical Bayesian Regression

    NASA Astrophysics Data System (ADS)

    Devineni, N.; Lall, U.; Cook, E.; Pederson, N.

    2011-12-01

    We present the application of a linear model in a Hierarchical Bayesian Regression (HBR) framework for reconstructing the summer seasonal averaged streamflow at five stations in the Delaware River Basin using eight newly developed regional tree ring chronologies. This technique directly provides estimates of the posterior probability distribution of each reconstructed streamflow value, considering model parameter uncertainty. The methodology also allows us to shrink the model parameters towards a common mean to incorporate the predictive ability of each tree chronology on multiple stations. We present the results from HBR analysis along with the results from traditional Point by Point Regression (PPR) analysis to demonstrate the benefits of developing the reconstructions under a Bayesian modeling framework. Further, we also present the comparative results of the model validation using various performance evaluation metrics such as reduction in error (RE) and coefficient of efficiency (CE). The reconstructed streamflow at various stations can be utilized to examine the frequency and recurrence attributes of extreme droughts in the region and their potential connections to known low frequency climate modes.
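
The two validation metrics named in this abstract can be sketched as below. The abstract does not define them, so the formulas are an assumption based on the definitions commonly used for tree-ring reconstructions: both compare the squared reconstruction error over the validation period with the error of a constant reference, where RE uses the calibration-period mean and CE uses the validation-period mean. The function names and the toy streamflow values are hypothetical.

```python
def reduction_in_error(observed, predicted, calibration_mean):
    """RE = 1 - SSE / sum of squared deviations from the calibration-period mean,
    evaluated over the validation period."""
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    reference = sum((o - calibration_mean) ** 2 for o in observed)
    return 1.0 - sse / reference


def coefficient_of_efficiency(observed, predicted):
    """CE = 1 - SSE / sum of squared deviations from the validation-period mean."""
    validation_mean = sum(observed) / len(observed)
    return reduction_in_error(observed, predicted, validation_mean)


# Hypothetical validation-period streamflow values (arbitrary units).
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 17.0]
re_score = reduction_in_error(observed, predicted, calibration_mean=11.0)
ce_score = coefficient_of_efficiency(observed, predicted)
```

Under these definitions, values above zero indicate that the reconstruction outperforms the corresponding constant-mean reference, and CE can never exceed RE because the validation mean is the best constant predictor for the validation period.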

  3. Hatching timing, oxygen availability, and external gill regression in the tree frog, Agalychnis callidryas.

    PubMed

    Warkentin, Karen M

    2002-01-01

    The physiological role of the embryonic external gills in anurans is equivocal. In some species, diffusion alone is clearly sufficient to supply oxygen throughout the embryonic period. In others, morphological elaboration and environmental regulation of the external gills suggest functional importance. Since oxygen stress is a common trigger of hatching, I examined the relationships among hatching timing, oxygen stress, and external gill loss. I worked with the red-eyed tree frog, Agalychnis callidryas, a species with arboreal eggs and aquatic tadpoles in which gill regression is associated with hatching, and hatching timing affects posthatching survival with aquatic predators. Both exposure to a hypoxic gas mixture and submergence in water, a natural context in which hypoxic stress can occur, induced early hatching. Exposure to hyperoxic gas mixtures induced regression of external gills, and subsequent exposure to air induced early hatching. Prostaglandin-induced external gill regression also induced hatching, and this effect was partially ameliorated by exposure to hyperoxic gas. Together, these results suggest that external gills enhance the oxygen uptake of embryos and are necessary to extend embryonic development past the onset of hatching competence. PMID:12024291

  4. Regression tree modeling of forest NPP using site conditions and climate variables across eastern USA

    NASA Astrophysics Data System (ADS)

    Kwon, Y.

    2013-12-01

    As evidence of global warming continues to increase, being able to predict forest response to climate changes, such as the expected rise in temperature and precipitation, will be vital for maintaining the sustainability and productivity of forests. Mapping forest species redistribution under climate change scenarios has been successful; however, most species redistribution maps lack the mechanistic understanding needed to explain why trees grow under the novel conditions of a changing climate. A distributional map is only capable of predicting under the equilibrium assumption that the communities would exist following a prolonged period under the new climate. In this context, forest NPP as a surrogate for growth rate, the most important facet that determines stand dynamics, can lead to valid predictions of the transition stage to a new vegetation-climate equilibrium, as it represents changes in forest structure reflecting site conditions and climate factors. The objective of this study is to develop a forest growth map using regression tree analysis by extracting large-scale non-linear structures from both field-based FIA and remotely sensed MODIS data sets. The major issue addressed in this approach is the non-linear spatial patterns of forest attributes. Forest inventory data showed complex spatial patterns that reflect environmental states and processes that originate at different spatial scales. At broad scales, non-linear spatial trends in forest attributes and a mixture of continuous and discrete types of environmental variables make traditional statistical (multivariate regression) and geostatistical (kriging) models inefficient. This calls into question some traditional underlying assumptions about spatial trends that are uncritically accepted for forest data. To address the controversy surrounding the suitability of forest data, regression tree analyses are performed using the software See5 and Cubist.
    Four publicly available data sets were obtained: First, field-based Forest Inventory and Analysis (USDA

  5. Comparison of universal kriging and regression tree modelling for soil property mapping

    NASA Astrophysics Data System (ADS)

    Kempen, Bas

    2013-04-01

    Geostatistical modelling approaches have dominated the field of digital soil mapping (DSM) since its inception in the early 1980s. In recent years, however, machine learning methods such as classification and regression trees, random forests, and neural networks have quickly gained popularity among researchers in the DSM community. The increased use of these methods has largely come at the expense of geostatistical approaches. Despite the apparent shift in the application of DSM methods from geostatistics to machine learning, quantitative comparisons of the prediction performance of these methods are largely lacking. The aims of this research, therefore, are: i) to map two soil properties (topsoil organic matter content and thickness of the peat layer in the soil profile) using regression tree (RT) modelling and universal kriging (UK), and ii) to compare the prediction performance of these methods with independent data obtained by probability sampling. Using such data for validation not only yields statistically valid and unbiased estimates of the map accuracy, but also allows a statistical comparison of the accuracies of the maps generated by the two methods. The topsoil organic matter content and the thickness of the peat layer were mapped for a 14,000 ha area in the province of Drenthe, The Netherlands. The calibration dataset contained soil property observations at 1,715 sites. The covariates used include layers derived from soil and paleogeography maps, land cover, relative elevation, drainage class, land reclamation period, elevation change, and historic land use. The validation dataset contained 125 observations selected by stratified simple random sampling of the study area. The root mean squared error (RMSE) of the soil organic matter map obtained by RT modelling was 0.603 log(%), that of the map obtained by UK 0.595 log(%). The difference in map accuracy was not significant (p = 0.377). The RMSE of the peat thickness map obtained by RT

  6. Using the PDD Behavior Inventory as a Level 2 Screener: A Classification and Regression Trees Analysis.

    PubMed

    Cohen, Ira L; Liu, Xudong; Hudson, Melissa; Gillis, Jennifer; Cavalari, Rachel N S; Romanczyk, Raymond G; Karmel, Bernard Z; Gardner, Judith M

    2016-09-01

    In order to improve discrimination accuracy between Autism Spectrum Disorder (ASD) and similar neurodevelopmental disorders, a data mining procedure, Classification and Regression Trees (CART), was used on a large multi-site sample of PDD Behavior Inventory (PDDBI) forms on children with and without ASD. Discrimination accuracy exceeded 80%, generalized to an independent validation set, generalized across age groups and sites, and agreed well with ADOS classifications. Parent PDDBIs yielded better results than teacher PDDBIs but, when CART predictions agreed across informants, sensitivity increased. Results also revealed three subtypes of ASD: minimally verbal, verbal, and atypical; and two relatively common subtypes of non-ASD children: social pragmatic problems and good social skills. These subgroups corresponded to differences in behavior profiles and associated bio-medical findings. PMID:27318809

  7. Prediction of Wind Speeds Based on Digital Elevation Models Using Boosted Regression Trees

    NASA Astrophysics Data System (ADS)

    Fischer, P.; Etienne, C.; Tian, J.; Krauß, T.

    2015-12-01

    In this paper a new approach is presented to predict maximum wind speeds using Gradient Boosted Regression Trees (GBRT). GBRT are a non-parametric regression technique used in various applications, suitable for making predictions without in-depth a priori knowledge of the functional dependencies between the predictors and the response variables. Our aim is to predict maximum wind speeds based on predictors derived from a digital elevation model (DEM). The predictors describe the orography of the Area-of-Interest (AoI) by various means, such as first- and second-order derivatives of the DEM, but also more sophisticated classifications describing the exposure and shelteredness of the terrain with respect to wind flux. In order to take into account the different scales that probably influence the streams and turbulence of wind flow over complex terrain, the predictors are computed at different spatial resolutions ranging from 30 m up to 2000 m. The geographic area used for examination of the approach is Switzerland, a mountainous region in the heart of Europe, dominated by the Alps but also covering large valleys. The full workflow is described in this paper, which consists of data preparation using image processing techniques, model training using a state-of-the-art machine learning algorithm, in-depth analysis of the trained model, validation of the model and application of the model to generate a wind speed map.
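
    The boosting idea behind GBRT can be illustrated with a minimal sketch, assuming squared-error loss, a single predictor, and depth-1 trees (stumps). The toy terrain and wind values, the function names, and the parameter settings are hypothetical and are not taken from the paper; the authors' actual model and software are not described in this record.

```python
# Gradient boosting with depth-1 regression trees (stumps) and squared-error loss.
# Toy data and all names are hypothetical; this is not the authors' model.

def fit_stump(x, residuals):
    """Best single split on one predictor: returns (threshold, left_mean, right_mean)."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0  # candidate threshold between neighbouring values
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1], best[2], best[3]

def fit_gbrt(x, y, n_trees=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lm, rm = fit_stump(x, residuals)
        stumps.append((t, lm, rm))
        pred = [pi + learning_rate * (lm if xi <= t else rm)
                for xi, pi in zip(x, pred)]
    return base, stumps, learning_rate

def predict(model, xi):
    """Sum the shrunken contributions of all stumps on top of the base value."""
    base, stumps, lr = model
    return base + sum(lr * (lm if xi <= t else rm) for t, lm, rm in stumps)

def mse(model, x, y):
    """Mean squared error of the model on (x, y)."""
    return sum((yi - predict(model, xi)) ** 2 for xi, yi in zip(x, y)) / len(y)

# Hypothetical terrain index (predictor) and maximum wind speed (response).
TERRAIN = [float(i) for i in range(10)]
WIND = [2.0, 2.1, 1.9, 2.2, 5.0, 5.2, 4.9, 7.8, 8.1, 8.0]

baseline = fit_gbrt(TERRAIN, WIND, n_trees=0)   # mean-only model
boosted = fit_gbrt(TERRAIN, WIND, n_trees=50)
print(mse(baseline, TERRAIN, WIND), mse(boosted, TERRAIN, WIND))
```

    Each stump corrects only a fraction (the learning rate) of the remaining error, which is why many small trees are combined. A real application would use several predictors and held-out validation data rather than training error.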

  8. Nitrogen and phosphorus additions negatively affect tree species diversity in tropical forest regrowth trajectories.

    PubMed

    Siddique, Ilyas; Vieira, Ima Célia Guimarães; Schmidt, Susanne; Lamb, David; Carvalho, Cláudio José Reis; Figueiredo, Ricardo de Oliveira; Blomberg, Simon; Davidson, Eric A

    2010-07-01

    Nutrient enrichment is increasingly affecting many tropical ecosystems, but there is no information on how this affects tree biodiversity. To examine dynamics in vegetation structure and tree species biomass and diversity, we annually remeasured tree species before and for six years after repeated additions of nitrogen (N) and phosphorus (P) in permanent plots of abandoned pasture in Amazonia. Nitrogen and, to a lesser extent, phosphorus addition shifted growth among woody species. Nitrogen stimulated growth of two common pioneer tree species and one common tree species adaptable to both high- and low-light environments, while P stimulated growth only of the dominant pioneer tree Rollinia exsucca (Annonaceae). Overall, N or P addition reduced tree assemblage evenness and delayed tree species accrual over time, likely due to competitive monopolization of other resources by the few tree species responding to nutrient enrichment with enhanced establishment and/or growth rates. Absolute tree growth rates were elevated for two years after nutrient addition. However, nutrient-induced shifts in relative tree species growth and reduced assemblage evenness persisted for more than three years after nutrient addition, favoring two nutrient-responsive pioneers and one early-secondary tree species. Surprisingly, N + P effects on tree biomass and species diversity were consistently weaker than N-only and P-only effects, because grass biomass increased dramatically in response to N + P addition. The resulting intensified competition probably prevented an expected positive N + P synergy in the tree assemblage. Thus, N or P enrichment may favor unknown tree functional response types, reduce the diversity of coexisting species, and delay species accrual during structurally and functionally complex tropical rainforest secondary succession. PMID:20715634

  9. Identification of Sexually Abused Female Adolescents at Risk for Suicidal Ideations: A Classification and Regression Tree Analysis

    ERIC Educational Resources Information Center

    Brabant, Marie-Eve; Hebert, Martine; Chagnon, Francois

    2013-01-01

    This study explored the clinical profiles of 77 female teenager survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideations. Analyses revealed that 64% of participants experienced suicidal ideations. Findings from classification and regression tree analysis indicated that depression,…

  10. What Satisfies Students? Mining Student-Opinion Data with Regression and Decision-Tree Analysis. AIR 2002 Forum Paper.

    ERIC Educational Resources Information Center

    Thomas, Emily H.; Galambos, Nora

    To investigate how students' characteristics and experiences affect satisfaction, this study used regression and decision-tree analysis with the CHAID algorithm to analyze student opinion data from a sample of 1,783 college students. A data-mining approach identifies the specific aspects of students' university experience that most influence three…

  11. Multicenter study on caries risk assessment in adults using survival Classification and Regression Trees

    PubMed Central

    Arino, Masumi; Ito, Ataru; Fujiki, Shozo; Sugiyama, Seiichi; Hayashi, Mikako

    2016-01-01

    Dental caries is an important public health problem worldwide. This study aims to prove how preventive therapies reduce the onset of caries in adult patients, and to identify patients with high or low risk of caries by using Classification and Regression Trees based survival analysis (survival CART). A clinical data set of 732 patients aged 20 to 64 years in nine Japanese general practices was analyzed with the following parameters: age, DMFT, number of mutans streptococci (SM) and Lactobacilli (LB), secretion rate and buffer capacity of saliva, and compliance with a preventive program. Results showed the incidence of primary carious lesion was affected by SM, LB and compliance with a preventive program; secondary carious lesion was affected by DMFT, SM and LB. Survival CART identified high-risk patients for primary carious lesion according to their poor compliance with a preventive program and SM (≥10^6 CFU/ml) with a hazard ratio of 3.66 (p = 0.0002). In the case of secondary caries, patients with LB (≥10^5 CFU/ml) and DMFT (>15) were identified as high risk with a hazard ratio of 3.50 (p < 0.0001). We conclude that preventive programs can be effective in limiting the incidence of primary carious lesion. PMID:27381750

  12. Matching in Vitro Bioaccessibility of Polyphenols and Antioxidant Capacity of Soluble Coffee by Boosted Regression Trees.

    PubMed

    Podio, Natalia S; López-Froilán, Rebeca; Ramirez-Moreno, Esther; Bertrand, Lidwina; Baroni, María V; Pérez-Rodríguez, María L; Sánchez-Mata, María-Cortes; Wunderlin, Daniel A

    2015-11-01

    The aim of this study was to evaluate changes in polyphenol profile and antioxidant capacity of five soluble coffees throughout a simulated gastro-intestinal digestion, including absorption through a dialysis membrane. Our results demonstrate that both polyphenol content and antioxidant capacity were characteristic for each type of studied coffee, showing a drop after dialysis. Twenty-seven compounds were identified in coffee by HPLC-MS, while only 14 of them were found after dialysis. Green+roasted coffee blend and chicory+coffee blend showed the highest and lowest content of polyphenols and antioxidant capacity before in vitro digestion and after dialysis, respectively. Canonical correlation analysis showed significant correlation between the antioxidant capacity and the polyphenol profile before digestion and after dialysis. Furthermore, boosted regression trees analysis (BRT) showed that only four polyphenol compounds (5-p-coumaroylquinic acid, quinic acid, coumaroyl tryptophan conjugated, and 5-O-caffeoylquinic acid) appear to be the most relevant to explain the antioxidant capacity after dialysis, these compounds being the most bioaccessible after dialysis. To our knowledge, this is the first report matching the antioxidant capacity of foods with the polyphenol profile by BRT, which opens an interesting method of analysis for future reports on the antioxidant capacity of foods. PMID:26457815

  13. Multicenter study on caries risk assessment in adults using survival Classification and Regression Trees.

    PubMed

    Arino, Masumi; Ito, Ataru; Fujiki, Shozo; Sugiyama, Seiichi; Hayashi, Mikako

    2016-01-01

    Dental caries is an important public health problem worldwide. This study aims to prove how preventive therapies reduce the onset of caries in adult patients, and to identify patients with high or low risk of caries by using Classification and Regression Trees based survival analysis (survival CART). A clinical data set of 732 patients aged 20 to 64 years in nine Japanese general practices was analyzed with the following parameters: age, DMFT, number of mutans streptococci (SM) and Lactobacilli (LB), secretion rate and buffer capacity of saliva, and compliance with a preventive program. Results showed the incidence of primary carious lesion was affected by SM, LB and compliance with a preventive program; secondary carious lesion was affected by DMFT, SM and LB. Survival CART identified high-risk patients for primary carious lesion according to their poor compliance with a preventive program and SM (≥10^6 CFU/ml) with a hazard ratio of 3.66 (p = 0.0002). In the case of secondary caries, patients with LB (≥10^5 CFU/ml) and DMFT (>15) were identified as high risk with a hazard ratio of 3.50 (p < 0.0001). We conclude that preventive programs can be effective in limiting the incidence of primary carious lesion. PMID:27381750

  14. Prediction of cadmium enrichment in reclaimed coastal soils by classification and regression tree

    NASA Astrophysics Data System (ADS)

    Ru, Feng; Yin, Aijing; Jin, Jiaxin; Zhang, Xiuying; Yang, Xiaohui; Zhang, Ming; Gao, Chao

    2016-08-01

    Reclamation of coastal land is one of the most common ways to obtain land resources in China. However, it has long been acknowledged that artificial interference with coastal land has disadvantageous effects, such as heavy metal contamination. This study aimed to develop a prediction model for cadmium enrichment levels and assess the importance of influencing factors in typical reclaimed land in Eastern China (DFCL: Dafeng Coastal Land). Two hundred and twenty-seven surficial soil/sediment samples were collected and analyzed to identify the enrichment levels of cadmium and the possible influencing factors in soils and sediments. The classification and regression tree (CART) model was applied in this study to predict cadmium enrichment levels. The prediction results showed that cadmium enrichment levels assessed by the CART model had an accuracy of 78.0%. The CART model could extract more information on factors affecting the environmental behavior of cadmium than correlation analysis. The integration of correlation analysis and the CART model showed that fertilizer application and organic carbon accumulation were the most important factors affecting soil/sediment cadmium enrichment levels, followed by particle size effects (Al2O3, TFe2O3 and SiO2), contents of Cl and S, surrounding construction areas and reclamation history.

  15. Improving Automatic English Writing Assessment Using Regression Trees and Error-Weighting

    NASA Astrophysics Data System (ADS)

    Lee, Kong-Joo; Kim, Jee-Eun

    The proposed automated scoring system for English writing tests provides an assessment result, including a score and diagnostic feedback, to test-takers without human effort. The system analyzes an input sentence and detects errors related to spelling, syntax and content similarity. The scoring model adopts a statistical approach, the regression tree. A scoring model in general calculates a score based on the count and the types of automatically detected errors. Accordingly, a system with higher accuracy in detecting errors raises the accuracy in scoring a test. The accuracy of the system, however, cannot be fully guaranteed for several reasons, such as parsing failure, incompleteness of knowledge bases, and the ambiguous nature of natural language. In this paper, we introduce an error-weighting technique, which is similar to the term-weighting widely used in information retrieval. The error-weighting technique is applied to judge the reliability of the errors detected by the system. The score calculated with the technique is proven to be more accurate than the score without it.

  16. Exploring the link between drought indicators and impacts through data visualization and regression trees

    NASA Astrophysics Data System (ADS)

    Bachmair, Sophie; Stahl, Kerstin; Blauhut, Veit; Kohn, Irene

    2014-05-01

    impact occurrence. The applied data visualization and regression tree approach proved to be a valuable methodology for exploring the link between indicators and impacts. Nevertheless, the results are influenced by the uncertainty of identifying and quantifying drought impacts and vulnerability factors at a suitable spatial and temporal scale. This calls for more research on methodological issues of drought impact and vulnerability assessment, as well as for further developing impact inventories and exploiting the link between drought indicators and impacts.

  17. Availability and Capacity of Substance Abuse Programs in Correctional Settings: A Classification and Regression Tree Analysis

    PubMed Central

    Kitsantas, Panagiota

    2009-01-01

    Objective: The purpose of this study was to investigate the structural and organizational factors that contribute to the availability and increased capacity for substance abuse treatment programs in correctional settings. We used Classification and Regression Tree statistical procedures to identify how multi-level data can explain the variability in availability and capacity of substance abuse treatment programs in jails and probation/parole offices. Methods: The data for this study combined the National Criminal Justice Treatment Practices survey (NCJTP) and the 2000 Census. The NCJTP survey was a nationally representative sample of correctional administrators for jails and probation/parole agencies. The sample size included 295 substance abuse treatment programs that were classified according to the intensity of their services: high, medium, and low. The independent variables included jurisdictional-level structural variables, attributes of the correctional administrators, and program and service delivery characteristics of the correctional agency. Results: The two most important variables in predicting the availability of all three types of services were stronger working relationships with other organizations and the adoption of a standardized substance abuse screening tool by correctional agencies. For high- and medium-intensity programs, the capacity increased when an organizational learning strategy was used by administrators and the organization used a substance abuse screening tool. Implications for advancing treatment practices in correctional settings are discussed, including further work to test theories on how to better understand access to intensive treatment services. This study presents the first phase of understanding capacity-related issues regarding treatment programs offered in correctional settings. PMID:19395204

  18. Chronic subdural hematoma: Surgical management and outcome in 986 cases: A classification and regression tree approach

    PubMed Central

    Rovlias, Aristedis; Theodoropoulos, Spyridon; Papoutsakis, Dimitrios

    2015-01-01

    Background: Chronic subdural hematoma (CSDH) is one of the most common clinical entities in daily neurosurgical practice and carries a most favorable prognosis. However, because of the advanced age and medical problems of patients, surgical therapy is frequently associated with various complications. This study evaluated the clinical features, radiological findings, and neurological outcome in a large series of patients with CSDH. Methods: A classification and regression tree (CART) technique was employed in the analysis of data from 986 patients who were operated on at Asclepeion General Hospital of Athens from January 1986 to December 2011. Burr-hole evacuation with closed-system drainage has been the operative technique of first choice at our institution for 29 consecutive years. A total of 27 prognostic factors were examined to predict the outcome at 3 months postoperatively. Results: Our results indicated that neurological status on admission was the best predictor of outcome. With regard to the other data, age, brain atrophy, thickness and density of hematoma, subdural accumulation of air, and antiplatelet and anticoagulant therapy were found to correlate significantly with prognosis. The overall cross-validated predictive accuracy of the CART model was 85.34%, with a cross-validated relative error of 0.326. Conclusions: Methodologically, the CART technique is quite different from the more commonly used methods, with the primary benefit of illustrating the important prognostic variables as related to outcome. Since the ideal therapy for the treatment of CSDH is still under debate, this technique may prove useful in developing new therapeutic strategies and approaches for patients with CSDH. PMID:26257985

  19. Regression Tree-Based Methodology for Customizing Building Energy Benchmarks to Individual Commercial Buildings

    NASA Astrophysics Data System (ADS)

    Kaskhedikar, Apoorva Prakash

    According to the U.S. Energy Information Administration, commercial buildings represent about 40% of the United States' energy consumption, of which office buildings consume a major portion. Gauging the extent to which an individual building consumes energy in excess of its peers is the first step in initiating energy efficiency improvement. Energy benchmarking offers initial building energy performance assessment without rigorous evaluation. Energy benchmarking tools based on the Commercial Buildings Energy Consumption Survey (CBECS) database are investigated in this thesis. This study proposes a new benchmarking methodology based on decision trees, where a relationship between the energy use intensities (EUI) and building parameters (continuous and categorical) is developed for different building types. This methodology was applied to medium office and school building types contained in the CBECS database. The Random Forest technique was used to find the most influential parameters that impact building energy use intensities. Subsequently, significant correlations were identified between EUIs and CBECS variables. Other than floor area, some of the important variables were number of workers, location, number of PCs and main cooling equipment. The coefficient of variation was used to evaluate the effectiveness of the new model. The customization technique proposed in this thesis was compared with another benchmarking model that is widely used by building owners and designers, namely ENERGY STAR's Portfolio Manager. This tool relies on standard linear regression methods, which are only able to handle continuous variables. The proposed model uses a data mining technique and was found to perform slightly better than the Portfolio Manager. The broader impact of the proposed benchmarking methodology is that it allows for identifying important categorical variables, and then incorporating them in a local, as against a global, model framework for EUI

  20. Assessment of land use factors associated with dengue cases in Malaysia using Boosted Regression Trees.

    PubMed

    Cheong, Yoon Ling; Leitão, Pedro J; Lakes, Tobia

    2014-07-01

    The transmission of dengue disease is influenced by complex interactions among vector, host and virus. Land use such as water bodies or certain agricultural practices have been identified as likely risk factors for dengue because of the provision of suitable habitats for the vector. Many studies have focused on the land use factors of dengue vector abundance in small areas but have not yet studied the relationship between land use factors and dengue cases for large regions. This study aims to clarify if land use factors other than human settlements, e.g. different types of agricultural land use, water bodies and forest are associated with reported dengue cases from 2008 to 2010 in the state of Selangor, Malaysia. From the correlative relationship, we aim to generate a prediction risk map. We used Boosted Regression Trees (BRT) to account for nonlinearities and interactions between the factors with high predictive accuracies. Our model with a cross-validated performance score (Area Under the Receiver Operator Characteristic Curve, ROC AUC) of 0.81 showed that the most important land use factors are human settlements (model importance of 39.2%), followed by water bodies (16.1%), mixed horticulture (8.7%), open land (7.5%) and neglected grassland (6.7%). A risk map after 100 model runs with a cross-validated ROC AUC mean of 0.81 (±0.001 s.d.) is presented. Our findings may be an important asset for improving surveillance and control interventions for dengue. PMID:25113593
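
    The ROC AUC score reported above can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the labels and scores are made-up illustration values, not data from the study, and the function name is hypothetical.

```python
# ROC AUC computed directly from its pairwise (rank) interpretation.
# Labels and scores below are invented for illustration only.

def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# 1 = location with reported cases, 0 = location without (hypothetical).
LABELS = [1, 1, 0, 1, 0, 0]
SCORES = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_auc(LABELS, SCORES))
```

    A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The quadratic loop is fine for small samples; larger datasets would normally use a rank-based formula.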

  1. Effects of aluminum and iron nanoparticle additives on composite AP/HTPB solid propellant regression rate

    NASA Astrophysics Data System (ADS)

    Styborski, Jeremy A.

    This project was started in the interest of supplementing existing data on additives to composite solid propellants. The study on the addition of iron and aluminum nanoparticles to composite AP/HTPB propellants was conducted at the Combustion and Energy Systems Laboratory at RPI in the new strand-burner experiment setup. For this study, a large literature review was conducted on the history of solid propellant combustion modeling and the empirical results of tests on binders, plasticizers, AP particle size, and additives. The study focused on the addition of nano-scale aluminum and iron in small concentrations to AP/HTPB solid propellants with an average AP particle size of 200 microns. Replacing 1% of the propellant's AP with 40-60 nm aluminum particles produced no change in combustive behavior. The addition of 1% 60-80 nm iron particles produced a significant increase in burn rate, although the increase was smaller at higher pressures. These results are summarized in Table 2. The increase in the burn rate at all pressures due to the addition of iron nanoparticles warranted further study on the effect of concentration of iron. Tests conducted at 10 atm showed that the mean regression rate varied with iron concentration, peaking at 1% and 3%. Regardless of the iron concentration, the regression rate was higher than that of the baseline AP/HTPB propellants. These results are summarized in Table 3.

  2. Variances in the projections, resulting from CLIMEX, Boosted Regression Trees and Random Forests techniques

    NASA Astrophysics Data System (ADS)

    Shabani, Farzin; Kumar, Lalit; Solhjouy-fard, Samaneh

    2016-05-01

    The aim of this study was to compare and evaluate the capabilities of correlative and mechanistic modeling processes, applied to the projection of future distributions of date palm in novel environments, and to establish a method of minimizing uncertainty in the projections of differing techniques. The study area comprises the Middle Eastern countries. We compared the mechanistic model CLIMEX (CL) with the correlative models MaxEnt (MX), Boosted Regression Trees (BRT), and Random Forests (RF) to project current and future distributions of date palm (Phoenix dactylifera L.). The Global Climate Model (GCM), the CSIRO-Mk3.0 (CS) using the A2 emissions scenario, was selected for making projections. Both indigenous and alien distribution data of the species were utilized in the modeling process. The common areas predicted by MX, BRT, RF, and CL from the CS GCM were extracted and compared to ascertain projection uncertainty levels of each individual technique. The common areas identified by all four modeling techniques were used to produce a map indicating suitable and unsuitable areas for date palm cultivation for Middle Eastern countries, for the present and the year 2100. The four different modeling approaches predict fairly different distributions. Projections from CL were more conservative than from MX. The BRT and RF were the most conservative methods in terms of projections for the current time. The combination of the final CL and MX projections for the present and 2100 provides higher certainty concerning those areas that will become highly suitable for future date palm cultivation. According to the four models, cold, hot, and wet stress, with differences on a regional basis, appear to be the major restrictions on future date palm distribution. The results demonstrate variances in the projections, resulting from different techniques. The assessment and interpretation of model projections requires reservations

  3. Analysis of the impact of recreational trail usage for prioritising management decisions: a regression tree approach

    NASA Astrophysics Data System (ADS)

    Tomczyk, Aleksandra; Ewertowski, Marek; White, Piran; Kasprzak, Leszek

    2016-04-01

    The dual role of many Protected Natural Areas in providing benefits for both conservation and recreation poses challenges for management. Although recreation-based damage to ecosystems can occur very quickly, restoration can take many years. The protection of conservation interests at the same time as providing for recreation requires decisions to be made about how to prioritise and direct management actions. Trails are commonly used to divert visitors from the most important areas of a site, but high visitor pressure can lead to increases in trail width and a concomitant increase in soil erosion. Here we use detailed field data on the condition of recreational trails in Gorce National Park, Poland, as the basis for a regression tree analysis to determine the factors influencing trail deterioration, and link specific trail impacts with environmental, use-related and managerial factors. We distinguished 12 types of trails, characterised by four levels of degradation: (1) trails with an acceptable level of degradation; (2) threatened trails; (3) damaged trails; and (4) heavily damaged trails. Damaged trails were the most vulnerable of all trails and should be prioritised for appropriate conservation and restoration. We also proposed five types of monitoring of recreational trail conditions: (1) rapid inventory of negative impacts; (2) monitoring visitor numbers and variation in type of use; (3) change-oriented monitoring focusing on sections of trail which were subjected to changes in type or level of use or subjected to extreme weather events; (4) monitoring of dynamics of trail conditions; and (5) full assessment of trail conditions, to be carried out every 10-15 years. The application of the proposed framework can enhance the ability of Park managers to prioritise their trail management activities, enhancing trail conditions and visitor safety, while minimising adverse impacts on the conservation value of the ecosystem. A.M.T. was supported by the Polish Ministry of

  4. Tree Biomass Allocation and Its Model Additivity for Casuarina equisetifolia in a Tropical Forest of Hainan Island, China

    PubMed Central

    Xue, Yang; Yang, Zhongyang; Wang, Xiaoyan; Lin, Zhipan; Li, Dunxi; Su, Shaofeng

    2016-01-01

    Casuarina equisetifolia is commonly planted and used in the construction of coastal shelterbelt protection in Hainan Island. Thus, it is critical to accurately estimate the tree biomass of Casuarina equisetifolia L. for forest managers to evaluate the biomass stock in Hainan. The data for this work consisted of 72 trees, which were divided into three age groups: young forest, middle-aged forest, and mature forest. The proportion of biomass from the trunk significantly increased with age (P<0.05). However, the biomass of the branch and leaf decreased, and the biomass of the root did not change. To test whether the crown radius (CR) can improve biomass estimates of C. equisetifolia, we introduced CR into the biomass models. Here, six models were used to estimate the biomass of each component, including the trunk, the branch, the leaf, and the root. In each group, we selected one model among these six models for each component. The results showed that including the CR greatly improved the model performance and reduced the error, especially for the young and mature forests. In addition, to ensure biomass additivity, the selected equation for each component was fitted as a system of equations using seemingly unrelated regression (SUR). The SUR method not only gave efficient and accurate estimates but also achieved the logical additivity. The results in this study provide a robust estimation of tree biomass components and total biomass over three groups of C. equisetifolia. PMID:27002822

  5. Pan evaporation modeling using least square support vector machine, multivariate adaptive regression splines and M5 model tree

    NASA Astrophysics Data System (ADS)

    Kisi, Ozgur

    2015-09-01

    Pan evaporation (Ep) modeling is an important issue in reservoir management, regional water resources planning and evaluation of drinking-water supplies. The main purpose of this study is to investigate the accuracy of least square support vector machine (LSSVM), multivariate adaptive regression splines (MARS) and M5 Model Tree (M5Tree) in modeling Ep. The first part of the study focused on testing the ability of the LSSVM, MARS and M5Tree models in estimating the Ep data of Mersin and Antalya stations located in the Mediterranean Region of Turkey by using a cross-validation method. The LSSVM models outperformed the MARS and M5Tree models in estimating Ep of Mersin and Antalya stations with local input and output data. The average root mean square error (RMSE) of the M5Tree and MARS models was decreased by 24-32.1% and 10.8-18.9% using LSSVM models for the Mersin and Antalya stations, respectively. In the second part of the study, the ability of the three methods was examined in estimating Ep using input air temperature, solar radiation, relative humidity and wind speed data from a nearby station (cross-station application without local input data). The results showed that the MARS models provided better accuracy than the LSSVM and M5Tree models with respect to RMSE, mean absolute error (MAE) and determination coefficient (R2) criteria. The average RMSE accuracy of the LSSVM and M5Tree was increased by 3.7% and 16.5% using MARS. In the case without local input data, the average RMSE accuracy of the LSSVM and M5Tree was respectively increased by 11.4% and 18.4% using MARS. In the third part of the study, the ability of the applied models was examined in Ep estimation using input and output data of a nearby station. The results showed that the MARS models performed better than the other models with respect to RMSE, MAE and R2 criteria. The average RMSE of the LSSVM and M5Tree was respectively decreased by 54% and 3.4% using MARS. The overall results indicated that
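
    The comparison criteria named in this abstract (RMSE, MAE, R2, and a percentage change in RMSE between models) can be illustrated with a minimal sketch. The data below are made up, the function names are hypothetical, and R2 is assumed here to mean 1 - SS_res/SS_tot; the study may have used the squared correlation coefficient instead.

```python
import math

def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def r2(obs, pred):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def pct_rmse_reduction(rmse_ref, rmse_new):
    """Percentage by which rmse_new is lower than rmse_ref."""
    return 100.0 * (rmse_ref - rmse_new) / rmse_ref

# Made-up observations and two made-up model outputs (not data from the study).
observed = [2.0, 4.0, 6.0, 8.0]
model_a = [2.5, 3.5, 6.5, 7.5]
model_b = [3.0, 3.0, 7.0, 7.0]

print(rmse(observed, model_a), mae(observed, model_a), r2(observed, model_a))
print(pct_rmse_reduction(rmse(observed, model_b), rmse(observed, model_a)))
```

    With these toy values, model_a has half the RMSE of model_b, so the reported reduction is 50%.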

  6. Use of generalized regression tree models to characterize vegetation favoring Anopheles albimanus breeding.

    PubMed

    Hernandez, J E; Epstein, L D; Rodriguez, M H; Rodriguez, A D; Rejmankova, E; Roberts, D R

    1997-03-01

    We propose the use of generalized tree models (GTMs) to analyze data from entomological field studies. Generalized tree models can be used to characterize environments with different mosquito breeding capacity. A GTM simultaneously analyzes a set of predictor variables (e.g., vegetation coverage) in relation to a response variable (e.g., counts of Anopheles albimanus larvae), and how it varies with respect to a set of criterion variables (e.g., presence of predators). The algorithm produces a treelike graphical display with its root at the top and 2 branches stemming down from each node. At each node, conditions on the value of predictors partition the observations into subgroups (environments) in which the relation between response and criterion variables is most homogeneous. PMID:9152872

  7. A Comparison of Logistic Regression, Neural Networks, and Classification Trees Predicting Success of Actuarial Students

    ERIC Educational Resources Information Center

    Schumacher, Phyllis; Olinsky, Alan; Quinn, John; Smith, Richard

    2010-01-01

    The authors extended previous research by 2 of the authors who conducted a study designed to predict the successful completion of students enrolled in an actuarial program. They used logistic regression to determine the probability of an actuarial student graduating in the major or dropping out. They compared the results of this study with those…

  8. A comparison of three additive tree algorithms that rely on a least-squares loss criterion.

    PubMed

    Smith, T J

    1998-11-01

    The performances of three additive tree algorithms which seek to minimize a least-squares loss criterion were compared. The algorithms included the penalty-function approach of De Soete (1983), the iterative projection strategy of Hubert & Arabie (1995) and the two-stage ADDTREE algorithm (Corter, 1982; Sattath & Tversky, 1977). Model fit, comparability of structure, processing time and metric recovery were assessed. Results indicated that the iterative projection strategy consistently located the best-fitting tree, but also displayed a wider range and larger number of local optima. PMID:9854946

  9. A Comparative Assessment of the Influences of Human Impacts on Soil Cd Concentrations Based on Stepwise Linear Regression, Classification and Regression Tree, and Random Forest Models

    PubMed Central

    Qiu, Lefeng; Wang, Kai; Long, Wenli; Wang, Ke; Hu, Wei; Amable, Gabriel S.

    2016-01-01

    Soil cadmium (Cd) contamination has attracted a great deal of attention because of its detrimental effects on animals and humans. This study aimed to develop and compare the performances of stepwise linear regression (SLR), classification and regression tree (CART) and random forest (RF) models in the prediction and mapping of the spatial distribution of soil Cd and to identify likely sources of Cd accumulation in Fuyang County, eastern China. Soil Cd data from 276 topsoil (0–20 cm) samples were collected and randomly divided into calibration (222 samples) and validation datasets (54 samples). Auxiliary data, including detailed land use information, soil organic matter, soil pH, and topographic data, were incorporated into the models to simulate the soil Cd concentrations and further identify the main factors influencing soil Cd variation. The predictive models for soil Cd concentration exhibited acceptable overall accuracies (72.22% for SLR, 70.37% for CART, and 75.93% for RF). The SLR model exhibited the largest predicted deviation, with a mean error (ME) of 0.074 mg/kg, a mean absolute error (MAE) of 0.160 mg/kg, and a root mean squared error (RMSE) of 0.274 mg/kg, and the RF model produced the results closest to the observed values, with an ME of 0.002 mg/kg, an MAE of 0.132 mg/kg, and an RMSE of 0.198 mg/kg. The RF model also exhibited the greatest R2 value (0.772). The CART model predictions closely followed, with ME, MAE, RMSE, and R2 values of 0.013 mg/kg, 0.154 mg/kg, 0.230 mg/kg and 0.644, respectively. The three prediction maps generally exhibited similar and realistic spatial patterns of soil Cd contamination. The heavily Cd-affected areas were primarily located in the alluvial valley plain of the Fuchun River and its tributaries because of the dramatic industrialization and urbanization processes that have occurred there. The most important variable for explaining high levels of soil Cd accumulation was the presence of metal smelting industries. The

  10. A Comparative Assessment of the Influences of Human Impacts on Soil Cd Concentrations Based on Stepwise Linear Regression, Classification and Regression Tree, and Random Forest Models.

    PubMed

    Qiu, Lefeng; Wang, Kai; Long, Wenli; Wang, Ke; Hu, Wei; Amable, Gabriel S

    2016-01-01

    Soil cadmium (Cd) contamination has attracted a great deal of attention because of its detrimental effects on animals and humans. This study aimed to develop and compare the performances of stepwise linear regression (SLR), classification and regression tree (CART) and random forest (RF) models in the prediction and mapping of the spatial distribution of soil Cd and to identify likely sources of Cd accumulation in Fuyang County, eastern China. Soil Cd data from 276 topsoil (0-20 cm) samples were collected and randomly divided into calibration (222 samples) and validation datasets (54 samples). Auxiliary data, including detailed land use information, soil organic matter, soil pH, and topographic data, were incorporated into the models to simulate the soil Cd concentrations and further identify the main factors influencing soil Cd variation. The predictive models for soil Cd concentration exhibited acceptable overall accuracies (72.22% for SLR, 70.37% for CART, and 75.93% for RF). The SLR model exhibited the largest predicted deviation, with a mean error (ME) of 0.074 mg/kg, a mean absolute error (MAE) of 0.160 mg/kg, and a root mean squared error (RMSE) of 0.274 mg/kg, and the RF model produced the results closest to the observed values, with an ME of 0.002 mg/kg, an MAE of 0.132 mg/kg, and an RMSE of 0.198 mg/kg. The RF model also exhibited the greatest R2 value (0.772). The CART model predictions closely followed, with ME, MAE, RMSE, and R2 values of 0.013 mg/kg, 0.154 mg/kg, 0.230 mg/kg and 0.644, respectively. The three prediction maps generally exhibited similar and realistic spatial patterns of soil Cd contamination. The heavily Cd-affected areas were primarily located in the alluvial valley plain of the Fuchun River and its tributaries because of the dramatic industrialization and urbanization processes that have occurred there. The most important variable for explaining high levels of soil Cd accumulation was the presence of metal smelting industries. The

  11. Identification of sexually abused female adolescents at risk for suicidal ideations: a classification and regression tree analysis.

    PubMed

    Brabant, Marie-Eve; Hébert, Martine; Chagnon, François

    2013-01-01

    This study explored the clinical profiles of 77 female teenager survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideations. Analyses revealed that 64% of participants experienced suicidal ideations. Findings from classification and regression tree analysis indicated that depression, posttraumatic stress symptoms, and hopelessness discriminated profiles of suicidal and nonsuicidal survivors. The elevated prevalence of suicidal ideations among adolescent survivors of sexual abuse underscores the importance of investigating the presence of suicidal ideations in sexual abuse survivors. However, suicidal ideation is not the sole variable that needs to be investigated; depression, hopelessness and posttraumatic stress symptoms are also related to suicidal ideations in survivors and could therefore guide interventions. PMID:23428149

  12. Structured additive regression modeling of age of menarche and menopause in a breast cancer screening program.

    PubMed

    Duarte, Elisa; de Sousa, Bruno; Cadarso-Suarez, Carmen; Rodrigues, Vitor; Kneib, Thomas

    2014-05-01

    Breast cancer risk is believed to be associated with several reproductive factors, such as early menarche and late menopause. This study is based on the registries of the first time a woman enters the screening program, and presents a spatio-temporal analysis of the variables age of menarche and age of menopause along with other reproductive and socioeconomic factors. The database was provided by the Portuguese Cancer League (LPCC), a private nonprofit organization dealing with multiple issues related to oncology of which the Breast Cancer Screening Program is one of its main activities. The registry consists of 259,652 records of women who entered the screening program for the first time between 1990 and 2007 (45-69-year age group). Structured Additive Regression (STAR) models were used to explore spatial and temporal correlations with a wide range of covariates. These models are flexible enough to deal with a variety of complex datasets, allowing us to reveal possible relationships among the variables considered in this study. The analysis shows that early menarche occurs in younger women and in municipalities located in the interior of central Portugal. Women living in inland municipalities register later ages for menopause, and those born in central Portugal after 1933 show a decreasing trend in the age of menopause. Younger ages of menarche and late menopause are observed in municipalities with a higher purchasing power index. The analysis performed in this study portrays the time evolution of the age of menarche and age of menopause and their spatial characterization, adding to the identification of factors that could be of the utmost importance in future breast cancer incidence research. PMID:24615881

  13. Reconstructing palaeoclimatic variables from fossil pollen using boosted regression trees: comparison and synthesis with other quantitative reconstruction methods

    NASA Astrophysics Data System (ADS)

    Salonen, J. Sakari; Luoto, Miska; Alenius, Teija; Heikkilä, Maija; Seppä, Heikki; Telford, Richard J.; Birks, H. John B.

    2014-03-01

    We test and analyse a new calibration method, boosted regression trees (BRTs) in palaeoclimatic reconstructions based on fossil pollen assemblages. We apply BRTs to multiple Holocene and Lateglacial pollen sequences from northern Europe, and compare their performance with two commonly-used calibration methods: weighted averaging regression (WA) and the modern-analogue technique (MAT). Using these calibration methods and fossil pollen data, we present synthetic reconstructions of Holocene summer temperature, winter temperature, and water balance changes in northern Europe. Highly consistent trends are found for summer temperature, with a distinct Holocene thermal maximum at ca 8000-4000 cal. a BP, with a mean Tjja anomaly of ca +0.7 °C at 6 ka compared to 0.5 ka. We were unable to reconstruct reliably winter temperature or water balance, due to the confounding effects of summer temperature and the great between-reconstruction variability. We find BRTs to be a promising tool for quantitative reconstructions from palaeoenvironmental proxy data. BRTs show good performance in cross-validations compared with WA and MAT, can model a variety of taxon response types, find relevant predictors and incorporate interactions between predictors, and show some robustness with non-analogue fossil assemblages.

  14. Analysis of the effect of evergreen and deciduous trees on urban nitrogen dioxide levels in the U.S. using land-use regression

    NASA Astrophysics Data System (ADS)

    Rao, M.; George, L. A.

    2012-12-01

    Nitrogen dioxide (NO2), an atmospheric pollutant generated primarily by anthropogenic combustion processes, is typically found at higher concentrations in urban areas compared to non-urbanized environments. Elevated NO2 levels have multiple ecosystem effects at different spatial scales. At the local scale, elevated levels affect human health directly and through the formation of secondary pollutants such as ozone and aerosols; at the regional scale secondary pollutants such as nitric acid and organic nitrates have deleterious effects on non-urbanized areas; and, at the global scale, nitrogen oxide emissions significantly alter the natural biogeochemical nitrogen cycle. As cities globally become larger and larger sources of nitrogen oxide emissions, it is important to assess possible mitigation strategies to reduce the impact of emissions locally, regionally and globally. In this study, we build a national land-use regression (LUR) model to compare the impacts of deciduous and evergreen trees on urban NO2 levels in the United States. We use the EPA monitoring network values of NO2 levels for 2006, the 2006 NLCD tree canopy data for deciduous and evergreen canopies, and the US Census Bureau's TIGER shapefiles for roads, railroads, impervious area & population density as proxies for NO2 sources on-road traffic, railroad traffic, off-road and area sources respectively. Our preliminary LUR model corroborates previous LUR studies showing that the presence of trees is associated with reduced urban NO2 levels. Additionally, our model indicates that deciduous and evergreen trees reduce NO2 to different extents, and that the amount of NO2 reduced varies seasonally. The model indicates that every square kilometer of deciduous canopy within a 2 km buffer is associated with a reduction in ambient NO2 levels of 0.64 ppb in summer and 0.46 ppb in winter. Similarly, every square kilometer of evergreen tree canopy within a 2 km buffer is associated with a reduction in ambient NO2 by

  15. Mineral elements of subtropical tree seedlings in response to elevated carbon dioxide and nitrogen addition.

    PubMed

    Huang, Wenjuan; Zhou, Guoyi; Liu, Juxiu; Zhang, Deqiang; Liu, Shizhong; Chu, Guowei; Fang, Xiong

    2015-01-01

    Mineral elements in plants have been strongly affected by increased atmospheric carbon dioxide (CO2) concentrations and nitrogen (N) deposition due to human activities. However, such understanding is largely limited to N and phosphorus in grassland. Using open-top chambers, we examined the concentrations of potassium (K), calcium (Ca), magnesium (Mg), aluminum (Al), copper (Cu) and manganese (Mn) in the leaves and roots of the seedlings of five subtropical tree species in response to elevated CO2 (ca. 700 μmol CO2 mol(-1)) and N addition (100 kg N ha(-1) yr(-1)) from 2005 to 2009. These mineral elements in the roots responded more strongly to elevated CO2 and N addition than those in the leaves. Elevated CO2 did not consistently decrease the concentrations of plant mineral elements, with increases in K, Al, Cu and Mn in some tree species. N addition decreased K and had no influence on Cu in the five tree species. Given the shifts in plant mineral elements, Schima superba and Castanopsis hystrix were less responsive to elevated CO2 and N addition alone, respectively. Our results indicate that plant stoichiometry would be altered by increasing CO2 and N deposition, and K would likely become a limiting nutrient under increasing N deposition in subtropics. PMID:25794046

  16. Mineral Elements of Subtropical Tree Seedlings in Response to Elevated Carbon Dioxide and Nitrogen Addition

    PubMed Central

    Huang, Wenjuan; Zhou, Guoyi; Liu, Juxiu; Zhang, Deqiang; Liu, Shizhong; Chu, Guowei; Fang, Xiong

    2015-01-01

    Mineral elements in plants have been strongly affected by increased atmospheric carbon dioxide (CO2) concentrations and nitrogen (N) deposition due to human activities. However, such understanding is largely limited to N and phosphorus in grassland. Using open-top chambers, we examined the concentrations of potassium (K), calcium (Ca), magnesium (Mg), aluminum (Al), copper (Cu) and manganese (Mn) in the leaves and roots of the seedlings of five subtropical tree species in response to elevated CO2 (ca. 700 μmol CO2 mol⁻¹) and N addition (100 kg N ha⁻¹ yr⁻¹) from 2005 to 2009. These mineral elements in the roots responded more strongly to elevated CO2 and N addition than those in the leaves. Elevated CO2 did not consistently decrease the concentrations of plant mineral elements, with increases in K, Al, Cu and Mn in some tree species. N addition decreased K and had no influence on Cu in the five tree species. Given the shifts in plant mineral elements, Schima superba and Castanopsis hystrix were less responsive to elevated CO2 and N addition alone, respectively. Our results indicate that plant stoichiometry would be altered by increasing CO2 and N deposition, and K would likely become a limiting nutrient under increasing N deposition in subtropics. PMID:25794046

  17. Further Insight and Additional Inference Methods for Polynomial Regression Applied to the Analysis of Congruence

    ERIC Educational Resources Information Center

    Cohen, Ayala; Nahum-Shani, Inbal; Doveh, Etti

    2010-01-01

    In their seminal paper, Edwards and Parry (1993) presented the polynomial regression as a better alternative to applying difference score in the study of congruence. Although this method is increasingly applied in congruence research, its complexity relative to other methods for assessing congruence (e.g., difference score methods) was one of the…

  18. [Application of SAS macro to evaluated multiplicative and additive interaction in logistic and Cox regression in clinical practices].

    PubMed

    Nie, Z Q; Ou, Y Q; Zhuang, J; Qu, Y J; Mai, J Z; Chen, J M; Liu, X Q

    2016-05-10

    Conditional and unconditional logistic regression analyses are commonly used in case-control studies, whereas the Cox proportional hazards model is often used in survival data analysis. Most of the literature refers only to main-effect models; however, generalized linear models differ from general linear models, and interaction comprises both multiplicative and additive interaction. The former has only statistical significance, whereas the latter has biological significance. In this paper, macros were written in SAS 9.4 to calculate the contrast ratio, the attributable proportion due to interaction and the synergy index alongside the interaction terms of logistic and Cox regression, and Wald, delta and profile-likelihood confidence intervals were used to evaluate additive interaction, as a reference for big data analysis in clinical epidemiology and for the analysis of genetic multiplicative and additive interactions. PMID:27188374
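
    The three additive-interaction measures named in this abstract have standard closed forms, sketched below in Python rather than SAS. This is not the authors' macro: the function names are hypothetical, "contrast ratio" is assumed to mean the interaction contrast ratio (also called RERI), and the confidence intervals discussed in the abstract are omitted. In logistic or Cox models the relative risks are usually replaced by odds ratios or hazard ratios obtained from the fitted coefficients.

```python
import math

def additive_interaction(rr10, rr01, rr11):
    """Return (RERI, AP, S) from relative risks versus the doubly unexposed group.

    rr10: exposure A only, rr01: exposure B only, rr11: both exposures.
    """
    reri = rr11 - rr10 - rr01 + 1.0                   # interaction contrast ratio
    ap = reri / rr11                                  # attributable proportion due to interaction
    s = (rr11 - 1.0) / ((rr10 - 1.0) + (rr01 - 1.0))  # synergy index
    return reri, ap, s

def ratios_from_coefficients(b_a, b_b, b_ab):
    """Ratios implied by a model with main-effect and product-term coefficients."""
    return math.exp(b_a), math.exp(b_b), math.exp(b_a + b_b + b_ab)

# Made-up ratios: 2 for A alone, 3 for B alone, 7 for both.
reri, ap, s = additive_interaction(2.0, 3.0, 7.0)
print(reri, ap, s)
```

    No additive interaction corresponds to RERI = 0, AP = 0 and S = 1; the toy values above give RERI = 3, which would indicate a joint effect larger than the sum of the separate effects.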

  19. Assessing College Student Interest in Math and/or Computer Science in a Cross-National Sample Using Classification and Regression Trees

    ERIC Educational Resources Information Center

    Kitsantas, Anastasia; Kitsantas, Panagiota; Kitsantas, Thomas

    2012-01-01

    The purpose of this exploratory study was to assess the relative importance of a number of variables in predicting students' interest in math and/or computer science. Classification and regression trees (CART) were employed in the analysis of survey data collected from 276 college students enrolled in two U.S. and Greek universities. The…

  20. Prediction of the Effectiveness of Grass Buffer Strips in Removing Sediment using Artificial Neural Networks and Regression Trees

    NASA Astrophysics Data System (ADS)

    Akram, S.; Ghadiri, H.; Yu, B.

    2013-12-01

    Grass buffer strips are widely used and known as effective management practices for controlling sediment and particulate nutrients. They change the hydrology and hydraulics of the flow by increasing the infiltration rate and decreasing the flow velocity. It is essential to consider the effects of major factors on performance of grass strips in order to predict their efficiency in removing sediment. An artificial neural network model with a 'two-layer feedforward backpropagation' structure and an ensemble of 'bootstrap aggregation' regression trees were developed using data gathered from 35 different studies in order to predict the efficiency of grass strips in removing sediment in different conditions. Slope, length of strips, size distribution of the inflow sediment, antecedent soil moisture, and density and stiffness of the grass strips were the major factors considered in developing the models. The two model predictions of the efficiency of grass strips in trapping sediment compared reasonably well with independent data sets, giving low root mean square errors and high coefficients of model efficiency. The sensitivity analysis showed that particle size distribution, length of strips, and the antecedent soil moisture are the most influential factors in the performance of grass strips in removing sediment.
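
    The idea behind a 'bootstrap aggregation' ensemble of regression trees can be sketched as follows. This is a deliberately reduced illustration, not the model from the study: each tree is a depth-1 stump on a single made-up predictor, and all function names are hypothetical. Each tree is fitted to a resample drawn with replacement, and the ensemble prediction is the average over trees.

```python
import random

def fit_stump(xs, ys):
    """Depth-1 regression tree: choose the threshold with the lowest squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no split possible between identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        sse = sum((y - mean_l) ** 2 for y in left) + sum((y - mean_r) ** 2 for y in right)
        if best is None or sse < best[0]:
            thr = (pairs[i][0] + pairs[i - 1][0]) / 2
            best = (sse, thr, mean_l, mean_r)
    if best is None:  # all x identical: fall back to predicting the mean
        mean_all = sum(ys) / len(ys)
        return (float("inf"), mean_all, mean_all)
    return best[1:]

def predict_stump(stump, x):
    thr, mean_l, mean_r = stump
    return mean_l if x <= thr else mean_r

def fit_bagged(xs, ys, n_trees=25, seed=0):
    """Fit one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = rng.choices(range(n), k=n)
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees

def predict_bagged(trees, x):
    """Ensemble prediction: average of the individual tree predictions."""
    return sum(predict_stump(t, x) for t in trees) / len(trees)

# Made-up step-shaped data: response jumps from 0 to 1 at x = 10.
xs = list(range(20))
ys = [0.0 if x < 10 else 1.0 for x in xs]
trees = fit_bagged(xs, ys)
print(predict_bagged(trees, 2), predict_bagged(trees, 17))
```

    Averaging over resampled trees mainly reduces the variance of a single tree; a practical model would use deeper trees and several predictors, such as the factors listed in the abstract.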

  1. Simulating California reservoir operation using the classification and regression-tree algorithm combined with a shuffled cross-validation scheme

    NASA Astrophysics Data System (ADS)

    Yang, Tiantian; Gao, Xiaogang; Sorooshian, Soroosh; Li, Xin

    2016-03-01

    The controlled outflows from a reservoir or dam are highly dependent on the decisions made by the reservoir operators, instead of a natural hydrological process. Differences exist between the natural upstream inflows to reservoirs and the controlled outflows from reservoirs that supply the downstream users. With the decision maker's awareness of changing climate, reservoir management requires adaptable means to incorporate more information into decision making, such as water delivery requirement, environmental constraints, dry/wet conditions, etc. In this paper, a robust reservoir outflow simulation model is presented, which incorporates one of the well-developed data-mining models (Classification and Regression Tree) to predict the complicated human-controlled reservoir outflows and extract the reservoir operation patterns. A shuffled cross-validation approach is further implemented to improve CART's predictive performance. An application study of nine major reservoirs in California is carried out. Results produced by the enhanced CART, original CART, and random forest are compared with observation. The statistical measurements show that the enhanced CART and random forest outperform the CART control run in general, and the enhanced CART algorithm gives better predictive performance than random forest in simulating the peak flows. The results also show that the proposed model is able to consistently and reasonably predict the expert release decisions. Experiments indicate that the release operation at Lake Oroville is significantly dominated by SWP allocation amount and reservoirs with low elevation are more sensitive to inflow amount than others.

  2. Hourly predictive artificial neural network and multivariate regression tree models of Alternaria and Cladosporium spore concentrations in Szczecin (Poland)

    NASA Astrophysics Data System (ADS)

    Grinn-Gofroń, Agnieszka; Strzelczak, Agnieszka

    2009-11-01

    A study was made of the link between time of day, weather variables and the hourly content of certain fungal spores in the atmosphere of the city of Szczecin, Poland, in 2004-2007. Sampling was carried out with a Lanzoni 7-day-recording spore trap. The spores analysed belonged to the taxa Alternaria and Cladosporium. These spores were selected both for their allergenic capacity and for their high level presence in the atmosphere, particularly during summer. Spearman correlation coefficients between spore concentrations, meteorological parameters and time of day showed different indices depending on the taxon being analysed. Relative humidity (RH), air temperature, air pressure and clouds most strongly and significantly influenced the concentration of Alternaria spores. Cladosporium spores correlated less strongly and significantly than Alternaria. Multivariate regression tree analysis revealed that, at air pressures lower than 1,011 hPa the concentration of Alternaria spores was low. Under higher air pressure spore concentrations were higher, particularly when RH was lower than 36.5%. In the case of Cladosporium, under higher air pressure (>1,008 hPa), the spores analysed were more abundant, particularly after 0330 hours. In artificial neural networks, RH, air pressure and air temperature were the most important variables in the model for Alternaria spore concentration. For Cladosporium, clouds, time of day, air pressure, wind speed and dew point temperature were highly significant factors influencing spore concentration. The maximum abundance of Cladosporium spores in air fell between 1200 and 1700 hours.

  3. Construction the model on the breast cancer survival analysis use support vector machine, logistic regression and decision tree.

    PubMed

    Chao, Cheng-Min; Yu, Ya-Wen; Cheng, Bor-Wen; Kuo, Yao-Lung

    2014-10-01

    The aim of the paper is to use data mining technology to establish a classification of breast cancer survival patterns, and offers a treatment decision-making reference for the survival ability of women diagnosed with breast cancer in Taiwan. We studied patients with breast cancer in a specific hospital in Central Taiwan to obtain 1,340 data sets. We employed a support vector machine, logistic regression, and a C5.0 decision tree to construct a classification model of breast cancer patients' survival rates, and used a 10-fold cross-validation approach to identify the model. The results show that the establishment of classification tools for the classification of the models yielded an average accuracy rate of more than 90% for both; the SVM provided the best method for constructing the three categories of the classification system for the survival mode. The results of the experiment show that the three methods used to create the classification system, established a high accuracy rate, predicted a more accurate survival ability of women diagnosed with breast cancer, and could be used as a reference when creating a medical decision-making frame. PMID:25119239
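
    The 10-fold cross-validation mentioned in this abstract can be sketched as an index-splitting routine. This is an illustration under assumptions, not the authors' procedure: the function name is hypothetical, the shuffle is unstratified, and only the sample count of 1,340 is taken from the abstract.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)       # shuffle once, reproducibly
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

splits = list(k_fold_indices(1340, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 1206 134
```

    Each record is used for testing exactly once, and a classifier such as an SVM, logistic regression or a decision tree would be refitted on the training indices of every fold, with accuracy averaged over the ten test folds.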

  4. Estimating Dbh of Trees Employing Multiple Linear Regression of the best Lidar-Derived Parameter Combination Automated in Python in a Natural Broadleaf Forest in the Philippines

    NASA Astrophysics Data System (ADS)

    Ibanez, C. A. G.; Carcellar, B. G., III; Paringit, E. C.; Argamosa, R. J. L.; Faelga, R. A. G.; Posilero, M. A. V.; Zaragosa, G. P.; Dimayacyac, N. A.

    2016-06-01

    Diameter-at-breast-height (DBH) estimation is a prerequisite in various allometric equations estimating important forestry indices like stem volume, basal area, biomass and carbon stock. LiDAR technology can directly provide various forest parameters, except DBH, from the behavior and characteristics of the point cloud, which are unique to different forest classes. An extensive tree inventory was done on a two-hectare established sample plot in Mt. Makiling, Laguna for a natural growth forest. Coordinates, height, and canopy cover were measured and species were identified for comparison with LiDAR derivatives. Multiple linear regression was used to get LiDAR-derived DBH by integrating field-derived DBH and 27 LiDAR-derived parameters at 20 m, 10 m, and 5 m grid resolutions. To find the best combination of parameters for DBH estimation, all possible combinations of parameters were generated automatically using Python scripts and additional regression-related libraries such as NumPy, SciPy, and scikit-learn. The combination that yields the highest r-squared (coefficient of determination) and the lowest AIC (Akaike's Information Criterion) and BIC (Bayesian Information Criterion) was determined to be the best equation. The best equation uses 11 parameters at the 10 m grid size, with an r-squared of 0.604, an AIC of 154.04 and a BIC of 175.08. Combinations of parameters may differ among forest classes, a question for further studies. Additional statistical tests, such as the Kaiser-Meyer-Olkin (KMO) coefficient and Bartlett's Test of Sphericity (BTS), can be added to help determine the correlation among parameters.
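
    The exhaustive search over parameter combinations described in this abstract can be sketched with NumPy. This is a minimal illustration under assumptions, not the authors' script: the data are synthetic, the predictor names are hypothetical, and AIC and BIC are computed in the Gaussian least-squares form up to an additive constant, which may differ from the convention behind the reported values.

```python
import itertools
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (r2, aic, bic)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])          # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares solution
    rss = float(np.sum((y - A @ coef) ** 2))      # residual sum of squares
    tss = float(np.sum((y - y.mean()) ** 2))      # total sum of squares
    k = A.shape[1]                                # fitted coefficients, incl. intercept
    # Assumed convention: Gaussian-likelihood AIC/BIC up to an additive constant.
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    return 1.0 - rss / tss, float(aic), float(bic)

def all_subsets(X, y, names):
    """Fit every non-empty combination of predictor columns."""
    rows = []
    for size in range(1, X.shape[1] + 1):
        for cols in itertools.combinations(range(X.shape[1]), size):
            r2, aic, bic = fit_ols(X[:, list(cols)], y)
            rows.append({"predictors": [names[c] for c in cols],
                         "r2": r2, "aic": aic, "bic": bic})
    return rows

# Synthetic stand-in data: 4 hypothetical predictors, only p0 and p2 drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(scale=0.1, size=60)
names = ["p0", "p1", "p2", "p3"]

models = all_subsets(X, y, names)
best = min(models, key=lambda m: m["aic"])  # ranked by AIC here; BIC is an alternative
print(len(models), best["predictors"], round(best["r2"], 3))
```

    Because r-squared never decreases when a predictor is added, the sketch ranks models by AIC rather than by r-squared alone. The number of candidate models is 2^p - 1, so a full search over 27 parameters means roughly 134 million fits.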

  5. Trees

    ERIC Educational Resources Information Center

    Al-Khaja, Nawal

    2007-01-01

    This is a thematic lesson plan for young learners about palm trees and the importance of taking care of them. The two part lesson teaches listening, reading and speaking skills. The lesson includes parts of a tree; the modal auxiliary, can; dialogues and a role play activity.

  6. Using instrumental variables to estimate a Cox's proportional hazards regression subject to additive confounding

    PubMed Central

    Tosteson, Tor D.; Morden, Nancy E.; Stukel, Therese A.; O'Malley, A. James

    2014-01-01

    The estimation of treatment effects is one of the primary goals of statistics in medicine. Estimation based on observational studies is subject to confounding. Statistical methods for controlling bias due to confounding include regression adjustment, propensity scores and inverse probability weighted estimators. These methods require that all confounders are recorded in the data. The method of instrumental variables (IVs) can eliminate bias in observational studies even in the absence of information on confounders. We propose a method for integrating IVs within the framework of Cox's proportional hazards model and demonstrate the conditions under which it recovers the causal effect of treatment. The methodology is based on the approximate orthogonality of an instrument with unobserved confounders among those at risk. We derive an estimator as the solution to an estimating equation that resembles the score equation of the partial likelihood in much the same way as the traditional IV estimator resembles the normal equations. To justify this IV estimator for a Cox model we perform simulations to evaluate its operating characteristics. Finally, we apply the estimator to an observational study of the effect of coronary catheterization on survival. PMID:25506259

  7. Whole-fish versus filet polychlorinated-biphenyl concentrations: An analysis using classification and regression tree models

    SciTech Connect

    Amrhein, J.F.; Stow, C.A.; Wible, C.

    1999-08-01

    Fish polychlorinated-biphenyl (PCB) measurements usually represent one of two different sample types: filets or homogenized whole fish. Filet measurements are more appropriate for use if the goal of analysis is estimating human PCB consumption, while whole-fish analysis may be more useful for quantifying and understanding processes of contaminant flow and bioaccumulation. While it is generally assumed that whole-fish PCB concentrations exceed filet concentrations because of the presence of fatty internal organs in whole-fish samples, the literature contains no reported comparisons of filet versus whole-fish PCB concentrations. The authors measured total PCB concentrations in filets and whole-fish samples from the same individuals in Lake Michigan coho salmon (Oncorhynchus kisutch) and rainbow trout (Oncorhynchus mykiss). The average whole-fish to filet PCB concentration ratio was 1.70 for coho salmon and 1.47 for rainbow trout, but it varied considerably among individuals, with a few fish exhibiting a higher concentration in the filet than in the whole-fish sample. Classification and regression tree (CART) models indicated that filet PCB concentration and fish length were the best predictors of whole-fish PCB concentration, whereas filet and whole-fish lipid concentrations were less important predictors. Lipid normalization of the PCB data decreased within-individual variability, was equivocal with respect to variability among individuals, and accentuated the between-species difference. Both species exhibit a pronounced 1:1 relationship between the whole-fish to filet PCB concentration ratio and the whole-fish to filet lipid concentration ratio; however, the authors point out that there is a strong spurious component to this relationship, which indicates that the relationship may be more algebraic than an indication of underlying mechanisms.

  8. Multi-scale remote sensing sagebrush characterization with regression trees over Wyoming, USA: laying a foundation for monitoring

    USGS Publications Warehouse

    Homer, Collin G.; Aldridge, Cameron L.; Meyer, Debra K.; Schell, Spencer J.

    2012-01-01

    Sagebrush ecosystems in North America have experienced extensive degradation since European settlement. Further degradation continues from exotic invasive plants, altered fire frequency, intensive grazing practices, oil and gas development, and climate change – adding urgency to the need for ecosystem-wide understanding. Remote sensing is often identified as a key information source to facilitate ecosystem-wide characterization, monitoring, and analysis; however, approaches that characterize sagebrush with sufficient and accurate local detail across large enough areas to support this paradigm are unavailable. We describe the development of a new remote sensing sagebrush characterization approach for the state of Wyoming, U.S.A. This approach integrates 2.4 m QuickBird, 30 m Landsat TM, and 56 m AWiFS imagery into the characterization of four primary continuous field components including percent bare ground, percent herbaceous cover, percent litter, and percent shrub, and four secondary components including percent sagebrush (Artemisia spp.), percent big sagebrush (Artemisia tridentata), percent Wyoming sagebrush (Artemisia tridentata Wyomingensis), and shrub height using a regression tree. According to an independent accuracy assessment, primary component root mean square error (RMSE) values ranged from 4.90 to 10.16 for 2.4 m QuickBird, 6.01 to 15.54 for 30 m Landsat, and 6.97 to 16.14 for 56 m AWiFS. Shrub and herbaceous components outperformed the current data standard called LANDFIRE, with a shrub RMSE value of 6.04 versus 12.64 and a herbaceous component RMSE value of 12.89 versus 14.63. This approach offers new advancements in sagebrush characterization from remote sensing and provides a foundation to quantitatively monitor these components into the future.

  9. Multi-scale remote sensing sagebrush characterization with regression trees over Wyoming, USA: Laying a foundation for monitoring

    NASA Astrophysics Data System (ADS)

    Homer, Collin G.; Aldridge, Cameron L.; Meyer, Debra K.; Schell, Spencer J.

    2012-02-01

    Sagebrush ecosystems in North America have experienced extensive degradation since European settlement. Further degradation continues from exotic invasive plants, altered fire frequency, intensive grazing practices, oil and gas development, and climate change - adding urgency to the need for ecosystem-wide understanding. Remote sensing is often identified as a key information source to facilitate ecosystem-wide characterization, monitoring, and analysis; however, approaches that characterize sagebrush with sufficient and accurate local detail across large enough areas to support this paradigm are unavailable. We describe the development of a new remote sensing sagebrush characterization approach for the state of Wyoming, U.S.A. This approach integrates 2.4 m QuickBird, 30 m Landsat TM, and 56 m AWiFS imagery into the characterization of four primary continuous field components including percent bare ground, percent herbaceous cover, percent litter, and percent shrub, and four secondary components including percent sagebrush ( Artemisia spp.), percent big sagebrush ( Artemisia tridentata), percent Wyoming sagebrush ( Artemisia tridentata Wyomingensis), and shrub height using a regression tree. According to an independent accuracy assessment, primary component root mean square error (RMSE) values ranged from 4.90 to 10.16 for 2.4 m QuickBird, 6.01 to 15.54 for 30 m Landsat, and 6.97 to 16.14 for 56 m AWiFS. Shrub and herbaceous components outperformed the current data standard called LANDFIRE, with a shrub RMSE value of 6.04 versus 12.64 and a herbaceous component RMSE value of 12.89 versus 14.63. This approach offers new advancements in sagebrush characterization from remote sensing and provides a foundation to quantitatively monitor these components into the future.

  10. Predicting tree species presence and basal area in Utah: A comparison of stochastic gradient boosting, generalized additive models, and tree-based methods

    USGS Publications Warehouse

    Moisen, G.G.; Freeman, E.A.; Blackard, J.A.; Frescino, T.S.; Zimmermann, N.E.; Edwards, T.C., Jr.

    2006-01-01

    Many efforts are underway to produce broad-scale forest attribute maps by modelling forest class and structure variables collected in forest inventories as functions of satellite-based and biophysical information. Typically, variants of classification and regression trees implemented in Rulequest's See5 and Cubist (for binary and continuous responses, respectively) are the tools of choice in many of these applications. These tools are widely used in large remote sensing applications, but are not easily interpretable, do not have ties with survey estimation methods, and use proprietary unpublished algorithms. Consequently, three alternative modelling techniques were compared for mapping presence and basal area of 13 species located in the mountain ranges of Utah, USA. The modelling techniques compared included the widely used See5/Cubist, generalized additive models (GAMs), and stochastic gradient boosting (SGB). Model performance was evaluated using independent test data sets. Evaluation criteria for mapping species presence included specificity, sensitivity, Kappa, and area under the curve (AUC). Evaluation criteria for the continuous basal area variables included correlation and relative mean squared error. For predicting species presence (setting thresholds to maximize Kappa), SGB had higher values for the majority of the species for specificity and Kappa, while GAMs had higher values for the majority of the species for sensitivity. In evaluating resultant AUC values, GAM and/or SGB models had significantly better results than the See5 models where significant differences could be detected between models. For nine out of 13 species, basal area prediction results for all modelling techniques were poor (correlations less than 0.5 and relative mean squared errors greater than 0.8), but SGB provided the most stable predictions in these instances. SGB and Cubist performed equally well for modelling basal area for three species with moderate prediction success
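
    The presence/absence criteria named in this abstract (sensitivity, specificity and Kappa) can all be derived from a 2x2 confusion matrix. The following is a minimal sketch, not the authors' code: the function name and the counts are hypothetical, and it assumes both presences and absences occur in the test data so that no denominator is zero.

```python
def presence_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and Cohen's kappa from 2x2 confusion counts.

    tp/fn: observed presences predicted present/absent.
    tn/fp: observed absences predicted absent/present.
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)  # share of true presences detected
    specificity = tn / (tn + fp)  # share of true absences detected
    observed = (tp + tn) / n      # raw agreement
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Hypothetical counts for one species on an independent test set.
sens, spec, kappa = presence_metrics(tp=40, fp=10, fn=20, tn=130)
print(round(sens, 3), round(spec, 3), round(kappa, 3))
```

    Because Kappa corrects raw agreement for chance, it changes with the probability threshold used to call a species present, which is why a threshold can be chosen to maximize it.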

  11. PARAMETRIC AND NON PARAMETRIC (MARS: MULTIVARIATE ADDITIVE REGRESSION SPLINES) LOGISTIC REGRESSIONS FOR PREDICTION OF A DICHOTOMOUS RESPONSE VARIABLE WITH AN EXAMPLE FOR PRESENCE/ABSENCE OF AMPHIBIANS

    EPA Science Inventory

    The purpose of this report is to provide a reference manual that could be used by investigators for making informed use of logistic regression using two methods (standard logistic regression and MARS). The details for analyses of relationships between a dependent binary response ...

  12. Understanding Child Stunting in India: A Comprehensive Analysis of Socio-Economic, Nutritional and Environmental Determinants Using Additive Quantile Regression

    PubMed Central

    Fenske, Nora; Burns, Jacob; Hothorn, Torsten; Rehfuess, Eva A.

    2013-01-01

    Background Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. Objective We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. Design Using cross-sectional data for children aged 0–24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. Results At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. Conclusions Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role. PMID:24223839

  13. Effects of Additives, Photodegradation, and Water-tree Degradation on the Photoluminescence in Polyethylene and Polypropylene

    NASA Astrophysics Data System (ADS)

    Ito, Toshihide; Fuse, Norikazu; Ohki, Yoshimichi

    Photoluminescence (PL) spectra induced by irradiation of ultraviolet photons are compared among low-density polyethylene (LDPE), crosslinked polyethylene (XLPE), and polypropylene (PP). Three PL bands appear around 4.2, 3.6, and 3.1 eV in LDPE and XLPE, while similar three PL bands are observed at similar energies in PP. The PL spectra and their decay profiles are independent of the presence of additives and are also independent of whether the samples were crosslinked or not. These results indicate that neither the additives nor the crosslinking has any significant effects on the respective three PLs in PE and PP. When the sample was pre-irradiated by the ultraviolet photons under different atmospheres (air, O2, and vacuum), all the PL intensities decrease with the progress of the pre-irradiation regardless of whether the sample is PE or PP. Therefore, all the PLs are considered to result from impurities. In all the pre-irradiated samples, a new PL band appears at 2.9 eV, of which intensity is stronger when the oxygen partial pressure during the pre-irradiation was lower. This PL is considered to be due to photo-induced conjugated double bonds. It has also been confirmed that water-tree degradation in LDPE or in XLPE does not contribute to PL.

  14. Contrasting regional and national mechanisms for predicting elevated arsenic in private wells across the United States using classification and regression trees.

    PubMed

    Frederick, Logan; VanDerslice, James; Taddie, Marissa; Malecki, Kristen; Gregg, Josh; Faust, Nicholas; Johnson, William P

    2016-03-15

    Arsenic contamination in groundwater is a public health and environmental concern in the United States (U.S.), particularly where monitoring is not required under the Safe Drinking Water Act. Previous studies suggest the influence of regional mechanisms for arsenic mobilization into groundwater; however, no study has examined how influencing parameters change at a continental scale spanning multiple regions. We herein examine covariates for groundwater in the western, central and eastern U.S. regions representing mechanisms associated with arsenic concentrations exceeding the U.S. Environmental Protection Agency maximum contaminant level (MCL) of 10 parts per billion (ppb). Statistically significant covariates were identified via classification and regression tree (CART) analysis, and included hydrometeorological and groundwater chemical parameters. The CART analyses were performed at two scales: national and regional; for which three physiographic regions located in the western (Payette Section and the Snake River Plain), central (Osage Plains of the Central Lowlands), and eastern (Embayed Section of the Coastal Plains) U.S. were examined. Validity of each of the three regional CART models was indicated by values >85% for the area under the receiver-operating characteristic curve. Aridity (precipitation minus potential evapotranspiration) was identified as the primary covariate associated with elevated arsenic at the national scale. At the regional scale, aridity and pH were the major covariates in the arid to semi-arid (western) region; whereas dissolved iron (taken to represent chemically reducing conditions) and pH were major covariates in the temperate (eastern) region, although additional important covariates emerged, including elevated phosphate. Analysis in the central U.S. region indicated that elevated arsenic concentrations were driven by a mixture of those observed in the western and eastern regions. PMID:26803265

  15. An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests

    ERIC Educational Resources Information Center

    Strobl, Carolin; Malley, James; Tutz, Gerhard

    2009-01-01

    Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and…

  16. A community-level, mesoscale analysis of fish assemblage structure in shoreline habitats of a large river using multivariate regression trees

    NASA Astrophysics Data System (ADS)

    Wilkes, Martin; Maddock, Ian; Link, Oscar; Habit, Evelyn

    2015-04-01

    Despite the numerous advantages over traditional methods ascribed to community-level analyses, including the ability to rapidly predict the abundance of multiple species and the integration of complex biological interactions, very few applications to the mesoscale of river habitats can be found in the extant literature. Most previous work has been based on single species, species-by-species modelling or reduced dimensionality approaches. Community-level analyses have especially good properties for improving the understanding of habitat associations in large rivers where biological interactions are most intense and applications of the mesohabitat concept relatively sparse. This chapter seeks to identify quantitative relationships between key environmental variables and community structure using a particular type of community-level technique known as multivariate regression trees in order to test the ecological basis for applications of the mesohabitat concept in large rivers. Mesohabitats were mapped and their environmental characteristics recorded along a reach of the San Pedro River, Chile, which is inhabited by a highly endemic fish community. A representative portion of the mesohabitats were selected for fish sampling and multivariate regression trees produced to predict community structure based on combinations of environmental variables. The analyses showed that fish assemblages were distinct at the mesoscale, with flow depth, bank materials, cover and woody debris the key predictor variables. The results support the application of the mesohabitat concept in this geographical context and establish a basis for predicting the community structure of any mesohabitat along the reach.

  17. Decomposition of conifer tree bark under field conditions: effects of nitrogen and phosphorus additions

    NASA Astrophysics Data System (ADS)

    Lopes de Gerenyu, Valentin; Kurganova, Irina; Kapitsa, Ekaterina; Shorokhova, Ekaterina

    2016-04-01

    In forest ecosystems, the processes of decomposition of coarse woody debris (CWD) can contribute significantly to the emission component of the carbon (C) cycle and thus accelerate the greenhouse effect and global climate change. A better understanding of decomposition of CWD is required to refine estimates of the C balance in forest ecosystems and improve biogeochemical models. These estimates will in turn contribute to assessing the role of forests in maintaining their long-term productivity and other ecosystem services. We examined the decomposition rate of coniferous bark with added nitrogen (N) and phosphorus (P) fertilizers in an experiment under field conditions. The experiment was carried out in 2015 over 17 weeks in the Moscow region (54°50'N, 37°36'E) under continental-temperate climatic conditions. The conifer tree bark mixture (ca. 70% Norway spruce and 30% Scots pine) was combined with soil and placed in piles of soil-bark substrate (SBS) with a height of ca. 60 cm and a surface area of ca. 3 m². The dry mass ratio of bark to soil was 10:1. The experimental design included the following treatments: (1) soil (Luvisols Haplic) without bark (S); (2) pure SBS; (3) SBS with N addition in the amount of 1% of total dry bark mass (SBS-N); and (4) SBS with N and P addition in the amount of 1% of total dry bark mass for each element (SBS-NP). The decomposition rate, expressed as CO2 emission flux (g C/m²/h), was measured using the closed chamber method 1-3 times per week from July to early November using a LiCor 6400 (Nebraska, USA). During the experiment, we also monitored soil temperature at depths of 5, 20, 40, and 60 cm below the surface of the SBS using iButton thermochrons (DS1921G, USA). The pattern of CO2 emission rate from SBS depended strongly on fertilization. The highest decomposition rates (DecR) of 2.8-5.6 g C/m²/h were observed in the SBS-NP treatment during the first 6 weeks of the experiment. The decay process of bark was less active in the treatment with only N addition. In this

  18. Partitioning of multivariate phenotypes using regression trees reveals complex patterns of adaptation to climate across the range of black cottonwood (Populus trichocarpa)

    PubMed Central

    Oubida, Regis W.; Gantulga, Dashzeveg; Zhang, Man; Zhou, Lecong; Bawa, Rajesh; Holliday, Jason A.

    2015-01-01

    Local adaptation to climate in temperate forest trees involves the integration of multiple physiological, morphological, and phenological traits. Latitudinal clines are frequently observed for these traits, but environmental constraints also track longitude and altitude. We combined extensive phenotyping of 12 candidate adaptive traits, multivariate regression trees, quantitative genetics, and a genome-wide panel of SNP markers to better understand the interplay among geography, climate, and adaptation to abiotic factors in Populus trichocarpa. Heritabilities were low to moderate (0.13–0.32) and population differentiation for many traits exceeded the 99th percentile of the genome-wide distribution of FST, suggesting local adaptation. When climate variables were taken as predictors and the 12 traits as response variables in a multivariate regression tree analysis, evapotranspiration (Eref) explained the most variation, with subsequent splits related to mean temperature of the warmest month, frost-free period (FFP), and mean annual precipitation (MAP). These grouping matched relatively well the splits using geographic variables as predictors: the northernmost groups (short FFP and low Eref) had the lowest growth, and lowest cold injury index; the southern British Columbia group (low Eref and intermediate temperatures) had average growth and cold injury index; the group from the coast of California and Oregon (high Eref and FFP) had the highest growth performance and the highest cold injury index; and the southernmost, high-altitude group (with high Eref and low FFP) performed poorly, had high cold injury index, and lower water use efficiency. Taken together, these results suggest variation in both temperature and water availability across the range shape multivariate adaptive traits in poplar. PMID:25870603

  19. Hourly predictive artificial neural network and multivariate regression trees models of Ganoderma spore concentrations in Rzeszów and Szczecin (Poland).

    PubMed

    Kasprzyk, Idalia; Grinn-Gofroń, Agnieszka; Strzelczak, Agnieszka; Wolski, Tomasz

    2011-02-01

    Ganoderma spores are among the most abundant airspora taxa in many regions of the world and are considered to be important allergens. The aerobiology of Ganoderma basidiospores in two cities in Poland was examined using the volumetric method (Burkard and Lanzonii Spore Traps) on selected days in 2004, 2005 and 2006. Spores of Ganoderma were present in the atmosphere from June to November, with peak concentrations generally occurring from late July to mid-October. ANN (artificial neural network) and MRT (multivariate regression trees) models indicated that atmospheric phenomenon, hour and relative humidity were the most important variables influencing spore content. The remaining variables (air temperature, dew point, air pressure, wind speed and wind direction) also contributed to the high network performance (ratio above 1), but their impact was less distinct. These results are consistent with the Spearman's rank correlation analysis. PMID:21183203

  20. Repeated measurements of blood lactate concentration as a prognostic marker in horses with acute colitis evaluated with classification and regression trees (CART) and random forest analysis.

    PubMed

    Petersen, M B; Tolver, A; Husted, L; Tølbøll, T H; Pihl, T H

    2016-07-01

    The objective of this study was to investigate the prognostic value of single and repeated measurements of blood l-lactate (Lac) and ionised calcium (iCa) concentrations, packed cell volume (PCV) and plasma total protein (TP) concentration in horses with acute colitis. A total of 66 adult horses admitted with acute colitis (<24 h) to a referral hospital in the 2002-2011 period were included. The prognostic value of Lac, iCa, PCV and TP recorded at admission and 6 h post admission was analysed with univariate analysis, logistic regression, classification and regression trees, as well as random forest analysis. Ponies and Icelandic horses made up 59% of the population, whilst the remaining 41% were horses. Blood lactate concentration at admission was the only individual parameter significantly associated with probability of survival to discharge (P < 0.001). In a training sample, a Lac cut-off value of 7 mmol/L had a sensitivity of 0.66 and a specificity of 0.92 in predicting survival. In independent test data, the sensitivity was 0.69 and the specificity was 0.76. At the observed survival rate (38%), the optimal decision tree identified horses as non-survivors when the Lac at admission was ≥4.3 mmol/L and the Lac 6 h post admission stayed at >2 mmol/L (sensitivity, 0.72; specificity, 0.8). In conclusion, blood lactate concentration measured at admission and repeated 6 h later aided the prognostic evaluation of horses with acute colitis in this population with a very high mortality rate. This should allow clinicians to give a more reliable prognosis for the horse. PMID:27240909
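
    The two-step decision tree quoted in this abstract can be written as a short rule. The sketch below is illustrative only: the function names and the six example horses are hypothetical, and treating "non-survivor" as the positive class when computing sensitivity and specificity is an assumption, since the abstract does not state which class was taken as positive.

```python
def predicted_non_survivor(lac_admission, lac_6h):
    """Rule from the abstract: flag a horse as a non-survivor when blood
    lactate at admission is >= 4.3 mmol/L and the value 6 h later is
    still > 2 mmol/L."""
    return lac_admission >= 4.3 and lac_6h > 2.0


def sensitivity_specificity(records):
    """records: iterable of (lac_admission, lac_6h, survived).

    Assumption: 'non-survivor' is the positive class.
    """
    tp = fp = fn = tn = 0
    for lac0, lac6, survived in records:
        flagged = predicted_non_survivor(lac0, lac6)
        if flagged and not survived:
            tp += 1
        elif flagged and survived:
            fp += 1
        elif not flagged and not survived:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical horses, not data from the study.
horses = [
    (8.1, 5.0, False),  # flagged, did not survive
    (4.3, 2.5, False),  # flagged, did not survive
    (6.0, 1.8, True),   # lactate fell to <= 2 mmol/L, survived
    (3.9, 3.0, True),   # admission lactate below 4.3, survived
    (5.2, 2.4, True),   # flagged, but survived
    (2.0, 1.5, False),  # not flagged, did not survive
]
sens, spec = sensitivity_specificity(horses)
print(round(sens, 2), round(spec, 2))
```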

  1. A regional classification scheme for estimating reference water quality in streams using land-use-adjusted spatial regression-tree analysis

    USGS Publications Warehouse

    Robertson, D.M.; Saad, D.A.; Heisey, D.M.

    2006-01-01

    Various approaches are used to subdivide large areas into regions containing streams that have similar reference or background water quality and that respond similarly to different factors. For many applications, such as establishing reference conditions, it is preferable to use physical characteristics that are not affected by human activities to delineate these regions. However, most approaches, such as ecoregion classifications, rely on land use to delineate regions or have difficulties compensating for the effects of land use. Land use not only directly affects water quality, but it is often correlated with the factors used to define the regions. In this article, we describe modifications to SPARTA (spatial regression-tree analysis), a relatively new approach applied to water-quality and environmental characteristic data to delineate zones with similar factors affecting water quality. In this modified approach, land-use-adjusted (residualized) water quality and environmental characteristics are computed for each site. Regression-tree analysis is applied to the residualized data to determine the most statistically important environmental characteristics describing the distribution of a specific water-quality constituent. Geographic information for small basins throughout the study area is then used to subdivide the area into relatively homogeneous environmental water-quality zones. For each zone, commonly used approaches are subsequently used to define its reference water quality and how its water quality responds to changes in land use. SPARTA is used to delineate zones of similar reference concentrations of total phosphorus and suspended sediment throughout the upper Midwestern part of the United States. © 2006 Springer Science+Business Media, Inc.
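
    The land-use adjustment described in this abstract amounts to removing the part of each variable that land use explains and passing the residuals on to the regression tree. The abstract does not say how the residuals are computed; the sketch below assumes a simple one-predictor least-squares fit, and every name and number in it is hypothetical.

```python
def ols_fit(x, y):
    """Intercept and slope of a one-predictor least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residualize(land_use, values):
    """Return values with the linear land-use effect removed.

    Assumption: a linear adjustment; the published method may differ.
    """
    intercept, slope = ols_fit(land_use, values)
    return [v - (intercept + slope * u) for u, v in zip(land_use, values)]


# Hypothetical sites: percent agricultural land and total phosphorus (mg/L).
land_use = [10.0, 30.0, 50.0, 70.0]
phosphorus = [0.02, 0.08, 0.10, 0.16]
adjusted = residualize(land_use, phosphorus)
print([round(r, 3) for r in adjusted])
```

    By construction the adjusted values sum to zero and are uncorrelated with land use, so any structure a regression tree finds in them cannot be a linear land-use effect.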

  2. Effects of multiple chronic conditions on health care costs: an analysis based on an advanced tree-based regression model

    PubMed Central

    2013-01-01

    Background To analyze the impact of multimorbidity (MM) on health care costs taking into account data heterogeneity. Methods Data come from a multicenter prospective cohort study of 1,050 randomly selected primary care patients aged 65 to 85 years suffering from MM in Germany. MM was defined as co-occurrence of ≥3 conditions from a list of 29 chronic diseases. A conditional inference tree (CTREE) algorithm was used to detect the underlying structure and most influential variables on costs of inpatient care, outpatient care, medications as well as formal and informal nursing care. Results Irrespective of the number and combination of co-morbidities, a limited number of factors influential on costs were detected. Parkinson’s disease (PD) and cardiac insufficiency (CI) were the most influential variables for total costs. Compared to patients not suffering from any of the two conditions, PD increases predicted mean total costs 3.5-fold to approximately € 11,000 per 6 months, and CI two-fold to approximately € 6,100. The high total costs of PD are largely due to costs of nursing care. Costs of inpatient care were significantly influenced by cerebral ischemia/chronic stroke, whereas medication costs were associated with COPD, insomnia, PD and Diabetes. Except for costs of nursing care, socio-demographic variables did not significantly influence costs. Conclusions Irrespective of any combination and number of co-occurring diseases, PD and CI appear to be most influential on total health care costs in elderly patients with MM, and only a limited number of factors significantly influenced cost. Trial registration Current Controlled Trials ISRCTN89818205 PMID:23768192

  3. An Introduction to Recursive Partitioning: Rationale, Application and Characteristics of Classification and Regression Trees, Bagging and Random Forests

    PubMed Central

    Strobl, Carolin; Malley, James; Tutz, Gerhard

    2010-01-01

    Recursive partitioning methods have become popular and widely used tools for non-parametric regression and classification in many scientific fields. Especially random forests, that can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine and bioinformatics within the past few years. High dimensional problems are common not only in genetics, but also in some areas of psychological research, where only few subjects can be measured due to time or cost constraints, yet a large amount of data is generated for each subject. Random forests have been shown to achieve a high prediction accuracy in such applications, and provide descriptive variable importance measures reflecting the impact of each variable in both main effects and interactions. The aim of this work is to introduce the principles of the standard recursive partitioning methods as well as recent methodological improvements, to illustrate their usage for low and high dimensional data exploration, but also to point out limitations of the methods and potential pitfalls in their practical application. Application of the methods is illustrated using freely available implementations in the R system for statistical computing. PMID:19968396
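
    The core step of recursive partitioning for a continuous response is the search for the single split that most reduces within-node squared error; a tree is grown by repeating that search inside each resulting group. The following is a minimal CART-style sketch for one numeric predictor, written in Python rather than the R implementations the abstract refers to. The function names and data are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, error) for the split on x that minimizes the
    summed squared error of the two child nodes. The threshold is None
    if no split improves on the unsplit node."""
    pairs = sorted(zip(x, y))
    best = (None, sse(list(y)))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # identical predictor values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical data with an obvious jump between x = 3 and x = 10.
x = [1, 2, 3, 10, 11, 12]
y = [5.0, 6.0, 5.5, 20.0, 21.0, 19.5]
threshold, err = best_split(x, y)
print(threshold, round(err, 3))
```

    Bagging and random forests, also covered by this record, fit many such trees to bootstrap samples and average their predictions; that step is not shown here.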

  4. Performance of seedlings of a shade-tolerant tropical tree species after moderate addition of N and P

    NASA Astrophysics Data System (ADS)

    Cárate Tandalla, Daisy; Leuschner, Christoph; Homeier, Jürgen

    2015-12-01

    Nitrogen deposition to tropical forests is predicted to increase in future in many regions due to agricultural intensification. We conducted a seedling transplantation experiment in a tropical premontane forest in Ecuador with a locally abundant late-successional tree species (Pouteria torta, Sapotaceae) aimed at detecting species-specific responses to moderate N and P addition and to understand how increasing nutrient availability will affect regeneration. From locally collected seeds, 320 seedlings were produced and transplanted to the plots of the Ecuadorian Nutrient Manipulation Experiment (NUMEX) with three treatments (moderate N addition: 50 kg N ha-1 yr-1, moderate P addition: 10 kg P ha-1 yr-1 and combined N and P addition) and a control (80 plants per treatment). After 12 months, mortality, relative growth rate, leaf nutrient content and leaf herbivory rate were measured. N and NP addition significantly increased the mortality rate (70 % vs. 54 % in the control). However, N and P addition also increased the diameter growth rate of the surviving seedlings. N and P addition did not alter foliar nutrient concentrations and leaf N:P ratio, but N addition decreased the leaf C:N ratio and increased SLA. P addition (but not N addition) resulted in higher leaf area loss to herbivore consumption and also shifted carbon allocation to root growth. This fertilization experiment with a common rainforest tree species conducted in old-growth forest shows that already moderate doses of added N and P are affecting seedling performance which most likely will have consequences for the competitive strength in the understory and the recruitment success of P. torta. Simultaneous increases in growth, herbivory and mortality rates make it difficult to assess the species' overall performance and predict how a future increase in nutrient deposition will alter the abundance of this species in the Andean tropical montane forests.

  5. Gene expression profiling of breast cancer survivability by pooled cDNA microarray analysis using logistic regression, artificial neural networks and decision trees

    PubMed Central

    2013-01-01

    Background Microarray technology can acquire information about thousands of genes simultaneously. We analyzed published breast cancer microarray databases to predict five-year recurrence and compared the performance of three data mining algorithms, artificial neural networks (ANN), decision trees (DT) and logistic regression (LR), and two composite models, DT-ANN and DT-LR. From the collection of microarray datasets in the Gene Expression Omnibus, four breast cancer datasets were pooled for predicting five-year breast cancer relapse. After data compilation, 757 subjects, 5 clinical variables and 13,452 genetic variables were aggregated. The bootstrap method, Mann–Whitney U test and 20-fold cross-validation were performed to investigate candidate genes with the 100 most significant p-values. The predictive powers of DT, LR and ANN models were assessed using accuracy and the area under ROC curve. The associated genes were evaluated using Cox regression. Results The DT models exhibited the lowest predictive power and the poorest extrapolation when applied to the test samples. The ANN models displayed the best predictive power and showed the best extrapolation. The 21 most-associated genes, as determined by integration of each model, were analyzed using Cox regression with a 3.53-fold (95% CI: 2.24-5.58) increased risk of breast cancer five-year recurrence… Conclusions The 21 selected genes can predict breast cancer recurrence. Among these genes, CCNB1, PLK1 and TOP2A are in the cell cycle G2/M DNA damage checkpoint pathway. Oncologists can offer the genetic information for patients when understanding the gene expression profiles on breast cancer recurrence. PMID:23506640

  6. Understanding how roadside concentrations of NOx are influenced by the background levels, traffic density, and meteorological conditions using Boosted Regression Trees

    NASA Astrophysics Data System (ADS)

    Sayegh, Arwa; Tate, James E.; Ropkins, Karl

    2016-02-01

    Oxides of nitrogen (NOx) are a major component of photochemical smog, and their constituents are considered principal traffic-related pollutants affecting human health. This study investigates the influence of background concentrations of NOx, traffic density, and prevailing meteorological conditions on roadside concentrations of NOx at UK urban, open motorway, and motorway tunnel sites using the statistical approach Boosted Regression Trees (BRT). BRT models have been fitted using hourly concentration, traffic, and meteorological data for each site. The models predict, rank, and visualise the relationship between model variables and roadside NOx concentrations. A strong relationship between roadside NOx and monitored local background concentrations is demonstrated. Relationships between roadside NOx and other model variables have been shown to be strongly influenced by the quality and resolution of background concentrations of NOx, i.e. whether they were based on monitored data or modelled predictions. The paper proposes a direct method of using site-specific fundamental diagrams for splitting traffic data into four traffic states: free-flow, busy-flow, congested, and severely congested. Using BRT models, the density of traffic (vehicles per kilometre) was observed to have a proportional influence on the concentrations of roadside NOx, with different fitted regression line slopes for the different traffic states. When other influences are conditioned out, the relationship between roadside concentrations and ambient air temperature suggests NOx concentrations reach a minimum at around 22 °C, with high concentrations at low ambient air temperatures, which could be associated with restricted atmospheric dispersion and/or with changes in road traffic exhaust emission characteristics at low ambient air temperatures. This paper uses BRT models to study how different critical factors, and their relative importance, influence the variation of roadside NOx concentrations. The paper

  7. Thin cloud removal from remote sensing images using multidirectional dual tree complex wavelet transform and transfer least square support vector regression

    NASA Astrophysics Data System (ADS)

    Hu, Gensheng; Li, Xiaoyi; Liang, Dong

    2015-01-01

    The existence of clouds affects the interpretation and utilization of remote sensing images. A thin cloud removal algorithm for cloud-contaminated remote sensing images is proposed by combining a multidirectional dual tree complex wavelet transform (M-DTCWT) with domain adaptation transfer least square support vector regression (T-LSSVR). First, M-DTCWT is constructed by using the hourglass filter bank in combination with DTCWT, which is used to decompose remote sensing images into multiscale and multidirectional subbands. Then the low-frequency subband coefficients of the cloud-free regions on target images and source domain images are used as samples for a T-LSSVR model, which can be used to predict those of the cloud regions on cloud-contaminated images. Finally, by enhancing the high-frequency coefficients and replacing the low-frequency coefficients, the thin clouds on cloud-contaminated images are removed. Experimental results show that M-DTCWT contributes to keeping the details of the ground objects of cloud-contaminated images, and the T-LSSVR model can effectively learn the contour information from multisource and multitemporal images, therefore, the proposed method achieves a good effect of thin cloud removal.

  8. Identifying changes in dissolved organic matter content and characteristics by fluorescence spectroscopy coupled with self-organizing map and classification and regression tree analysis during wastewater treatment.

    PubMed

    Yu, Huibin; Song, Yonghui; Liu, Ruixia; Pan, Hongwei; Xiang, Liancheng; Qian, Feng

    2014-10-01

    The stabilization of latent tracers of dissolved organic matter (DOM) of wastewater was analyzed by three-dimensional excitation-emission matrix (EEM) fluorescence spectroscopy coupled with self-organizing map and classification and regression tree analysis (CART) in wastewater treatment performance. DOM of water samples collected from primary sedimentation, anaerobic, anoxic, oxic and secondary sedimentation tanks in a large-scale wastewater treatment plant contained four fluorescence components: tryptophan-like (C1), tyrosine-like (C2), microbial humic-like (C3) and fulvic-like (C4) materials extracted by self-organizing map. These components showed good positive linear correlations with dissolved organic carbon of DOM. C1 and C2 were representative components in the wastewater, and they were removed to a higher extent than those of C3 and C4 in the treatment process. C2 was a latent parameter determined by CART to differentiate water samples of oxic and secondary sedimentation tanks from the successive treatment units, indirectly proving that most of tyrosine-like material was degraded by anaerobic microorganisms. C1 was an accurate parameter to comprehensively separate the samples of the five treatment units from each other, indirectly indicating that tryptophan-like material was decomposed by anaerobic and aerobic bacteria. EEM fluorescence spectroscopy in combination with self-organizing map and CART analysis can be a nondestructive effective method for characterizing structural component of DOM fractions and monitoring organic matter removal in wastewater treatment process. PMID:25065793

  9. Addition of wsp sequences to the Wolbachia phylogenetic tree and stability of the classification.

    PubMed

    Pintureau, B; Chaudier, S; Lassablière, F; Charles, H; Grenier, S

    2000-10-01

    Wolbachia are symbiotic bacteria altering reproductive characters of numerous arthropods. Their most recent phylogeny and classification are based on sequences of the wsp gene. We sequenced wsp gene from six Wolbachia strains infecting six Trichogramma species that live as egg parasitoids on many insects. This allows us to test the effect of the addition of sequences on the Wolbachia phylogeny and to check the classification of Wolbachia infecting Trichogramma. The six Wolbachia studied are classified in the B supergroup. They confirm the monophyletic structure of the B Wolbachia in Trichogramma but introduce small differences in the Wolbachia classification. Modifications include the definition of a new group, Sem, for Wolbachia of T. semblidis and the merging of the two closely related groups, Sib and Kay. Specific primers were determined and tested for the Sem group. PMID:11040288

  10. Quantifying mineral abundances of complex mixtures by coupling spectral deconvolution of SWIR spectra (2.1-2.4 μm) and regression tree analysis

    USGS Publications Warehouse

    Mulder, V.L.; Plotze, Michael; de Bruin, Sytze; Schaepman, Michael E.; Mavris, C.; Kokaly, Raymond F.; Egli, Markus

    2013-01-01

    This paper presents a methodology for assessing mineral abundances of mixtures having more than two constituents using absorption features in the 2.1-2.4 μm wavelength region. In the first step, the absorption behaviour of mineral mixtures is parameterised by exponential Gaussian optimisation. Next, mineral abundances are predicted by regression tree analysis using these parameters as inputs. The approach is demonstrated on a range of prepared samples with known abundances of kaolinite, dioctahedral mica, smectite, calcite and quartz and on a set of field samples from Morocco. The latter contained varying quantities of other minerals, some of which did not have diagnostic absorption features in the 2.1-2.4 μm region. Cross validation showed that the prepared samples of kaolinite, dioctahedral mica, smectite and calcite were predicted with a root mean square error (RMSE) less than 9 wt.%. For the field samples, the RMSE was less than 8 wt.% for calcite, dioctahedral mica and kaolinite abundances. Smectite could not be well predicted, which was attributed to spectral variation of the cations within the dioctahedral layered smectites. Substitution of part of the quartz by chlorite at the prediction phase hardly affected the accuracy of the predicted mineral content; this suggests that the method is robust in handling the omission of minerals during the training phase. The degree of expression of absorption components was different between the field sample and the laboratory mixtures. This demonstrates that the method should be calibrated and trained on local samples. Our method allows the simultaneous quantification of more than two minerals within a complex mixture and thereby enhances the perspectives of spectral analysis for mineral abundances.
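
    The second step described above (regression tree prediction scored by RMSE) can be illustrated with a single-split tree. This is a minimal sketch under stated assumptions, not the authors' implementation: the feature name, the toy abundances and the helper functions `best_split` and `rmse` are hypothetical, and a real regression tree would apply the same split search recursively over several spectral parameters.

```python
import math


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, left_mean, right_mean) minimising total SSE on one feature."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (cost, threshold, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("feature has no distinct values to split on")
    return best[1:]


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


# Hypothetical data: fitted absorption-feature depth vs. known abundance (wt.%).
depth = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8]
abundance = [10.0, 12.0, 11.0, 40.0, 42.0, 41.0]

threshold, low_mean, high_mean = best_split(depth, abundance)
predicted = [low_mean if d <= threshold else high_mean for d in depth]
print(threshold, low_mean, high_mean, rmse(predicted, abundance))
```

    Note that this RMSE is computed on the training data; the record reports cross-validated errors, which would require refitting on held-out folds.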

  11. Additives

    NASA Technical Reports Server (NTRS)

    Smalheer, C. V.

    1973-01-01

    The chemistry of lubricant additives is discussed to show what the additives are chemically and what functions they perform in the lubrication of various kinds of equipment. Current theories regarding the mode of action of lubricant additives are presented. The additive groups discussed include the following: (1) detergents and dispersants, (2) corrosion inhibitors, (3) antioxidants, (4) viscosity index improvers, (5) pour point depressants, and (6) antifouling agents.

  12. Responses of nitrous oxide emissions to nitrogen and phosphorus additions in two tropical plantations with N-fixing vs. non-N-fixing tree species

    NASA Astrophysics Data System (ADS)

    Zhang, W.; Zhu, X.; Luo, Y.; Rafique, R.; Chen, H.; Huang, J.; Mo, J.

    2014-01-01

    Leguminous tree plantations at phosphorus (P) limited sites may result in higher rates of nitrous oxide (N2O) emissions. However, the effects of nitrogen (N) and P applications on soil N2O emissions from plantations with N-fixing vs. non-N-fixing tree species have rarely been studied in the field. We conducted an experimental manipulation of N and P additions in two tropical plantations with Acacia auriculiformis (AA) and Eucalyptus urophylla (EU) tree species in South China. The objective was to determine the effects of N- or P-addition alone, as well as NP application together, on soil N2O emissions from tropical plantations with N-fixing vs. non-N-fixing tree species. We found that the average N2O emission from control was greater in the AA (2.26 ± 0.06 kg N2O-N ha-1 yr-1) than in the EU plantation (1.87 ± 0.05 kg N2O-N ha-1 yr-1). For the AA plantation, N-addition stimulated the N2O emission from soil while P-addition did not. Applications of N with P together significantly decreased N2O emission compared to N-addition alone, especially in high level treatment plots (decreased by 18%). In the EU plantation, N2O emissions significantly decreased in P-addition plots compared with the controls; however, N- and NP-additions did not. The differing response of N2O emissions to N- or P-addition was attributed to the higher initial soil N status in the AA than in the EU plantation, due to symbiotic N fixation in the former. Our results suggest that atmospheric N deposition potentially stimulates N2O emissions from leguminous tree plantations in the tropics, whereas P fertilization has the potential to mitigate N deposition-induced N2O emissions from such plantations.

  13. Tree Scanning

    PubMed Central

    Templeton, Alan R.; Maxwell, Taylor; Posada, David; Stengård, Jari H.; Boerwinkle, Eric; Sing, Charles F.

    2005-01-01

    We use evolutionary trees of haplotypes to study phenotypic associations by exhaustively examining all possible biallelic partitions of the tree, a technique we call tree scanning. If the first scan detects significant associations, additional rounds of tree scanning are used to partition the tree into three or more allelic classes. Two worked examples are presented. The first is a reanalysis of associations between haplotypes at the Alcohol Dehydrogenase locus in Drosophila melanogaster that was previously analyzed using a nested clade analysis, a more complicated technique for using haplotype trees to detect phenotypic associations. Tree scanning and the nested clade analysis yield the same inferences when permutation testing is used with both approaches. The second example is an analysis of associations between variation in various lipid traits and genetic variation at the Apolipoprotein E (APOE) gene in three human populations. Tree scanning successfully identified phenotypic associations expected from previous analyses. Tree scanning for the most part detected more associations and provided a better biological interpretative framework than single SNP analyses. We also show how prior information can be incorporated into the tree scan by starting with the traditional three electrophoretic alleles at APOE. Tree scanning detected genetically determined phenotypic heterogeneity within all three electrophoretic allelic classes. Overall, tree scanning is a simple, powerful, and flexible method for using haplotype trees to detect phenotype/genotype associations at candidate loci. PMID:15371364
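
    The first round of tree scanning described above examines every biallelic partition of the haplotype tree, and each such partition corresponds to cutting one branch. The sketch below enumerates those partitions for a small made-up tree. It rests on several assumptions: the haplotype names, the per-haplotype phenotype means and the simple mean-difference score are all hypothetical, and the permutation testing the record relies on for inference is not reproduced here.

```python
def biallelic_partitions(edges):
    """Cut each branch of an unrooted tree in turn; each cut yields two allelic classes."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    nodes = set(adjacency)
    partitions = []
    for a, b in edges:
        # Collect every haplotype reachable from `a` without crossing the cut branch a-b.
        side, stack = {a}, [a]
        while stack:
            current = stack.pop()
            for neighbour in adjacency[current]:
                if {current, neighbour} == {a, b} or neighbour in side:
                    continue
                side.add(neighbour)
                stack.append(neighbour)
        partitions.append((frozenset(side), frozenset(nodes - side)))
    return partitions


def mean_difference(partition, phenotype):
    """Absolute difference in mean phenotype between the two classes (placeholder score)."""
    def mean(haplotypes):
        return sum(phenotype[h] for h in haplotypes) / len(haplotypes)
    left, right = partition
    return abs(mean(left) - mean(right))


# Hypothetical haplotype tree (branches) and mean trait value per haplotype.
edges = [("H1", "H2"), ("H2", "H3"), ("H2", "H4"), ("H4", "H5")]
phenotype = {"H1": 1.0, "H2": 1.2, "H3": 0.9, "H4": 2.1, "H5": 2.3}

partitions = biallelic_partitions(edges)
best = max(partitions, key=lambda p: mean_difference(p, phenotype))
print(len(partitions), sorted(best[0]), sorted(best[1]))
```

    A tree with n haplotypes has n - 1 branches, so the scan is exhaustive but small; later rounds would repeat the search within each resulting class.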

  14. Incremental hierarchical discriminant regression.

    PubMed

    Weng, Juyang; Hwang, Wey-Shiuan

    2007-03-01

    This paper presents incremental hierarchical discriminant regression (IHDR) which incrementally builds a decision tree or regression tree for very high-dimensional regression or decision spaces by an online, real-time learning system. Biologically motivated, it is an approximate computational model for automatic development of associative cortex, with both bottom-up sensory inputs and top-down motor projections. At each internal node of the IHDR tree, information in the output space is used to automatically derive the local subspace spanned by the most discriminating features. Embedded in the tree is a hierarchical probability distribution model used to prune very unlikely cases during the search. The number of parameters in the coarse-to-fine approximation is dynamic and data-driven, enabling the IHDR tree to automatically fit data with unknown distribution shapes (thus, it is difficult to select the number of parameters up front). The IHDR tree dynamically assigns long-term memory to avoid the loss-of-memory problem typical with a global-fitting learning algorithm for neural networks. A major challenge for an incrementally built tree is that the number of samples varies arbitrarily during the construction process. An incrementally updated probability model, called sample-size-dependent negative-log-likelihood (SDNLL) metric is used to deal with large sample-size cases, small sample-size cases, and unbalanced sample-size cases, measured among different internal nodes of the IHDR tree. We report experimental results for four types of data: synthetic data to visualize the behavior of the algorithms, large face image data, continuous video stream from robot navigation, and publicly available data sets that use human defined features. PMID:17385628

  15. Responses of nitrous oxide emissions to nitrogen and phosphorus additions in two tropical plantations with N-fixing vs. non-N-fixing tree species

    NASA Astrophysics Data System (ADS)

    Zhang, W.; Zhu, X.; Luo, Y.; Rafique, R.; Chen, H.; Huang, J.; Mo, J.

    2014-09-01

    Leguminous tree plantations at phosphorus (P) limited sites may result in excess nitrogen (N) and higher rates of nitrous oxide (N2O) emissions. However, the effects of N and P applications on soil N2O emissions from plantations with N-fixing vs. non-N-fixing tree species have rarely been studied in the field. We conducted an experimental manipulation of N and/or P additions in two plantations with Acacia auriculiformis (AA, N-fixing) and Eucalyptus urophylla (EU, non-N-fixing) in South China. The objective was to determine the effects of N or P addition alone, as well as NP application together on soil N2O emissions from these tropical plantations. We found that the average N2O emission from control was greater in the AA (2.3 ± 0.1 kg N2O-N ha-1 yr-1) than in EU plantation (1.9 ± 0.1 kg N2O-N ha-1 yr-1). For the AA plantation, N addition stimulated N2O emission from the soil while P addition did not. Applications of N with P together significantly decreased N2O emission compared to N addition alone, especially in the high-level treatments (decreased by 18%). In the EU plantation, N2O emissions significantly decreased in P-addition plots compared with the controls; however, N and NP additions did not. The different response of N2O emission to N or P addition was attributed to the higher initial soil N status in the AA than that of EU plantation, due to symbiotic N fixation in the former. Our result suggests that atmospheric N deposition potentially stimulates N2O emissions from leguminous tree plantations in the tropics, whereas P fertilization has the potential to mitigate N-deposition-induced N2O emissions from such plantations.

  16. Application of the deletion/substitution/addition algorithm to selecting land use regression models for interpolating air pollution measurements in California

    NASA Astrophysics Data System (ADS)

    Beckerman, Bernardo S.; Jerrett, Michael; Martin, Randall V.; van Donkelaar, Aaron; Ross, Zev; Burnett, Richard T.

    2013-10-01

    Land use regression (LUR) models are widely employed in health studies to characterize chronic exposure to air pollution. The LUR is essentially an interpolation technique that employs the pollutant of interest as the dependent variable with proximate land use, traffic, and physical environmental variables used as independent predictors. Two major limitations with this method have not been addressed: (1) variable selection in the model building process, and (2) dealing with unbalanced repeated measures. In this paper, we address these issues with a modeling framework that implements the deletion/substitution/addition (DSA) machine learning algorithm that uses a generalized linear model to average over unbalanced temporal observations. Models were derived for fine particulate matter with aerodynamic diameter of 2.5 microns or less (PM2.5) and nitrogen dioxide (NO2) using monthly observations. We used 4119 observations at 108 sites and 15,301 observations at 138 sites for PM2.5 and NO2, respectively. We derived models with good predictive capacity (cross-validated-R2 values were 0.65 and 0.71 for PM2.5 and NO2, respectively). By addressing these two shortcomings in current approaches to LUR modeling, we have developed a framework that minimizes arbitrary decisions during the model selection process. We have also demonstrated how to integrate temporally unbalanced data in a theoretically sound manner. These developments could have widespread applicability for future LUR modeling efforts.

  17. The integration of geophysical and enhanced Moderate Resolution Imaging Spectroradiometer Normalized Difference Vegetation Index data into a rule-based, piecewise regression-tree model to estimate cheatgrass beginning of spring growth

    USGS Publications Warehouse

    Boyte, Stephen P.; Wylie, Bruce K.; Major, Donald J.; Brown, Jesslyn F.

    2015-01-01

    Cheatgrass exhibits spatial and temporal phenological variability across the Great Basin as described by ecological models formed using remote sensing and other spatial data-sets. We developed a rule-based, piecewise regression-tree model trained on 99 points that used three data-sets – latitude, elevation, and start of season time based on remote sensing input data – to estimate cheatgrass beginning of spring growth (BOSG) in the northern Great Basin. The model was then applied to map the location and timing of cheatgrass spring growth for the entire area. The model was strong (R2 = 0.85) and predicted an average cheatgrass BOSG across the study area of 29 March–4 April. Of early cheatgrass BOSG areas, 65% occurred at elevations below 1452 m. The highest proportion of cheatgrass BOSG occurred between mid-April and late May. Predicted cheatgrass BOSG in this study matched well with previous Great Basin cheatgrass green-up studies.

  18. Photosynthetic and Growth Response of Sugar Maple (Acer saccharum Marsh.) Mature Trees and Seedlings to Calcium, Magnesium, and Nitrogen Additions in the Catskill Mountains, NY, USA

    PubMed Central

    Momen, Bahram; Behling, Shawna J.; Lawrence, Greg B.; Sullivan, Joseph H.

    2015-01-01

    Decline of sugar maple in North American forests has been attributed to changes in soil calcium (Ca) and nitrogen (N) by acidic precipitation. Although N is an essential and usually a limiting factor in forests, atmospheric N deposition may cause N-saturation leading to loss of soil Ca. Such changes can affect carbon gain and growth of sugar maple trees and seedlings. We applied a 2² factorial arrangement of N and dolomitic limestone containing Ca and magnesium (Mg) to 12 forest plots in the Catskill Mountain region of NY, USA. To quantify the short-term effects, we measured photosynthetic-light responses of sugar maple mature trees and seedlings two or three times during two summers. We estimated maximum net photosynthesis (An-max) and its related light intensity (PAR at An-max), apparent quantum efficiency (Aqe), and light compensation point (LCP). To quantify the long-term effects, we measured basal area of living mature trees before and 4 and 8 years after treatment applications. Soil and foliar chemistry variables were also measured. Dolomitic limestone increased Ca, Mg, and pH in the soil Oe horizon. Mg was increased in the B horizon when comparing the plots receiving N with those receiving CaMg. In mature trees, foliar Ca and Mg concentrations were higher in the CaMg and N+CaMg plots than in the reference or N plots; foliar Ca concentration was higher in the N+CaMg plots compared with the CaMg plots, foliar Mg was higher in the CaMg plots than the N+CaMg plots; An-max was maximized due to N+CaMg treatment; Aqe decreased by N addition; and PAR at An-max increased by N or CaMg treatments alone, but the increase was maximized by their combination. No treatment effect was detected on basal areas of living mature trees four or eight years after treatment applications. In seedlings, An-max was increased by N+CaMg addition. The reference plots had an open herbaceous layer, but the plots receiving N had a dense monoculture of common woodfern in the forest floor

  19. Photosynthetic and Growth Response of Sugar Maple (Acer saccharum Marsh.) Mature Trees and Seedlings to Calcium, Magnesium, and Nitrogen Additions in the Catskill Mountains, NY, USA.

    PubMed

    Momen, Bahram; Behling, Shawna J; Lawrence, Greg B; Sullivan, Joseph H

    2015-01-01

    Decline of sugar maple in North American forests has been attributed to changes in soil calcium (Ca) and nitrogen (N) by acidic precipitation. Although N is an essential and usually a limiting factor in forests, atmospheric N deposition may cause N-saturation leading to loss of soil Ca. Such changes can affect carbon gain and growth of sugar maple trees and seedlings. We applied a 2² factorial arrangement of N and dolomitic limestone containing Ca and magnesium (Mg) to 12 forest plots in the Catskill Mountain region of NY, USA. To quantify the short-term effects, we measured photosynthetic-light responses of sugar maple mature trees and seedlings two or three times during two summers. We estimated maximum net photosynthesis (An-max) and its related light intensity (PAR at An-max), apparent quantum efficiency (Aqe), and light compensation point (LCP). To quantify the long-term effects, we measured basal area of living mature trees before and 4 and 8 years after treatment applications. Soil and foliar chemistry variables were also measured. Dolomitic limestone increased Ca, Mg, and pH in the soil Oe horizon. Mg was increased in the B horizon when comparing the plots receiving N with those receiving CaMg. In mature trees, foliar Ca and Mg concentrations were higher in the CaMg and N+CaMg plots than in the reference or N plots; foliar Ca concentration was higher in the N+CaMg plots compared with the CaMg plots, foliar Mg was higher in the CaMg plots than the N+CaMg plots; An-max was maximized due to N+CaMg treatment; Aqe decreased by N addition; and PAR at An-max increased by N or CaMg treatments alone, but the increase was maximized by their combination. No treatment effect was detected on basal areas of living mature trees four or eight years after treatment applications. In seedlings, An-max was increased by N+CaMg addition. The reference plots had an open herbaceous layer, but the plots receiving N had a dense monoculture of common woodfern in the forest floor

  20. Photosynthetic and growth response of sugar maple (Acer saccharum Marsh.) mature trees and seedlings to calcium, magnesium, and nitrogen additions in the Catskill Mountains, NY, USA

    USGS Publications Warehouse

    Momen, Bahram; Behling, Shawna J; Lawrence, Gregory B.; Sullivan, Joseph H

    2015-01-01

    Decline of sugar maple in North American forests has been attributed to changes in soil calcium (Ca) and nitrogen (N) by acidic precipitation. Although N is an essential and usually a limiting factor in forests, atmospheric N deposition may cause N-saturation leading to loss of soil Ca. Such changes can affect carbon gain and growth of sugar maple trees and seedlings. We applied a 2² factorial arrangement of N and dolomitic limestone containing Ca and magnesium (Mg) to 12 forest plots in the Catskill Mountain region of NY, USA. To quantify the short-term effects, we measured photosynthetic-light responses of sugar maple mature trees and seedlings two or three times during two summers. We estimated maximum net photosynthesis (An-max) and its related light intensity (PAR at An-max), apparent quantum efficiency (Aqe), and light compensation point (LCP). To quantify the long-term effects, we measured basal area of living mature trees before and 4 and 8 years after treatment applications. Soil and foliar chemistry variables were also measured. Dolomitic limestone increased Ca, Mg, and pH in the soil Oe horizon. Mg was increased in the B horizon when comparing the plots receiving N with those receiving CaMg. In mature trees, foliar Ca and Mg concentrations were higher in the CaMg and N+CaMg plots than in the reference or N plots; foliar Ca concentration was higher in the N+CaMg plots compared with the CaMg plots, foliar Mg was higher in the CaMg plots than the N+CaMg plots; An-max was maximized due to N+CaMg treatment; Aqe decreased by N addition; and PAR at An-max increased by N or CaMg treatments alone, but the increase was maximized by their combination. No treatment effect was detected on basal areas of living mature trees four or eight years after treatment applications. In seedlings, An-max was increased by N+CaMg addition. The reference plots had an open herbaceous layer, but the plots receiving N had a dense monoculture of common woodfern in the

  1. Functional relationships between leaf hydraulics and leaf economic traits in response to nutrient addition in subtropical tree species.

    PubMed

    Villagra, Mariana; Campanello, Paula I; Bucci, Sandra J; Goldstein, Guillermo

    2013-12-01

    Leaves can be both a hydraulic bottleneck and a safety valve against hydraulic catastrophic dysfunctions, and thus changes in traits related to water movement in leaves and associated costs may be critical for the success of plant growth. A 4-year fertilization experiment with nitrogen (N) and phosphorus (P) addition was done in a semideciduous Atlantic forest in northeastern Argentina. Saplings of five dominant canopy species were grown in similar gaps inside the forests (five control and five N + P addition plots). Leaf lifespan (LL), leaf mass per unit area (LMA), leaf and stem vulnerability to cavitation, leaf hydraulic conductance (K(leaf_area) and K(leaf_mass)) and leaf turgor loss point (TLP) were measured in the five species and in both treatments. Leaf lifespan tended to decrease with the addition of fertilizers, and LMA was significantly higher in plants with nutrient addition compared with individuals in control plots. The vulnerability to cavitation of leaves (P50(leaf)) either increased or decreased with the nutrient treatment depending on the species, but the average P50(leaf) did not change with nutrient addition. The P50(leaf) decreased linearly with increasing LMA and LL across species and treatments. These trade-offs have an important functional significance because more expensive (higher LMA) and less vulnerable leaves (lower P50(leaf)) are retained for a longer period of time. Osmotic potentials at TLP and at full turgor became more negative with decreasing P50(leaf) regardless of nutrient treatment. The K(leaf) on a mass basis was negatively correlated with LMA and LL, indicating that there is a carbon cost associated with increased water transport that is compensated by a longer LL. The vulnerability to cavitation of stems and leaves were similar, particularly in fertilized plants. Leaves in the species studied may not function as safety valves at low water potentials to protect the hydraulic pathway from water stress-induced cavitation

  2. Interactions between CO2 enhancement and N addition on net primary productivity and water-use efficiency in a mesocosm with multiple subtropical tree species.

    PubMed

    Yan, Junhua; Zhang, Deqiang; Liu, Juxiu; Zhou, Guoyi

    2014-07-01

    Carbon dioxide (CO2) enhancement (eCO2) and N addition (aN) have been shown to increase net primary production (NPP) and to affect water-use efficiency (WUE) for many temperate ecosystems, but few studies have been made on subtropical tree species. This study compared the responses of NPP and WUE from a mesocosm comprising five subtropical tree species to eCO2 (700 ppm), aN (10 g N m-2 yr-1) and eCO2 × aN using open-top chambers. Our results showed that mean annual ecosystem NPP did not change significantly under eCO2, but increased by 56% under aN and by 64% under eCO2 × aN. Ecosystem WUE increased by 14%, 55%, and 61% under eCO2, aN and eCO2 × aN, respectively. We found that the observed responses of ecosystem WUE were largely driven by the responses of ecosystem NPP. Statistical analysis showed that there were no significant interactions between eCO2 and aN on ecosystem NPP (P = 0.731) or WUE (P = 0.442). Our results showed that increasing N deposition was likely to have much stronger effects on ecosystem NPP and WUE than increasing CO2 concentration for the subtropical forests. However, different tree species responded quite differently. aN significantly increased annual NPP of the fast-growing species (Schima superba). The nitrogen-fixing species (Ormosia pinnata) grew significantly faster only under eCO2 × aN. eCO2 had no effects on annual NPP of those two species but significantly increased annual NPP of two other species (Castanopsis hystrix and Acmena acuminatissima). Differential responses of the NPP among different tree species to eCO2 and aN will likely have significant implications for the species composition of subtropical forests under future global change. PMID:24339232

  3. Logistic Regression

    NASA Astrophysics Data System (ADS)

    Grégoire, G.

    2014-12-01

    Logistic regression was originally intended to explain the relationship between the probability of an event and a set of covariates. The model's coefficients can be interpreted via the odds and odds ratio, which are presented in the introduction of the chapter. When the observations are obtained individually, we speak of binary logistic regression; when they are grouped, the logistic regression is said to be binomial. In our presentation we mainly focus on the binary case. For statistical inference the main tool is the maximum likelihood methodology: we present the Wald, Rao and likelihood ratio results and their use to compare nested models. The problems we intend to deal with are essentially the same as in multiple linear regression: testing global effects, testing individual effects, selecting variables to build a model, measuring the fit of the model, predicting new values… The methods are demonstrated on data sets using R. Finally we briefly consider the binomial case and the situation where we are interested in several events, that is, polytomous (multinomial) logistic regression and the particular case of ordinal logistic regression.
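
    The interpretation of coefficients via odds and odds ratios mentioned above can be checked numerically. The sketch below is written in Python rather than the R used in the chapter, and the coefficients `beta0` and `beta1` are hypothetical values chosen for illustration. It shows that, in a binary logistic model, raising a covariate by one unit multiplies the odds of the event by exp(beta1).

```python
import math


def logistic(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def odds(p):
    """Odds of an event with probability p."""
    return p / (1.0 - p)


# Hypothetical intercept and slope of a fitted binary logistic regression.
beta0, beta1 = -1.0, 0.7

p_at_0 = logistic(beta0 + beta1 * 0)  # event probability when x = 0
p_at_1 = logistic(beta0 + beta1 * 1)  # event probability when x = 1

odds_ratio = odds(p_at_1) / odds(p_at_0)
print(odds_ratio, math.exp(beta1))  # the two values agree up to rounding
```

    The same identity holds between any two covariate values one unit apart, which is why exp(beta1) is reported as the odds ratio for that covariate.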

  4. The Application of Classification and Regression Trees for the Triage of Women for Referral to Colposcopy and the Estimation of Risk for Cervical Intraepithelial Neoplasia: A Study Based on 1625 Cases with Incomplete Data from Molecular Tests

    PubMed Central

    Pouliakis, Abraham; Karakitsou, Efrossyni; Chrelias, Charalampos; Pappas, Asimakis; Panayiotides, Ioannis; Valasoulis, George; Kyrgiou, Maria; Paraskevaidis, Evangelos; Karakitsos, Petros

    2015-01-01

    Objective. Nowadays numerous ancillary techniques detecting HPV DNA and mRNA compete with cytology; however, no perfect test exists. In this study we evaluated classification and regression trees (CARTs) for the production of triage rules and the estimation of the risk for cervical intraepithelial neoplasia (CIN) in cases with ASCUS+ in cytology. Study Design. We used 1625 cases. In contrast to other approaches we used missing data to increase the data volume, obtain more accurate results, and simulate real conditions in the everyday practice of gynecologic clinics and laboratories. The proposed CART was based on the cytological result, HPV DNA typing, HPV mRNA detection based on NASBA and flow cytometry, p16 immunocytochemical expression, and finally age and parous status. Results. Algorithms useful for the triage of women were produced; gynecologists could apply these in conjunction with available examination results and arrive at an estimate of the risk that a woman harbors CIN, expressed as a probability. Conclusions. The most important test was the cytological examination; however the CART handled cases with inadequate cytological outcome and increased the diagnostic accuracy by exploiting the results of ancillary techniques even if there were inadequate missing data. The CART performance was better than any other single test involved in this study. PMID:26339651

  5. Robust Regression.

    PubMed

    Huang, Dong; Cabral, Ricardo; De la Torre, Fernando

    2016-02-01

    Discriminative methods (e.g., kernel regression, SVM) have been extensively used to solve problems such as object recognition, image alignment and pose estimation from images. These methods typically map image features ( X) to continuous (e.g., pose) or discrete (e.g., object category) values. A major drawback of existing discriminative methods is that samples are directly projected onto a subspace and hence fail to account for outliers common in realistic training sets due to occlusion, specular reflections or noise. It is important to notice that existing discriminative approaches assume the input variables X to be noise free. Thus, discriminative methods experience significant performance degradation when gross outliers are present. Despite its obvious importance, the problem of robust discriminative learning has been relatively unexplored in computer vision. This paper develops the theory of robust regression (RR) and presents an effective convex approach that uses recent advances on rank minimization. The framework applies to a variety of problems in computer vision including robust linear discriminant analysis, regression with missing data, and multi-label classification. Several synthetic and real examples with applications to head pose estimation from images, image and video classification and facial attribute classification with missing data are used to illustrate the benefits of RR. PMID:26761740

  6. Morse-Smale Regression

    PubMed Central

    Gerber, Samuel; Rübel, Oliver; Bremer, Peer-Timo; Pascucci, Valerio; Whitaker, Ross T.

    2012-01-01

    This paper introduces a novel partition-based regression approach that incorporates topological information. Partition-based regression typically introduces a quality-of-fit-driven decomposition of the domain. The emphasis in this work is on a topologically meaningful segmentation. Thus, the proposed regression approach is based on a segmentation induced by a discrete approximation of the Morse-Smale complex. This yields a segmentation with partitions corresponding to regions of the function with a single minimum and maximum that are often well approximated by a linear model. This approach yields regression models that are amenable to interpretation and have good predictive capacity. Typically, regression estimates are quantified by their geometrical accuracy. For the proposed regression, an important aspect is the quality of the segmentation itself. Thus, this paper introduces a new criterion that measures the topological accuracy of the estimate. The topological accuracy provides a complementary measure to the classical geometrical error measures and is very sensitive to over-fitting. The Morse-Smale regression is compared to state-of-the-art approaches in terms of geometry and topology and yields comparable or improved fits in many cases. Finally, a detailed study on climate-simulation data demonstrates the application of the Morse-Smale regression. Supplementary materials are available online and contain an implementation of the proposed approach in the R package msr, an analysis and simulations on the stability of the Morse-Smale complex approximation and additional tables for the climate-simulation study. PMID:23687424

  7. Morse–Smale Regression

    SciTech Connect

    Gerber, Samuel; Rubel, Oliver; Bremer, Peer-Timo; Pascucci, Valerio; Whitaker, Ross T.

    2012-01-19

    This paper introduces a novel partition-based regression approach that incorporates topological information. Partition-based regression typically introduces a quality-of-fit-driven decomposition of the domain. The emphasis in this work is on a topologically meaningful segmentation. Thus, the proposed regression approach is based on a segmentation induced by a discrete approximation of the Morse–Smale complex. This yields a segmentation with partitions corresponding to regions of the function with a single minimum and maximum that are often well approximated by a linear model. This approach yields regression models that are amenable to interpretation and have good predictive capacity. Typically, regression estimates are quantified by their geometrical accuracy. For the proposed regression, an important aspect is the quality of the segmentation itself. Thus, this article introduces a new criterion that measures the topological accuracy of the estimate. The topological accuracy provides a complementary measure to the classical geometrical error measures and is very sensitive to overfitting. The Morse–Smale regression is compared to state-of-the-art approaches in terms of geometry and topology and yields comparable or improved fits in many cases. Finally, a detailed study on climate-simulation data demonstrates the application of the Morse–Smale regression. Supplementary Materials are available online and contain an implementation of the proposed approach in the R package msr, an analysis and simulations on the stability of the Morse–Smale complex approximation, and additional tables for the climate-simulation study.

  8. Under which conditions, additional monitoring data are worth gathering for improving decision making? Application of the VOI theory in the Bayesian Event Tree eruption forecasting framework

    NASA Astrophysics Data System (ADS)

    Loschetter, Annick; Rohmer, Jérémy

    2016-04-01

    Standard and new-generation monitoring observations provide important information, in almost real time, about the evolution of the volcanic system. These observations are used to update the model and contribute to a better hazard assessment and to support decision making concerning potential evacuation. The framework BET_EF (based on Bayesian Event Tree) developed by INGV enables dealing with the integration of information from monitoring with the prospect of decision making. Using this framework, the objectives of the present work are i. to propose a method to assess the added value of information (within the Value Of Information (VOI) theory) from monitoring; ii. to perform sensitivity analysis on the different parameters that influence the VOI from monitoring. VOI analysis consists of assessing the possible increase in expected value provided by gathering information, for instance through monitoring. Basically, the VOI is the difference between the value with information and the value without additional information in a Cost-Benefit approach. This theory is well suited to deal with situations that can be represented in the form of a decision tree such as the BET_EF tool. Reference values and ranges of variation (for sensitivity analysis) were defined for input parameters, based on data from the MESIMEX exercise (performed at Vesuvio volcano in 2006). Complementary methods for sensitivity analyses were implemented: local, global using Sobol' indices and regional using Contribution to Sample Mean and Variance plots. The results (specific to the case considered) obtained with the different techniques are in good agreement and enable answering the following questions: i. Which characteristics of monitoring are important for early warning (reliability)? ii. How do experts' opinions influence the hazard assessment and thus the decision? Concerning the characteristics of monitoring, the more influential parameters are the means rather than the variances for the case considered
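
    The definition of VOI given in this record (value with information minus value without it, in a cost-benefit setting) can be sketched for a toy evacuation decision. This is an illustrative sketch only, not the BET_EF tool: the prior, the monitoring sensitivity and specificity, and the costs are all hypothetical, and evacuation is assumed to avoid the loss entirely.

```python
def expected_cost(p_eruption, evac_cost, loss):
    """Expected cost of the cheaper action: evacuate, or stay and risk the loss."""
    return min(evac_cost, p_eruption * loss)


def value_of_information(prior, sensitivity, specificity, evac_cost, loss):
    """Reduction in expected cost from observing an imperfect monitoring signal."""
    # Best decision using the prior probability alone.
    cost_without = expected_cost(prior, evac_cost, loss)

    # Probability of an alarm, then Bayes posteriors for each possible signal.
    p_alarm = prior * sensitivity + (1 - prior) * (1 - specificity)
    post_alarm = prior * sensitivity / p_alarm
    post_quiet = prior * (1 - sensitivity) / (1 - p_alarm)

    # Best decision after each signal, averaged over the signals.
    cost_with = (p_alarm * expected_cost(post_alarm, evac_cost, loss)
                 + (1 - p_alarm) * expected_cost(post_quiet, evac_cost, loss))
    return cost_without - cost_with


# Hypothetical inputs: 10% prior, evacuation costs 1 unit, an unprotected eruption costs 20.
voi = value_of_information(prior=0.1, sensitivity=0.9, specificity=0.8,
                           evac_cost=1.0, loss=20.0)
print(voi)
```

    With these made-up numbers the monitoring is worth roughly 0.53 cost units, because a quiet signal makes it rational to skip an evacuation that the prior alone would have triggered. A signal with sensitivity and specificity of 0.5 carries no information and yields a VOI of zero.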

  9. Boosted Beta Regression

    PubMed Central

    Schmid, Matthias; Wickler, Florian; Maloney, Kelly O.; Mitchell, Richard; Fenske, Nora; Mayr, Andreas

    2013-01-01

    Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fit a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established - yet unstable - approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression, estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures. PMID:23626706

  10. Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests

    PubMed Central

    2011-01-01

    Background. Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but presently has limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning methods like Neural Networks, Support Vector Machines and Random Forests can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptrons Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, Area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test. Results. Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the largest overall classification accuracy (Median (Me) = 0.76) and area under the ROC (Me = 0.90). However this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73) with high area under the ROC (Me = 0.73), specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC (Me = 0.72), specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed

  11. The Regression Trunk Approach to Discover Treatment Covariate Interaction

    ERIC Educational Resources Information Center

    Dusseldorp, Elise; Meulman, Jacqueline J.

    2004-01-01

    The regression trunk approach (RTA) is an integration of regression trees and multiple linear regression analysis. In this paper RTA is used to discover treatment covariate interactions, in the regression of one continuous variable on a treatment variable with "multiple" covariates. The performance of RTA is compared to the classical method of…

  12. Data Mining within a Regression Framework

    NASA Astrophysics Data System (ADS)

    Berk, Richard A.

    Regression analysis can imply a far wider range of statistical procedures than often appreciated. In this chapter, a number of common Data Mining procedures are discussed within a regression framework. These include non-parametric smoothers, classification and regression trees, bagging, and random forests. In each case, the goal is to characterize one or more of the distributional features of a response conditional on a set of predictors.

  13. DIF Trees: Using Classification Trees to Detect Differential Item Functioning

    ERIC Educational Resources Information Center

    Vaughn, Brandon K.; Wang, Qiu

    2010-01-01

    A nonparametric tree classification procedure is used to detect differential item functioning for items that are dichotomously scored. Classification trees are shown to be an alternative procedure to detect differential item functioning other than the use of traditional Mantel-Haenszel and logistic regression analysis. A nonparametric…

  14. Atlas of relations between climatic parameters and distributions of important trees and shrubs in North America; additional conifers, hardwoods, and monocots

    USGS Publications Warehouse

    Thompson, Robert S.; Anderson, Katherine H.; Bartlein, Patrick J.; Smith, Sharon A.

    2000-01-01

    This volume explores the continental-scale relations between climate and the geographic ranges of woody plant species in North America. A 25-km equal-area grid of modern climatic and bioclimatic parameters for North America was constructed from instrumental weather records. The geographic distributions of selected tree and shrub species were digitized, and the presence or absence of each species was determined for each cell on the 25-km grid, thus providing a basis for comparing climatic data and species' distribution.

  15. Regression: A Bibliography.

    ERIC Educational Resources Information Center

    Pedrini, D. T.; Pedrini, Bonnie C.

    Regression, another mechanism studied by Sigmund Freud, has had much research, e.g., hypnotic regression, frustration regression, schizophrenic regression, and infra-human-animal regression (often directly related to fixation). Many investigators worked with hypnotic age regression, which has a long history, going back to Russian reflexologists.…

  16. Our Air: Unfit for Trees.

    ERIC Educational Resources Information Center

    Dochinger, Leon S.

    To help urban, suburban, and rural tree owners know about air pollution's effects on trees and their tolerance and intolerance to pollutants, the USDA Forest Service has prepared this booklet. It answers the following questions about atmospheric pollution: Where does it come from? What can it do to trees? and What can we do about it? In addition,…

  17. Tree Lifecycle.

    ERIC Educational Resources Information Center

    Nature Study, 1998

    1998-01-01

    Presents a Project Learning Tree (PLT) activity that has students investigate and compare the lifecycle of a tree to other living things and the tree's role in the ecosystem. Includes background material as well as step-by-step instructions, variation and enrichment ideas, assessment opportunities, and student worksheets. (SJR)

  18. Kernel Continuum Regression.

    PubMed

    Lee, Myung Hee; Liu, Yufeng

    2013-12-01

    The continuum regression technique provides an appealing regression framework connecting ordinary least squares, partial least squares and principal component regression in one family. It offers some insight on the underlying regression model for a given application. Moreover, it helps to provide deep understanding of various regression techniques. Despite the useful framework, however, the current development on continuum regression is only for linear regression. In many applications, nonlinear regression is necessary. The extension of continuum regression from linear models to nonlinear models using kernel learning is considered. The proposed kernel continuum regression technique is quite general and can handle very flexible regression model estimation. An efficient algorithm is developed for fast implementation. Numerical examples have demonstrated the usefulness of the proposed technique. PMID:24058224

  19. Predictive Classification Trees

    NASA Astrophysics Data System (ADS)

    Dlugosz, Stephan; Müller-Funk, Ulrich

    CART (Breiman et al., Classification and Regression Trees, Chapman and Hall, New York, 1984) and (exhaustive) CHAID (Kass, Appl Stat 29:119-127, 1980) figure prominently among the procedures actually used in data-based management, etc. CART is a well-established procedure that produces binary trees. CHAID, in contrast, admits multiple splittings, a feature that allows the splitting variable to be exploited more extensively. On the other hand, that procedure depends on premises that are questionable in practical applications. This can be put down to the fact that CHAID relies on simultaneous chi-square or F-tests. The null distribution of the second test statistic, for instance, relies on the normality assumption, which is not plausible in a data mining context. Moreover, none of these procedures - as implemented in SPSS, for instance - takes ordinal dependent variables into account. In the paper we suggest an alternative tree-algorithm that: Requires explanatory categorical variables

  20. Wrong Signs in Regression Coefficients

    NASA Technical Reports Server (NTRS)

    McGee, Holly

    1999-01-01

    When using parametric cost estimation, it is important to note the possibility of the regression coefficients having the wrong sign. A wrong sign is defined as a sign on the regression coefficient opposite to the researcher's intuition and experience. Some possible causes for the wrong sign discussed in this paper are a small range of x's, leverage points, missing variables, multicollinearity, and computational error. Additionally, techniques for determining the cause of the wrong sign are given.
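
    One of the causes listed in this record, a missing variable combined with collinear predictors, can be reproduced with a small Python sketch. The data are synthetic and invented for illustration (they are not from the paper): `y` is built with a true coefficient of -1 on `x2`, yet regressing `y` on `x2` alone gives a positive slope because `x2` stands in for the omitted, nearly collinear `x1`.

```python
def centered(values):
    """Subtract the mean from every element."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    """Sum of element-wise products."""
    return sum(p * q for p, q in zip(a, b))


# Synthetic data: x2 is nearly collinear with x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]
y = [2.0 * a - 1.0 * b for a, b in zip(x1, x2)]  # true coefficients: +2 and -1

d1, d2, dy = centered(x1), centered(x2), centered(y)
s11, s22, s12 = dot(d1, d1), dot(d2, d2), dot(d1, d2)
s1y, s2y = dot(d1, dy), dot(d2, dy)

# Simple regression of y on x2 with x1 omitted: the slope has the wrong sign.
slope_x2_alone = s2y / s22

# Two-predictor least squares via the centered 2x2 normal equations.
det = s11 * s22 - s12 * s12
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det

print(slope_x2_alone, b1, b2)
```

    Including the omitted predictor restores the intended signs. The small determinant `det` also hints at why collinear predictors make coefficient estimates fragile when the data are noisy.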

  1. Assessing visual green effects of individual urban trees using airborne Lidar data.

    PubMed

    Chen, Ziyue; Xu, Bing; Gao, Bingbo

    2015-12-01

    Urban trees benefit people's daily life in terms of air quality, local climate, recreation and aesthetics. Among these functions, a growing number of studies have been conducted to understand the relationship between residents' preference towards local environments and visual green effects of urban greenery. However, except for on-site photography, there are few quantitative methods to calculate green visibility, especially tree green visibility, from viewers' perspectives. To fill this research gap, a case study was conducted in the city of Cambridge, which has a diversity of tree species, sizes and shapes. Firstly, a photograph-based survey was conducted to approximate the actual value of visual green effects of individual urban trees. In addition, small footprint airborne Lidar (Light detection and ranging) data was employed to measure the size and shape of individual trees. Next, correlations between visual tree green effects and tree structural parameters were examined. Through experiments and gradual refinement, a regression model with satisfactory R2 and limited large errors is proposed. Considering the diversity of sample trees and the result of cross-validation, this model has the potential to be applied to other study sites. This research provides urban planners and decision makers with an innovative method to analyse and evaluate landscape patterns in terms of tree greenness. PMID:26218562

  2. Abstract Expression Grammar Symbolic Regression

    NASA Astrophysics Data System (ADS)

    Korns, Michael F.

    This chapter examines the use of Abstract Expression Grammars to perform the entire Symbolic Regression process without the use of Genetic Programming per se. The techniques explored produce a symbolic regression engine which has absolutely no bloat, which allows total user control of the search space and output formulas, and which is faster and more accurate than the engines produced in our previous papers using Genetic Programming. The genome is an all vector structure with four chromosomes plus additional epigenetic and constraint vectors, allowing total user control of the search space and the final output formulas. A combination of specialized compiler techniques, genetic algorithms, particle swarm, aged layered populations, plus discrete and continuous differential evolution are used to produce an improved symbolic regression system. Nine base test cases, from the literature, are used to test the improvement in speed and accuracy. The improved results indicate that these techniques move us a big step closer toward future industrial strength symbolic regression systems.

  3. Time-Warped Geodesic Regression

    PubMed Central

    Hong, Yi; Singh, Nikhil; Kwitt, Roland; Niethammer, Marc

    2016-01-01

    We consider geodesic regression with parametric time-warps. This makes it possible, for example, to capture saturation effects as typically observed during brain development or degeneration. While highly flexible models to analyze time-varying image and shape data based on generalizations of splines and polynomials have been proposed recently, they come at the cost of substantially more complex inference. Our focus in this paper is therefore to keep the model and its inference as simple as possible while still capturing expected biological variation. We demonstrate that by augmenting geodesic regression with parametric time-warp functions, we can achieve comparable flexibility to more complex models while retaining model simplicity. In addition, the time-warp parameters provide useful information on underlying anatomical changes as demonstrated for the analysis of corpora callosa and rat calvariae. We exemplify our strategy for shape regression on the Grassmann manifold, but note that the method is generally applicable for time-warped geodesic regression. PMID:25485368

  4. Talking Trees

    ERIC Educational Resources Information Center

    Tolman, Marvin

    2005-01-01

    Students love outdoor activities and will love them even more when they build confidence in their tree identification and measurement skills. Through these activities, students will learn to identify the major characteristics of trees and discover how the pace--a nonstandard measuring unit--can be used to estimate not only distances but also the…

  5. Tree Amigos.

    ERIC Educational Resources Information Center

    Center for Environmental Study, Grand Rapids, MI.

    Tree Amigos is a special cross-cultural program that uses trees as a common bond to bring the people of the Americas together in unique partnerships to preserve and protect the shared global environment. It is a tangible program that embodies the philosophy that individuals, acting together, can make a difference. This resource book contains…

  6. Phylogenetic trees in bioinformatics

    SciTech Connect

    Burr, Tom L

    2008-01-01

    Genetic data is often used to infer evolutionary relationships among a collection of viruses, bacteria, animal or plant species, or other operational taxonomic units (OTU). A phylogenetic tree depicts such relationships and provides a visual representation of the estimated branching order of the OTUs. Tree estimation is unique for several reasons, including: the types of data used to represent each OTU; the use of probabilistic nucleotide substitution models; the inference goals involving both tree topology and branch length, and the huge number of possible trees for a given sample of a very modest number of OTUs, which implies that finding the best tree(s) to describe the genetic data for each OTU is computationally demanding. Bioinformatics is too large a field to review here. We focus on that aspect of bioinformatics that includes study of similarities in genetic data from multiple OTUs. Although research questions are diverse, a common underlying challenge is to estimate the evolutionary history of the OTUs. Therefore, this paper reviews the role of phylogenetic tree estimation in bioinformatics, available methods and software, and identifies areas for additional research and development.

  7. Bayesian Evidence Framework for Decision Tree Learning

    NASA Astrophysics Data System (ADS)

    Chatpatanasiri, Ratthachat; Kijsirikul, Boonserm

    2005-11-01

    This work is primarily concerned with the problem of selecting a single decision (or classification) tree given the observed data. Although a single decision tree has a high risk of overfitting, the induced tree is easily interpreted. Researchers have invented various methods such as tree pruning or tree averaging for preventing the induced tree from overfitting (and from underfitting) the data. In this paper, instead of using those conventional approaches, we apply the Bayesian evidence framework of Gull, Skilling and Mackay to a process of selecting a decision tree. We derive a formal function to measure 'the fitness' for each decision tree given a set of observed data. Our method, in fact, is analogous to a well-known Bayesian model selection method for interpolating noisy continuous-value data. As in regression problems, given reasonable assumptions, this derived score function automatically quantifies the principle of Ockham's razor, and hence reasonably deals with the issue of underfitting-overfitting tradeoff.

  8. Multiple linear regression.

    PubMed

    Eberly, Lynn E

    2007-01-01

    This chapter describes multiple linear regression, a statistical approach used to describe the simultaneous associations of several variables with one continuous outcome. Important steps in using this approach include estimation and inference, variable selection in model building, and assessing model fit. The special cases of regression with interactions among the variables, polynomial regression, regressions with categorical (grouping) variables, and separate slopes models are also covered. Examples in microbiology are used throughout. PMID:18450050

  9. NCCS Regression Test Harness

    2015-09-09

    The NCCS Regression Test Harness is a software package that provides a framework to perform regression and acceptance testing on NCCS High Performance Computers. The package is written in Python and has only the dependency of a Subversion repository to store the regression tests.

  10. Orthogonal Regression and Equivariance.

    ERIC Educational Resources Information Center

    Blankmeyer, Eric

    Ordinary least-squares regression treats the variables asymmetrically, designating a dependent variable and one or more independent variables. When it is not obvious how to make this distinction, a researcher may prefer to use orthogonal regression, which treats the variables symmetrically. However, the usual procedure for orthogonal regression is…
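
    The contrast drawn in this record between the asymmetric treatment of variables in ordinary least squares and the symmetric treatment in orthogonal regression can be checked numerically. The sketch below uses made-up data and the standard closed-form slope for two-variable orthogonal (total least squares) regression; it is not the procedure proposed in the paper, whose details are elided in the abstract.

```python
import math


def centered_sums(x, y):
    """Return Sxx, Syy, Sxy: sums of squares and cross-products about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy


def ols_slope(x, y):
    """Slope of the least-squares regression of y on x (vertical errors only)."""
    sxx, _, sxy = centered_sums(x, y)
    return sxy / sxx


def orthogonal_slope(x, y):
    """Slope of the line minimizing perpendicular distances (assumes Sxy != 0)."""
    sxx, syy, sxy = centered_sums(x, y)
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)


# Made-up, slightly noisy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]

# Swapping the roles of x and y inverts the orthogonal slope exactly...
print(orthogonal_slope(x, y) * orthogonal_slope(y, x))
# ...whereas the two OLS slopes multiply to r squared, which is below 1 here.
print(ols_slope(x, y) * ols_slope(y, x))
```

    The first product equals 1 up to rounding, so both orderings describe the same fitted line. The second product falls short of 1, which shows that the two OLS fits are different lines.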

  11. Unitary Response Regression Models

    ERIC Educational Resources Information Center

    Lipovetsky, S.

    2007-01-01

    The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…

  12. Rate of tree carbon accumulation increases continuously with tree size.

    PubMed

    Stephenson, N L; Das, A J; Condit, R; Russo, S E; Baker, P J; Beckman, N G; Coomes, D A; Lines, E R; Morris, W K; Rüger, N; Alvarez, E; Blundo, C; Bunyavejchewin, S; Chuyong, G; Davies, S J; Duque, A; Ewango, C N; Flores, O; Franklin, J F; Grau, H R; Hao, Z; Harmon, M E; Hubbell, S P; Kenfack, D; Lin, Y; Makana, J-R; Malizia, A; Malizia, L R; Pabst, R J; Pongpattananurak, N; Su, S-H; Sun, I-F; Tan, S; Thomas, D; van Mantgem, P J; Wang, X; Wiser, S K; Zavala, M A

    2014-03-01

    Forests are major components of the global carbon cycle, providing substantial feedback to atmospheric greenhouse gas concentrations. Our ability to understand and predict changes in the forest carbon cycle--particularly net primary productivity and carbon storage--increasingly relies on models that represent biological processes across several scales of biological organization, from tree leaves to forest stands. Yet, despite advances in our understanding of productivity at the scales of leaves and stands, no consensus exists about the nature of productivity at the scale of the individual tree, in part because we lack a broad empirical assessment of whether rates of absolute tree mass growth (and thus carbon accumulation) decrease, remain constant, or increase as trees increase in size and age. Here we present a global analysis of 403 tropical and temperate tree species, showing that for most species mass growth rate increases continuously with tree size. Thus, large, old trees do not act simply as senescent carbon reservoirs but actively fix large amounts of carbon compared to smaller trees; at the extreme, a single big tree can add the same amount of carbon to the forest within a year as is contained in an entire mid-sized tree. The apparent paradoxes of individual tree growth increasing with tree size despite declining leaf-level and stand-level productivity can be explained, respectively, by increases in a tree's total leaf area that outpace declines in productivity per unit of leaf area and, among other factors, age-related reductions in population density. Our results resolve conflicting assumptions about the nature of tree growth, inform efforts to understand and model forest carbon dynamics, and have additional implications for theories of resource allocation and plant senescence. PMID:24429523

  13. Rate of tree carbon accumulation increases continuously with tree size

    NASA Astrophysics Data System (ADS)

    Stephenson, N. L.; Das, A. J.; Condit, R.; Russo, S. E.; Baker, P. J.; Beckman, N. G.; Coomes, D. A.; Lines, E. R.; Morris, W. K.; Rüger, N.; Álvarez, E.; Blundo, C.; Bunyavejchewin, S.; Chuyong, G.; Davies, S. J.; Duque, Á.; Ewango, C. N.; Flores, O.; Franklin, J. F.; Grau, H. R.; Hao, Z.; Harmon, M. E.; Hubbell, S. P.; Kenfack, D.; Lin, Y.; Makana, J.-R.; Malizia, A.; Malizia, L. R.; Pabst, R. J.; Pongpattananurak, N.; Su, S.-H.; Sun, I.-F.; Tan, S.; Thomas, D.; van Mantgem, P. J.; Wang, X.; Wiser, S. K.; Zavala, M. A.

    2014-03-01

    Forests are major components of the global carbon cycle, providing substantial feedback to atmospheric greenhouse gas concentrations. Our ability to understand and predict changes in the forest carbon cycle--particularly net primary productivity and carbon storage--increasingly relies on models that represent biological processes across several scales of biological organization, from tree leaves to forest stands. Yet, despite advances in our understanding of productivity at the scales of leaves and stands, no consensus exists about the nature of productivity at the scale of the individual tree, in part because we lack a broad empirical assessment of whether rates of absolute tree mass growth (and thus carbon accumulation) decrease, remain constant, or increase as trees increase in size and age. Here we present a global analysis of 403 tropical and temperate tree species, showing that for most species mass growth rate increases continuously with tree size. Thus, large, old trees do not act simply as senescent carbon reservoirs but actively fix large amounts of carbon compared to smaller trees; at the extreme, a single big tree can add the same amount of carbon to the forest within a year as is contained in an entire mid-sized tree. The apparent paradoxes of individual tree growth increasing with tree size despite declining leaf-level and stand-level productivity can be explained, respectively, by increases in a tree's total leaf area that outpace declines in productivity per unit of leaf area and, among other factors, age-related reductions in population density. Our results resolve conflicting assumptions about the nature of tree growth, inform efforts to understand and model forest carbon dynamics, and have additional implications for theories of resource allocation and plant senescence.

  14. Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling

    NASA Astrophysics Data System (ADS)

    Galelli, S.; Castelletti, A.

    2013-07-01

    Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modelling. In this paper, we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modelling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalisation property and tendency to overfit of traditional standalone decision trees (e.g. CART); (ii) is computationally efficient; and (iii) allows the relative importance of the input variables to be inferred, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analysed on two real-world case studies - Marina catchment (Singapore) and Canning River (Western Australia) - representing two different morphoclimatic contexts. The evaluation is performed against other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparably to the best of the benchmarks (i.e. M5) in both watersheds, while outperforming the other approaches in terms of computational requirement when applied to large datasets. In addition, the ranking of the input variables that the method provides can be given a physically meaningful interpretation.
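
    The abstract does not spell out how an extremely randomized split is chosen. As a rough sketch only, assuming the commonly described rule (draw one cut-point uniformly within the observed range of each randomly selected input, then keep the candidate with the largest variance reduction), a single node split might look like the following. The function names, the `n_candidates` parameter and the toy data are hypothetical and are not taken from the paper.

```python
import random


def variance(values):
    """Population variance of a list; 0.0 for an empty list."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def random_split(rows, targets, n_candidates, rng):
    """Pick a split for one node in the Extra-Trees style (illustrative).

    rows: list of feature lists; targets: list of numeric responses.
    Returns (variance_reduction, feature_index, cut_point), or None when
    every sampled feature is constant in this node.
    """
    n_features = len(rows[0])
    parent_var = variance(targets)
    best = None
    for j in rng.sample(range(n_features), n_candidates):
        column = [r[j] for r in rows]
        lo, hi = min(column), max(column)
        if lo == hi:
            continue  # a constant feature cannot be split
        cut = rng.uniform(lo, hi)  # the cut-point is drawn, not optimized
        left = [y for x, y in zip(column, targets) if x < cut]
        right = [y for x, y in zip(column, targets) if x >= cut]
        weighted = (len(left) * variance(left)
                    + len(right) * variance(right)) / len(targets)
        score = parent_var - weighted
        if best is None or score > best[0]:
            best = (score, j, cut)
    return best


# Toy usage with made-up data: one input, a step-shaped response.
rows = [[0.0], [1.0], [2.0], [3.0]]
targets = [0.0, 0.0, 10.0, 10.0]
result = random_split(rows, targets, n_candidates=1, rng=random.Random(0))
```

    With `n_candidates=1` the choice of input is fully random, which is one reading of "totally randomized"; larger values trade some randomness for better individual splits.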

  15. Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling

    NASA Astrophysics Data System (ADS)

    Galelli, S.; Castelletti, A.

    2013-02-01

    Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modeling. In this paper we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modeling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalization property and tendency to overfit of traditional standalone decision trees (e.g. CART); (ii) is computationally very efficient; and (iii) allows the relative importance of the input variables to be inferred, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analyzed on two real-world case studies (Marina catchment (Singapore) and Canning River (Western Australia)) representing two different morphoclimatic contexts, in comparison with other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparably to the best of the benchmarks (i.e. M5) in both watersheds, while outperforming the other approaches in terms of computational requirement when applied to large datasets. In addition, the ranking of the input variables that the method provides can be given a physically meaningful interpretation.

  16. Geometric classification of double-ringed galaxies using the CART (Classification And Regression Trees) method

    NASA Astrophysics Data System (ADS)

    Ormeño, M. I.; Faúndez-Abans, M.; Cavada, G.

    2003-08-01

    The importance of this work lies in the selection of objects not yet treated specifically as a family and in the use of a robust statistical procedure that requires no assumptions or boundary conditions. It thus contributes to a better understanding of the picture of the Ringed Galaxies of the Hubble diagram through classification and the study of subclasses. One hundred galaxies with two rings were selected from the Catalog of Southern Ringed Galaxies compiled by Ronald Buta, so as to build a sample that is complete in terms of knowledge of the semi-axes of the inner and outer rings projected on the plane of the sky. With a view to a possible classification of these normal ringed galaxies into families according to the geometric characteristics of the rings, Cluster Analysis (a classification tool: similarity measurements in a two-dimensional space) was first used to explore the possible existence of families. The variables analyzed were: the inner minor d(I) and major D(I) diameters, the outer minor d(E) and major D(E) diameters, and the inclination angles of the inner q(I) and outer q(E) semi-major axes of the rings. Classification Trees were constructed as the discrimination methodology. Classification trees are a discrimination method alternative to classical models such as Discriminant Analysis and Logistic Regression, in which a database is divided into partitions (subgroups) of the tree by the action of a predictor (a specific variable). The statistical packages used to process the data were SAS version 8.0 (Statistical Analysis System) and CART version 3.6.3. This statistical analysis suggests the existence of three possible families of double-ringed galaxies, based solely on the geometry of the rings. As an initial exploration of this result, the construction of a BT (total magnitude) versus the

  17. Decision tree modeling using R

    PubMed Central

    2016-01-01

    In the machine learning field, the decision tree learner is powerful and easy to interpret. It employs a recursive binary partitioning algorithm that splits the sample on the partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on the conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. Because a single tree is sensitive to small changes in the training data, the random forests procedure is introduced to address this problem. The sources of diversity for random forests come from the random sampling and the restricted set of input variables to be selected. Finally, I introduce R functions to perform model-based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building. PMID:27570769

  18. Decision tree modeling using R.

    PubMed

    Zhang, Zhongheng

    2016-08-01

    In the machine learning field, the decision tree learner is powerful and easy to interpret. It employs a recursive binary partitioning algorithm that splits the sample on the partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on the conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. Because a single tree is sensitive to small changes in the training data, the random forests procedure is introduced to address this problem. The sources of diversity for random forests come from the random sampling and the restricted set of input variables to be selected. Finally, I introduce R functions to perform model-based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building. PMID:27570769

  19. Rate of tree carbon accumulation increases continuously with tree size

    USGS Publications Warehouse

    Stephenson, N.L.; Das, A.J.; Condit, R.; Russo, S.E.; Baker, P.J.; Beckman, N.G.; Coomes, D.A.; Lines, E.R.; Morris, W.K.; Rüger, N.; Álvarez, E.; Blundo, C.; Bunyavejchewin, S.; Chuyong, G.; Davies, S.J.; Duque, Á.; Ewango, C.N.; Flores, O.; Franklin, J.F.; Grau, H.R.; Hao, Z.; Harmon, M.E.; Hubbell, S.P.; Kenfack, D.; Lin, Y.; Makana, J.-R.; Malizia, A.; Malizia, L.R.; Pabst, R.J.; Pongpattananurak, N.; Su, S.-H.; Sun, I-F.; Tan, S.; Thomas, D.; van Mantgem, P.J.; Wang, X.; Wiser, S.K.; Zavala, M.A.

    2014-01-01

    Forests are major components of the global carbon cycle, providing substantial feedback to atmospheric greenhouse gas concentrations. Our ability to understand and predict changes in the forest carbon cycle—particularly net primary productivity and carbon storage—increasingly relies on models that represent biological processes across several scales of biological organization, from tree leaves to forest stands. Yet, despite advances in our understanding of productivity at the scales of leaves and stands, no consensus exists about the nature of productivity at the scale of the individual tree, in part because we lack a broad empirical assessment of whether rates of absolute tree mass growth (and thus carbon accumulation) decrease, remain constant, or increase as trees increase in size and age. Here we present a global analysis of 403 tropical and temperate tree species, showing that for most species mass growth rate increases continuously with tree size. Thus, large, old trees do not act simply as senescent carbon reservoirs but actively fix large amounts of carbon compared to smaller trees; at the extreme, a single big tree can add the same amount of carbon to the forest within a year as is contained in an entire mid-sized tree. The apparent paradoxes of individual tree growth increasing with tree size despite declining leaf-level and stand-level productivity can be explained, respectively, by increases in a tree’s total leaf area that outpace declines in productivity per unit of leaf area and, among other factors, age-related reductions in population density. Our results resolve conflicting assumptions about the nature of tree growth, inform efforts to understand and model forest carbon dynamics, and have additional implications for theories of resource allocation and plant senescence.

  20. The fault-tree compiler

    NASA Technical Reports Server (NTRS)

    Martensen, Anna L.; Butler, Ricky W.

    1987-01-01

    The Fault Tree Compiler Program is a new reliability tool used to predict the top event probability for a fault tree. Five different gate types are allowed in the fault tree: AND, OR, EXCLUSIVE OR, INVERT, and M OF N gates. The high level input language is easy to understand and use when describing the system tree. In addition, the use of the hierarchical fault tree capability can simplify the tree description and decrease program execution time. The current solution technique provides an answer precise (within the limits of double precision floating point arithmetic) to five digits. The user may vary one failure rate or failure probability over a range of values and plot the results for sensitivity analyses. The solution technique is implemented in FORTRAN; the remaining program code is implemented in Pascal. The program is written to run on a Digital Corporation VAX with the VMS operating system.
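
    The abstract names the five gate types but not the solution technique. As an illustration only, the sketch below computes gate output probabilities under the simplifying assumption that all inputs to a gate are statistically independent (no basic event is shared between branches). It is not the program's FORTRAN algorithm; the function names are hypothetical, and reading M OF N as "at least m of n" is also an assumption.

```python
from itertools import combinations
from math import prod


def p_and(ps):
    """Probability that all independent input events occur."""
    return prod(ps)


def p_or(ps):
    """Probability that at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in ps)


def p_xor(a, b):
    """Probability that exactly one of two independent events occurs."""
    return a * (1.0 - b) + b * (1.0 - a)


def p_invert(p):
    """Probability that the input event does not occur."""
    return 1.0 - p


def p_m_of_n(m, ps):
    """Probability that at least m of the n independent events occur."""
    n = len(ps)
    total = 0.0
    for k in range(m, n + 1):
        # Sum over every way of choosing exactly k occurring events.
        for chosen in combinations(range(n), k):
            total += prod(ps[i] if i in chosen else 1.0 - ps[i]
                          for i in range(n))
    return total


# Toy tree with made-up probabilities: TOP = OR(AND(e1, e2), e3).
top = p_or([p_and([0.1, 0.2]), 0.05])
```

    Enumerating combinations keeps the M OF N gate easy to check by hand but grows exponentially with the number of inputs, so it is suitable only for small gates.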

  1. Prediction in Multiple Regression.

    ERIC Educational Resources Information Center

    Osborne, Jason W.

    2000-01-01

    Presents the concept of prediction via multiple regression (MR) and discusses the assumptions underlying multiple regression analyses. Also discusses shrinkage, cross-validation, and double cross-validation of prediction equations and describes how to calculate confidence intervals around individual predictions. (SLD)

  2. Improved Regression Calibration

    ERIC Educational Resources Information Center

    Skrondal, Anders; Kuha, Jouni

    2012-01-01

    The likelihood for generalized linear models with covariate measurement error cannot in general be expressed in closed form, which makes maximum likelihood estimation taxing. A popular alternative is regression calibration which is computationally efficient at the cost of inconsistent estimation. We propose an improved regression calibration…

  3. Greenhouse trees

    SciTech Connect

    Hanover, J.W.; Hart, J.W.

    1980-05-09

    Michigan State University has been conducting research on growth control of woody plants with emphasis on commercial plantations. The objective was to develop the optimum levels for the major factors that affect tree seedling growth and development so that high quality plants can be produced for a specific use. This article describes the accelerated-optimal-growth (AOG) concept, describes precautions to take in its application, and shows ways to maximize the potential of AOG for producing ornamental trees. Factors considered were container growing system; protective culture including light, temperature, mineral nutrients, water, carbon dioxide, growth regulators, mycorrhizae, growing media, competition, and pests; size of seedlings; and acclimation. 1 table. (DP)

  4. Audubon Tree Study Program.

    ERIC Educational Resources Information Center

    National Audubon Society, New York, NY.

    Included are an illustrated student reader, "The Story of Trees," a leaders' guide, and a large tree chart with 37 colored pictures. The student reader reviews several aspects of trees: a definition of a tree; where and how trees grow; flowers, pollination and seed production; how trees make their food; how to recognize trees; seasonal changes;…

  5. Visualizing phylogenetic trees using TreeView.

    PubMed

    Page, Roderic D M

    2002-08-01

    TreeView provides a simple way to view the phylogenetic trees produced by a range of programs, such as PAUP*, PHYLIP, TREE-PUZZLE, and ClustalX. While some phylogenetic programs (such as the Macintosh version of PAUP*) have excellent tree printing facilities, many programs do not have the ability to generate publication quality trees. TreeView addresses this need. The program can read and write a range of tree file formats, display trees in a variety of styles, print trees, and save the tree as a graphic file. Protocols in this unit cover both displaying and printing a tree. Support protocols describe how to download and install TreeView, and how to display bootstrap values in trees generated by ClustalX and PAUP*. PMID:18792942

  6. Impact of gene family evolutionary histories on phylogenetic species tree inference by gene tree parsimony.

    PubMed

    Shi, Tao

    2016-03-01

    A complicated history of gene duplication and loss poses a challenge to molecular phylogenetic inference, especially in deep phylogenies. However, phylogenomic approaches such as gene tree parsimony (GTP) have an advantage over some other approaches in their ability to use gene families with duplications. GTP searches for the 'optimal' species tree by minimizing the total cost of biological events such as duplications, but the accuracy of GTP and the phylogenetic signal in the context of different gene families with distinct histories of duplication and loss are unclear. To evaluate how the evolutionary properties of different gene families affect species tree inference, 3900 gene families from seven angiosperms, encompassing a wide range of gene content and lineage-specific expansions and contractions, were analyzed. It was found that the gene content and total duplication number in a gene family strongly influence species tree inference accuracy, with the highest accuracy achieved at either very low or very high gene content (or duplication number) and the lowest accuracy centered on intermediate gene content (or duplication number), a relationship that can be fit by a binomial regression. In addition, among gene families with a similar level of average gene content, those with relatively higher lineage-specific expansion or duplication rates tend to show lower accuracy. Additional correlation tests support the view that high accuracy for gene families with large gene content may rely on abundant ancestral copies providing many subtrees to resolve conflicts, whereas high accuracy for single- or low-copy gene families depends only on sequence substitution per se. The very low accuracy reached by gene families of intermediate gene content or duplication number may be due to insufficient subtrees to resolve the conflicts arising from the loss of alternative copies. As these evolutionary properties can significantly influence species tree accuracy, I discussed the potential weighting of the duplication cost by

  7. On Tree-Based Phylogenetic Networks.

    PubMed

    Zhang, Louxin

    2016-07-01

    A large class of phylogenetic networks can be obtained from trees by the addition of horizontal edges between the tree edges. These networks are called tree-based networks. We present a simple necessary and sufficient condition for tree-based networks and prove that a universal tree-based network exists for any number of taxa that contains as its base every phylogenetic tree on the same set of taxa. This answers two problems posed recently by Francis and Steel. A byproduct is a computer program for generating random binary phylogenetic networks under the uniform distribution model. PMID:27228397

  8. Factors Governing Stemflow Production from Plantation Grown Teak Trees in Thailand

    NASA Astrophysics Data System (ADS)

    Tanaka, N.; Levia, D. F., Jr.; Igarashi, Y.; Yoshifuji, N.; Tanaka, K.; Chatchai, T.; Nanko, K.; Suzuki, M.; Kumagai, T.

    2015-12-01

    Stemflow (SF) is recognized as an important process delivering water, solute, and particulate fluxes to spatially localized areas of the forest floor. Using both long-term SF data from nine even-aged deciduous teak trees grown in the same plantation and meteorological data from a nearby tower, this study seeks to better understand how: (1) specific biotic and abiotic factors control stand-scale SF production of teak; and (2) various biotic and abiotic factors affect tree-to-tree variations in teak SF production. A conventional regression analysis of SF volume against rainfall indicates that, for five of the nine individuals, SF was produced more efficiently in the leafless state than in the leafed state. For the other individuals, however, there was no such relation, suggesting tree-to-tree variation in the response of SF to canopy status. A boosted regression tree (BRT) analysis, with the daily SF funneling ratios (SFF) of the nine trees as dependent variables, indicates that SFF was intricately controlled by a variety of biotic and abiotic factors. The top six influential factors were, in descending order, rainfall duration, tree height, rainfall intensity, air temperature, wind speed, and antecedent dry period length, having positive, negative, positive, negative, positive, and negative influence on SFF, respectively. Although teak exhibits drastic intra-annual changes in leaf phenology, leaf area index (LAI) had an unexpectedly small influence on SFF on a stand scale. Additional BRT analyses focusing on the individuals with the maximum and the minimum SFF values (among the nine individuals) showed that there was considerable tree-to-tree variation in the set of influential variables for SFF, even though the trees were planted in the same year and grown in the same plot. In addition to this difference, the BRT analyses also showed that the response of SFF to LAI differs between the two individuals. The differing responses to LAI among individuals may be the

  9. George: Gaussian Process regression

    NASA Astrophysics Data System (ADS)

    Foreman-Mackey, Daniel

    2015-11-01

    George is a fast and flexible library, implemented in C++ with Python bindings, for Gaussian Process regression useful for accounting for correlated noise in astronomical datasets, including those for transiting exoplanet discovery and characterization and stellar population modeling.

  10. Multivariate Regression with Calibration*

    PubMed Central

    Liu, Han; Wang, Lie; Zhao, Tuo

    2014-01-01

    We propose a new method named calibrated multivariate regression (CMR) for fitting high dimensional multivariate regression models. Compared to existing methods, CMR calibrates the regularization for each regression task with respect to its noise level so that it is simultaneously tuning insensitive and achieves an improved finite-sample performance. Computationally, we develop an efficient smoothed proximal gradient algorithm which has a worst-case iteration complexity O(1/ε), where ε is a pre-specified numerical accuracy. Theoretically, we prove that CMR achieves the optimal rate of convergence in parameter estimation. We illustrate the usefulness of CMR by thorough numerical simulations and show that CMR consistently outperforms other high dimensional multivariate regression methods. We also apply CMR on a brain activity prediction problem and find that CMR is as competitive as the handcrafted model created by human experts. PMID:25620861

  11. Image segmentation via piecewise constant regression

    NASA Astrophysics Data System (ADS)

    Acton, Scott T.; Bovik, Alan C.

    1994-09-01

    We introduce a novel unsupervised image segmentation technique that is based on piecewise constant (PICO) regression. Given an input image, a PICO output image for a specified feature size (scale) is computed via nonlinear regression. The regression effectively provides the constant region segmentation of the input image that has a minimum deviation from the input image. PICO regression-based segmentation avoids the problems of region merging, poor localization, region boundary ambiguity, and region fragmentation. Additionally, our segmentation method is particularly well-suited for corrupted (noisy) input data. An application to segmentation and classification of remotely sensed imagery is provided.

  12. Regression based modeling of vegetation and climate variables for the Amazon rainforests

    NASA Astrophysics Data System (ADS)

    Kodali, A.; Khandelwal, A.; Ganguly, S.; Bongard, J.; Das, K.

    2015-12-01

    Both short-term (weather) and long-term (climate) variations in the atmosphere directly impact various ecosystems on earth. Forest ecosystems, especially tropical forests, are crucial as they are the largest terrestrial carbon reserves. For example, the Amazon forests are a critical component of the global carbon cycle, storing about 100 billion tons of carbon in their woody biomass. There is a growing concern that these forests could succumb to precipitation reduction in a progressively warming climate, leading to the release of a significant amount of carbon into the atmosphere. Therefore, there is a need to accurately quantify the dependence of vegetation growth on different climate variables and obtain better estimates of drought-induced changes to atmospheric CO2. The availability of globally consistent climate and earth observation datasets has allowed global scale monitoring of various climate and vegetation variables such as precipitation, radiation, surface greenness, etc. Using these diverse datasets, we aim to quantify the magnitude and extent of ecosystem exposure, sensitivity and resilience to droughts in forests. The Amazon rainforests have undergone severe droughts twice in the last decade (2005 and 2010), which makes them an ideal candidate for the regional scale analysis. Current studies on vegetation and climate relationships have mostly explored linear dependence due to computational and domain knowledge constraints. We explore a modeling technique called symbolic regression, based on evolutionary computation, that allows discovery of the dependency structure without any prior assumptions. In symbolic regression the population of possible solutions is defined via tree structures. Each tree represents a mathematical expression that includes pre-defined functions (mathematical operators) and terminal sets (independent variables from data). Selection of these sets is critical to computational efficiency and model accuracy. In this work we investigate

  13. Cascades of Regression Tree Fields for Image Restoration.

    PubMed

    Schmidt, Uwe; Jancsary, Jeremy; Nowozin, Sebastian; Roth, Stefan; Rother, Carsten

    2016-04-01

    Conditional random fields (CRFs) are popular discriminative models for computer vision and have been successfully applied in the domain of image restoration, especially to image denoising. For image deblurring, however, discriminative approaches have been mostly lacking. We posit two reasons for this: First, the blur kernel is often only known at test time, requiring any discriminative approach to cope with considerable variability. Second, given this variability it is quite difficult to construct suitable features for discriminative prediction. To address these challenges we first show a connection between common half-quadratic inference for generative image priors and Gaussian CRFs. Based on this analysis, we then propose a cascade model for image restoration that consists of a Gaussian CRF at each stage. Each stage of our cascade is semi-parametric, i.e., it depends on the instance-specific parameters of the restoration problem, such as the blur kernel. We train our model by loss minimization with synthetically generated training data. Our experiments show that when applied to non-blind image deblurring, the proposed approach is efficient and yields state-of-the-art restoration quality on images corrupted with synthetic and real blur. Moreover, we demonstrate its suitability for image denoising, where we achieve competitive results for grayscale and color images. PMID:26959673

  14. Current and Potential Tree Locations in Tree Line Ecotone of Changbai Mountains, Northeast China: The Controlling Effects of Topography

    PubMed Central

    Zong, Shengwei; Wu, Zhengfang; Xu, Jiawei; Li, Ming; Gao, Xiaofeng; He, Hongshi; Du, Haibo; Wang, Lei

    2014-01-01

    The tree line ecotone in the Changbai Mountains has undergone large changes in the past decades. Tree locations vary across the four sides of the mountains, especially on the northern and western sides, and this variation has not been fully explained. Previous studies attributed such variations to variations in temperature. In this study, however, we hypothesized that topographic controls were responsible for the variations in tree locations in the tree line ecotone of the Changbai Mountains. To test the hypothesis, we used IKONOS images and a WorldView-1 image to identify the tree locations and developed a logistic regression model using topographical variables to identify the dominant controls of the tree locations. The results showed that aspect, wetness, and slope were dominant controls for tree locations on the western side of the mountains, whereas altitude, SPI, and aspect were the dominant factors on the northern side. The uppermost altitude a tree can currently reach was 2140 m asl on the northern side and 2060 m asl on the western side. The model predictions showed that habitats above the current tree line on both sides were available for trees. Tree recruitments under the current tree line may take advantage of the available habitats at higher elevations based on the current tree location. Our research confirmed the controlling effects of topography on the tree locations in the tree line ecotone of the Changbai Mountains and suggested that it was essential to assess the tree response to topography in the research of the tree line ecotone. PMID:25170918

  15. Regression versus No Regression in the Autistic Disorder: Developmental Trajectories

    ERIC Educational Resources Information Center

    Bernabei, P.; Cerquiglini, A.; Cortesi, F.; D' Ardia, C.

    2007-01-01

    Developmental regression is a complex phenomenon which occurs in 20-49% of the autistic population. The aim of the study was to assess possible differences in the development of regressed and non-regressed autistic preschoolers. We longitudinally studied 40 autistic children (18 regressed, 22 non-regressed) aged 2-6 years. The following developmental…

  16. Technical Tree Climbing.

    ERIC Educational Resources Information Center

    Jenkins, Peter

    Tree climbing offers a safe, inexpensive adventure sport that can be performed almost anywhere. Using standard procedures practiced in tree surgery or rock climbing, almost any tree can be climbed. Tree climbing provides challenge and adventure as well as a vigorous upper-body workout. Tree Climbers International classifies trees using a system…

  17. Modelling of filariasis in East Java with Poisson regression and generalized Poisson regression models

    NASA Astrophysics Data System (ADS)

    Darnah

    2016-04-01

    Poisson regression is used when the response variable is count data that follow the Poisson distribution. The Poisson distribution assumes equidispersion (the variance equals the mean). In practice, count data are often overdispersed or underdispersed, and Poisson regression is then inappropriate because it may underestimate the standard errors and overstate the significance of the regression parameters, giving misleading inference about them. This paper suggests the generalized Poisson regression model for handling overdispersion and underdispersion in the Poisson regression model. The Poisson regression model and the generalized Poisson regression model are applied to the number of filariasis cases in East Java. Under the Poisson regression model, the factors that influence filariasis are the percentage of families who do not practice clean and healthy living and the percentage of families who do not have a healthy house. Because the Poisson regression model exhibits overdispersion, generalized Poisson regression is used. The best generalized Poisson regression model shows that the factor influencing filariasis is the percentage of families who do not have a healthy house. The model is interpreted as follows: each additional 1 percent of families who do not have a healthy house adds 1 filariasis patient.
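
    The abstract does not say how over dispersion was diagnosed. One common check, shown here as a sketch rather than as the paper's procedure, is the Pearson dispersion statistic: the sum of squared Pearson residuals divided by the residual degrees of freedom, which should be close to 1 for a well-specified Poisson model. The function name and the toy numbers are hypothetical; the fitted means would come from whatever Poisson fit is in use.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    observed: observed counts; fitted: model means (all > 0);
    n_params: number of estimated regression coefficients.
    Values well above 1 suggest overdispersion, well below 1
    suggest underdispersion.
    """
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_square / (len(observed) - n_params)


# Made-up example: an intercept-only fit, so every fitted mean is 4.
equidispersed = pearson_dispersion([2, 4, 6], [4.0, 4.0, 4.0], n_params=1)
overdispersed = pearson_dispersion([0, 0, 12], [4.0, 4.0, 4.0], n_params=1)
```

    In the toy data both samples have the same mean, but the second has far more spread than a Poisson model with that mean would allow, which is the situation the abstract describes.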

  18. Practical Session: Logistic Regression

    NASA Astrophysics Data System (ADS)

    Clausel, M.; Grégoire, G.

    2014-12-01

    An exercise is proposed to illustrate logistic regression. It investigates the different risk factors in the occurrence of coronary heart disease. The exercise was proposed in Chapter 5 of the book by D.G. Kleinbaum and M. Klein, "Logistic Regression", Statistics for Biology and Health, Springer Science Business Media, LLC (2010) and also by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr341.pdf). This example is based on data given in the file evans.txt coming from http://www.sph.emory.edu/dkleinb/logreg3.htm#data.

  19. Tree thinning as an option to increase herbaceous yield of an encroached semi-arid savanna in South Africa

    PubMed Central

    Smit, Gert N

    2005-01-01

    Background The investigation was conducted in a savanna area covered by what was considered an undesirably dense stand of Colophospermum mopane trees, mainly because such a dense stand of trees often results in the suppression of herbaceous plants. The objectives of this study were to determine the influence of intensity of tree thinning on the dry matter yield of herbaceous plants (notably grasses) and to investigate differences in herbaceous species composition between defined subhabitats (under tree canopies, between tree canopies and where trees have been removed). Seven plots (65 × 180 m) were subjected to different intensities of tree thinning, ranging from a totally cleared plot (0 %) to plots thinned to the equivalent of 10 %, 20%, 35 %, 50% and 75 % of the leaf biomass of a control plot (100 %) with a tree density of 2711 plants ha-1. The establishment of herbaceous plants (grasses and forbs) in response to reduced competition from the woody plants was measured during three full growing seasons following the thinning treatments. Results The grass component reacted positively to the tree thinning in terms of total dry matter (DM) yield, but forbs were negatively influenced. Rainfall interacted with tree density and the differences between grass DM yields in thinned plots during years of below average rainfall were substantially higher than those of the control. At high tree densities, yields differed little between seasons of varying rainfall. The relation between grass DM yield and tree biomass was curvilinear, best described by the exponential regression equation. Subhabitat differentiation by C. mopane trees did provide some qualitative benefits, with certain desirable grass species showing a preference for the subhabitat under tree canopies. Conclusion While it can be concluded from this study that high tree densities suppress herbaceous production, the decision to clear/thin the C. mopane trees should include additional considerations. Thinning of C

  20. Extensions and applications of ensemble-of-trees methods in machine learning

    NASA Astrophysics Data System (ADS)

    Bleich, Justin

    Ensemble-of-trees algorithms have emerged to the forefront of machine learning due to their ability to generate high forecasting accuracy for a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) focus on an underlying Bayesian probability model to generate the fits. These new probability model-based approaches show much promise versus their algorithmic counterparts, but also offer substantial room for improvement. The first part of this thesis focuses on methodological advances for ensemble-of-trees techniques with an emphasis on the more recent Bayesian approaches. In particular, we focus on extensions of BART in four distinct ways. First, we develop a more robust implementation of BART for both research and application. We then develop a principled approach to variable selection for BART as well as the ability to naturally incorporate prior information on important covariates into the algorithm. Next, we propose a method for handling missing data that relies on the recursive structure of decision trees and does not require imputation. Last, we relax the assumption of homoskedasticity in the BART model to allow for parametric modeling of heteroskedasticity. The second part of this thesis returns to the classic algorithmic approaches in the context of classification problems with asymmetric costs of forecasting errors. First we consider the performance of RF and SGB more broadly and demonstrate its superiority to logistic regression for applications in criminology with asymmetric costs. Next, we use RF to forecast unplanned hospital readmissions upon patient discharge with asymmetric costs taken into account. Finally, we explore the construction of stable decision trees for forecasts of

  1. The gene tree delusion.

    PubMed

    Springer, Mark S; Gatesy, John

    2016-01-01

    Higher-level relationships among placental mammals are mostly resolved, but several polytomies remain contentious. Song et al. (2012) claimed to have resolved three of these using shortcut coalescence methods (MP-EST, STAR) and further concluded that these methods, which assume no within-locus recombination, are required to unravel deep-level phylogenetic problems that have stymied concatenation. Here, we reanalyze Song et al.'s (2012) data and leverage these re-analyses to explore key issues in systematics including the recombination ratchet, gene tree stoichiometry, the proportion of gene tree incongruence that results from deep coalescence versus other factors, and simulations that compare the performance of coalescence and concatenation methods in species tree estimation. Song et al. (2012) reported an average locus length of 3.1 kb for the 447 protein-coding genes in their phylogenomic dataset, but the true mean length of these loci (start codon to stop codon) is 139.6 kb. Empirical estimates of recombination breakpoints in primates, coupled with consideration of the recombination ratchet, suggest that individual coalescence genes (c-genes) approach ∼12 bp or less for Song et al.'s (2012) dataset, three to four orders of magnitude shorter than the c-genes reported by these authors. This result has general implications for the application of coalescence methods in species tree estimation. We contend that it is illogical to apply coalescence methods to complete protein-coding sequences. Such analyses amalgamate c-genes with different evolutionary histories (i.e., exons separated by >100,000 bp), distort true gene tree stoichiometry that is required for accurate species tree inference, and contradict the central rationale for applying coalescence methods to difficult phylogenetic problems. In addition, Song et al.'s (2012) dataset of 447 genes includes 21 loci with switched taxonomic names, eight duplicated loci, 26 loci with non-homologous sequences that are

  2. MixtureTree annotator: a program for automatic colorization and visual annotation of MixtureTree.

    PubMed

    Chen, Shu-Chuan; Ogata, Aaron

    2015-01-01

    The MixtureTree Annotator, written in JAVA, allows the user to automatically color any phylogenetic tree in Newick format generated from any phylogeny reconstruction program and output the Nexus file. By providing the ability to automatically color the tree by sequence name, the MixtureTree Annotator provides a unique advantage over any other programs which perform a similar function. In addition, the MixtureTree Annotator is the only package that can efficiently annotate the output produced by MixtureTree with mutation information and coalescent time information. In order to visualize the resulting output file, a modified version of FigTree is used. Certain popular methods, which lack good built-in visualization tools, for example, MEGA, Mesquite, PHY-FI, TreeView, treeGraph and Geneious, may give results with human errors due to either manually adding colors to each node or with other limitations, for example only using color based on a number, such as branch length, or by taxonomy. In addition to allowing the user to automatically color any given Newick tree by sequence name, the MixtureTree Annotator is the only method that allows the user to automatically annotate the resulting tree created by the MixtureTree program. The MixtureTree Annotator is fast and easy-to-use, while still allowing the user full control over the coloring and annotating process. PMID:25826378

  3. Understanding Boswellia papyrifera tree secondary metabolites through bark spectral analysis

    NASA Astrophysics Data System (ADS)

    Girma, Atkilt; Skidmore, Andrew K.; de Bie, C. A. J. M.; Bongers, Frans

    2015-07-01

    Decision makers are concerned with whether to tap or rest Boswellia papyrifera trees. Tapping for the production of frankincense is known to deplete carbon reserves from the tree, leading to production of less viable seeds, tree carbon starvation and ultimately tree mortality. Decision makers use traditional experience without considering the amount of metabolites stored in or depleted from the stem-bark of the tree. This research was designed to come up with a non-destructive B. papyrifera tree metabolite estimation technique relevant for management using spectroscopy. The concentration of biochemicals (metabolites) found in the tree bark was estimated through spectral analysis. Initially, a random sample of 33 trees was selected and the spectra of the bark were measured with an Analytical Spectral Device (ASD) spectrometer. Bark samples were air dried and ground. Then, 10 g of sample was soaked in petroleum ether to extract crude metabolites. Further chemical analysis was conducted to quantify and isolate pure metabolite compounds such as incensole acetate and boswellic acid. The crude metabolites, which relate to frankincense produce, were compared to plant properties (such as diameter and crown area) and reflectance spectra of the bark. Moreover, the extract was compared to the ASD spectra using the partial least squares regression (PLSR) technique and continuum removed spectral analysis. The continuum removed spectral analysis was performed on two wavelength regions (1275-1663 and 1836-2217) identified through PLSR, using absorption features such as band depth, area, position, asymmetry and width to characterize and find relationships with the bark extracts. The results show that tree properties such as diameter at breast height (DBH) and the crown area of untapped and healthy trees were strongly correlated to the amount of stored crude metabolites. In addition, the PLSR technique applied to the first derivative transformation of the reflectance spectrum was found to estimate the

  4. Explorations in Statistics: Regression

    ERIC Educational Resources Information Center

    Curran-Everett, Douglas

    2011-01-01

    Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This seventh installment of "Explorations in Statistics" explores regression, a technique that estimates the nature of the relationship between two things for which we may only surmise a mechanistic or predictive connection.…

  5. Modern Regression Discontinuity Analysis

    ERIC Educational Resources Information Center

    Bloom, Howard S.

    2012-01-01

    This article provides a detailed discussion of the theory and practice of modern regression discontinuity (RD) analysis for estimating the effects of interventions or treatments. Part 1 briefly chronicles the history of RD analysis and summarizes its past applications. Part 2 explains how in theory an RD analysis can identify an average effect of…

  6. CORRELATION AND REGRESSION

    EPA Science Inventory

    Webcast entitled Statistical Tools for Making Sense of Data, by the National Nutrient Criteria Support Center, N-STEPS (Nutrients-Scientific Technical Exchange Partnership). The section "Correlation and Regression" provides an overview of these two techniques in the context of nut...

  7. Multiple linear regression analysis

    NASA Technical Reports Server (NTRS)

    Edwards, T. R.

    1980-01-01

    Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.

  8. Partial covariate adjusted regression

    PubMed Central

    Şentürk, Damla; Nguyen, Danh V.

    2008-01-01

    Covariate adjusted regression (CAR) is a recently proposed adjustment method for regression analysis where both the response and predictors are not directly observed (Şentürk and Müller, 2005). The available data has been distorted by unknown functions of an observable confounding covariate. CAR provides consistent estimators for the coefficients of the regression between the variables of interest, adjusted for the confounder. We develop a broader class of partial covariate adjusted regression (PCAR) models to accommodate both distorted and undistorted (adjusted/unadjusted) predictors. The PCAR model allows for unadjusted predictors, such as age, gender and demographic variables, which are common in the analysis of biomedical and epidemiological data. The available estimation and inference procedures for CAR are shown to be invalid for the proposed PCAR model. We propose new estimators and develop new inference tools for the more general PCAR setting. In particular, we establish the asymptotic normality of the proposed estimators and propose consistent estimators of their asymptotic variances. Finite sample properties of the proposed estimators are investigated using simulation studies and the method is also illustrated with a Pima Indians diabetes data set. PMID:20126296

  9. Mechanisms of neuroblastoma regression

    PubMed Central

    Brodeur, Garrett M.; Bagatell, Rochelle

    2014-01-01

    Recent genomic and biological studies of neuroblastoma have shed light on the dramatic heterogeneity in the clinical behaviour of this disease, which spans from spontaneous regression or differentiation in some patients, to relentless disease progression in others, despite intensive multimodality therapy. This evidence also suggests several possible mechanisms to explain the phenomena of spontaneous regression in neuroblastomas, including neurotrophin deprivation, humoral or cellular immunity, loss of telomerase activity and alterations in epigenetic regulation. A better understanding of the mechanisms of spontaneous regression might help to identify optimal therapeutic approaches for patients with these tumours. Currently, the most druggable mechanism is the delayed activation of developmentally programmed cell death regulated by the tropomyosin receptor kinase A pathway. Indeed, targeted therapy aimed at inhibiting neurotrophin receptors might be used in lieu of conventional chemotherapy or radiation in infants with biologically favourable tumours that require treatment. Alternative approaches consist of breaking immune tolerance to tumour antigens or activating neurotrophin receptor pathways to induce neuronal differentiation. These approaches are likely to be most effective against biologically favourable tumours, but they might also provide insights into treatment of biologically unfavourable tumours. We describe the different mechanisms of spontaneous neuroblastoma regression and the consequent therapeutic approaches. PMID:25331179

  10. Bayesian ARTMAP for regression.

    PubMed

    Sasu, L M; Andonie, R

    2013-10-01

    Bayesian ARTMAP (BA) is a recently introduced neural architecture which uses a combination of Fuzzy ARTMAP competitive learning and Bayesian learning. Training is generally performed online, in a single-epoch. During training, BA creates input data clusters as Gaussian categories, and also infers the conditional probabilities between input patterns and categories, and between categories and classes. During prediction, BA uses Bayesian posterior probability estimation. So far, BA was used only for classification. The goal of this paper is to analyze the efficiency of BA for regression problems. Our contributions are: (i) we generalize the BA algorithm using the clustering functionality of both ART modules, and name it BA for Regression (BAR); (ii) we prove that BAR is a universal approximator with the best approximation property. In other words, BAR approximates arbitrarily well any continuous function (universal approximation) and, for every given continuous function, there is one in the set of BAR approximators situated at minimum distance (best approximation); (iii) we experimentally compare the online trained BAR with several neural models, on the following standard regression benchmarks: CPU Computer Hardware, Boston Housing, Wisconsin Breast Cancer, and Communities and Crime. Our results show that BAR is an appropriate tool for regression tasks, both for theoretical and practical reasons. PMID:23665468

  11. Atlas of United States Trees, Volume 2: Alaska Trees and Common Shrubs.

    ERIC Educational Resources Information Center

    Viereck, Leslie A.; Little, Elbert L., Jr.

    This volume is the second in a series of atlases describing the natural distribution or range of native tree species in the United States. The 82 species maps include 32 of trees in Alaska, 6 of shrubs rarely reaching tree size, and 44 more of common shrubs. More than 20 additional maps summarize environmental factors and furnish general…

  12. Ridge Regression: A Regression Procedure for Analyzing Correlated Independent Variables.

    ERIC Educational Resources Information Center

    Rakow, Ernest A.

    Ridge regression is presented as an analytic technique to be used when predictor variables in a multiple linear regression situation are highly correlated, a situation which may result in unstable regression coefficients and difficulties in interpretation. Ridge regression avoids the problem of selection of variables that may occur in stepwise…
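
    The instability that this abstract attributes to highly correlated predictors can be seen in a small worked case. The sketch below is illustrative only and is not taken from the record: it assumes two centered predictors and no intercept, solves the ridge normal equations (X'X + lambda I) b = X'y by Cramer's rule, and uses a hypothetical function name and data.

```python
def ridge_two_predictors(x1, x2, y, lam):
    """Ridge coefficients for two centered predictors (no intercept).

    Solves (X'X + lam * I) b = X'y for the 2 x 2 case with Cramer's rule.
    """
    a11 = sum(u * u for u in x1) + lam
    a22 = sum(v * v for v in x2) + lam
    a12 = sum(u * v for u, v in zip(x1, x2))
    c1 = sum(u * t for u, t in zip(x1, y))
    c2 = sum(v * t for v, t in zip(x2, y))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("normal equations are singular; increase lam")
    b1 = (c1 * a22 - a12 * c2) / det
    b2 = (a11 * c2 - a12 * c1) / det
    return b1, b2


# Hypothetical data: the two predictors are perfectly collinear, so
# ordinary least squares (lam = 0) has no unique solution.
x1 = [-1, 0, 1]
x2 = [-1, 0, 1]
y = [-2, 0, 2]
b1, b2 = ridge_two_predictors(x1, x2, y, lam=1.0)
print(b1, b2)
```

    With any positive penalty the system becomes solvable and the effect is shared between the two collinear predictors; the choice of the penalty value is left open here.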

  13. Classification and concentration estimation of explosive precursors using nanowires sensor array and decision tree learning

    NASA Astrophysics Data System (ADS)

    Cho, Junghwan; Li, Xiaopeng; Gu, Zhiyong; Kurup, Pradeep

    2011-09-01

    This paper aims to classify and estimate concentrations of explosive precursors using a nanowire sensor array and a decision tree learning algorithm. The nanowire sensor array consists of tin oxide sensors with four different additives: platinum (Pt), copper (Cu), indium (In), and nickel (Ni). The nanowire sensor array was tested using the vapors from four explosive precursors (acetone, nitrobenzene, nitrotoluene, and octane) with 10 different concentration levels each. A pattern recognition technique based on decision tree learning was applied to classify the explosive precursors and estimate their concentration. Classification and regression tree (CART) analysis was used for classification. CART was also utilized for structure identification in a Sugeno fuzzy inference system (FIS) for estimating the concentration of the precursors. Two CARTs were trained and their testing results were investigated.

  14. Ridge Regression Signal Processing

    NASA Technical Reports Server (NTRS)

    Kuhl, Mark R.

    1990-01-01

    The introduction of the Global Positioning System (GPS) into the National Airspace System (NAS) necessitates the development of Receiver Autonomous Integrity Monitoring (RAIM) techniques. In order to guarantee a certain level of integrity, a thorough understanding of modern estimation techniques applied to navigational problems is required. The extended Kalman filter (EKF) is derived and analyzed under poor geometry conditions. It was found that the performance of the EKF is difficult to predict, since the EKF is designed for a Gaussian environment. A novel approach is implemented which incorporates ridge regression to explain the behavior of an EKF in the presence of dynamics under poor geometry conditions. The basic principles of ridge regression theory are presented, followed by the derivation of a linearized recursive ridge estimator. Computer simulations are performed to confirm the underlying theory and to provide a comparative analysis of the EKF and the recursive ridge estimator.

  15. Fast Censored Linear Regression

    PubMed Central

    HUANG, YIJIAN

    2013-01-01

    Weighted log-rank estimating function has become a standard estimation method for the censored linear regression model, or the accelerated failure time model. Well established statistically, the estimator defined as a consistent root has, however, rather poor computational properties because the estimating function is neither continuous nor, in general, monotone. We propose a computationally efficient estimator through an asymptotics-guided Newton algorithm, in which censored quantile regression methods are tailored to yield an initial consistent estimate and a consistent derivative estimate of the limiting estimating function. We also develop fast interval estimation with a new proposal for sandwich variance estimation. The proposed estimator is asymptotically equivalent to the consistent root estimator and barely distinguishable in samples of practical size. However, computation time is typically reduced by two to three orders of magnitude for point estimation alone. Illustrations with clinical applications are provided. PMID:24347802

  16. The Tree Worker's Manual.

    ERIC Educational Resources Information Center

    Smithyman, S. J.

    This manual is designed to prepare students for entry-level positions as tree care professionals. Addressed in the individual chapters of the guide are the following topics: the tree service industry; clothing, equipment, and tools; tree workers; basic tree anatomy; techniques of pruning; procedures for climbing and working in the tree; aerial…

  17. Women, land, and trees.

    PubMed

    1999-07-01

    This article discusses women's land rights in the context of the findings of the paper, "Women's Land Rights in the Transition to Individualized Ownership: Implications for Tree Resource Management in Western Ghana." The study showed that customary land tenure institutions have evolved toward individualized systems, which provide incentives to invest in tree planting. In effect, individualization of land tenure had strengthened women's land rights through inter vivos gifts. However, transferring of land ownership to women is unlikely to raise productivity if access to and use of other inputs remains unequal. This suggests that attempts to equalize land rights of men and women are unlikely to lead to gender equity and improved efficiency and productivity of women farmers unless other constraints faced by women are also addressed. The article also documents comments, suggestions, and recommendations in response to the summary of the paper. In addition, the different practices of guaranteeing land ownership for women in some countries of Africa are presented. PMID:12295514

  18. Orthogonal Regression: A Teaching Perspective

    ERIC Educational Resources Information Center

    Carr, James R.

    2012-01-01

    A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
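
    The two fits named in this abstract have closed-form slopes. The sketch below is an illustration under stated assumptions, not the article's own material: it uses the standard formula for the major axis slope (the line minimizing squared perpendicular distances) and the usual reduced major axis slope, and the function names and data are hypothetical.

```python
import math


def _centered_sums(x, y):
    """Return the means and the centered sums of squares and products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    return mx, my, sxx, syy, sxy


def major_axis(x, y):
    """Slope and intercept of the orthogonal (major axis) regression line."""
    mx, my, sxx, syy, sxy = _centered_sums(x, y)
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; slope is undefined")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx


def reduced_major_axis(x, y):
    """Slope and intercept of the reduced major axis line."""
    mx, my, sxx, syy, sxy = _centered_sums(x, y)
    if sxy == 0 or sxx == 0:
        raise ValueError("slope is undefined for these data")
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    return slope, my - slope * mx


# Hypothetical points lying exactly on y = 2x + 1.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
print(major_axis(xs, ys), reduced_major_axis(xs, ys))
```

    For points that fall exactly on a line both methods recover that line; they differ once the points scatter around it.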

  19. Environmental conditions for alternative tree cover states in high latitudes

    NASA Astrophysics Data System (ADS)

    Abis, Beniamino; Brovkin, Victor

    2016-04-01

    Previous analysis of the vegetation cover from remote sensing revealed the existence of three alternative modes in the frequency distribution of boreal tree cover: a sparsely vegetated treeless state, a savanna-like state, and a forest state. Identifying which are the regions subject to multimodality, and assessing which are the main factors underlying their existence, is important to project future change of natural vegetation cover and its effect on climate. We study the impact on the forest cover fraction distribution of seven globally-observed environmental factors: mean annual rainfall, mean minimum temperature, growing degree days above 0, permafrost distribution, soil moisture, wildfire occurrence frequency, and thawing depth. Through the use of generalised additive models, regression trees, and conditional histograms, we find that the main factors determining the forest distribution in high latitudes are: permafrost distribution, mean annual rainfall, mean minimum temperature, soil moisture, and wildfire frequency. Additionally, we find differences between regions within the boreal area, such as Eurasia, Eastern North America, and Western North America. Furthermore, using a classification based on these factors, we show the existence and location of alternative tree cover states under the same climate conditions in the boreal region. These are areas of potential interest for a more detailed analysis of land-atmosphere interactions.

  20. Correlation and simple linear regression.

    PubMed

    Eberly, Lynn E

    2007-01-01

    This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression. PMID:18450049

  1. Method for estimating potential tree-grade distributions for northeastern forest species. Forest Service research paper (Final)

    SciTech Connect

    Yaussy, D.A.

    1993-03-01

    The generalized logistic regression was used to distribute trees into four potential tree grades for 20 northeastern species groups. The potential tree grade is defined as the tree grade based on the length and amount of clear cuttings and defects only, disregarding minimum grading diameter. The algorithms described use site index and tree diameter as the predictive variables, allowing the equations to be incorporated into individual-tree growth and yield simulators such as NE-TWIGS.

  2. Digression and Value Concatenation to Enable Privacy-Preserving Regression

    PubMed Central

    Li, Xiao-Bai; Sarkar, Sumit

    2015-01-01

    Regression techniques can be used not only for legitimate data analysis, but also to infer private information about individuals. In this paper, we demonstrate that regression trees, a popular data-analysis and data-mining technique, can be used to effectively reveal individuals’ sensitive data. This problem, which we call a “regression attack,” has not been addressed in the data privacy literature, and existing privacy-preserving techniques are not appropriate in coping with this problem. We propose a new approach to counter regression attacks. To protect against privacy disclosure, our approach introduces a novel measure, called digression, which assesses the sensitive value disclosure risk in the process of building a regression tree model. Specifically, we develop an algorithm that uses the measure for pruning the tree to limit disclosure of sensitive data. We also propose a dynamic value-concatenation method for anonymizing data, which better preserves data utility than a user-defined generalization scheme commonly used in existing approaches. Our approach can be used for anonymizing both numeric and categorical data. An experimental study is conducted using real-world financial, economic and healthcare data. The results of the experiments demonstrate that the proposed approach is very effective in protecting data privacy while preserving data quality for research and analysis. PMID:26752802

  3. Steganalysis using logistic regression

    NASA Astrophysics Data System (ADS)

    Lubenko, Ivans; Ker, Andrew D.

    2011-02-01

    We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods - it estimates class probabilities as well as providing a simple classification - and can be adapted more easily and efficiently for multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows comparable classification accuracy to SVM methods. This work is a case study, comparing accuracy and speed of SVM and LR classifiers in detection of LSB Matching and other related spatial-domain image steganography, through the state-of-art 686-dimensional SPAM feature set, in three image sets.

  4. Estimating Scots Pine Tree Mortality Using High Resolution Multispectral Images

    NASA Astrophysics Data System (ADS)

    Buriak, L.; Sukhinin, A. I.; Conard, S. G.; Ivanova, G. A.; McRae, D. J.; Soja, A. J.; Okhotkina, E.

    2010-12-01

    Scots pine (Pinus sylvestris) forest stands of central Siberia are characterized by a mixed-severity fire regime that is dominated by low- to high-severity surface fires, with crown fires occurring less frequently. The purpose of this study was to link ground measurements with air-borne and satellite observations of active wildfires and older fire scars to better estimate tree mortality remotely. Data from field sampling on experimental fires and wildfires were linked with intermediate-resolution satellite (Landsat Enhanced Thematic Mapper) data to estimate fire severity and carbon emissions. Results are being applied to Advanced Very High Resolution Radiometer (AVHRR) and Moderate Resolution Imaging Spectroradiometer (MODIS) imagery, MERIS, Landsat-ETM, SPOT (i.e., low, middle and high spatial resolution), to understand their remote-sensing capability for mapping fire severity, as indicated by tree mortality. Tree mortality depends on fireline intensity, residence time, and the physiological effects on the cambium layer, foliage and roots. We have correlated tree mortality measured after fires of varying severity with NDVI and other Chlorophyll Indexes to model tree mortality on a landscape scale. The field data obtained on experimental and wildfires are being analyzed and compared with intermediate-resolution satellite data (Landsat7-ETM) to help estimate fire severity, emissions, and carbon balance. In addition, it is being used to monitor immediate ecosystem fire effects (e.g., tree mortality) and long-term postfire vegetation recovery. These data are also being used to validate AVHRR, MODIS, and MERIS estimates of burn area. We studied burned areas in the Angara Region of central Siberia (northeast of Lake Baikal) for which both ground data and satellite data (ENVISAT-MERIS, Spot4, Landsat5, Landsat7-ETM) were available for the 2003-2004 and 2006-2008 periods. Ground validation was conducted on seventy sample plots established on burned sites differing in

  5. Using tree diversity to compare phylogenetic heuristics

    PubMed Central

    Sul, Seung-Jin; Matthews, Suzanne; Williams, Tiffani L

    2009-01-01

    Background Evolutionary trees are family trees that represent the relationships between a group of organisms. Phylogenetic heuristics are used to search stochastically for the best-scoring trees in tree space. Given that better tree scores are believed to be better approximations of the true phylogeny, traditional evaluation techniques have used tree scores to determine the heuristics that find the best scores in the fastest time. We develop new techniques to evaluate phylogenetic heuristics based on both tree scores and topologies to compare Pauprat and Rec-I-DCM3, two popular Maximum Parsimony search algorithms. Results Our results show that although Pauprat and Rec-I-DCM3 find the trees with the same best scores, topologically these trees are quite different. Furthermore, the Rec-I-DCM3 trees cluster distinctly from the Pauprat trees. In addition to our heatmap visualizations of using parsimony scores and the Robinson-Foulds distance to compare best-scoring trees found by the two heuristics, we also develop entropy-based methods to show the diversity of the trees found. Overall, Pauprat identifies more diverse trees than Rec-I-DCM3. Conclusion Overall, our work shows that there is value to comparing heuristics beyond the parsimony scores that they find. Pauprat is a slower heuristic than Rec-I-DCM3. However, our work shows that there is tremendous value in using Pauprat to reconstruct trees—especially since it finds identical scoring but topologically distinct trees. Hence, instead of discounting Pauprat, effort should go in improving its implementation. Ultimately, improved performance measures lead to better phylogenetic heuristics and will result in better approximations of the true evolutionary history of the organisms of interest. PMID:19426451
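
    The Robinson-Foulds distance mentioned in this abstract counts the bipartitions (splits) found in one tree but not the other. The sketch below is a simplified illustration, not the authors' code: it assumes each unrooted tree is already given as a list of its non-trivial splits over the same taxa, and the function names and example trees are hypothetical.

```python
def canonical_split(side, taxa):
    """Name a bipartition by the side that excludes the reference taxon.

    Either side of a split describes the same edge, so one side is
    chosen consistently before the splits are compared.
    """
    side = frozenset(side)
    reference = min(taxa)
    return frozenset(taxa) - side if reference in side else side


def robinson_foulds(splits_a, splits_b, taxa):
    """Number of splits present in exactly one of the two trees."""
    a = {canonical_split(s, taxa) for s in splits_a}
    b = {canonical_split(s, taxa) for s in splits_b}
    return len(a ^ b)


# Hypothetical five-taxon trees, written as their non-trivial splits.
taxa = {"A", "B", "C", "D", "E"}
tree_1 = [{"A", "B"}, {"D", "E"}]  # ((A,B),C,(D,E))
tree_2 = [{"A", "C"}, {"D", "E"}]  # ((A,C),B,(D,E))
distance = robinson_foulds(tree_1, tree_2, taxa)
print(distance)
```

    Two trees with the same score under an optimality criterion can still be far apart under this measure, which is the kind of topological difference the abstract describes.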

  6. The influence of tree morphology on stemflow generation in a tropical lowland rainforest

    NASA Astrophysics Data System (ADS)

    Uber, Magdalena; Levia, Delphis F.; Zimmermann, Beate; Zimmermann, Alexander

    2014-05-01

    Even though stemflow usually accounts for only a small proportion of rainfall, it is an important point source of water and ion input to forest floors and may, for instance, influence soil moisture patterns and groundwater recharge. Previous studies showed that the generation of stemflow depends on a multitude of meteorological and biological factors. Interestingly, despite the tremendous progress in stemflow research during the last decades it is still largely unknown which combination of tree characteristics determines stemflow volumes in species-rich tropical forests. This knowledge gap motivated us to analyse the influence of tree characteristics on stemflow volumes in a 1 hectare plot located in a Panamanian lowland rainforest. Our study comprised stemflow measurements in six randomly selected 10 m by 10 m subplots. In each subplot we measured stemflow of all trees with a diameter at breast height (DBH) > 5 cm on an event-basis for a period of six weeks. Additionally, we identified all tree species and determined a set of tree characteristics including DBH, crown diameter, bark roughness, bark furrowing, epiphyte coverage, tree architecture, stem inclination, and crown position. During the sampling period, we collected 985 L of stemflow (0.98 % of total rainfall). Based on regression analyses and comparisons among plant functional groups we show that palms were most efficient in yielding stemflow due to their large inclined fronds. Trees with large emergent crowns also produced relatively large amounts of stemflow. Due to their abundance, understory trees contribute much to stemflow yield not on individual but on the plot scale. Even though parameters such as crown diameter, branch inclination and position of the crown influence stemflow generation to some extent, these parameters explain less than 30 % of the variation in stemflow volumes. 
In contrast to published results from temperate forests, we did not detect a negative correlation between bark roughness

  7. Insert tree completion system

    SciTech Connect

    Brands, K.W.; Ball, I.G.; Cegielski, E.J.; Gresham, J.S.; Saunders, D.N.

    1982-09-01

    This paper outlines the overall project for development and installation of a low-profile, caisson-installed subsea Christmas tree. After various design studies and laboratory and field tests of key components, a system for installation inside a 30-in. conductor was ordered in July 1978 from Cameron Iron Works Inc. The system is designed to have all critical-pressure-containing components below the mudline and, with the reduced profile (height) above seabed, provides for improved safety of satellite underwater wells from damage by anchors, trawl boards, and even icebergs. In addition to the innovative nature of the tree design, the completion includes improved 3 1/2-in. through flowline (TFL) pumpdown completion equipment with deep set safety valves and a dual detachable packer head for simplified workover capability. The all-hydraulic control system incorporates a new design of sequencing valve for both Christmas tree control and remote flowline connection. A semisubmersible drilling rig was used to initiate the first end flowline connection at the wellhead for subsequent tie-in to the prelaid, surface-towed, all-welded subsea pipeline bundle.

  8. Ridge regression processing

    NASA Technical Reports Server (NTRS)

    Kuhl, Mark R.

    1990-01-01

    Current navigation requirements depend on a geometric dilution of precision (GDOP) criterion. As long as the GDOP stays below a specific value, navigation requirements are met. The GDOP will exceed the specified value when the measurement geometry becomes too collinear. A new signal processing technique, called Ridge Regression Processing, can reduce the effects of nearly collinear measurement geometry; thereby reducing the inflation of the measurement errors. It is shown that the Ridge signal processor gives a consistently better mean squared error (MSE) in position than the Ordinary Least Mean Squares (OLS) estimator. The applicability of this technique is currently being investigated to improve the following areas: receiver autonomous integrity monitoring (RAIM), coverage requirements, availability requirements, and precision approaches.
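
    As a rough illustration of why ridge estimation helps with nearly collinear geometry, here is a sketch of the textbook ridge estimator, beta = (X'X + kI)^(-1) X'y, for two predictors. It is not the signal processor described in the report; the function name `ridge_fit`, the penalty `k`, and the sample data are assumptions made for illustration.

```python
def ridge_fit(X, y, k):
    """Textbook ridge estimate for a two-column design matrix X.

    Solves (X'X + k*I) beta = X'y in closed form; k = 0 gives ordinary
    least squares.
    """
    a = sum(r[0] * r[0] for r in X) + k
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X) + k
    g0 = sum(r[0] * yi for r, yi in zip(X, y))
    g1 = sum(r[1] * yi for r, yi in zip(X, y))
    det = a * d - b * b  # close to zero when the columns are collinear
    return ((d * g0 - b * g1) / det, (a * g1 - b * g0) / det)


# Illustrative data: the two columns are almost identical.
X = [[1, 1.01], [1, 0.99], [2, 2.02], [2, 1.98]]
y = [2.1, 1.9, 4.0, 4.1]
print("OLS  :", ridge_fit(X, y, 0.0))
print("ridge:", ridge_fit(X, y, 1.0))
```

    The penalty `k` keeps the determinant away from zero, trading a small bias for a large reduction in variance; how `k` would be chosen in a navigation receiver is not stated in the abstract.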

  9. Tree Tectonics

    NASA Astrophysics Data System (ADS)

    Vogt, Peter R.

    2004-09-01

    Nature often replicates her processes at different scales of space and time in differing media. Here a tree-trunk cross section I am preparing for a dendrochronological display at the Battle Creek Cypress Swamp Nature Sanctuary (Calvert County, Maryland) dried and cracked in a way that replicates practically all the planform features found along the Mid-Oceanic Ridge (see Figure 1). The left-lateral offset of saw marks, contrasting with the right-lateral "rift" offset, even illustrates the distinction between transcurrent (strike-slip) and transform faults, the latter only recognized as a geologic feature, by J. Tuzo Wilson, in 1965. However, wood cracking is but one of many examples of natural processes that replicate one or several elements of lithospheric plate tectonics. Many of these examples occur in everyday venues and thus make great teaching aids, "teachable" from primary school to university levels. Plate tectonics, the dominant process of Earth geology, also occurs in miniature on the surface of some lava lakes, and as "ice plate tectonics" on our frozen seas and lakes. Ice tectonics also happens at larger spatial and temporal scales on the Jovian moons Europa and perhaps Ganymede. Tabletop plate tectonics, in which a molten-paraffin "asthenosphere" is surfaced by a skin of congealing wax "plates," first replicated Mid-Oceanic Ridge type seafloor spreading more than three decades ago. A seismologist (J. Brune, personal communication, 2004) discovered wax plate tectonics by casually and serendipitously pulling a stick across a container of molten wax his wife and daughters had used in making candles. Brune and his student D. Oldenburg followed up and mirabile dictu published the results in Science (178, 301-304).

  10. The Needs of Trees

    ERIC Educational Resources Information Center

    Boyd, Amy E.; Cooper, Jim

    2004-01-01

    Tree rings can be used not only to look at plant growth, but also to make connections between plant growth and resource availability. In this lesson, students in 2nd-4th grades use role-play to become familiar with basic requirements of trees and how availability of those resources is related to tree ring sizes and tree growth. These concepts can…

  11. Recursive Algorithm For Linear Regression

    NASA Technical Reports Server (NTRS)

    Varanasi, S. V.

    1988-01-01

    Order of model determined easily. Linear-regression algorithm includes recursive equations for coefficients of model of increased order. Algorithm eliminates duplicative calculations, facilitates search for minimum order of linear-regression model fitting set of data satisfactorily.

  12. Multipolar consensus for phylogenetic trees.

    PubMed

    Bonnard, Cécile; Berry, Vincent; Lartillot, Nicolas

    2006-10-01

    Collections of phylogenetic trees are usually summarized using consensus methods. These methods build a single tree, supposed to be representative of the collection. However, in the case of heterogeneous collections of trees, the resulting consensus may be poorly resolved (strict consensus, majority-rule consensus, ...), or may perform arbitrary choices among mutually incompatible clades, or splits (greedy consensus). Here, we propose an alternative method, which we call the multipolar consensus (MPC). Its aim is to display all the splits having a support above a predefined threshold, in a minimum number of consensus trees, or poles. We show that the problem is equivalent to a graph-coloring problem, and propose an implementation of the method. Finally, we apply the MPC to real data sets. Our results indicate that, typically, all the splits down to a weight of 10% can be displayed in no more than 4 trees. In addition, in some cases, biologically relevant secondary signals, which would not have been present in any of the classical consensus trees, are indeed captured by our method, indicating that the MPC provides a convenient exploratory method for phylogenetic analysis. The method was implemented in a package freely available at http://www.lirmm.fr/~cbonnard/MPC.html PMID:17060203
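
    The reduction to graph coloring described above can be sketched as follows: splits are vertices, mutually incompatible splits are joined by an edge, and each color class becomes one pole. This is an illustration under stated assumptions, not the published implementation: it uses a simple greedy assignment, which does not guarantee the minimum number of poles, and the names `compatible` and `greedy_poles` are hypothetical.

```python
def compatible(s1, s2, taxa):
    """Two splits are compatible if at least one of the four pairwise
    intersections of their sides is empty."""
    c1, c2 = taxa - s1, taxa - s2
    return (not (s1 & s2) or not (s1 & c2)
            or not (c1 & s2) or not (c1 & c2))


def greedy_poles(splits, taxa):
    """Place each split in the first pole whose splits are all compatible
    with it, opening a new pole when none fits.

    `splits` is assumed to be pre-filtered by the support threshold and
    ordered by decreasing support.
    """
    poles = []
    for s in splits:
        for pole in poles:
            if all(compatible(s, t, taxa) for t in pole):
                pole.append(s)
                break
        else:
            poles.append([s])
    return poles


taxa = frozenset("ABCDE")
supported = [frozenset("AB"), frozenset("CD"), frozenset("AC")]
for i, pole in enumerate(greedy_poles(supported, taxa), start=1):
    print(i, [sorted(s) for s in pole])
```

    Each split here is represented by one of its two sides; the other side is recovered as the complement within `taxa`.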

  13. Non-phytoseiid Mesostigmata within citrus orchards in Florida: species distribution, relative and seasonal abundance within trees, associated vines and ground cover plants and additional collection records of mites in citrus orchards.

    PubMed

    Childers, Carl C; Ueckermann, Eduard A

    2015-03-01

    Seven citrus orchards on reduced- to no-pesticide spray programs in central and south central Florida were sampled for non-phytoseiid mesostigmatid mites. Inner and outer canopy leaves, fruits, twigs and trunk scrapings were sampled monthly between August 1994 and January 1996. Open flowers were sampled in March from five of the sites. A total of 431 samples from one or more of 82 vine or ground cover plants were sampled monthly in five of the seven orchards. Two of the seven orchards (Mixon I and II) were on full herbicide programs and vines and ground cover plants were absent. A total of 2,655 mites (26 species) within the families: Ascidae, Blattisociidae, Laelapidae, Macrochelidae, Melicharidae, Pachylaelapidae and Parasitidae were identified. A total of 685 mites in the genus Asca (nine species: family Ascidae) were collected from within tree samples, 79 from vine or ground cover plants. Six species of Blattisociidae were collected: Aceodromus convolvuli, Blattisocius dentriticus, B. keegani, Cheiroseius sp. near jamaicensis, Lasioseius athiashenriotae and L. dentatus. A total of 485 Blattisociidae were collected from within tree samples compared with 167 from vine or ground cover plants. Low numbers of Laelapidae and Macrochelidae were collected from within tree samples. One Zygoseius furciger (Pachylaelapidae) was collected from Eleusine indica. Four species of Melicharidae were identified from 34 mites collected from within tree samples and 1,190 from vine or ground cover plants: Proctolaelaps lobatus was the most abundant species with 1,177 specimens collected from seven ground cover plants. One Phorytocarpais fimetorum (Parasitidae) was collected from inner leaves and four from twigs. Species of Ascidae, Blattisociidae, Melicharidae, Laelapidae and Pachylaelapidae were collected from 31 of the 82 vine or ground cover plants sampled, representing only a small fraction of the total number of Phytoseiidae collected from the same plants. Including the

  14. Does Gene Tree Discordance Explain the Mismatch between Macroevolutionary Models and Empirical Patterns of Tree Shape and Branching Times?

    PubMed Central

    Stadler, Tanja; Degnan, James H.; Rosenberg, Noah A.

    2016-01-01

    Classic null models for speciation and extinction give rise to phylogenies that differ in distribution from empirical phylogenies. In particular, empirical phylogenies are less balanced and have branching times closer to the root compared to phylogenies predicted by common null models. This difference might be due to null models of the speciation and extinction process being too simplistic, or due to the empirical datasets not being representative of random phylogenies. A third possibility arises because phylogenetic reconstruction methods often infer gene trees rather than species trees, producing an incongruity between models that predict species tree patterns and empirical analyses that consider gene trees. We investigate the extent to which the difference between gene trees and species trees under a combined birth–death and multispecies coalescent model can explain the difference in empirical trees and birth–death species trees. We simulate gene trees embedded in simulated species trees and investigate their difference with respect to tree balance and branching times. We observe that the gene trees are less balanced and typically have branching times closer to the root than the species trees. Empirical trees from TreeBase are also less balanced than our simulated species trees, and model gene trees can explain an imbalance increase of up to 8% compared to species trees. However, we see a much larger imbalance increase in empirical trees, about 100%, meaning that additional features must also be causing imbalance in empirical trees. This simulation study highlights the necessity of revisiting the assumptions made in phylogenetic analyses, as these assumptions, such as equating the gene tree with the species tree, might lead to a biased conclusion. PMID:26968785

  15. Does Gene Tree Discordance Explain the Mismatch between Macroevolutionary Models and Empirical Patterns of Tree Shape and Branching Times?

    PubMed

    Stadler, Tanja; Degnan, James H; Rosenberg, Noah A

    2016-07-01

    Classic null models for speciation and extinction give rise to phylogenies that differ in distribution from empirical phylogenies. In particular, empirical phylogenies are less balanced and have branching times closer to the root compared to phylogenies predicted by common null models. This difference might be due to null models of the speciation and extinction process being too simplistic, or due to the empirical datasets not being representative of random phylogenies. A third possibility arises because phylogenetic reconstruction methods often infer gene trees rather than species trees, producing an incongruity between models that predict species tree patterns and empirical analyses that consider gene trees. We investigate the extent to which the difference between gene trees and species trees under a combined birth-death and multispecies coalescent model can explain the difference in empirical trees and birth-death species trees. We simulate gene trees embedded in simulated species trees and investigate their difference with respect to tree balance and branching times. We observe that the gene trees are less balanced and typically have branching times closer to the root than the species trees. Empirical trees from TreeBase are also less balanced than our simulated species trees, and model gene trees can explain an imbalance increase of up to 8% compared to species trees. However, we see a much larger imbalance increase in empirical trees, about 100%, meaning that additional features must also be causing imbalance in empirical trees. This simulation study highlights the necessity of revisiting the assumptions made in phylogenetic analyses, as these assumptions, such as equating the gene tree with the species tree, might lead to a biased conclusion. PMID:26968785

  16. Decision Tree Modeling for Ranking Data

    NASA Astrophysics Data System (ADS)

    Yu, Philip L. H.; Wan, Wai Ming; Lee, Paul H.

    Ranking/preference data arises from many applications in marketing, psychology, and politics. We establish a new decision tree model for the analysis of ranking data by adopting the concept of classification and regression tree. The existing splitting criteria are modified in a way that allows them to precisely measure the impurity of a set of ranking data. Two types of impurity measures for ranking data are introduced, namely g-wise and top-k measures. Theoretical results show that the new measures exhibit properties of impurity functions. In model assessment, the area under the ROC curve (AUC) is applied to evaluate the tree performance. Experiments are carried out to investigate the predictive performance of the tree model for complete and partially ranked data and promising results are obtained. Finally, a real-world application of the proposed methodology to analyze a set of political rankings data is presented.

  17. Multinomial logistic regression ensembles.

    PubMed

    Lee, Kyewon; Ahn, Hongshik; Moon, Hojin; Kodell, Ralph L; Chen, James J

    2013-05-01

    This article proposes a method for multiclass classification problems using ensembles of multinomial logistic regression models. A multinomial logit model is used as a base classifier in ensembles from random partitions of predictors. The multinomial logit model can be applied to each mutually exclusive subset of the feature space without variable selection. By combining multiple models the proposed method can handle a huge database without a constraint needed for analyzing high-dimensional data, and the random partition can improve the prediction accuracy by reducing the correlation among base classifiers. The proposed method is implemented using R, and the performance including overall prediction accuracy, sensitivity, and specificity for each category is evaluated on two real data sets and simulation data sets. To investigate the quality of prediction in terms of sensitivity and specificity, the area under the receiver operating characteristic (ROC) curve (AUC) is also examined. The performance of the proposed model is compared to a single multinomial logit model and it shows a substantial improvement in overall prediction accuracy. The proposed method is also compared with other classification methods such as the random forest, support vector machines, and random multinomial logit model. PMID:23611203

  18. Bayesian Spatial Quantile Regression

    PubMed Central

    Reich, Brian J.; Fuentes, Montserrat; Dunson, David B.

    2013-01-01

    Tropospheric ozone is one of the six criteria pollutants regulated by the United States Environmental Protection Agency under the Clean Air Act and has been linked with several adverse health effects, including mortality. Due to the strong dependence on weather conditions, ozone may be sensitive to climate change and there is great interest in studying the potential effect of climate change on ozone, and how this change may affect public health. In this paper we develop a Bayesian spatial model to predict ozone under different meteorological conditions, and use this model to study spatial and temporal trends and to forecast ozone concentrations under different climate scenarios. We develop a spatial quantile regression model that does not assume normality and allows the covariates to affect the entire conditional distribution, rather than just the mean. The conditional distribution is allowed to vary from site-to-site and is smoothed with a spatial prior. For extremely large datasets our model is computationally infeasible, and we develop an approximate method. We apply the approximate version of our model to summer ozone from 1997–2005 in the Eastern U.S., and use deterministic climate models to project ozone under future climate conditions. Our analysis suggests that holding all other factors fixed, an increase in daily average temperature will lead to the largest increase in ozone in the Industrial Midwest and Northeast. PMID:23459794

  19. Bayesian Spatial Quantile Regression.

    PubMed

    Reich, Brian J; Fuentes, Montserrat; Dunson, David B

    2011-03-01

    Tropospheric ozone is one of the six criteria pollutants regulated by the United States Environmental Protection Agency under the Clean Air Act and has been linked with several adverse health effects, including mortality. Due to the strong dependence on weather conditions, ozone may be sensitive to climate change and there is great interest in studying the potential effect of climate change on ozone, and how this change may affect public health. In this paper we develop a Bayesian spatial model to predict ozone under different meteorological conditions, and use this model to study spatial and temporal trends and to forecast ozone concentrations under different climate scenarios. We develop a spatial quantile regression model that does not assume normality and allows the covariates to affect the entire conditional distribution, rather than just the mean. The conditional distribution is allowed to vary from site-to-site and is smoothed with a spatial prior. For extremely large datasets our model is computationally infeasible, and we develop an approximate method. We apply the approximate version of our model to summer ozone from 1997-2005 in the Eastern U.S., and use deterministic climate models to project ozone under future climate conditions. Our analysis suggests that holding all other factors fixed, an increase in daily average temperature will lead to the largest increase in ozone in the Industrial Midwest and Northeast. PMID:23459794

  20. Canonical variate regression.

    PubMed

    Luo, Chongliang; Liu, Jin; Dey, Dipak K; Chen, Kun

    2016-07-01

    In many fields, multi-view datasets, measuring multiple distinct but interrelated sets of characteristics on the same set of subjects, together with data on certain outcomes or phenotypes, are routinely collected. The objective in such a problem is often two-fold: both to explore the association structures of multiple sets of measurements and to develop a parsimonious model for predicting the future outcomes. We study a unified canonical variate regression framework to tackle the two problems simultaneously. The proposed criterion integrates multiple canonical correlation analysis with predictive modeling, balancing between the association strength of the canonical variates and their joint predictive power on the outcomes. Moreover, the proposed criterion seeks multiple sets of canonical variates simultaneously to enable the examination of their joint effects on the outcomes, and is able to handle multivariate and non-Gaussian outcomes. An efficient algorithm based on variable splitting and Lagrangian multipliers is proposed. Simulation studies show the superior performance of the proposed approach. We demonstrate the effectiveness of the proposed approach in an [Formula: see text] intercross mice study and an alcohol dependence study. PMID:26861909

  1. Fuzzy tree automata and syntactic pattern recognition.

    PubMed

    Lee, E T

    1982-04-01

    An approach of representing patterns by trees and processing these trees by fuzzy tree automata is described. Fuzzy tree automata are defined and investigated. The results include that the class of fuzzy root-to-frontier recognizable Σ-trees is closed under intersection, union, and complementation. Thus, the class of fuzzy root-to-frontier recognizable Σ-trees forms a Boolean algebra. Fuzzy tree automata are applied to processing fuzzy tree representation of patterns based on syntactic pattern recognition. The grade of acceptance is defined and investigated. Quantitative measures of "approximate isosceles triangle," "approximate elongated isosceles triangle," "approximate rectangle," and "approximate cross" are defined and used in the illustrative examples of this approach. By using these quantitative measures, a house, a house with high roof, and a church are also presented as illustrative examples. In addition, three fuzzy tree automata are constructed which have the capability of processing the fuzzy tree representations of "fuzzy houses," "houses with high roofs," and "fuzzy churches," respectively. The results may have useful applications in pattern recognition, image processing, artificial intelligence, pattern database design and processing, image science, and pictorial information systems. PMID:21869062

  2. Trees grow on money: urban tree canopy cover and environmental justice.

    PubMed

    Schwarz, Kirsten; Fragkias, Michail; Boone, Christopher G; Zhou, Weiqi; McHale, Melissa; Grove, J Morgan; O'Neil-Dunne, Jarlath; McFadden, Joseph P; Buckley, Geoffrey L; Childers, Dan; Ogden, Laura; Pincetl, Stephanie; Pataki, Diane; Whitmer, Ali; Cadenasso, Mary L

    2015-01-01

    This study examines the distributional equity of urban tree canopy (UTC) cover for Baltimore, MD, Los Angeles, CA, New York, NY, Philadelphia, PA, Raleigh, NC, Sacramento, CA, and Washington, D.C. using high spatial resolution land cover data and census data. Data are analyzed at the Census Block Group levels using Spearman's correlation, ordinary least squares regression (OLS), and a spatial autoregressive model (SAR). Across all cities there is a strong positive correlation between UTC cover and median household income. Negative correlations between race and UTC cover exist in bivariate models for some cities, but they are generally not observed using multivariate regressions that include additional variables on income, education, and housing age. SAR models result in higher r-square values compared to the OLS models across all cities, suggesting that spatial autocorrelation is an important feature of our data. Similarities among cities can be found based on shared characteristics of climate, race/ethnicity, and size. Our findings suggest that a suite of variables, including income, contribute to the distribution of UTC cover. These findings can help target simultaneous strategies for UTC goals and environmental justice concerns. PMID:25830303

  3. Trees Grow on Money: Urban Tree Canopy Cover and Environmental Justice

    PubMed Central

    Schwarz, Kirsten; Fragkias, Michail; Boone, Christopher G.; Zhou, Weiqi; McHale, Melissa; Grove, J. Morgan; O’Neil-Dunne, Jarlath; McFadden, Joseph P.; Buckley, Geoffrey L.; Childers, Dan; Ogden, Laura; Pincetl, Stephanie; Pataki, Diane; Whitmer, Ali; Cadenasso, Mary L.

    2015-01-01

    This study examines the distributional equity of urban tree canopy (UTC) cover for Baltimore, MD, Los Angeles, CA, New York, NY, Philadelphia, PA, Raleigh, NC, Sacramento, CA, and Washington, D.C. using high spatial resolution land cover data and census data. Data are analyzed at the Census Block Group levels using Spearman’s correlation, ordinary least squares regression (OLS), and a spatial autoregressive model (SAR). Across all cities there is a strong positive correlation between UTC cover and median household income. Negative correlations between race and UTC cover exist in bivariate models for some cities, but they are generally not observed using multivariate regressions that include additional variables on income, education, and housing age. SAR models result in higher r-square values compared to the OLS models across all cities, suggesting that spatial autocorrelation is an important feature of our data. Similarities among cities can be found based on shared characteristics of climate, race/ethnicity, and size. Our findings suggest that a suite of variables, including income, contribute to the distribution of UTC cover. These findings can help target simultaneous strategies for UTC goals and environmental justice concerns. PMID:25830303

  4. Linear regression in astronomy. I

    NASA Technical Reports Server (NTRS)

    Isobe, Takashi; Feigelson, Eric D.; Akritas, Michael G.; Babu, Gutti Jogesh

    1990-01-01

    Five methods for obtaining linear regression fits to bivariate data with unknown or insignificant measurement errors are discussed: ordinary least-squares (OLS) regression of Y on X, OLS regression of X on Y, the bisector of the two OLS lines, orthogonal regression, and 'reduced major-axis' regression. These methods have been used by various researchers in observational astronomy, most importantly in cosmic distance scale applications. Formulas for calculating the slope and intercept coefficients and their uncertainties are given for all the methods, including a new general form of the OLS variance estimates. The accuracy of the formulas was confirmed using numerical simulations. The applicability of the procedures is discussed with respect to their mathematical properties, the nature of the astronomical data under consideration, and the scientific purpose of the regression. It is found that, for problems needing symmetrical treatment of the variables, the OLS bisector performs significantly better than orthogonal or reduced major-axis regression.
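
    The five slopes compared in this abstract can be computed from the sample sums of squares alone. The sketch below writes them out from their geometric definitions (for example, the bisector is the line whose angle is midway between the two OLS lines); it omits the uncertainty formulas the paper provides, and the function name `regression_slopes` is a hypothetical choice. It assumes the sample covariance is nonzero.

```python
from math import copysign, sqrt


def regression_slopes(x, y):
    """Slopes of five symmetric and asymmetric line fits to (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))

    b1 = sxy / sxx  # OLS of Y on X
    b2 = syy / sxy  # OLS of X on Y, expressed as dY/dX
    sign = copysign(1.0, sxy)
    bisector = (b1 * b2 - 1.0 + sqrt((1 + b1 * b1) * (1 + b2 * b2))) / (b1 + b2)
    diff = b2 - 1.0 / b1
    orthogonal = 0.5 * (diff + sign * sqrt(4.0 + diff * diff))
    rma = sign * sqrt(b1 * b2)  # reduced major axis
    return {
        "ols_y_on_x": b1,
        "ols_x_on_y": b2,
        "bisector": bisector,
        "orthogonal": orthogonal,
        "reduced_major_axis": rma,
    }


# Illustrative data with scatter in both variables.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.4, 3.8, 5.1]
for name, slope in regression_slopes(x, y).items():
    print(f"{name:20s} {slope:.4f}")
```

    For any of the slopes, the intercept follows as the mean of `y` minus the slope times the mean of `x`, since all five lines pass through the centroid of the data.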

  5. Large unbalanced credit scoring using Lasso-logistic regression ensemble.

    PubMed

    Wang, Hong; Xu, Qingsong; Zhou, Lifeng

    2015-01-01

    Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988

  6. Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble

    PubMed Central

    Wang, Hong; Xu, Qingsong; Zhou, Lifeng

    2015-01-01

    Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988

  7. Fault tree handbook

    SciTech Connect

    Haasl, D.F.; Roberts, N.H.; Vesely, W.E.; Goldberg, F.F.

    1981-01-01

    This handbook describes a methodology for reliability analysis of complex systems such as those which comprise the engineered safety features of nuclear power generating stations. After an initial overview of the available system analysis approaches, the handbook focuses on a description of the deductive method known as fault tree analysis. The following aspects of fault tree analysis are covered: basic concepts for fault tree analysis; basic elements of a fault tree; fault tree construction; probability, statistics, and Boolean algebra for the fault tree analyst; qualitative and quantitative fault tree evaluation techniques; and computer codes for fault tree evaluation. Also discussed are several example problems illustrating the basic concepts of fault tree construction and evaluation.
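
    For the quantitative evaluation step mentioned above, a minimal sketch of propagating probabilities through AND and OR gates is given here. It assumes all basic events are statistically independent and that no basic event appears in more than one branch; the tree encoding, the event names, and the function name `top_event_probability` are hypothetical and not taken from the handbook.

```python
from math import prod


def top_event_probability(node, p):
    """Probability of a fault-tree node.

    A node is either a basic-event name (looked up in `p`) or a tuple
    ("AND" | "OR", child, child, ...). Independence of the children is
    assumed throughout.
    """
    if isinstance(node, str):
        return p[node]
    gate, *children = node
    probs = [top_event_probability(child, p) for child in children]
    if gate == "AND":
        # All inputs must fail.
        return prod(probs)
    if gate == "OR":
        # At least one input fails: complement of "none fail".
        return 1.0 - prod(1.0 - q for q in probs)
    raise ValueError(f"unknown gate: {gate}")


# Illustrative tree: the system fails if both pumps fail or the valve fails.
tree = ("OR", ("AND", "pump_a", "pump_b"), "valve")
p = {"pump_a": 0.1, "pump_b": 0.2, "valve": 0.05}
print(top_event_probability(tree, p))
```

    When basic events repeat across branches, this gate-by-gate product is no longer exact, and a Boolean reduction to minimal cut sets is needed first.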

  8. Evaluating differential effects using regression interactions and regression mixture models

    PubMed Central

    Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung

    2015-01-01

    Research increasingly emphasizes understanding differential effects. This paper focuses on understanding regression mixture models, a relatively new statistical method for assessing differential effects, by comparing their results with those obtained using an interactive term in linear regression. The research questions which each model answers, their formulation, and their assumptions are compared using Monte Carlo simulations and real data analysis. The capabilities of regression mixture models are described and specific issues to be addressed when conducting regression mixtures are proposed. The paper aims to clarify the role that regression mixtures can take in the estimation of differential effects and increase awareness of the benefits and potential pitfalls of this approach. Regression mixture models are shown to be a potentially effective exploratory method for finding differential effects when these effects can be defined by a small number of classes of respondents who share a typical relationship between a predictor and an outcome. It is also shown that the comparison between regression mixture models and interactions becomes substantially more complex as the number of classes increases. It is argued that regression interactions are well suited for direct tests of specific hypotheses about differential effects and regression mixtures provide a useful approach for exploring effect heterogeneity given adequate samples and study design. PMID:26556903

  9. Categorizing Ideas about Trees: A Tree of Trees

    PubMed Central

    Fisler, Marie; Lecointre, Guillaume

    2013-01-01

    The aim of this study is to explore whether matrices and MP trees used to produce systematic categories of organisms could be useful to produce categories of ideas in history of science. We study the history of the use of trees in systematics to represent the diversity of life from 1766 to 1991. We apply to those ideas a method inspired from coding homologous parts of organisms. We discretize conceptual parts of ideas, writings and drawings about trees contained in 41 main writings; we detect shared parts among authors and code them into a 91-characters matrix and use a tree representation to show who shares what with whom. In other words, we propose a hierarchical representation of the shared ideas about trees among authors: this produces a “tree of trees.” Then, we categorize schools of tree-representations. Classical schools like “cladists” and “pheneticists” are recovered but others are not: “gradists” are separated into two blocks, one of them being called here “grade theoreticians.” We propose new interesting categories like the “buffonian school,” the “metaphoricians,” and those using “strictly genealogical classifications.” We consider that networks are not useful to represent shared ideas at the present step of the study. A cladogram is made for showing who is sharing what with whom, but also heterobathmy and homoplasy of characters. The present cladogram is not modelling processes of transmission of ideas about trees, and here it is mostly used to test for proximity of ideas of the same age and for categorization. PMID:23950877

  10. Forest Management Intensity Affects Aquatic Communities in Artificial Tree Holes

    PubMed Central

    Petermann, Jana S.; Rohland, Anja; Sichardt, Nora; Lade, Peggy; Guidetti, Brenda; Weisser, Wolfgang W.; Gossner, Martin M.

    2016-01-01

    Forest management could potentially affect organisms in all forest habitats. However, aquatic communities in water-filled tree-holes may be especially sensitive because of small population sizes, the risk of drought and potential dispersal limitation. We set up artificial tree holes in forest stands subject to different management intensities in two regions in Germany and assessed the influence of local environmental properties (tree-hole opening type, tree diameter, water volume and water temperature) as well as regional drivers (forest management intensity, tree-hole density) on tree-hole insect communities (not considering other organisms such as nematodes or rotifers), detritus content, oxygen and nutrient concentrations. In addition, we compared data from artificial tree holes with data from natural tree holes in the same area to evaluate the methodological approach of using tree-hole analogues. We found that forest management had strong effects on communities in artificial tree holes in both regions and across the season. Abundance and species richness declined, community composition shifted and detritus content declined with increasing forest management intensity. Environmental variables, such as tree-hole density and tree diameter partly explained these changes. However, dispersal limitation, indicated by effects of tree-hole density, generally showed rather weak impacts on communities. Artificial tree holes had higher water temperatures (on average 2°C higher) and oxygen concentrations (on average 25% higher) than natural tree holes. The abundance of organisms was higher but species richness was lower in artificial tree holes. Community composition differed between artificial and natural tree holes. Negative management effects were detectable in both tree-hole systems, despite their abiotic and biotic differences. Our results indicate that forest management has substantial and pervasive effects on tree-hole communities and may alter their structure and

  11. Forest Management Intensity Affects Aquatic Communities in Artificial Tree Holes.

    PubMed

    Petermann, Jana S; Rohland, Anja; Sichardt, Nora; Lade, Peggy; Guidetti, Brenda; Weisser, Wolfgang W; Gossner, Martin M

    2016-01-01

    Forest management could potentially affect organisms in all forest habitats. However, aquatic communities in water-filled tree-holes may be especially sensitive because of small population sizes, the risk of drought and potential dispersal limitation. We set up artificial tree holes in forest stands subject to different management intensities in two regions in Germany and assessed the influence of local environmental properties (tree-hole opening type, tree diameter, water volume and water temperature) as well as regional drivers (forest management intensity, tree-hole density) on tree-hole insect communities (not considering other organisms such as nematodes or rotifers), detritus content, oxygen and nutrient concentrations. In addition, we compared data from artificial tree holes with data from natural tree holes in the same area to evaluate the methodological approach of using tree-hole analogues. We found that forest management had strong effects on communities in artificial tree holes in both regions and across the season. Abundance and species richness declined, community composition shifted and detritus content declined with increasing forest management intensity. Environmental variables, such as tree-hole density and tree diameter partly explained these changes. However, dispersal limitation, indicated by effects of tree-hole density, generally showed rather weak impacts on communities. Artificial tree holes had higher water temperatures (on average 2°C higher) and oxygen concentrations (on average 25% higher) than natural tree holes. The abundance of organisms was higher but species richness was lower in artificial tree holes. Community composition differed between artificial and natural tree holes. Negative management effects were detectable in both tree-hole systems, despite their abiotic and biotic differences. Our results indicate that forest management has substantial and pervasive effects on tree-hole communities and may alter their structure and

  12. Heritability Estimation using Regression Models for Correlation

    PubMed Central

    Lee, Hye-Seung; Paik, Myunghee Cho; Rundek, Tatjana; Sacco, Ralph L; Dong, Chuanhui; Krischer, Jeffrey P

    2012-01-01

    Heritability estimates a polygenic effect on a trait for a population. Reliable interpretation of heritability is critical in planning further genetic studies to locate a gene responsible for the trait. This study accommodates both single and multiple trait cases by employing regression models for correlation parameters to infer the heritability. Sharing the properties of the regression approach, the proposed methods are flexible enough to incorporate non-genetic and/or non-additive genetic information in the analysis. The performance of the proposed model is compared with that of the likelihood approach through simulations and carotid Intima Media Thickness analysis from the Northern Manhattan Family Study. PMID:22457844

  13. Improving phylogenetic regression under complex evolutionary models.

    PubMed

    Mazel, Florent; Davies, T Jonathan; Georges, Damien; Lavergne, Sébastien; Thuiller, Wilfried; Peres-Neto, Pedro R

    2016-02-01

    Phylogenetic Generalized Least Square (PGLS) is the tool of choice among phylogenetic comparative methods to measure the correlation between species features such as morphological and life-history traits or niche characteristics. In its usual form, it assumes that the residual variation follows a homogenous model of evolution across the branches of the phylogenetic tree. Since a homogenous model of evolution is unlikely to be realistic in nature, we explored the robustness of the phylogenetic regression when this assumption is violated. We did so by simulating a set of traits under various heterogeneous models of evolution, and evaluating the statistical performance (type I error [the percentage of tests based on samples that incorrectly rejected a true null hypothesis] and power [the percentage of tests that correctly rejected a false null hypothesis]) of classical phylogenetic regression. We found that PGLS has good power but unacceptable type I error rates. This finding is important since this method has been increasingly used in comparative analyses over the last decade. To address this issue, we propose a simple solution based on transforming the underlying variance-covariance matrix to adjust for model heterogeneity within PGLS. We suggest that heterogeneous rates of evolution might be particularly prevalent in large phylogenetic trees, while most current approaches assume a homogenous rate of evolution. Our analysis demonstrates that overlooking rate heterogeneity can result in inflated type I errors, thus misleading comparative analyses. We show that it is possible to correct for this bias even when the underlying model of evolution is not known a priori. PMID:27145604

  14. Linear regression in astronomy. II

    NASA Technical Reports Server (NTRS)

    Feigelson, Eric D.; Babu, Gutti J.

    1992-01-01

    A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.

  15. Quantile regression for climate data

    NASA Astrophysics Data System (ADS)

    Marasinghe, Dilhani Shalika

    Quantile regression is a developing statistical tool which is used to explain the relationship between response and predictor variables. This thesis describes two examples from climatology using quantile regression. Our main goal is to estimate derivatives of a conditional mean and/or conditional quantile function. We introduce a method to handle autocorrelation in the framework of quantile regression and use it with the temperature data. We also explain some properties of the tornado data, which are non-normally distributed. Even though quantile regression provides a more comprehensive view, when the residuals satisfy the normality and constant variance assumptions, we would prefer least squares regression for our temperature analysis. When dealing with non-normality and non-constant variance, quantile regression is a better candidate for the estimation of the derivative.

  16. The Allometry of Coarse Root Biomass: Log-Transformed Linear Regression or Nonlinear Regression?

    PubMed Central

    Lai, Jiangshan; Yang, Bo; Lin, Dunmei; Kerkhoff, Andrew J.; Ma, Keping

    2013-01-01

    Precise estimation of root biomass is important for understanding carbon stocks and dynamics in forests. Traditionally, biomass estimates are based on allometric scaling relationships between stem diameter and coarse root biomass calculated using linear regression (LR) on log-transformed data. Recently, it has been suggested that nonlinear regression (NLR) is a preferable fitting method for scaling relationships. But while this claim has been contested on both theoretical and empirical grounds, and statistical methods have been developed to aid in choosing between the two methods in particular cases, few studies have examined the ramifications of erroneously applying NLR. Here, we use direct measurements of 159 trees belonging to three locally dominant species in east China to compare the LR and NLR models of diameter-root biomass allometry. We then contrast model predictions by estimating stand coarse root biomass based on census data from the nearby 24-ha Gutianshan forest plot and by testing the ability of the models to predict known root biomass values measured on multiple tropical species at the Pasoh Forest Reserve in Malaysia. Based on likelihood estimates for model error distributions, as well as the accuracy of extrapolative predictions, we find that LR on log-transformed data is superior to NLR for fitting diameter-root biomass scaling models. More importantly, inappropriately using NLR leads to grossly inaccurate stand biomass estimates, especially for stands dominated by smaller trees. PMID:24116197
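
    The log-transformed linear regression (LR) that this abstract compares against NLR can be sketched as below. This is only a minimal illustration, not the authors' analysis: the function name fit_loglog, the diameters, and the coefficients 0.05 and 2.3 are hypothetical, and no back-transformation bias correction is applied.

```python
import math

def fit_loglog(diameters, biomasses):
    """Ordinary least squares on log-transformed data: ln(y) = a + b*ln(x)."""
    xs = [math.log(d) for d in diameters]
    ys = [math.log(m) for m in biomasses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # scaling exponent
    a = mean_y - b * mean_x  # ln of the scaling coefficient
    return a, b

# Hypothetical noise-free data following y = 0.05 * x**2.3
diam = [5.0, 10.0, 20.0, 40.0]
mass = [0.05 * d ** 2.3 for d in diam]
a, b = fit_loglog(diam, mass)
```

    On the power-law scale the fitted relationship is y = exp(a) * x**b; with real, noisy measurements the choice between this fit and NLR depends on the error structure, which is the question the abstract addresses.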

  17. Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models

    ERIC Educational Resources Information Center

    Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung

    2015-01-01

    Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…

  18. Retro-regression--another important multivariate regression improvement.

    PubMed

    Randić, M

    2001-01-01

    We review the serious problem associated with instabilities of the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". This is manifested when in a stepwise regression a descriptor is included or excluded from a regression. The consequence is an unpredictable change of the coefficients of the descriptors that remain in the regression equation. We follow with consideration of an even more serious problem, referred to as the MRA "nightmare of the second kind", arising when optimal descriptors are selected from a large pool of descriptors. This process typically causes at different steps of the stepwise regression a replacement of several previously used descriptors by new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on boiling points of nonanes which are considered (1) by using an ordered connectivity basis; (2) by using an ordering resulting from application of greedy algorithm; and (3) by using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined showing how it resolves the ambiguities associated with both "nightmares" of the first and the second kind of MRA. PMID:11410035

  19. Decision Tree Approach for Soil Liquefaction Assessment

    PubMed Central

    Gandomi, Amir H.; Fridline, Mark M.; Roke, David A.

    2013-01-01

    In the current study, the performances of some decision tree (DT) techniques are evaluated for postearthquake soil liquefaction assessment. A database containing 620 records of seismic parameters and soil properties is used in this study. Three decision tree techniques are used here in two different ways, considering statistical and engineering points of view, to develop decision rules. The DT results are compared to the logistic regression (LR) model. The results of this study indicate that the DTs not only successfully predict liquefaction but they can also outperform the LR model. The best DT models are interpreted and evaluated based on an engineering point of view. PMID:24489498

  20. Evolution of tree nutrition.

    PubMed

    Raven, John A; Andrews, Mitchell

    2010-09-01

    Using a broad definition of trees, the evolutionary origins of trees in a nutritional context are considered using data from the fossil record and molecular phylogeny. Trees are first known from the Late Devonian about 380 million years ago, originated polyphyletically at the pteridophyte grade of organization; the earliest gymnosperms were trees, and trees are polyphyletic in the angiosperms. Nutrient transporters, assimilatory pathways, homoiohydry (cuticle, intercellular gas spaces, stomata, endohydric water transport systems including xylem and phloem-like tissue) and arbuscular mycorrhizas preceded the origin of trees. Nutritional innovations that began uniquely in trees were the seed habit and, certainly (but not necessarily uniquely) in trees, ectomycorrhizas, cyanobacterial, actinorhizal and rhizobial (Parasponia, some legumes) diazotrophic symbioses and cluster roots. PMID:20581011

  1. Tree Classification Software

    NASA Technical Reports Server (NTRS)

    Buntine, Wray

    1993-01-01

    This paper introduces the IND Tree Package to prospective users. IND does supervised learning using classification trees. This learning task is a basic tool used in the development of diagnosis, monitoring and expert systems. The IND Tree Package was developed as part of a NASA project to semi-automate the development of data analysis and modelling algorithms using artificial intelligence techniques. The IND Tree Package integrates features from CART and C4 with newer Bayesian and minimum encoding methods for growing classification trees and graphs. The IND Tree Package also provides an experimental control suite on top. The newer features give improved probability estimates often required in diagnostic and screening tasks. The package comes with a manual, Unix 'man' entries, and a guide to tree methods and research. The IND Tree Package is implemented in C under Unix and was beta-tested at university and commercial research laboratories in the United States.

  2. Chem-Is-Tree.

    ERIC Educational Resources Information Center

    Barry, Dana M.

    1997-01-01

    Provides details on the chemical composition of trees including a definition of wood. Also includes an activity on anthocyanins as well as a discussion of the resistance of wood to solvents and chemicals. Lists interesting products from trees. (DDR)

  3. Category of trees in representation theory of quantum algebras

    SciTech Connect

    Moskaliuk, N. M.; Moskaliuk, S. S.

    2013-10-15

    New applications of categorical methods are connected with new additional structures on categories. One such structure in the representation theory of quantum algebras is constructed: the category of Kuznetsov-Smorodinsky-Vilenkin-Smirnov (KSVS) trees, whose objects are finite rooted KSVS trees and whose morphisms are generated by the transition from one KSVS tree to another.

  4. The space of ultrametric phylogenetic trees.

    PubMed

    Gavryushkin, Alex; Drummond, Alexei J

    2016-08-21

    The reliability of a phylogenetic inference method from genomic sequence data is ensured by its statistical consistency. Bayesian inference methods produce a sample of phylogenetic trees from the posterior distribution given sequence data. Hence the question of statistical consistency of such methods is equivalent to the consistency of the summary of the sample. More generally, statistical consistency is ensured by the tree space used to analyse the sample. In this paper, we consider two standard parameterisations of phylogenetic time-trees used in evolutionary models: inter-coalescent interval lengths and absolute times of divergence events. For each of these parameterisations we introduce a natural metric space on ultrametric phylogenetic trees. We compare the introduced spaces with existing models of tree space and formulate several formal requirements that a metric space on phylogenetic trees must possess in order to be a satisfactory space for statistical analysis, and justify them. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce and that the choice between metric spaces requires additional properties to be considered. In particular, the summary tree minimising the square distance to the trees from the sample might be different for different parameterisations. This suggests that further fundamental insight is needed into the problem of statistical consistency of phylogenetic inference methods. PMID:27188249

  5. Decision-Tree Program

    NASA Technical Reports Server (NTRS)

    Buntine, Wray

    1994-01-01

    IND computer program introduces Bayesian and Markov/maximum-likelihood (MML) methods and more-sophisticated methods of searching in growing trees. Produces more-accurate class-probability estimates important in applications like diagnosis. Provides range of features and styles with convenience for casual user, fine-tuning for advanced user or for those interested in research. Consists of four basic kinds of routines: data-manipulation, tree-generation, tree-testing, and tree-display. Written in C language.

  6. Comparison of various texture classification methods using multiresolution analysis and linear regression modelling.

    PubMed

    Dhanya, S; Kumari Roshni, V S

    2016-01-01

    Textures play an important role in image classification. This paper proposes a high performance texture classification method using a combination of multiresolution analysis tool and linear regression modelling by channel elimination. The correlation between different frequency regions has been validated as a sort of effective texture characteristic. This method is motivated by the observation that there exists a distinctive correlation between the image samples belonging to the same kind of texture, at different frequency regions obtained by a wavelet transform. Experimentally, it is observed that this correlation differs across textures. The linear regression modelling is employed to analyze this correlation and extract texture features that characterize the samples. Our method considers not only the frequency regions but also the correlation between these regions. This paper primarily focuses on applying the Dual Tree Complex Wavelet Packet Transform and the Linear Regression model for classification of the obtained texture features. Additionally the paper also presents a comparative assessment of the classification results obtained from the above method with two more types of wavelet transform methods namely the Discrete Wavelet Transform and the Discrete Wavelet Packet Transform. PMID:26835234

  7. Classifying pairs with trees for supervised biological network inference (Electronic supplementary information (ESI) available: implementation and computational issues, supplementary performance curves, and illustration of interpretability of trees. See DOI: 10.1039/c5mb00174a)

    PubMed Central

    Wehenkel, Louis; Babu, M. Madan; Geurts, Pierre

    2015-01-01

    Networks are ubiquitous in biology, and computational approaches have been largely investigated for their inference. In particular, supervised machine learning methods can be used to complete a partially known network by integrating various measurements. Two main supervised frameworks have been proposed: the local approach, which trains a separate model for each network node, and the global approach, which trains a single model over pairs of nodes. Here, we systematically investigate, theoretically and empirically, the exploitation of tree-based ensemble methods in the context of these two approaches for biological network inference. We first formalize the problem of network inference as a classification of pairs, unifying in the process homogeneous and bipartite graphs and discussing two main sampling schemes. We then present the global and the local approaches, extending the latter for the prediction of interactions between two unseen network nodes, and discuss their specializations to tree-based ensemble methods, highlighting their interpretability and drawing links with clustering techniques. Extensive computational experiments are carried out with these methods on various biological networks that clearly highlight that these methods are competitive with existing methods. PMID:26008881

  8. Winter Birch Trees

    ERIC Educational Resources Information Center

    Sweeney, Debra; Rounds, Judy

    2011-01-01

    Trees are great inspiration for artists. Many art teachers find themselves inspired and maybe somewhat obsessed with the natural beauty and elegance of the lofty tree, and how it changes through the seasons. One such tree that grows in several regions and always looks magnificent, regardless of the time of year, is the birch. In this article, the…

  9. Illumination Under Trees

    SciTech Connect

    Max, N

    2002-08-19

    This paper is a survey of the author's work on illumination and shadows under trees, including the effects of sky illumination, sun penumbras, scattering in a misty atmosphere below the trees, and multiple scattering and transmission between leaves. It also describes a hierarchical image-based rendering method for trees.

  10. Minnesota's Forest Trees. Revised.

    ERIC Educational Resources Information Center

    Miles, William R.; Fuller, Bruce L.

    This bulletin describes 46 of the more common trees found in Minnesota's forests and windbreaks. The bulletin contains two tree keys, a summer key and a winter key, to help the reader identify these trees. Besides the two keys, the bulletin includes an introduction, instructions for key use, illustrations of leaf characteristics and twig…

  11. The Wish Tree Project

    ERIC Educational Resources Information Center

    Brooks, Sarah DeWitt

    2010-01-01

    This article describes the author's experience in implementing a Wish Tree project in her school in an effort to bring the school community together with a positive art-making experience during a potentially stressful time. The concept of a wish tree is simple: plant a tree; provide tags and pencils for writing wishes; and encourage everyone to…

  12. Diary of a Tree.

    ERIC Educational Resources Information Center

    Srulowitz, Frances

    1992-01-01

    Describes an activity to develop students' skills of observation and recordkeeping by studying the growth of a tree's leaves during the spring. Children monitor the growth of 11 trees over a 2-month period, draw pictures of the tree at different stages of growth, and write diaries of the tree's growth. (MDH)

  13. Harmonic regression and scale stability.

    PubMed

    Lee, Yi-Hsuan; Haberman, Shelby J

    2013-10-01

    Monitoring a very frequently administered educational test with a relatively short history of stable operation imposes a number of challenges. Test scores usually vary by season, and the frequency of administration of such educational tests is also seasonal. Although it is important to react to unreasonable changes in the distributions of test scores in a timely fashion, it is not a simple matter to ascertain what sort of distribution is really unusual. Many commonly used approaches for seasonal adjustment are designed for time series with evenly spaced observations that span many years and, therefore, are inappropriate for data from such educational tests. Harmonic regression, a seasonal-adjustment method, can be useful in monitoring scale stability when the number of years available is limited and when the observations are unevenly spaced. Additional forms of adjustments can be included to account for variability in test scores due to different sources of population variations. To illustrate, real data are considered from an international language assessment. PMID:24092490
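
    The basic idea of harmonic regression on unevenly spaced observations can be sketched as follows. This is a hedged illustration only, not the authors' procedure: it fits a single annual harmonic by ordinary least squares, and the function names, the administration days, and the coefficients 50, 2 and -1 are all hypothetical.

```python
import math

def solve_linear(a, b):
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_harmonic(times, values, period):
    """Least-squares fit of y = b0 + b1*cos(w*t) + b2*sin(w*t), with w = 2*pi/period."""
    w = 2.0 * math.pi / period
    rows = [[1.0, math.cos(w * t), math.sin(w * t)] for t in times]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    return solve_linear(xtx, xty)

# Hypothetical unevenly spaced administration days and noise-free mean scores
days = [3, 10, 45, 80, 81, 150, 200, 260, 300, 340]
scores = [50 + 2 * math.cos(2 * math.pi * d / 365) - math.sin(2 * math.pi * d / 365)
          for d in days]
b0, b1, b2 = fit_harmonic(days, scores, period=365)
```

    Because the design matrix is built directly from the observation times, nothing in the fit requires evenly spaced data; additional harmonics or covariates for population variation would simply add columns.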

  14. Precision Efficacy Analysis for Regression.

    ERIC Educational Resources Information Center

    Brooks, Gordon P.

    When multiple linear regression is used to develop a prediction model, sample size must be large enough to ensure stable coefficients. If the derivation sample size is inadequate, the model may not predict well for future subjects. The precision efficacy analysis for regression (PEAR) method uses a cross-validity approach to select sample sizes…

  15. Ecological Regression and Voting Rights.

    ERIC Educational Resources Information Center

    Freedman, David A.; And Others

    1991-01-01

    The use of ecological regression in voting rights cases is discussed in the context of a lawsuit against Los Angeles County (California) in 1990. Ecological regression assumes that systematic voting differences between precincts are explained by ethnic differences. An alternative neighborhood model is shown to lead to different conclusions. (SLD)

  16. Logistic Regression: Concept and Application

    ERIC Educational Resources Information Center

    Cokluk, Omay

    2010-01-01

    The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…

  17. Fungible weights in logistic regression.

    PubMed

    Jones, Jeff A; Waller, Niels G

    2016-06-01

    In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. (PsycINFO Database Record) PMID:26651981

  18. [Regression grading in gastrointestinal tumors].

    PubMed

    Tischoff, I; Tannapfel, A

    2012-02-01

    Preoperative neoadjuvant chemoradiation therapy is a well-established and essential part of the interdisciplinary treatment of gastrointestinal tumors. Neoadjuvant treatment leads to regressive changes in tumors. To evaluate the histological tumor response, different scoring systems describing regressive changes are used, known as tumor regression grading. Tumor regression grading is usually based on the presence of residual vital tumor cells in proportion to the total tumor size. Currently, no nationally or internationally accepted grading systems exist. In general, common guidelines should be used in the pathohistological diagnostics of tumors after neoadjuvant therapy. In particular, the standard tumor grading will be replaced by tumor regression grading. Furthermore, tumors after neoadjuvant treatment are marked with the prefix "y" in the TNM classification. PMID:22293790

  19. Distributed Contour Trees

    SciTech Connect

    Morozov, Dmitriy; Weber, Gunther H.

    2014-03-31

    Topological techniques provide robust tools for data analysis. They are used, for example, for feature extraction, for data de-noising, and for comparison of data sets. This chapter concerns contour trees, a topological descriptor that records the connectivity of the isosurfaces of scalar functions. These trees are fundamental to analysis and visualization of physical phenomena modeled by real-valued measurements. We study the parallel analysis of contour trees. After describing a particular representation of a contour tree, called local–global representation, we illustrate how different problems that rely on contour trees can be solved in parallel with minimal communication.

  20. Multiple weight stepwise regression

    SciTech Connect

    Atkins, J.; Campbell, J.

    1993-10-01

    In many science and engineering applications, there is an interest in predicting the outputs of a process for given levels of inputs. In order to develop a model, one could run the process (or a simulation of the process) at a number of points (a point would be one run at one set of possible input values) and observe the values of the outputs at those points. These observations can be used to predict the values of the outputs for other values of the inputs. Since the outputs are a function of the inputs, we can generate a surface in the space of possible inputs and outputs. This surface is called a response surface. In some cases, collecting data needed to generate a response surface can be very expensive. Thus, in these cases, there is a powerful incentive to minimize the sample size while building better response surfaces. One such case is the semiconductor equipment manufacturing industry. Semiconductor manufacturing equipment is complex and expensive. Depending upon the type of equipment, the number of control parameters may range from 10 to 30 with perhaps 5 to 10 being important. Since a single run can cost hundreds or thousands of dollars, it is very important to have efficient methods for building response surfaces. A current approach to this problem is to do the experiment in two stages. First, a traditional design (such as fractional factorial) is used to screen variables. After deciding which variables are significant, additional runs of the experiment are conducted. The original runs and the new runs are used to build a model with the significant variables. However, the original (screening) runs are not as helpful for building the model as some other points might have been. This paper presents a point selection scheme that is more efficient than traditional designs.

  1. Environmental regulation of xylem sap flow and total conductance of Larix gmelinii trees in eastern Siberia.

    PubMed

    Arneth, A.; Kelliher, F. M.; Bauer, G.; Hollinger, D. Y.; Byers, J. N.; Hunt, J. E.; McSeveny, T. M.; Ziegler, W.; Vygodskaya, N. N.; Milukova, I.; Sogachov, A.; Varlagin, A.; Schulze, E.-D.

    1996-01-01

    -sided leaf area basis, which is comparable to the published porometer data for Larix. Diurnal variation in total tree conductance (G(t)) was related to changes in the above-canopy visible irradiance (Q) and D. A saturating upper-boundary function for the relationship between G(t) and Q was defined as G(t) = G(tmax)(Q/[Q + Q(50)]), where Q(50) = 164 +/- 85 µmol m^-2 s^-1 when G(t) = G(tmax)/2. Accounting for Q by excluding data for Q < Q(85) when G(t) was at least 85% of G(tmax), the upper limit for the relationship between G(t) and D was determined based on the function G(t) = (a + b ln D)^2, where a and b are regression coefficients. The relationship between G(t) and D was curvilinear, indicating that there was a proportional decrease in G(t) with increasing D such that F was relatively constant throughout much of the day, even when D ranged between about 2 and 4 kPa, which may be interpreted as an adaptation of the species to its continental climate. However, at given values of Q and D, G(t) was generally higher in the morning than in the afternoon. The additional environmental constraints on G(t) imposed by leaf nitrogen nutrition and afternoon water stress are discussed. PMID:14871769
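
    The saturating upper-boundary function quoted in this abstract can be evaluated directly. The sketch below is only an illustration: the default Q(50) of 164 is the central value reported above, while the function name and the normalised G(tmax) = 1.0 are assumptions, since the abstract as shown does not give G(tmax).

```python
def upper_boundary_conductance(q, g_tmax, q50=164.0):
    """Evaluate G(t) = G(tmax) * Q / (Q + Q(50)).

    q      -- above-canopy visible irradiance Q (same units as q50)
    g_tmax -- asymptotic maximum conductance (hypothetical value supplied by caller)
    q50    -- irradiance at which G(t) reaches half of G(tmax)
    """
    if q < 0:
        raise ValueError("irradiance Q must be non-negative")
    return g_tmax * q / (q + q50)


# Normalised example: G(tmax) = 1.0 is an assumption for illustration only.
half = upper_boundary_conductance(164.0, 1.0)   # Q equal to Q(50)
dark = upper_boundary_conductance(0.0, 1.0)     # no irradiance
bright = upper_boundary_conductance(2000.0, 1.0)
```

    By construction the function returns half of G(tmax) when Q equals Q(50) and approaches G(tmax) as Q grows large.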

  2. Growth of a Pine Tree

    ERIC Educational Resources Information Center

    Rollinson, Susan Wells

    2012-01-01

    The growth of a pine tree is examined by preparing "tree cookies" (cross-sectional disks) between whorls of branches. The use of Christmas trees allows the tree cookies to be obtained with inexpensive, commonly available tools. Students use the tree cookies to investigate the annual growth of the tree and how it corresponds to the number of whorls…

  3. Trees, soils, and food security

    PubMed Central

    Sanchez, P. A.; Buresh, R. J.; Leakey, R. R. B.

    1997-01-01

    Trees have a different impact on soil properties than annual crops, because of their longer residence time, larger biomass accumulation, and longer-lasting, more extensive root systems. In natural forests nutrients are efficiently cycled with very small inputs and outputs from the system. In most agricultural systems the opposite happens. Agroforestry encompasses the continuum between these extremes, and emerging hard data is showing that successful agroforestry systems increase nutrient inputs, enhance internal flows, decrease nutrient losses and provide environmental benefits: when the competition for growth resources between the tree and the crop component is well managed. The three main determinants for overcoming rural poverty in Africa are (i) reversing soil fertility depletion, (ii) intensifying and diversifying land use with high-value products, and (iii) providing an enabling policy environment for the smallholder farming sector. Agroforestry practices can improve food production in a sustainable way through their contribution to soil fertility replenishment. The use of organic inputs as a source of biologically-fixed nitrogen, together with deep nitrate that is captured by trees, plays a major role in nitrogen replenishment. The combination of commercial phosphorus fertilizers with available organic resources may be the key to increasing and sustaining phosphorus capital. High-value trees, 'Cinderella' species, can fit in specific niches on farms, thereby making the system ecologically stable and more rewarding economically, in addition to diversifying and increasing rural incomes and improving food security. In the most heavily populated areas of East Africa, where farm size is extremely small, the number of trees on farms is increasing as farmers seek to reduce labour demands, compatible with the drift of some members of the family into the towns to earn off-farm income. 
Contrary to the concept that population pressure promotes deforestation, there is

  4. Efficient tree codes on SIMD computer architectures

    NASA Astrophysics Data System (ADS)

    Olson, Kevin M.

    1996-11-01

    This paper describes changes made to a previous implementation of an N-body tree code developed for a fine-grained, SIMD computer architecture. These changes include (1) switching from a balanced binary tree to a balanced oct tree, (2) addition of quadrupole corrections, and (3) having the particles search the tree in groups rather than individually. An algorithm for limiting errors is also discussed. In aggregate, these changes have led to a performance increase of over a factor of 10 compared to the previous code. For problems several times larger than the processor array, the code now achieves performance levels of ~ 1 Gflop on the Maspar MP-2 or roughly 20% of the quoted peak performance of this machine. This percentage is competitive with other parallel implementations of tree codes on MIMD architectures. This is significant, considering the low relative cost of SIMD architectures.

  5. IND - THE IND DECISION TREE PACKAGE

    NASA Technical Reports Server (NTRS)

    Buntine, W.

    1994-01-01

    A common approach to supervised classification and prediction in artificial intelligence and statistical pattern recognition is the use of decision trees. A tree is "grown" from data using a recursive partitioning algorithm to create a tree which has good prediction of classes on new data. Standard algorithms are CART (by Breiman, Friedman, Olshen and Stone) and ID3 and its successor C4 (by Quinlan). As well as reimplementing parts of these algorithms and offering experimental control suites, IND also introduces Bayesian and MML methods and more sophisticated search in growing trees. These produce more accurate class probability estimates that are important in applications like diagnosis. IND is applicable to most data sets consisting of independent instances, each described by a fixed length vector of attribute values. An attribute value may be a number, one of a set of attribute specific symbols, or it may be omitted. One of the attributes is designated the "target" and IND grows trees to predict the target. Prediction can then be done on new data or the decision tree printed out for inspection. IND provides a range of features and styles with convenience for the casual user as well as fine-tuning for the advanced user or those interested in research. IND can be operated in a CART-like mode (but without regression trees, surrogate splits or multivariate splits), and in a mode like the early version of C4. Advanced features allow more extensive search, interactive control and display of tree growing, and Bayesian and MML algorithms for tree pruning and smoothing. These often produce more accurate class probability estimates at the leaves. IND also comes with a comprehensive experimental control suite. IND consists of four basic kinds of routines: data manipulation routines, tree generation routines, tree testing routines, and tree display routines. The data manipulation routines are used to partition a single large data set into smaller training and test sets. 
The
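
    The recursive partitioning idea described in this record can be sketched in a few lines. This is a generic, CART-style illustration using the Gini impurity on numeric attributes; it is not IND's code, and all function names are hypothetical. It assumes no missing attribute values and class labels that are not tuples.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (attribute_index, threshold) that most reduces impurity, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best


def grow(rows, labels, depth=0, max_depth=3):
    """Grow a tree recursively; leaves hold the majority class."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return (j, t,
            grow([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            grow([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))


def predict(tree, row):
    """Follow splits from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


# Toy data (made up for illustration): one numeric attribute, two classes.
toy_rows = [[1.0], [2.0], [3.0], [4.0]]
toy_labels = ["a", "a", "b", "b"]
toy_tree = grow(toy_rows, toy_labels)
```

    Pruning, smoothing, and the Bayesian and MML methods mentioned in the abstract are deliberately left out of this sketch.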

  6. Controls on tree water uptake and information storage in tree rings

    NASA Astrophysics Data System (ADS)

    Blume, Theresa; Simard, Sonia; Heidbüchel, Ingo; Güntner, Andreas; Heinrich, Ingo

    2016-04-01

    Controls on tree water uptake are investigated in various forest stands in the northeastern German lowlands by a multi-method approach. This approach combines sapflow and dendrometer measurements as well as tree-ring analyses with soil moisture derived root water uptake rates. The latter method has the advantage that it provides depth distributions of root water uptake and thus additional information allowing for a more detailed analysis of the relationship between water availability and water uptake. High resolution climatic data makes it possible to investigate the site specific interplay between atmospheric demand and water availability on the one hand and tree response and adaptation on the other hand. The comparison of spatio-temporal patterns of these responses with concurrent tree growth as well as tree-ring analyses enables a first matching of actual and "archived" patterns and thus an estimate of how much of this information is stored in tree rings.

  7. A tutorial on Bayesian Normal linear regression

    NASA Astrophysics Data System (ADS)

    Klauenberg, Katy; Wübbeler, Gerd; Mickan, Bodo; Harris, Peter; Elster, Clemens

    2015-12-01

    Regression is a common task in metrology and often applied to calibrate instruments, evaluate inter-laboratory comparisons or determine fundamental constants, for example. Yet, a regression model cannot be uniquely formulated as a measurement function, and consequently the Guide to the Expression of Uncertainty in Measurement (GUM) and its supplements are not applicable directly. Bayesian inference, however, is well suited to regression tasks, and has the advantage of accounting for additional a priori information, which typically robustifies analyses. Furthermore, it is anticipated that future revisions of the GUM shall also embrace the Bayesian view. Guidance on Bayesian inference for regression tasks is largely lacking in metrology. For linear regression models with Gaussian measurement errors this tutorial gives explicit guidance. Divided into three steps, the tutorial first illustrates how a priori knowledge, which is available from previous experiments, can be translated into prior distributions from a specific class. These prior distributions have the advantage of yielding analytical, closed form results, thus avoiding the need to apply numerical methods such as Markov Chain Monte Carlo. Secondly, formulas for the posterior results are given, explained and illustrated, and software implementations are provided. In the third step, Bayesian tools are used to assess the assumptions behind the suggested approach. These three steps (prior elicitation, posterior calculation, and robustness to prior uncertainty and model adequacy) are critical to Bayesian inference. The general guidance given here for Normal linear regression tasks is accompanied by a simple, but real-world, metrological example. The calibration of a flow device serves as a running example and illustrates the three steps. It is shown that prior knowledge from previous calibrations of the same sonic nozzle enables robust predictions even for extrapolations.
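
    The closed-form posterior that conjugate priors make possible can be illustrated with a deliberately simplified case. The sketch below assumes a straight line through the origin, a Normal prior on the slope, and a known noise standard deviation; the tutorial's own prior class and software are not shown in this abstract, so the function name, the simplification, and the numbers are assumptions for illustration.

```python
def posterior_slope(xs, ys, prior_mean, prior_sd, noise_sd):
    """Posterior mean and sd of b in y = b*x + e, e ~ N(0, noise_sd^2).

    Assumes a N(prior_mean, prior_sd^2) prior on b and known noise_sd,
    so the posterior is Normal and available in closed form.
    """
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = sum(x * x for x in xs) / noise_sd ** 2
    post_prec = prior_prec + data_prec
    weighted_xy = sum(x * y for x, y in zip(xs, ys)) / noise_sd ** 2
    post_mean = (prior_prec * prior_mean + weighted_xy) / post_prec
    return post_mean, post_prec ** -0.5


# Made-up data: prior knowledge pulls the estimate towards the prior mean.
mean, sd = posterior_slope([1.0, 2.0], [2.0, 4.0],
                           prior_mean=0.0, prior_sd=1.0, noise_sd=1.0)
```

    The posterior precision is the sum of the prior precision and the data precision, so the posterior standard deviation is never larger than the prior one; no Markov Chain Monte Carlo sampling is needed in this setting.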

  8. Practical Session: Simple Linear Regression

    NASA Astrophysics Data System (ADS)

    Clausel, M.; Grégoire, G.

    2014-12-01

    Two exercises are proposed to illustrate simple linear regression. The first one is based on Galton's famous data set on heredity. We use the lm R command and obtain coefficient estimates, the standard error of the error, R2, residuals, … In the second example, devoted to data related to the vapor tension of mercury, we fit a simple linear regression, predict values, and look ahead to multiple linear regression. This practical session is an excerpt from practical exercises proposed by A. Dalalyan at EPNC (see Exercises 1 and 2 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_4.pdf).
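
    The quantities listed in this record (coefficient estimates, residuals, R2, residual standard error) follow from the ordinary least squares formulas. The exercises themselves use R's lm command; the sketch below is a plain-Python illustration with a hypothetical function name and made-up data, not Galton's data set.

```python
import math


def simple_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return {
        "intercept": intercept,
        "slope": slope,
        "residuals": residuals,
        "r2": 1.0 - ss_res / ss_tot,
        # Residual standard error uses n - 2 degrees of freedom.
        "sigma": math.sqrt(ss_res / (n - 2)),
    }


# Made-up, exactly linear data: y = 1 + 2x.
fit = simple_ols([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

    Predicting a new value is then a matter of evaluating intercept + slope * x at the new x.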

  9. Splines for Diffeomorphic Image Regression

    PubMed Central

    Singh, Nikhil; Niethammer, Marc

    2016-01-01

    This paper develops a method for splines on diffeomorphisms for image regression. In contrast to previously proposed methods to capture image changes over time, such as geodesic regression, the method can capture more complex spatio-temporal deformations. In particular, it is a first step towards capturing periodic motions for example of the heart or the lung. Starting from a variational formulation of splines the proposed approach allows for the use of temporal control points to control spline behavior. This necessitates the development of a shooting formulation for splines. Experimental results are shown for synthetic and real data. The performance of the method is compared to geodesic regression. PMID:25485370

  10. Modeling confounding by half-sibling regression

    PubMed Central

    Schölkopf, Bernhard; Hogg, David W.; Wang, Dun; Foreman-Mackey, Daniel; Janzing, Dominik; Simon-Gabriel, Carl-Johann; Peters, Jonas

    2016-01-01

    We describe a method for removing the effect of confounders to reconstruct a latent quantity of interest. The method, referred to as “half-sibling regression,” is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification, discussing both independent and identically distributed as well as time series data, respectively, and illustrate the potential of the method in a challenging astronomy application. PMID:27382154

  11. Modeling confounding by half-sibling regression.

    PubMed

    Schölkopf, Bernhard; Hogg, David W; Wang, Dun; Foreman-Mackey, Daniel; Janzing, Dominik; Simon-Gabriel, Carl-Johann; Peters, Jonas

    2016-07-01

    We describe a method for removing the effect of confounders to reconstruct a latent quantity of interest. The method, referred to as "half-sibling regression," is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification, discussing both independent and identically distributed as well as time series data, respectively, and illustrate the potential of the method in a challenging astronomy application. PMID:27382154

  12. Bayesian Ensemble Trees (BET) for Clustering and Prediction in Heterogeneous Data

    PubMed Central

    Duan, Leo L.; Clancy, John P.; Szczesniak, Rhonda D.

    2016-01-01

    We propose a novel “tree-averaging” model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian Ensemble Trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with the other ensemble methods, BET requires far fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides a variable selection criterion and interpretation for each subset. We developed an efficient estimating procedure with improved estimation strategies in both CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplemental materials are available online. PMID:27524872

  13. Multiple Regression and Its Discontents

    ERIC Educational Resources Information Center

    Snell, Joel C.; Marsh, Mitchell

    2012-01-01

    Multiple regression is part of a larger statistical strategy originated by Gauss. The authors raise questions about the theory and suggest some changes that would make room for Mandelbrot and Serendipity.

  14. Basis Selection for Wavelet Regression

    NASA Technical Reports Server (NTRS)

    Wheeler, Kevin R.; Lau, Sonie (Technical Monitor)

    1998-01-01

    A wavelet basis selection procedure is presented for wavelet regression. Both the basis and the threshold are selected using cross-validation. The method includes the capability of incorporating prior knowledge on the smoothness (or shape of the basis functions) into the basis selection procedure. The results of the method are demonstrated on sampled functions widely used in the wavelet regression literature. The results of the method are contrasted with other published methods.

  15. Regression methods for spatial data

    NASA Technical Reports Server (NTRS)

    Yakowitz, S. J.; Szidarovszky, F.

    1982-01-01

    The kriging approach, a parametric regression method used by hydrologists and mining engineers, among others, also provides an error estimate for the integral of the regression function. The kriging method is explored and some of its statistical characteristics are described. The Watson method and theory are extended so that the kriging features are displayed. Theoretical and computational comparisons of the kriging and Watson approaches are offered.

  16. Species integrity in trees.

    PubMed

    Ortiz-Barrientos, Daniel; Baack, Eric J

    2014-09-01

    From California sequoia, to Australian eucalyptus, to the outstanding diversity of Amazonian forests, trees are fundamental to many processes in ecology and evolution. Trees define the communities that they inhabit, are host to a multiplicity of other organisms and can determine the ecological dynamics of other plants and animals. Trees are also at the heart of major patterns of biodiversity such as the latitudinal gradient of species diversity and thus are important systems for studying the origin of new plant species. Although the role of trees in community assembly and ecological succession is partially understood, the origin of tree diversity remains largely opaque. For instance, the relative importance of differing habitats and phenologies as barriers to hybridization between closely related species is still largely uncharacterized in trees. Consequently, we know very little about the origin of tree species and their integrity. Similarly, studies on the interplay between speciation and tree community assembly are in their infancy and so are studies on how processes like forest maturation modify the context in which reproductive isolation evolves. In this issue of Molecular Ecology, Lindtke et al. (2014) and Lagache et al. (2014) overcome some traditional difficulties in studying mating systems and sexual isolation in the iconic oaks and poplars, providing novel insights about the integrity of tree species and on how ecology leads to variation in selection on reproductive isolation over time and space. PMID:25155715

  17. More Trees, More Poverty? The Socioeconomic Effects of Tree Plantations in Chile, 2001-2011

    NASA Astrophysics Data System (ADS)

    Andersson, Krister; Lawrence, Duncan; Zavaleta, Jennifer; Guariguata, Manuel R.

    2016-01-01

    Tree plantations play a controversial role in many nations' efforts to balance goals for economic development, ecological conservation, and social justice. This paper seeks to contribute to this debate by analyzing the socioeconomic impact of such plantations. We focus our study on Chile, a country that has experienced extraordinary growth of industrial tree plantations. Our analysis draws on a unique dataset with longitudinal observations collected in 180 municipal territories during 2001-2011. Employing panel data regression techniques, we find that growth in plantation area is associated with higher than average rates of poverty during this period.

  18. More Trees, More Poverty? The Socioeconomic Effects of Tree Plantations in Chile, 2001-2011.

    PubMed

    Andersson, Krister; Lawrence, Duncan; Zavaleta, Jennifer; Guariguata, Manuel R

    2016-01-01

    Tree plantations play a controversial role in many nations' efforts to balance goals for economic development, ecological conservation, and social justice. This paper seeks to contribute to this debate by analyzing the socioeconomic impact of such plantations. We focus our study on Chile, a country that has experienced extraordinary growth of industrial tree plantations. Our analysis draws on a unique dataset with longitudinal observations collected in 180 municipal territories during 2001-2011. Employing panel data regression techniques, we find that growth in plantation area is associated with higher than average rates of poverty during this period. PMID:26285776

  19. Tree growth response to ENSO in Durango, Mexico

    NASA Astrophysics Data System (ADS)

    Pompa-García, Marin; Miranda-Aragón, Liliana; Aguirre-Salado, Carlos Arturo

    2015-01-01

    The dynamics of forest ecosystems worldwide have been driven largely by climatic teleconnections. El Niño-Southern Oscillation (ENSO) is the strongest interannual variation of the Earth's climate, affecting the regional climatic regime. These teleconnections may impact plant phenology, growth rate, forest extent, and other gradual changes in forest ecosystems. The objective of this study was to investigate how Pinus cooperi populations face the influence of ENSO and regional microclimates in five ecozones in northwestern Mexico. Using standard dendrochronological techniques, tree-ring chronologies (TRI) were generated. TRI, ENSO, and climate relationships were correlated from 1950-2010. Additionally, multiple regressions were conducted in order to detect those ENSO months with direct relations in TRI (p < 0.1). The five chronologies showed similar trends during the period they overlapped, indicating that the P. cooperi populations shared an interannual growth variation. In general, ENSO index showed correspondences with tree-ring growth in synchronous periods. We concluded that ENSO had connectivity with regional climate in northern Mexico and radial growth of P. cooperi populations has been driven largely by positive ENSO values (El Niño episodes).

  20. Potomac River Streamflow Since 1730 as Reconstructed by Tree Rings.

    NASA Astrophysics Data System (ADS)

    Cook, Edward R.; Jacoby, Gordon C.

    1983-10-01

    A 248-year reconstruction of the low-flow (July, August and September) period of the Potomac River indicates that the prolonged drought of the 1960s may have been the most severe since 1730. However, there appear to have been several long periods of about 50 years in length when flow was generally above or below the long-term median flow. The period from 1900 through 1950, which comprises most of the measured flow period, was generally above median. Long-period climatic shifts can have important water resource implications. The Potomac River streamflow at Point of Rocks, Maryland was reconstructed by using tree-ring chronologies from sites in or near the river basin. Canonical regression analysis was used to reconstruct simultaneously July, August and September discharge after screening all the tree-ring predictors. Verification statistics and cross-spectral analysis indicate that the average reconstruction of these three months is most reliable for periods longer than about six years and shorter than about three years. Spectral analysis of the reconstruction indicates the presence of a 15.7-year periodicity that warrants verification through examination of meteorological data, as well as through additional streamflow reconstructions in the region.

  1. Tree growth response to ENSO in Durango, Mexico.

    PubMed

    Pompa-García, Marin; Miranda-Aragón, Liliana; Aguirre-Salado, Carlos Arturo

    2015-01-01

    The dynamics of forest ecosystems worldwide have been driven largely by climatic teleconnections. El Niño-Southern Oscillation (ENSO) is the strongest interannual variation of the Earth's climate, affecting the regional climatic regime. These teleconnections may impact plant phenology, growth rate, forest extent, and other gradual changes in forest ecosystems. The objective of this study was to investigate how Pinus cooperi populations face the influence of ENSO and regional microclimates in five ecozones in northwestern Mexico. Using standard dendrochronological techniques, tree-ring chronologies (TRI) were generated. TRI, ENSO, and climate relationships were correlated from 1950-2010. Additionally, multiple regressions were conducted in order to detect those ENSO months with direct relations in TRI (p < 0.1). The five chronologies showed similar trends during the period they overlapped, indicating that the P. cooperi populations shared an interannual growth variation. In general, ENSO index showed correspondences with tree-ring growth in synchronous periods. We concluded that ENSO had connectivity with regional climate in northern Mexico and radial growth of P. cooperi populations has been driven largely by positive ENSO values (El Niño episodes). PMID:24728555

  2. Food additives

    MedlinePlus

    Food additives are substances that become part of a food product when they are added during the processing or making of that food. "Direct" food additives are often added during processing to: Add nutrients ...

  3. Demonstration of a Fiber Optic Regression Probe

    NASA Technical Reports Server (NTRS)

    Korman, Valentin; Polzin, Kurt A.

    2010-01-01

    The capability to provide localized, real-time monitoring of material regression rates in various applications has the potential to provide a new stream of data for development testing of various components and systems, as well as serving as a monitoring tool in flight applications. These applications include, but are not limited to, the regression of a combusting solid fuel surface, the ablation of the throat in a chemical rocket or the heat shield of an aeroshell, and the monitoring of erosion in long-life plasma thrusters. The rate of regression in the first application is very fast, while the second and third are increasingly slower. A recent fundamental sensor development effort has led to a novel regression, erosion, and ablation sensor technology (REAST). The REAST sensor allows for measurement of real-time surface erosion rates at a discrete surface location. The sensor is optical, using two different, co-located fiber-optics to perform the regression measurement. The disparate optical transmission properties of the two fiber-optics make it possible to measure the regression rate by monitoring the relative light attenuation through the fibers. As the fibers regress along with the parent material in which they are embedded, the relative light intensities through the two fibers change, providing a measure of the regression rate. The optical nature of the system makes it relatively easy to use in a variety of harsh, high temperature environments, and it is also unaffected by the presence of electric and magnetic fields. In addition, the sensor could be used to perform optical spectroscopy on the light emitted by a process and collected by fibers, giving localized measurements of various properties. The capability to perform an in-situ measurement of material regression rates is useful in addressing a variety of physical issues in various applications. 
An in-situ measurement allows for real-time data regarding the erosion rates, providing a quick method for

  4. Functional Generalized Additive Models.

    PubMed

    McLean, Mathew W; Hooker, Giles; Staicu, Ana-Maria; Scheipl, Fabian; Ruppert, David

    2014-01-01

    We introduce the functional generalized additive model (FGAM), a novel regression model for association studies between a scalar response and a functional predictor. We model the link-transformed mean response as the integral with respect to t of F{X(t), t} where F(·,·) is an unknown regression function and X(t) is a functional covariate. Rather than having an additive model in a finite number of principal components as in Müller and Yao (2008), our model incorporates the functional predictor directly and thus our model can be viewed as the natural functional extension of generalized additive models. We estimate F(·,·) using tensor-product B-splines with roughness penalties. A pointwise quantile transformation of the functional predictor is also considered to ensure each tensor-product B-spline has observed data on its support. The methods are evaluated using simulated data and their predictive performance is compared with other competing scalar-on-function regression alternatives. We illustrate the usefulness of our approach through an application to brain tractography, where X(t) is a signal from diffusion tensor imaging at position, t, along a tract in the brain. In one example, the response is disease-status (case or control) and in a second example, it is the score on a cognitive test. R code for performing the simulations and fitting the FGAM can be found in supplemental materials available online. PMID:24729671

  5. Food additives

    PubMed Central

    Spencer, Michael

    1974-01-01

    Food additives are discussed from the food technology point of view. The reasons for their use are summarized: (1) to protect food from chemical and microbiological attack; (2) to even out seasonal supplies; (3) to improve their eating quality; (4) to improve their nutritional value. The various types of food additives are considered, e.g. colours, flavours, emulsifiers, bread and flour additives, preservatives, and nutritional additives. The paper concludes with consideration of those circumstances in which the use of additives is (a) justified and (b) unjustified. PMID:4467857

  6. Comprehensive Decision Tree Models in Bioinformatics

    PubMed Central

    Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter

    2012-01-01

    Purpose: Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. Methods: This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by the so-called one-button data mining approach where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model that is constrained exclusively by the dimensions of the produced decision tree. Results: The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. Conclusions: The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class

  7. Is susceptibility to prenatal methylmercury exposure from fish consumption non-homogeneous? Tree-structured analysis for the Seychelles Child Development Study.

    PubMed

    Huang, Li-Shan; Myers, Gary J; Davidson, Philip W; Cox, Christopher; Xiao, Fenyuan; Thurston, Sally W; Cernichiari, Elsa; Shamlaye, Conrad F; Sloane-Reeves, Jean; Georger, Lesley; Clarkson, Thomas W

    2007-11-01

    Studies of the association between prenatal methylmercury exposure from maternal fish consumption during pregnancy and neurodevelopmental test scores in the Seychelles Child Development Study have found no consistent pattern of associations through age 9 years. The analyses for the most recent 9-year data examined the population effects of prenatal exposure, but did not address the possibility of non-homogeneous susceptibility. This paper presents a regression tree approach: covariate effects are treated non-linearly and non-additively and non-homogeneous effects of prenatal methylmercury exposure are permitted among the covariate clusters identified by the regression tree. The approach allows us to address whether children in the lower or higher ends of the developmental spectrum differ in susceptibility to subtle exposure effects. Of 21 endpoints available at age 9 years, we chose the Wechsler Full Scale IQ and its associated covariates to construct the regression tree. The prenatal mercury effect in each of the nine resulting clusters was assessed linearly and non-homogeneously. In addition we reanalyzed five other 9-year endpoints that in the linear analysis had a two-tailed p-value <0.2 for the effect of prenatal exposure. In this analysis, motor proficiency and activity level improved significantly with increasing MeHg for 53% of the children who had an average home environment. Motor proficiency significantly decreased with increasing prenatal MeHg exposure in 7% of the children whose home environment was below average. The regression tree results support previous analyses of outcomes in this cohort. However, this analysis raises the intriguing possibility that an effect may be non-homogeneous among children with different backgrounds and IQ levels. PMID:17942158

  8. Interpretation of Standardized Regression Coefficients in Multiple Regression.

    ERIC Educational Resources Information Center

    Thayer, Jerome D.

    The extent to which standardized regression coefficients (beta values) can be used to determine the importance of a variable in an equation was explored. The beta value and the part correlation coefficient--also called the semi-partial correlation coefficient and reported in squared form as the incremental "r squared"--were compared for variables…

  9. Demosaicing Based on Directional Difference Regression and Efficient Regression Priors.

    PubMed

    Wu, Jiqing; Timofte, Radu; Van Gool, Luc

    2016-08-01

    Color demosaicing is a key image processing step aiming to reconstruct the missing pixels from a recorded raw image. On the one hand, numerous interpolation methods focusing on spatial-spectral correlations have proved very efficient, but they yield poor image quality and strong visible artifacts. On the other hand, optimization strategies, such as learned simultaneous sparse coding and sparsity and adaptive principal component analysis-based algorithms, were shown to greatly improve image quality compared with that delivered by interpolation methods, but unfortunately are computationally heavy. In this paper, we propose efficient regression priors as a novel, fast post-processing algorithm that learns the regression priors offline from training data. We also propose an independent efficient demosaicing algorithm based on directional difference regression, and introduce its enhanced version based on fused regression. We achieve an image quality comparable to that of the state-of-the-art methods for three benchmarks, while being order(s) of magnitude faster. PMID:27254866

  10. Interquantile Shrinkage in Regression Models

    PubMed Central

    Jiang, Liewen; Wang, Huixia Judy; Bondell, Howard D.

    2012-01-01

    Conventional analysis using quantile regression typically focuses on fitting the regression model at different quantiles separately. However, in situations where the quantile coefficients share some common feature, joint modeling of multiple quantiles to accommodate the commonality often leads to more efficient estimation. One example of common features is that a predictor may have a constant effect over one region of quantile levels but varying effects in other regions. To automatically perform estimation and detection of the interquantile commonality, we develop two penalization methods. When the quantile slope coefficients indeed do not change across quantile levels, the proposed methods will shrink the slopes towards constant and thus improve the estimation efficiency. We establish the oracle properties of the two proposed penalization methods. Through numerical investigations, we demonstrate that the proposed methods lead to estimations with competitive or higher efficiency than the standard quantile regression estimation in finite samples. Supplemental materials for the article are available online. PMID:24363546
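    The joint estimation idea described above can be sketched in a few lines. The example below is a minimal illustration, not the authors' estimator: it assumes a single predictor, the standard quantile check loss, and a plain (non-adaptive) fused penalty on differences between slopes at adjacent quantile levels. The function names and the tuning parameter `lam` are hypothetical.

```python
def check_loss(u, tau):
    """Quantile check (pinball) loss: u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def fused_quantile_objective(x, y, taus, intercepts, slopes, lam):
    """Sum of check losses over all quantile levels plus a fused penalty.

    The penalty lam * sum_k |slope_k - slope_{k-1}| is zero when the slope
    is constant across quantile levels, so minimizing the objective tends
    to pull slopes at neighbouring quantile levels together.
    (Illustrative objective only; the published penalties may differ.)
    """
    fit = 0.0
    for tau, a, b in zip(taus, intercepts, slopes):
        for xi, yi in zip(x, y):
            fit += check_loss(yi - a - b * xi, tau)
    penalty = sum(abs(slopes[k] - slopes[k - 1]) for k in range(1, len(slopes)))
    return fit + lam * penalty


# Toy data lying exactly on the line y = x.
x = [0.0, 1.0, 2.0]
y = [0.0, 1.0, 2.0]
taus = [0.25, 0.5, 0.75]

# Constant slope across quantiles: no fit error and no penalty.
common = fused_quantile_objective(x, y, taus, [0.0] * 3, [1.0, 1.0, 1.0], lam=2.0)
# Letting the upper-quantile slope drift is charged by both terms.
varying = fused_quantile_objective(x, y, taus, [0.0] * 3, [1.0, 1.0, 1.5], lam=2.0)
print(common, varying)
```

    In this toy setting the objective is smaller when the three slopes coincide, which is the behaviour a shrinkage penalty of this kind is meant to reward.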

  11. Survival Data and Regression Models

    NASA Astrophysics Data System (ADS)

    Grégoire, G.

    2014-12-01

    We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right censored data and develop two types of regression models. The first one concerns the so-called accelerated failure time models (AFT), which are parametric models where a function of a parameter depends linearly on the covariables. The second one is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, although we recall some essential results about the ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.
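    In common textbook notation (the symbols are illustrative and not taken from the chapter), the two model classes mentioned above are usually written as follows, with T the survival time, x the covariate vector, β the regression coefficients, ε an error term with a specified distribution, and h_0 an unspecified baseline hazard:

```latex
% Accelerated failure time (AFT) model: the log survival time is linear in the covariates
\log T = x^{\top}\beta + \sigma\varepsilon

% Semiparametric proportional hazards model: covariates act multiplicatively on the hazard
h(t \mid x) = h_0(t)\,\exp\!\left(x^{\top}\beta\right)
```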

  12. Trees Are Terrific!

    ERIC Educational Resources Information Center

    Braus, Judy, Ed.

    1992-01-01

    Ranger Rick's NatureScope is a creative education series dedicated to inspiring in children an understanding and appreciation of the natural world while developing the skills they will need to make responsible decisions about the environment. Contents are organized into the following sections: (1) "What Makes a Tree a Tree?," including information…

  13. The Flame Tree

    ERIC Educational Resources Information Center

    Lewis, Richard

    2004-01-01

    Lewis's own experiences living in Indonesia are fertile ground for telling "a ripping good story," one found in "The Flame Tree." He hopes people will enjoy the tale and appreciate the differences of an unfamiliar culture. The excerpt from "The Flame Tree" will reel readers in quickly.

  14. Trees for Mother Earth.

    ERIC Educational Resources Information Center

    Greer, Sandy

    1993-01-01

    Describes Trees for Mother Earth, a program in which secondary students raise funds to buy fruit trees to plant during visits to the Navajo Reservation. Benefits include developing feelings of self-worth among participants, promoting cultural exchange and understanding, and encouraging self-sufficiency among the Navajo. (LP)

  15. Tree Topology Estimation.

    PubMed

    Estrada, Rolando; Tomasi, Carlo; Schmidler, Scott C; Farsiu, Sina

    2015-08-01

    Tree-like structures are fundamental in nature, and it is often useful to reconstruct the topology of a tree - what connects to what - from a two-dimensional image of it. However, the projected branches often cross in the image: the tree projects to a planar graph, and the inverse problem of reconstructing the topology of the tree from that of the graph is ill-posed. We regularize this problem with a generative, parametric tree-growth model. Under this model, reconstruction is possible in linear time if one knows the direction of each edge in the graph - which edge endpoint is closer to the root of the tree - but becomes NP-hard if the directions are not known. For the latter case, we present a heuristic search algorithm to estimate the most likely topology of a rooted, three-dimensional tree from a single two-dimensional image. Experimental results on retinal vessel, plant root, and synthetic tree data sets show that our methodology is both accurate and efficient. PMID:26353004

  16. Structural Equation Model Trees

    ERIC Educational Resources Information Center

    Brandmaier, Andreas M.; von Oertzen, Timo; McArdle, John J.; Lindenberger, Ulman

    2013-01-01

    In the behavioral and social sciences, structural equation models (SEMs) have become widely accepted as a modeling tool for the relation between latent and observed variables. SEMs can be seen as a unification of several multivariate analysis techniques. SEM Trees combine the strengths of SEMs and the decision tree paradigm by building tree…

  17. CSI for Trees

    ERIC Educational Resources Information Center

    Rubino, Darrin L.; Hanson, Deborah

    2009-01-01

    The circles and patterns in a tree's stem tell a story, but that story can be a mystery. Interpreting the story of tree rings provides a way to heighten the natural curiosity of students and help them gain insight into the interaction of elements in the environment. It also represents a wonderful opportunity to incorporate the nature of science.…

  18. Tree nut oils

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The major tree nuts include almonds, Brazil nuts, cashew nuts, hazelnuts, macadamia nuts, pecans, pine nuts, pistachio nuts, and walnuts. Tree nut oils are appreciated in food applications because of their flavors and are generally more expensive than other gourmet oils. Research during the last de...

  19. ECOLOGICAL RESPONSE SURFACES FOR NORTH AMERICAN BOREAL TREE SPECIES AND THEIR USE IN FOREST CLASSIFICATION

    EPA Science Inventory

    Empirical ecological response surfaces were derived for eight dominant tree species in the boreal forest region of Canada. Stepwise logistic regression was used to model species dominance as a response to five climatic predictor variables. The predictor variables (annual snowfall, ...

  20. A Decision Tree Approach to the Interpretation of Multivariate Statistical Techniques.

    ERIC Educational Resources Information Center

    Fok, Lillian Y.; And Others

    1995-01-01

    Discusses the nature, power, and limitations of four multivariate techniques: factor analysis, multiple analysis of variance, multiple regression, and multiple discriminant analysis. Shows how decision trees assist in interpreting results. (SK)

  1. From Family Trees to Decision Trees.

    ERIC Educational Resources Information Center

    Trobian, Helen R.

    This paper is a preliminary inquiry by a non-mathematician into graphic methods of sequential planning and ways in which hierarchical analysis and tree structures can be helpful in developing interest in the use of mathematical modeling in the search for creative solutions to real-life problems. Highlights include a discussion of hierarchical…

  2. Cactus: An Introduction to Regression

    ERIC Educational Resources Information Center

    Hyde, Hartley

    2008-01-01

    When the author first used "VisiCalc," the author thought it a very useful tool when he had the formulas. But how could he design a spreadsheet if there was no known formula for the quantities he was trying to predict? A few months later, the author relates he learned to use multiple linear regression software and suddenly it all clicked into…

  3. Regression modelling of Dst index

    NASA Astrophysics Data System (ADS)

    Parnowski, Aleksei

    We developed a new approach to the problem of real-time space weather indices forecasting using readily available data from ACE and a number of ground stations. It is based on the regression modelling method [1-3], which combines the benefits of empirical and statistical approaches. Mathematically it is based upon the partial regression analysis and Monte Carlo simulations to deduce the empirical relationships in the system. The typical elapsed time per forecast is a few seconds on an average PC. This technique can be easily extended to other indices like AE and Kp. The proposed system can also be useful for investigating physical phenomena related to interactions between the solar wind and the magnetosphere; it has already helped uncover two new geoeffective parameters. 1. Parnowski A.S. Regression modeling method of space weather prediction // Astrophysics Space Science. 2009. V. 323, 2. P. 169-180. doi:10.1007/s10509-009-0060-4 [arXiv:0906.3271] 2. Parnovskiy A.S. Regression Modeling and its Application to the Problem of Prediction of Space Weather // Journal of Automation and Information Sciences. 2009. V. 41, 5. P. 61-69. doi:10.1615/JAutomatInfScien.v41.i5.70 3. Parnowski A.S. Statistically predicting Dst without satellite data // Earth, Planets and Space. 2009. V. 61, 5. P. 621-624.

  4. Fungible Weights in Multiple Regression

    ERIC Educational Resources Information Center

    Waller, Niels G.

    2008-01-01

    Every set of alternate weights (i.e., nonleast squares weights) in a multiple regression analysis with three or more predictors is associated with an infinite class of weights. All members of a given class can be deemed "fungible" because they yield identical "SSE" (sum of squared errors) and R[superscript 2] values. Equations for generating…

  5. Spontaneous regression of breast cancer.

    PubMed

    Lewison, E F

    1976-11-01

    The dramatic but rare regression of a verified case of breast cancer in the absence of adequate, accepted, or conventional treatment has been observed and documented by clinicians over the course of many years. In my practice limited to diseases of the breast, over the past 25 years I have observed 12 patients with a unique and unusual clinical course valid enough to be regarded as spontaneous regression of breast cancer. These 12 patients, with clinically confirmed breast cancer, had temporary arrest or partial remission of their disease in the absence of complete or adequate treatment. In most of these cases, spontaneous regression could not be equated ultimately with permanent cure. Three of these case histories are summarized, and patient characteristics of pertinent clinical interest in the remaining case histories are presented and discussed. Despite widespread doubt and skepticism, there is ample clinical evidence to confirm the fact that spontaneous regression of breast cancer is a rare phenomenon but is real and does occur. PMID:799758

  6. Regression Models of Atlas Appearance

    PubMed Central

    Rohlfing, Torsten; Sullivan, Edith V.; Pfefferbaum, Adolf

    2010-01-01

    Models of object appearance based on principal components analysis provide powerful and versatile tools in computer vision and medical image analysis. A major shortcoming is that they rely entirely on the training data to extract principal modes of appearance variation and ignore underlying variables (e.g., subject age, gender). This paper introduces an appearance modeling framework based instead on generalized multi-linear regression. The training of regression appearance models is controlled by independent variables. This makes it straightforward to create model instances for specific values of these variables, which is akin to model interpolation. We demonstrate the new framework by creating an appearance model of the human brain from MR images of 36 subjects. Instances of the model created for different ages are compared with average shape atlases created from age-matched sub-populations. Relative tissue volumes vs. age in models are also compared with tissue volumes vs. subject age in the original images. In both experiments, we found excellent agreement between the regression models and the comparison data. We conclude that regression appearance models are a promising new technique for image analysis, with one potential application being the representation of a continuum of mutually consistent, age-specific atlases of the human brain. PMID:19694260

  7. Correlation Weights in Multiple Regression

    ERIC Educational Resources Information Center

    Waller, Niels G.; Jones, Jeff A.

    2010-01-01

    A general theory on the use of correlation weights in linear prediction has yet to be proposed. In this paper we take initial steps in developing such a theory by describing the conditions under which correlation weights perform well in population regression models. Using OLS weights as a comparison, we define cases in which the two weighting…

  8. Quantile Regression with Censored Data

    ERIC Educational Resources Information Center

    Lin, Guixian

    2009-01-01

    The Cox proportional hazards model and the accelerated failure time model are frequently used in survival data analysis. They are powerful, yet have limitation due to their model assumptions. Quantile regression offers a semiparametric approach to model data with possible heterogeneity. It is particularly powerful for censored responses, where the…

  9. Regression models of atlas appearance.

    PubMed

    Rohlfing, Torsten; Sullivan, Edith V; Pfefferbaum, Adolf

    2009-01-01

    Models of object appearance based on principal components analysis provide powerful and versatile tools in computer vision and medical image analysis. A major shortcoming is that they rely entirely on the training data to extract principal modes of appearance variation and ignore underlying variables (e.g., subject age, gender). This paper introduces an appearance modeling framework based instead on generalized multi-linear regression. The training of regression appearance models is controlled by independent variables. This makes it straightforward to create model instances for specific values of these variables, which is akin to model interpolation. We demonstrate the new framework by creating an appearance model of the human brain from MR images of 36 subjects. Instances of the model created for different ages are compared with average shape atlases created from age-matched sub-populations. Relative tissue volumes vs. age in models are also compared with tissue volumes vs. subject age in the original images. In both experiments, we found excellent agreement between the regression models and the comparison data. We conclude that regression appearance models are a promising new technique for image analysis, with one potential application being the representation of a continuum of mutually consistent, age-specific atlases of the human brain. PMID:19694260

  10. Ridge Regression for Interactive Models.

    ERIC Educational Resources Information Center

    Tate, Richard L.

    1988-01-01

    An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are favorable to…

  11. Feedback of trees on nitrogen mineralization to restrict the advance of trees in C4 savannahs.

    PubMed

    Higgins, Steven I; Keretetse, Moagi; February, Edmund C

    2015-08-01

    Remote sensing studies suggest that savannahs are transforming into more tree-dominated states; however, progressive nitrogen limitation could potentially retard this putatively CO2-driven invasion. We analysed controls on nitrogen mineralization rates in savannah by manipulating rainfall and the cover of grass and tree elements against the backdrop of the seasonal temperature and rainfall variation. We found that the seasonal pattern of nitrogen mineralization was strongly influenced by rainfall, and that manipulative increases in rainfall could boost mineralization rates. Additionally, mineralization rates were considerably higher on plots with grasses and lower on plots with trees. Our findings suggest that shifting a savannah from a grass to a tree-dominated state can substantially reduce nitrogen mineralization rates, thereby potentially creating a negative feedback on the CO2-induced invasion of savannahs by trees. PMID:26268994

  12. Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors

    PubMed Central

    Woodard, Dawn B.; Crainiceanu, Ciprian; Ruppert, David

    2013-01-01

    We propose a new method for regression using a parsimonious and scientifically interpretable representation of functional predictors. Our approach is designed for data that exhibit features such as spikes, dips, and plateaus whose frequency, location, size, and shape varies stochastically across subjects. We propose Bayesian inference of the joint functional and exposure models, and give a method for efficient computation. We contrast our approach with existing state-of-the-art methods for regression with functional predictors, and show that our method is more effective and efficient for data that include features occurring at varying locations. We apply our methodology to a large and complex dataset from the Sleep Heart Health Study, to quantify the association between sleep characteristics and health outcomes. Software and technical appendices are provided in online supplemental materials. PMID:24293988

  13. Tree cover variability in the District of Columbia

    NASA Astrophysics Data System (ADS)

    Johnston, Andrew K.

    Urban forests are increasingly a focus of interest as urbanized populations grow and urban areas expand. Urban forests change as trees are planted, grow, die, and are removed. These processes alter a city's tree cover over time, but this inherent dynamism is poorly understood. Better understanding of how tree cover is a variable land cover component will enhance knowledge of the urban environment and provide new perspectives for management of urban resources. In this study, tree cover variability within a major urban center was observed over a 20 year period. Changes in tree cover proportion were measured in the District of Columbia between 1984-2004 utilizing highly calibrated satellite remote sensing data. Testing of alternate methodologies demonstrated that an approach utilizing support vector regression provided most consistent accuracy across land use types. Tree cover maps were validated using aerial photography imagery and data from field surveys. Between 1984-2004, the city-wide tree cover remained between 22.1(+/-2.9)% and 28.8(+/-2.9)% of total land surface area. The District of Columbia did not experience an overall increase or decrease in total tree canopy area. Spatial patterns of tree cover variability were investigated to identify local scale changes in tree cover and connections with urban land use. Within the city, greatest variability was observed in low density residential zones. Tree cover proportion in these zones declined 7.4(+/-5.4)% in the years between 1990-1996 and recovered after 1996. Changes in tree cover were observed with high resolution aerial photography to determine relative contribution from fluctuation in the number of standing trees and changes in crown sizes. Land cover conversion removed dense tree cover from 50.2 hectares of the city's land surface between 1984-2004. The results demonstrate that tree cover variability in the District of Columbia occurred primarily within low population density residential areas. 
Neighborhoods

  14. A new method for dealing with measurement error in explanatory variables of regression models.

    PubMed

    Freedman, Laurence S; Fainberg, Vitaly; Kipnis, Victor; Midthune, Douglas; Carroll, Raymond J

    2004-03-01

    We introduce a new method, moment reconstruction, of correcting for measurement error in covariates in regression models. The central idea is similar to regression calibration in that the values of the covariates that are measured with error are replaced by "adjusted" values. In regression calibration the adjusted value is the expectation of the true value conditional on the measured value. In moment reconstruction the adjusted value is the variance-preserving empirical Bayes estimate of the true value conditional on the outcome variable. The adjusted values thereby have the same first two moments and the same covariance with the outcome variable as the unobserved "true" covariate values. We show that moment reconstruction is equivalent to regression calibration in the case of linear regression, but leads to different results for logistic regression. For case-control studies with logistic regression and covariates that are normally distributed within cases and controls, we show that the resulting estimates of the regression coefficients are consistent. In simulations we demonstrate that for logistic regression, moment reconstruction carries less bias than regression calibration, and for case-control studies is superior in mean-square error to the standard regression calibration approach. Finally, we give an example of the use of moment reconstruction in linear discriminant analysis and a nonstandard problem where we wish to adjust a classification tree for measurement error in the explanatory variables. PMID:15032787
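    The contrast between the two adjustments can be written out for a scalar covariate. The block below is a sketch in illustrative notation, not a quotation from the paper: X is the true covariate, W its error-prone measurement, Y the outcome, and it is assumed that E(W | Y) = E(X | Y), as holds for classical additive error that is independent of Y.

```latex
% Regression calibration: expectation of the true value given the measured value
X_{\mathrm{RC}}(W) = \mathrm{E}(X \mid W)

% Moment reconstruction: shrink W toward its conditional mean given the outcome Y
X_{\mathrm{MR}}(W, Y) = \mathrm{E}(W \mid Y)\,\{1 - G(Y)\} + W\,G(Y),
\qquad
G(Y) = \sqrt{\frac{\operatorname{Var}(X \mid Y)}{\operatorname{Var}(W \mid Y)}}

% Check: the first two conditional moments of X_MR match those of X
\mathrm{E}(X_{\mathrm{MR}} \mid Y) = \mathrm{E}(W \mid Y) = \mathrm{E}(X \mid Y),
\qquad
\operatorname{Var}(X_{\mathrm{MR}} \mid Y) = G(Y)^{2}\,\operatorname{Var}(W \mid Y) = \operatorname{Var}(X \mid Y)
```

    Because the conditional means given Y agree, the covariance of the adjusted value with Y also equals that of the true covariate, which is the property stated in the abstract.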

  15. Lazy decision trees

    SciTech Connect

    Friedman, J.H.; Yun, Yeogirl; Kohavi, R.

    1996-12-31

    Lazy learning algorithms, exemplified by nearest-neighbor algorithms, do not induce a concise hypothesis from a given training set; the inductive process is delayed until a test instance is given. Algorithms for constructing decision trees, such as C4.5, ID3, and CART create a single "best" decision tree during the training phase, and this tree is then used to classify test instances. The tests at the nodes of the constructed tree are good on average, but there may be better tests for classifying a specific instance. We propose a lazy decision tree algorithm, LazyDT, that conceptually constructs the "best" decision tree for each test instance. In practice, only a path needs to be constructed, and a caching scheme makes the algorithm fast. The algorithm is robust with respect to missing values without resorting to the complicated methods usually seen in induction of decision trees. Experiments on real and artificial problems are presented.

  16. Embedded Sensors for Measuring Surface Regression

    NASA Technical Reports Server (NTRS)

    Gramer, Daniel J.; Taagen, Thomas J.; Vermaak, Anton G.

    2006-01-01

    non-eroding end of the sensor. The sensor signal can be transmitted from inside a high-pressure chamber to the ambient environment, using commercially available feedthrough connectors. Miniaturized internal recorders or wireless data transmission could also potentially be employed to eliminate the need for producing penetrations in the chamber case. The rungs are designed so that as each successive rung is eroded away, the resistance changes by an amount that yields a readily measurable signal larger than the background noise. (In addition, signal-conditioning techniques are used in processing the resistance readings to mitigate the effect of noise.) Hence, each discrete change of resistance serves to indicate the arrival of the regressing host material front at the known depth of the affected resistor rung. The average rate of regression between two adjacent resistors can be calculated simply as the distance between the resistors divided by the time interval between their resistance jumps. Advanced data reduction techniques have also been developed to establish the instantaneous surface position and regression rate when the regressing front is between rungs.
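    The average-rate calculation described above (distance between adjacent rungs divided by the time between their resistance jumps) can be sketched as follows. This is a minimal illustration: the function name, the units, and the sample depths and times are hypothetical and are not data from the report.

```python
def average_regression_rates(rung_depths_mm, break_times_s):
    """Average regression rate (mm/s) between each pair of adjacent rungs.

    rung_depths_mm: known depth of each resistor rung, shallowest first.
    break_times_s:  time at which each rung's resistance jump was observed.
    """
    if len(rung_depths_mm) != len(break_times_s):
        raise ValueError("need one break time per rung")
    rates = []
    for i in range(1, len(rung_depths_mm)):
        distance = rung_depths_mm[i] - rung_depths_mm[i - 1]
        interval = break_times_s[i] - break_times_s[i - 1]
        if interval <= 0:
            raise ValueError("break times must be strictly increasing")
        rates.append(distance / interval)
    return rates


# Made-up example: four rungs 1 mm apart, eroded at 0, 2, 4 and 8 seconds.
rates = average_regression_rates([0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 8.0])
print(rates)  # [0.5, 0.5, 0.25]
```

    The estimate is piecewise constant between rungs; recovering the instantaneous rate while the front is between rungs requires the more advanced data reduction mentioned in the abstract, which is not attempted here.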

  17. Regression Verification Using Impact Summaries

    NASA Technical Reports Server (NTRS)

    Backes, John; Person, Suzette J.; Rungta, Neha; Thachuk, Oksana

    2013-01-01

    Regression verification techniques are used to prove equivalence of syntactically similar programs. Checking equivalence of large programs, however, can be computationally expensive. Existing regression verification techniques rely on abstraction and decomposition techniques to reduce the computational effort of checking equivalence of the entire program. These techniques are sound but not complete. In this work, we propose a novel approach to improve scalability of regression verification by classifying the program behaviors generated during symbolic execution as either impacted or unimpacted. Our technique uses a combination of static analysis and symbolic execution to generate summaries of impacted program behaviors. The impact summaries are then checked for equivalence using an off-the-shelf decision procedure. We prove that our approach is both sound and complete for sequential programs, with respect to the depth bound of symbolic execution. Our evaluation on a set of sequential C artifacts shows that reducing the size of the summaries can help reduce the cost of software equivalence checking. Various reduction, abstraction, and compositional techniques have been developed to help scale software verification techniques to industrial-sized systems. Although such techniques have greatly increased the size and complexity of systems that can be checked, analysis of large software systems remains costly. Regression analysis techniques, e.g., regression testing [16], regression model checking [22], and regression verification [19], restrict the scope of the analysis by leveraging the differences between program versions. These techniques are based on the idea that if code is checked early in development, then subsequent versions can be checked against a prior (checked) version, leveraging the results of the previous analysis to reduce analysis cost of the current version.
Regression verification addresses the problem of proving equivalence of closely related program

  18. Convex Regression with Interpretable Sharp Partitions

    PubMed Central

    Petersen, Ashley; Simon, Noah; Witten, Daniela

    2016-01-01

    We consider the problem of predicting an outcome variable on the basis of a small number of covariates, using an interpretable yet non-additive model. We propose convex regression with interpretable sharp partitions (CRISP) for this task. CRISP partitions the covariate space into blocks in a data-adaptive way, and fits a mean model within each block. Unlike other partitioning methods, CRISP is fit using a non-greedy approach by solving a convex optimization problem, resulting in low-variance fits. We explore the properties of CRISP, and evaluate its performance in a simulation study and on a housing price data set.

  19. Phylogenetic Tree Reconstruction Accuracy and Model Fit when Proportions of Variable Sites Change across the Tree

    PubMed Central

    Grievink, Liat Shavit; Penny, David; Hendy, Michael D.; Holland, Barbara R.

    2010-01-01

    Commonly used phylogenetic models assume a homogeneous process through time in all parts of the tree. However, it is known that these models can be too simplistic as they do not account for nonhomogeneous lineage-specific properties. In particular, it is now widely recognized that as constraints on sequences evolve, the proportion and positions of variable sites can vary between lineages causing heterotachy. The extent to which this model misspecification affects tree reconstruction is still unknown. Here, we evaluate the effect of changes in the proportions and positions of variable sites on model fit and tree estimation. We consider 5 current models of nucleotide sequence evolution in a Bayesian Markov chain Monte Carlo framework as well as maximum parsimony (MP). We show that for a tree with 4 lineages where 2 nonsister taxa undergo a change in the proportion of variable sites, tree reconstruction under the best-fitting model, which is chosen using a relative test, often results in the wrong tree. In this case, we found that an absolute test of model fit is a better predictor of tree estimation accuracy. We also found further evidence that MP is not immune to heterotachy. In addition, we show that increased sampling of taxa that have undergone a change in proportion and positions of variable sites is critical for accurate tree reconstruction. PMID:20525636

  20. Spontaneous regression of bronchogenic cyst accompanied by pneumonia.

    PubMed

    Himuro, Naoya; Minakata, Takao; Oshima, Yutaka; Kataoka, Daisuke; Yamamoto, Shigeru; Kadokura, Mitsutaka

    2015-12-01

    Bronchogenic cysts arise from abnormal budding of the ventral diverticulum of the foregut or tracheobronchial tree during embryogenesis, are the most common cystic masses in the mediastinum, and are generally asymptomatic. A spontaneous regression in a mediastinal bronchogenic cyst (MBC) with pneumonia is rare. A 30-year-old male had a tumor shadow in the middle mediastinum. When he visited our hospital, he had a mild fever with coughing and sputum. A chest computed tomography (CT) scan showed a decrease in the tumor size and the existence of right pneumonia. MBC may be involved in the etiology of pneumonia; therefore, bronchogenic cysts need to be resected as soon as possible. PMID:26943430

  1. Mortality rates associated with crown health for eastern forest tree species.

    PubMed

    Morin, Randall S; Randolph, KaDonna C; Steinman, Jim

    2015-03-01

    The condition of tree crowns is an important indicator of tree and forest health. Crown conditions have been evaluated during inventories of the US Forest Service Forest Inventory and Analysis (FIA) program since 1999. In this study, remeasured data from 55,013 trees on 2616 FIA plots in the eastern USA were used to assess the probability of survival among various tree species using the suite of FIA crown condition variables. Logistic regression procedures were employed to develop models for predicting tree survival. Results of the regression analyses indicated that crown dieback was the most important crown condition variable for predicting tree survival for all species combined and for many of the 15 individual species in the study. The logistic models were generally successful in representing recent tree mortality responses to multiyear infestations of beech bark disease and hemlock woolly adelgid. Although our models are only applicable to trees growing in a forest setting, the utility of models that predict impending tree mortality goes beyond forest inventory or traditional forestry growth and yield models and includes any application where managers need to assess tree health or predict tree mortality including urban forest, recreation, wildlife, and pest management. PMID:25655130

  2. Analyzing and Synthesizing Phylogenies Using Tree Alignment Graphs

    PubMed Central

    Smith, Stephen A.; Brown, Joseph W.; Hinchliff, Cody E.

    2013-01-01

    Phylogenetic trees are used to analyze and visualize evolution. However, trees can be imperfect datatypes when summarizing multiple trees. This is especially problematic when accommodating biological phenomena such as horizontal gene transfer, incomplete lineage sorting, and hybridization, as well as topological conflict between datasets. Additionally, researchers may want to combine information from sets of trees that have partially overlapping taxon sets. To address the problem of analyzing sets of trees with conflicting relationships and partially overlapping taxon sets, we introduce methods for aligning, synthesizing and analyzing rooted phylogenetic trees within a graph, called a tree alignment graph (TAG). The TAG can be queried and analyzed to explore uncertainty and conflict. It can also be synthesized to construct trees, presenting an alternative to supertree approaches. We demonstrate these methods with two empirical datasets. In order to explore uncertainty, we constructed a TAG of the bootstrap trees from the Angiosperm Tree of Life project. Analysis of the resulting graph demonstrates that areas of the dataset that are unresolved in majority-rule consensus tree analyses can be understood in more detail within the context of a graph structure, using measures incorporating node degree and adjacency support. As an exercise in synthesis (i.e., summarization of a TAG constructed from the alignment trees), we also construct a TAG consisting of the taxonomy and source trees from a recent comprehensive bird study. We synthesized this graph into a tree that can be reconstructed in a repeatable fashion and where the underlying source information can be updated. The methods presented here are tractable for large-scale analyses and serve as a basis for an alternative to consensus tree and supertree methods. Furthermore, the exploration of these graphs can expose structures and patterns within the dataset that are otherwise difficult to observe. PMID:24086118

  3. Learning classification trees

    NASA Technical Reports Server (NTRS)

    Buntine, Wray

    1991-01-01

    Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. How a tree learning algorithm can be derived from Bayesian decision theory is outlined. This introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule turns out to be similar to Quinlan's information gain splitting rule, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, Quinlan's C4, and Breiman et al.'s CART show that the full Bayesian algorithm is consistently as good as or more accurate than these other approaches, though at a computational price.
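
    For orientation, the information gain rule that the Bayesian splitting rule is compared with can be sketched as follows. This is a minimal illustration of information gain for categorical attributes, not the Bayesian rule from the report; the toy data and the names `entropy`, `information_gain`, `rows`, and `labels` are assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on a categorical attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the child nodes after the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: "outlook" separates the classes, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": True},
    {"outlook": "sunny", "windy": False},
    {"outlook": "rainy", "windy": True},
    {"outlook": "rainy", "windy": False},
]
labels = ["no", "no", "yes", "yes"]

gains = {a: information_gain(rows, labels, a) for a in ("outlook", "windy")}
best_attribute = max(gains, key=gains.get)
```

    A greedy tree learner would split on `best_attribute` and recurse on each child; the report replaces the pruning step that usually follows with smoothing and tree averaging.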

  4. Evolutionary tree reconstruction

    NASA Technical Reports Server (NTRS)

    Cheeseman, Peter; Kanefsky, Bob

    1990-01-01

    It is described how Minimum Description Length (MDL) can be applied to the problem of DNA and protein evolutionary tree reconstruction. If there is a set of mutations that transform a common ancestor into a set of the known sequences, and this description is shorter than the information to encode the known sequences directly, then strong evidence for an evolutionary relationship has been found. A heuristic algorithm is described that searches for the simplest tree (smallest MDL) that finds close to optimal trees on the test data. Various ways of extending the MDL theory to more complex evolutionary relationships are discussed.
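
    The comparison at the heart of this argument can be sketched with a toy code. The encoding below is an assumption made for illustration and is not the one used in the report: 2 bits per nucleotide, a single shared ancestor, equal-length sequences, and a fixed cost per substitution (position plus replacement base). It also ignores the bits needed to delimit each mutation list.

```python
from math import log2


def direct_bits(sequences):
    """Bits to encode every sequence independently at 2 bits per nucleotide."""
    return sum(2 * len(s) for s in sequences)


def ancestor_bits(ancestor, sequences):
    """Bits to encode the ancestor plus the substitutions leading to each sequence.

    Toy code (an assumption): each substitution costs log2(L) bits for its
    position and log2(3) bits for the replacement base.
    """
    length = len(ancestor)
    per_mutation = log2(length) + log2(3)
    mutations = sum(
        sum(a != b for a, b in zip(ancestor, s)) for s in sequences
    )
    return 2 * length + mutations * per_mutation


# Hypothetical sequences that differ from the ancestor in at most one site.
ancestor = "ACGTACGTACGTACGT"
sequences = ["ACGTACGTACGTACGA", "ACGTACCTACGTACGT", "ACGTACGTACGTACGT"]

direct = direct_bits(sequences)
shared = ancestor_bits(ancestor, sequences)
related = shared < direct  # shorter description => evidence of common ancestry
```

    Under these assumptions the ancestor-plus-mutations description is less than half the length of the direct encoding, which is the kind of saving the abstract treats as evidence for an evolutionary relationship.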

  5. The gravity apple tree

    NASA Astrophysics Data System (ADS)

    Espinosa Aldama, Mariana

    2015-04-01

    The gravity apple tree is a genealogical tree of the gravitation theories developed during the past century. The graphic representation is full of information such as guides in heuristic principles, names of main proponents, dates and references for original articles (see under Supplementary Data for the graphic representation). This visual presentation and its particular classification allow a quick synthetic view of a plurality of theories, many of them well validated in the Solar System domain. Its diachronic structure organizes information in the shape of a tree following similarities through a formal concept analysis. It can be used for educational purposes or as a tool for philosophical discussion.

  6. Food additives.

    PubMed

    Berglund, F

    1978-01-01

    The use of additives to food fulfils many purposes, as shown by the index issued by the Codex Committee on Food Additives: Acids, bases and salts; Preservatives, Antioxidants and antioxidant synergists; Anticaking agents; Colours; Emulsifiers; Thickening agents; Flour-treatment agents; Extraction solvents; Carrier solvents; Flavours (synthetic); Flavour enhancers; Non-nutritive sweeteners; Processing aids; Enzyme preparations. Many additives occur naturally in foods, but this does not exclude toxicity at higher levels. Some food additives are nutrients, or even essential nutrients, e.g. NaCl. Examples are known of food additives causing toxicity in man even when used according to regulations, e.g. cobalt in beer. In other instances, poisoning has been due to carry-over, e.g. by nitrate in cheese whey when used for artificial feed for infants. Poisonings also occur as the result of the permitted substance being added at too high levels, by accident or carelessness, e.g. nitrite in fish. Finally, there are examples of hypersensitivity to food additives, e.g. to tartrazine and other food colours. The toxicological evaluation, based on animal feeding studies, may be complicated by impurities, e.g. orthotoluene-sulfonamide in saccharin; by transformation or disappearance of the additive in food processing or in storage, e.g. bisulfite in raisins; by reaction products with food constituents, e.g. formation of ethylurethane from diethyl pyrocarbonate; by metabolic transformation products, e.g. formation in the gut of cyclohexylamine from cyclamate. Metabolic end products may differ in experimental animals and in man: guanylic acid and inosinic acid are metabolized to allantoin in the rat but to uric acid in man. The magnitude of the safety margin in man of the Acceptable Daily Intake (ADI) is not identical to the "safety factor" used when calculating the ADI. The symptoms of Chinese Restaurant Syndrome, although not hazardous, furthermore illustrate that the whole ADI

  7. Using Classification Trees to Predict Alumni Giving for Higher Education

    ERIC Educational Resources Information Center

    Weerts, David J.; Ronca, Justin M.

    2009-01-01

    As the relative level of public support for higher education declines, colleges and universities aim to maximize alumni-giving to keep their programs competitive. Anchored in a utility maximization framework, this study employs the classification and regression tree methodology to examine characteristics of alumni donors and non-donors at a…

  8. Regression analysis of networked data

    PubMed Central

    Zhou, Yan; Song, Peter X.-K.

    2016-01-01

    This paper concerns regression methodology for assessing relationships between multi-dimensional response variables and covariates that are correlated within a network. To address analytical challenges associated with the integration of network topology into the regression analysis, we propose a hybrid quadratic inference method that uses both prior and data-driven correlations among network nodes. A Godambe information-based tuning strategy is developed to allocate weights between the prior and data-driven network structures, so the estimator is efficient. The proposed method is conceptually simple and computationally fast, and has appealing large-sample properties. It is evaluated by simulation, and its application is illustrated using neuroimaging data from an association study of the effects of iron deficiency on auditory recognition memory in infants. PMID:27279658

  9. Observational Studies: Matching or Regression?

    PubMed

    Brazauskas, Ruta; Logan, Brent R

    2016-03-01

    In observational studies with an aim of assessing treatment effect or comparing groups of patients, several approaches could be used. Often, baseline characteristics of patients may be imbalanced between groups, and adjustments are needed to account for this. It can be accomplished either via appropriate regression modeling or, alternatively, by conducting a matched pairs study. The latter is often chosen because it makes groups appear to be comparable. In this article we considered these 2 options in terms of their ability to detect a treatment effect in time-to-event studies. Our investigation shows that a Cox regression model applied to the entire cohort is often a more powerful tool in detecting treatment effect as compared with a matched study. Real data from a hematopoietic cell transplantation study is used as an example. PMID:26712591

  10. Tree-Ring Based May-July Temperature Reconstruction Since AD 1630 on the Western Loess Plateau, China

    PubMed Central

    Song, Huiming; Liu, Yu; Li, Qiang; Gao, Na; Ma, Yongyong; Zhang, Yanhua

    2014-01-01

    Tree-ring samples from Chinese Pine (Pinus tabulaeformis Carr.) collected at Mt. Shimen on the western Loess Plateau, China, were used to reconstruct the mean May–July temperature during AD 1630–2011. The regression model explained 48% of the adjusted variance in the instrumentally observed mean May–July temperature. The reconstruction revealed significant temperature variations at interannual to decadal scales. Cool periods observed in the reconstruction coincided with reduced solar activities. The reconstructed temperature matched well with two other tree-ring based temperature reconstructions conducted on the northern slope of the Qinling Mountains (on the southern margin of the Loess Plateau of China) for both annual and decadal scales. In addition, this study agreed well with several series derived from different proxies. This reconstruction improves upon the sparse network of high-resolution paleoclimatic records for the western Loess Plateau, China. PMID:24690885

  11. Loops and trees

    NASA Astrophysics Data System (ADS)

    Caron-Huot, S.

    2011-05-01

    We investigate relations between loop and tree amplitudes in quantum field theory that involve putting on-shell some loop propagators. This generalizes the so-called Feynman tree theorem which is satisfied at 1-loop. Exploiting retarded boundary conditions, we give a generalization to ℓ-loop expressing the loops as integrals over the on-shell phase space of exactly ℓ particles. We argue that the corresponding integrand for ℓ > 2 does not involve the forward limit of any physical tree amplitude, except in planar gauge theories. In that case we explicitly construct the relevant physical amplitude. Beyond the planar limit, abandoning direct integral representations, we propose that loops continue to be determined implicitly by the forward limit of physical connected trees, and we formulate a precise conjecture along this line. Finally, we set up technology to compute forward amplitudes in supersymmetric theories, in which specific simplifications occur.

  12. Structural Equation Model Trees

    PubMed Central

    Brandmaier, Andreas M.; von Oertzen, Timo; McArdle, John J.; Lindenberger, Ulman

    2015-01-01

    In the behavioral and social sciences, structural equation models (SEMs) have become widely accepted as a modeling tool for the relation between latent and observed variables. SEMs can be seen as a unification of several multivariate analysis techniques. SEM Trees combine the strengths of SEMs and the decision tree paradigm by building tree structures that separate a data set recursively into subsets with significantly different parameter estimates in a SEM. SEM Trees provide means for finding covariates and covariate interactions that predict differences in structural parameters in observed as well as in latent space and facilitate theory-guided exploration of empirical data. We describe the methodology, discuss theoretical and practical implications, and demonstrate applications to a factor model and a linear growth curve model. PMID:22984789

  13. Tree Nut Allergies

    MedlinePlus

    ... tree nut used on the label. Read all product labels carefully before purchasing and consuming any item. Ingredients ...

  14. The tree BVOC index.

    PubMed

    Simpson, J R; McPherson, E G

    2011-01-01

    Urban trees can produce a number of benefits, among them improved air quality. Biogenic volatile organic compounds (BVOCs) emitted by some species are ozone precursors. Modifying future tree planting to favor lower-emitting species can reduce these emissions and aid air management districts in meeting federally mandated emissions reductions for these compounds. Changes in BVOC emissions are calculated as the result of transitioning to a lower-emitting species mix in future planting. A simplified method for calculating the emissions reduction and a Tree BVOC index based on the calculated reduction is described. An example illustrates the use of the index as a tool for implementation and monitoring of a tree program designed to reduce BVOC emissions as a control measure being developed as part of the State Implementation Plan (SIP) for the Sacramento Federal Nonattainment Area. PMID:21435760

  15. Generalized constructive tree weights

    SciTech Connect

    Rivasseau, Vincent; Tanasa, Adrian

    2014-04-15

    The Loop Vertex Expansion (LVE) is a quantum field theory (QFT) method which explicitly computes the Borel sum of Feynman perturbation series. This LVE relies in a crucial way on symmetric tree weights which define a measure on the set of spanning trees of any connected graph. In this paper we generalize this method by defining new tree weights. They depend on the choice of a partition of a set of vertices of the graph, and when the partition is non-trivial, they are no longer symmetric under permutation of vertices. Nevertheless we prove they have the required positivity property to lead to a convergent LVE; in fact we formulate this positivity property precisely for the first time. Our generalized tree weights are inspired by the Brydges-Battle-Federbush work on cluster expansions and could be particularly suited to the computation of connected functions in QFT. Several concrete examples are explicitly given.

  16. Tea tree oil.

    PubMed

    Larson, David; Jacob, Sharon E

    2012-01-01

    Tea tree oil is an increasingly popular ingredient in a variety of household and cosmetic products, including shampoos, massage oils, skin and nail creams, and laundry detergents. Known for its potential antiseptic properties, it has been shown to be active against a variety of bacteria, fungi, viruses, and mites. The oil is extracted from the leaves of the tea tree via steam distillation. This essential oil possesses a sharp camphoraceous odor followed by a menthol-like cooling sensation. Most commonly an ingredient in topical products, it is used at a concentration of 5% to 10%. Even at this concentration, it has been reported to induce contact sensitization and allergic contact dermatitis reactions. In 1999, tea tree oil was added to the North American Contact Dermatitis Group screening panel. The latest prevalence rates suggest that 1.4% of patients referred for patch testing had a positive reaction to tea tree oil. PMID:22653070

  17. Tree-bank grammars

    SciTech Connect

    Charniak, E.

    1996-12-31

    By a "tree-bank grammar" we mean a context-free grammar created by reading the production rules directly from hand-parsed sentences in a tree bank. Common wisdom has it that such grammars do not perform well, though we know of no published data on the issue. The primary purpose of this paper is to show that the common wisdom is wrong. In particular, we present results on a tree-bank grammar based on the Penn Wall Street Journal tree bank. To the best of our knowledge, this grammar outperforms all other non-word-based statistical parsers/grammars on this corpus. That is, it outperforms parsers that consider the input as a string of tags and ignore the actual words of the corpus.
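
    Reading production rules directly from hand-parsed sentences can be sketched as follows. This is a minimal illustration, assuming Penn-style bracketed parses and keeping only rules over tags (words are ignored, in line with the non-word-based setting described above). The two sentences and the names `parse`, `productions`, and `rule_counts` are hypothetical and do not come from the paper.

```python
from collections import Counter


def parse(tokens):
    """Parse one bracketed tree from a token list into (label, children)."""
    tokens.pop(0)  # consume "("
    label = tokens.pop(0)
    children = []
    while tokens[0] != ")":
        if tokens[0] == "(":
            children.append(parse(tokens))
        else:
            children.append(tokens.pop(0))  # a word (leaf)
    tokens.pop(0)  # consume ")"
    return (label, children)


def productions(tree, counts):
    """Count each rule LHS -> RHS, where RHS lists the child nonterminals."""
    label, children = tree
    subtrees = [c for c in children if isinstance(c, tuple)]
    if subtrees:  # tag -> word nodes have no subtrees and are skipped
        counts[(label, tuple(c[0] for c in subtrees))] += 1
        for child in subtrees:
            productions(child, counts)


# Hypothetical two-sentence tree bank.
treebank = [
    "(S (NP (DT the) (NN dog)) (VP (VBD barked)))",
    "(S (NP (DT a) (NN cat)) (VP (VBD slept)))",
]

rule_counts = Counter()
for line in treebank:
    tokens = line.replace("(", " ( ").replace(")", " ) ").split()
    productions(parse(tokens), rule_counts)
```

    Dividing each count by the total for its left-hand side would give relative-frequency rule probabilities; whether the paper estimates probabilities this way is not stated in the abstract.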

  18. Leonardo's Tree Theory.

    ERIC Educational Resources Information Center

    Werner, Suzanne K.

    2003-01-01

    Describes a series of activities exploring Leonardo da Vinci's tree theory that are designed to strengthen 8th grade students' data collection and problem solving skills in physical science classes. (KHR)

  19. Trees for reclamation

    SciTech Connect

    Not Available

    1980-01-01

    Land reclamation programs sponsored by several state forestry organizations are summarized in these presentations. The use of trees as a preferred species for revegetation of surface mined lands is addressed. Modern methods of forestry can be used to make land economically and aesthetically acceptable. Tree planting techniques are presented and the role of Mycorrhizae is discussed. There are 30 papers included in these proceedings. States represented include: Alabama, Arkansas, Georgia, Illinois, Kansas, Kentucky, Maryland, Virginia, Iowa, Ohio, Pennsylvania, and West Virginia.

  20. Climatic response of annual tree-rings

    NASA Astrophysics Data System (ADS)

    Ageev, Boris G.; Gruzdev, Aleksandr N.; Ponomarev, Yurii N.; Sapozhnikova, Valeria A.

    2014-11-01

    An extensive literature is devoted to the influence of environmental conditions on plant respiration and respiration rate. It is generally accepted that the respired CO2 generated in a stem completely diffuses into the atmosphere. Results obtained from explorations into the CO2 content in disc tree rings by the method proposed in this work show that a major part of the CO2 remains in tree stems and exhibits inter-annual variability. Different methods are used to describe CO2 and H2O distributions in disc tree rings. The relation of CO2 and H2O variations in a Siberian stone pine disc to meteorological parameters is analyzed using wavelet, spectral, and cross-spectral techniques. According to a multiple linear regression model, the time evolution of the width of Siberian stone pine rings can be partly explained by a combined influence of air temperature, precipitation, cloudiness, and solar activity. Conclusions are drawn regarding the response of the CO2 and H2O content in coniferous tree disc rings to various climatic factors. The suggested method of CO2 and (CO2+H2O) detection can be used for studying stem respiration in ecological risk areas.

  1. Tree Topology Estimation

    PubMed Central

    Estrada, Rolando; Tomasi, Carlo; Schmidler, Scott C.; Farsiu, Sina

    2015-01-01

    Tree-like structures are fundamental in nature, and it is often useful to reconstruct the topology of a tree—what connects to what—from a two-dimensional image of it. However, the projected branches often cross in the image: the tree projects to a planar graph, and the inverse problem of reconstructing the topology of the tree from that of the graph is ill-posed. We regularize this problem with a generative, parametric tree-growth model. Under this model, reconstruction is possible in linear time if one knows the direction of each edge in the graph—which edge endpoint is closer to the root of the tree—but becomes NP-hard if the directions are not known. For the latter case, we present a heuristic search algorithm to estimate the most likely topology of a rooted, three-dimensional tree from a single two-dimensional image. Experimental results on retinal vessel, plant root, and synthetic tree datasets show that our methodology is both accurate and efficient. PMID:26353004

  2. Exposure and effects of perfluoroalkyl substances in tree swallows nesting in Minnesota and Wisconsin, USA

    USGS Publications Warehouse

    Custer, Christine M.; Custer, Thomas W.; Dummer, Paul; Etterson, Matthew A.; Thogmartin, Wayne E.; Wu, Qian; Kannan, Kurunthachalam; Trowbridge, Annette; McKann, Patrick C.

    2013-01-01

    The exposure and effects of perfluoroalkyl substances (PFASs) were studied at eight locations in Minnesota and Wisconsin between 2007 and 2011 using tree swallows (Tachycineta bicolor). Concentrations of PFASs were quantified as were reproductive success end points. The sample egg method was used wherein an egg sample is collected, and the hatching success of the remaining eggs in the nest is assessed. The association between PFAS exposure and reproductive success was assessed by site comparisons, logistic regression analysis, and multistate modeling, a technique not previously used in this context. There was a negative association between concentrations of perfluorooctane sulfonate (PFOS) in eggs and hatching success. The concentration at which effects became evident (150–200 ng/g wet weight) was far lower than effect levels found in laboratory feeding trials or egg-injection studies of other avian species. This discrepancy likely arose because behavioral effects and other extrinsic factors are not accounted for in these laboratory studies, and possibly because tree swallows are unusually sensitive to PFASs. The results from multistate modeling and simple logistic regression analyses were nearly identical. Multistate modeling provides a better method to examine possible effects of additional covariates and assessment of models using Akaike information criteria analyses. There was a credible association between PFOS concentrations in plasma and eggs, so extrapolation between these two commonly sampled tissues can be performed.

  3. Exposure and effects of perfluoroalkyl substances in tree swallows nesting in Minnesota and Wisconsin, USA.

    PubMed

    Custer, Christine M; Custer, Thomas W; Dummer, Paul M; Etterson, Matthew A; Thogmartin, Wayne E; Wu, Qian; Kannan, Kurunthachalam; Trowbridge, Annette; McKann, Patrick C

    2014-01-01

    The exposure and effects of perfluoroalkyl substances (PFASs) were studied at eight locations in Minnesota and Wisconsin between 2007 and 2011 using tree swallows (Tachycineta bicolor). Concentrations of PFASs were quantified as were reproductive success end points. The sample egg method was used wherein an egg sample is collected, and the hatching success of the remaining eggs in the nest is assessed. The association between PFAS exposure and reproductive success was assessed by site comparisons, logistic regression analysis, and multistate modeling, a technique not previously used in this context. There was a negative association between concentrations of perfluorooctane sulfonate (PFOS) in eggs and hatching success. The concentration at which effects became evident (150-200 ng/g wet weight) was far lower than effect levels found in laboratory feeding trials or egg-injection studies of other avian species. This discrepancy likely arose because behavioral effects and other extrinsic factors are not accounted for in these laboratory studies, and possibly because tree swallows are unusually sensitive to PFASs. The results from multistate modeling and simple logistic regression analyses were nearly identical. Multistate modeling provides a better method to examine possible effects of additional covariates and assessment of models using Akaike information criteria analyses. There was a credible association between PFOS concentrations in plasma and eggs, so extrapolation between these two commonly sampled tissues can be performed. PMID:23860575

  4. How Trees Can Save Energy.

    ERIC Educational Resources Information Center

    Fazio, James R., Ed.

    1991-01-01

    This document might easily have been called "How To Use Trees To Save Energy". It presents the energy saving advantages of landscaping the home and community with trees. The discussion includes: (1) landscaping advice to obtain the benefits of tree shade; (2) the heat island phenomenon in cities; (3) how and where to properly plant trees for…

  5. Tree growth and competition in an old-growth Picea abies forest of boreal Sweden: influence of tree spatial patterning

    USGS Publications Warehouse

    Fraver, Shawn; D'Amato, Anthony W.; Bradford, John B.; Jonsson, Bengt Gunnar; Jönsson, Mari; Esseen, Per-Anders

    2013-01-01

    Question: What factors best characterize tree competitive environments in this structurally diverse old-growth forest, and do these factors vary spatially within and among stands? Location: Old-growth Picea abies forest of boreal Sweden. Methods: Using long-term, mapped permanent plot data augmented with dendrochronological analyses, we evaluated the effect of neighbourhood competition on focal tree growth by means of standard competition indices, each modified to include various metrics of tree size, neighbour mortality weighting (for neighbours that died during the inventory period), and within-neighbourhood tree clustering. Candidate models were evaluated using mixed-model linear regression analyses, with mean basal area increment as the response variable. We then analysed stand-level spatial patterns of competition indices and growth rates (via kriging) to determine if the relationship between these patterns could further elucidate factors influencing tree growth. Results: Inter-tree competition clearly affected growth rates, with crown volume being the size metric most strongly influencing the neighbourhood competitive environment. Including neighbour tree mortality weightings in models only slightly improved descriptions of competitive interactions. Although the within-neighbourhood clustering index did not improve model predictions, competition intensity was influenced by the underlying stand-level tree spatial arrangement: stand-level clustering locally intensified competition and reduced tree growth, whereas in the absence of such clustering, inter-tree competition played a lesser role in constraining tree growth. Conclusions: Our findings demonstrate that competition continues to influence forest processes and structures in an old-growth system that has not experienced major disturbances for at least two centuries. The finding that the underlying tree spatial pattern influenced the competitive environment suggests caution in interpreting traditional tree

  6. Potlining Additives

    SciTech Connect

    Rudolf Keller

    2004-08-10

    In this project, a concept to improve the performance of aluminum production cells by introducing potlining additives was examined and tested. Boron oxide was added to cathode blocks, and titanium was dissolved in the metal pool; this resulted in the formation of titanium diboride and caused the molten aluminum to wet the carbonaceous cathode surface. Such wetting reportedly leads to operational improvements and extended cell life. In addition, boron oxide suppresses cyanide formation. This final report presents and discusses the results of this project. Substantial economic benefits for the practical implementation of the technology are projected, especially for modern cells with graphitized blocks. For example, with an energy savings of about 5% and an increase in pot life from 1500 to 2500 days, a cost savings of $0.023 per pound of aluminum produced is projected for a 200 kA pot.

  7. Phosphazene additives

    SciTech Connect

    Harrup, Mason K; Rollins, Harry W

    2013-11-26

    An additive comprising a phosphazene compound that has at least two reactive functional groups and at least one capping functional group bonded to phosphorus atoms of the phosphazene compound. One of the at least two reactive functional groups is configured to react with cellulose and the other of the at least two reactive functional groups is configured to react with a resin, such as an amine resin or a polycarboxylic acid resin. The at least one capping functional group is selected from the group consisting of a short chain ether group, an alkoxy group, or an aryloxy group. Also disclosed are an additive-resin admixture, a method of treating a wood product, and a wood product.

  8. GRFT - Genetic Records Family Tree Web Applet.

    PubMed

    Pimentel, Samuel; Walbot, Virginia; Fernandes, John

    2011-01-01

    Current software for storing and displaying records of genetic crosses does not provide an easy way to determine the lineage of an individual. The genetic records family tree (GRFT) applet processes records of genetic crosses and allows researchers to quickly visualize lineages using a family tree construct and to access other information from these records using any Internet browser. Users select from three display features: (1) a family tree view which displays a color-coded family tree for an individual, (2) a sequential list of crosses, and (3) a list of crosses matching user-defined search criteria. Each feature contains options to specify the number of records shown and the latter two contain an option to filter results by the owner of the cross. The family tree feature is interactive, displaying a popup box with genetic information when the user mouses over an individual and allowing the user to draw a new tree by clicking on any individual in the current tree. The applet is written in JavaScript and reads genetic records from a tab-delimited text file on the server, so it is cross-platform, can be accessed by anyone with an Internet connection, and supports almost instantaneous generation of new trees and table lists. Researchers can use the tool with their own genetic cross records for any sexually reproducing organism. No additional software is required and with only minor modifications to the script, researchers can add their own custom columns. GRFT's speed, versatility, and low overhead make it an effective and innovative visualization method for genetic records. A sample tool is available at http://stanford.edu/walbot/grft-sample.html. PMID:22303311
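
    The core idea, reading tab-delimited cross records and walking back through parents to draw a lineage, can be sketched as follows. GRFT itself is written in JavaScript; this Python sketch is not its code. The column names (`id`, `female`, `male`), the sample records, and the function names are assumptions made for illustration, and the sketch assumes the records contain no cycles.

```python
import csv
import io

# Hypothetical tab-delimited cross records; the column names are assumptions.
RECORDS = (
    "id\tfemale\tmale\n"
    "C1\tA\tB\n"
    "C2\tC1\tD\n"
    "C3\tC2\tC1\n"
)


def load_crosses(text):
    """Map each offspring id to its (female, male) parent ids."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["id"]: (row["female"], row["male"]) for row in reader}


def family_tree(individual, crosses):
    """Return the lineage of `individual` as a nested dict.

    Individuals without a recorded cross (founders) map to None.
    """
    if individual not in crosses:
        return {individual: None}
    female, male = crosses[individual]
    return {
        individual: {
            "female": family_tree(female, crosses),
            "male": family_tree(male, crosses),
        }
    }


crosses = load_crosses(RECORDS)
tree = family_tree("C2", crosses)
```

    Redrawing the tree for another individual, as the applet does on a click, corresponds to calling `family_tree` again with a different id.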

  9. Heteroscedastic transformation cure regression models.

    PubMed

    Chen, Chyong-Mei; Chen, Chen-Hsin

    2016-06-30

    Cure models have been applied to analyze clinical trials with cures and age-at-onset studies with nonsusceptibility. Lu and Ying (On semiparametric transformation cure model. Biometrika 2004; 91:331-343. DOI: 10.1093/biomet/91.2.331) developed a general class of semiparametric transformation cure models, which assumes that the failure times of uncured subjects, after an unknown monotone transformation, follow a regression model with homoscedastic residuals. However, it cannot deal with frequently encountered heteroscedasticity, which may result from dispersed ranges of failure time span among uncured subjects' strata. To tackle the phenomenon, this article presents semiparametric heteroscedastic transformation cure models. The cure status and the failure time of an uncured subject are fitted by a logistic regression model and a heteroscedastic transformation model, respectively. Unlike the approach of Lu and Ying, we derive score equations from the full likelihood for estimating the regression parameters in the proposed model. A martingale difference function similar to the one in their proposal is used to estimate the infinite-dimensional transformation function. Our proposed estimating approach is intuitively applicable and can be conveniently extended to other complicated models when the maximization of the likelihood may be too tedious to be implemented. We conduct simulation studies to validate large-sample properties of the proposed estimators and to compare with the approach of Lu and Ying via the relative efficiency. The estimating method and the two relevant goodness-of-fit graphical procedures are illustrated by using breast cancer data and melanoma data. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26887342

  10. Aging on Parisi's Tree

    NASA Astrophysics Data System (ADS)

    Bouchaud, J.-P.; Dean, D. S.

    1995-03-01

    We present a detailed study of simple "tree" models for off equilibrium dynamics and aging in glassy systems. The simplest tree describes the landscape of a random energy model, whereas multifurcating trees occur in the solution of the Sherrington-Kirkpatrick model. An important ingredient taken from these models is the exponential distribution of deep free-energies, which translates into a power-law distribution of the residence time within metastable "valleys". These power law distributions have infinite mean in the spin-glass phase and this leads to the aging phenomenon. To each level of the tree is associated an overlap and the exponent of the time distribution. We solve these models for a finite (but arbitrary) number of levels and show that a two-level tree accounts very well for many experimental observations (thermoremanent magnetization, a.c. susceptibility, second noise spectrum, ...). We introduce the idea that the deepest levels of the tree correspond to equilibrium dynamics whereas the upper levels correspond to aging. Temperature cycling experiments suggest that the borderline between the two is temperature dependent. The spin-glass transition corresponds to the temperature at which the uppermost level is put out of equilibrium but is subsequently followed by a sequence of (dynamical) phase transitions corresponding to non equilibrium dynamics within deeper and deeper levels. We tentatively try to relate this "tree" picture to the real space "droplet" model, and speculate on what the final description of spin-glasses might look like.

  11. Regression analysis of cytopathological data

    SciTech Connect

    Whittemore, A.S.; McLarty, J.W.; Fortson, N.; Anderson, K.

    1982-12-01

    Epithelial cells from the human body are frequently labelled according to one of several ordered levels of abnormality, ranging from normal to malignant. The label of the most abnormal cell in a specimen determines the score for the specimen. This paper presents a model for the regression of specimen scores against continuous and discrete variables, as in host exposure to carcinogens. Application to data and tests for adequacy of model fit are illustrated using sputum specimens obtained from a cohort of former asbestos workers.

  12. Meteorological Factors and Tree Characteristics Influencing the Initiation and Rate of Stemflow from Deciduous Trees in an Urban Park

    NASA Astrophysics Data System (ADS)

    Schooling, J. T.; Carlyle-Moses, D. E.

    2013-12-01

    Stemflow, SF, represents that portion of precipitation that is intercepted by a tree's canopy and diverted to the ground at the tree base by flowing along branches and down the bole. The focused input of water and nutrients associated with SF has been shown to be of hydrological and biogeochemical importance in a number of plant communities and forest environments. Although the concentrated water volume and the nutrient / pollutant fluxes associated with SF in urban areas may be highly relevant for stormwater quantity and quality management, they have received only minor study in built environments. In an urban park in Kamloops, British Columbia, Canada, SF volumes generated from 40 deciduous trees representing 22 species were sampled on a precipitation event basis over a period of 16 months. Using these data, we derived the threshold rainfall depth required for SF initiation from each tree by taking the absolute value of the y-intercept of the linear regression of SF volume versus rainfall depth divided by the slope of that regression. The SF discharge rate once the threshold rainfall depth had been reached was taken as the slope of the linear regression equation. Thus, a simplified SF equation was developed: SFv = QSF × (Pg − Pg'), where SFv is stemflow volume (litres), QSF is the discharge rate (litres / mm), and Pg and Pg' represent the precipitation depth and the threshold precipitation depth, respectively. We then examined the influence of meteorological factors (precipitation type [rain / snow / rain + snow], precipitation depth, rainfall intensity, wind speed and direction, and vapour pressure deficit), and tree characteristics (tree diameter at breast height, tree height, leaf size and orientation, bark roughness, crown projection area, leaf area index, canopy cover fraction, branching angle, the proportion of the crown that was comprised of branches, and overlap with other tree canopies) on QSF and Pg' in order to expand on the simplified model and
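
    The derivation of the threshold and the discharge rate described in this abstract can be illustrated with a minimal sketch. It assumes a plain ordinary least squares fit of stemflow volume on rainfall depth for a single tree; the function names and the sample numbers are hypothetical and are not taken from the study. Returning zero stemflow below the threshold is an added assumption, since the abstract defines the equation only once the threshold has been reached.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def stemflow_parameters(rain_mm, sf_litres):
    """Return (QSF, Pg') for one tree from event-based observations."""
    intercept, slope = fit_line(rain_mm, sf_litres)
    q_sf = slope                           # discharge rate, litres per mm
    pg_threshold = abs(intercept) / slope  # |y-intercept| divided by slope, mm
    return q_sf, pg_threshold


def stemflow_volume(pg, q_sf, pg_threshold):
    """SFv = QSF * (Pg - Pg'); assumed to be zero below the threshold."""
    return max(0.0, q_sf * (pg - pg_threshold))


# Hypothetical observations for one tree (rainfall depth in mm, stemflow in litres).
rain_mm = [4.0, 6.0, 8.0, 10.0]
sf_litres = [2.0, 6.0, 10.0, 14.0]
q_sf, pg_threshold = stemflow_parameters(rain_mm, sf_litres)
print(q_sf, pg_threshold, stemflow_volume(5.0, q_sf, pg_threshold))
```

    With real event data the fit would not be exact, and the intercept could in principle be positive, in which case the threshold interpretation would no longer apply.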

  13. How tree roots respond to drought

    PubMed Central

    Brunner, Ivano; Herzog, Claude; Dawes, Melissa A.; Arend, Matthias; Sperisen, Christoph

    2015-01-01

    The ongoing climate change is characterized by increased temperatures and altered precipitation patterns. In addition, there has been an increase in both the frequency and intensity of extreme climatic events such as drought. Episodes of drought induce a series of interconnected effects, all of which have the potential to alter the carbon balance of forest ecosystems profoundly at different scales of plant organization and ecosystem functioning. During recent years, considerable progress has been made in the understanding of how aboveground parts of trees respond to drought and how these responses affect carbon assimilation. In contrast, processes of belowground parts are relatively underrepresented in research on climate change. In this review, we describe current knowledge about responses of tree roots to drought. Tree roots are capable of responding to drought through a variety of strategies that enable them to avoid and tolerate stress. Responses include root biomass adjustments, anatomical alterations, and physiological acclimations. The molecular mechanisms underlying these responses are characterized to some extent, and involve stress signaling and the induction of numerous genes, leading to the activation of tolerance pathways. In addition, mycorrhizas seem to play important protective roles. The current knowledge compiled in this review supports the view that tree roots are well equipped to withstand drought situations and maintain morphological and physiological functions as long as possible. Further, the reviewed literature demonstrates the important role of tree roots in the functioning of forest ecosystems and highlights the need for more research in this emerging field. PMID:26284083

  14. MetaTreeMap: An Alternative Visualization Method for Displaying Metagenomic Phylogenic Trees

    PubMed Central

    Taylor, Todd D.

    2016-01-01

    Metagenomic samples can contain hundreds or thousands of different species. The most common method to identify these species is to sequence the samples and then classify the reads to nodes along a phylogenic tree. Linear representations of trees with so many nodes face legibility issues. In addition, such views are not optimal for appreciating the read quantity assigned to each node. The problem is exacerbated when comparison between multiple samples is needed. MetaTreeMap adapts a visualization method that addresses these weaknesses. The tree is represented by nested rectangles that illustrate the number or percentage of assigned reads. MetaTreeMap implements various options specific to phylogenic trees that allow for quick overview and investigation of the information. More generally, the goal of this software is to provide the user with the ability to easily display phylogenic trees based on various quantities assigned to the nodes, such as read number, percentage or other values. The tool can be used online at http://metasystems.riken.jp/visualization/treemap/. PMID:27336370

  15. Multiatlas segmentation as nonparametric regression.

    PubMed

    Awate, Suyash P; Whitaker, Ross T

    2014-09-01

    This paper proposes a novel theoretical framework to model and analyze the statistical characteristics of a wide range of segmentation methods that incorporate a database of label maps or atlases; such methods are termed as label fusion or multiatlas segmentation. We model these multiatlas segmentation problems as nonparametric regression problems in the high-dimensional space of image patches. We analyze the nonparametric estimator's convergence behavior that characterizes expected segmentation error as a function of the size of the multiatlas database. We show that this error has an analytic form involving several parameters that are fundamental to the specific segmentation problem (determined by the chosen anatomical structure, imaging modality, registration algorithm, and label-fusion algorithm). We describe how to estimate these parameters and show that several human anatomical structures exhibit the trends modeled analytically. We use these parameter estimates to optimize the regression estimator. We show that the expected error for large database sizes is well predicted by models learned on small databases. Thus, a few expert segmentations can help predict the database sizes required to keep the expected error below a specified tolerance level. Such cost-benefit analysis is crucial for deploying clinical multiatlas segmentation systems. PMID:24802528

  16. Reinforcement Learning Trees

    PubMed Central

    Zhu, Ruoqing; Zeng, Donglin; Kosorok, Michael R.

    2015-01-01

    In this paper, we introduce a new type of tree-based method, reinforcement learning trees (RLT), which exhibits significantly improved performance over traditional methods such as random forests (Breiman, 2001) under high-dimensional settings. The innovations are three-fold. First, the new method implements reinforcement learning at each selection of a splitting variable during the tree construction processes. By splitting on the variable that brings the greatest future improvement in later splits, rather than choosing the one with largest marginal effect from the immediate split, the constructed tree utilizes the available samples in a more efficient way. Moreover, such an approach enables linear combination cuts at little extra computational cost. Second, we propose a variable muting procedure that progressively eliminates noise variables during the construction of each individual tree. The muting procedure also takes advantage of reinforcement learning and prevents noise variables from being considered in the search for splitting rules, so that towards terminal nodes, where the sample size is small, the splitting rules are still constructed from only strong variables. Last, we investigate asymptotic properties of the proposed method under basic assumptions and discuss rationale in general settings. PMID:26903687

  17. Tree Testing of Hierarchical Menu Structures for Health Applications

    PubMed Central

    Le, Thai; Chaudhuri, Shomir; Chung, Jane; Thompson, Hilaire J; Demiris, George

    2014-01-01

    To address the need for greater evidence-based evaluation of Health Information Technology (HIT) systems we introduce a method of usability testing termed tree testing. In a tree test, participants are presented with an abstract hierarchical tree of the system taxonomy and asked to navigate through the tree in completing representative tasks. We apply tree testing to a commercially available health application, demonstrating a use case and providing a comparison with more traditional in-person usability testing methods. Online tree tests (N=54) and in-person usability tests (N=15) were conducted from August to September 2013. Tree testing provided a method to quantitatively evaluate the information structure of a system using various navigational metrics including completion time, task accuracy, and path length. The results of the analyses compared favorably to the results seen from the traditional usability test. Tree testing provides a flexible, evidence-based approach for researchers to evaluate the information structure of HITs. In addition, remote tree testing provides a quick, flexible, and high volume method of acquiring feedback in a structured format that allows for quantitative comparisons. With the diverse nature and often large quantities of health information available, addressing issues of terminology and concept classifications during the early development process of a health information system will improve navigation through the system and save future resources. Tree testing is a usability method that can be used to quickly and easily assess information hierarchy of health information systems. PMID:24582924

  18. Mapping geogenic radon potential by regression kriging.

    PubMed

    Pásztor, László; Szabó, Katalin Zsuzsanna; Szatmári, Gábor; Laborczi, Annamária; Horváth, Ákos

    2016-02-15

    Radon (²²²Rn) gas is produced in the radioactive decay chain of uranium (²³⁸U), an element that is naturally present in soils. Radon is transported mainly by diffusion and convection mechanisms through the soil depending mainly on the physical and meteorological parameters of the soil and can enter and accumulate in buildings. Health risks originating from indoor radon concentration can be attributed to natural factors and are characterized by geogenic radon potential (GRP). Identification of areas with high health risks requires spatial modeling, that is, mapping of radon risk. In addition to geology and meteorology, physical soil properties play a significant role in the determination of GRP. In order to compile a reliable GRP map for a model area in Central-Hungary, spatial auxiliary information representing GRP-forming environmental factors was taken into account to support the spatial inference of the locally measured GRP values. Since the number of measured sites was limited, efficient spatial prediction methodologies were sought to construct a reliable map for a larger area. Regression kriging (RK) was applied for the interpolation using spatially exhaustive auxiliary data on soil, geology, topography, land use and climate. RK divides the spatial inference into two parts. Firstly, the deterministic component of the target variable is determined by a regression model. The residuals of the multiple linear regression analysis represent the spatially varying but dependent stochastic component, which is interpolated by kriging. The final map is the sum of the two component predictions. Overall accuracy of the map was tested by Leave-One-Out Cross-Validation. Furthermore, the spatial reliability of the resultant map is also estimated by the calculation of the 90% prediction interval of the local prediction values. The applicability of the applied method as well as that of the map is discussed briefly. PMID:26706761
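
    The two-part structure of regression kriging described in this abstract (a regression trend plus kriged residuals, summed) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' workflow: it uses a single auxiliary variable instead of multiple regression, ordinary kriging with an exponential covariance whose sill and range are fixed by hand rather than fitted to a variogram, and hypothetical sample data.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_trend(aux, values):
    """Least squares fit of values = b0 + b1 * aux (the deterministic component)."""
    n = len(aux)
    a_mean = sum(aux) / n
    v_mean = sum(values) / n
    sxx = sum((a - a_mean) ** 2 for a in aux)
    sxy = sum((a - a_mean) * (v - v_mean) for a, v in zip(aux, values))
    b1 = sxy / sxx
    return v_mean - b1 * a_mean, b1


def covariance(p, q, sill=1.0, corr_range=10.0):
    """Exponential covariance model; sill and range are assumed, not estimated."""
    return sill * math.exp(-math.dist(p, q) / corr_range)


def regression_kriging(coords, aux, values, new_coord, new_aux):
    """Predict at new_coord as regression trend plus ordinary-kriged residual."""
    b0, b1 = fit_trend(aux, values)
    residuals = [v - (b0 + b1 * a) for v, a in zip(values, aux)]
    n = len(coords)
    # Ordinary kriging system: covariances bordered by the unbiasedness constraint.
    system = [[covariance(coords[i], coords[j]) for j in range(n)] + [1.0]
              for i in range(n)]
    system.append([1.0] * n + [0.0])
    rhs = [covariance(c, new_coord) for c in coords] + [1.0]
    weights = solve(system, rhs)[:n]  # drop the Lagrange multiplier
    kriged_residual = sum(w * r for w, r in zip(weights, residuals))
    return b0 + b1 * new_aux + kriged_residual


# Hypothetical measurement sites, auxiliary covariate, and measured values.
coords = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (5.0, 5.0)]
aux = [1.0, 2.0, 3.0, 4.0, 2.5]
values = [12.0, 15.5, 19.0, 24.0, 16.0]
print(regression_kriging(coords, aux, values, (4.0, 6.0), 2.8))
```

    Because no nugget effect is included, the sketch reproduces the measured value when asked to predict at a sampled site; a practical application would fit the variogram to the residuals and validate the result, for example by leave-one-out cross-validation as in the abstract.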

  19. The Fault Tree Compiler (FTC): Program and mathematics

    NASA Technical Reports Server (NTRS)

    Butler, Ricky W.; Martensen, Anna L.

    1989-01-01

    The Fault Tree Compiler Program is a new reliability tool used to predict the top-event probability for a fault tree. Five different gate types are allowed in the fault tree: AND, OR, EXCLUSIVE OR, INVERT, and m OF n gates. The high-level input language is easy to understand and use when describing the system tree. In addition, the use of the hierarchical fault tree capability can simplify the tree description and decrease program execution time. The current solution technique provides an answer accurate to a user-specified number of digits (within the limits of double precision floating point arithmetic). The user may vary one failure rate or failure probability over a range of values and plot the results for sensitivity analyses. The solution technique is implemented in FORTRAN; the remaining program code is implemented in Pascal. The program is written to run on a Digital Equipment Corporation (DEC) VAX computer with the VMS operating system.
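
    To make the idea of a top-event probability concrete, the five gate types listed in this abstract can be evaluated with a short sketch. It is not the FTC solution technique, which the abstract does not describe. It assumes that all basic events are independent, that each basic event appears only once in the tree, that EXCLUSIVE OR takes two inputs, and that an m OF n gate fires when at least m of its inputs occur. The tuple-based tree encoding and the event names are hypothetical.

```python
from math import prod


def at_least_m(probs, m):
    """P(at least m of the independent inputs occur), by dynamic programming."""
    dist = [1.0]  # dist[k] = P(exactly k of the inputs processed so far occurred)
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1.0 - p)
            nxt[k + 1] += q * p
        dist = nxt
    return sum(dist[m:])


def probability(node, basic):
    """Evaluate a node: either a basic-event name or a tuple (gate, *inputs)."""
    if isinstance(node, str):
        return basic[node]
    gate, *args = node
    if gate == "MOFN":  # encoded as ("MOFN", m, input1, ..., inputn)
        m, *children = args
        return at_least_m([probability(c, basic) for c in children], m)
    ps = [probability(c, basic) for c in args]
    if gate == "AND":
        return prod(ps)
    if gate == "OR":
        return 1.0 - prod(1.0 - p for p in ps)
    if gate == "XOR":  # exactly one of two inputs occurs
        a, b = ps
        return a * (1.0 - b) + b * (1.0 - a)
    if gate == "INVERT":
        (p,) = ps
        return 1.0 - p
    raise ValueError(f"unknown gate: {gate}")


# Hypothetical system: the top event occurs if the power supply fails,
# or if at least 2 of the 3 redundant sensors fail.
basic_events = {"power": 0.01, "s1": 0.1, "s2": 0.1, "s3": 0.1}
top_event = ("OR", "power", ("MOFN", 2, "s1", "s2", "s3"))
print(probability(top_event, basic_events))
```

    When a basic event feeds several gates, the inputs of those gates are no longer independent and this simple bottom-up product rule would give wrong answers; handling that case requires a different method.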

  20. Tree nut allergens.

    PubMed

    Roux, Kenneth H; Teuber, Suzanne S; Sathe, Shridhar K

    2003-08-01

    Allergic reactions to tree nuts can be serious and life threatening. Considerable research has been conducted in recent years in an attempt to characterize those allergens that are most responsible for allergy sensitization and triggering. Both native and recombinant nut allergens have been identified and characterized and, for some, the IgE-reactive epitopes described. Some allergens, such as lipid transfer proteins, profilins, and members of the Bet v 1-related family, represent minor constituents in tree nuts. These allergens are frequently cross-reactive with other food and pollen homologues, and are considered panallergens. Others, such as legumins, vicilins, and 2S albumins, represent major seed storage protein constituents of the nuts. The allergenic tree nuts discussed in this review include those most commonly responsible for allergic reactions such as hazelnut, walnut, cashew, and almond as well as those less frequently associated with allergies including pecan, chestnut, Brazil nut, pine nut, macadamia nut, pistachio, coconut, Nangai nut, and acorn. PMID:12915766

  1. Inferring Regulatory Networks from Expression Data Using Tree-Based Methods

    PubMed Central

    Huynh-Thu, Vân Anh; Irrthum, Alexandre; Wehenkel, Louis; Geurts, Pierre

    2010-01-01

    One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs) using high throughput genomic data, in particular microarray gene expression data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge aims to evaluate the success of GRN inference algorithms on benchmarks of simulated data. In this article, we present GENIE3, a new algorithm for the inference of GRNs that was the best performer in the DREAM4 In Silico Multifactorial challenge. GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. In each of the regression problems, the expression pattern of one of the genes (target gene) is predicted from the expression patterns of all the other genes (input genes), using the tree-based ensemble methods Random Forests or Extra-Trees. The importance of an input gene in the prediction of the target gene expression pattern is taken as an indication of a putative regulatory link. Putative regulatory links are then aggregated over all genes to provide a ranking of interactions from which the whole network is reconstructed. In addition to performing well on the DREAM4 In Silico Multifactorial challenge simulated data, we show that GENIE3 compares favorably with existing algorithms to decipher the genetic regulatory network of Escherichia coli. It doesn't make any assumption about the nature of gene regulation, can deal with combinatorial and non-linear interactions, produces directed GRNs, and is fast and scalable. In conclusion, we propose a new algorithm for GRN inference that performs well on both synthetic and real gene expression data. The algorithm, based on feature selection with tree-based ensemble methods, is simple and generic, making it adaptable to other types of genomic data and interactions. PMID:20927193

  2. Inferring regulatory networks from expression data using tree-based methods.

    PubMed

    Huynh-Thu, Vân Anh; Irrthum, Alexandre; Wehenkel, Louis; Geurts, Pierre

    2010-01-01

    One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs) using high throughput genomic data, in particular microarray gene expression data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge aims to evaluate the success of GRN inference algorithms on benchmarks of simulated data. In this article, we present GENIE3, a new algorithm for the inference of GRNs that was the best performer in the DREAM4 In Silico Multifactorial challenge. GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. In each of the regression problems, the expression pattern of one of the genes (target gene) is predicted from the expression patterns of all the other genes (input genes), using the tree-based ensemble methods Random Forests or Extra-Trees. The importance of an input gene in the prediction of the target gene expression pattern is taken as an indication of a putative regulatory link. Putative regulatory links are then aggregated over all genes to provide a ranking of interactions from which the whole network is reconstructed. In addition to performing well on the DREAM4 In Silico Multifactorial challenge simulated data, we show that GENIE3 compares favorably with existing algorithms to decipher the genetic regulatory network of Escherichia coli. It doesn't make any assumption about the nature of gene regulation, can deal with combinatorial and non-linear interactions, produces directed GRNs, and is fast and scalable. In conclusion, we propose a new algorithm for GRN inference that performs well on both synthetic and real gene expression data. The algorithm, based on feature selection with tree-based ensemble methods, is simple and generic, making it adaptable to other types of genomic data and interactions. PMID:20927193

  3. Long Tree-Ring Chronologies Provide Evidence of Recent Tree Growth Decrease in a Central African Tropical Forest

    PubMed Central

    Battipaglia, Giovanna; Zalloni, Enrica; Castaldi, Simona; Marzaioli, Fabio; Cazzolla-Gatti, Roberto; Lasserre, Bruno; Tognetti, Roberto; Marchetti, Marco; Valentini, Riccardo

    2015-01-01

    It is still unclear whether the exponential rise of atmospheric CO2 concentration has produced a fertilization effect on tropical forests, thus incrementing their growth rate, in the last two centuries. As many factors affect tree growth patterns, short-term studies might be influenced by the confounding effect of several interacting environmental variables on plant growth. Long-term analyses of tree growth can elucidate long-term trends of plant growth response to dominant drivers. The study of annual rings, applied to long tree-ring chronologies in tropical forest trees enables such analysis. Long-term tree-ring chronologies of three widespread African species were measured in Central Africa to analyze the growth of trees over the last two centuries. Growth trends were correlated to changes in global atmospheric CO2 concentration and local variations in the main climatic drivers, temperature and rainfall. Our results provided no evidence for a fertilization effect of CO2 on tree growth. On the contrary, an overall growth decline was observed for all three species in the last century, which appears to be significantly correlated to the increase in local temperature. These findings provide additional support to the global observations of a slowing down of C sequestration in the trunks of forest trees in recent decades. Data indicate that the CO2 increase alone has not been sufficient to obtain a tree growth increase in tropical trees. The effect of other changing environmental factors, like temperature, may have overridden the fertilization effect of CO2. PMID:25806946

  4. Basal physiological parameters in domesticated tree shrews (Tupaia belangeri chinensis).

    PubMed

    Wang, Jing; Xu, Xin-Li; Ding, Ze-Yang; Mao, Rong-Rong; Zhou, Qi-Xin; Lü, Long-Bao; Wang, Li-Ping; Wang, Shuang; Zhang, Chen; Xu, Lin; Yang, Yue-Xiong

    2013-04-01

    Establishing non-human primate models of human diseases is an efficient way to narrow the large gap between basic studies and translational medicine. Multifold advantages such as simplicity of breeding, low cost of feeding and facility of operating make the tree shrew an ideal non-human primate model proxy. Additional features like vulnerability to stress and spontaneous diabetic characteristics also indicate that the tree shrew could be a potential new animal model of human diseases. However, basal physiological indexes of the tree shrew, especially those related to human disease, have not been systematically reported. Accordingly, we established important basal physiological indexes of domesticated tree shrews including several factors: (1) body weight, (2) core body temperature and rhythm, (3) diet metabolism, (4) locomotor rhythm, (5) electroencephalogram, (6) glycometabolism and (7) serum and urinary hormone level and urinary cortisol rhythm. We compared the physiological parameters of domesticated tree shrews with those of rats and macaques. Results showed that (a) the core body temperature of the tree shrew was 39.59±0.05 ℃, which was higher than that of rats and macaques; (b) compared with wild tree shrews, with two activity peaks, domesticated tree shrews had only one activity peak from 17:30 to 19:30; (c) compared with rats, tree shrews had poor carbohydrate metabolism ability; and (d) urinary cortisol rhythm indicated there were two peaks at 8:00 and 17:00 in domesticated tree shrews, which matched activity peaks in wild tree shrews. These results provided basal physiological indexes for domesticated tree shrews and laid an important foundation for diabetes and stress-related disease models established on tree shrews. PMID:23572369

  5. A Gibbs sampler for multivariate linear regression

    NASA Astrophysics Data System (ADS)

    Mantz, Adam B.

    2016-04-01

    Kelly described an efficient algorithm, using Gibbs sampling, for performing linear regression in the fairly general case where non-zero measurement errors exist for both the covariates and response variables, where these measurements may be correlated (for the same data point), where the response variable is affected by intrinsic scatter in addition to measurement error, and where the prior distribution of covariates is modelled by a flexible mixture of Gaussians rather than assumed to be uniform. Here, I extend the Kelly algorithm in two ways. First, the procedure is generalized to the case of multiple response variables. Secondly, I describe how to model the prior distribution of covariates using a Dirichlet process, which can be thought of as a Gaussian mixture where the number of mixture components is learned from the data. I present an example of multivariate regression using the extended algorithm, namely fitting scaling relations of the gas mass, temperature, and luminosity of dynamically relaxed galaxy clusters as a function of their mass and redshift. An implementation of the Gibbs sampler in the R language, called LRGS, is provided.

  6. Multivariate Regression with Block-structured Predictors

    NASA Astrophysics Data System (ADS)

    Ye, Saier

    We study the problem of predicting multiple responses with a common set of predicting variables. Applying the generalized Ordinary Least Squares (OLS) criterion to the responses altogether is practically equivalent to OLS estimation on the responses separately. Possible correlations between the response variables are overlooked. In order to take advantage of these interrelationships, Reduced-Rank Regression (RRR) imposes a rank constraint on the coefficient matrix. RRR constructs latent factors from the original predicting variables, and the latent factors are the effective predictors. RRR reduces the number of parameters to be estimated, and improves estimation efficiency. In the present work, we explore a novel regression model to incorporate "block-structured" predicting variables, where the predictors can be naturally partitioned into several groups or blocks. Variables in the same block share similar characteristics. It is reasonable to assume that in addition to an overall impact, predictors also have block-specific effects on the responses. Furthermore, we impose rank constraints on the coefficient matrices. In our framework, we construct two types of latent factors that drive the variation in the responses. We have joint factors, which are formed by all predictors across all blocks; and individual factors, which are formed by variables within individual blocks. The proposed method exceeds RRR in terms of prediction accuracy and ease of interpretation in the presence of block structure in the predicting variables.

  7. Practical Session: Multiple Linear Regression

    NASA Astrophysics Data System (ADS)

    Clausel, M.; Grégoire, G.

    2014-12-01

    Three exercises are proposed to illustrate the simple linear regression. The first one investigates the influence of several factors on atmospheric pollution. It has been proposed by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr33.pdf) and is based on data coming from 20 cities of the U.S. Exercise 2 is an introduction to model selection whereas Exercise 3 provides a first example of analysis of variance. Exercises 2 and 3 have been proposed by A. Dalalyan at ENPC (see Exercises 2 and 3 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_5.pdf).

  8. Residuals and regression diagnostics: focusing on logistic regression.

    PubMed

    Zhang, Zhongheng

    2016-05-01

    Up to now I have introduced most steps in regression model building and validation. The last step is to check whether there are observations that have significant impact on model coefficients and specification. The article first describes plotting Pearson residuals against predictors. Such plots are helpful in identifying non-linearity and provide hints on how to transform predictors. Next, I focus on outlying, high-leverage and influential observations that may have significant impact on model building. An outlier is an observation whose response value is unusual conditional on its covariate pattern. A leverage point is an observation whose covariate pattern is far away from the center of the regressor space. Influence is the product of outlier and leverage. That is, when an influential observation is dropped from the model, there will be a significant shift of the coefficient. Summary statistics for outlier, leverage and influence are studentized residuals, hat values and Cook's distance. They can be easily visualized with graphs and formally tested using the car package. PMID:27294091
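
    The three summary statistics named in this abstract can be computed by hand for a small example. The sketch below is an illustration only: the article works with logistic regression and the R car package, whereas this example swaps in ordinary simple linear regression in Python, where hat values, internally studentized residuals and Cook's distance have closed forms. The function name and the data are hypothetical.

```python
import math


def regression_diagnostics(x, y):
    """Hat values, internally studentized residuals and Cook's distances
    for the simple linear regression y = b0 + b1 * x."""
    n, p = len(x), 2  # p counts the intercept and the slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Leverage: how far each covariate value lies from the centre of the data.
    hat = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    # Residual variance estimate with n - p degrees of freedom.
    s2 = sum(e * e for e in residuals) / (n - p)
    # Outlyingness: residual scaled by its estimated standard deviation.
    student = [e / math.sqrt(s2 * (1.0 - h)) for e, h in zip(residuals, hat)]
    # Influence: combines outlyingness and leverage.
    cook = [r * r * h / (p * (1.0 - h)) for r, h in zip(student, hat)]
    return hat, student, cook


# Hypothetical data: the last point has an extreme x value and lies off the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 4.0]
hat, student, cook = regression_diagnostics(x, y)
for i, (h, r, d) in enumerate(zip(hat, student, cook)):
    print(f"obs {i}: hat={h:.3f} studentized={r:.3f} cook={d:.3f}")
```

    In this example the last observation combines high leverage with a sizeable residual, so it has by far the largest Cook's distance, which matches the description of influence as the product of outlier and leverage.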

  9. The Inference of Gene Trees with Species Trees

    PubMed Central

    Szöllősi, Gergely J.; Tannier, Eric; Daubin, Vincent; Boussau, Bastien

    2015-01-01

    This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree–species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree–species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution. PMID:25070970

  10. Stable feature selection for clinical prediction: exploiting ICD tree structure using Tree-Lasso.

    PubMed

    Kamkar, Iman; Gupta, Sunil Kumar; Phung, Dinh; Venkatesh, Svetha

    2015-02-01

    Modern healthcare is being reshaped by growing Electronic Medical Records (EMR). Recently, these records have been shown to be of great value for building clinical prediction models. In EMR data, patients' diseases and hospital interventions are captured through a set of diagnosis and procedure codes. These codes are usually represented in a tree form (e.g. ICD-10 tree) and the codes within a tree branch may be highly correlated. These codes can be used as features to build a prediction model, and an appropriate feature selection can inform a clinician about important risk factors for a disease. Traditional feature selection methods (e.g. Information Gain, T-test, etc.) consider each variable independently and usually end up with a long feature list. Recently, Lasso and related l1-penalty based feature selection methods have become popular due to their joint feature selection property. However, Lasso is known to select one of many correlated features at random. This hinders clinicians from arriving at a stable feature set, which is crucial for the clinical decision-making process. In this paper, we solve this problem by using a recently proposed Tree-Lasso model. Since the stability behavior of Tree-Lasso is not well understood, we study it and compare it with other feature selection methods. Using a synthetic and two real-world datasets (Cancer and Acute Myocardial Infarction), we show that Tree-Lasso based feature selection is significantly more stable than Lasso and comparable to other methods such as Information Gain, ReliefF and T-test. We further show that, using different types of classifiers such as logistic regression, naive Bayes, support vector machines, decision trees and Random Forest, the classification performance of Tree-Lasso is comparable to Lasso and better than other methods. Our result has implications in identifying stable risk factors for many healthcare problems and therefore can

  11. Tree attenuation at 20 GHz: Foliage effects

    NASA Astrophysics Data System (ADS)

    Vogel, Wolfhard J.; Goldhirsh, Julius

    1993-08-01

    Static tree attenuation measurements at 20 GHz (K-Band) on a 30 deg slant path through a mature Pecan tree with and without leaves showed median fades exceeding approximately 23 dB and 7 dB, respectively. The corresponding 1% probability fades were 43 dB and 25 dB. Previous 1.6 GHz (L-Band) measurements for the bare tree case showed fades larger than those at K-Band by 3.4 dB for the median and smaller by approximately 7 dB at the 1% probability. While the presence of foliage had only a small effect on fading at L-Band (approximately 1 dB additional for the median to 1% probability range), the attenuation increase was significant at K-Band, where it increased by about 17 dB over the same probability range.

  12. Tree attenuation at 20 GHz: Foliage effects

    NASA Technical Reports Server (NTRS)

    Vogel, Wolfhard J.; Goldhirsh, Julius

    1993-01-01

    Static tree attenuation measurements at 20 GHz (K-Band) on a 30 deg slant path through a mature Pecan tree with and without leaves showed median fades exceeding approximately 23 dB and 7 dB, respectively. The corresponding 1% probability fades were 43 dB and 25 dB. Previous 1.6 GHz (L-Band) measurements for the bare tree case showed fades larger than those at K-Band by 3.4 dB for the median and smaller by approximately 7 dB at the 1% probability. While the presence of foliage had only a small effect on fading at L-Band (approximately 1 dB additional for the median to 1% probability range), the attenuation increase was significant at K-Band, where it increased by about 17 dB over the same probability range.

  13. Forward estimation for game-tree search

    SciTech Connect

    Zhang, Weixiong

    1996-12-31

    It is known that bounds on the minimax values of nodes in a game tree can be used to reduce the computational complexity of minimax search for two-player games. We describe a very simple method to estimate bounds on the minimax values of interior nodes of a game tree, and use the bounds to improve minimax search. The new algorithm, called forward estimation, does not require additional domain knowledge other than a static node evaluation function, and has small constant overhead per node expansion. We also propose a variation of forward estimation, which provides a tradeoff between computational complexity and decision quality. Our experimental results show that forward estimation outperforms alpha-beta pruning on random game trees and the game of Othello.
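
    As background for the comparison in this abstract, the baseline technique, alpha-beta pruning, can be sketched as follows. This is the standard textbook algorithm, not the forward estimation method of the paper; the nested-list tree encoding and the `stats` counter are assumptions made for illustration.

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, stats=None):
    """Return the minimax value of a tree given as nested lists with numeric leaves."""
    if stats is not None:
        stats["visited"] += 1
    if not isinstance(node, list):
        return node  # leaf: value of the static evaluation function
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, stats))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizer above already has a better option
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, stats))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizer above already has a better option
    return value

# Hypothetical two-ply game tree: the root maximizes, its children minimize.
tree = [[3, 5], [2, 9], [0, 7]]
stats = {"visited": 0}
best = alphabeta(tree, stats=stats)
```

    On this small tree the bounds (alpha, beta) let the search skip two of the ten nodes while returning the same minimax value as an exhaustive search.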

  14. Tree-Ties.

    ERIC Educational Resources Information Center

    Gresczyk, Rick

    Created to help students understand how plants were used for food, for medicine, and for arts and crafts among the Ojibwe (Chippewa) Indians, the game Tree-Ties combines earth and social sciences within a specific culture. The game requires mutual respect, understanding, and agreement to succeed. Sounding like the word "treaties", the title is a…

  15. Christmas Tree Category Manual.

    ERIC Educational Resources Information Center

    Bowman, James S.; Turmel, Jon P.

    This manual provides information needed to meet the standards for pesticide applicator certification. Pests and diseases of Christmas tree plantations are identified and discussed. Section one deals with weeds and woody plants and the application, formulation and effects of herbicides in controlling them. Section two discusses specific diseases…

  16. Tree theorem for inflation

    SciTech Connect

    Weinberg, Steven

    2008-09-15

    It is shown that the generating function for tree graphs in the "in-in" formalism may be calculated by solving the classical equations of motion subject to certain constraints. This theorem is illustrated by application to the evolution of a single inflaton field in a Robertson-Walker background.

  17. A Universal Phylogenetic Tree.

    ERIC Educational Resources Information Center

    Offner, Susan

    2001-01-01

    Presents a universal phylogenetic tree suitable for use in high school and college-level biology classrooms. Illustrates the antiquity of life and that all life is related, even if it dates back 3.5 billion years. Reflects important evolutionary relationships and provides an exciting way to learn about the history of life. (SAH)

  18. MPI File Tree Walk

    2007-04-30

    MPI-FTW is a scalable MPI-based software application that navigates a directory tree by dynamically allocating processes to navigate the sub-directories found. Upon completion, MPI-FTW provides statistics on the number of directories found, files found, and time to complete. In addition, commands can be executed at each directory level.
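
    The statistics this record describes (directories found, files found, elapsed time) can be illustrated with a serial sketch. It uses only the Python standard library and no MPI, so it shows the bookkeeping rather than the parallel scheduling; `walk_stats` and the temporary demo tree are hypothetical names and data, not part of MPI-FTW.

```python
import os
import tempfile
import time

def walk_stats(root):
    """Count directories and files below root and time the traversal (serial, no MPI)."""
    start = time.perf_counter()
    n_dirs = n_files = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        n_dirs += len(dirnames)
        n_files += len(filenames)
    return {"dirs": n_dirs, "files": n_files, "seconds": time.perf_counter() - start}

# Build a small throwaway tree: root/a/f.txt and root/a/b/g.txt
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a", "b"))
    open(os.path.join(tmp, "a", "f.txt"), "w").close()
    open(os.path.join(tmp, "a", "b", "g.txt"), "w").close()
    demo = walk_stats(tmp)
```

    A parallel version would hand each newly discovered sub-directory to an idle process instead of visiting it in the same loop.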

  19. Starting Trees from Cuttings.

    ERIC Educational Resources Information Center

    Kramer, David C.

    1983-01-01

    Describes a procedure for starting tree cuttings from woody plants, explaining "lag time," recommending materials, and giving step-by-step instructions for rooting and planting. Points out species which are likely candidates for cuttings and provides tips for teachers for developing a unit. (JM)

  20. The Medicine Tree.

    ERIC Educational Resources Information Center

    Brokenleg, Martin

    2000-01-01

    Demographic changes in population continue to bring children of different cultural backgrounds to classrooms. This article provides suggestions teachers and counselors can use to bridge cultures. Using the parable of a medicine tree, it explains how no society can endure without caring for its young. (Author/JDM)

  1. Phylogenics & Tree-Thinking

    ERIC Educational Resources Information Center

    Baum, David A.; Offner, Susan

    2008-01-01

    Phylogenetic trees, which are depictions of the inferred evolutionary relationships among a set of species, now permeate almost all branches of biology and are appearing in increasing numbers in biology textbooks. While few state standards explicitly require knowledge of phylogenetics, most require some knowledge of evolutionary biology, and many…

  2. Trees at the Center.

    ERIC Educational Resources Information Center

    Flannery, Maura

    1998-01-01

    Recommends introducing students to biology using a topical focus that can offer intriguing perspectives on the discipline. Describes a biology course that uses trees as a topical focus. Presents a list of literary resources and reviews student interactions. Contains 50 references. (DDR)

  3. Arbutus unedo, Strawberry Tree

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The Encyclopedia of Fruit and Nuts is designed as a research reference source on temperate and tropical fruit and nut crops. Strawberry tree or madrone is native to the Mediterranean region of southern Europe (Arbutus unedo L., Ericaceae) with a relict population in Ireland, as well as in North Ameri...

  4. The Sacred Tree.

    ERIC Educational Resources Information Center

    Lethbridge Univ. (Alberta).

    Designed as a text for high school students and adults, this illustrated book presents ethical concepts and teachings of Native societies throughout North America concerning the nature and possibilities of human existence. The final component of a course in self-discovery and development, the book begins with the legend of the "Sacred Tree"…

  5. Digging Deeper with Trees.

    ERIC Educational Resources Information Center

    Growing Ideas, 2001

    2001-01-01

    Describes hands-on science areas that focus on trees. A project on leaf pigmentation involves putting crushed leaves in a test tube with solvent acetone to dissolve pigment. In another project, students learn taxonomy by sorting and classifying leaves based on observable characteristics. Includes a language arts connection. (PVD)

  6. Hug a Tree!

    ERIC Educational Resources Information Center

    Rockwell, Robert E.; And Others

    1983-01-01

    Methods for teaching pupils to use their senses to explore colors, shapes, textures, and sounds of the great outdoors are described. Ideas include: (1) having children hug their own special tree; (2) looking for geometric shapes in nature; (3) taking nocturnal nature walks; (4) building a track for racing insects; and (5) collecting objects with…

  7. Semiparametric regression during 2003–2007

    PubMed Central

    Ruppert, David; Wand, M.P.; Carroll, Raymond J.

    2010-01-01

    Semiparametric regression is a fusion between parametric regression and nonparametric regression that integrates low-rank penalized splines, mixed model and hierarchical Bayesian methodology – thus allowing more streamlined handling of longitudinal and spatial correlation. We review progress in the field over the five-year period between 2003 and 2007. We find semiparametric regression to be a vibrant field with substantial involvement and activity, continual enhancement and widespread application. PMID:20305800

  8. Unravelling the limits to tree height: a major role for water and nutrient trade-offs.

    PubMed

    Cramer, Michael D

    2012-05-01

    Competition for light has driven forest trees to grow exceedingly tall, but the lack of a single universal limit to tree height indicates multiple interacting environmental limitations. Because soil nutrient availability is determined by both nutrient concentrations and soil water, water and nutrient availabilities may interact in determining realised nutrient availability and consequently tree height. In SW Australia, which is characterised by nutrient impoverished soils that support some of the world's tallest forests, total [P] and water availability were independently correlated with tree height (r = 0.42 and 0.39, respectively). However, interactions between water availability and each of total [P], pH and [Mg] contributed to a multiple linear regression model of tree height (r = 0.72). A boosted regression tree model showed that maximum tree height was correlated with water availability (24%), followed by soil properties including total P (11%), Mg (10%) and total N (9%), amongst others, and that there was an interaction between water availability and total [P] in determining maximum tree height. These interactions indicated a trade-off between water and P availability in determining maximum tree height in SW Australia. This is enabled by a species assemblage capable of growing tall and surviving (some) disturbances. The mechanism for this trade-off is suggested to be through water enabling mass-flow and diffusive mobility of P, particularly of relatively mobile organic P, although water interactions with microbial activity could also play a role. PMID:22038061

  9. Urban Forest Dynamics: Remote Sensing of Temporal Variability of Urban Tree Cover

    NASA Astrophysics Data System (ADS)

    Johnston, A. K.

    2013-12-01

    Urban forests are increasingly a focus of interest as urbanized populations grow and urban areas expand. Urban forests change as trees are planted, grow, die, and are removed. These processes alter a city's tree cover over time, but this inherent dynamism is poorly understood. Better understanding of how tree cover is a variable land cover component of the urban environment will enhance knowledge of links between human activity and ecosystem processes, and provide new perspectives for management of urban resources. As part of a study of past tree cover variability within a major urban center over a 20-year period, alternate remote sensing methodologies were tested to improve accuracy of urban tree cover mapping. Changes in tree cover proportion were measured in the District of Columbia every two years between 1984-2004 utilizing highly calibrated Landsat satellite remote sensing data. The satellite remote sensing data used in this study were calibrated to minimize the impact of atmospheric scattering as part of another ongoing study to monitor forest disturbance patterns. Two methods were applied to Landsat data. Results from spectral mixture analysis, widely applied in previous remote sensing studies of urban areas, were compared to results from application of support vector regression to estimate urban tree cover. Testing demonstrated that an approach utilizing support vector regression provided higher and more consistent accuracy across land use types when compared to spectral mixture analysis. Consistent reliability across land use types provides an important advantage, allowing application of results for identifying tree cover changes between different regions within a city. Tree cover maps were validated using aerial photography imagery and data from field surveys. The District of Columbia did not experience an overall increase or decrease in total tree canopy area. Between 1984-2004, the city-wide tree cover remained between 22.1(+/-2.9)% and 28

  10. Building Regression Models: The Importance of Graphics.

    ERIC Educational Resources Information Center

    Dunn, Richard

    1989-01-01

    Points out reasons for using graphical methods to teach simple and multiple regression analysis. Argues that a graphically oriented approach has considerable pedagogic advantages in the exposition of simple and multiple regression. Shows that graphical methods may play a central role in the process of building regression models. (Author/LS)

  11. Regression Analysis by Example. 5th Edition

    ERIC Educational Resources Information Center

    Chatterjee, Samprit; Hadi, Ali S.

    2012-01-01

    Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. "Regression Analysis by Example, Fifth Edition" has been expanded and thoroughly…

  12. Bayesian Unimodal Density Regression for Causal Inference

    ERIC Educational Resources Information Center

    Karabatsos, George; Walker, Stephen G.

    2011-01-01

    Karabatsos and Walker (2011) introduced a new Bayesian nonparametric (BNP) regression model. Through analyses of real and simulated data, they showed that the BNP regression model outperforms other parametric and nonparametric regression models of common use, in terms of predictive accuracy of the outcome (dependent) variable. The other,…

  13. Standards for Standardized Logistic Regression Coefficients

    ERIC Educational Resources Information Center

    Menard, Scott

    2011-01-01

    Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…

  14. Developmental Regression in Autism Spectrum Disorders

    ERIC Educational Resources Information Center

    Rogers, Sally J.

    2004-01-01

    The occurrence of developmental regression in autism is one of the more puzzling features of this disorder. Although several studies have documented the validity of parental reports of regression using home videos, accumulating data suggest that most children who demonstrate regression also demonstrated previous, subtle, developmental differences.…

  15. Nonparametric instrumental regression with non-convex constraints

    NASA Astrophysics Data System (ADS)

    Grasmair, M.; Scherzer, O.; Vanhems, A.

    2013-03-01

    This paper considers the nonparametric regression model with an additive error that is dependent on the explanatory variables. As is common in empirical studies in epidemiology and economics, it also supposes that valid instrumental variables are observed. A classical example in microeconomics considers the consumer demand function as a function of the price of goods and the income, both variables often considered as endogenous. In this framework, the economic theory also imposes shape restrictions on the demand function, such as integrability conditions. Motivated by this illustration in microeconomics, we study an estimator of a nonparametric constrained regression function using instrumental variables by means of Tikhonov regularization. We derive rates of convergence for the regularized model both in a deterministic and stochastic setting under the assumption that the true regression function satisfies a projected source condition including, because of the non-convexity of the imposed constraints, an additional smallness condition.
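
    The setting described above is usually formalized as a linear inverse problem. The following is a generic sketch of that formulation; the symbols (φ, K, r, α and the admissible set C) are assumed for illustration and need not match the paper's notation or its exact estimator.

```latex
% Notation assumed for illustration: Y response, X endogenous regressors, W instruments.
\[ Y = \varphi(X) + U, \qquad \mathbb{E}[U \mid W] = 0 \]
% Conditioning on W gives a linear inverse problem K\varphi = r:
\[ (K\varphi)(w) = \mathbb{E}[\varphi(X) \mid W = w], \qquad r(w) = \mathbb{E}[Y \mid W = w] \]
% Tikhonov regularization over an admissible set \mathcal{C} that encodes the shape restrictions:
\[ \hat{\varphi}_{\alpha} = \arg\min_{\varphi \in \mathcal{C}} \lVert K\varphi - r \rVert^2 + \alpha \lVert \varphi \rVert^2, \qquad \alpha > 0 \]
% Without constraints the minimizer has the familiar closed form:
\[ \hat{\varphi}_{\alpha} = (K^{*}K + \alpha I)^{-1} K^{*} r \]
```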

  16. Efficient Gene Tree Correction Guided by Genome Evolution

    PubMed Central

    Lafond, Manuel; Seguin, Jonathan; Boussau, Bastien; Guéguen, Laurent; El-Mabrouk, Nadia; Tannier, Eric

    2016-01-01

    Motivations: Gene trees inferred solely from multiple alignments of homologous sequences often contain weakly supported and uncertain branches. Information for their full resolution may lie in the dependency between gene families and their genomic context. Integrative methods, using species tree information in addition to sequence information, often rely on a computationally intensive tree space search which forecloses an application to large genomic databases. Results: We propose a new method, called ProfileNJ, that takes a gene tree with statistical supports on its branches, and corrects its weakly supported parts by using a combination of information from a species tree and a distance matrix. Its low running time enabled us to use it on the whole Ensembl Compara database, for which we propose an alternative, arguably more plausible set of gene trees. This allowed us to perform a genome-wide analysis of duplication and loss patterns on the history of 63 eukaryote species, and predict ancestral gene content and order for all ancestors along the phylogeny. Availability: A web interface called RefineTree, including ProfileNJ as well as other gene tree correction methods, which we also test on the Ensembl gene families, is available at: http://www-ens.iro.umontreal.ca/~adbit/polytomysolver.html. The code of ProfileNJ as well as the set of gene trees corrected by ProfileNJ from Ensembl Compara version 73 families are also made available. PMID:27513924

  17. A Beta-splitting model for evolutionary trees.

    PubMed

    Sainudiin, Raazesh; Véber, Amandine

    2016-05-01

    In this article, we construct a generalization of the Blum-François Beta-splitting model for evolutionary trees, which was itself inspired by Aldous' Beta-splitting model on cladograms. The novelty of our approach allows for asymmetric shares of diversification rates (or diversification 'potential') between two sister species in an evolutionarily interpretable manner, as well as the addition of extinction to the model in a natural way. We describe the incremental evolutionary construction of a tree with n leaves by splitting or freezing extant lineages through the generating, organizing and deleting processes. We then give the probability of any (binary rooted) tree under this model with no extinction, at several resolutions: ranked planar trees giving asymmetric roles to the first and second offspring species of a given species and keeping track of the order of the speciation events occurring during the creation of the tree, unranked planar trees, ranked non-planar trees and finally (unranked non-planar) trees. We also describe a continuous-time equivalent of the generating, organizing and deleting processes where tree topology and branch lengths are jointly modelled and provide code in SageMath/Python for these algorithms. PMID:27293780
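
    For readers unfamiliar with the starting point of this line of work, Aldous' original Beta-splitting rule for the sizes of the two subclades at a split can be sketched as below. This is only the classical one-parameter model on cladograms, not the generalization constructed in the article or its SageMath/Python code; the function name and the restriction to beta > -2 are choices made for this example.

```python
import math

def aldous_split_probs(n, beta):
    """Probabilities q_n(i), i = 1..n-1, that a clade of n leaves splits into sizes (i, n - i)."""
    if n < 2 or beta <= -2:
        raise ValueError("need n >= 2 and beta > -2")
    # log of Gamma(beta+i+1) * Gamma(beta+n-i+1) / (Gamma(i+1) * Gamma(n-i+1))
    log_w = [
        math.lgamma(beta + i + 1) + math.lgamma(beta + n - i + 1)
        - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        for i in range(1, n)
    ]
    shift = max(log_w)  # subtract the maximum before exponentiating for numerical stability
    weights = [math.exp(v - shift) for v in log_w]
    total = sum(weights)
    return [w / total for w in weights]

yule = aldous_split_probs(5, 0.0)   # beta = 0: every split size is equally likely
pda = aldous_split_probs(4, -1.5)   # beta = -3/2: uniform distribution on labelled trees
```

    With beta = 0 the rule reduces to the Yule model, and with beta = -3/2 it reproduces the 12:3 ratio of unbalanced to balanced root splits among the 15 rooted binary trees on four labelled leaves.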

  18. A Beta-splitting model for evolutionary trees

    PubMed Central

    Sainudiin, Raazesh

    2016-01-01

    In this article, we construct a generalization of the Blum–François Beta-splitting model for evolutionary trees, which was itself inspired by Aldous' Beta-splitting model on cladograms. The novelty of our approach allows for asymmetric shares of diversification rates (or diversification ‘potential’) between two sister species in an evolutionarily interpretable manner, as well as the addition of extinction to the model in a natural way. We describe the incremental evolutionary construction of a tree with n leaves by splitting or freezing extant lineages through the generating, organizing and deleting processes. We then give the probability of any (binary rooted) tree under this model with no extinction, at several resolutions: ranked planar trees giving asymmetric roles to the first and second offspring species of a given species and keeping track of the order of the speciation events occurring during the creation of the tree, unranked planar trees, ranked non-planar trees and finally (unranked non-planar) trees. We also describe a continuous-time equivalent of the generating, organizing and deleting processes where tree topology and branch lengths are jointly modelled and provide code in SageMath/Python for these algorithms. PMID:27293780

  19. Estimating equivalence with quantile regression.

    PubMed

    Cade, Brian S

    2011-01-01

    Equivalence testing and corresponding confidence interval estimates are used to provide more enlightened statistical statements about parameter estimates by relating them to intervals of effect sizes deemed to be of scientific or practical importance rather than just to an effect size of zero. Equivalence tests and confidence interval estimates are based on a null hypothesis that a parameter estimate is either outside (inequivalence hypothesis) or inside (equivalence hypothesis) an equivalence region, depending on the question of interest and assignment of risk. The former approach, often referred to as bioequivalence testing, is often used in regulatory settings because it reverses the burden of proof compared to a standard test of significance, following a precautionary principle for environmental protection. Unfortunately, many applications of equivalence testing focus on establishing average equivalence by estimating differences in means of distributions that do not have homogeneous variances. I discuss how to compare equivalence across quantiles of distributions using confidence intervals on quantile regression estimates that detect differences in heterogeneous distributions missed by focusing on means. I used one-tailed confidence intervals based on inequivalence hypotheses in a two-group treatment-control design for estimating bioequivalence of arsenic concentrations in soils at an old ammunition testing site and bioequivalence of vegetation biomass at a reclaimed mining site. Two-tailed confidence intervals based both on inequivalence and equivalence hypotheses were used to examine quantile equivalence for negligible trends over time for a continuous exponential model of amphibian abundance. PMID:21516905
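
    The point about heterogeneous distributions can be made concrete with a small sketch: two groups with equal means but unequal spread differ at the outer quantiles. The data are invented, the quantile is obtained by minimizing the check (pinball) loss that underlies quantile regression, and no confidence intervals or equivalence tests are computed here.

```python
def pinball_loss(q, data, tau):
    """Average check (pinball) loss of a candidate quantile q at level tau."""
    return sum((tau - (x < q)) * (x - q) for x in data) / len(data)

def sample_quantile(data, tau):
    """Data value minimizing the pinball loss at level tau."""
    return min(sorted(data), key=lambda q: pinball_loss(q, data, tau))

# Hypothetical two-group design: both groups have mean 10, but the spreads differ.
control = [8, 9, 10, 11, 12]
treated = [2, 6, 10, 14, 18]

diffs = {
    tau: sample_quantile(treated, tau) - sample_quantile(control, tau)
    for tau in (0.1, 0.5, 0.9)
}
```

    Here the difference is zero at the median but six units in opposite directions at the 0.1 and 0.9 quantiles, so a comparison of means alone would miss it.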

  20. Streamflow forecasting using functional regression

    NASA Astrophysics Data System (ADS)

    Masselot, Pierre; Dabo-Niang, Sophie; Chebana, Fateh; Ouarda, Taha B. M. J.

    2016-07-01

    Streamflow, as a natural phenomenon, is continuous in time and so are the meteorological variables which influence its variability. In practice, it can be of interest to forecast the whole flow curve instead of points (daily or hourly). To this end, this paper introduces functional linear models and adapts them to hydrological forecasting. More precisely, functional linear models are regression models based on curves instead of single values. They make it possible to consider the whole process instead of a limited number of time points or features. We apply these models to analyse the flow volume and the whole streamflow curve during a given period by using precipitation curves. The functional model is shown to lead to encouraging results. The potential of functional linear models to detect special features that would have been hard to see otherwise is pointed out. The functional model is also compared to the artificial neural network approach and the advantages and disadvantages of both models are discussed. Finally, future research directions involving the functional model in hydrology are presented.
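
    A regression "based on curves instead of single values" can be illustrated with the simplest functional linear model, in which a scalar response is an intercept plus the integral of a coefficient function times the predictor curve. The sketch below only evaluates such a model on a grid; the curves, the names, and the choice of the trapezoidal rule are assumptions for illustration, and no estimation of the coefficient function is attempted.

```python
def trapezoid(values, grid):
    """Trapezoidal approximation of an integral from samples on an increasing grid."""
    return sum(
        0.5 * (values[k] + values[k + 1]) * (grid[k + 1] - grid[k])
        for k in range(len(grid) - 1)
    )

def functional_linear_predict(alpha, beta, x_curve, grid):
    """Scalar response alpha + integral of beta(t) * x(t) dt, with both curves sampled on grid."""
    integrand = [b * x for b, x in zip(beta, x_curve)]
    return alpha + trapezoid(integrand, grid)

grid = [k / 10 for k in range(11)]   # time points on [0, 1]
precip = [t for t in grid]           # hypothetical precipitation curve x(t) = t
beta = [2.0] * len(grid)             # hypothetical constant coefficient function
volume = functional_linear_predict(1.0, beta, precip, grid)
```

    With these inputs the integral of 2t over [0, 1] equals 1, so the predicted value is 2 up to floating-point rounding.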

  1. Insulin resistance: regression and clustering.

    PubMed

    Yoon, Sangho; Assimes, Themistocles L; Quertermous, Thomas; Hsiao, Chin-Fu; Chuang, Lee-Ming; Hwu, Chii-Min; Rajaratnam, Bala; Olshen, Richard A

    2014-01-01

    In this paper we try to define insulin resistance (IR) precisely for a group of Chinese women. Our definition deliberately does not depend upon body mass index (BMI) or age, although in other studies, with particular random effects models quite different from models used here, BMI accounts for a large part of the variability in IR. We accomplish our goal through application of Gauss mixture vector quantization (GMVQ), a technique for clustering that was developed for application to lossy data compression. Defining data come from measurements that play major roles in medical practice. A precise statement of what the data are is in Section 1. Their family structures are described in detail. They concern levels of lipids and the results of an oral glucose tolerance test (OGTT). We apply GMVQ to residuals obtained from regressions of outcomes of an OGTT and lipids on functions of age and BMI that are inferred from the data. A bootstrap procedure developed for our family data supplemented by insights from other approaches leads us to believe that two clusters are appropriate for defining IR precisely. One cluster consists of women who are IR, and the other of women who seem not to be. Genes and other features are used to predict cluster membership. We argue that prediction with "main effects" is not satisfactory, but prediction that includes interactions may be. PMID:24887437

  2. Optimizing Urban Tree Soil Substrate for the City of Vienna

    NASA Astrophysics Data System (ADS)

    Murer, Erwin; Strauss, Peter; Schmidt, Stefan

    2015-04-01

    Many city garden management departments in Central Europe encounter problems with the sustainable growing of trees in cities. Tree root space is more and more limited by pavements and roads and is polluted by salt application during winter. Thus, the life expectancy of city trees is decreasing because the trees become more susceptible to diseases. Diseased trees are a safety risk. These challenges are additionally aggravated by lower budgets for establishing new trees. To respond to this challenge, a new soil substrate for city trees has been developed and tested, combining cost effectiveness with improved characteristics for water retention and nutrient delivery on one side and drainage capabilities on the other side. The new substrate should be inexpensive, easy and simple to produce and well miscible. Therefore, easily available materials have been tested: river sediments that are delivered by annual floods, compost produced by a city-owned composting plant, and low-cost dolomite chippings from quarries near Vienna. The final composition of the new Vienna tree substrate consists of 3 mineral components and one organic component. These are mixed in a ratio of 4 parts dolomite chippings, 3 parts sand, 3 parts fluvial fine sediment and 2 parts compost. After a laboratory phase to develop the new substrate, field testing is presently being carried out in three different types of field experiments consisting of 20 implementation sites distributed over the city of Vienna, with annual checking of tree growth, 2 implementation sites with sensors to measure the water and salt balance, and 6 city lysimeters with enhanced facilities to monitor substrate and water behaviour. These facilities will be used to relate the growing factors to the site properties, to develop a fertilizer recommendation for urban trees and to make tests for the compatibility of the trees

  3. Building Your Own Abseil Tree.

    ERIC Educational Resources Information Center

    Barnett, Des

    2002-01-01

    The foot and mouth crisis forced many British outdoor education providers to develop new options. The construction of an abseiling tree is described, which requires a living, healthy, straight tree with a trunk thick enough to remain stable under load and with few branches in the lower 15-20 meters. An abseil tree code of practice is presented.…

  4. The Re-Think Tree.

    ERIC Educational Resources Information Center

    Gear, Jim

    1993-01-01

    The Re-Think Tree is a simple framework to help individuals assess and improve their behaviors related to environmental issues. The branches of the tree in order of priority are refuse, reduce, re-use, and recycle. Roots of the tree include such things as public opinion, education, and watchdog groups. (KS)

  5. The Tree Worker's Manual. [Revised.]

    ERIC Educational Resources Information Center

    Lilly, S. J.

    This manual acquaints readers with the general operations of the tree care industry. The manual covers subjects important to a tree worker and serves as a training aid for workers at the entry level as tree care professionals. Each chapter begins with a set of objectives and may include figures, tables, and photographs. Ten chapters are included:…

  6. The Hopi Fruit Tree Book.

    ERIC Educational Resources Information Center

    Nyhuis, Jane

    Referring as often as possible to traditional Hopi practices and to materials readily available on the reservation, the illustrated booklet provides information on the care and maintenance of young fruit trees. An introduction to fruit trees explains the special characteristics of new trees, e.g., grafting, planting pits, and watering. The…

  7. Building up rhetorical structure trees

    SciTech Connect

    Marcu, D.

    1996-12-31

    I use the distinction between the nuclei and the satellites that pertain to discourse relations to introduce a compositionality criterion for discourse trees. I provide a first-order formalization of rhetorical structure trees and, on its basis, I derive an algorithm that constructs all the valid rhetorical trees that can be associated with a given discourse.

  8. New Life From Dead Trees

    ERIC Educational Resources Information Center

    DeGraaf, Richard M.

    1978-01-01

    There are numerous bird species that will nest only in dead or dying trees. Current forestry practices include clearing forests of these snags, or dead trees. This practice is driving many species out of the forests. An illustrated example of bird succession in and on a tree is given. (MA)

  9. Direction of Effects in Multiple Linear Regression Models.

    PubMed

    Wiedermann, Wolfgang; von Eye, Alexander

    2015-01-01

    Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed. PMID:26609741
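
    The residual-based idea in the abstract above can be illustrated with a minimal sketch. It is a simplification, not the authors' procedure: it uses the bivariate case only, simulated data, and a plain comparison of residual skewness instead of the inference tests the abstract lists. The function names and the simulated model (a skewed predictor with normal error) are assumptions made for illustration.

```python
import random

def ols_residuals(x, y):
    """Residuals from the simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def skewness(values):
    """Standardized third central moment of a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Simulated data (an assumption): x is skewed, the error is normal,
# and the true direction of dependence is x -> y.
rng = random.Random(0)
x = [rng.expovariate(1.0) for _ in range(5000)]
y = [0.8 * a + rng.gauss(0.0, 1.0) for a in x]

# Compare residual skewness of the two competing models.
skew_y_on_x = skewness(ols_residuals(x, y))  # correctly specified model
skew_x_on_y = skewness(ols_residuals(y, x))  # reversed model

print(f"residual skewness, y on x: {skew_y_on_x:.3f}")
print(f"residual skewness, x on y: {skew_x_on_y:.3f}")
```

    Under these assumptions the residuals of the correctly specified model are close to symmetric, while the residuals of the reversed model inherit skewness from the predictor. A formal decision would need one of the significance tests named in the abstract.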

  10. Developmental regression in autism spectrum disorder

    PubMed Central

    Al Backer, Nouf Backer

    2015-01-01

    The occurrence of developmental regression in autism spectrum disorder (ASD) is one of the most puzzling phenomena of this disorder. Little is known about the nature and mechanism of developmental regression in ASD. About one-third of young children with ASD lose some skills during the preschool period, usually speech, but sometimes nonverbal communication, social, or play skills are also affected. There is considerable evidence suggesting that most children who demonstrate regression also had previous, subtle developmental differences. It is difficult to predict the prognosis of autistic children with developmental regression. It seems that the earlier development of social, language, and attachment behaviors followed by regression does not predict the later recovery of skills or better developmental outcomes. The underlying mechanisms that lead to regression in autism are unknown. The role of subclinical epilepsy in the developmental regression of children with autism remains unclear. PMID:27493417

  11. A Survey of UML Based Regression Testing

    NASA Astrophysics Data System (ADS)

    Fahad, Muhammad; Nadeem, Aamer

    Regression testing is the process of ensuring software quality by analyzing whether changed parts behave as intended and unchanged parts are not affected by the modifications. Since it is a costly process, many techniques have been proposed in the research literature that advise testers on how to build a regression test suite from an existing test suite at minimum cost. In this paper, we discuss the advantages and drawbacks of using UML diagrams for regression testing and find that UML models help in identifying changes for regression test selection effectively. We survey the existing UML based regression testing techniques and provide an analysis matrix to give a quick insight into prominent features of the literature work. We discuss open research issues that remain to be addressed for UML based regression testing, such as managing and reducing the size of the regression test suite and prioritizing test cases, which would be helpful under strict schedule and resource constraints.

  12. PoInTree: a polar and interactive phylogenetic tree.

    PubMed

    Carreras, Marco; Gianti, Eleonora; Sartori, Luca; Plyte, Simon Edward; Isacchi, Antonella; Bosotti, Roberta

    2005-02-01

    PoInTree (Polar and Interactive Tree) is an application that allows users to build, visualize, and customize phylogenetic trees in a polar, interactive, and highly flexible view. It takes as input a FASTA file or multiple alignment formats. Phylogenetic tree calculation is based on a sequence distance method and utilizes the Neighbor Joining (NJ) algorithm. It also allows displaying precalculated trees of the major protein families based on Pfam classification. In PoInTree, nodes can be dynamically opened and closed, and distances between genes are graphically represented. The tree root can be centered on a selected leaf. A text search mechanism, color-coding, and labeling display are integrated. The visualizer can be connected to an Oracle database containing information on sequences and other biological data, helping to guide their interpretation within a given protein family across multiple species. The application is written in Borland Delphi and based on the VCL TeeChart Pro 6 graphical component (Steema Software). PMID:16144524

  13. Tree Colors: Color Schemes for Tree-Structured Data.

    PubMed

    Tennekes, Martijn; de Jonge, Edwin

    2014-12-01

    We present a method to map tree structures to colors from the Hue-Chroma-Luminance color model, which is known for its well balanced perceptual properties. The Tree Colors method can be tuned with several parameters, whose effect on the resulting color schemes is discussed in detail. We provide a free and open source implementation with sensible parameter defaults. Categorical data are very common in statistical graphics, and often these categories form a classification tree. We evaluate applying Tree Colors to tree structured data with a survey on a large group of users from a national statistical institute. Our user study suggests that Tree Colors are useful, not only for improving node-link diagrams, but also for unveiling tree structure in non-hierarchical visualizations. PMID:26356921

  14. Consistency and inconsistency of consensus methods for inferring species trees from gene trees in the presence of ancestral population structure.

    PubMed

    DeGiorgio, Michael; Rosenberg, Noah A

    2016-08-01

    In the last few years, several statistically consistent consensus methods for species tree inference have been devised that are robust to the gene tree discordance caused by incomplete lineage sorting in unstructured ancestral populations. One source of gene tree discordance that has only recently been identified as a potential obstacle for phylogenetic inference is ancestral population structure. In this article, we describe a general model of ancestral population structure, and by relying on a single carefully constructed example scenario, we show that the consensus methods Democratic Vote, STEAC, STAR, R* Consensus, Rooted Triple Consensus, Minimize Deep Coalescences, and Majority-Rule Consensus are statistically inconsistent under the model. We find that among the consensus methods evaluated, the only method that is statistically consistent in the presence of ancestral population structure is GLASS/Maximum Tree. We use simulations to evaluate the behavior of the various consensus methods in a model with ancestral population structure, showing that as the number of gene trees increases, estimates on the basis of GLASS/Maximum Tree approach the true species tree topology irrespective of the level of population structure, whereas estimates based on the remaining methods only approach the true species tree topology if the level of structure is low. However, through simulations using species trees both with and without ancestral population structure, we show that GLASS/Maximum Tree performs unusually poorly on gene trees inferred from alignments with little information. This practical limitation of GLASS/Maximum Tree together with the inconsistency of other methods prompts the need for both further testing of additional existing methods and development of novel methods under conditions that incorporate ancestral population structure. PMID:27086043

  15. Using GA-Ridge regression to select hydro-geological parameters influencing groundwater pollution vulnerability.

    PubMed

    Ahn, Jae Joon; Kim, Young Min; Yoo, Keunje; Park, Joonhong; Oh, Kyong Joo

    2012-11-01

    For groundwater conservation and management, it is important to accurately assess groundwater pollution vulnerability. This study proposed an integrated model using ridge regression and a genetic algorithm (GA) to effectively select the major hydro-geological parameters influencing groundwater pollution vulnerability in an aquifer. The GA-Ridge regression method determined that depth to water, net recharge, topography, and the impact of vadose zone media were the hydro-geological parameters that influenced trichloroethene pollution vulnerability in a Korean aquifer. When using these selected hydro-geological parameters, the accuracy was improved for various statistical nonlinear and artificial intelligence (AI) techniques, such as multinomial logistic regression, decision trees, artificial neural networks, and case-based reasoning. These results provide a proof of concept that the GA-Ridge regression is effective at determining influential hydro-geological parameters for the pollution vulnerability of an aquifer, and in turn, improves the AI performance in assessing groundwater pollution vulnerability. PMID:22124584

  16. There is no temperature dependence of net biochemical fractionation of hydrogen and oxygen isotopes in tree-ring cellulose.

    PubMed

    Roden, J S; Ehleringer, J R

    2000-01-01

    The isotopic composition of tree-ring cellulose was obtained over a two-year period from small diameter, riparian zone trees along an elevational transect in Big Cottonwood Canyon, Utah, USA to test for a possible temperature dependence of net biological fractionation during cellulose synthesis. The isotope ratios of stream water varied by only 3.6% and 0.2% in δD and δ18O, respectively, over an elevation change of 810 m. The similarity in stream water and macroenvironment over the short (13 km) transect produced nearly constant stem and leaf water δD and δ18O values. In addition, the few seasonal variations observed in the isotopic composition of source water and atmospheric water vapor or in leaf water evaporative enrichment were experienced equally by all sites along the elevational transect. The temperature at each site along the transect spanned a range of ≥ 5 °C as calculated using the adiabatic lapse rate. Since the δD and δ18O values of stem and leaf water varied little for these trees over this elevation/temperature transect, any differences in tree-ring cellulose δD and δ18O values should have been associated with temperature effects on net biological fractionation. However, the slopes of the regressions of elevation versus the δD and δ18O values of tree-ring cellulose were not significantly different from zero, indicating little or no temperature dependence of net biological fractionation. Therefore, cross-site climatic reconstruction studies using the isotope ratios of cellulose need not be concerned that temperatures during the growing season have influenced results. PMID:11501707

  17. An approach for reconstructing past streamflows using a water balance model and tree-ring records in the upper West Walker River basin, California

    NASA Astrophysics Data System (ADS)

    Vittori, J. C.; Saito, L.; Biondi, F.

    2010-12-01

    Historical streamflows in a given river basin can be useful for determining regional patterns of drought and climate, yet such measured data are typically available for the last 100 years at most. To extend the measured record, observed streamflows can be regressed against tree-ring data that serve as proxies for streamflow. This empirical approach, however, cannot account for or test factors that do not directly affect tree-ring growth but may influence streamflow. To reconstruct past streamflows in a more mechanistic way, a seasonal water balance model has been developed for the upper West Walker River basin that uses proxy precipitation and air temperature data derived from tree-ring records as input. The model incorporates simplistic relationships between precipitation and other components of the hydrologic cycle, as well as a component for modeling snow, and operates at a seasonal time scale. The model allows for flexibility in manipulating various hydrologic and land use characteristics, and can be applied to other watersheds. The intent is for the model to investigate sources of uncertainty in streamflow reconstructions, and how factors such as wildfire or changes in vegetation cover could impact estimates of past flows, something regression-based models are not able to do. In addition, the use of a mechanistic water balance model calibrated against proxy climate records can provide information on changes in various components of the water cycle, including the interaction between evapotranspiration, snowmelt, and runoff under warmer climatic regimes.

  18. Narrowing historical uncertainty: probabilistic classification of ambiguously identified tree species in historical forest survey data

    USGS Publications Warehouse

    Mladenoff, D.J.; Dahir, S.E.; Nordheim, E.V.; Schulte, L.A.; Guntenspergen, G.R.

    2002-01-01

    Historical data have increasingly become appreciated for insight into the past conditions of ecosystems. Uses of such data include assessing the extent of ecosystem change; deriving ecological baselines for management, restoration, and modeling; and assessing the importance of past conditions on the composition and function of current systems. One historical data set of this type is the Public Land Survey (PLS) of the United States General Land Office, which contains data on multiple tree species, sizes, and distances recorded at each survey point, located at half-mile (0.8 km) intervals on a 1-mi (1.6 km) grid. This survey method was begun in the 1790s on US federal lands extending westward from Ohio. Thus, the data have the potential of providing a view of much of the US landscape from the mid-1800s, and they have been used extensively for this purpose. However, historical data sources, such as those describing the species composition of forests, can often be limited in the detail recorded and the reliability of the data, since the information was often not originally recorded for ecological purposes. Forest trees are sometimes recorded ambiguously, using generic or obscure common names. For the PLS data of northern Wisconsin, USA, we developed a method to classify ambiguously identified tree species using logistic regression analysis, using data on trees that were clearly identified to species and a set of independent predictor variables to build the models. The models were first created on partial data sets for each species and then tested for fit against the remaining data. Validations were conducted using repeated, random subsets of the data. Model prediction accuracy ranged from 81% to 96% in differentiating congeneric species among oak, pine, ash, maple, birch, and elm. Major predictor variables were tree size, associated species, landscape classes indicative of soil type, and spatial location within the study region. Results help to clarify ambiguities

  19. Predicting 'very poor' beach water quality gradings using classification tree.

    PubMed

    Thoe, Wai; Choi, King Wah; Lee, Joseph Hun-wei

    2016-02-01

    A beach water quality prediction system has been developed in Hong Kong using multiple linear regression (MLR) models. However, linear models are found to be weak at capturing the infrequent 'very poor' water quality occasions when Escherichia coli (E. coli) concentration exceeds 610 counts/100 mL. This study uses a classification tree to increase the accuracy in predicting the 'very poor' water quality events at three Hong Kong beaches affected either by non-point source or point source pollution. Binary-output classification trees (to predict whether E. coli concentration exceeds 610 counts/100 mL) are developed over the periods before and after the implementation of the Harbour Area Treatment Scheme, when systematic changes in water quality were observed. Results show that classification trees can capture more 'very poor' events in both periods when compared to the corresponding linear models, with an increase in correct positives by an average of 20%. Classification trees are also developed at two beaches to predict the four-category Beach Water Quality Indices. They perform worse than the binary tree and give excessive false alarms of 'very poor' events. Finally, a combined modelling approach using both MLR model and classification tree is proposed to enhance the beach water quality prediction system for Hong Kong. PMID:26837834
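
    The binary-output setup described above can be sketched as follows. Only the 610 counts/100 mL threshold comes from the abstract. The single predictor (rainfall), the toy numbers, and the one-split "stump" searched by Gini impurity are illustrative assumptions, not the study's data or model.

```python
THRESHOLD = 610  # E. coli counts/100 mL, the 'very poor' cutoff from the abstract

def label_very_poor(concentration):
    """Binary target: True when E. coli concentration exceeds the cutoff."""
    return concentration > THRESHOLD

def gini(labels):
    """Gini impurity of a list of boolean labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(feature, labels):
    """Cut point on one feature that minimises the weighted Gini impurity."""
    pairs = list(zip(feature, labels))
    values = sorted(set(feature))
    best_cut, best_score = None, float("inf")
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2.0  # candidate cut midway between neighbouring values
        left = [l for f, l in pairs if f <= cut]
        right = [l for f, l in pairs if f > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score

# Synthetic toy data (hypothetical): previous-day rainfall and E. coli counts.
rainfall_mm = [0, 2, 5, 8, 12, 30, 45, 60]
ecoli = [40, 90, 120, 300, 200, 700, 1500, 900]

labels = [label_very_poor(c) for c in ecoli]
cut, impurity = best_split(rainfall_mm, labels)

# Predict 'very poor' whenever rainfall exceeds the learned cut point.
predicted = [r > cut for r in rainfall_mm]
correct_positives = sum(p and l for p, l in zip(predicted, labels))
false_alarms = sum(p and not l for p, l in zip(predicted, labels))
print(cut, impurity, correct_positives, false_alarms)
```

    A full classification tree repeats this split search recursively on each side of the cut and over several predictors. Counting correct positives and false alarms separately mirrors the two quantities the abstract reports.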

  20. Global value trees.

    PubMed

    Zhu, Zhen; Puliga, Michelangelo; Cerina, Federica; Chessa, Alessandro; Riccaboni, Massimo

    2015-01-01

    The fragmentation of production across countries has become an important feature of globalization in recent decades and is often conceptualized by the term "global value chains" (GVCs). When empirically investigating the GVCs, previous studies are mainly interested in knowing how global the GVCs are rather than what the GVCs look like. From a complex networks perspective, we use the World Input-Output Database (WIOD) to study the evolution of the global production system. We find that the industry-level GVCs are indeed not chain-like but are better characterized by the tree topology. Hence, we compute the global value trees (GVTs) for all the industries available in the WIOD. Moreover, we compute an industry importance measure based on the GVTs and compare it with other network centrality measures. Finally, we discuss some future applications of the GVTs. PMID:25978067

  1. Doubly robust survival trees.

    PubMed

    Steingrimsson, Jon Arni; Diao, Liqun; Molinaro, Annette M; Strawderman, Robert L

    2016-09-10

    Estimating a patient's mortality risk is important in making treatment decisions. Survival trees are a useful tool and employ recursive partitioning to separate patients into different risk groups. Existing 'loss based' recursive partitioning procedures that would be used in the absence of censoring have previously been extended to the setting of right censored outcomes using inverse probability censoring weighted estimators of loss functions. In this paper, we propose new 'doubly robust' extensions of these loss estimators motivated by semiparametric efficiency theory for missing data that better utilize available data. Simulations and a data analysis demonstrate strong performance of the doubly robust survival trees compared with previously used methods. Copyright © 2016 John Wiley & Sons, Ltd. PMID:27037609

  2. Global Value Trees

    PubMed Central

    Zhu, Zhen; Puliga, Michelangelo; Cerina, Federica; Chessa, Alessandro; Riccaboni, Massimo

    2015-01-01

    The fragmentation of production across countries has become an important feature of globalization in recent decades and is often conceptualized by the term “global value chains” (GVCs). When empirically investigating the GVCs, previous studies are mainly interested in knowing how global the GVCs are rather than what the GVCs look like. From a complex networks perspective, we use the World Input-Output Database (WIOD) to study the evolution of the global production system. We find that the industry-level GVCs are indeed not chain-like but are better characterized by the tree topology. Hence, we compute the global value trees (GVTs) for all the industries available in the WIOD. Moreover, we compute an industry importance measure based on the GVTs and compare it with other network centrality measures. Finally, we discuss some future applications of the GVTs. PMID:25978067

  3. Ensemble of Causal Trees

    NASA Astrophysics Data System (ADS)

    Bialas, Piotr

    2003-10-01

    We discuss the geometry of trees endowed with a causal structure using the conventional framework of equilibrium statistical mechanics. We show how this ensemble is related to popular growing network models. In particular, we demonstrate that on a class of affine attachment kernels the two models are identical, but they can differ substantially for other choices of weights. We show that causal trees exhibit condensation even for asymptotically linear kernels. We derive general formulae describing the degree distribution, the ancestor-descendant correlation, and the probability that a randomly chosen node lives at a given geodesic distance from the root. It is shown that the Hausdorff dimension dH of the causal networks is generically infinite.

  4. Flexible regression models over river networks

    PubMed Central

    O’Donnell, David; Rushworth, Alastair; Bowman, Adrian W; Marian Scott, E; Hallard, Mark

    2014-01-01

    Many statistical models are available for spatial data but the vast majority of these assume that spatial separation can be measured by Euclidean distance. Data which are collected over river networks constitute a notable and commonly occurring exception, where distance must be measured along complex paths and, in addition, account must be taken of the relative flows of water into and out of confluences. Suitable models for this type of data have been constructed based on covariance functions. The aim of the paper is to place the focus on underlying spatial trends by adopting a regression formulation and using methods which allow smooth but flexible patterns. Specifically, kernel methods and penalized splines are investigated, with the latter proving more suitable from both computational and modelling perspectives. In addition to their use in a purely spatial setting, penalized splines also offer a convenient route to the construction of spatiotemporal models, where data are available over time as well as over space. Models which include main effects and spatiotemporal interactions, as well as seasonal terms and interactions, are constructed for data on nitrate pollution in the River Tweed. The results give valuable insight into the changes in water quality in both space and time. PMID:25653460

  5. Tree Rings: Timekeepers of the Past.

    ERIC Educational Resources Information Center

    Phipps, R. L.; McGowan, J.

    One of a series of general interest publications on science issues, this booklet describes the uses of tree rings in historical and biological recordkeeping. Separate sections cover the following topics: dating of tree rings, dating with tree rings, tree ring formation, tree ring identification, sample collections, tree ring cross dating, tree…

  6. How To Write a Municipal Tree Ordinance.

    ERIC Educational Resources Information Center

    Fazio, James R., Ed.

    1990-01-01

    At the heart of the Tree City USA program are four basic requirements: The community must have the following: (1) a tree board or department; (2) an annual community forestry program with financial provisions for trees and tree care; (3) an annual Arbor Day proclamation and observance; and (4) a tree ordinance. Sections of a model tree ordinance…

  7. PhyBin: binning trees by topology.

    PubMed

    Newton, Ryan R; Newton, Irene L G

    2013-01-01

    A major goal of many evolutionary analyses is to determine the true evolutionary history of an organism. Molecular methods that rely on the phylogenetic signal generated by a few to a handful of loci can be used to approximate the evolution of the entire organism but fall short of providing a global, genome-wide, perspective on evolutionary processes. Indeed, individual genes in a genome may have different evolutionary histories. Therefore, it is informative to analyze the number and kind of phylogenetic topologies found within an orthologous set of genes across a genome. Here we present PhyBin: a flexible program for clustering gene trees based on topological structure. PhyBin can generate bins of topologies corresponding to exactly identical trees or can utilize Robinson-Foulds distance matrices to generate clusters of similar trees, using a user-defined threshold. Additionally, PhyBin allows the user to adjust for potential noise in the dataset (as may be produced when comparing very closely related organisms) by pre-processing trees to collapse very short branches or those nodes not meeting a defined bootstrap threshold. As a test case, we generated individual trees based on an orthologous gene set from 10 Wolbachia species across four different supergroups (A-D) and utilized PhyBin to categorize the complete set of topologies produced from this dataset. Using this approach, we were able to show that although a single topology generally dominated the analysis, confirming the separation of the supergroups, many genes supported alternative evolutionary histories. Because PhyBin's output provides the user with lists of gene trees in each topological cluster, it can be used to explore potential reasons for discrepancies between phylogenies including homoplasies, long-branch attraction, or horizontal gene transfer events. PMID:24167782
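
    The Robinson-Foulds distance mentioned above counts the leaf bipartitions (splits) found in one tree but not the other. The following is a minimal sketch of that computation, not PhyBin's implementation: the nested-tuple tree encoding and all function names are assumptions made for illustration, and branch lengths and bootstrap values are ignored.

```python
def leaves(tree):
    """Set of leaf names under a node; a leaf is a string, a clade is a tuple."""
    if isinstance(tree, str):
        return frozenset([tree])
    result = frozenset()
    for child in tree:
        result |= leaves(child)
    return result

def splits(tree):
    """Non-trivial bipartitions of the leaf set, one per internal edge.

    Each split is stored as the side that does NOT contain a fixed
    reference leaf, so the result does not depend on where the tree is rooted.
    """
    all_leaves = leaves(tree)
    reference = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        # Skip trivial splits (a single leaf on either side) and the root.
        if 1 < len(clade) < len(all_leaves) - 1:
            side = clade if reference not in clade else all_leaves - clade
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found

def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must share the same leaf set")
    return len(splits(tree_a) ^ splits(tree_b))

t1 = (("A", "B"), ("C", "D"))
t2 = ("A", ("B", ("C", "D")))   # same unrooted topology as t1, rooted elsewhere
t3 = (("A", "C"), ("B", "D"))   # a different topology

print(robinson_foulds(t1, t2))  # 0
print(robinson_foulds(t1, t3))  # 2
```

    Trees at distance 0 would fall into the same exact-match bin, and a pairwise matrix of these distances is the kind of input a threshold-based clustering step could use.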

  8. PhyBin: binning trees by topology

    PubMed Central

    Newton, Ryan R.

    2013-01-01

    A major goal of many evolutionary analyses is to determine the true evolutionary history of an organism. Molecular methods that rely on the phylogenetic signal generated by a few to a handful of loci can be used to approximate the evolution of the entire organism but fall short of providing a global, genome-wide, perspective on evolutionary processes. Indeed, individual genes in a genome may have different evolutionary histories. Therefore, it is informative to analyze the number and kind of phylogenetic topologies found within an orthologous set of genes across a genome. Here we present PhyBin: a flexible program for clustering gene trees based on topological structure. PhyBin can generate bins of topologies corresponding to exactly identical trees or can utilize Robinson-Foulds distance matrices to generate clusters of similar trees, using a user-defined threshold. Additionally, PhyBin allows the user to adjust for potential noise in the dataset (as may be produced when comparing very closely related organisms) by pre-processing trees to collapse very short branches or those nodes not meeting a defined bootstrap threshold. As a test case, we generated individual trees based on an orthologous gene set from 10 Wolbachia species across four different supergroups (A–D) and utilized PhyBin to categorize the complete set of topologies produced from this dataset. Using this approach, we were able to show that although a single topology generally dominated the analysis, confirming the separation of the supergroups, many genes supported alternative evolutionary histories. Because PhyBin’s output provides the user with lists of gene trees in each topological cluster, it can be used to explore potential reasons for discrepancies between phylogenies including homoplasies, long-branch attraction, or horizontal gene transfer events. PMID:24167782

  9. Mathematical analysis and modeling of epidemics of rubber tree root diseases: Probability of infection of an individual tree

    SciTech Connect

    Chadoeuf, J.; Joannes, H.; Nandris, D.; Pierrat, J.C.

    1988-12-01

    The spread of root diseases in rubber tree (Hevea brasiliensis) due to Rigidoporus lignosus and Phellinus noxius was investigated epidemiologically using data collected every 6 months during a 6-year survey in a plantation. The aim of the present study is to see what factors could predict whether a given tree would be infested at the following inspection. Using a qualitative regression method, we expressed the probability of pathogenic attack on a tree in terms of three factors: the state of health of the surrounding trees, the method used to clear the forest prior to planting, and evolution with time. The effects of each factor were ranked, and the roles of the various classes of neighbors were established and quantified. Variability between successive inspections was small, and the method of forest clearing was important only while primary inocula in the soil were still infectious. The state of health of the immediate neighbors was most significant; more distant neighbors in the same row had some effect; interrow spread was extremely rare. This investigation dealt only with trees as individuals, and further study of the interrelationships of groups of trees is needed.

  10. Exact solutions for species tree inference from discordant gene trees.

    PubMed

    Chang, Wen-Chieh; Górecki, Paweł; Eulenstein, Oliver

    2013-10-01

    Phylogenetic analysis has to overcome the grand challenge of inferring accurate species trees from evolutionary histories of gene families (gene trees) that are discordant with the species tree along whose branches they have evolved. Two well studied approaches to cope with this challenge are to solve either biologically informed gene tree parsimony (GTP) problems under gene duplication, gene loss, and deep coalescence, or the classic RF supertree problem that does not rely on any biological model. Despite the potential of these problems to infer credible species trees, they are NP-hard. Therefore, these problems are addressed by heuristics that typically lack any provable accuracy and precision. We describe fast dynamic programming algorithms that solve the GTP problems and the RF supertree problem exactly, and demonstrate that our algorithms can solve instances with data sets consisting of as many as 22 taxa. Extensions of our algorithms can also report the number of all optimal species trees, as well as the trees themselves. To better assess the quality of the resulting species trees that best fit the given gene trees, we also compute the worst case species trees, their numbers, and optimization score for each of the computational problems. Finally, we demonstrate the performance of our exact algorithms using empirical and simulated data sets, and analyze the quality of heuristic solutions for the studied problems by contrasting them with our exact solutions. PMID:24131054

  11. Regression in schizophrenia and its therapeutic value.

    PubMed

    Yazaki, N

    1992-03-01

    Using the regression evaluation scale, 25 schizophrenic patients were classified into three groups of Dissolution/autism (DAUG), Dissolution----attachment (DATG) and Non-regression (NRG). The regression of DAUG was of the type in which autism occurred when destructiveness emerged, while the regression of DATG was of the type in which attachment occurred when destructiveness emerged. This suggests that the regressive phenomena are an actualized form of the approach complex. In order to determine the factors distinguishing these two groups, I investigated psychiatric symptoms, mother-child relationships, premorbid personalities and therapeutic interventions. I believe that these factors form a continuity in which they interrelatedly determine the regressive state. Foremost among them, I stressed the importance of the mother-child relationship. PMID:1353128

  12. Tamarind tree seed dispersal by ring-tailed lemurs.

    PubMed

    Mertl-Millhollen, Anne S; Blumenfeld-Jones, Kathryn; Raharison, Sahoby Marin; Tsaramanana, Donald Raymond; Rasamimanana, Hantanirina

    2011-10-01

    In Madagascar, the gallery forests of the south are among the most endangered. Tamarind trees (Tamarindus indica) dominate these riverine forests and are a keystone food resource for ring-tailed lemurs (Lemur catta). At Berenty Reserve, the presence of tamarind trees is declining, and there is little recruitment of young trees. Because mature tamarinds inhibit growth under their crowns, seeds must be dispersed away from adult trees if tree recruitment is to occur. Ring-tailed lemurs are likely seed dispersers; however, because they spend much of their feeding, siesta, and sleeping time in tamarinds, they may defecate a majority of the tamarind seeds under tamarind trees. To determine whether they disperse tamarind seeds away from overhanging tamarind tree crowns, we observed two troops for 10 days each, noted the locations of feeding and defecation, and collected seeds from feces and fruit for germination. We also collected additional data on tamarind seedling recruitment under natural conditions, in which seedling germination was abundant after extensive rain, including under the canopy. However, seedling survival to 1 year was lower when growing under mature tamarind tree crowns than when growing away from an overhanging crown. Despite low fruit abundance averaging two fruits/m³ in tamarind crowns, lemurs fed on tamarind fruit for 32% of their feeding samples. Daily path lengths averaged 1,266 m, and lemurs deposited seeds throughout their ranges. Fifty-eight percent of the 417 recorded lemur defecations were on the ground away from overhanging tamarind tree crowns. Tamarind seeds collected from both fruit and feces germinated. Because lemurs deposited viable seeds on the ground away from overhanging mature tamarind tree crowns, we conclude that ring-tailed lemurs provide tamarind tree seed dispersal services. PMID:21629992

  13. Improving ensemble decision tree performance using Adaboost and Bagging

    NASA Astrophysics Data System (ADS)

    Hasan, Md. Rajib; Siraj, Fadzilah; Sainin, Mohd Shamrie

    2015-12-01

    Ensemble classifier systems are considered among the most promising approaches in medical data classification, and the performance of a decision tree classifier can be increased by the ensemble method, as it has been shown to be better than single classifiers. However, in an ensemble setting the performance depends on the selection of a suitable base classifier. This research employed two prominent ensembles, namely Adaboost and Bagging, with base classifiers such as Random Forest, Random Tree, j48, j48grafts and Logistic Model Regression (LMT) that have been selected independently. The empirical study shows that the performance varies when different base classifiers are selected, and in some places an overfitting issue has also been noted. The evidence shows that ensemble decision tree classifiers using Adaboost and Bagging improve the performance on selected medical data sets.

  14. Concept formation vs. logistic regression: predicting death in trauma patients.

    PubMed

    Hadzikadic, M; Hakenewerth, A; Bohren, B; Norton, J; Mehta, B; Andrews, C

    1996-10-01

    This study compares two classification models used to predict survival of injured patients entering the emergency department. Concept formation is a machine learning technique that summarizes known example cases in the form of a tree. After the tree is constructed, it can then be used to predict the classification of new cases. Logistic regression, on the other hand, is a statistical model that allows for a quantitative relationship for a dichotomous event with several independent variables. The outcome (dependent) variable must have only two choices, e.g. does or does not occur, alive or dead, etc. The result of this model is an equation which is then used to predict the probability of class membership of a new case. The two models were evaluated on a trauma registry database composed of information on all trauma patients admitted in 1992 to a Level I trauma center. A total of 2155 records, representing all trauma patients admitted for more than 24 h or who died in the Emergency Department, were grouped into two databases as follows: (1) discharge status of 'died' (containing 151 records), and (2) any discharge status other than 'died' (containing 2004 records). Both databases contained the same variables. PMID:8955858
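
    For reference, the equation that a logistic regression produces is conventionally written as below. The notation (coefficients \beta_j, predictors x_j, outcome y coded 1 for the event) is the standard textbook form and is assumed here; the abstract does not give the fitted equation.

```latex
P(y = 1 \mid x_1, \dots, x_p)
  = \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{j=1}^{p} \beta_j x_j\right)\right)}
```

    A new case is assigned to the event class when this probability exceeds a chosen cutoff (0.5 is a common default).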

  15. LRGS: Linear Regression by Gibbs Sampling

    NASA Astrophysics Data System (ADS)

    Mantz, Adam B.

    2016-02-01

    LRGS (Linear Regression by Gibbs Sampling) implements a Gibbs sampler to solve the problem of multivariate linear regression with uncertainties in all measured quantities and intrinsic scatter. LRGS extends an algorithm by Kelly (2007) that used Gibbs sampling for performing linear regression in fairly general cases in two ways: generalizing the procedure for multiple response variables, and modeling the prior distribution of covariates using a Dirichlet process.

  16. Geodesic least squares regression on information manifolds

    SciTech Connect

    Verdoolaege, Geert

    2014-12-05

    We present a novel regression method targeted at situations with significant uncertainty on both the dependent and independent variables or with non-Gaussian distribution models. Unlike the classic regression model, the conditional distribution of the response variable suggested by the data need not be the same as the modeled distribution. Instead they are matched by minimizing the Rao geodesic distance between them. This yields a more flexible regression method that is less constrained by the assumptions imposed through the regression model. As an example, we demonstrate the improved resistance of our method against some flawed model assumptions and we apply this to scaling laws in magnetic confinement fusion.

  17. Quantile regression applied to spectral distance decay

    USGS Publications Warehouse

    Rocchini, D.; Cade, B.S.

    2008-01-01

    Remotely sensed imagery has long been recognized as a powerful support for characterizing and estimating biodiversity. Spectral distance among sites has proven to be a powerful approach for detecting species composition variability. Regression analysis of species similarity versus spectral distance allows us to quantitatively estimate the amount of turnover in species composition with respect to spectral and ecological variability. In classical regression analysis, the residual sum of squares is minimized for the mean of the dependent variable distribution. However, many ecological data sets are characterized by a high number of zeroes that add noise to the regression model. Quantile regressions can be used to evaluate trend in the upper quantiles rather than a mean trend across the whole distribution of the dependent variable. In this letter, we used ordinary least squares (OLS) and quantile regressions to estimate the decay of species similarity versus spectral distance. The achieved decay rates were statistically nonzero (p < 0.01), considering both OLS and quantile regressions. Nonetheless, the OLS regression estimate of the mean decay rate was only half the decay rate indicated by the upper quantiles. Moreover, the intercept value, representing the similarity reached when the spectral distance approaches zero, was very low compared with the intercepts of the upper quantiles, which detected high species similarity when habitats are more similar. In this letter, we demonstrated the power of using quantile regressions applied to spectral distance decay to reveal species diversity patterns otherwise lost or underestimated by OLS regression. © 2008 IEEE.
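
    The contrast between a mean fit and an upper-quantile fit can be illustrated with a minimal sketch of the check (pinball) loss that quantile regression minimizes. This is a simplified intercept-only case, not the authors' analysis; the function names and the similarity values are hypothetical and chosen only to mimic a zero-heavy data set.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss of one residual at quantile level tau."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def best_constant(values, tau):
    """Return the sample value minimizing the total pinball loss.

    For an intercept-only model this is the tau-th sample quantile.
    """
    return min(values, key=lambda c: sum(pinball_loss(v - c, tau) for v in values))


# Hypothetical similarity values with many zeroes (illustrative only).
similarity = [0.0, 0.0, 0.0, 0.0, 0.1, 0.2, 0.4, 0.6, 0.7, 0.8, 0.9]

mean_fit = sum(similarity) / len(similarity)   # what least squares targets
median_fit = best_constant(similarity, 0.5)    # tau = 0.5
upper_fit = best_constant(similarity, 0.9)     # upper quantile, tau = 0.9

print(round(mean_fit, 3), median_fit, upper_fit)
```

    With zero-heavy data like this, the mean is pulled toward zero while the upper-quantile fit stays near the largest values, which is the kind of behavior the abstract describes for OLS versus upper quantiles.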

  18. Hybrid fuzzy regression with trapezoidal fuzzy data

    NASA Astrophysics Data System (ADS)

    Razzaghnia, T.; Danesh, S.; Maleki, A.

    2011-12-01

    This research deals with a method for hybrid fuzzy least-squares regression. The extension of symmetric triangular fuzzy coefficients to asymmetric trapezoidal fuzzy coefficients is considered as an effective measure for removing unnecessary fuzziness of the linear fuzzy model. First, a trapezoidal fuzzy variable is applied to derive a bivariate regression model. Next, normal equations are formulated to solve for the four parts of the hybrid regression coefficients. The model is also extended to multiple regression analysis. Finally, the method is compared with Y.-H. O. Chang's model.

  19. Investigating how students communicate tree-thinking

    NASA Astrophysics Data System (ADS)

    Boyce, Carrie Jo

    Learning is often an active endeavor that requires that students work at building conceptual understandings of complex topics. Personal experiences, ideas, and communication all play large roles in developing knowledge of and understanding complex topics. Sometimes these experiences can promote formation of scientifically inaccurate or incomplete ideas. Representations are tools used to help individuals understand complex topics. In biology, one way that educators help people understand evolutionary histories of organisms is by using representations called phylogenetic trees. In order to understand phylogenetic trees, individuals need to understand the conventions associated with phylogenies. My dissertation, supported by the Tree-Thinking Representational Competence and Word Association frameworks, is a mixed-methods study investigating the changes in students' tree-reading, representational competence and mental association of phylogenetic terminology after participation in varied instruction. Participants included 128 introductory biology majors from a mid-sized southern research university. Participants were enrolled in either Introductory Biology I, where they were not taught phylogenetics, or Introductory Biology II, where they were explicitly taught phylogenetics. I collected data using a pre- and post-assessment consisting of a word association task and tree-thinking diagnostic (n=128). Additionally, I recruited a subset of students from both courses (n=37) to complete a computer simulation designed to teach students about phylogenetic trees. I then conducted semi-structured interviews consisting of a word association exercise with card sort task, a retrospective pre-assessment discussion, a post-assessment discussion, and interview questions. I found that students who received explicit lecture instruction had a significantly higher increase in scores on a tree-thinking diagnostic than students who did not receive lecture instruction.
Students who received both

  20. Joint regression analysis and AMMI model applied to oat improvement

    NASA Astrophysics Data System (ADS)

    Oliveira, A.; Oliveira, T. A.; Mejza, S.

    2012-09-01

    In our work we present an application of some biometrical methods useful in genotype stability evaluation, namely the AMMI model, Joint Regression Analysis (JRA) and multiple comparison tests. A genotype stability analysis of oat (Avena sativa L.) grain yield was carried out using data from the Portuguese Plant Breeding Board for a sample of 22 different genotypes during the years 2002, 2003 and 2004 in six locations. In Ferreira et al. (2006) the authors state the relevance of the regression models and of the Additive Main Effects and Multiplicative Interactions (AMMI) model, to study and to estimate phenotypic stability effects. As computational techniques we use the Zigzag algorithm to estimate the regression coefficients and the agricolae package available in R software for AMMI model analysis.

  1. Chilling and heat requirements for flowering in temperate fruit trees

    NASA Astrophysics Data System (ADS)

    Guo, Liang; Dai, Junhu; Ranjitkar, Sailesh; Yu, Haiying; Xu, Jianchu; Luedeling, Eike

    2014-08-01

    Climate change has affected the rates of chilling and heat accumulation, which are vital for flowering and production, in temperate fruit trees, but few studies have been conducted in the cold-winter climates of East Asia. To evaluate tree responses to variation in chill and heat accumulation rates, partial least squares regression was used to correlate first flowering dates of chestnut (Castanea mollissima Blume) and jujube (Zizyphus jujube Mill.) in Beijing, China, with daily chill and heat accumulation between 1963 and 2008. The Dynamic Model and the Growing Degree Hour Model were used to convert daily records of minimum and maximum temperature into horticulturally meaningful metrics. Regression analyses identified the chilling and forcing periods for chestnut and jujube. The forcing periods started when half the chilling requirements were fulfilled. Over the past 50 years, heat accumulation during tree dormancy increased significantly, while chill accumulation remained relatively stable for both species. Heat accumulation was the main driver of bloom timing, with effects of variation in chill accumulation negligible in Beijing's cold-winter climate. It does not seem likely that reductions in chill will have a major effect on the studied species in Beijing in the near future. Such problems are much more likely for trees grown in locations that are substantially warmer than their native habitats, such as temperate species in the subtropics and tropics.

  2. Use of probabilistic weights to enhance linear regression myoelectric control

    NASA Astrophysics Data System (ADS)

    Smith, Lauren H.; Kuiken, Todd A.; Hargrove, Levi J.

    2015-12-01

    Objective. Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Approach. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts’ law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Main results. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks (p < 0.05) by preventing extraneous movement at additional DOFs. Similar results were seen in experiments with two transradial amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Significance. Use of probability weights can improve the ability to isolate individual DOFs during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.
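
    The weighting idea in the Approach can be sketched for a single DOF and a single scalar feature. This is a minimal illustration under stated assumptions, not the authors' implementation: the class names, class means, shared standard deviation, equal priors, and the use of 1 - P(no movement) as the weight are all hypothetical choices.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def movement_probability(feature, class_means, std):
    """Posterior probability (equal priors) that the feature is NOT from the rest class."""
    likelihoods = {name: gaussian_pdf(feature, m, std) for name, m in class_means.items()}
    total = sum(likelihoods.values())
    return 1.0 - likelihoods["rest"] / total


def weighted_output(feature, slope, intercept, class_means, std):
    """Linear regression output scaled by the estimated movement probability."""
    raw = slope * feature + intercept
    return movement_probability(feature, class_means, std) * raw


# Hypothetical class means for one DOF: rest, and movement in either direction.
CLASS_MEANS = {"rest": 0.0, "positive": 1.0, "negative": -1.0}
SHARED_STD = 0.3  # equal-variance assumption, as in the abstract's Gaussian models

small = weighted_output(0.05, slope=1.0, intercept=0.0, class_means=CLASS_MEANS, std=SHARED_STD)
large = weighted_output(1.0, slope=1.0, intercept=0.0, class_means=CLASS_MEANS, std=SHARED_STD)
print(small, large)
```

    Near the rest class the weight is close to zero, so small unintended regression outputs are suppressed; near a movement class the weight is close to one and the regression output passes through almost unchanged.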

  3. Informing tree-ring reconstructions with automated dendrometer data: the case of single-leaf pinyon (Pinus monophylla) from Great Basin National Park, Nevada, USA

    NASA Astrophysics Data System (ADS)

    Biondi, F.

    2012-12-01

    One of the most pressing issues in modern tree-ring science is to reduce uncertainty of reconstructions while emphasizing that the composition and dynamics of modern ecosystems cannot be understood from the present alone. I present here the latest results from research on the environmental factors that control radial growth of single-leaf pinyon (Pinus monophylla) in the Great Basin of North America using dendrometer data collected at half-hour intervals during two full growing seasons, 2010 and 2011. Automated (solar-powered) sensors at the site consisted of 8 point dendrometers installed on 7 trees to measure stem size, together with environmental probes that recorded air temperature, soil temperature and soil moisture. Additional meteorological variables at hourly timesteps were available from the EPA-CASTNET station located within 100 m of the dendrometer site. Daily cycles of stem expansion and contraction were quantified using the approach of Deslauriers et al. 2011, and the amount of daily radial stem increment was regressed against environmental variables. Graphical and numerical results showed that tree growth is relatively insensitive to surface soil moisture during the growing season. This finding corroborates empirical dendroclimatic results that showed how tree-ring chronologies of single-leaf pinyon are mostly a proxy for the balance between winter-spring precipitation supply and growing season evapotranspiration demand, thereby making it an ideal species for drought reconstructions.

  4. On Determining if Tree-based Networks Contain Fixed Trees.

    PubMed

    Anaya, Maria; Anipchenko-Ulaj, Olga; Ashfaq, Aisha; Chiu, Joyce; Kaiser, Mahedi; Ohsawa, Max Shoji; Owen, Megan; Pavlechko, Ella; St John, Katherine; Suleria, Shivam; Thompson, Keith; Yap, Corrine

    2016-05-01

    We address an open question of Francis and Steel about phylogenetic networks and trees. They give a polynomial time algorithm to decide if a phylogenetic network, N, is tree-based and pose the problem: given a fixed tree T and network N, is N based on T? We show that it is NP-hard to decide, by reduction from 3-Dimensional Matching (3DM) and further that the problem is fixed-parameter tractable. PMID:27125655

  5. Pesticides in Urban Multiunit Dwellings: Hazard Identification Using Classification and Regression Tree (CART) Analysis

    EPA Science Inventory

    Many units in public housing or other low-income urban dwellings may have elevated pesticide residues, given recurring infestation, but it would be logistically and economically infeasible to sample a large number of units to identify highly exposed households to design interven...

  6. First analysis of risk factors associated with bee colony collapse disorder by classification and regression trees

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Sudden losses of managed honey bee (Apis mellifera L.) colonies are considered an important problem worldwide but the underlying cause or causes of these losses are currently unknown. In the United States, this syndrome was termed Colony Collapse Disorder (CCD), since the defining trait was a rapid ...

  7. Using Evidence-Based Decision Trees Instead of Formulas to Identify At-Risk Readers. REL 2014-036

    ERIC Educational Resources Information Center

    Koon, Sharon; Petscher, Yaacov; Foorman, Barbara R.

    2014-01-01

    This study examines whether the classification and regression tree (CART) model improves the early identification of students at risk for reading comprehension difficulties compared with the more difficult to interpret logistic regression model. CART is a type of predictive modeling that relies on nonparametric techniques. It presents results in…

  8. Rethinking the linear regression model for spatial ecological data.

    PubMed

    Wagner, Helene H

    2013-11-01

    The linear regression model, with its numerous extensions including multivariate ordination, is fundamental to quantitative research in many disciplines. However, spatial or temporal structure in the data may invalidate the regression assumption of independent residuals. Spatial structure at any spatial scale can be modeled flexibly based on a set of uncorrelated component patterns (e.g., Moran's eigenvector maps, MEM) that is derived from the spatial relationships between sampling locations as defined in a spatial weight matrix. Spatial filtering thus addresses spatial autocorrelation in the residuals by adding such component patterns (spatial eigenvectors) as predictors to the regression model. However, space is not an ecologically meaningful predictor, and commonly used tests for selecting significant component patterns do not take into account the specific nature of these variables. This paper proposes "spatial component regression" (SCR) as a new way of integrating the linear regression model with Moran's eigenvector maps. In its unconditioned form, SCR decomposes the relationship between response and predictors by component patterns, whereas conditioned SCR provides an alternative method of spatial filtering, taking into account the statistical properties of component patterns in the design of statistical hypothesis tests. Application to the well-known multivariate mite data set illustrates how SCR may be used to condition for significant residual spatial structure and to identify additional predictors associated with residual spatial structure. Finally, I argue that all variance is spatially structured, hence spatial independence is best characterized by a lack of excess variance at any spatial scale, i.e., spatial white noise. PMID:24400490

  9. Analysis of Sting Balance Calibration Data Using Optimized Regression Models

    NASA Technical Reports Server (NTRS)

    Ulbrich, N.; Bader, Jon B.

    2010-01-01

    Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm's two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm's term selection process.

  10. Elevation-dependent responses of tree mast seeding to climate change over 45 years

    PubMed Central

    Allen, Robert B; Hurst, Jennifer M; Portier, Jeanne; Richardson, Sarah J

    2014-01-01

    We use seed count data from a New Zealand mono-specific mountain beech forest to test for decadal trends in seed production along an elevation gradient in relation to changes in climate. Seedfall was collected (1965 to 2009) from seed trays located on transect lines at fixed elevations along an elevation gradient (1020 to 1370 m). We counted the number of seeds in the catch of each tray, for each year, and determined the number of viable seeds. Climate variables were obtained from a nearby (<2 km) climate station (914-m elevation). Variables were the sum or mean of daily measurements, using periods within each year known to correlate with subsequent interannual variation in seed production. To determine trends in mean seed production, at each elevation, and climate variables, we used generalized least squares (GLS) regression. We demonstrate a trend of increasing total and viable seed production, particularly at higher elevations, which emerged from marked interannual variation. Significant changes in four seasonal climate variables had GLS regression coefficients consistent with predictions of increased seed production. These variables subsumed the effect of year in GLS regressions with a greater influence on seed production with increasing elevation. Regression models enforce a view that the sequence of climate variables was additive in their influence on seed production throughout a reproductive cycle spanning more than 2 years and including three summers. Models with the most support always included summer precipitation as the earliest variable in the sequence followed by summer maximum daily temperatures. We interpret this as reflecting precipitation driven increases in soil nutrient availability enhancing seed production at higher elevations rather than the direct effects of climate, stand development or rising atmospheric CO2 partial pressures. 
Greater sensitivity of tree seeding at higher elevations to changes in climate reveals how ecosystem responses to

  11. Spatial vulnerability assessments by regression kriging

    NASA Astrophysics Data System (ADS)

    Pásztor, László; Laborczi, Annamária; Takács, Katalin; Szatmári, Gábor

    2016-04-01

    information representing IEW or GRP forming environmental factors were taken into account to support the spatial inference of the locally experienced IEW frequency and measured GRP values respectively. An efficient spatial prediction methodology was applied to construct reliable maps, namely regression kriging (RK) using spatially exhaustive auxiliary data on soil, geology, topography, land use and climate. RK divides the spatial inference into two parts. Firstly the deterministic component of the target variable is determined by a regression model. The residuals of the multiple linear regression analysis represent the spatially varying but dependent stochastic component, which are interpolated by kriging. The final map is the sum of the two component predictions. Application of RK also provides the possibility of inherent accuracy assessment. The resulting maps are characterized by global and local measures of its accuracy. Additionally the method enables interval estimation for spatial extension of the areas of predefined risk categories. All of these outputs provide useful contribution to spatial planning, action planning and decision making. Acknowledgement: Our work was partly supported by the Hungarian National Scientific Research Foundation (OTKA, Grant No. K105167).
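
    The two-part decomposition described above is commonly written as follows. The notation is assumed here for illustration and is not taken from the abstract: s_0 is the prediction location, q_k(s_0) are the auxiliary variables (with q_0 = 1 for the intercept), \hat{\beta}_k are the fitted regression coefficients, e(s_i) are the regression residuals at the n observation sites, and \lambda_i are the kriging weights.

```latex
\hat{z}(s_0)
  = \underbrace{\sum_{k=0}^{p} \hat{\beta}_k \, q_k(s_0)}_{\text{regression (deterministic) part}}
  + \underbrace{\sum_{i=1}^{n} \lambda_i \, e(s_i)}_{\text{kriged residual (stochastic) part}}
```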

  12. Error analysis of leaf area estimates made from allometric regression models

    NASA Technical Reports Server (NTRS)

    Feiveson, A. H.; Chhikara, R. S.

    1986-01-01

    Biological net productivity, measured in terms of the change in biomass with time, affects global productivity and the quality of life through biochemical and hydrological cycles and by its effect on the overall energy balance. Estimating leaf area for large ecosystems is one of the more important means of monitoring this productivity. For a particular forest plot, the leaf area is often estimated by a two-stage process. In the first stage, known as dimension analysis, a small number of trees are felled so that their areas can be measured as accurately as possible. These leaf areas are then related to non-destructive, easily-measured features such as bole diameter and tree height, by using a regression model. In the second stage, the non-destructive features are measured for all or for a sample of trees in the plots and then used as input into the regression model to estimate the total leaf area. Because both stages of the estimation process are subject to error, it is difficult to evaluate the accuracy of the final plot leaf area estimates. This paper illustrates how a complete error analysis can be made, using an example from a study made on aspen trees in northern Minnesota. The study was a joint effort by NASA and the University of California at Santa Barbara known as COVER (Characterization of Vegetation with Remote Sensing).
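
    The two-stage process described above can be sketched as follows. This is a minimal illustration, assuming a simple straight-line model of leaf area on bole diameter; the study's actual regression form is not given in the abstract, and all numbers below are made-up placeholder values, not data from the COVER study.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Stage 1 (dimension analysis): a few felled trees with measured leaf area.
# Hypothetical values: bole diameter (cm) and leaf area (m^2).
felled_diameter = [10.0, 15.0, 20.0, 25.0]
felled_leaf_area = [12.0, 22.0, 32.0, 42.0]
intercept, slope = fit_line(felled_diameter, felled_leaf_area)

# Stage 2: non-destructive diameters for the standing trees in the plot.
plot_diameter = [12.0, 18.0, 22.0, 30.0]
plot_leaf_area = sum(intercept + slope * d for d in plot_diameter)

print(intercept, slope, plot_leaf_area)
```

    Both the fitted coefficients (stage 1) and the sampled plot measurements (stage 2) carry error, which is why the abstract notes that the accuracy of the final plot total is hard to evaluate.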

  13. PM10 forecasting using clusterwise regression

    NASA Astrophysics Data System (ADS)

    Poggi, Jean-Michel; Portier, Bruno

    2011-12-01

    In this paper, we are interested in the statistical forecasting of the daily mean PM10 concentration. Hourly concentrations of PM10 have been measured in the city of Rouen, in Haute-Normandie, France. Rouen is located northwest of Paris, near the south side of the Manche sea, and is heavily industrialised. We consider three monitoring stations reflecting the diversity of situations: an urban background station, a traffic station and an industrial station near the cereal harbour of Rouen. We have focused our attention on data for the months that register higher values, from December to March, in the years 2004-2009. The models are obtained from the winter days of the four seasons 2004/2005 to 2007/2008 (training data) and then the forecasting performance is evaluated on the winter days of the season 2008/2009 (test data). We show that it is possible to accurately forecast the daily mean concentration by fitting a function of meteorological predictors and the average concentration measured on the previous day. The values of observed meteorological variables are used for fitting the models and are also considered for the test data. We have compared the forecasts produced by three different methods: persistence, generalized additive nonlinear models and clusterwise linear regression models. This last method gives very impressive results and the end of the paper tries to analyze the reasons for such good behavior.

  14. Deriving the Regression Equation without Using Calculus

    ERIC Educational Resources Information Center

    Gordon, Sheldon P.; Gordon, Florence S.

    2004-01-01

    Probably the one "new" mathematical topic that is most responsible for modernizing courses in college algebra and precalculus over the last few years is the idea of fitting a function to a set of data in the sense of a least squares fit. Whether it be simple linear regression or nonlinear regression, this topic opens the door to applying the…

  15. Regression Analysis and the Sociological Imagination

    ERIC Educational Resources Information Center

    De Maio, Fernando

    2014-01-01

    Regression analysis is an important aspect of most introductory statistics courses in sociology but is often presented in contexts divorced from the central concerns that bring students into the discipline. Consequently, we present five lesson ideas that emerge from a regression analysis of income inequality and mortality in the USA and Canada.

  16. Illustration of Regression towards the Means

    ERIC Educational Resources Information Center

    Govindaraju, K.; Haslett, S. J.

    2008-01-01

    This article presents a procedure for generating a sequence of data sets which will yield exactly the same fitted simple linear regression equation y = a + bx. Unless rescaled, the generated data sets will have progressively smaller variability for the two variables, and the associated response and covariate will "regress" towards their…

  17. Stepwise versus Hierarchical Regression: Pros and Cons

    ERIC Educational Resources Information Center

    Lewis, Mitzi

    2007-01-01

    Multiple regression is commonly used in social and behavioral data analysis. In multiple regression contexts, researchers are very often interested in determining the "best" predictors in the analysis. This focus may stem from a need to identify those predictors that are supportive of theory. Alternatively, the researcher may simply be interested…

  18. Cross-Validation, Shrinkage, and Multiple Regression.

    ERIC Educational Resources Information Center

    Hynes, Kevin

    One aspect of multiple regression--the shrinkage of the multiple correlation coefficient on cross-validation--is reviewed. The paper consists of four sections. In section one, the distinction between a fixed and a random multiple regression model is made explicit. In section two, the cross-validation paradigm and an explanation for the occurrence…

  19. Principles of Quantile Regression and an Application

    ERIC Educational Resources Information Center

    Chen, Fang; Chalhoub-Deville, Micheline

    2014-01-01

    Newer statistical procedures are typically introduced to help address the limitations of those already in practice or to deal with emerging research needs. Quantile regression (QR) is introduced in this paper as a relatively new methodology, which is intended to overcome some of the limitations of least squares mean regression (LMR). QR is more…

  20. Regression Analysis: Legal Applications in Institutional Research

    ERIC Educational Resources Information Center

    Frizell, Julie A.; Shippen, Benjamin S., Jr.; Luna, Andrew L.

    2008-01-01

    This article reviews multiple regression analysis, describes how its results should be interpreted, and instructs institutional researchers on how to conduct such analyses using an example focused on faculty pay equity between men and women. The use of multiple regression analysis will be presented as a method with which to compare salaries of…

  1. Dealing with Outliers: Robust, Resistant Regression

    ERIC Educational Resources Information Center

    Glasser, Leslie

    2007-01-01

    Least-squares linear regression is the best of statistics and it is the worst of statistics. The reasons for this paradoxical claim, arising from possible inapplicability of the method and the excessive influence of "outliers", are discussed and substitute regression methods based on median selection, which is both robust and resistant, are…
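
    The abstract is truncated and does not say which median-based method the article uses. As an illustration only, the sketch below uses the Theil-Sen estimator, a well-known robust line fit that takes the median of all pairwise slopes. The choice of estimator, the function name and the data are assumptions.

```python
from itertools import combinations
from statistics import median

def theil_sen(x, y):
    """Robust line fit: returns (intercept, slope).

    The slope is the median of all pairwise slopes; the intercept is the
    median of y_i - slope * x_i. Pairs with equal x are skipped.
    """
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    slope = median(slopes)
    intercept = median([yi - slope * xi for xi, yi in zip(x, y)])
    return intercept, slope

# Points on y = 1 + 2x, except the last one, which is a gross outlier
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 100]
intercept, slope = theil_sen(x, y)
```

    On these data the single outlier does not move the fit away from the underlying line, whereas a least-squares fit would be pulled strongly towards it.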

  2. A Practical Guide to Regression Discontinuity

    ERIC Educational Resources Information Center

    Jacob, Robin; Zhu, Pei; Somers, Marie-Andrée; Bloom, Howard

    2012-01-01

    Regression discontinuity (RD) analysis is a rigorous nonexperimental approach that can be used to estimate program impacts in situations in which candidates are selected for treatment based on whether their value for a numeric rating exceeds a designated threshold or cut-point. Over the last two decades, the regression discontinuity approach has…

  3. Sulphasalazine and regression of rheumatoid nodules.

    PubMed

    Englert, H J; Hughes, G R; Walport, M J

    1987-03-01

    The regression of small rheumatoid nodules was noted in four patients after starting sulphasalazine therapy. This coincided with an improvement in synovitis and also falls in erythrocyte sedimentation rate (ESR) and C reactive protein (CRP). The relation between the nodule regression and the sulphasalazine therapy is discussed. PMID:2883940

  4. A Simulation Investigation of Principal Component Regression.

    ERIC Educational Resources Information Center

    Allen, David E.

    Regression analysis is one of the more common analytic tools used by researchers. However, multicollinearity between the predictor variables can cause problems in using the results of regression analyses. Problems associated with multicollinearity include entanglement of relative influences of variables due to reduced precision of estimation,…

  5. Three-Dimensional Modeling in Linear Regression.

    ERIC Educational Resources Information Center

    Herman, James D.

    Linear regression examines the relationship between one or more independent (predictor) variables and a dependent variable. By using a particular formula, regression determines the weights needed to minimize the error term for a given set of predictors. With one predictor variable, the relationship between the predictor and the dependent variable…

  6. Trends and Tipping Points of Drought-induced Tree Mortality

    NASA Astrophysics Data System (ADS)

    Huang, K.; Yi, C.; Wu, D.; Zhou, T.; Zhao, X.; Blanford, W. J.; Wei, S.; Wu, H.; Du, L.

    2014-12-01

    Drought-induced tree mortality worldwide has been recently reported in a review of the literature by Allen et al. (2010). However, a quantitative relationship between widespread loss of forest from mortality and drought is still a key knowledge gap. Specifically, the field lacks quantitative knowledge of the tipping point in trees when coping with water stress, which inhibits assessments of how climate change affects the forest ecosystem. We investigate the statistical relationships for seven different conifer species between Ring Width Index (RWI) and Standardized Precipitation Evapotranspiration Index (SPEI), based on 411 chronologies from the International Tree-Ring Data Bank across 11 states of the western United States. We found robust species-specific relationships between RWI and SPEI for all seven conifer species under dry conditions. The regression models show that the RWI decreases as SPEI decreases (drying) and that more than 76% of the variation in tree growth (RWI) can be explained by the drought index (SPEI). However, when soil water is sufficient (i.e., SPEI>SPEIu), soil water is no longer a restrictive factor for tree growth and, therefore, the RWI shows a weak correlation with SPEI. Based on the statistical models, we derived the tipping point of SPEI (SPEItp) where the RWI equals 0, which means the carbon efflux by tree respiration equals carbon influx by tree photosynthesis. When the severity of drought exceeds this tipping point (i.e., SPEI < SPEItp), trees might not be able to sustain their lives, as the carbon assimilated by photosynthesis could not meet the minimum that trees need to maintain respiration. The tipping points for the seven species range between -2.45 and -1.40. A lower tipping point represents a stronger ability to endure drought. The predicted tipping points can be used as a reference for tree mortality in assessing forest mortality risk under climate change. This work was supported by the Fund for
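
    The tipping-point derivation in this abstract can be sketched as follows, assuming (the abstract does not state the model form) a simple linear fit RWI = a + b·SPEI under dry conditions. The tipping point is then the SPEI at which the fitted RWI is zero, SPEItp = -a/b. The function names and data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def tipping_point(spei, rwi):
    """SPEI value at which the fitted RWI equals zero (assumes a linear model)."""
    a, b = fit_line(spei, rwi)
    if b == 0:
        raise ValueError("flat fit: the line never crosses RWI = 0")
    return -a / b

# Synthetic dry-condition data lying exactly on RWI = 1.0 + 0.5 * SPEI
spei = [-2.0, -1.5, -1.0, -0.5, 0.0]
rwi = [0.0, 0.25, 0.5, 0.75, 1.0]
spei_tp = tipping_point(spei, rwi)
```

    With these synthetic values the fitted line crosses RWI = 0 at SPEI = -2.0, which was chosen to fall inside the range of tipping points reported in the abstract.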

  7. Barking up the Right Tree

    ERIC Educational Resources Information Center

    Houston, Paul D.

    2006-01-01

    There is a childhood saying about a confused dog who thinks he sees a possum in a tree. The problem is that the possum is actually in a different tree so the dog barks up the wrong tree. American education is constantly playing both dog and possum. Sometimes they are the prey, and sometimes they are just confused about what and where the prey is.…

  8. Error Tree: A Tree Structure for Hamming and Edit Distances and Wildcards Matching.

    PubMed

    Al-Okaily, Anas

    2015-12-01

    Approximate pattern matching is a fundamental problem in bioinformatics and information retrieval applications. The problem involves different matching relations such as Hamming distance, edit distance, and wildcards matching. The input is usually a text of length n over a fixed alphabet of size Σ, a pattern P of length m, and an integer k. The output is all positions in the text that have ≤ k Hamming distance, edit distance, or wildcards matching with P. Many algorithms and indexes have been proposed to solve the problems more efficiently, but due to the space and time complexities of the problems, most tools adopted heuristic approaches based on, for instance, suffix tree, suffix array, or Burrows Wheeler Transform to reach practical implementations. Error Tree is a novel tree structure that is mainly oriented to solve the approximate pattern matching problems, using less space and faster computation time. The algorithm proposes for Hamming distance and wildcards matching a tree structure that needs [Formula: see text] words and takes [Formula: see text] in the average case) of query time for any online/offline pattern, where occ is the number of outputs. In addition, it proposes a tree structure of [Formula: see text] words and [Formula: see text] in the average case) query time for edit distance for any online/offline pattern. PMID:26402070
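
    To make the problem statement concrete, here is a brute-force baseline for the Hamming-distance case. It is not the Error Tree structure, whose details are not given in this abstract; it only shows the input and output that any index for this problem must reproduce. The function name is hypothetical.

```python
def hamming_matches(text, pattern, k):
    """Return all start positions where pattern matches text with <= k mismatches.

    Brute force: O(n * m) character comparisons, no index.
    """
    n, m = len(text), len(pattern)
    positions = []
    for start in range(n - m + 1):
        mismatches = 0
        for a, b in zip(text[start:start + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > k:
                    break  # this window already exceeds the budget
        if mismatches <= k:
            positions.append(start)
    return positions

exact = hamming_matches("ACGTACGA", "ACGT", 0)
one_off = hamming_matches("ACGTACGA", "ACGT", 1)
```

    Raising k from 0 to 1 adds position 4, where the window "ACGA" differs from the pattern in one character.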

  9. Integration of Classification Tree Analyses and Spatial Metrics to Assess Changes in Supraglacial Lakes in the Karakoram Himalaya

    NASA Astrophysics Data System (ADS)

    Bulley, H. N.; Bishop, M. P.; Shroder, J. F.; Haritashya, U. K.

    2007-12-01

    Alpine glacier responses to climate change reveal increases in retreat with corresponding increases in production of glacier melt water and development of supraglacial lakes. The rate of occurrence and spatial extent of lakes in the Himalaya are difficult to determine because current spectral-based image analyses of glacier surfaces are limited by anisotropic reflectance and the lack of high quality digital elevation models. Additionally, the limitations of multivariate classification algorithms in adequately segregating glacier features in satellite imagery have led to an increased interest in non-parametric methods, such as classification and regression trees. Our objectives are to demonstrate the utility of a semi-automated approach that integrates classification-tree-based image segmentation and object-oriented analysis to differentiate supraglacial lakes from glacier debris, ice cliffs, lateral and medial moraines. The classification-tree process involves a binary, recursive, partitioning non-parametric method that can account for non-linear relationships. We used 2002 and 2004 ASTER VNIR and SWIR imagery to assess the Baltoro Glacier in the Karakoram Himalaya. Other input variables include the normalized difference water index (NDWI), ratio images, Moran's I image, and fractal dimension. The classification tree was used to generate initial image segments and it was particularly effective in differentiating glacier features. The object-oriented analysis included the use of shape and spatial metrics to refine the classification-tree output. Classification-tree results show that NDWI is the most important single variable for characterizing the glacier-surface features, followed by NIR/IR ratio, IR band, and IR/Red ratio variables. Lake features extracted from both images show there were 142 lakes in 2002 as compared to 188 lakes in 2004. In general, there was a significant increase in planimetric area from 2002 to 2004, and we documented the formation of 46 new

  10. Distributed Merge Trees

    SciTech Connect

    Morozov, Dmitriy; Weber, Gunther

    2013-01-08

    Improved simulations and sensors are producing datasets whose increasing complexity exhausts our ability to visualize and comprehend them directly. To cope with this problem, we can detect and extract significant features in the data and use them as the basis for subsequent analysis. Topological methods are valuable in this context because they provide robust and general feature definitions. As the growth of serial computational power has stalled, data analysis is becoming increasingly dependent on massively parallel machines. To satisfy the computational demand created by complex datasets, algorithms need to effectively utilize these computer architectures. The main strength of topological methods, their emphasis on global information, turns into an obstacle during parallelization. We present two approaches to alleviate this problem. We develop a distributed representation of the merge tree that avoids computing the global tree on a single processor and lets us parallelize subsequent queries. To account for the increasing number of cores per processor, we develop a new data structure that lets us take advantage of multiple shared-memory cores to parallelize the work on a single node. Finally, we present experiments that illustrate the strengths of our approach as well as help identify future challenges.

  11. Tree Modeling and Dynamics Simulation

    NASA Astrophysics Data System (ADS)

    Tian-shuang, Fu; Yi-bing, Li; Dong-xu, Shen

    This paper introduces the theory of tree modeling and dynamic movement simulation in computer graphics. After comparing many methods, we choose geometry-based rendering as our method. The tree is decomposed into branches and leaves; using rotations represented by quaternions, we realize the tree animation and avoid the gimbal lock of Euler rotations. We take OGRE 3D as the render engine, which has good graphics programming ability. In the end we realize the tree modeling and dynamic movement simulation and achieve realistic visual quality at little computational cost.
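
    The quaternion rotation mentioned in this abstract can be sketched as follows. This is a generic illustration and not the authors' code or the render engine's API: a branch direction v is rotated by a unit quaternion q through the product q v q*, which does not depend on an Euler-angle ordering and so does not suffer from gimbal lock. All function names are hypothetical.

```python
import math

def quat_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` radians about `axis`."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(angle / 2.0) / norm
    return (math.cos(angle / 2.0), ax * s, ay * s, az * s)

def quat_mul(q, r):
    """Hamilton product of two quaternions."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )

def rotate(q, v):
    """Rotate 3-vector v by unit quaternion q, computing q * (0, v) * conj(q)."""
    w, x, y, z = q
    conjugate = (w, -x, -y, -z)
    _, rx, ry, rz = quat_mul(quat_mul(q, (0.0, v[0], v[1], v[2])), conjugate)
    return (rx, ry, rz)

# Swing a branch pointing along +x by 90 degrees about the trunk axis (+z)
q = quat_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2.0)
tip = rotate(q, (1.0, 0.0, 0.0))
```

    The rotated branch tip ends up pointing along +y, up to floating-point rounding.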

  12. Human decision error (HUMDEE) trees

    SciTech Connect

    Ostrom, L.T.

    1993-08-01

    Graphical presentations of human actions in incident and accident sequences have been used for many years. However, for the most part, human decision making has been underrepresented in these trees. This paper presents a method of incorporating the human decision process into graphical presentations of incident/accident sequences. This presentation is in the form of logic trees. These trees are called Human Decision Error Trees or HUMDEE for short. The primary benefit of HUMDEE trees is that they graphically illustrate what else the individuals involved in the event could have done to prevent either the initiation or continuation of the event. HUMDEE trees also present the alternate paths available at the operator decision points in the incident/accident sequence. This is different from the Technique for Human Error Rate Prediction (THERP) event trees. There are many uses of these trees. They can be used for incident/accident investigations to show what other courses of actions were available and for training operators. The trees also have a consequence component so that not only the decision but also the consequence of that decision can be explored.

  13. Generalized linear and generalized additive models in studies of species distributions: Setting the scene

    USGS Publications Warehouse

    Guisan, A.; Edwards, T.C., Jr.; Hastie, T.

    2002-01-01

    An important statistical development of the last 30 years has been the advance in regression analysis provided by generalized linear models (GLMs) and generalized additive models (GAMs). Here we introduce a series of papers prepared within the framework of an international workshop entitled: Advances in GLMs/GAMs modeling: from species distribution to environmental management, held in Riederalp, Switzerland, 6-11 August 2001. We first discuss some general uses of statistical models in ecology, as well as provide a short review of several key examples of the use of GLMs and GAMs in ecological modeling efforts. We next present an overview of GLMs and GAMs, and discuss some of their related statistics used for predictor selection, model diagnostics, and evaluation. Included is a discussion of several new approaches applicable to GLMs and GAMs, such as ridge regression, an alternative to stepwise selection of predictors, and methods for the identification of interactions by a combined use of regression trees and several other approaches. We close with an overview of the papers and how we feel they advance our understanding of their application to ecological modeling. © 2002 Elsevier Science B.V. All rights reserved.

  14. Comparing ANNs, EAs, and Trees: a basic machine-learning approach to predictive environmental models.

    NASA Astrophysics Data System (ADS)

    Williams, J.; Poff, N.

    2005-05-01

    Machine learning techniques for ecological applications or "eco-informatics" are becoming increasingly useful and accessible for ecologists. We evaluated the predictive ability of three commercially available (i.e. user-friendly) software packages for artificial neural networks (ANNs), evolutionary algorithms (EAs), and classification/regression trees (Trees). We analyzed fish and habitat data for streams in the mid-Atlantic region of the U.S., which was collected by the U.S. Environmental Protection Agency (EPA). The data includes over 200 environmental descriptors summarizing watershed, stream, and water chemistry characteristics in addition to derived fish community metrics (i.e. richness, IBI scores, % exotics). In our analysis we predicted individual species presence/absence and fish community metrics as a function of these local and regional scale habitat variables. Predictive ability is evaluated with independent validation data. These approaches could prove especially useful for conservation or management applications where ecologists seek to utilize the most comprehensive data to make predictions at various scales. By employing "user-friendly" software we hope to show that ecologists, without extensive knowledge of computational science, can benefit from these techniques by extracting more information about complex ecosystems. Relative strengths and weaknesses of these three approaches are compared and recommendations for their use in conservation applications are presented.

  15. Elevation, Not Deforestation, Promotes Genetic Differentiation in a Pioneer Tropical Tree.

    PubMed

    Castilla, Antonio R; Pope, Nathaniel; Jaffé, Rodolfo; Jha, Shalene

    2016-01-01

    The regeneration of disturbed forest is an essential part of tropical forest ecology, both with respect to natural disturbance regimes and large-scale human-mediated logging, grazing, and agriculture. Pioneer tree species are critical for facilitating the transition from deforested land to secondary forest because they stabilize terrain and enhance connectivity between forest fragments by increasing matrix permeability and initiating disperser community assembly. Despite the ecological importance of early successional species, little is known about their ability to maintain gene flow across deforested landscapes. Utilizing highly polymorphic microsatellite markers, we examined patterns of genetic diversity and differentiation for the pioneer understory tree Miconia affinis across the Isthmus of Panama. Furthermore, we investigated the impact of geographic distance, forest cover, and elevation on genetic differentiation among populations using circuit theory and regression modeling within a landscape genetics framework. We report marked differences in historical and contemporary migration rates and moderately high levels of genetic differentiation in M. affinis populations across the Isthmus of Panama. Genetic differentiation increased significantly with elevation and geographic distance among populations; however, we did not find that forest cover enhanced or reduced genetic differentiation in the study region. Overall, our results reveal strong dispersal for M. affinis across human-altered landscapes, highlighting the potential use of this species for reforestation in tropical regions. Additionally, this study demonstrates the importance of considering topography when designing programs aimed at conserving genetic diversity within degraded tropical landscapes. PMID:27280872

  16. Elevation, Not Deforestation, Promotes Genetic Differentiation in a Pioneer Tropical Tree

    PubMed Central

    Castilla, Antonio R.; Pope, Nathaniel; Jaffé, Rodolfo; Jha, Shalene

    2016-01-01

    The regeneration of disturbed forest is an essential part of tropical forest ecology, both with respect to natural disturbance regimes and large-scale human-mediated logging, grazing, and agriculture. Pioneer tree species are critical for facilitating the transition from deforested land to secondary forest because they stabilize terrain and enhance connectivity between forest fragments by increasing matrix permeability and initiating disperser community assembly. Despite the ecological importance of early successional species, little is known about their ability to maintain gene flow across deforested landscapes. Utilizing highly polymorphic microsatellite markers, we examined patterns of genetic diversity and differentiation for the pioneer understory tree Miconia affinis across the Isthmus of Panama. Furthermore, we investigated the impact of geographic distance, forest cover, and elevation on genetic differentiation among populations using circuit theory and regression modeling within a landscape genetics framework. We report marked differences in historical and contemporary migration rates and moderately high levels of genetic differentiation in M. affinis populations across the Isthmus of Panama. Genetic differentiation increased significantly with elevation and geographic distance among populations; however, we did not find that forest cover enhanced or reduced genetic differentiation in the study region. Overall, our results reveal strong dispersal for M. affinis across human-altered landscapes, highlighting the potential use of this species for reforestation in tropical regions. Additionally, this study demonstrates the importance of considering topography when designing programs aimed at conserving genetic diversity within degraded tropical landscapes. PMID:27280872

  17. Dieback and episodic mortality of Cercidium microphyllum (foothill paloverde), a dominant Sonoran Desert tree

    USGS Publications Warehouse

    Bowers, Janice E.; Turner, R.M.

    2001-01-01

    Past and current dieback of Cercidium microphyllum, a dominant, drought-deciduous tree in the Sonoran Desert, was investigated at Tumamoc Hill, Tucson, Arizona, USA. Logistic regression predicted that the odds of a Cercidium plant being alive should decrease with increasing circumference, association with the columnar cactus Carnegiea gigantea, and occurrence on steep slopes. Slope azimuth, parasitization by Phoradendron californicum, and distance to nearest Cercidium within 5 m did not significantly affect the odds of survival. Carnegiea was a source of background mortality rather than a primary cause of dieback. Of the >1,000 living and dead plants sampled, 7.7% had died within the past 5 to 7 years. An additional 12.8% died in the more distant past. Diebacks tended to occur during severe deficits in annual, especially summer, rain. More than half of the dead plants in the sample were ≥50 cm in girth. In current and past diebacks on Tumamoc Hill, it seems likely that severe drought interacted with natural senescence of an aging population, weakening large, old trees and hastening their deaths.

  18. Distributed game-tree searching

    SciTech Connect

    Schaeffer, J.

    1989-02-01

    Conventional parallelizations of the alpha-beta (αβ) algorithm have met with limited success. Implementations suffer primarily from the synchronization and search overheads of parallelization. This paper describes a parallel αβ searching program that achieves high performance through the use of four different types of processes: Controllers, Searchers, Table Managers, and Scouts. Synchronization is reduced by having the Controller process reassign idle processes to help out busy ones. Search overhead is reduced by having two types of parallel table management: global Table Managers and the periodic merging and redistribution of local tables. Experiments show that nine processors can achieve 5.67-fold speedups but beyond that, additional processors provide diminishing returns. Given that additional resources are of little benefit, speculative computing is introduced as a means of extending the effective number of processors that can be utilized. Scout processes speculatively search ahead in the tree looking for interesting features and communicate this information back to the αβ program. In this way, the effective search depth is extended. These ideas have been tested experimentally and empirically as part of the chess program ParaPhoenix.

  19. Relating phylogenetic trees to transmission trees of infectious disease outbreaks.

    PubMed

    Ypma, Rolf J F; van Ballegooijen, W Marijn; Wallinga, Jacco

    2013-11-01

    Transmission events are the fundamental building blocks of the dynamics of any infectious disease. Much about the epidemiology of a disease can be learned when these individual transmission events are known or can be estimated. Such estimations are difficult and generally feasible only when detailed epidemiological data are available. The genealogy estimated from genetic sequences of sampled pathogens is another rich source of information on transmission history. Optimal inference of transmission events calls for the combination of genetic data and epidemiological data into one joint analysis. A key difficulty is that the transmission tree, which describes the transmission events between infected hosts, differs from the phylogenetic tree, which describes the ancestral relationships between pathogens sampled from these hosts. The trees differ both in timing of the internal nodes and in topology. These differences become more pronounced when a higher fraction of infected hosts is sampled. We show how the phylogenetic tree of sampled pathogens is related to the transmission tree of an outbreak of an infectious disease, by the within-host dynamics of pathogens. We provide a statistical framework to infer key epidemiological and mutational parameters by simultaneously estimating the phylogenetic tree and the transmission tree. We test the approach using simulations and illustrate its use on an outbreak of foot-and-mouth disease. The approach unifies existing methods in the emerging field of phylodynamics with transmission tree reconstruction methods that are used in infectious disease epidemiology. PMID:24037268

  20. A comparison of regression and regression-kriging for soil characterization using remote sensing imagery

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In precision agriculture, regression has been used widely to quantify the relationship between soil attributes and other environmental variables. However, spatial correlation existing in soil samples usually makes the regression model suboptimal. In this study, a regression-kriging method was attemp...

  1. Regression modeling of ground-water flow

    USGS Publications Warehouse

    Cooley, R.L.; Naff, R.L.

    1985-01-01

    Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)

  2. Investigating bias in squared regression structure coefficients

    PubMed Central

    Nimon, Kim F.; Zientek, Linda R.; Thompson, Bruce

    2015-01-01

    The importance of structure coefficients and analogs of regression weights for analysis within the general linear model (GLM) has been well-documented. The purpose of this study was to investigate bias in squared structure coefficients in the context of multiple regression and to determine if a formula that had been shown to correct for bias in squared Pearson correlation coefficients and coefficients of determination could be used to correct for bias in squared regression structure coefficients. Using data from a Monte Carlo simulation, this study found that squared regression structure coefficients corrected with Pratt's formula produced less biased estimates and might be more accurate and stable estimates of population squared regression structure coefficients than estimates with no such corrections. While our findings are in line with prior literature that identified multicollinearity as a predictor of bias in squared regression structure coefficients but not coefficients of determination, the findings from this study are unique in that the level of predictive power, number of predictors, and sample size were also observed to contribute bias in squared regression structure coefficients. PMID:26217273

  3. Supercooling Capacity Increases from Sea Level to Tree Line in the Hawaiian Tree Species Metrosideros polymorpha.

    PubMed

    Melcher; Cordell; Jones; Scowcroft; Niemczura; Giambelluca; Goldstein

    2000-05-01

    Population-specific differences in the freezing resistance of Metrosideros polymorpha leaves were studied along an elevational gradient from sea level to tree line (located at ca. 2500 m above sea level) on the east flank of the Mauna Loa volcano in Hawaii. In addition, we also studied 8-yr-old saplings grown in a common garden from seeds collected from the same field populations. Leaves of low-elevation field plants exhibited damage at -2 degrees C, before the onset of ice formation, which occurred at -5.7 degrees C. Leaves of high-elevation plants exhibited damage at ca. -8.5 degrees C, concurrent with ice formation in the leaf tissue, which is typical of plants that avoid freezing in their natural environment by supercooling. Nuclear magnetic resonance studies revealed that water molecules of both extra- and intracellular leaf water fractions from high-elevation plants had restricted mobility, which is consistent with their low water content and their high levels of osmotically active solutes. Decreased mobility of water molecules may delay ice nucleation and/or ice growth and may therefore enhance the ability of plant tissues to supercool. Leaf traits that correlated with specific differences in supercooling capacity were in part genetically determined and in part environmentally induced. Evidence indicated that lower apoplastic water content and smaller intercellular spaces were associated with the larger supercooling capacity of the plant's foliage at tree line. The irreversible tissue-damage temperature decreased by ca. 7 degrees C from sea level to tree line in leaves of field populations. However, this decrease appears to be only large enough to allow M. polymorpha trees to avoid leaf tissue damage from freezing up to a level of ca. 2500 m elevation, which is also the current tree line location on the east flank of Mauna Loa. The limited freezing resistance of M. polymorpha leaves may be partially responsible for the occurrence of tree line at a relatively

  4. Automatic localization of bifurcations and vessel crossings in digital fundus photographs using location regression

    NASA Astrophysics Data System (ADS)

    Niemeijer, Meindert; Dumitrescu, Alina V.; van Ginneken, Bram; Abrámoff, Michael D.

    2011-03-01

    Parameters extracted from the vasculature on the retina are correlated with various conditions such as diabetic retinopathy and cardiovascular diseases such as stroke. Segmentation of the vasculature on the retina has received much attention in the literature over the past decade. Analysis of the segmentation result, however, has received only limited attention, with most works describing methods to accurately measure the width of the vessels. Analyzing the connectedness of the vascular network is an important step towards the characterization of the complete vascular tree. The retinal vascular tree, from an image interpretation point of view, originates at the optic disc and spreads out over the retina. The tree bifurcates and the vessels also cross each other. The points where this happens form the key to determining the connectedness of the complete tree. We present a supervised method to detect the bifurcations and crossing points of the vasculature of the retina. The method uses features extracted from the vasculature as well as the image in a location regression approach to find those locations of the segmented vascular tree where a bifurcation or crossing occurs (hereafter points of interest, POI). We evaluate the method on the publicly available DRIVE database in which an ophthalmologist has marked the POI.

  5. Modelling the ecological consequences of whole tree harvest for bioenergy production

    NASA Astrophysics Data System (ADS)

    Skår, Silje; Lange, Holger; Sogn, Trine

    2013-04-01

    There is an increasing worldwide demand for energy from biomass as a substitute for fossil fuels, and the Norwegian government plans to double the production of bioenergy to 9% of the national energy production or to 28 TWh per year by 2020. A large part of this increase may come from forests, which have a great potential with respect to biomass supply as forest growth has increasingly exceeded harvest in recent decades. One feasible option is the utilization of forest residues (needles, twigs and branches) in addition to stems, known as Whole Tree Harvest (WTH). In Conventional Timber Harvesting (CH), by contrast, the residues are traditionally left in the forest. However, the residues contain a large share of the tree's nutrients, indicating that WTH may alter the supply of nutrients and organic matter to the soil and the forest ecosystem. This may potentially lead to reduced tree growth. Other implications can be nutrient imbalance, loss of carbon from the soil and changes in species composition and diversity. This study aims to identify key factors and appropriate strategies for ecologically sustainable WTH in Norway spruce (Picea abies) and Scots pine (Pinus sylvestris) forest stands in Norway. We focus on identifying key factors driving soil organic matter, nutrients, biomass, biodiversity etc. Simulations of the effect on the carbon and nitrogen budget with the two harvesting methods will also be conducted. Data from field trials and long-term manipulation experiments are used to obtain a first overview of key variables. The relationships between the variables are hitherto unknown, but it is by no means obvious that they can be assumed to be linear; thus, an ordinary multiple linear regression approach is expected to be insufficient. Here we apply two advanced and highly flexible modelling frameworks which have hardly been used in the context of tree growth, nutrient balances and biomass removal so far: Generalized Additive Models (GAMs) and

  6. Predicting species’ range limits from functional traits for the tree flora of North America

    PubMed Central

    Stahl, Ulrike; Reu, Björn; Wirth, Christian

    2014-01-01

    Using functional traits to explain species’ range limits is a promising approach in functional biogeography. It replaces the idiosyncrasy of species-specific climate ranges with a generic trait-based predictive framework. In addition, it has the potential to shed light on specific filter mechanisms creating large-scale vegetation patterns. However, its application to a continental flora, spanning large climate gradients, has been hampered by a lack of trait data. Here, we explore whether five key plant functional traits (seed mass, wood density, specific leaf area (SLA), maximum height, and longevity of a tree)—indicative of life history, mechanical, and physiological adaptations—explain the climate ranges of 250 North American tree species distributed from the boreal to the subtropics. Although the relationship between traits and the median climate across a species range is weak, quantile regressions revealed strong effects on range limits. Wood density and seed mass were strongly related to the lower but not upper temperature range limits of species. Maximum height affects the species range limits in both dry and humid climates, whereas SLA and longevity do not show clear relationships. These results allow the definition and delineation of climatic “no-go areas” for North American tree species based on key traits. As some of these key traits serve as important parameters in recent vegetation models, the implementation of trait-based climatic constraints has the potential to predict both range shifts and ecosystem consequences on a more functional basis. Moreover, for future trait-based vegetation models our results provide a benchmark for model evaluation. PMID:25225398

  7. Predicting species' range limits from functional traits for the tree flora of North America.

    PubMed

    Stahl, Ulrike; Reu, Björn; Wirth, Christian

    2014-09-23

    Using functional traits to explain species' range limits is a promising approach in functional biogeography. It replaces the idiosyncrasy of species-specific climate ranges with a generic trait-based predictive framework. In addition, it has the potential to shed light on specific filter mechanisms creating large-scale vegetation patterns. However, its application to a continental flora, spanning large climate gradients, has been hampered by a lack of trait data. Here, we explore whether five key plant functional traits (seed mass, wood density, specific leaf area (SLA), maximum height, and longevity of a tree)--indicative of life history, mechanical, and physiological adaptations--explain the climate ranges of 250 North American tree species distributed from the boreal to the subtropics. Although the relationship between traits and the median climate across a species range is weak, quantile regressions revealed strong effects on range limits. Wood density and seed mass were strongly related to the lower but not upper temperature range limits of species. Maximum height affects the species range limits in both dry and humid climates, whereas SLA and longevity do not show clear relationships. These results allow the definition and delineation of climatic "no-go areas" for North American tree species based on key traits. As some of these key traits serve as important parameters in recent vegetation models, the implementation of trait-based climatic constraints has the potential to predict both range shifts and ecosystem consequences on a more functional basis. Moreover, for future trait-based vegetation models our results provide a benchmark for model evaluation. PMID:25225398

  8. Regression of altitude-produced cardiac hypertrophy.

    NASA Technical Reports Server (NTRS)

    Sizemore, D. A.; Mcintyre, T. W.; Van Liere, E. J.; Wilson, M. F.

    1973-01-01

    The rate of regression of cardiac hypertrophy with time has been determined in adult male albino rats. The hypertrophy was induced by intermittent exposure to simulated high altitude. The percentage hypertrophy was much greater (46%) in the right ventricle than in the left (16%). The regression could be adequately fitted to a single exponential function with a half-time of 6.73 ± 0.71 days (90% CI). There was no significant difference in the rates of regression for the two ventricles.
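
    The single-exponential fit mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the model H(t) = H0 * exp(-k * t), for which the half-time equals ln(2) / k. Only the 6.73-day half-time is taken from the abstract; the function names are hypothetical, and the log-linear least-squares fit is a common textbook approach, not necessarily the procedure the authors used.

```python
import math

HALF_TIME_DAYS = 6.73  # half-time reported in the abstract


def rate_constant(half_time):
    """Decay rate k of H(t) = H0 * exp(-k * t), from half_time = ln(2) / k."""
    return math.log(2) / half_time


def remaining_fraction(t_days, half_time=HALF_TIME_DAYS):
    """Fraction of the initial hypertrophy still present after t_days."""
    return math.exp(-rate_constant(half_time) * t_days)


def fit_half_time(times, values):
    """Estimate the half-time by ordinary least squares on log(values).

    Assumes all values are positive and follow a single exponential decay.
    """
    logs = [math.log(v) for v in values]
    t_mean = sum(times) / len(times)
    y_mean = sum(logs) / len(logs)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sxy / sxx  # equals -k for a decaying exponential
    return math.log(2) / -slope
```

    By construction, `remaining_fraction(6.73)` evaluates to 0.5 up to floating-point rounding. Passing `fit_half_time` noise-free synthetic values generated from `remaining_fraction` (illustrative data, not the study's measurements) recovers the 6.73-day half-time.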

  9. L-moments under nuisance regression

    NASA Astrophysics Data System (ADS)

    Picek, Jan; Schindler, Martin

    2016-06-01

    The L-moments are analogues of the conventional moments and have similar interpretations. They are calculated using linear combinations of the expectations of ordered data. In practice, L-moments must usually be estimated from a random sample drawn from an unknown distribution, as a linear combination of order statistics. Jureckova and Picek (2014) showed that the averaged regression quantile is asymptotically equivalent to the location quantile. We therefore propose a generalization of L-moments in the model with nuisance regression using the averaged regression quantiles.
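
    As a point of reference for the abstract above, the following Python sketch computes the first four classical sample L-moments as linear combinations of order statistics, using the standard unbiased probability-weighted moments. It covers only the ordinary location case; it does not implement the regression-quantile generalization the authors propose, and the function name is hypothetical.

```python
def sample_l_moments(data):
    """Return the first four sample L-moments (l1, l2, l3, l4).

    Uses the unbiased probability-weighted moments
        b_r = (1/n) * sum_i [(i-1)(i-2)...(i-r) / ((n-1)(n-2)...(n-r))] * x_(i)
    where x_(1) <= ... <= x_(n) are the order statistics.
    """
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations for four L-moments")

    b = []
    for r in range(4):
        total = 0.0
        for i, xi in enumerate(x):  # i is the 0-based rank, i.e. (rank - 1)
            weight = 1.0
            for j in range(r):
                weight *= (i - j) / (n - 1 - j)
            total += weight * xi
        b.append(total / n)

    l1 = b[0]                                       # location (the mean)
    l2 = 2 * b[1] - b[0]                            # scale
    l3 = 6 * b[2] - 6 * b[1] + b[0]                 # related to skewness
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]   # related to kurtosis
    return l1, l2, l3, l4
```

    For the evenly spaced sample `[1, 2, 3, 4, 5]`, the first L-moment is the mean (3), the second is half the mean absolute difference between pairs (1), and the third is zero because the sample is symmetric.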

  10. Sparse Multivariate Regression With Covariance Estimation

    PubMed Central

    Rothman, Adam J.; Levina, Elizaveta; Zhu, Ji

    2014-01-01

    We propose a procedure for constructing a sparse estimator of a multivariate regression coefficient matrix that accounts for correlation of the response variables. This method, which we call multivariate regression with covariance estimation (MRCE), involves penalized likelihood with simultaneous estimation of the regression coefficients and the covariance structure. An efficient optimization algorithm and a fast approximation are developed for computing MRCE. Using simulation studies, we show that the proposed method outperforms relevant competitors when the responses are highly correlated. We also apply the new method to a finance example on predicting asset returns. An R-package containing this dataset and code for computing MRCE and its approximation are available online. PMID:24963268

  11. Spontaneous Regression of Primitive Merkel Cell Carcinoma

    PubMed Central

    2015-01-01

    Merkel cell carcinoma (MCC) is a rare, aggressive skin tumor that mainly occurs in the elderly with a generally poor prognosis. Like all skin cancers, its incidence is rising. Despite the poor prognosis, a few reports of spontaneous regression have been published. We describe the case of an 89-year-old male patient who presented with two MCC lesions of the scalp. Following biopsy, the lesions underwent complete regression with no clinical evidence of residual tumor up to 24 months. The current knowledge of MCC and the other cases of spontaneous regression described in the literature are reviewed. PMID:26788270

  12. Intermediate tree cover can maximize groundwater recharge in the seasonally dry tropics

    NASA Astrophysics Data System (ADS)

    Ilstedt, U.; Bargués Tobella, A.; Bazié, H. R.; Bayala, J.; Verbeeten, E.; Nyberg, G.; Sanou, J.; Benegas, L.; Murdiyarso, D.; Laudon, H.; Sheil, D.; Malmer, A.

    2016-02-01

    Water scarcity contributes to the poverty of around one-third of the world’s people. Despite many benefits, tree planting in dry regions is often discouraged by concerns that trees reduce water availability. Yet relevant studies from the tropics are scarce, and the impacts of intermediate tree cover remain unexplored. We developed and tested an optimum tree cover theory in which groundwater recharge is maximized at an intermediate tree density. Below this optimal tree density the benefits from any additional trees on water percolation exceed their extra water use, leading to increased groundwater recharge, while above the optimum the opposite occurs. Our results, based on groundwater budgets calibrated with measurements of drainage and transpiration in a cultivated woodland in West Africa, demonstrate that groundwater recharge was maximised at intermediate tree densities. In contrast to the prevailing view, we therefore find that moderate tree cover can increase groundwater recharge, and that tree planting and various tree management options can improve groundwater resources. We evaluate the necessary conditions for these results to hold and suggest that they are likely to be common in the seasonally dry tropics, offering potential for widespread tree establishment and increased benefits for hundreds of millions of people.

  13. Intermediate tree cover can maximize groundwater recharge in the seasonally dry tropics

    PubMed Central

    Ilstedt, U.; Bargués Tobella, A.; Bazié, H. R.; Bayala, J.; Verbeeten, E.; Nyberg, G.; Sanou, J.; Benegas, L.; Murdiyarso, D.; Laudon, H.; Sheil, D.; Malmer, A.

    2016-01-01

    Water scarcity contributes to the poverty of around one-third of the world’s people. Despite many benefits, tree planting in dry regions is often discouraged by concerns that trees reduce water availability. Yet relevant studies from the tropics are scarce, and the impacts of intermediate tree cover remain unexplored. We developed and tested an optimum tree cover theory in which groundwater recharge is maximized at an intermediate tree density. Below this optimal tree density the benefits from any additional trees on water percolation exceed their extra water use, leading to increased groundwater recharge, while above the optimum the opposite occurs. Our results, based on groundwater budgets calibrated with measurements of drainage and transpiration in a cultivated woodland in West Africa, demonstrate that groundwater recharge was maximised at intermediate tree densities. In contrast to the prevailing view, we therefore find that moderate tree cover can increase groundwater recharge, and that tree planting and various tree management options can improve groundwater resources. We evaluate the necessary conditions for these results to hold and suggest that they are likely to be common in the seasonally dry tropics, offering potential for widespread tree establishment and increased benefits for hundreds of millions of people. PMID:26908158

  14. Intermediate tree cover can maximize groundwater recharge in the seasonally dry tropics.

    PubMed

    Ilstedt, U; Bargués Tobella, A; Bazié, H R; Bayala, J; Verbeeten, E; Nyberg, G; Sanou, J; Benegas, L; Murdiyarso, D; Laudon, H; Sheil, D; Malmer, A

    2016-01-01

    Water scarcity contributes to the poverty of around one-third of the world's people. Despite many benefits, tree planting in dry regions is often discouraged by concerns that trees reduce water availability. Yet relevant studies from the tropics are scarce, and the impacts of intermediate tree cover remain unexplored. We developed and tested an optimum tree cover theory in which groundwater recharge is maximized at an intermediate tree density. Below this optimal tree density the benefits from any additional trees on water percolation exceed their extra water use, leading to increased groundwater recharge, while above the optimum the opposite occurs. Our results, based on groundwater budgets calibrated with measurements of drainage and transpiration in a cultivated woodland in West Africa, demonstrate that groundwater recharge was maximised at intermediate tree densities. In contrast to the prevailing view, we therefore find that moderate tree cover can increase groundwater recharge, and that tree planting and various tree management options can improve groundwater resources. We evaluate the necessary conditions for these results to hold and suggest that they are likely to be common in the seasonally dry tropics, offering potential for widespread tree establishment and increased benefits for hundreds of millions of people. PMID:26908158

  15. Two Trees: Migrating Fault Trees to Decision Trees for Real Time Fault Detection on International Space Station

    NASA Technical Reports Server (NTRS)

    Lee, Charles; Alena, Richard L.; Robinson, Peter

    2004-01-01

    Starting from an example of ISS fault trees, we present a method for converting fault trees to decision trees. The method makes the root cause of a fault easier to visualize and makes tree manipulation more programmatic through available decision tree programs. Decision trees present the diagnostics in a straightforward, easily understood format. For ISS real-time fault diagnostics, the status of the systems can be shown by running the signals through the trees and seeing where they stop. Another advantage of decision trees is that they can learn fault patterns and predict future faults from historical data. Learning is not limited to static data sets but can also take place online: by accumulating real-time data sets, the decision trees can acquire and store fault patterns and recognize them when they recur.

  16. The Group Tree of Experience.

    ERIC Educational Resources Information Center

    Ping, Ki

    1994-01-01

    Describes a group activity that uses a tree as a metaphor to reflect both group and personal growth during adventure activities. The tree's roots represent the group's formation, the branches and leaves represent the group's diversity and capabilities, and the seeds represent the personal learning and growth that took place within the group.…

  17. Studying Evergreen Trees in December.

    ERIC Educational Resources Information Center

    Platt, Dorothy K.

    1991-01-01

    This lesson plan uses evergreen trees on sale in cities and villages during the Christmas season to teach identification techniques. Background information, activities, and recommended reference guides deal with historical, symbolic and current uses of evergreen trees, physical characteristics, selection, care, and suggestions for post-Christmas…

  18. Fractions, trees and unfinished business

    NASA Astrophysics Data System (ADS)

    Shraiman, Boris

    In this talk, mourning the loss of a teacher and a dear friend, I would like to share some unfinished thoughts loosely connecting - via Farey fraction trees - Kadanoff's study of universality of quasi-periodic route to chaos with the effort to understand universal features of genealogical trees.

  19. Tree Hydraulics: How Sap Rises

    ERIC Educational Resources Information Center

    Denny, Mark

    2012-01-01

    Trees transport water from roots to crown--a height that can exceed 100 m. The physics of tree hydraulics can be conveyed with simple fluid dynamics based upon the Hagen-Poiseuille equation and Murray's law. Here the conduit structure is modelled as conical pipes and as branching pipes. The force required to lift sap is generated mostly by…
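
    The two relations named in this abstract can be sketched in a few lines of Python. The Hagen-Poiseuille equation gives the laminar volumetric flow through a cylindrical pipe, Q = pi * r^4 * dP / (8 * mu * L), and Murray's law relates a parent conduit radius to its daughters through r_parent^3 = sum of r_child^3. The sketch assumes a straight cylindrical conduit and equal-sized daughters, which is a simplification of the conical and branching pipes the article models; the function names and the default viscosity (roughly that of water at 20 degrees C) are assumptions for illustration.

```python
import math

WATER_VISCOSITY = 1.0e-3  # Pa*s, approximate value for water at 20 degrees C


def poiseuille_flow(radius, length, delta_p, viscosity=WATER_VISCOSITY):
    """Volumetric flow rate (m^3/s) through a cylindrical pipe.

    radius and length are in metres, delta_p in pascals.
    """
    return math.pi * radius ** 4 * delta_p / (8.0 * viscosity * length)


def murray_child_radius(parent_radius, n_children=2):
    """Radius of each of n equal daughters satisfying Murray's law."""
    return parent_radius / n_children ** (1.0 / 3.0)
```

    Because flow scales with the fourth power of the radius, doubling the conduit radius at a fixed pressure drop raises the flow sixteenfold, and for a symmetric bifurcation each daughter radius is the parent radius divided by the cube root of 2 (about 0.79 of the parent).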

  20. Hydrocarbons from plants and trees

    SciTech Connect

    Calvin, M.

    1982-07-01

    The way energy was used in the US in 1980 was examined. A diagram shows the development of energy from its source to its end use. The following are described: the carbon dioxide problem - the greenhouse effect, sugar cane as an energy source, hydrocarbon-producing plants and trees, and isoprenoids from plants and trees. (MHR)