Nonparametric survival analysis using Bayesian Additive Regression Trees (BART).
Sparapani, Rodney A; Logan, Brent R; McCulloch, Robert E; Laud, Purushottam W
2016-07-20
Bayesian additive regression trees (BART) provide a framework for flexible nonparametric modeling of relationships of covariates to outcomes. Recently, BART models have been shown to provide excellent predictive performance for both continuous and binary outcomes, exceeding that of competing methods. Software is also readily available for such outcomes. In this article, we introduce modeling that extends the usefulness of BART in medical applications by addressing needs arising in survival analysis. Simulation studies of one-sample and two-sample scenarios, in comparison with long-standing traditional methods, establish face validity of the new approach. We then demonstrate the model's ability to accommodate data from complex regression models with a simulation study of a nonproportional hazards scenario with crossing survival functions and survival function estimation in a scenario where hazards are multiplicatively modified by a highly nonlinear function of the covariates. Using data from a recently published study of patients undergoing hematopoietic stem cell transplantation, we illustrate the use and some advantages of the proposed method in medical investigations. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26854022
Self-Adaptive Induction of Regression Trees.
Fidalgo-Merino, Raúl; Núñez, Marlon
2011-08-01
A new algorithm for incremental construction of binary regression trees is presented. This algorithm, called SAIRT, adapts the induced model when facing data streams involving unknown dynamics, like gradual and abrupt function drift, changes in certain regions of the function, noise, and virtual drift. It also handles both symbolic and numeric attributes. The proposed algorithm can automatically adapt its internal parameters and model structure to obtain new patterns, depending on the current dynamics of the data stream. SAIRT can monitor the usefulness of nodes and can forget examples from selected regions, storing the remaining ones in local windows associated with the leaves of the tree. Under these conditions, current regression methods need a careful configuration depending on the dynamics of the problem. Experimentation suggests that the proposed algorithm obtains better results than current algorithms when dealing with data streams that involve changes with different speeds, noise levels, sampling distribution of examples, and partial or complete changes of the underlying function. PMID:21263164
Growth in Mathematics Achievement: Analysis with Classification and Regression Trees
ERIC Educational Resources Information Center
Ma, Xin
2005-01-01
A recently developed statistical technique, often referred to as classification and regression trees (CART), holds great potential for researchers to discover how student-level (and school-level) characteristics interactively affect growth in mathematics achievement. CART is a host of advanced statistical methods that statistically cluster…
Lo, Benjamin W. Y.; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R. Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A. H.
2016-01-01
Background: Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). Methods: The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Results: Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56–2.45, P < 0.01). Conclusions: A clinically useful classification tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH. PMID:27512607
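The recursive partitioning step described in this abstract can be illustrated with a minimal sketch. It assumes Gini impurity as the homogeneity criterion and a single numeric predictor; the abstract does not state which criterion or software the authors used, and the toy grades and outcomes below are hypothetical, not data from the Tirilazad database.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_impurity) of the best split `x <= threshold`.

    Candidate thresholds are midpoints between consecutive distinct values.
    If no split lowers the impurity, the threshold is None.
    """
    n = len(values)
    best_t, best_imp = None, gini(labels)
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2.0
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / n
        if imp < best_imp:
            best_t, best_imp = t, imp
    return best_t, best_imp


# Hypothetical toy data: a made-up severity grade and a dichotomized outcome.
grades = [1, 1, 2, 2, 3, 4, 4, 5]
outcome = ["good", "good", "good", "good", "poor", "poor", "poor", "poor"]
threshold, impurity = best_split(grades, outcome)
print(threshold, impurity)
```

A full tree applies `best_split` again to each of the two resulting subgroups until a stopping rule is met; split-sample validation then compares accuracy on data held out from this fitting process.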
Capacitance Regression Modelling Analysis on Latex from Selected Rubber Tree Clones
NASA Astrophysics Data System (ADS)
Rosli, A. D.; Hashim, H.; Khairuzzaman, N. A.; Mohd Sampian, A. F.; Baharudin, R.; Abdullah, N. E.; Sulaiman, M. S.; Kamaru'zzaman, M.
2015-11-01
This paper investigates the capacitance regression modelling performance of latex for various rubber tree clones, namely clones 2002, 2008, 2014 and 3001. Conventionally, rubber tree clone identification is based on observation of tree features such as leaf shape, trunk, branching habit and seed texture pattern. This method requires experts and is very time-consuming. Currently, there is no sensing device based on electrical properties that can be employed to distinguish clones from latex samples. Hence, on the hypothesis that the dielectric constant of each clone varies, this paper discusses the development of a capacitance sensor based on a Capacitance Comparison Bridge to measure the output voltage of different latex samples. The proposed sensor is initially tested with 30 ml of latex sample, after which dilution water is gradually added. The output voltage and capacitance obtained from the test are recorded and analyzed using a Simple Linear Regression (SLR) model. The results indicate that latex from clone 2002 produced the best-fitting and most reliable linear regression line, with a coefficient of determination of 91.24%. In addition, the study found that the capacitive elements in latex samples deteriorate when the sample is diluted with a larger volume of water.
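As a rough illustration of the simple linear regression and coefficient of determination mentioned in this abstract, the sketch below fits an ordinary least-squares line and computes R². The variable names and the dilution and voltage values are hypothetical, chosen only to make the example run; they are not measurements from the paper.

```python
def fit_slr(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared). Assumes x has at least two
    distinct values and y is not constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical readings: added water (ml) versus sensor output voltage (V).
water_ml = [0, 5, 10, 15, 20]
voltage = [2.0, 1.8, 1.7, 1.4, 1.2]
slope, intercept, r_squared = fit_slr(water_ml, voltage)
print(f"slope={slope:.4f} intercept={intercept:.4f} R^2={r_squared:.4f}")
```

An R² reported as a percentage, such as the 91.24% above, is this quantity multiplied by 100.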
ERIC Educational Resources Information Center
Koon, Sharon; Petscher, Yaacov
2015-01-01
The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules…
Applications of tree-structured regression for regional precipitation prediction
NASA Astrophysics Data System (ADS)
Li, Xiangshang
2000-11-01
This thesis presents a Tree-Structured Regression (TSR) method to relate daily precipitation with a variety of free atmosphere variables. Historical data were used to identify distinct weather patterns associated with differing types of precipitation events. Models were developed using 67% of the data for training and the remaining data for model validation. Seasonal models were built for each of four U.S. sites: New Orleans, Louisiana; San Antonio and Amarillo, Texas; and San Francisco, California. The average correlation by site between observed and simulated daily precipitation data series ranges from 0.69 to 0.79 for the training set, and from 0.64 to 0.79 for the validation set. Relative humidity related variables were found to be the dominant variables in these TSR models. Output from an NCAR Climate System Model (CSM) transient simulation of climate change was then used to drive the TSR models for predicting precipitation characteristics under climate change. A preliminary screening of the GCM output variables for current climate, however, revealed significant problems for the New Orleans, San Antonio and Amarillo sites. Specifically, the CSM missed the annual trends in humidity for the grid cells containing these sites. CSM output for the San Francisco site was found to be much more reliable. Therefore, we present future precipitation estimates only for the San Francisco site. While both GCM and TSR predict very small changes in overall annual precipitation, they differ significantly from season to season.
Prediction of fishing effort distributions using boosted regression trees.
Soykan, Candan U; Eguchi, Tomoharu; Kohin, Suzanne; Dewar, Heidi
2014-01-01
Concerns about bycatch of protected species have become a dominant factor shaping fisheries management. However, efforts to mitigate bycatch are often hindered by a lack of data on the distributions of fishing effort and protected species. One approach to overcoming this problem has been to overlay the distribution of past fishing effort with known locations of protected species, often obtained through satellite telemetry and occurrence data, to identify potential bycatch hotspots. This approach, however, generates static bycatch risk maps, calling into question their ability to forecast into the future, particularly when dealing with spatiotemporally dynamic fisheries and highly migratory bycatch species. In this study, we use boosted regression trees to model the spatiotemporal distribution of fishing effort for two distinct fisheries in the North Pacific Ocean, the albacore (Thunnus alalunga) troll fishery and the California drift gillnet fishery that targets swordfish (Xiphias gladius). Our results suggest that it is possible to accurately predict fishing effort using < 10 readily available predictor variables (cross-validated correlations between model predictions and observed data ~0.6). Although the two fisheries are quite different in their gears and fishing areas, their respective models had high predictive ability, even when input data sets were restricted to a fraction of the full time series. The implications for conservation and management are encouraging: Across a range of target species, fishing methods, and spatial scales, even a relatively short time series of fisheries data may suffice to accurately predict the location of fishing effort into the future. In combination with species distribution modeling of bycatch species, this approach holds promise as a mitigation tool when observer data are limited. Even in data-rich regions, modeling fishing effort and bycatch may provide more accurate estimates of bycatch risk than partial observer coverage
Estimation of adjusted rate differences using additive negative binomial regression.
Donoghoe, Mark W; Marschner, Ian C
2016-08-15
Rate differences are an important effect measure in biostatistics and provide an alternative perspective to rate ratios. When the data are event counts observed during an exposure period, adjusted rate differences may be estimated using an identity-link Poisson generalised linear model, also known as additive Poisson regression. A problem with this approach is that the assumption of equality of mean and variance rarely holds in real data, which often show overdispersion. An additive negative binomial model is the natural alternative to account for this; however, standard model-fitting methods are often unable to cope with the constrained parameter space arising from the non-negativity restrictions of the additive model. In this paper, we propose a novel solution to this problem using a variant of the expectation-conditional maximisation-either algorithm. Our method provides a reliable way to fit an additive negative binomial regression model and also permits flexible generalisations using semi-parametric regression functions. We illustrate the method using a placebo-controlled clinical trial of fenofibrate treatment in patients with type II diabetes, where the outcome is the number of laser therapy courses administered to treat diabetic retinopathy. An R package is available that implements the proposed method. Copyright © 2016 John Wiley & Sons, Ltd. PMID:27073156
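To make the distinction between a rate difference and a rate ratio concrete, the sketch below computes a crude (unadjusted) rate difference between two groups with a Wald interval under a plain Poisson assumption. This is not the authors' method: it includes no covariate adjustment, no negative binomial overdispersion, and no expectation-conditional maximisation-either algorithm. The function name and the event counts are hypothetical.

```python
import math


def crude_rate_difference(events_a, time_a, events_b, time_b, z=1.96):
    """Crude rate difference (group A minus group B) with a Wald interval.

    Assumes each event count is Poisson, so the variance of an estimated
    rate events / time is approximated by events / time**2.
    """
    rate_a = events_a / time_a
    rate_b = events_b / time_b
    diff = rate_a - rate_b
    se = math.sqrt(events_a / time_a ** 2 + events_b / time_b ** 2)
    return diff, (diff - z * se, diff + z * se)


# Hypothetical counts: 30 events in 100 person-years versus 20 in 100.
diff, (lower, upper) = crude_rate_difference(30, 100.0, 20, 100.0)
ratio = (30 / 100.0) / (20 / 100.0)
print(f"rate difference = {diff:.3f} per person-year, "
      f"95% CI ({lower:.3f}, {upper:.3f}); rate ratio = {ratio:.2f}")
```

When the data are overdispersed, as the abstract notes is common, the Poisson standard error used here is too small, which is the motivation for the negative binomial alternative.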
Hemmateenejad, Bahram; Shamsipur, Mojtaba; Zare-Shahabadi, Vali; Akhond, Morteza
2011-10-17
Classification and regression trees (CART) have the advantage of being able to handle large data sets and yield readily interpretable models. A conventional method of building a regression tree is recursive partitioning, which results in a good but not optimal tree. Ant colony system (ACS), a meta-heuristic algorithm derived from the observation of real ants, can be used to overcome this problem. The purpose of this study was to explore the use of CART and its combination with ACS for modeling the melting points of a large variety of chemical compounds. Genetic algorithm (GA) operators (e.g., crossover and mutation operators) were combined with the ACS algorithm to select the best solution model. In addition, at each terminal node of the resulting tree, variable selection was done by the ACS-GA algorithm to build an appropriate partial least squares (PLS) model. To test the ability of the resulting tree, a set of approximately 4173 structures and their melting points was used (3000 compounds as training set and 1173 as validation set). Further, an external test set containing 277 drugs was used to validate the prediction ability of the tree. Comparison of the results obtained from both trees showed that the tree constructed by the ACS-GA algorithm performs better than that produced by the recursive partitioning procedure. PMID:21907021
Donmez, Cenk; Berberoglu, Suha; Erdogan, Mehmet Akif; Tanriover, Anil Akin; Cilek, Ahmet
2015-02-01
Percent tree cover is the percentage of the ground surface area covered by a vertical projection of the outermost perimeter of the plants. It is an important indicator of the condition of forest systems and a main input to ecosystem models. The aim of this study is to estimate the percent tree cover of various forest stands in a Mediterranean environment based on an empirical relationship between tree coverage and remotely sensed data in the Goksu Watershed, located on the Eastern Mediterranean coast of Turkey. A regression tree algorithm was used to simulate spatial fractions of Pinus nigra, Cedrus libani, Pinus brutia, Juniperus excelsa and Quercus cerris using multi-temporal LANDSAT TM/ETM data as predictor variables and land cover information. Two scenes of high resolution GeoEye-1 images were employed for training and testing the model. The predictor variables were incorporated in addition to biophysical variables estimated from the LANDSAT TM/ETM data. Additionally, the normalised difference vegetation index (NDVI) was added to the LANDSAT TM/ETM band settings as a biophysical variable. Stepwise linear regression (SLR) was applied to select the relevant bands for the regression tree process. SLR-selected variables produced accurate results in the model, with a high correlation coefficient of 0.80. The output values ranged from 0 to 100%. The different tree species were mapped at 30 m resolution with respect to elevation. The percent tree cover map, the final output, was derived using the LANDSAT TM/ETM image over the Goksu Watershed and the biophysical variables. The results were tested using high spatial resolution GeoEye-1 images. Thus, the combination of the RT algorithm and higher resolution data for percent tree cover mapping was tested and examined in a complex Mediterranean environment. PMID:25604062
2014-01-01
Background There is a need to evaluate complex interaction effects on human health, such as those induced by mixtures of environmental contaminants. The usual approach is to formulate an additive statistical model and check for departures using product terms between the variables of interest. In this paper, we present an approach to search for interaction effects among several variables using boosted regression trees. Methods We simulate a continuous outcome from real data on 27 environmental contaminants, some of which are correlated, and test the method’s ability to uncover the simulated interactions. The simulated outcome contains one four-way interaction, one non-linear effect and one interaction between a continuous variable and a binary variable. Four scenarios reflecting different strengths of association are simulated. We illustrate the method using real data. Results The method succeeded in identifying the true interactions in all scenarios except where the association was weakest. Some spurious interactions were also found, however. The method was also able to identify interactions in the real data set. Conclusions We conclude that boosted regression trees can be used to uncover complex interaction effects in epidemiological studies. PMID:24993424
Shin, Yoonseok
2015-01-01
Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong bounds on its generalization performance. However, although it has been actively utilized in other domains, the boosting approach has yet to be used in regression problems within the construction domain, including cost estimation. Therefore, a boosting regression tree (BRT) is applied to cost estimation at the early stage of a construction project to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the BRT model, its performance was compared with that of a neural network (NN) model, which has been proven to perform well in cost estimation domains. The BRT model showed results similar to those of the NN model using 234 actual cost datasets of a building construction project. In addition, the BRT model can provide additional information such as the importance plot and structure model, which can support estimators in comprehending the decision making process. Consequently, the boosting approach has potential applicability in preliminary cost estimation in a building construction project. PMID:26339227
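The boosting idea behind a boosted regression tree can be sketched in a few lines. The example below assumes squared-error loss, depth-one trees (stumps) on a single predictor, and a fixed learning rate; the abstract does not say which boosting variant or settings the author used, and the floor areas and costs are hypothetical.

```python
def fit_stump(x, residuals):
    """Least-squares stump: returns (threshold, left_mean, right_mean)."""
    best = None
    distinct = sorted(set(x))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1], best[2], best[3]


def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Fit stumps sequentially, each one to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((t, left_mean, right_mean))
        pred = [pi + learning_rate * (left_mean if xi <= t else right_mean)
                for xi, pi in zip(x, pred)]
    return base, learning_rate, stumps


def predict(model, xi):
    """Prediction = base value plus the shrunken sum of all stump outputs."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left_mean if xi <= t else right_mean for t, left_mean, right_mean in stumps)


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Hypothetical data: floor area (arbitrary units) versus cost (arbitrary units).
area = [1, 2, 3, 4, 5, 6, 7, 8]
cost = [1.0, 1.2, 1.1, 3.0, 3.2, 3.1, 6.0, 6.3]
model = boost(area, cost)
baseline_mse = mse(cost, [sum(cost) / len(cost)] * len(cost))
boosted_mse = mse(cost, [predict(model, a) for a in area])
print(baseline_mse, boosted_mse)
```

Counting how often each predictor is chosen for a split, weighted by the error reduction it brings, is one common way to obtain the kind of importance plot mentioned in the abstract.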
Janssen, I.; Stebbings, J.H.
1990-01-01
In environmental epidemiology, trace and toxic substance concentrations frequently have very highly skewed distributions ranging over one or more orders of magnitude, and prediction by conventional regression is often poor. Classification and Regression Tree Analysis (CART) is an alternative in such contexts. To compare the techniques, two Pennsylvania data sets and three independent variables are used: house radon progeny (RnD) and gamma levels as predicted by construction characteristics in 1330 houses; and approximately 200 house radon (Rn) measurements as predicted by topographic parameters. CART may identify structural variables of interest not identified by conventional regression, and vice versa, but in general the regression models are similar. CART has major advantages in dealing with other common characteristics of environmental data sets, such as missing values, continuous variables requiring transformations, and large sets of potential independent variables. CART is most useful in the identification and screening of independent variables, greatly reducing the need for cross-tabulations and nested breakdown analyses. There is no need to discard cases with missing values for the independent variables because surrogate variables are intrinsic to CART. The tree-structured approach is also independent of the scale on which the independent variables are measured, so that transformations are unnecessary. CART identifies important interactions as well as main effects. The major advantages of CART appear to be in exploring data. Once the important variables are identified, conventional regressions seem to lead to results similar but more interpretable by most audiences. 12 refs., 8 figs., 10 tabs.
Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De
2014-01-01
As fraudulent financial statements become an increasingly serious problem, establishing a valid model for forecasting the fraudulent financial statements of an enterprise has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, support vector machine, and decision tree methods to construct classification models and compare them. The study adopts financial and nonfinancial variables to help build the forecasting model. The research objects are companies with fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the C5.0 decision tree has the best classification performance, at 85.71%. PMID:25302338
Which Phylogenetic Networks are Merely Trees with Additional Arcs?
Francis, Andrew R; Steel, Mike
2015-09-01
A binary phylogenetic network may or may not be obtainable from a tree by the addition of directed edges (arcs) between tree arcs. Here, we establish a precise and easily tested criterion (based on "2-SAT") that efficiently determines whether or not any given network can be realized in this way. Moreover, the proof provides a polynomial-time algorithm for finding one or more trees (when they exist) on which the network can be based. A number of interesting consequences are presented as corollaries; these lead to some further relevant questions and observations, which we outline in the conclusion. PMID:26070685
Evaluating multimedia chemical persistence: Classification and regression tree analysis
Bennett, D.H.; McKone, T.E.; Kastenberg, W.E.
2000-04-01
For the thousands of chemicals continuously released into the environment, it is desirable to make prospective assessments of those likely to be persistent. Widely distributed persistent chemicals are impossible to remove from the environment and remediation by natural processes may take decades, which is problematic if adverse health or ecological effects are discovered after prolonged release into the environment. A tiered approach using a classification scheme and a multimedia model for determining persistence is presented. Using specific criteria for persistence, a classification tree is developed to classify a chemical as persistent or nonpersistent based on the chemical properties. In this approach, the classification is derived from the results of a standardized unit world multimedia model. Thus, the classifications are more robust for multimedia pollutants than classifications using a single medium half-life. The method can be readily implemented and provides insight without requiring extensive and often unavailable data. This method can be used to classify chemicals when only a few properties are known and can be used to direct further data collection. Case studies are presented to demonstrate the advantages of the approach.
Huang, C.; Townshend, J.R.G.
2003-01-01
A stepwise regression tree (SRT) algorithm was developed for approximating complex nonlinear relationships. Based on the regression tree of Breiman et al. (BRT) and a stepwise linear regression (SLR) method, this algorithm represents an improvement over SLR in that it can approximate nonlinear relationships and over BRT in that it gives more realistic predictions. The applicability of this method to estimating subpixel forest was demonstrated using three test data sets, on all of which it gave more accurate predictions than SLR and BRT. SRT also generated more compact trees and performed better than or at least as well as BRT in all 10 equal forest-proportion intervals ranging from 0 to 100%. This method is appealing for estimating subpixel land cover over large areas.
Data mining in psychological treatment research: a primer on classification and regression trees.
King, Matthew W; Resick, Patricia A
2014-10-01
Data mining of treatment study results can reveal unforeseen but critical insights, such as who receives the most benefit from treatment and under what circumstances. The usefulness and legitimacy of exploratory data analysis have received relatively little recognition, however, and analytic methods well suited to the task are not widely known in psychology. With roots in computer science and statistics, statistical learning approaches offer a credible option: These methods take a more inductive approach to building a model than is done in traditional regression, allowing the data a greater role in suggesting the correct relationships between variables rather than imposing them a priori. Classification and regression trees are presented as a powerful, flexible exemplar of statistical learning methods. Trees allow researchers to efficiently identify useful predictors of an outcome and discover interactions between predictors without the need to anticipate and specify these in advance, making them ideal for revealing patterns that inform hypotheses about treatment effects. Trees can also provide a predictive model for forecasting outcomes as an aid to clinical decision making. This primer describes how tree models are constructed, how the results are interpreted and evaluated, and how trees overcome some of the complexities of traditional regression. Examples are drawn from randomized clinical trial data and highlight some interpretations of particular interest to treatment researchers. The limitations of tree models are discussed, and suggestions for further reading and choices in software are offered. PMID:24588404
Technology Transfer Automated Retrieval System (TEKTRAN)
Missing meteorological data have to be estimated for agricultural and environmental modeling. The objective of this work was to develop a technique to reconstruct the missing daily precipitation data in the central part of the Chesapeake Bay Watershed using regression trees (RT) and artificial neura...
What Satisfies Students?: Mining Student-Opinion Data with Regression and Decision Tree Analysis
ERIC Educational Resources Information Center
Thomas, Emily H.; Galambos, Nora
2004-01-01
To investigate how students' characteristics and experiences affect satisfaction, this study uses regression and decision tree analysis with the CHAID algorithm to analyze student-opinion data. A data mining approach identifies the specific aspects of students' university experience that most influence three measures of general satisfaction. The…
Technology Transfer Automated Retrieval System (TEKTRAN)
Incomplete meteorological data has been a problem in environmental modeling studies. The objective of this work was to develop a technique to reconstruct missing daily precipitation data in the central part of Chesapeake Bay Watershed using regression trees (RT) and artificial neural networks (ANN)....
Risk Factors of Falls in Community-Dwelling Older Adults: Logistic Regression Tree Analysis
ERIC Educational Resources Information Center
Yamashita, Takashi; Noe, Douglas A.; Bailer, A. John
2012-01-01
Purpose of the Study: A novel logistic regression tree-based method was applied to identify fall risk factors and possible interaction effects of those risk factors. Design and Methods: A nationally representative sample of American older adults aged 65 years and older (N = 9,592) in the Health and Retirement Study 2004 and 2006 modules was used.…
Use of Tree-Based Regression in the Analyses of L2 Reading Test Items
ERIC Educational Resources Information Center
Gao, Lingyun; Rogers, W. Todd
2011-01-01
The purpose of this study was to explore whether the results of Tree Based Regression (TBR) analyses, informed by a validated cognitive model, would enhance the interpretation of item difficulties in terms of the cognitive processes involved in answering the reading items included in two forms of the Michigan English Language Assessment Battery…
Predicting the limits to tree height using statistical regressions of leaf traits.
Burgess, Stephen S O; Dawson, Todd E
2007-01-01
Leaf morphology and physiological functioning demonstrate considerable plasticity within tree crowns, with various leaf traits often exhibiting pronounced vertical gradients in very tall trees. It has been proposed that the trajectory of these gradients, as determined by regression methods, could be used in conjunction with theoretical biophysical limits to estimate the maximum height to which trees can grow. Here, we examined this approach using published and new experimental data from tall conifer and angiosperm species. We showed that height predictions were sensitive to tree-to-tree variation in the shape of the regression and to the biophysical endpoints selected. We examined the suitability of proposed end-points and their theoretical validity. We also noted that site and environment influenced height predictions considerably. Use of leaf mass per unit area or leaf water potential coupled with vulnerability of twigs to cavitation poses a number of difficulties for predicting tree height. Photosynthetic rate and carbon isotope discrimination show more promise, but in the second case, the complex relationship between light, water availability, photosynthetic capacity and internal conductance to CO2 must first be characterized. PMID:17447917
Nitrogen Addition Enhances Drought Sensitivity of Young Deciduous Tree Species
Dziedek, Christoph; Härdtle, Werner; von Oheimb, Goddert; Fichtner, Andreas
2016-01-01
Understanding how trees respond to global change drivers is central to predicting changes in forest structure and functions. Although there is evidence on the mode of nitrogen (N) and drought (D) effects on tree growth, our understanding of the interplay of these factors is still limited. Simultaneously, as mixtures are expected to be less sensitive to global change as compared to monocultures, we aimed to investigate the combined effects of N addition and D on the productivity of three tree species (Fagus sylvatica, Quercus petraea, Pseudotsuga menziesii) in relation to functionally diverse species mixtures using data from a 4-year field experiment in Northwest Germany. Here we show that species mixing can mitigate the negative effects of combined N fertilization and D events, but the community response is mainly driven by the combination of certain traits rather than the tree species richness of a community. For beech, we found that negative effects of D on growth rates were amplified by N fertilization (i.e., combined treatment effects were non-additive), while for oak and fir, the simultaneous effects of N and D were additive. Beech and oak were identified as most sensitive to combined N+D effects with a strong size-dependency observed for beech, suggesting that the negative impact of N+D becomes stronger with time as beech grows larger. As a consequence, the net biodiversity effect declined at the community level, which can be mainly assigned to a distinct loss of complementarity in beech-oak mixtures. This pattern, however, was not evident in the other species-mixtures, indicating that neighborhood composition (i.e., trait combination), but not tree species richness mediated the relationship between tree diversity and treatment effects on tree growth. Our findings point to the importance of the qualitative role (‘trait portfolio’) that biodiversity plays in determining resistance of diverse tree communities to environmental changes. As such, they provide
Nitrogen Addition Enhances Drought Sensitivity of Young Deciduous Tree Species.
Dziedek, Christoph; Härdtle, Werner; von Oheimb, Goddert; Fichtner, Andreas
2016-01-01
Understanding how trees respond to global change drivers is central to predicting changes in forest structure and functions. Although there is evidence on the mode of nitrogen (N) and drought (D) effects on tree growth, our understanding of the interplay of these factors is still limited. Simultaneously, as mixtures are expected to be less sensitive to global change than monocultures, we aimed to investigate the combined effects of N addition and D on the productivity of three tree species (Fagus sylvatica, Quercus petraea, Pseudotsuga menziesii) in relation to functionally diverse species mixtures using data from a 4-year field experiment in Northwest Germany. Here we show that species mixing can mitigate the negative effects of combined N fertilization and D events, but the community response is mainly driven by the combination of certain traits rather than the tree species richness of a community. For beech, we found that negative effects of D on growth rates were amplified by N fertilization (i.e., combined treatment effects were non-additive), while for oak and fir, the simultaneous effects of N and D were additive. Beech and oak were identified as most sensitive to combined N+D effects, with a strong size-dependency observed for beech, suggesting that the negative impact of N+D becomes stronger with time as beech grows larger. As a consequence, the net biodiversity effect declined at the community level, which can be mainly attributed to a distinct loss of complementarity in beech-oak mixtures. This pattern, however, was not evident in the other species mixtures, indicating that neighborhood composition (i.e., trait combination), but not tree species richness, mediated the relationship between tree diversity and treatment effects on tree growth. Our findings point to the importance of the qualitative role ('trait portfolio') that biodiversity plays in determining the resistance of diverse tree communities to environmental changes. As such, they provide further
Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali
2016-01-01
Groundwater is considered one of the most valuable fresh water resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence the locations of springs were considered in this research. These factors include slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Subsequently, groundwater spring potential was modeled and mapped using the CART, RF, and BRT algorithms. The predicted results from the three models were validated using the receiver operating characteristic (ROC) curve. Of the 864 springs identified, 605 (≈70 %) locations were used for the spring potential mapping, while the remaining 259 (≈30 %) springs were used for model validation. The area under the curve (AUC) was 0.8103 for the BRT model and 0.7870 and 0.7119 for CART and RF, respectively. It was therefore concluded that the BRT model predicted spring locations best, followed by the CART and RF models. Geospatially integrated BRT, CART, and RF methods proved to be useful in generating the spring potential map (SPM) with reasonable accuracy. PMID:26687087
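The validation step described in this abstract can be illustrated with a small sketch. It assumes the AUC is computed as the probability that a randomly chosen spring location receives a higher predicted potential than a randomly chosen non-spring location; the function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study. Only the split sizes (864, 605, 259) come from the abstract.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Split sizes reported in the abstract.
n_total, n_train = 864, 605
n_valid = n_total - n_train          # springs held out for validation
train_share = n_train / n_total      # fraction used for model building

# Hypothetical validation data: 1 = spring, 0 = non-spring location.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of samples, which is acceptable for a validation set of a few hundred points; a rank-based formula would be the usual choice for larger sets.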
Gmur, Stephan; Vogt, Daniel; Zabowski, Darlene; Moskal, L. Monika
2012-01-01
The characterization of soil attributes using hyperspectral sensors has revealed patterns in soil spectra that are known to respond to mineral composition, organic matter, soil moisture and particle size distribution. Soil samples from different soil horizons of replicated soil series from sites located within Washington and Oregon were analyzed with the FieldSpec Spectroradiometer to measure their spectral signatures across the electromagnetic range of 400 to 1,000 nm. Similarity rankings of individual soil samples reveal differences between replicate series as well as samples within the same replicate series. Using classification and regression tree statistical methods, regression trees were fitted to each spectral response using concentrations of nitrogen, carbon, carbonate and organic matter as the response variables. Statistics resulting from fitted trees were: nitrogen R² 0.91 (p < 0.01) at 403, 470, 687, and 846 nm spectral band widths, carbonate R² 0.95 (p < 0.01) at 531 and 898 nm band widths, total carbon R² 0.93 (p < 0.01) at 400, 409, 441 and 907 nm band widths, and organic matter R² 0.98 (p < 0.01) at 300, 400, 441, 832 and 907 nm band widths. Use of the 400 to 1,000 nm electromagnetic range utilizing regression trees provided a powerful, rapid and inexpensive method for assessing nitrogen, carbon, carbonate and organic matter for upper soil horizons in a nondestructive method. PMID:23112620
NASA Astrophysics Data System (ADS)
Coimbra, Rute; Rodriguez-Galiano, Victor; Olóriz, Federico; Chica-Olmo, Mario
2014-12-01
Research based on ancient carbonate geochemical records is often assisted by multivariate statistical analysis, used among other things for data mining. This contribution reports a complementary approach that can be applied to paleoenvironmental research. The choice of a machine learning method, here regression trees (RT), relied on its ability to learn complex patterns, integrating multiple types of data with different statistical distributions to obtain a knowledge model of geochemical behavior along a paleo-platform. The Late Jurassic epioceanic deposits under study are represented by six stratigraphic sections located in SE Spain and on the island of Majorca. The database used comprises a total of 1960 data points corresponding to eight variables (stable C and O isotopes, the elements Ca, Mg, Sr, Fe, Mn and skeletal content). This study uses RT models in which the predictive variables are the geochemical proxies, whilst skeletal content is used as the target variable. The resulting model is data driven, explaining variations in the target variable and providing additional information on the relative importance of each variable to each prediction, as well as its corresponding threshold values. The obtained RT revealed a structured distribution of samples, organized either by stratigraphic section or by sets of nearby sections. Averaged estimated skeletal abundance confirmed the initial observations of higher skeletal content for the most distal sections, with estimated values from 18% to 27%. In contrast, lower skeletal abundance, from 5% to 15%, is proposed for the remaining sections. The geochemical variable that best discriminates this major trend is δ18O, at a threshold value of -0.2‰, interpreted as evidence for separation of water-mass properties across the studied areas. Four other variables were considered relevant by the obtained decision tree: C isotopes, Ca, Sr and Mn, providing new insights for further differentiation between sets of samples.
Modelling dissimilarity: generalizing ultrametric and additive tree representations.
Hubert, L; Arabie, P; Meulman, J
2001-05-01
Methods for the hierarchical clustering of an object set produce a sequence of nested partitions such that object classes within each successive partition are constructed from the union of object classes present at the previous level. Any such sequence of nested partitions can in turn be characterized by an ultrametric. An approach to generalizing an (ultrametric) representation is proposed in which the nested character of the partition sequence is relaxed and replaced by the weaker requirement that the classes within each partition contain objects consecutive with respect to a fixed ordering of the objects. A method for fitting such a structure to a given proximity matrix is discussed, along with several alternative strategies for graphical representation. Using this same ultrametric extension, additive tree representations can also be generalized by replacing the ultrametric component in the decomposition of an additive tree (into an ultrametric and a centroid metric). A common numerical illustration is developed and maintained throughout the paper. PMID:11393895
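As a small illustration of the ultrametric property mentioned in this abstract, the sketch below checks the standard ultrametric inequality, d(i, k) <= max(d(i, j), d(j, k)), on a dissimilarity matrix. The function name `is_ultrametric` and both example matrices are hypothetical; the sketch does not implement the generalized (order-constrained) representation the paper proposes.

```python
from itertools import permutations


def is_ultrametric(d, tol=1e-12):
    """Return True if every triple satisfies d[i][k] <= max(d[i][j], d[j][k])."""
    n = len(d)
    return all(
        d[i][k] <= max(d[i][j], d[j][k]) + tol
        for i, j, k in permutations(range(n), 3)
    )


# Hypothetical hierarchy: objects 0 and 1 merge at height 1,
# then that cluster joins object 2 at height 3.
tree_like = [
    [0, 1, 3],
    [1, 0, 3],
    [3, 3, 0],
]

# Hypothetical matrix that violates the inequality:
# d(0, 2) = 3 exceeds max(d(0, 1), d(1, 2)) = 2.
not_tree_like = [
    [0, 1, 3],
    [1, 0, 2],
    [3, 2, 0],
]

print(is_ultrametric(tree_like), is_ultrametric(not_tree_like))
```

In an ultrametric matrix the two largest dissimilarities in every triple are equal, which is what allows the matrix to be drawn as a dendrogram with well-defined merge heights.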
Prioritizing Highway Safety Manual's crash prediction variables using boosted regression trees.
Saha, Dibakar; Alluri, Priyanka; Gan, Albert
2015-06-01
The Highway Safety Manual (HSM) recommends using the empirical Bayes (EB) method with locally derived calibration factors to predict an agency's safety performance. However, the data needs for deriving these local calibration factors are significant, requiring very detailed roadway characteristics information. Many of the data variables identified in the HSM are currently unavailable in the states' databases. Moreover, the process of collecting and maintaining all the HSM data variables is cost-prohibitive. Prioritization of the variables based on their impact on crash predictions would, therefore, help to identify influential variables for which data could be collected and maintained for continued updates. This study aims to determine the impact of each independent variable identified in the HSM on crash predictions. A relatively recent data mining approach called boosted regression trees (BRT) is used to investigate the association between the variables and crash predictions. The BRT method can effectively handle different types of predictor variables, identify very complex and non-linear associations among variables, and compute variable importance. Five years of crash data from 2008 to 2012 on two urban and suburban facility types, two-lane undivided arterials and four-lane divided arterials, were analyzed for estimating the influence of variables on crash predictions. Variables were found to exhibit non-linear and sometimes complex relationships to predicted crash counts. In addition, only a few variables were found to explain most of the variation in the crash data. PMID:25823903
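The idea of boosting and of reading off variable importance can be sketched in a few lines. This is a minimal illustration only: it assumes squared-error loss, depth-one trees (stumps), and importance measured as the total reduction in squared error credited to each variable. The study itself may use a different loss, deeper trees, and a dedicated BRT package; the names `fit_stump` and `boost` and the toy data are hypothetical.

```python
def fit_stump(X, r):
    """Find the single split that most reduces squared error on residuals r.

    Returns (gain, feature, threshold, left_mean, right_mean), or None
    when no feature has two distinct values to split between.
    """
    n = len(X)
    mean_all = sum(r) / n
    base_sse = sum((v - mean_all) ** 2 for v in r)
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighbouring values
            left = [r[i] for i in range(n) if X[i][f] <= t]
            right = [r[i] for i in range(n) if X[i][f] > t]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            gain = base_sse - sse
            if best is None or gain > best[0]:
                best = (gain, f, t, lm, rm)
    return best


def boost(X, y, n_trees=50, learning_rate=0.1):
    """Fit stumps to successive residuals; return stumps, importance, fitted values."""
    pred = [sum(y) / len(y)] * len(y)   # start from the mean response
    stumps = []
    importance = [0.0] * len(X[0])      # squared-error reduction per variable
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        best = fit_stump(X, resid)
        if best is None:
            break
        gain, f, t, lm, rm = best
        stumps.append((f, t, lm, rm))
        importance[f] += gain
        pred = [p + learning_rate * (lm if row[f] <= t else rm)
                for p, row in zip(pred, X)]
    return stumps, importance, pred


# Hypothetical data: the response steps up when variable 0 exceeds 5;
# variable 1 is unrelated to the response.
X = [[1, 0], [2, 1], [3, 0], [4, 1], [6, 0], [7, 1], [8, 0], [9, 1]]
y = [2, 2, 2, 2, 10, 10, 10, 10]
stumps, importance, fitted = boost(X, y)
```

Each round fits a stump to what the current ensemble still gets wrong and adds a shrunken version of it, so a step-shaped (non-linear) relationship is recovered without specifying its form in advance.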
Analysis of Maryland Poisoning Deaths Using Classification And Regression Tree (CART) Analysis
Pamer, Carol; Serpi, Tracey; Finkelstein, Joseph
2008-01-01
Our study is a cross-sectional analysis of Maryland poisoning deaths for years 2003 and 2004. We used Classification and Regression Tree (CART) methodology to classify 1,204 Maryland undetermined intent poisoning deaths as either unintentional or suicidal poisonings. The predictive ability of the selected set of variables (i.e., poisoned in the home or workplace, location type where poisoned, place of death, poison type, victim race and age, year of death) was extremely good. Of the 301 test cases, only eight were misclassified by the CART regression tree. Of 1,204 undetermined intent poisoning deaths, CART classified 903 as suicides and 301 as unintentional deaths. The major strength of our study is the use of CART to differentiate with a high degree of accuracy between unintentional and suicidal poisoning deaths among Maryland undetermined intent poisoning deaths. PMID:18999168
Estimating Basin Snow Volume Using Aerial LiDAR and Binary Regression Trees (Invited)
NASA Astrophysics Data System (ADS)
Shallcross, A. T.; McNamara, J. P.; Flores, A. N.; Marshall, H.; Marks, D. G.; Glenn, N. F.
2010-12-01
Snow cover derived from airborne LiDAR (Light Detection And Ranging) is combined with binary regression trees to improve the prediction of total basin snow volume for the Dry Creek Experimental Watershed (DCEW), ID. These methods are used to identify site-specific topographic controls on the spatial distribution of snow so that future point measurements of snow depth can be distributed through space efficiently. LiDAR is used to map snow cover by differencing the digital elevation models (DEMs) obtained from a snow-covered overflight and a snow-free overflight. Topographic parameters known to control snow distribution are calculated from the snow-free LiDAR dataset. Here, mean vegetation height, slope, aspect, solar radiation, and elevation are used to predict snow depth via a binary regression tree using ten-fold cross-validation. The branches leading to the terminal nodes of the regression tree are used to segment the watershed into homogeneous snow distribution units. Preliminary results indicate that 23 statistically significant discrete units exist. Thus, during future field campaigns, point measurements of snow depth can be gathered and distributed throughout these units. Mean measured SWE/depth of each unit can be summed to determine the total basin snow volume. This method should decrease field time and improve the accuracy of basin snow volume estimates for watershed analyses.
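The DEM differencing and the unit-wise volume sum described in this abstract can be sketched as follows. The grid values, cell size, and unit areas are hypothetical, and flooring negative differences at zero is an assumption made here for illustration rather than a step stated in the abstract.

```python
def snow_depth_grid(snow_on, snow_free):
    """Cell-wise snow depth: snow-covered surface minus bare-ground elevation.

    Negative differences are set to 0.0 (an assumption of this sketch).
    """
    return [
        [max(a - b, 0.0) for a, b in zip(row_on, row_free)]
        for row_on, row_free in zip(snow_on, snow_free)
    ]


def volume_from_grid(depth, cell_area_m2):
    """Total snow volume (m^3) from a depth grid with uniform cell area."""
    return sum(sum(row) for row in depth) * cell_area_m2


def volume_from_units(units):
    """Total snow volume (m^3) from (mean_depth_m, area_m2) pairs, one per unit."""
    return sum(mean_depth * area for mean_depth, area in units)


# Hypothetical 2 x 2 elevation grids in metres.
snow_free = [[100.0, 101.0], [102.0, 103.0]]
snow_on = [[100.5, 102.0], [102.0, 102.9]]

depth = snow_depth_grid(snow_on, snow_free)
grid_volume = volume_from_grid(depth, cell_area_m2=4.0)

# Hypothetical snow distribution units: (mean measured depth, unit area).
unit_volume = volume_from_units([(0.8, 1000.0), (0.2, 3000.0)])
```

The second function mirrors the field strategy in the abstract: once the regression tree has segmented the basin into units, a mean depth per unit multiplied by that unit's area replaces a full grid of measurements.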
[Hyperspectral Estimation of Apple Tree Canopy LAI Based on SVM and RF Regression].
Han, Zhao-ying; Zhu, Xi-cun; Fang, Xian-yi; Wang, Zhuo-yuan; Wang, Ling; Zhao, Geng-Xing; Jiang, Yuan-mao
2016-03-01
Leaf area index (LAI) is a dynamic index of crop population size. Hyperspectral technology can be used to estimate apple canopy LAI rapidly and nondestructively, and it can provide a reference for monitoring tree growth and estimating yield. Red Fuji apple trees in the full fruit-bearing period were the research objects. Canopy spectral reflectance and LAI values of ninety apple trees were measured with the ASD Fieldspec3 spectrometer and the LAI-2200 in thirty orchards over two consecutive years in the Qixia research area of Shandong Province. The optimal vegetation indices were selected by correlation analysis of the original spectral reflectance and vegetation indices. Models for predicting LAI were built with two multivariate regression methods, support vector machine (SVM) and random forest (RF). The new vegetation indices GNDVI527, NDVI676, RVI682, FD-NVI656 and GRVI517 and the two main existing vegetation indices, NDVI670 and NDVI705, are in accordance with LAI. In the RF regression model, the calibration set coefficient of determination (C-R2) of 0.920 and the validation set coefficient of determination (V-R2) of 0.889 are higher than those of the SVM regression model by 0.045 and 0.033, respectively. The root mean square error of the calibration set (C-RMSE) of 0.249 and of the validation set (V-RMSE) of 0.236 are lower than those of the SVM regression model by 0.054 and 0.058, respectively. The relative analysis error of the calibration set (C-RPD) and of the validation set (V-RPD) reached 3.363 and 2.520, which were 0.598 and 0.262 higher than those of the SVM regression model, respectively. The slopes of the trend lines in the scatterplots of measured versus predicted values for the calibration set (C-S) and validation set (V-S) are close to 1. The estimation result of the RF regression model is better than that of the SVM model. The RF regression model can be used to estimate the LAI of Red Fuji apple trees in the full fruit period. PMID:27400527
NASA Astrophysics Data System (ADS)
Park, Yongeun; Pachepsky, Yakov A.; Cho, Kyung Hwa; Jeon, Dong Jin; Kim, Joon Ha
2015-10-01
To control algal blooms, the stressor-response relationships between water quality metrics, environmental variables, and algal growth need to be better understood and modeled. Machine-learning methods have been suggested as a means to express the stressor-response relationships that are found when applying mechanistic water quality models. The objective of this work was to evaluate the efficiency of regression trees in the development of a stressor-response model for chlorophyll-a (Chl-a) concentrations, using the results from site-specific mechanistic water quality modeling. The 2-dimensional hydrodynamic and water quality model (CE-QUAL-W2) was applied to simulate water quality using four-year observational data and additional scenarios of air temperature increases for the Yeongsan Reservoir in South Korea. Regression tree modeling was applied to the results of these simulations. Given the well-expressed seasonality in the simulated Chl-a dynamics, separate regression trees were developed for months from May to September. The regression trees provided a reasonably accurate representation of the stressor-response dependence generated by the CE-QUAL-W2 model. Different stressors were then selected as split variables for different months, and, in most cases, splits by the same stressor variable yielded the same correlation sign between the variable and the Chl-a concentration. Compared to physical variables, nutrient content appeared to better predict Chl-a responses. The highest Chl-a temperature sensitivities were found for May and June. Regression tree splits based on ammonium concentration resulted in a consistent trend of greater sensitivity in the groups of samples with higher ammonium concentrations. Regression tree models provided a transparent visual representation of the stressor-response relationships for Chl-a and its sensitivity. Overall, the representation of relationships using classification and regression tools can be considered a useful
A regression tree approach to identifying subgroups with differential treatment effects.
Loh, Wei-Yin; He, Xu; Man, Michael
2015-05-20
In the fight against hard-to-treat diseases such as cancer, it is often difficult to discover new treatments that benefit all subjects. For regulatory agency approval, it is more practical to identify subgroups of subjects for whom the treatment has an enhanced effect. Regression trees are natural for this task because they partition the data space. We briefly review existing regression tree algorithms. Then, we introduce three new ones that are practically free of selection bias and are applicable to data from randomized trials with two or more treatments, censored response variables, and missing values in the predictor variables. The algorithms extend the generalized unbiased interaction detection and estimation (GUIDE) approach by using three key ideas: (i) treatment as a linear predictor, (ii) chi-squared tests to detect residual patterns and lack of fit, and (iii) proportional hazards modeling via Poisson regression. Importance scores with thresholds for identifying influential variables are obtained as by-products. A bootstrap technique is used to construct confidence intervals for the treatment effects in each node. The methods are compared using real and simulated data. PMID:25656439
Gao, Jun; Lavergne, M. Ruth; McIntyre, Paul
2013-01-01
Classification and regression tree (CART) analysis was used to identify subpopulations with lower palliative care program (PCP) enrolment rates. CART analysis uses recursive partitioning to group predictors. The PCP enrolment rate was 72 percent for the 6,892 adults who died of cancer from 2000 to 2005 in two counties in Nova Scotia, Canada. The lowest PCP enrolment rates were for nursing home residents over 82 years (27 percent), a group residing more than 43 kilometres from the PCP (31 percent), and another group living less than two weeks after their cancer diagnosis (37 percent). The highest rate (86 percent) was for the 2,118 persons who received palliative radiation. Findings from multiple logistic regression (MLR) were provided for comparison. CART findings identified low PCP enrolment subpopulations that were defined by interactions among demographic, social, medical, and health system predictors. PMID:21805944
NASA Astrophysics Data System (ADS)
Devineni, N.; Lall, U.; Cook, E.; Pederson, N.
2011-12-01
We present the application of a linear model in a Hierarchical Bayesian Regression (HBR) framework for reconstructing the summer seasonal averaged streamflow at five stations in the Delaware River Basin using eight newly developed regional tree ring chronologies. This technique directly provides estimates of the posterior probability distribution of each reconstructed streamflow value, considering model parameter uncertainty. The methodology also allows us to shrink the model parameters towards a common mean to incorporate the predictive ability of each tree chronology on multiple stations. We present the results from HBR analysis along with the results from traditional Point by Point Regression (PPR) analysis to demonstrate the benefits of developing the reconstructions under a Bayesian modeling framework. Further, we also present the comparative results of the model validation using various performance evaluation metrics such as reduction in error (RE) and coefficient of efficiency (CE). The reconstructed streamflow at various stations can be utilized to examine the frequency and recurrence attributes of extreme droughts in the region and their potential connections to known low frequency climate modes.
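The two validation metrics named in this abstract can be written out explicitly. The sketch below assumes the definitions commonly used for tree-ring reconstructions: both statistics compare the reconstruction's squared error over the validation period with that of a constant reference, which is the calibration-period mean for RE and the validation-period mean for CE. The function names and the toy streamflow values are hypothetical.

```python
def reduction_in_error(obs, pred, calibration_mean):
    """RE = 1 - SSE(model) / SSE(calibration-period mean as the prediction)."""
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ref = sum((o - calibration_mean) ** 2 for o in obs)
    return 1.0 - sse / ref


def coefficient_of_efficiency(obs, pred):
    """CE = 1 - SSE(model) / SSE(validation-period mean as the prediction)."""
    validation_mean = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ref = sum((o - validation_mean) ** 2 for o in obs)
    return 1.0 - sse / ref


# Hypothetical validation-period streamflow (observed and reconstructed)
# and a hypothetical calibration-period mean.
observed = [10.0, 12.0, 14.0, 16.0]
reconstructed = [11.0, 12.0, 13.0, 17.0]

re_score = reduction_in_error(observed, reconstructed, calibration_mean=10.0)
ce_score = coefficient_of_efficiency(observed, reconstructed)
```

Because the validation-period mean minimizes the reference sum of squares, CE can never exceed RE on the same data, which is why CE is usually treated as the stricter of the two; values above zero indicate skill relative to the constant reference.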
Warkentin, Karen M
2002-01-01
The physiological role of the embryonic external gills in anurans is equivocal. In some species, diffusion alone is clearly sufficient to supply oxygen throughout the embryonic period. In others, morphological elaboration and environmental regulation of the external gills suggest functional importance. Since oxygen stress is a common trigger of hatching, I examined the relationships among hatching timing, oxygen stress, and external gill loss. I worked with the red-eyed tree frog, Agalychnis callidryas, a species with arboreal eggs and aquatic tadpoles in which gill regression is associated with hatching, and hatching timing affects posthatching survival with aquatic predators. Both exposure to a hypoxic gas mixture and submergence in water, a natural context in which hypoxic stress can occur, induced early hatching. Exposure to hyperoxic gas mixtures induced regression of external gills, and subsequent exposure to air induced early hatching. Prostaglandin-induced external gill regression also induced hatching, and this effect was partially ameliorated by exposure to hyperoxic gas. Together, these results suggest that external gills enhance the oxygen uptake of embryos and are necessary to extend embryonic development past the onset of hatching competence. PMID:12024291
NASA Astrophysics Data System (ADS)
Kwon, Y.
2013-12-01
As evidence of global warming continues to increase, being able to predict forest response to climate changes, such as the expected rise in temperature and precipitation, will be vital for maintaining the sustainability and productivity of forests. Mapping forest species redistribution under climate change scenarios has been successful; however, most species redistribution maps lack the mechanistic understanding needed to explain why trees grow under the novel conditions of a changing climate. A distributional map is only capable of predicting under the equilibrium assumption that the communities would exist following a prolonged period under the new climate. In this context, forest NPP as a surrogate for growth rate, the most important facet that determines stand dynamics, can lead to valid predictions of the transition stage to a new vegetation-climate equilibrium, as it represents changes in forest structure reflecting site conditions and climate factors. The objective of this study is to develop a forest growth map using regression tree analysis by extracting large-scale non-linear structures from both field-based FIA and remotely sensed MODIS data sets. The major issue addressed in this approach is the non-linear spatial patterns of forest attributes. Forest inventory data showed complex spatial patterns that reflect environmental states and processes that originate at different spatial scales. At broad scales, non-linear spatial trends in forest attributes and a mixture of continuous and discrete types of environmental variables make traditional statistical (multivariate regression) and geostatistical (kriging) models inefficient. This calls into question some traditional underlying assumptions of spatial trends that are uncritically accepted in forest data. To resolve the controversy surrounding the suitability of forest data, regression tree analysis is performed using the software See5 and Cubist. Four publicly available data sets were obtained: First, field-based Forest Inventory and Analysis (USDA
Comparison of universal kriging and regression tree modelling for soil property mapping
NASA Astrophysics Data System (ADS)
Kempen, Bas
2013-04-01
Geostatistical modelling approaches have dominated the field of digital soil mapping (DSM) since its inception in the early 1980s. In recent years, however, machine learning methods such as classification and regression trees, random forests, and neural networks have quickly gained popularity among researchers in the DSM community. The increased use of these methods has largely come at the expense of geostatistical approaches. Despite the apparent shift in the application of DSM methods from geostatistics to machine learning, quantitative comparisons of the prediction performance of these methods are largely lacking. The aims of this research, therefore, are: i) to map two soil properties (topsoil organic matter content and thickness of the peat layer in the soil profile) using regression tree (RT) modelling and universal kriging (UK), and ii) to compare the prediction performance of these methods with independent data obtained by probability sampling. Using such data for validation not only yields statistically valid and unbiased estimates of the map accuracy, but also allows a statistical comparison of the accuracies of the maps generated by the two methods. The topsoil organic matter content and the thickness of the peat layer were mapped for a 14,000 ha area in the province of Drenthe, The Netherlands. The calibration dataset contained soil property observations at 1,715 sites. The covariates used include layers derived from soil and paleogeography maps, land cover, relative elevation, drainage class, land reclamation period, elevation change, and historic land use. The validation dataset contained 125 observations selected by stratified simple random sampling of the study area. The root mean squared error (RMSE) of the soil organic matter map obtained by RT modelling was 0.603 log(%), that of the map obtained by UK 0.595 log(%). The difference in map accuracy was not significant (p = 0.377). The RMSE of the peat thickness map obtained by RT
Cohen, Ira L; Liu, Xudong; Hudson, Melissa; Gillis, Jennifer; Cavalari, Rachel N S; Romanczyk, Raymond G; Karmel, Bernard Z; Gardner, Judith M
2016-09-01
In order to improve discrimination accuracy between Autism Spectrum Disorder (ASD) and similar neurodevelopmental disorders, a data mining procedure, Classification and Regression Trees (CART), was used on a large multi-site sample of PDD Behavior Inventory (PDDBI) forms on children with and without ASD. Discrimination accuracy exceeded 80 %, generalized to an independent validation set and across age groups and sites, and agreed well with ADOS classifications. Parent PDDBIs yielded better results than teacher PDDBIs but, when CART predictions agreed across informants, sensitivity increased. Results also revealed three subtypes of ASD (minimally verbal, verbal, and atypical) and two relatively common subtypes of non-ASD children (social pragmatic problems and good social skills). These subgroups corresponded to differences in behavior profiles and associated bio-medical findings. PMID:27318809
Prediction of Wind Speeds Based on Digital Elevation Models Using Boosted Regression Trees
NASA Astrophysics Data System (ADS)
Fischer, P.; Etienne, C.; Tian, J.; Krauß, T.
2015-12-01
In this paper a new approach is presented to predict maximum wind speeds using Gradient Boosted Regression Trees (GBRT). GBRT are a non-parametric regression technique used in various applications, suitable for making predictions without in-depth a priori knowledge of the functional dependencies between the predictors and the response variables. Our aim is to predict maximum wind speeds based on predictors that are derived from a digital elevation model (DEM). The predictors describe the orography of the Area-of-Interest (AoI) by various means, such as first and second order derivatives of the DEM, but also by more sophisticated classifications describing the exposure and sheltering of the terrain with respect to wind flux. In order to take into account the different scales that probably influence the streams and turbulence of wind flow over complex terrain, the predictors are computed at different spatial resolutions ranging from 30 m up to 2000 m. The geographic area used for examining the approach is Switzerland, a mountainous region in the heart of Europe, dominated by the Alps but also covering large valleys. The full workflow is described in this paper; it consists of data preparation using image processing techniques, model training using a state-of-the-art machine learning algorithm, in-depth analysis of the trained model, validation of the model, and application of the model to generate a wind speed map.
Siddique, Ilyas; Vieira, Ima Célia Guimarães; Schmidt, Susanne; Lamb, David; Carvalho, Cláudio José Reis; Figueiredo, Ricardo de Oliveira; Blomberg, Simon; Davidson, Eric A
2010-07-01
Nutrient enrichment is increasingly affecting many tropical ecosystems, but there is no information on how this affects tree biodiversity. To examine dynamics in vegetation structure and tree species biomass and diversity, we annually remeasured tree species before and for six years after repeated additions of nitrogen (N) and phosphorus (P) in permanent plots of abandoned pasture in Amazonia. Nitrogen and, to a lesser extent, phosphorus addition shifted growth among woody species. Nitrogen stimulated growth of two common pioneer tree species and one common tree species adaptable to both high- and low-light environments, while P stimulated growth only of the dominant pioneer tree Rollinia exsucca (Annonaceae). Overall, N or P addition reduced tree assemblage evenness and delayed tree species accrual over time, likely due to competitive monopolization of other resources by the few tree species responding to nutrient enrichment with enhanced establishment and/or growth rates. Absolute tree growth rates were elevated for two years after nutrient addition. However, nutrient-induced shifts in relative tree species growth and reduced assemblage evenness persisted for more than three years after nutrient addition, favoring two nutrient-responsive pioneers and one early-secondary tree species. Surprisingly, N + P effects on tree biomass and species diversity were consistently weaker than N-only and P-only effects, because grass biomass increased dramatically in response to N + P addition. The resulting intensified competition probably prevented an expected positive N + P synergy in the tree assemblage. Thus, N or P enrichment may favor unknown tree functional response types, reduce the diversity of coexisting species, and delay species accrual during structurally and functionally complex tropical rainforest secondary succession. PMID:20715634
ERIC Educational Resources Information Center
Brabant, Marie-Eve; Hebert, Martine; Chagnon, Francois
2013-01-01
This study explored the clinical profiles of 77 female teenager survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideations. Analyses revealed that 64% of participants experienced suicidal ideations. Findings from classification and regression tree analysis indicated that depression,…
ERIC Educational Resources Information Center
Thomas, Emily H.; Galambos, Nora
To investigate how students' characteristics and experiences affect satisfaction, this study used regression and decision-tree analysis with the CHAID algorithm to analyze student opinion data from a sample of 1,783 college students. A data-mining approach identifies the specific aspects of students' university experience that most influence three…
Podio, Natalia S; López-Froilán, Rebeca; Ramirez-Moreno, Esther; Bertrand, Lidwina; Baroni, María V; Pérez-Rodríguez, María L; Sánchez-Mata, María-Cortes; Wunderlin, Daniel A
2015-11-01
The aim of this study was to evaluate changes in polyphenol profile and antioxidant capacity of five soluble coffees throughout a simulated gastro-intestinal digestion, including absorption through a dialysis membrane. Our results demonstrate that both polyphenol content and antioxidant capacity were characteristic for each type of studied coffee, showing a drop after dialysis. Twenty-seven compounds were identified in coffee by HPLC-MS, while only 14 of them were found after dialysis. Green+roasted coffee blend and chicory+coffee blend showed the highest and lowest content of polyphenols and antioxidant capacity before in vitro digestion and after dialysis, respectively. Canonical correlation analysis showed significant correlation between the antioxidant capacity and the polyphenol profile before digestion and after dialysis. Furthermore, boosted regression trees analysis (BRT) showed that only four polyphenol compounds (5-p-coumaroylquinic acid, quinic acid, coumaroyl tryptophan conjugated, and 5-O-caffeoylquinic acid) appear to be the most relevant to explain the antioxidant capacity after dialysis, these compounds being the most bioaccessible after dialysis. To our knowledge, this is the first report matching the antioxidant capacity of foods with the polyphenol profile by BRT, which opens an interesting method of analysis for future reports on the antioxidant capacity of foods. PMID:26457815
Arino, Masumi; Ito, Ataru; Fujiki, Shozo; Sugiyama, Seiichi; Hayashi, Mikako
2016-01-01
Dental caries is an important public health problem worldwide. This study aims to prove how preventive therapies reduce the onset of caries in adult patients, and to identify patients with high or low risk of caries by using Classification and Regression Trees based survival analysis (survival CART). A clinical data set of 732 patients aged 20 to 64 years in nine Japanese general practices was analyzed with the following parameters: age, DMFT, number of mutans streptococci (SM) and Lactobacilli (LB), secretion rate and buffer capacity of saliva, and compliance with a preventive program. Results showed the incidence of primary carious lesion was affected by SM, LB and compliance with a preventive program; secondary carious lesion was affected by DMFT, SM and LB. Survival CART identified high-risk patients for primary carious lesion according to their poor compliance with a preventive program and SM (≥10^6 CFU/ml) with a hazard ratio of 3.66 (p = 0.0002). In the case of secondary caries, patients with LB (≥10^5 CFU/ml) and DMFT (>15) were identified as high risk with a hazard ratio of 3.50 (p < 0.0001). We conclude that preventive programs can be effective in limiting the incidence of primary carious lesion. PMID:27381750
Prediction of cadmium enrichment in reclaimed coastal soils by classification and regression tree
NASA Astrophysics Data System (ADS)
Ru, Feng; Yin, Aijing; Jin, Jiaxin; Zhang, Xiuying; Yang, Xiaohui; Zhang, Ming; Gao, Chao
2016-08-01
Reclamation of coastal land is one of the most common ways to obtain land resources in China. However, it has long been acknowledged that the artificial interference with coastal land has disadvantageous effects, such as heavy metal contamination. This study aimed to develop a prediction model for cadmium enrichment levels and assess the importance of affecting factors in typical reclaimed land in Eastern China (DFCL: Dafeng Coastal Land). Two hundred and twenty seven surficial soil/sediment samples were collected and analyzed to identify the enrichment levels of cadmium and the possible affecting factors in soils and sediments. The classification and regression tree (CART) model was applied in this study to predict cadmium enrichment levels. The prediction results showed that cadmium enrichment levels assessed by the CART model had an accuracy of 78.0%. The CART model could extract more information on factors affecting the environmental behavior of cadmium than correlation analysis. The integration of correlation analysis and the CART model showed that fertilizer application and organic carbon accumulation were the most important factors affecting soil/sediment cadmium enrichment levels, followed by particle size effects (Al2O3, TFe2O3 and SiO2), contents of Cl and S, surrounding construction areas and reclamation history.
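The core step of a classification tree such as CART is choosing the threshold on a predictor that makes the resulting child nodes as pure as possible. The sketch below shows that step with the Gini impurity for one predictor. It is a minimal sketch under stated assumptions: the organic carbon values, the class labels, and the function names are invented for illustration and are not taken from the study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best binary split on one predictor.

    The threshold is None when no split improves on the unsplit node.
    """
    n = len(values)
    best_threshold, best_score = None, gini(labels)
    for threshold in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical toy data: organic carbon (%) and a cadmium enrichment class.
organic_carbon = [0.5, 0.8, 1.1, 1.6, 2.0, 2.4]
enrichment = ["low", "low", "low", "high", "high", "high"]
threshold, impurity = best_split(organic_carbon, enrichment)
```

A full tree repeats this search over every predictor and recurses into each child node until a stopping rule is met; the variable chosen near the root is one informal indication of its importance.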
Improving Automatic English Writing Assessment Using Regression Trees and Error-Weighting
NASA Astrophysics Data System (ADS)
Lee, Kong-Joo; Kim, Jee-Eun
The proposed automated scoring system for English writing tests provides an assessment result, including a score and diagnostic feedback, to test-takers without human effort. The system analyzes an input sentence and detects errors related to spelling, syntax and content similarity. The scoring model adopts a statistical approach, a regression tree. A scoring model in general calculates a score based on the count and the types of automatically detected errors. Accordingly, a system with higher accuracy in detecting errors raises the accuracy in scoring a test. The accuracy of the system, however, cannot be fully guaranteed for several reasons, such as parsing failure, incompleteness of knowledge bases, and the ambiguous nature of natural language. In this paper, we introduce an error-weighting technique, which is similar to the term-weighting widely used in information retrieval. The error-weighting technique is applied to judge the reliability of the errors detected by the system. The score calculated with the technique is shown to be more accurate than the score without it.
NASA Astrophysics Data System (ADS)
Bachmair, Sophie; Stahl, Kerstin; Blauhut, Veit; Kohn, Irene
2014-05-01
impact occurrence. The applied data visualization and regression tree approach proved to be a valuable methodology for exploring the link between indicators and impacts. Nevertheless, the results are influenced by the uncertainty of identifying and quantifying drought impacts and vulnerability factors at a suitable spatial and temporal scale. This calls for more research on methodological issues of drought impact and vulnerability assessment, as well as for further developing impact inventories and exploiting the link between drought indicators and impacts.
Kitsantas, Panagiota
2009-01-01
Objective: The purpose of this study was to investigate the structural and organizational factors that contribute to the availability and increased capacity for substance abuse treatment programs in correctional settings. We used Classification and Regression Tree statistical procedures to identify how multi-level data can explain the variability in availability and capacity of substance abuse treatment programs in jails and probation/parole offices. Methods: The data for this study combined the National Criminal Justice Treatment Practices survey (NCJTP) and the 2000 Census. The NCJTP survey was a nationally representative sample of correctional administrators for jails and probation/parole agencies. The sample size included 295 substance abuse treatment programs that were classified according to the intensity of their services: high, medium, and low. The independent variables included jurisdictional-level structural variables, attributes of the correctional administrators, and program and service delivery characteristics of the correctional agency. Results: The two most important variables in predicting the availability of all three types of services were stronger working relationships with other organizations and the adoption of a standardized substance abuse screening tool by correctional agencies. For high and medium intensive programs, the capacity increased when an organizational learning strategy was used by administrators and the organization used a substance abuse screening tool. Implications on advancing treatment practices in correctional settings are discussed, including further work to test theories on how to better understand access to intensive treatment services. This study presents the first phase of understanding capacity-related issues regarding treatment programs offered in correctional settings. PMID:19395204
Rovlias, Aristedis; Theodoropoulos, Spyridon; Papoutsakis, Dimitrios
2015-01-01
Background: Chronic subdural hematoma (CSDH) is one of the most common clinical entities in daily neurosurgical practice and carries a most favorable prognosis. However, because of the advanced age and medical problems of patients, surgical therapy is frequently associated with various complications. This study evaluated the clinical features, radiological findings, and neurological outcome in a large series of patients with CSDH. Methods: A classification and regression tree (CART) technique was employed in the analysis of data from 986 patients who were operated on at Asclepeion General Hospital of Athens from January 1986 to December 2011. Burr hole evacuation with closed system drainage has been the operative technique of first choice at our institution for 29 consecutive years. A total of 27 prognostic factors were examined to predict the outcome at 3 months postoperatively. Results: Our results indicated that neurological status on admission was the best predictor of outcome. With regard to the other data, age, brain atrophy, thickness and density of hematoma, subdural accumulation of air, and antiplatelet and anticoagulant therapy were found to correlate significantly with prognosis. The overall cross-validated predictive accuracy of the CART model was 85.34%, with a cross-validated relative error of 0.326. Conclusions: Methodologically, the CART technique is quite different from the more commonly used methods, with the primary benefit of illustrating the important prognostic variables as related to outcome. Since the ideal therapy for the treatment of CSDH is still under debate, this technique may prove useful in developing new therapeutic strategies and approaches for patients with CSDH. PMID:26257985
NASA Astrophysics Data System (ADS)
Kaskhedikar, Apoorva Prakash
According to the U.S. Energy Information Administration, commercial buildings represent about 40% of the United States' energy consumption, of which office buildings consume a major portion. Gauging the extent to which an individual building consumes energy in excess of its peers is the first step in initiating energy efficiency improvement. Energy benchmarking offers an initial building energy performance assessment without rigorous evaluation. Energy benchmarking tools based on the Commercial Buildings Energy Consumption Survey (CBECS) database are investigated in this thesis. This study proposes a new benchmarking methodology based on decision trees, where a relationship between the energy use intensities (EUI) and building parameters (continuous and categorical) is developed for different building types. This methodology was applied to medium office and school building types contained in the CBECS database. The Random Forest technique was used to find the most influential parameters that impact building energy use intensities. Subsequently, significant correlations were identified between EUIs and CBECS variables. Other than floor area, some of the important variables were number of workers, location, number of PCs and main cooling equipment. The coefficient of variation was used to evaluate the effectiveness of the new model. The customization technique proposed in this thesis was compared with another benchmarking model that is widely used by building owners and designers, namely ENERGY STAR's Portfolio Manager. This tool relies on standard linear regression methods, which are only able to handle continuous variables. The proposed model uses data mining techniques and was found to perform slightly better than the Portfolio Manager. The broader impact of the new benchmarking methodology proposed is that it allows for identifying important categorical variables, and then incorporating them in a local, as against a global, model framework for EUI
Cheong, Yoon Ling; Leitão, Pedro J; Lakes, Tobia
2014-07-01
The transmission of dengue disease is influenced by complex interactions among vector, host and virus. Land use such as water bodies or certain agricultural practices have been identified as likely risk factors for dengue because of the provision of suitable habitats for the vector. Many studies have focused on the land use factors of dengue vector abundance in small areas but have not yet studied the relationship between land use factors and dengue cases for large regions. This study aims to clarify if land use factors other than human settlements, e.g. different types of agricultural land use, water bodies and forest are associated with reported dengue cases from 2008 to 2010 in the state of Selangor, Malaysia. From the correlative relationship, we aim to generate a prediction risk map. We used Boosted Regression Trees (BRT) to account for nonlinearities and interactions between the factors with high predictive accuracies. Our model with a cross-validated performance score (Area Under the Receiver Operator Characteristic Curve, ROC AUC) of 0.81 showed that the most important land use factors are human settlements (model importance of 39.2%), followed by water bodies (16.1%), mixed horticulture (8.7%), open land (7.5%) and neglected grassland (6.7%). A risk map after 100 model runs with a cross-validated ROC AUC mean of 0.81 (±0.001 s.d.) is presented. Our findings may be an important asset for improving surveillance and control interventions for dengue. PMID:25113593
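The ROC AUC used as the performance score above can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison. It is a minimal sketch for illustration: the scores and labels are invented, and the function name is an assumption rather than anything from the study.

```python
def roc_auc(scores, labels):
    """Share of positive/negative pairs ranked correctly (ties count as half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: predicted risk and whether dengue cases were reported (1) or not (0).
predicted_risk = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
cases_reported = [1, 1, 1, 0, 0, 0]
auc = roc_auc(predicted_risk, cases_reported)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a small illustration; rank-based formulas give the same value more efficiently on large data sets.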
NASA Astrophysics Data System (ADS)
Styborski, Jeremy A.
This project was started in the interest of supplementing existing data on additives to composite solid propellants. The study on the addition of iron and aluminum nanoparticles to composite AP/HTPB propellants was conducted at the Combustion and Energy Systems Laboratory at RPI in the new strand-burner experiment setup. For this study, a large literature review was conducted on the history of solid propellant combustion modeling and the empirical results of tests on binders, plasticizers, AP particle size, and additives. The study focused on the addition of nano-scale aluminum and iron in small concentrations to AP/HTPB solid propellants with an average AP particle size of 200 microns. Replacing 1% of the propellant's AP with 40-60 nm aluminum particles produced no change in combustive behavior. The addition of 1% 60-80 nm iron particles produced a significant increase in burn rate, although the increase was smaller at higher pressures. These results are summarized in Table 2. The increase in the burn rate at all pressures due to the addition of iron nanoparticles warranted further study on the effect of concentration of iron. Tests conducted at 10 atm showed that the mean regression rate varied with iron concentration, peaking at 1% and 3%. Regardless of the iron concentration, the regression rate was higher than the baseline AP/HTPB propellants. These results are summarized in Table 3.
NASA Astrophysics Data System (ADS)
Shabani, Farzin; Kumar, Lalit; Solhjouy-fard, Samaneh
2016-05-01
The aim of this study was to compare and evaluate the capabilities of correlative and mechanistic modeling processes, applied to the projection of future distributions of date palm in novel environments, and to establish a method of minimizing uncertainty in the projections of differing techniques. The location of this study on a global scale is in Middle Eastern countries. We compared the mechanistic model CLIMEX (CL) with the correlative models MaxEnt (MX), Boosted Regression Trees (BRT), and Random Forests (RF) to project current and future distributions of date palm (Phoenix dactylifera L.). The Global Climate Model (GCM) CSIRO-Mk3.0 (CS), using the A2 emissions scenario, was selected for making projections. Both indigenous and alien distribution data of the species were utilized in the modeling process. The common areas predicted by MX, BRT, RF, and CL from the CS GCM were extracted and compared to ascertain projection uncertainty levels of each individual technique. The common areas identified by all four modeling techniques were used to produce a map indicating suitable and unsuitable areas for date palm cultivation for Middle Eastern countries, for the present and the year 2100. The four different modeling approaches predict fairly different distributions. Projections from CL were more conservative than from MX. The BRT and RF were the most conservative methods in terms of projections for the current time. The combination of the final CL and MX projections for the present and 2100 provides higher certainty concerning those areas that will become highly suitable for future date palm cultivation. According to the four models, cold, hot, and wet stress, with differences on a regional basis, appear to be the major restrictions on future date palm distribution. The results demonstrate variances in the projections, resulting from different techniques. The assessment and interpretation of model projections requires reservations
NASA Astrophysics Data System (ADS)
Tomczyk, Aleksandra; Ewertowski, Marek; White, Piran; Kasprzak, Leszek
2016-04-01
The dual role of many Protected Natural Areas in providing benefits for both conservation and recreation poses challenges for management. Although recreation-based damage to ecosystems can occur very quickly, restoration can take many years. The protection of conservation interests at the same time as providing for recreation requires decisions to be made about how to prioritise and direct management actions. Trails are commonly used to divert visitors from the most important areas of a site, but high visitor pressure can lead to increases in trail width and a concomitant increase in soil erosion. Here we use detailed field data on the condition of recreational trails in Gorce National Park, Poland, as the basis for a regression tree analysis to determine the factors influencing trail deterioration, and link specific trail impacts with environmental, use-related and managerial factors. We distinguished 12 types of trails, characterised by four levels of degradation: (1) trails with an acceptable level of degradation; (2) threatened trails; (3) damaged trails; and (4) heavily damaged trails. Damaged trails were the most vulnerable of all trails and should be prioritised for appropriate conservation and restoration. We also proposed five types of monitoring of recreational trail conditions: (1) rapid inventory of negative impacts; (2) monitoring visitor numbers and variation in type of use; (3) change-oriented monitoring focusing on sections of trail which were subjected to changes in type or level of use or subjected to extreme weather events; (4) monitoring of dynamics of trail conditions; and (5) full assessment of trail conditions, to be carried out every 10-15 years. The application of the proposed framework can enhance the ability of Park managers to prioritise their trail management activities, enhancing trail conditions and visitor safety, while minimising adverse impacts on the conservation value of the ecosystem. A.M.T. was supported by the Polish Ministry of
Xue, Yang; Yang, Zhongyang; Wang, Xiaoyan; Lin, Zhipan; Li, Dunxi; Su, Shaofeng
2016-01-01
Casuarina equisetifolia is commonly planted and used in the construction of coastal shelterbelt protection in Hainan Island. Thus, it is critical to accurately estimate the tree biomass of Casuarina equisetifolia L. for forest managers to evaluate the biomass stock in Hainan. The data for this work consisted of 72 trees, which were divided into three age groups: young forest, middle-aged forest, and mature forest. The proportion of biomass from the trunk significantly increased with age (P<0.05). However, the biomass of the branch and leaf decreased, and the biomass of the root did not change. To test whether the crown radius (CR) can improve biomass estimates of C. equisetifolia, we introduced CR into the biomass models. Here, six models were used to estimate the biomass of each component, including the trunk, the branch, the leaf, and the root. In each group, we selected one model among these six models for each component. The results showed that including the CR greatly improved the model performance and reduced the error, especially for the young and mature forests. In addition, to ensure biomass additivity, the selected equation for each component was fitted as a system of equations using seemingly unrelated regression (SUR). The SUR method not only gave efficient and accurate estimates but also achieved the logical additivity. The results in this study provide a robust estimation of tree biomass components and total biomass over three groups of C. equisetifolia. PMID:27002822
NASA Astrophysics Data System (ADS)
Kisi, Ozgur
2015-09-01
Pan evaporation (Ep) modeling is an important issue in reservoir management, regional water resources planning and evaluation of drinking-water supplies. The main purpose of this study is to investigate the accuracy of least square support vector machine (LSSVM), multivariate adaptive regression splines (MARS) and M5 Model Tree (M5Tree) in modeling Ep. The first part of the study focused on testing the ability of the LSSVM, MARS and M5Tree models in estimating the Ep data of Mersin and Antalya stations located in Mediterranean Region of Turkey by using cross-validation method. The LSSVM models outperformed the MARS and M5Tree models in estimating Ep of Mersin and Antalya stations with local input and output data. The average root mean square error (RMSE) of the M5Tree and MARS models was decreased by 24-32.1% and 10.8-18.9% using LSSVM models for the Mersin and Antalya stations, respectively. The ability of three different methods was examined in estimation of Ep using input air temperature, solar radiation, relative humidity and wind speed data from nearby station in the second part of the study (cross-station application without local input data). The results showed that the MARS models provided better accuracy than the LSSVM and M5Tree models with respect to RMSE, mean absolute error (MAE) and determination coefficient (R2) criteria. The average RMSE accuracy of the LSSVM and M5Tree was increased by 3.7% and 16.5% using MARS. In the case of without local input data, the average RMSE accuracy of the LSSVM and M5Tree was respectively increased by 11.4% and 18.4% using MARS. In the third part of the study, the ability of the applied models was examined in Ep estimation using input and output data of nearby station. The results reported that the MARS models performed better than the other models with respect to RMSE, MAE and R2 criteria. The average RMSE of the LSSVM and M5Tree was respectively decreased by 54% and 3.4% using MARS. The overall results indicated that
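The comparison criteria named in this abstract (RMSE, MAE and the determination coefficient) and the percentage reductions it reports can be computed as in the sketch below. It is a minimal sketch under stated assumptions: the observed and predicted values are invented, and R² is taken as 1 - SS_res/SS_tot, which is one common definition; the abstract does not say which definition the author used.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean square error."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Determination coefficient as 1 - SS_res / SS_tot (an assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def percent_reduction(reference_error, improved_error):
    """Relative decrease of an error measure, in percent of the reference value."""
    return 100.0 * (reference_error - improved_error) / reference_error


# Hypothetical toy data: observed and modelled pan evaporation (mm/day).
observed_ep = [2.0, 4.0, 6.0, 8.0]
modelled_ep = [3.0, 4.0, 5.0, 8.0]
scores = (
    rmse(observed_ep, modelled_ep),
    mae(observed_ep, modelled_ep),
    r_squared(observed_ep, modelled_ep),
)
```

Reporting an error reduction relative to a reference model, as `percent_reduction` does, makes results comparable across stations whose evaporation levels differ.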
Hernandez, J E; Epstein, L D; Rodriguez, M H; Rodriguez, A D; Rejmankova, E; Roberts, D R
1997-03-01
We propose the use of generalized tree models (GTMs) to analyze data from entomological field studies. Generalized tree models can be used to characterize environments with different mosquito breeding capacity. A GTM simultaneously analyzes a set of predictor variables (e.g., vegetation coverage) in relation to a response variable (e.g., counts of Anopheles albimanus larvae), and how it varies with respect to a set of criterion variables (e.g., presence of predators). The algorithm produces a treelike graphical display with its root at the top and 2 branches stemming down from each node. At each node, conditions on the value of predictors partition the observations into subgroups (environments) in which the relation between response and criterion variables is most homogeneous. PMID:9152872
ERIC Educational Resources Information Center
Schumacher, Phyllis; Olinsky, Alan; Quinn, John; Smith, Richard
2010-01-01
The authors extended previous research by 2 of the authors who conducted a study designed to predict the successful completion of students enrolled in an actuarial program. They used logistic regression to determine the probability of an actuarial student graduating in the major or dropping out. They compared the results of this study with those…
A comparison of three additive tree algorithms that rely on a least-squares loss criterion.
Smith, T J
1998-11-01
The performances of three additive tree algorithms which seek to minimize a least-squares loss criterion were compared. The algorithms included the penalty-function approach of De Soete (1983), the iterative projection strategy of Hubert & Arabie (1995) and the two-stage ADDTREE algorithm (Corter, 1982; Sattath & Tversky, 1977). Model fit, comparability of structure, processing time and metric recovery were assessed. Results indicated that the iterative projection strategy consistently located the best-fitting tree, but also displayed a wider range and larger number of local optima. PMID:9854946
Qiu, Lefeng; Wang, Kai; Long, Wenli; Wang, Ke; Hu, Wei; Amable, Gabriel S.
2016-01-01
Soil cadmium (Cd) contamination has attracted a great deal of attention because of its detrimental effects on animals and humans. This study aimed to develop and compare the performances of stepwise linear regression (SLR), classification and regression tree (CART) and random forest (RF) models in the prediction and mapping of the spatial distribution of soil Cd and to identify likely sources of Cd accumulation in Fuyang County, eastern China. Soil Cd data from 276 topsoil (0–20 cm) samples were collected and randomly divided into calibration (222 samples) and validation datasets (54 samples). Auxiliary data, including detailed land use information, soil organic matter, soil pH, and topographic data, were incorporated into the models to simulate the soil Cd concentrations and further identify the main factors influencing soil Cd variation. The predictive models for soil Cd concentration exhibited acceptable overall accuracies (72.22% for SLR, 70.37% for CART, and 75.93% for RF). The SLR model exhibited the largest predicted deviation, with a mean error (ME) of 0.074 mg/kg, a mean absolute error (MAE) of 0.160 mg/kg, and a root mean squared error (RMSE) of 0.274 mg/kg, and the RF model produced the results closest to the observed values, with an ME of 0.002 mg/kg, an MAE of 0.132 mg/kg, and an RMSE of 0.198 mg/kg. The RF model also exhibited the greatest R2 value (0.772). The CART model predictions closely followed, with ME, MAE, RMSE, and R2 values of 0.013 mg/kg, 0.154 mg/kg, 0.230 mg/kg and 0.644, respectively. The three prediction maps generally exhibited similar and realistic spatial patterns of soil Cd contamination. The heavily Cd-affected areas were primarily located in the alluvial valley plain of the Fuchun River and its tributaries because of the dramatic industrialization and urbanization processes that have occurred there. The most important variable for explaining high levels of soil Cd accumulation was the presence of metal smelting industries. The
Brabant, Marie-Eve; Hébert, Martine; Chagnon, François
2013-01-01
This study explored the clinical profiles of 77 female teenager survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideations. Analyses revealed that 64% of participants experienced suicidal ideations. Findings from classification and regression tree analysis indicated that depression, posttraumatic stress symptoms, and hopelessness discriminated profiles of suicidal and nonsuicidal survivors. The elevated prevalence of suicidal ideations among adolescent survivors of sexual abuse underscores the importance of investigating the presence of suicidal ideations in sexual abuse survivors. However, suicidal ideation is not the sole variable that needs to be investigated; depression, hopelessness and posttraumatic stress symptoms are also related to suicidal ideations in survivors and could therefore guide interventions. PMID:23428149
Duarte, Elisa; de Sousa, Bruno; Cadarso-Suarez, Carmen; Rodrigues, Vitor; Kneib, Thomas
2014-05-01
Breast cancer risk is believed to be associated with several reproductive factors, such as early menarche and late menopause. This study is based on the registries of the first time a woman enters the screening program, and presents a spatio-temporal analysis of the variables age of menarche and age of menopause along with other reproductive and socioeconomic factors. The database was provided by the Portuguese Cancer League (LPCC), a private nonprofit organization dealing with multiple issues related to oncology of which the Breast Cancer Screening Program is one of its main activities. The registry consists of 259,652 records of women who entered the screening program for the first time between 1990 and 2007 (45-69-year age group). Structured Additive Regression (STAR) models were used to explore spatial and temporal correlations with a wide range of covariates. These models are flexible enough to deal with a variety of complex datasets, allowing us to reveal possible relationships among the variables considered in this study. The analysis shows that early menarche occurs in younger women and in municipalities located in the interior of central Portugal. Women living in inland municipalities register later ages for menopause, and those born in central Portugal after 1933 show a decreasing trend in the age of menopause. Younger ages of menarche and late menopause are observed in municipalities with a higher purchasing power index. The analysis performed in this study portrays the time evolution of the age of menarche and age of menopause and their spatial characterization, adding to the identification of factors that could be of the utmost importance in future breast cancer incidence research. PMID:24615881
NASA Astrophysics Data System (ADS)
Salonen, J. Sakari; Luoto, Miska; Alenius, Teija; Heikkilä, Maija; Seppä, Heikki; Telford, Richard J.; Birks, H. John B.
2014-03-01
We test and analyse a new calibration method, boosted regression trees (BRTs) in palaeoclimatic reconstructions based on fossil pollen assemblages. We apply BRTs to multiple Holocene and Lateglacial pollen sequences from northern Europe, and compare their performance with two commonly-used calibration methods: weighted averaging regression (WA) and the modern-analogue technique (MAT). Using these calibration methods and fossil pollen data, we present synthetic reconstructions of Holocene summer temperature, winter temperature, and water balance changes in northern Europe. Highly consistent trends are found for summer temperature, with a distinct Holocene thermal maximum at ca 8000-4000 cal. a BP, with a mean Tjja anomaly of ca +0.7 °C at 6 ka compared to 0.5 ka. We were unable to reconstruct reliably winter temperature or water balance, due to the confounding effects of summer temperature and the great between-reconstruction variability. We find BRTs to be a promising tool for quantitative reconstructions from palaeoenvironmental proxy data. BRTs show good performance in cross-validations compared with WA and MAT, can model a variety of taxon response types, find relevant predictors and incorporate interactions between predictors, and show some robustness with non-analogue fossil assemblages.
NASA Astrophysics Data System (ADS)
Rao, M.; George, L. A.
2012-12-01
Nitrogen dioxide (NO2), an atmospheric pollutant generated primarily by anthropogenic combustion processes, is typically found at higher concentrations in urban areas compared to non-urbanized environments. Elevated NO2 levels have multiple ecosystem effects at different spatial scales. At the local scale, elevated levels affect human health directly and through the formation of secondary pollutants such as ozone and aerosols; at the regional scale secondary pollutants such as nitric acid and organic nitrates have deleterious effects on non-urbanized areas; and, at the global scale, nitrogen oxide emissions significantly alter the natural biogeochemical nitrogen cycle. As cities globally become larger and larger sources of nitrogen oxide emissions, it is important to assess possible mitigation strategies to reduce the impact of emissions locally, regionally and globally. In this study, we build a national land-use regression (LUR) model to compare the impacts of deciduous and evergreen trees on urban NO2 levels in the United States. We use the EPA monitoring network values of NO2 levels for 2006, the 2006 NLCD tree canopy data for deciduous and evergreen canopies, and the US Census Bureau's TIGER shapefiles for roads, railroads, impervious area, and population density as proxies for the NO2 sources on-road traffic, railroad traffic, off-road sources, and area sources, respectively. Our preliminary LUR model corroborates previous LUR studies showing that the presence of trees is associated with reduced urban NO2 levels. Additionally, our model indicates that deciduous and evergreen trees reduce NO2 to different extents, and that the amount of NO2 reduced varies seasonally. The model indicates that every square kilometer of deciduous canopy within a 2 km buffer is associated with a reduction in ambient NO2 levels of 0.64 ppb in summer and 0.46 ppb in winter. Similarly, every square kilometer of evergreen tree canopy within a 2 km buffer is associated with a reduction in ambient NO2 by
Huang, Wenjuan; Zhou, Guoyi; Liu, Juxiu; Zhang, Deqiang; Liu, Shizhong; Chu, Guowei; Fang, Xiong
2015-01-01
Mineral elements in plants have been strongly affected by increased atmospheric carbon dioxide (CO2) concentrations and nitrogen (N) deposition due to human activities. However, such understanding is largely limited to N and phosphorus in grassland. Using open-top chambers, we examined the concentrations of potassium (K), calcium (Ca), magnesium (Mg), aluminum (Al), copper (Cu) and manganese (Mn) in the leaves and roots of the seedlings of five subtropical tree species in response to elevated CO2 (ca. 700 μmol CO2 mol(-1)) and N addition (100 kg N ha(-1) yr(-1)) from 2005 to 2009. These mineral elements in the roots responded more strongly to elevated CO2 and N addition than those in the leaves. Elevated CO2 did not consistently decrease the concentrations of plant mineral elements, with increases in K, Al, Cu and Mn in some tree species. N addition decreased K and had no influence on Cu in the five tree species. Given the shifts in plant mineral elements, Schima superba and Castanopsis hystrix were less responsive to elevated CO2 and N addition alone, respectively. Our results indicate that plant stoichiometry would be altered by increasing CO2 and N deposition, and K would likely become a limiting nutrient under increasing N deposition in subtropics. PMID:25794046
ERIC Educational Resources Information Center
Cohen, Ayala; Nahum-Shani, Inbal; Doveh, Etti
2010-01-01
In their seminal paper, Edwards and Parry (1993) presented the polynomial regression as a better alternative to applying difference score in the study of congruence. Although this method is increasingly applied in congruence research, its complexity relative to other methods for assessing congruence (e.g., difference score methods) was one of the…
Nie, Z Q; Ou, Y Q; Zhuang, J; Qu, Y J; Mai, J Z; Chen, J M; Liu, X Q
2016-05-10
Conditional and unconditional logistic regression are commonly used in case-control studies, whereas the Cox proportional hazards model is often used in survival data analysis. Most of the literature refers only to the main-effect model; however, the generalized linear model differs from the general linear model, and interaction comprises multiplicative interaction and additive interaction. The former has only statistical significance, whereas the latter has biological significance. In this paper, macros were written in SAS 9.4 to calculate the contrast ratio, the attributable proportion due to interaction, and the synergy index alongside the interaction terms of logistic and Cox regression, and Wald, delta, and profile-likelihood confidence intervals were used to evaluate additive interaction, as a reference for big data analysis in clinical epidemiology and for the analysis of genetic multiplicative and additive interactions. PMID:27188374
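The additive-interaction measures named in this abstract can be illustrated with a short Python sketch. It uses the textbook definitions of the relative excess risk due to interaction (RERI), the attributable proportion (AP), and the synergy index (S). That the abstract's "contrast ratio" corresponds to RERI is an assumption, the function name `additive_interaction` is hypothetical, the input values are made up, and nothing here reproduces the authors' SAS macros or their confidence intervals.

```python
def additive_interaction(rr10, rr01, rr11):
    """Return (RERI, AP, S) from three relative risks.

    rr10: exposure A only, rr01: exposure B only, rr11: both exposures,
    each relative to the doubly unexposed group. Odds ratios or hazard
    ratios are often substituted as approximations (an assumption here).
    """
    reri = rr11 - rr10 - rr01 + 1.0          # relative excess risk due to interaction
    ap = reri / rr11                         # attributable proportion due to interaction
    denom = (rr10 - 1.0) + (rr01 - 1.0)
    s = (rr11 - 1.0) / denom if denom != 0 else float("nan")  # synergy index
    return reri, ap, s


# Illustrative (made-up) relative risks
reri, ap, s = additive_interaction(rr10=2.0, rr01=3.0, rr11=6.0)
print(reri, round(ap, 4), round(s, 4))
```

RERI = 0, AP = 0, and S = 1 correspond to no interaction on the additive scale; the made-up inputs above give values on the positive-interaction side.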
ERIC Educational Resources Information Center
Kitsantas, Anastasia; Kitsantas, Panagiota; Kitsantas, Thomas
2012-01-01
The purpose of this exploratory study was to assess the relative importance of a number of variables in predicting students' interest in math and/or computer science. Classification and regression trees (CART) were employed in the analysis of survey data collected from 276 college students enrolled in two U.S. and Greek universities. The…
NASA Astrophysics Data System (ADS)
Akram, S.; Ghadiri, H.; Yu, B.
2013-12-01
Grass buffer strips are widely used and known as effective management practices for controlling sediment and particulate nutrients. They change the hydrology and hydraulics of the flow by increasing the infiltration rate and decreasing the flow velocity. It is essential to consider the effects of major factors on performance of grass strips in order to predict their efficiency in removing sediment. An artificial neural network model with a 'two-layer feedforward backpropagation' structure and an ensemble of 'bootstrap aggregation' regression trees were developed using data gathered from 35 different studies in order to predict the efficiency of grass strips on removing sediment in different conditions. Slope, length of strips, size distribution of the inflow sediment, antecedent soil moisture, and density and stiffness of the grass strips were the major factors considered in developing the models. The two model predictions of the efficiency of grass strips in trapping sediment compared reasonably well with independent data sets, giving low root mean square errors and high coefficients of model efficiency. The sensitivity analysis showed that particle size distribution, length of strips, and the antecedent soil moisture are the most effective factors upon the performance of grass strips in removing sediment.
NASA Astrophysics Data System (ADS)
Yang, Tiantian; Gao, Xiaogang; Sorooshian, Soroosh; Li, Xin
2016-03-01
The controlled outflows from a reservoir or dam are highly dependent on the decisions made by the reservoir operators, instead of a natural hydrological process. Differences exist between the natural upstream inflows to reservoirs and the controlled outflows from reservoirs that supply the downstream users. With the decision maker's awareness of changing climate, reservoir management requires adaptable means to incorporate more information into decision making, such as water delivery requirement, environmental constraints, dry/wet conditions, etc. In this paper, a robust reservoir outflow simulation model is presented, which incorporates one of the well-developed data-mining models (Classification and Regression Tree) to predict the complicated human-controlled reservoir outflows and extract the reservoir operation patterns. A shuffled cross-validation approach is further implemented to improve CART's predictive performance. An application study of nine major reservoirs in California is carried out. Results produced by the enhanced CART, original CART, and random forest are compared with observation. The statistical measurements show that the enhanced CART and random forest outperform the CART control run in general, and the enhanced CART algorithm gives a better predictive performance than random forest in simulating the peak flows. The results also show that the proposed model is able to consistently and reasonably predict the expert release decisions. Experiments indicate that the release operation in the Oroville Lake is significantly dominated by SWP allocation amount and reservoirs with low elevation are more sensitive to inflow amount than others.
NASA Astrophysics Data System (ADS)
Grinn-Gofroń, Agnieszka; Strzelczak, Agnieszka
2009-11-01
A study was made of the link between time of day, weather variables and the hourly content of certain fungal spores in the atmosphere of the city of Szczecin, Poland, in 2004-2007. Sampling was carried out with a Lanzoni 7-day-recording spore trap. The spores analysed belonged to the taxa Alternaria and Cladosporium. These spores were selected both for their allergenic capacity and for their high level presence in the atmosphere, particularly during summer. Spearman correlation coefficients between spore concentrations, meteorological parameters and time of day showed different indices depending on the taxon being analysed. Relative humidity (RH), air temperature, air pressure and clouds most strongly and significantly influenced the concentration of Alternaria spores. Cladosporium spores correlated less strongly and significantly than Alternaria. Multivariate regression tree analysis revealed that, at air pressures lower than 1,011 hPa the concentration of Alternaria spores was low. Under higher air pressure spore concentrations were higher, particularly when RH was lower than 36.5%. In the case of Cladosporium, under higher air pressure (>1,008 hPa), the spores analysed were more abundant, particularly after 0330 hours. In artificial neural networks, RH, air pressure and air temperature were the most important variables in the model for Alternaria spore concentration. For Cladosporium, clouds, time of day, air pressure, wind speed and dew point temperature were highly significant factors influencing spore concentration. The maximum abundance of Cladosporium spores in air fell between 1200 and 1700 hours.
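The threshold rules reported in this abstract (for example, an air-pressure cut near 1,011 hPa) are the kind of output a regression tree produces. The sketch below shows how a single split can be chosen by minimising the within-group sum of squared errors. It is a univariate, single-response simplification of the multivariate regression tree used in the study, the function name `best_split` is hypothetical, and the pressure and spore values are invented for illustration only.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return the threshold on x that minimises the total SSE of y in the two groups."""
    pairs = sorted(zip(x, y))
    best_sse, best_threshold = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical predictor values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best_sse:
            best_sse = total
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_threshold


# Invented toy data: air pressure (hPa) and spore concentration
pressure = [1005, 1007, 1009, 1010, 1012, 1014, 1016, 1018]
spores = [10, 12, 11, 9, 50, 55, 52, 58]
threshold = best_split(pressure, spores)
print(threshold)  # 1011.0 for this toy data
```

A full tree applies the same search recursively within each resulting group and across all candidate predictors.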
Chao, Cheng-Min; Yu, Ya-Wen; Cheng, Bor-Wen; Kuo, Yao-Lung
2014-10-01
The aim of this paper is to use data mining technology to establish a classification of breast cancer survival patterns and to offer a treatment decision-making reference for the survival of women diagnosed with breast cancer in Taiwan. We studied patients with breast cancer in a specific hospital in Central Taiwan and obtained 1,340 data sets. We employed a support vector machine (SVM), logistic regression, and a C5.0 decision tree to construct classification models of breast cancer patients' survival, and used a 10-fold cross-validation approach to validate the models. The results show that the classification models yielded an average accuracy rate of more than 90%; the SVM provided the best method for constructing the three categories of the classification system for the survival mode. The experimental results show that the three methods achieved high accuracy in predicting the survival of women diagnosed with breast cancer and could be used as a reference when creating a medical decision-making framework. PMID:25119239
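The 10-fold cross-validation mentioned in this abstract can be sketched in plain Python as an index-splitting routine. The function name `k_fold_indices` and the fixed seed are hypothetical choices; only the sample count (1,340) and the number of folds come from the abstract, and the sketch does not reproduce the study's models or its stratification, if any.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]     # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


splits = list(k_fold_indices(1340, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each record is used for testing exactly once, and the reported accuracy would be the average over the ten held-out folds.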
NASA Astrophysics Data System (ADS)
Ibanez, C. A. G.; Carcellar, B. G., III; Paringit, E. C.; Argamosa, R. J. L.; Faelga, R. A. G.; Posilero, M. A. V.; Zaragosa, G. P.; Dimayacyac, N. A.
2016-06-01
Diameter-at-breast-height (DBH) estimation is a prerequisite in various allometric equations estimating important forestry indices like stem volume, basal area, biomass and carbon stock. LiDAR technology provides a means of directly obtaining different forest parameters, except DBH, from the behavior and characteristics of point clouds unique to different forest classes. An extensive tree inventory was done on a two-hectare established sample plot in Mt. Makiling, Laguna for a natural growth forest. Coordinates, height, and canopy cover were measured and species were identified to compare with LiDAR derivatives. Multiple linear regression was used to obtain LiDAR-derived DBH by integrating field-derived DBH and 27 LiDAR-derived parameters at 20 m, 10 m, and 5 m grid resolutions. To find the best combination of parameters for DBH estimation, all possible combinations of parameters were generated automatically using Python scripts with additional regression-related libraries such as NumPy, SciPy, and scikit-learn. The combination that yielded the highest r-squared (coefficient of determination) and the lowest AIC (Akaike's Information Criterion) and BIC (Bayesian Information Criterion) was determined to be the best equation. The best equation uses 11 parameters at the 10 m grid size, with an r-squared of 0.604, an AIC of 154.04, and a BIC of 175.08. The combination of parameters may differ among forest classes, which is left for further studies. Additional statistical tests, such as the Kaiser-Meyer-Olkin (KMO) coefficient and Bartlett's Test of Sphericity (BTS), can be used to help determine the correlation among parameters.
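The exhaustive search over parameter combinations described in this abstract can be sketched in dependency-free Python. Everything here is an illustrative assumption rather than the authors' scripts: the helper names `ols_rss` and `best_subset`, the toy data, the Gaussian forms AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln(n), and the choice to rank by AIC alone (the study also considered r-squared and BIC).

```python
import itertools
import math
import random


def ols_rss(X, y):
    """Residual sum of squares of a least-squares fit with intercept.

    Solves the normal equations by Gauss-Jordan elimination; assumes the
    predictors are not perfectly collinear.
    """
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def best_subset(X, y, names):
    """Fit every non-empty subset of predictors; return (AIC, BIC, names) with lowest AIC."""
    n = len(y)
    results = []
    for size in range(1, len(names) + 1):
        for cols in itertools.combinations(range(len(names)), size):
            rss = ols_rss([[row[c] for c in cols] for row in X], y)
            k = size + 1  # slopes plus intercept
            aic = n * math.log(rss / n) + 2 * k
            bic = n * math.log(rss / n) + k * math.log(n)
            results.append((aic, bic, [names[c] for c in cols]))
    return min(results)


# Toy data: the response depends on "a" and "b" but not on "c"
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(40)]
y = [3 * r[0] - 2 * r[1] + rng.gauss(0, 0.1) for r in X]
aic, bic, chosen = best_subset(X, y, ["a", "b", "c"])
print(chosen, round(aic, 2), round(bic, 2))
```

The number of candidate models doubles with every added parameter, so an all-subsets search over the 27 parameters mentioned above is far more expensive than this three-variable toy case.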
ERIC Educational Resources Information Center
Al-Khaja, Nawal
2007-01-01
This is a thematic lesson plan for young learners about palm trees and the importance of taking care of them. The two part lesson teaches listening, reading and speaking skills. The lesson includes parts of a tree; the modal auxiliary, can; dialogues and a role play activity.
Tosteson, Tor D.; Morden, Nancy E.; Stukel, Therese A.; O'Malley, A. James
2014-01-01
The estimation of treatment effects is one of the primary goals of statistics in medicine. Estimation based on observational studies is subject to confounding. Statistical methods for controlling bias due to confounding include regression adjustment, propensity scores and inverse probability weighted estimators. These methods require that all confounders are recorded in the data. The method of instrumental variables (IVs) can eliminate bias in observational studies even in the absence of information on confounders. We propose a method for integrating IVs within the framework of Cox's proportional hazards model and demonstrate the conditions under which it recovers the causal effect of treatment. The methodology is based on the approximate orthogonality of an instrument with unobserved confounders among those at risk. We derive an estimator as the solution to an estimating equation that resembles the score equation of the partial likelihood in much the same way as the traditional IV estimator resembles the normal equations. To justify this IV estimator for a Cox model we perform simulations to evaluate its operating characteristics. Finally, we apply the estimator to an observational study of the effect of coronary catheterization on survival. PMID:25506259
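The "traditional IV estimator" that this abstract uses as its point of comparison can be illustrated with a small simulation. The sketch below covers only the linear case, where the estimate is cov(z, y) / cov(z, x); it is not the Cox-model estimator the authors propose. The data-generating model, the seed, and all variable names are made-up assumptions.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


rng = random.Random(42)
n = 5000
u = [rng.gauss(0, 1) for _ in range(n)]   # unobserved confounder
z = [rng.gauss(0, 1) for _ in range(n)]   # instrument: affects x, independent of u
x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]              # treatment
y = [2.0 * xi + 3.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]  # true effect = 2

ols_estimate = cov(x, y) / cov(x, x)   # biased upward by the confounder
iv_estimate = cov(z, y) / cov(z, x)    # uses only the variation in x driven by z
print(round(ols_estimate, 2), round(iv_estimate, 2))
```

In this simulated setting the regression slope absorbs the confounder's effect, while the instrument-based ratio stays close to the true coefficient of 2.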
Amrhein, J.F.; Stow, C.A.; Wible, C.
1999-08-01
Fish polychlorinated-biphenyl (PCB) measurements usually represent one of two different sample types: filets or homogenized whole fish. Filet measurements are more appropriate for use if the goal of analysis is estimating human PCB consumption, while whole-fish analysis may be more useful for quantifying and understanding processes of contaminant flow and bioaccumulation. While it is generally assumed that whole-fish PCB concentrations exceed filet concentrations because of the presence of fatty internal organs in whole-fish samples, the literature contains no reported comparisons of filet versus whole-fish PCB concentrations. The authors measured total PCB concentrations in filets and whole-fish samples from the same individuals in Lake Michigan coho salmon (Oncorhynchus kisutch) and rainbow trout (Oncorhynchus mykiss). The average whole-fish to filet PCB concentration ratio was 1.70 for coho salmon and 1.47 for rainbow trout, but it varied considerably among individuals, with a few fish exhibiting a higher concentration in the filet than in the whole-fish sample. Classification and regression tree (CART) models indicated that filet PCB concentration and fish length were the best predictors of whole-fish PCB concentration, whereas filet and whole-fish lipid concentrations were less important predictors. Lipid normalization of the PCB data decreased within-individual variability, was equivocal with respect to variability among individuals, and accentuated the between-species difference. Both species exhibit a pronounced 1:1 relationship between the whole-fish to filet PCB concentration ratio and the whole-fish to filet lipid concentration ratio; however, the authors point out that there is a strong spurious component to this relationship, which indicates that the relationship may be more algebraic rather than an indication of underlying mechanisms.
Homer, Collin G.; Aldridge, Cameron L.; Meyer, Debra K.; Schell, Spencer J.
2012-01-01
Sagebrush ecosystems in North America have experienced extensive degradation since European settlement. Further degradation continues from exotic invasive plants, altered fire frequency, intensive grazing practices, oil and gas development, and climate change – adding urgency to the need for ecosystem-wide understanding. Remote sensing is often identified as a key information source to facilitate ecosystem-wide characterization, monitoring, and analysis; however, approaches that characterize sagebrush with sufficient and accurate local detail across large enough areas to support this paradigm are unavailable. We describe the development of a new remote sensing sagebrush characterization approach for the state of Wyoming, U.S.A. This approach integrates 2.4 m QuickBird, 30 m Landsat TM, and 56 m AWiFS imagery into the characterization of four primary continuous field components including percent bare ground, percent herbaceous cover, percent litter, and percent shrub, and four secondary components including percent sagebrush (Artemisia spp.), percent big sagebrush (Artemisia tridentata), percent Wyoming sagebrush (Artemisia tridentata Wyomingensis), and shrub height using a regression tree. According to an independent accuracy assessment, primary component root mean square error (RMSE) values ranged from 4.90 to 10.16 for 2.4 m QuickBird, 6.01 to 15.54 for 30 m Landsat, and 6.97 to 16.14 for 56 m AWiFS. Shrub and herbaceous components outperformed the current data standard called LANDFIRE, with a shrub RMSE value of 6.04 versus 12.64 and a herbaceous component RMSE value of 12.89 versus 14.63. This approach offers new advancements in sagebrush characterization from remote sensing and provides a foundation to quantitatively monitor these components into the future.
NASA Astrophysics Data System (ADS)
Homer, Collin G.; Aldridge, Cameron L.; Meyer, Debra K.; Schell, Spencer J.
2012-02-01
Sagebrush ecosystems in North America have experienced extensive degradation since European settlement. Further degradation continues from exotic invasive plants, altered fire frequency, intensive grazing practices, oil and gas development, and climate change - adding urgency to the need for ecosystem-wide understanding. Remote sensing is often identified as a key information source to facilitate ecosystem-wide characterization, monitoring, and analysis; however, approaches that characterize sagebrush with sufficient and accurate local detail across large enough areas to support this paradigm are unavailable. We describe the development of a new remote sensing sagebrush characterization approach for the state of Wyoming, U.S.A. This approach integrates 2.4 m QuickBird, 30 m Landsat TM, and 56 m AWiFS imagery into the characterization of four primary continuous field components including percent bare ground, percent herbaceous cover, percent litter, and percent shrub, and four secondary components including percent sagebrush (Artemisia spp.), percent big sagebrush (Artemisia tridentata), percent Wyoming sagebrush (Artemisia tridentata Wyomingensis), and shrub height using a regression tree. According to an independent accuracy assessment, primary component root mean square error (RMSE) values ranged from 4.90 to 10.16 for 2.4 m QuickBird, 6.01 to 15.54 for 30 m Landsat, and 6.97 to 16.14 for 56 m AWiFS. Shrub and herbaceous components outperformed the current data standard called LANDFIRE, with a shrub RMSE value of 6.04 versus 12.64 and a herbaceous component RMSE value of 12.89 versus 14.63. This approach offers new advancements in sagebrush characterization from remote sensing and provides a foundation to quantitatively monitor these components into the future.
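The accuracy figures in this abstract are root mean square errors between predicted and independently observed component values. A minimal sketch of that calculation follows; the function name `rmse` and the sample numbers are illustrative assumptions and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up percent-cover values for three validation plots
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
error = rmse(observed, predicted)
print(round(error, 3))
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones, which is worth keeping in mind when comparing the sensor-specific ranges quoted above.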
Moisen, G.G.; Freeman, E.A.; Blackard, J.A.; Frescino, T.S.; Zimmermann, N.E.; Edwards, T.C., Jr.
2006-01-01
Many efforts are underway to produce broad-scale forest attribute maps by modelling forest class and structure variables collected in forest inventories as functions of satellite-based and biophysical information. Typically, variants of classification and regression trees implemented in Rulequest's See5 and Cubist (for binary and continuous responses, respectively) are the tools of choice in many of these applications. These tools are widely used in large remote sensing applications, but are not easily interpretable, do not have ties with survey estimation methods, and use proprietary unpublished algorithms. Consequently, three alternative modelling techniques were compared for mapping presence and basal area of 13 species located in the mountain ranges of Utah, USA. The modelling techniques compared included the widely used See5/Cubist, generalized additive models (GAMs), and stochastic gradient boosting (SGB). Model performance was evaluated using independent test data sets. Evaluation criteria for mapping species presence included specificity, sensitivity, Kappa, and area under the curve (AUC). Evaluation criteria for the continuous basal area variables included correlation and relative mean squared error. For predicting species presence (setting thresholds to maximize Kappa), SGB had higher values for the majority of the species for specificity and Kappa, while GAMs had higher values for the majority of the species for sensitivity. In evaluating resultant AUC values, GAM and/or SGB models had significantly better results than the See5 models where significant differences could be detected between models. For nine out of 13 species, basal area prediction results for all modelling techniques were poor (correlations less than 0.5 and relative mean squared errors greater than 0.8), but SGB provided the most stable predictions in these instances. SGB and Cubist performed equally well for modelling basal area for three species with moderate prediction success
The purpose of this report is to provide a reference manual that could be used by investigators for making informed use of logistic regression using two methods (standard logistic regression and MARS). The details for analyses of relationships between a dependent binary response ...
Fenske, Nora; Burns, Jacob; Hothorn, Torsten; Rehfuess, Eva A.
2013-01-01
Background: Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. Objective: We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. Design: Using cross-sectional data for children aged 0–24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. Results: At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. Conclusions: Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role. PMID:24223839
NASA Astrophysics Data System (ADS)
Ito, Toshihide; Fuse, Norikazu; Ohki, Yoshimichi
Photoluminescence (PL) spectra induced by irradiation of ultraviolet photons are compared among low-density polyethylene (LDPE), crosslinked polyethylene (XLPE), and polypropylene (PP). Three PL bands appear around 4.2, 3.6, and 3.1 eV in LDPE and XLPE, while similar three PL bands are observed at similar energies in PP. The PL spectra and their decay profiles are independent of the presence of additives and are also independent of whether the samples were crosslinked or not. These results indicate that neither the additives nor the crosslinking has any significant effects on the respective three PLs in PE and PP. When the sample was pre-irradiated by the ultraviolet photons under different atmospheres (air, O2, and vacuum), all the PL intensities decrease with the progress of the pre-irradiation regardless of whether the sample is PE or PP. Therefore, all the PLs are considered to result from impurities. In all the pre-irradiated samples, a new PL band appears at 2.9 eV, of which intensity is stronger when the oxygen partial pressure during the pre-irradiation was lower. This PL is considered to be due to photo-induced conjugated double bonds. It has also been confirmed that water-tree degradation in LDPE or in XLPE does not contribute to PL.
Frederick, Logan; VanDerslice, James; Taddie, Marissa; Malecki, Kristen; Gregg, Josh; Faust, Nicholas; Johnson, William P
2016-03-15
Arsenic contamination in groundwater is a public health and environmental concern in the United States (U.S.), particularly where monitoring is not required under the Safe Drinking Water Act. Previous studies suggest the influence of regional mechanisms for arsenic mobilization into groundwater; however, no study has examined how influencing parameters change at a continental scale spanning multiple regions. We herein examine covariates for groundwater in the western, central and eastern U.S. regions representing mechanisms associated with arsenic concentrations exceeding the U.S. Environmental Protection Agency maximum contaminant level (MCL) of 10 parts per billion (ppb). Statistically significant covariates were identified via classification and regression tree (CART) analysis, and included hydrometeorological and groundwater chemical parameters. The CART analyses were performed at two scales, national and regional; at the regional scale, three physiographic regions located in the western (Payette Section and the Snake River Plain), central (Osage Plains of the Central Lowlands), and eastern (Embayed Section of the Coastal Plains) U.S. were examined. Validity of each of the three regional CART models was indicated by values >85% for the area under the receiver-operating characteristic curve. Aridity (precipitation minus potential evapotranspiration) was identified as the primary covariate associated with elevated arsenic at the national scale. At the regional scale, aridity and pH were the major covariates in the arid to semi-arid (western) region, whereas dissolved iron (taken to represent chemically reducing conditions) and pH were major covariates in the temperate (eastern) region, although additional important covariates emerged, including elevated phosphate. Analysis in the central U.S. region indicated that elevated arsenic concentrations were driven by a mixture of the covariates observed in the western and eastern regions. PMID:26803265
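A CART analysis of this kind repeatedly picks the covariate threshold that best separates wells exceeding the MCL from those that do not. The sketch below shows a single such split on one covariate, assuming Gini impurity as the splitting criterion (the abstract does not state which criterion was used); the aridity values and exceedance flags are invented for illustration:

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one covariate."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical aridity (precipitation minus potential evapotranspiration, mm/yr)
# and a flag for arsenic above 10 ppb; these are not data from the study.
aridity = [-900, -700, -650, -400, -100, 50, 200, 350]
exceeds_mcl = [1, 1, 1, 1, 0, 0, 1, 0]
threshold, score = best_split(aridity, exceeds_mcl)
```

A full tree applies the same search recursively within each resulting subset and across all candidate covariates.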
ERIC Educational Resources Information Center
Strobl, Carolin; Malley, James; Tutz, Gerhard
2009-01-01
Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and…
NASA Astrophysics Data System (ADS)
Wilkes, Martin; Maddock, Ian; Link, Oscar; Habit, Evelyn
2015-04-01
Despite the numerous advantages over traditional methods ascribed to community-level analyses, including the ability to rapidly predict the abundance of multiple species and the integration of complex biological interactions, very few applications to the mesoscale of river habitats can be found in the extant literature. Most previous work has been based on single species, species-by-species modelling or reduced dimensionality approaches. Community-level analyses have especially good properties for improving the understanding of habitat associations in large rivers where biological interactions are most intense and applications of the mesohabitat concept relatively sparse. This chapter seeks to identify quantitative relationships between key environmental variables and community structure using a particular type of community-level technique known as multivariate regression trees in order to test the ecological basis for applications of the mesohabitat concept in large rivers. Mesohabitats were mapped and their environmental characteristics recorded along a reach of the San Pedro River, Chile, which is inhabited by a highly endemic fish community. A representative portion of the mesohabitats were selected for fish sampling and multivariate regression trees produced to predict community structure based on combinations of environmental variables. The analyses showed that fish assemblages were distinct at the mesoscale, with flow depth, bank materials, cover and woody debris the key predictor variables. The results support the application of the mesohabitat concept in this geographical context and establish a basis for predicting the community structure of any mesohabitat along the reach.
NASA Astrophysics Data System (ADS)
Lopes de Gerenyu, Valentin; Kurganova, Irina; Kapitsa, Ekaterina; Shorokhova, Ekaterina
2016-04-01
In forest ecosystems, the processes of decomposition of coarse woody debris (CWD) can contribute significantly to the emission component of the carbon (C) cycle and thus accelerate the greenhouse effect and global climate change. A better understanding of decomposition of CWD is required to refine estimates of the C balance in forest ecosystems and improve biogeochemical models. These estimates will in turn contribute to assessing the role of forests in maintaining their long-term productivity and other ecosystem services. We examined the decomposition rate of coniferous bark with added nitrogen (N) and phosphorus (P) fertilizers in an experiment under field conditions. The experiment was carried out in 2015 over 17 weeks in the Moscow region (54°50'N, 37°36'E) under continental-temperate climatic conditions. The conifer tree bark mixture (ca. 70% of Norway spruce and 30% of Scots pine) was combined with soil and placed in piles of soil-bark substrate (SBS) with a height of ca. 60 cm and a surface area of ca. 3 m2. The dry mass ratio of bark to soil was 10:1. The experimental design included the following treatments: (1) soil (Luvisols Haplic) without bark (S), (2) pure SBS, (3) SBS with N addition in the amount of 1% of total dry bark mass (SBS-N), and (4) SBS with N and P addition in the amount of 1% of total dry bark mass for each element (SBS-NP). The decomposition rate, expressed as CO2 emission flux in g C/m2/h, was measured using the closed chamber method 1-3 times per week from July to early November using a LiCor 6400 (Nebraska, USA). During the experiment, we also monitored soil temperature at depths of 5, 20, 40, and 60 cm below the surface of the SBS using iButton thermochrons (DS1921G, USA). The pattern of CO2 emission rate from SBS depended strongly on fertilization. The highest decomposition rates (DecR) of 2.8-5.6 g C/m2/h were observed in the SBS-NP treatment during the first 6 weeks of the experiment. The decay process of bark was less active in the treatment with only N addition. In this
Oubida, Regis W.; Gantulga, Dashzeveg; Zhang, Man; Zhou, Lecong; Bawa, Rajesh; Holliday, Jason A.
2015-01-01
Local adaptation to climate in temperate forest trees involves the integration of multiple physiological, morphological, and phenological traits. Latitudinal clines are frequently observed for these traits, but environmental constraints also track longitude and altitude. We combined extensive phenotyping of 12 candidate adaptive traits, multivariate regression trees, quantitative genetics, and a genome-wide panel of SNP markers to better understand the interplay among geography, climate, and adaptation to abiotic factors in Populus trichocarpa. Heritabilities were low to moderate (0.13–0.32) and population differentiation for many traits exceeded the 99th percentile of the genome-wide distribution of FST, suggesting local adaptation. When climate variables were taken as predictors and the 12 traits as response variables in a multivariate regression tree analysis, evapotranspiration (Eref) explained the most variation, with subsequent splits related to mean temperature of the warmest month, frost-free period (FFP), and mean annual precipitation (MAP). These grouping matched relatively well the splits using geographic variables as predictors: the northernmost groups (short FFP and low Eref) had the lowest growth, and lowest cold injury index; the southern British Columbia group (low Eref and intermediate temperatures) had average growth and cold injury index; the group from the coast of California and Oregon (high Eref and FFP) had the highest growth performance and the highest cold injury index; and the southernmost, high-altitude group (with high Eref and low FFP) performed poorly, had high cold injury index, and lower water use efficiency. Taken together, these results suggest variation in both temperature and water availability across the range shape multivariate adaptive traits in poplar. PMID:25870603
Kasprzyk, Idalia; Grinn-Gofroń, Agnieszka; Strzelczak, Agnieszka; Wolski, Tomasz
2011-02-01
Ganoderma spores are among the most abundant airspora taxa in many regions of the world and are considered to be important allergens. The aerobiology of Ganoderma basidiospores in two cities in Poland was examined using the volumetric method (Burkard and Lanzonii Spore Traps) on selected days in 2004, 2005 and 2006. Spores of Ganoderma were present in the atmosphere from June to November, with peak concentrations generally occurring from late July to mid-October. ANN (artificial neural network) and MRT (multivariate regression tree) models indicated that atmospheric phenomenon, hour and relative humidity were the most important variables influencing spore content. The remaining variables (air temperature, dew point, air pressure, wind speed and wind direction) also contributed to the high network performance (ratio above 1), but their impact was less distinct. These results are consistent with the Spearman's rank correlation analysis. PMID:21183203
Petersen, M B; Tolver, A; Husted, L; Tølbøll, T H; Pihl, T H
2016-07-01
The objective of this study was to investigate the prognostic value of single and repeated measurements of blood l-lactate (Lac) and ionised calcium (iCa) concentrations, packed cell volume (PCV) and plasma total protein (TP) concentration in horses with acute colitis. A total of 66 adult horses admitted with acute colitis (<24 h) to a referral hospital in the 2002-2011 period were included. The prognostic value of Lac, iCa, PCV and TP recorded at admission and 6 h post admission was analysed with univariate analysis, logistic regression, classification and regression trees, as well as random forest analysis. Ponies and Icelandic horses made up 59% of the population, whilst the remaining 41% were horses. Blood lactate concentration at admission was the only individual parameter significantly associated with probability of survival to discharge (P < 0.001). In a training sample, a Lac cut-off value of 7 mmol/L had a sensitivity of 0.66 and a specificity of 0.92 in predicting survival. In independent test data, the sensitivity was 0.69 and the specificity was 0.76. At the observed survival rate (38%), the optimal decision tree identified horses as non-survivors when the Lac at admission was ≥4.3 mmol/L and the Lac 6 h post admission stayed at >2 mmol/L (sensitivity, 0.72; specificity, 0.8). In conclusion, blood lactate concentration measured at admission and repeated 6 h later aided the prognostic evaluation of horses with acute colitis in this population with a very high mortality rate. This should allow clinicians to give a more reliable prognosis for the horse. PMID:27240909
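The two-step lactate rule reported above can be written as a small classifier and scored against outcomes. This is a sketch only: the thresholds come from the abstract, but the records are invented, and treating non-survival as the "positive" class is an assumption made here for illustration.

```python
def predicts_non_survival(lac_admission, lac_6h):
    """Decision rule from the abstract: Lac >= 4.3 mmol/L at admission
    and still > 2 mmol/L six hours later."""
    return lac_admission >= 4.3 and lac_6h > 2.0


def sensitivity_specificity(records):
    """records: iterable of (lac_admission, lac_6h, died) tuples.
    Non-survival is treated as the positive class (an assumption)."""
    tp = fn = tn = fp = 0
    for lac_admission, lac_6h, died in records:
        flagged = predicts_non_survival(lac_admission, lac_6h)
        if died and flagged:
            tp += 1
        elif died:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical horses: (lactate at admission, lactate at 6 h, died).
horses = [
    (8.0, 5.1, True),
    (5.0, 3.0, True),
    (4.5, 1.5, True),
    (6.0, 2.5, False),
    (2.0, 1.0, False),
    (3.9, 2.8, False),
    (1.5, 0.9, False),
]
sens, spec = sensitivity_specificity(horses)
```

With real data, the same function would be applied separately to training and independent test samples, as the study did for the single-cut-off rule.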
Robertson, D.M.; Saad, D.A.; Heisey, D.M.
2006-01-01
Various approaches are used to subdivide large areas into regions containing streams that have similar reference or background water quality and that respond similarly to different factors. For many applications, such as establishing reference conditions, it is preferable to use physical characteristics that are not affected by human activities to delineate these regions. However, most approaches, such as ecoregion classifications, rely on land use to delineate regions or have difficulties compensating for the effects of land use. Land use not only directly affects water quality, but it is often correlated with the factors used to define the regions. In this article, we describe modifications to SPARTA (spatial regression-tree analysis), a relatively new approach applied to water-quality and environmental characteristic data to delineate zones with similar factors affecting water quality. In this modified approach, land-use-adjusted (residualized) water quality and environmental characteristics are computed for each site. Regression-tree analysis is applied to the residualized data to determine the most statistically important environmental characteristics describing the distribution of a specific water-quality constituent. Geographic information for small basins throughout the study area is then used to subdivide the area into relatively homogeneous environmental water-quality zones. For each zone, commonly used approaches are subsequently used to define its reference water quality and how its water quality responds to changes in land use. SPARTA is used to delineate zones of similar reference concentrations of total phosphorus and suspended sediment throughout the upper Midwestern part of the United States. © 2006 Springer Science+Business Media, Inc.
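The residualization step described above removes the part of a water-quality variable that land use explains before any tree is fitted. A minimal sketch, assuming a simple one-predictor linear adjustment (the abstract does not specify the form of the adjustment) and invented site values:

```python
def residualize(response, land_use):
    """Return residuals of an ordinary least-squares fit of response on land_use."""
    n = len(response)
    mean_x = sum(land_use) / n
    mean_y = sum(response) / n
    sxx = sum((x - mean_x) ** 2 for x in land_use)
    if sxx == 0:
        raise ValueError("land_use has no variation; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(land_use, response))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # What remains is the land-use-adjusted signal passed to the regression tree.
    return [y - (intercept + slope * x) for x, y in zip(land_use, response)]


# Hypothetical sites: percent agricultural land and total phosphorus (mg/L).
percent_agriculture = [5.0, 20.0, 40.0, 60.0, 80.0]
total_phosphorus = [0.02, 0.05, 0.07, 0.12, 0.14]
adjusted_phosphorus = residualize(total_phosphorus, percent_agriculture)
```

The same adjustment would be applied to each environmental characteristic, so that the tree splits on variation that is not simply a proxy for land use.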
2013-01-01
Background To analyze the impact of multimorbidity (MM) on health care costs taking into account data heterogeneity. Methods Data come from a multicenter prospective cohort study of 1,050 randomly selected primary care patients aged 65 to 85 years suffering from MM in Germany. MM was defined as co-occurrence of ≥3 conditions from a list of 29 chronic diseases. A conditional inference tree (CTREE) algorithm was used to detect the underlying structure and most influential variables on costs of inpatient care, outpatient care, medications as well as formal and informal nursing care. Results Irrespective of the number and combination of co-morbidities, a limited number of factors influential on costs were detected. Parkinson’s disease (PD) and cardiac insufficiency (CI) were the most influential variables for total costs. Compared to patients not suffering from any of the two conditions, PD increases predicted mean total costs 3.5-fold to approximately € 11,000 per 6 months, and CI two-fold to approximately € 6,100. The high total costs of PD are largely due to costs of nursing care. Costs of inpatient care were significantly influenced by cerebral ischemia/chronic stroke, whereas medication costs were associated with COPD, insomnia, PD and Diabetes. Except for costs of nursing care, socio-demographic variables did not significantly influence costs. Conclusions Irrespective of any combination and number of co-occurring diseases, PD and CI appear to be most influential on total health care costs in elderly patients with MM, and only a limited number of factors significantly influenced cost. Trial registration Current Controlled Trials ISRCTN89818205 PMID:23768192
Strobl, Carolin; Malley, James; Tutz, Gerhard
2010-01-01
Recursive partitioning methods have become popular and widely used tools for non-parametric regression and classification in many scientific fields. Especially random forests, that can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine and bioinformatics within the past few years. High dimensional problems are common not only in genetics, but also in some areas of psychological research, where only few subjects can be measured due to time or cost constraints, yet a large amount of data is generated for each subject. Random forests have been shown to achieve a high prediction accuracy in such applications, and provide descriptive variable importance measures reflecting the impact of each variable in both main effects and interactions. The aim of this work is to introduce the principles of the standard recursive partitioning methods as well as recent methodological improvements, to illustrate their usage for low and high dimensional data exploration, but also to point out limitations of the methods and potential pitfalls in their practical application. Application of the methods is illustrated using freely available implementations in the R system for statistical computing. PMID:19968396
NASA Astrophysics Data System (ADS)
Cárate Tandalla, Daisy; Leuschner, Christoph; Homeier, Jürgen
2015-12-01
Nitrogen deposition to tropical forests is predicted to increase in future in many regions due to agricultural intensification. We conducted a seedling transplantation experiment in a tropical premontane forest in Ecuador with a locally abundant late-successional tree species (Pouteria torta, Sapotaceae) aimed at detecting species-specific responses to moderate N and P addition and to understand how increasing nutrient availability will affect regeneration. From locally collected seeds, 320 seedlings were produced and transplanted to the plots of the Ecuadorian Nutrient Manipulation Experiment (NUMEX) with three treatments (moderate N addition: 50 kg N ha-1 yr-1, moderate P addition: 10 kg P ha-1 yr-1 and combined N and P addition) and a control (80 plants per treatment). After 12 months, mortality, relative growth rate, leaf nutrient content and leaf herbivory rate were measured. N and NP addition significantly increased the mortality rate (70 % vs. 54 % in the control). However, N and P addition also increased the diameter growth rate of the surviving seedlings. N and P addition did not alter foliar nutrient concentrations and leaf N:P ratio, but N addition decreased the leaf C:N ratio and increased SLA. P addition (but not N addition) resulted in higher leaf area loss to herbivore consumption and also shifted carbon allocation to root growth. This fertilization experiment with a common rainforest tree species conducted in old-growth forest shows that already moderate doses of added N and P are affecting seedling performance which most likely will have consequences for the competitive strength in the understory and the recruitment success of P. torta. Simultaneous increases in growth, herbivory and mortality rates make it difficult to assess the species' overall performance and predict how a future increase in nutrient deposition will alter the abundance of this species in the Andean tropical montane forests.
2013-01-01
Background Microarray technology can acquire information about thousands of genes simultaneously. We analyzed published breast cancer microarray databases to predict five-year recurrence and compared the performance of three data mining algorithms, artificial neural networks (ANN), decision trees (DT) and logistic regression (LR), and two composite models, DT-ANN and DT-LR. From the collection of microarray datasets in the Gene Expression Omnibus, four breast cancer datasets were pooled for predicting five-year breast cancer relapse. After data compilation, 757 subjects, 5 clinical variables and 13,452 genetic variables were aggregated. The bootstrap method, Mann–Whitney U test and 20-fold cross-validation were performed to investigate candidate genes with the 100 most significant p-values. The predictive powers of DT, LR and ANN models were assessed using accuracy and the area under the ROC curve. The associated genes were evaluated using Cox regression. Results The DT models exhibited the lowest predictive power and the poorest extrapolation when applied to the test samples. The ANN models displayed the best predictive power and showed the best extrapolation. The 21 most-associated genes, as determined by integration of each model, were analyzed using Cox regression with a 3.53-fold (95% CI: 2.24-5.58) increased risk of breast cancer five-year recurrence… Conclusions The 21 selected genes can predict breast cancer recurrence. Among these genes, CCNB1, PLK1 and TOP2A are in the cell cycle G2/M DNA damage checkpoint pathway. Oncologists can offer the genetic information for patients when understanding the gene expression profiles on breast cancer recurrence. PMID:23506640
NASA Astrophysics Data System (ADS)
Sayegh, Arwa; Tate, James E.; Ropkins, Karl
2016-02-01
Oxides of nitrogen (NOx) are a major component of photochemical smog, and their constituents are considered principal traffic-related pollutants affecting human health. This study investigates the influence of background concentrations of NOx, traffic density, and prevailing meteorological conditions on roadside concentrations of NOx at UK urban, open motorway, and motorway tunnel sites using the statistical approach Boosted Regression Trees (BRT). BRT models have been fitted using hourly concentration, traffic, and meteorological data for each site. The models predict, rank, and visualise the relationship between model variables and roadside NOx concentrations. A strong relationship between roadside NOx and monitored local background concentrations is demonstrated. Relationships between roadside NOx and other model variables have been shown to be strongly influenced by the quality and resolution of the background NOx concentrations, i.e. whether they were based on monitored data or modelled predictions. The paper proposes a direct method of using site-specific fundamental diagrams for splitting traffic data into four traffic states: free-flow, busy-flow, congested, and severely congested. Using BRT models, the density of traffic (vehicles per kilometre) was observed to have a proportional influence on the concentrations of roadside NOx, with different fitted regression line slopes for the different traffic states. When other influences are conditioned out, the relationship between roadside concentrations and ambient air temperature suggests NOx concentrations reach a minimum at around 22 °C, with high concentrations at low ambient air temperatures, which could be associated with restricted atmospheric dispersion and/or with changes in road traffic exhaust emission characteristics at low ambient air temperatures. This paper uses BRT models to study how different critical factors, and their relative importance, influence the variation of roadside NOx concentrations. The paper
NASA Astrophysics Data System (ADS)
Hu, Gensheng; Li, Xiaoyi; Liang, Dong
2015-01-01
The existence of clouds affects the interpretation and utilization of remote sensing images. A thin cloud removal algorithm for cloud-contaminated remote sensing images is proposed by combining a multidirectional dual tree complex wavelet transform (M-DTCWT) with domain adaptation transfer least square support vector regression (T-LSSVR). First, M-DTCWT is constructed by using the hourglass filter bank in combination with DTCWT, which is used to decompose remote sensing images into multiscale and multidirectional subbands. Then the low-frequency subband coefficients of the cloud-free regions on target images and source domain images are used as samples for a T-LSSVR model, which can be used to predict those of the cloud regions on cloud-contaminated images. Finally, by enhancing the high-frequency coefficients and replacing the low-frequency coefficients, the thin clouds on cloud-contaminated images are removed. Experimental results show that M-DTCWT contributes to keeping the details of the ground objects of cloud-contaminated images, and the T-LSSVR model can effectively learn the contour information from multisource and multitemporal images, therefore, the proposed method achieves a good effect of thin cloud removal.
Yu, Huibin; Song, Yonghui; Liu, Ruixia; Pan, Hongwei; Xiang, Liancheng; Qian, Feng
2014-10-01
The stabilization of latent tracers of dissolved organic matter (DOM) of wastewater was analyzed by three-dimensional excitation-emission matrix (EEM) fluorescence spectroscopy coupled with self-organizing map and classification and regression tree analysis (CART) in wastewater treatment performance. DOM of water samples collected from primary sedimentation, anaerobic, anoxic, oxic and secondary sedimentation tanks in a large-scale wastewater treatment plant contained four fluorescence components: tryptophan-like (C1), tyrosine-like (C2), microbial humic-like (C3) and fulvic-like (C4) materials extracted by self-organizing map. These components showed good positive linear correlations with dissolved organic carbon of DOM. C1 and C2 were representative components in the wastewater, and they were removed to a higher extent than those of C3 and C4 in the treatment process. C2 was a latent parameter determined by CART to differentiate water samples of oxic and secondary sedimentation tanks from the successive treatment units, indirectly proving that most of tyrosine-like material was degraded by anaerobic microorganisms. C1 was an accurate parameter to comprehensively separate the samples of the five treatment units from each other, indirectly indicating that tryptophan-like material was decomposed by anaerobic and aerobic bacteria. EEM fluorescence spectroscopy in combination with self-organizing map and CART analysis can be a nondestructive effective method for characterizing structural component of DOM fractions and monitoring organic matter removal in wastewater treatment process. PMID:25065793
Addition of wsp sequences to the Wolbachia phylogenetic tree and stability of the classification.
Pintureau, B; Chaudier, S; Lassablière, F; Charles, H; Grenier, S
2000-10-01
Wolbachia are symbiotic bacteria altering reproductive characters of numerous arthropods. Their most recent phylogeny and classification are based on sequences of the wsp gene. We sequenced wsp gene from six Wolbachia strains infecting six Trichogramma species that live as egg parasitoids on many insects. This allows us to test the effect of the addition of sequences on the Wolbachia phylogeny and to check the classification of Wolbachia infecting Trichogramma. The six Wolbachia studied are classified in the B supergroup. They confirm the monophyletic structure of the B Wolbachia in Trichogramma but introduce small differences in the Wolbachia classification. Modifications include the definition of a new group, Sem, for Wolbachia of T. semblidis and the merging of the two closely related groups, Sib and Kay. Specific primers were determined and tested for the Sem group. PMID:11040288
Mulder, V.L.; Plotze, Michael; de Bruin, Sytze; Schaepman, Michael E.; Mavris, C.; Kokaly, Raymond F.; Egli, Markus
2013-01-01
This paper presents a methodology for assessing mineral abundances of mixtures having more than two constituents using absorption features in the 2.1-2.4 μm wavelength region. In the first step, the absorption behaviour of mineral mixtures is parameterised by exponential Gaussian optimisation. Next, mineral abundances are predicted by regression tree analysis using these parameters as inputs. The approach is demonstrated on a range of prepared samples with known abundances of kaolinite, dioctahedral mica, smectite, calcite and quartz and on a set of field samples from Morocco. The latter contained varying quantities of other minerals, some of which did not have diagnostic absorption features in the 2.1-2.4 μm region. Cross validation showed that the prepared samples of kaolinite, dioctahedral mica, smectite and calcite were predicted with a root mean square error (RMSE) less than 9 wt.%. For the field samples, the RMSE was less than 8 wt.% for calcite, dioctahedral mica and kaolinite abundances. Smectite could not be well predicted, which was attributed to spectral variation of the cations within the dioctahedral layered smectites. Substitution of part of the quartz by chlorite at the prediction phase hardly affected the accuracy of the predicted mineral content; this suggests that the method is robust in handling the omission of minerals during the training phase. The degree of expression of absorption components was different between the field sample and the laboratory mixtures. This demonstrates that the method should be calibrated and trained on local samples. Our method allows the simultaneous quantification of more than two minerals within a complex mixture and thereby enhances the perspectives of spectral analysis for mineral abundances.
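The accuracy figures quoted above are root mean square errors in weight percent. For reference, a short sketch of how such a value is computed from known and predicted abundances; the numbers are illustrative only and are not taken from the paper:

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical calcite abundances in wt.% (known vs. tree-predicted).
known_calcite = [10.0, 25.0, 40.0, 55.0]
predicted_calcite = [14.0, 22.0, 45.0, 55.0]
calcite_rmse = rmse(known_calcite, predicted_calcite)
```

In a cross-validation setting, the predictions for each sample would come from a model trained without that sample.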
NASA Technical Reports Server (NTRS)
Smalheer, C. V.
1973-01-01
The chemistry of lubricant additives is discussed to show what the additives are chemically and what functions they perform in the lubrication of various kinds of equipment. Current theories regarding the mode of action of lubricant additives are presented. The additive groups discussed include the following: (1) detergents and dispersants, (2) corrosion inhibitors, (3) antioxidants, (4) viscosity index improvers, (5) pour point depressants, and (6) antifouling agents.
NASA Astrophysics Data System (ADS)
Zhang, W.; Zhu, X.; Luo, Y.; Rafique, R.; Chen, H.; Huang, J.; Mo, J.
2014-01-01
Leguminous tree plantations at phosphorus (P) limited sites may result in higher rates of nitrous oxide (N2O) emissions, however, the effects of nitrogen (N) and P applications on soil N2O emissions from plantations with N-fixing vs. non-N-fixing tree species has rarely been studied in the field. We conducted an experimental manipulation of N and P additions in two tropical plantations with Acacia auriculiformis (AA) and Eucalyptus urophylla (EU) tree species in South China. The objective was to determine the effects of N- or P-addition alone, as well as NP application together on soil N2O emissions from tropical plantations with N-fixing vs. non-N-fixing tree species. We found that the average N2O emission from control was greater in AA (2.26 ± 0.06 kg N2O-N ha-1 yr-1) than in EU plantation (1.87 ± 0.05 kg N2O-N ha-1 yr-1). For the AA plantation, N-addition stimulated the N2O emission from soil while P-addition did not. Applications of N with P together significantly decreased N2O emission compared to N-addition alone, especially in high level treatment plots (decreased by 18%). In the EU plantation, N2O emissions significantly decreased in P-addition plots compared with the controls, however, N- and NP-additions did not. The differing response of N2O emissions to N- or P-addition was attributed to the higher initial soil N status in the AA than that of the EU plantation, due to symbiotic N fixation in the former. Our results suggest that atmospheric N deposition potentially stimulates N2O emissions from leguminous tree plantations in the tropics, whereas P fertilization has the potential to mitigate N deposition-induced N2O emissions from such plantations.
Templeton, Alan R.; Maxwell, Taylor; Posada, David; Stengård, Jari H.; Boerwinkle, Eric; Sing, Charles F.
2005-01-01
We use evolutionary trees of haplotypes to study phenotypic associations by exhaustively examining all possible biallelic partitions of the tree, a technique we call tree scanning. If the first scan detects significant associations, additional rounds of tree scanning are used to partition the tree into three or more allelic classes. Two worked examples are presented. The first is a reanalysis of associations between haplotypes at the Alcohol Dehydrogenase locus in Drosophila melanogaster that was previously analyzed using a nested clade analysis, a more complicated technique for using haplotype trees to detect phenotypic associations. Tree scanning and the nested clade analysis yield the same inferences when permutation testing is used with both approaches. The second example is an analysis of associations between variation in various lipid traits and genetic variation at the Apolipoprotein E (APOE) gene in three human populations. Tree scanning successfully identified phenotypic associations expected from previous analyses. Tree scanning for the most part detected more associations and provided a better biological interpretative framework than single SNP analyses. We also show how prior information can be incorporated into the tree scan by starting with the traditional three electrophoretic alleles at APOE. Tree scanning detected genetically determined phenotypic heterogeneity within all three electrophoretic allelic classes. Overall, tree scanning is a simple, powerful, and flexible method for using haplotype trees to detect phenotype/genotype associations at candidate loci. PMID:15371364
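One way to picture the first round of tree scanning is that cutting each branch of the haplotype tree splits the haplotypes into two allelic classes, and each such split is then tested for phenotypic association. The sketch below enumerates those splits for a small made-up tree; it assumes that biallelic partitions correspond to single-branch cuts and leaves out the association and permutation tests entirely.

```python
from collections import defaultdict


def biallelic_partitions(edges):
    """For each branch of an unrooted tree, return the two haplotype sets
    obtained by cutting that branch."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    nodes = set(adjacency)
    partitions = []
    for a, b in edges:
        # Collect everything reachable from `a` without crossing branch (a, b).
        side = {a}
        stack = [a]
        while stack:
            current = stack.pop()
            for neighbour in adjacency[current]:
                if {current, neighbour} == {a, b}:
                    continue
                if neighbour not in side:
                    side.add(neighbour)
                    stack.append(neighbour)
        partitions.append((frozenset(side), frozenset(nodes - side)))
    return partitions


# Hypothetical haplotype tree with five haplotypes and four branches.
haplotype_tree = [("H1", "H2"), ("H2", "H3"), ("H2", "H4"), ("H4", "H5")]
partitions = biallelic_partitions(haplotype_tree)
```

A tree with k branches yields k candidate partitions, so the scan stays small even though every possible branch cut is examined.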
Incremental hierarchical discriminant regression.
Weng, Juyang; Hwang, Wey-Shiuan
2007-03-01
This paper presents incremental hierarchical discriminant regression (IHDR) which incrementally builds a decision tree or regression tree for very high-dimensional regression or decision spaces by an online, real-time learning system. Biologically motivated, it is an approximate computational model for automatic development of associative cortex, with both bottom-up sensory inputs and top-down motor projections. At each internal node of the IHDR tree, information in the output space is used to automatically derive the local subspace spanned by the most discriminating features. Embedded in the tree is a hierarchical probability distribution model used to prune very unlikely cases during the search. The number of parameters in the coarse-to-fine approximation is dynamic and data-driven, enabling the IHDR tree to automatically fit data with unknown distribution shapes (thus, it is difficult to select the number of parameters up front). The IHDR tree dynamically assigns long-term memory to avoid the loss-of-memory problem typical with a global-fitting learning algorithm for neural networks. A major challenge for an incrementally built tree is that the number of samples varies arbitrarily during the construction process. An incrementally updated probability model, called sample-size-dependent negative-log-likelihood (SDNLL) metric is used to deal with large sample-size cases, small sample-size cases, and unbalanced sample-size cases, measured among different internal nodes of the IHDR tree. We report experimental results for four types of data: synthetic data to visualize the behavior of the algorithms, large face image data, continuous video stream from robot navigation, and publicly available data sets that use human defined features. PMID:17385628
NASA Astrophysics Data System (ADS)
Zhang, W.; Zhu, X.; Luo, Y.; Rafique, R.; Chen, H.; Huang, J.; Mo, J.
2014-09-01
Leguminous tree plantations at phosphorus (P) limited sites may result in excess nitrogen (N) and higher rates of nitrous oxide (N2O) emissions. However, the effects of N and P applications on soil N2O emissions from plantations with N-fixing vs. non-N-fixing tree species have rarely been studied in the field. We conducted an experimental manipulation of N and/or P additions in two plantations with Acacia auriculiformis (AA, N-fixing) and Eucalyptus urophylla (EU, non-N-fixing) in South China. The objective was to determine the effects of N or P addition alone, as well as NP application together on soil N2O emissions from these tropical plantations. We found that the average N2O emission from control was greater in the AA (2.3 ± 0.1 kg N2O-N ha⁻¹ yr⁻¹) than in EU plantation (1.9 ± 0.1 kg N2O-N ha⁻¹ yr⁻¹). For the AA plantation, N addition stimulated N2O emission from the soil while P addition did not. Applications of N with P together significantly decreased N2O emission compared to N addition alone, especially in the high-level treatments (decreased by 18%). In the EU plantation, N2O emissions significantly decreased in P-addition plots compared with the controls; however, N and NP additions did not. The different response of N2O emission to N or P addition was attributed to the higher initial soil N status in the AA than that of EU plantation, due to symbiotic N fixation in the former. Our result suggests that atmospheric N deposition potentially stimulates N2O emissions from leguminous tree plantations in the tropics, whereas P fertilization has the potential to mitigate N-deposition-induced N2O emissions from such plantations.
NASA Astrophysics Data System (ADS)
Beckerman, Bernardo S.; Jerrett, Michael; Martin, Randall V.; van Donkelaar, Aaron; Ross, Zev; Burnett, Richard T.
2013-10-01
Land use regression (LUR) models are widely employed in health studies to characterize chronic exposure to air pollution. The LUR is essentially an interpolation technique that employs the pollutant of interest as the dependent variable with proximate land use, traffic, and physical environmental variables used as independent predictors. Two major limitations with this method have not been addressed: (1) variable selection in the model building process, and (2) dealing with unbalanced repeated measures. In this paper, we address these issues with a modeling framework that implements the deletion/substitution/addition (DSA) machine learning algorithm that uses a generalized linear model to average over unbalanced temporal observations. Models were derived for fine particulate matter with aerodynamic diameter of 2.5 microns or less (PM2.5) and nitrogen dioxide (NO2) using monthly observations. We used 4119 observations at 108 sites and 15,301 observations at 138 sites for PM2.5 and NO2, respectively. We derived models with good predictive capacity (cross-validated-R2 values were 0.65 and 0.71 for PM2.5 and NO2, respectively). By addressing these two shortcomings in current approaches to LUR modeling, we have developed a framework that minimizes arbitrary decisions during the model selection process. We have also demonstrated how to integrate temporally unbalanced data in a theoretically sound manner. These developments could have widespread applicability for future LUR modeling efforts.
Boyte, Stephen P.; Wylie, Bruce K.; Major, Donald J.; Brown, Jesslyn F.
2015-01-01
Cheatgrass exhibits spatial and temporal phenological variability across the Great Basin as described by ecological models formed using remote sensing and other spatial data-sets. We developed a rule-based, piecewise regression-tree model trained on 99 points that used three data-sets – latitude, elevation, and start of season time based on remote sensing input data – to estimate cheatgrass beginning of spring growth (BOSG) in the northern Great Basin. The model was then applied to map the location and timing of cheatgrass spring growth for the entire area. The model was strong (R2 = 0.85) and predicted an average cheatgrass BOSG across the study area of 29 March–4 April. Of early cheatgrass BOSG areas, 65% occurred at elevations below 1452 m. The highest proportion of cheatgrass BOSG occurred between mid-April and late May. Predicted cheatgrass BOSG in this study matched well with previous Great Basin cheatgrass green-up studies.
Momen, Bahram; Behling, Shawna J.; Lawrence, Greg B.; Sullivan, Joseph H.
2015-01-01
Decline of sugar maple in North American forests has been attributed to changes in soil calcium (Ca) and nitrogen (N) by acidic precipitation. Although N is an essential and usually a limiting factor in forests, atmospheric N deposition may cause N-saturation leading to loss of soil Ca. Such changes can affect carbon gain and growth of sugar maple trees and seedlings. We applied a 2² factorial arrangement of N and dolomitic limestone containing Ca and magnesium (Mg) to 12 forest plots in the Catskill Mountain region of NY, USA. To quantify the short-term effects, we measured photosynthetic-light responses of sugar maple mature trees and seedlings two or three times during two summers. We estimated maximum net photosynthesis (An-max) and its related light intensity (PAR at An-max), apparent quantum efficiency (Aqe), and light compensation point (LCP). To quantify the long-term effects, we measured basal area of living mature trees before and 4 and 8 years after treatment applications. Soil and foliar chemistry variables were also measured. Dolomitic limestone increased Ca, Mg, and pH in the soil Oe horizon. Mg was increased in the B horizon when comparing the plots receiving N with those receiving CaMg. In mature trees, foliar Ca and Mg concentrations were higher in the CaMg and N+CaMg plots than in the reference or N plots; foliar Ca concentration was higher in the N+CaMg plots compared with the CaMg plots, foliar Mg was higher in the CaMg plots than the N+CaMg plots; An-max was maximized due to N+CaMg treatment; Aqe decreased by N addition; and PAR at An-max increased by N or CaMg treatments alone, but the increase was maximized by their combination. No treatment effect was detected on basal areas of living mature trees four or eight years after treatment applications. In seedlings, An-max was increased by N+CaMg addition. The reference plots had an open herbaceous layer, but the plots receiving N had a dense monoculture of common woodfern in the forest floor
Villagra, Mariana; Campanello, Paula I; Bucci, Sandra J; Goldstein, Guillermo
2013-12-01
Leaves can be both a hydraulic bottleneck and a safety valve against hydraulic catastrophic dysfunctions, and thus changes in traits related to water movement in leaves and associated costs may be critical for the success of plant growth. A 4-year fertilization experiment with nitrogen (N) and phosphorus (P) addition was done in a semideciduous Atlantic forest in northeastern Argentina. Saplings of five dominant canopy species were grown in similar gaps inside the forests (five control and five N + P addition plots). Leaf lifespan (LL), leaf mass per unit area (LMA), leaf and stem vulnerability to cavitation, leaf hydraulic conductance (K(leaf_area) and K(leaf_mass)) and leaf turgor loss point (TLP) were measured in the five species and in both treatments. Leaf lifespan tended to decrease with the addition of fertilizers, and LMA was significantly higher in plants with nutrient addition compared with individuals in control plots. The vulnerability to cavitation of leaves (P50(leaf)) either increased or decreased with the nutrient treatment depending on the species, but the average P50(leaf) did not change with nutrient addition. The P50(leaf) decreased linearly with increasing LMA and LL across species and treatments. These trade-offs have an important functional significance because more expensive (higher LMA) and less vulnerable leaves (lower P50(leaf)) are retained for a longer period of time. Osmotic potentials at TLP and at full turgor became more negative with decreasing P50(leaf) regardless of nutrient treatment. The K(leaf) on a mass basis was negatively correlated with LMA and LL, indicating that there is a carbon cost associated with increased water transport that is compensated by a longer LL. The vulnerability to cavitation of stems and leaves were similar, particularly in fertilized plants. Leaves in the species studied may not function as safety valves at low water potentials to protect the hydraulic pathway from water stress-induced cavitation
Yan, Junhua; Zhang, Deqiang; Liu, Juxiu; Zhou, Guoyi
2014-07-01
Carbon dioxide (CO2) enhancement (eCO2) and N addition (aN) have been shown to increase net primary production (NPP) and to affect water-use efficiency (WUE) for many temperate ecosystems, but few studies have been made on subtropical tree species. This study compared the responses of NPP and WUE from a mesocosm comprising five subtropical tree species to eCO2 (700 ppm), aN (10 g N m⁻² yr⁻¹) and eCO2 × aN using open-top chambers. Our results showed that mean annual ecosystem NPP did not change significantly under eCO2, but increased by 56% under aN and 64% under eCO2 × aN. Ecosystem WUE increased by 14%, 55%, and 61% under eCO2, aN and eCO2 × aN, respectively. We found that the observed responses of ecosystem WUE were largely driven by the responses of ecosystem NPP. Statistical analysis showed that there were no significant interactions between eCO2 and aN on ecosystem NPP (P = 0.731) or WUE (P = 0.442). Our results showed that increasing N deposition was likely to have much stronger effects on ecosystem NPP and WUE than increasing CO2 concentration for the subtropical forests. However, different tree species responded quite differently. aN significantly increased annual NPP of the fast-growing species (Schima superba). Nitrogen-fixing species (Ormosia pinnata) grew significantly faster only under eCO2 × aN. eCO2 had no effects on annual NPP of those two species but significantly increased annual NPP of the other two species (Castanopsis hystrix and Acmena acuminatissima). Differential responses of the NPP among different tree species to eCO2 and aN will likely have significant implications for the species composition of subtropical forests under future global change. PMID:24339232
NASA Astrophysics Data System (ADS)
Grégoire, G.
2014-12-01
Logistic regression was originally intended to explain the relationship between the probability of an event and a set of covariables. The model's coefficients can be interpreted via the odds and odds ratio, which are presented in the introduction of the chapter. When the observations are obtained individually, we speak of binary logistic regression. When they are grouped, the logistic regression is said to be binomial. In our presentation we mainly focus on the binary case. For statistical inference the main tool is the maximum likelihood methodology: we present the Wald, Rao and likelihood ratio results and their use to compare nested models. The problems we intend to deal with are essentially the same as in multiple linear regression: testing global effect, individual effect, selection of variables to build a model, measure of the fitness of the model, prediction of new values, etc. The methods are demonstrated on data sets using R. Finally we briefly consider the binomial case and the situation where we are interested in several events, that is the polytomous (multinomial) logistic regression and the particular case of ordinal logistic regression.
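As a small illustration of the odds interpretation mentioned in this abstract, the Python sketch below converts probabilities to odds and checks that, in a one-covariate logistic model, the odds ratio for a one-unit increase in the covariate equals the exponential of its coefficient. The function names and the coefficient values are illustrative assumptions; the chapter itself works in R and its data sets are not reproduced here.

```python
import math

def odds(p):
    """Odds of an event that has probability p, with 0 < p < 1."""
    return p / (1.0 - p)

def odds_ratio(p1, p0):
    """Ratio of the odds at probability p1 to the odds at probability p0."""
    return odds(p1) / odds(p0)

def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical fitted model: logit(p) = b0 + b1 * x
b0, b1 = -1.0, 0.7

p_at_x0 = inv_logit(b0)        # event probability when x = 0
p_at_x1 = inv_logit(b0 + b1)   # event probability when x = 1

# Odds ratio for a one-unit increase in x, computed from the probabilities.
or_from_probs = odds_ratio(p_at_x1, p_at_x0)
# The same quantity read directly off the coefficient.
or_from_coef = math.exp(b1)
```

The two quantities agree up to floating-point error, which is why a logistic coefficient is usually reported as exp(b1) when an odds-ratio reading is wanted.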
Pouliakis, Abraham; Karakitsou, Efrossyni; Chrelias, Charalampos; Pappas, Asimakis; Panayiotides, Ioannis; Valasoulis, George; Kyrgiou, Maria; Paraskevaidis, Evangelos; Karakitsos, Petros
2015-01-01
Objective. Nowadays numerous ancillary techniques detecting HPV DNA and mRNA compete with cytology; however, no perfect test exists. In this study we evaluated classification and regression trees (CARTs) for the production of triage rules and for estimating the risk of cervical intraepithelial neoplasia (CIN) in cases with ASCUS+ in cytology. Study Design. We used 1625 cases. In contrast to other approaches we used missing data to increase the data volume, obtain more accurate results, and simulate real conditions in the everyday practice of gynecologic clinics and laboratories. The proposed CART was based on the cytological result, HPV DNA typing, HPV mRNA detection based on NASBA and flow cytometry, p16 immunocytochemical expression, and finally age and parous status. Results. Algorithms useful for the triage of women were produced; gynecologists could apply these in conjunction with available examination results and arrive at an estimate of the risk for a woman to harbor CIN, expressed as a probability. Conclusions. The most important test was the cytological examination; however, the CART handled cases with inadequate cytological outcome and increased the diagnostic accuracy by exploiting the results of ancillary techniques even if there were inadequate missing data. The CART performance was better than any other single test involved in this study. PMID:26339651
Huang, Dong; Cabral, Ricardo; De la Torre, Fernando
2016-02-01
Discriminative methods (e.g., kernel regression, SVM) have been extensively used to solve problems such as object recognition, image alignment and pose estimation from images. These methods typically map image features ( X) to continuous (e.g., pose) or discrete (e.g., object category) values. A major drawback of existing discriminative methods is that samples are directly projected onto a subspace and hence fail to account for outliers common in realistic training sets due to occlusion, specular reflections or noise. It is important to notice that existing discriminative approaches assume the input variables X to be noise free. Thus, discriminative methods experience significant performance degradation when gross outliers are present. Despite its obvious importance, the problem of robust discriminative learning has been relatively unexplored in computer vision. This paper develops the theory of robust regression (RR) and presents an effective convex approach that uses recent advances on rank minimization. The framework applies to a variety of problems in computer vision including robust linear discriminant analysis, regression with missing data, and multi-label classification. Several synthetic and real examples with applications to head pose estimation from images, image and video classification and facial attribute classification with missing data are used to illustrate the benefits of RR. PMID:26761740
Gerber, Samuel; Rübel, Oliver; Bremer, Peer-Timo; Pascucci, Valerio; Whitaker, Ross T.
2012-01-01
This paper introduces a novel partition-based regression approach that incorporates topological information. Partition-based regression typically introduces a quality-of-fit-driven decomposition of the domain. The emphasis in this work is on a topologically meaningful segmentation. Thus, the proposed regression approach is based on a segmentation induced by a discrete approximation of the Morse-Smale complex. This yields a segmentation with partitions corresponding to regions of the function with a single minimum and maximum that are often well approximated by a linear model. This approach yields regression models that are amenable to interpretation and have good predictive capacity. Typically, regression estimates are quantified by their geometrical accuracy. For the proposed regression, an important aspect is the quality of the segmentation itself. Thus, this paper introduces a new criterion that measures the topological accuracy of the estimate. The topological accuracy provides a complementary measure to the classical geometrical error measures and is very sensitive to over-fitting. The Morse-Smale regression is compared to state-of-the-art approaches in terms of geometry and topology and yields comparable or improved fits in many cases. Finally, a detailed study on climate-simulation data demonstrates the application of the Morse-Smale regression. Supplementary materials are available online and contain an implementation of the proposed approach in the R package msr, an analysis and simulations on the stability of the Morse-Smale complex approximation and additional tables for the climate-simulation study. PMID:23687424
Gerber, Samuel; Rubel, Oliver; Bremer, Peer-Timo; Pascucci, Valerio; Whitaker, Ross T.
2012-01-19
This paper introduces a novel partition-based regression approach that incorporates topological information. Partition-based regression typically introduces a quality-of-fit-driven decomposition of the domain. The emphasis in this work is on a topologically meaningful segmentation. Thus, the proposed regression approach is based on a segmentation induced by a discrete approximation of the Morse–Smale complex. This yields a segmentation with partitions corresponding to regions of the function with a single minimum and maximum that are often well approximated by a linear model. This approach yields regression models that are amenable to interpretation and have good predictive capacity. Typically, regression estimates are quantified by their geometrical accuracy. For the proposed regression, an important aspect is the quality of the segmentation itself. Thus, this article introduces a new criterion that measures the topological accuracy of the estimate. The topological accuracy provides a complementary measure to the classical geometrical error measures and is very sensitive to overfitting. The Morse–Smale regression is compared to state-of-the-art approaches in terms of geometry and topology and yields comparable or improved fits in many cases. Finally, a detailed study on climate-simulation data demonstrates the application of the Morse–Smale regression. Supplementary Materials are available online and contain an implementation of the proposed approach in the R package msr, an analysis and simulations on the stability of the Morse–Smale complex approximation, and additional tables for the climate-simulation study.
NASA Astrophysics Data System (ADS)
Loschetter, Annick; Rohmer, Jérémy
2016-04-01
Standard and new-generation monitoring observations provide important information, in almost real time, about the evolution of the volcanic system. These observations are used to update the model and contribute to a better hazard assessment and to support decision making concerning potential evacuation. The framework BET_EF (based on Bayesian Event Tree) developed by INGV makes it possible to integrate information from monitoring with a view to decision making. Using this framework, the objectives of the present work are i. to propose a method to assess the added value of information (within the Value Of Information (VOI) theory) from monitoring; ii. to perform sensitivity analysis on the different parameters that influence the VOI from monitoring. VOI consists in assessing the possible increase in expected value provided by gathering information, for instance through monitoring. Basically, the VOI is the difference between the value with information and the value without additional information in a Cost-Benefit approach. This theory is well suited to deal with situations that can be represented in the form of a decision tree such as the BET_EF tool. Reference values and ranges of variation (for sensitivity analysis) were defined for input parameters, based on data from the MESIMEX exercise (performed at Vesuvio volcano in 2006). Complementary methods for sensitivity analyses were implemented: local, global using Sobol' indices and regional using Contribution to Sample Mean and Variance plots. The results (specific to the case considered) obtained with the different techniques are in good agreement and enable answering the following questions: i. Which characteristics of monitoring are important for early warning (reliability)? ii. How do experts' opinions influence the hazard assessment and thus the decision? Concerning the characteristics of monitoring, the more influential parameters are the means rather than the variances for the case considered
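The definition of VOI given in this abstract (value with information minus value without it, in a cost-benefit setting) can be made concrete with a toy two-action decision problem. The Python sketch below is not the BET_EF tool and uses no MESIMEX data; the evacuate-or-stay setup, the function names, and all numbers (eruption probability, evacuation cost, loss, alarm rates) are hypothetical assumptions chosen only to show the arithmetic.

```python
def cost_without_info(p_eruption, cost_evac, loss):
    """Expected cost of the best action when only the prior is known.

    Evacuating always costs `cost_evac`; staying costs `loss` if an
    eruption occurs.
    """
    return min(cost_evac, p_eruption * loss)

def cost_with_info(p_eruption, cost_evac, loss, tpr, fpr):
    """Expected cost when a binary monitoring signal is seen before deciding.

    tpr: probability of an alarm given an eruption is coming.
    fpr: probability of an alarm given no eruption is coming.
    """
    p_alarm = tpr * p_eruption + fpr * (1.0 - p_eruption)
    total = 0.0
    # Loop over the two possible signals: alarm and no alarm.
    for p_signal, likelihood in ((p_alarm, tpr), (1.0 - p_alarm, 1.0 - tpr)):
        if p_signal == 0.0:
            continue
        posterior = likelihood * p_eruption / p_signal  # Bayes' rule
        total += p_signal * min(cost_evac, posterior * loss)
    return total

def value_of_information(p_eruption, cost_evac, loss, tpr, fpr):
    """Reduction in expected cost obtained by observing the signal first."""
    return (cost_without_info(p_eruption, cost_evac, loss)
            - cost_with_info(p_eruption, cost_evac, loss, tpr, fpr))

# Hypothetical inputs: 10% prior, evacuation cost 1, loss 20, fairly reliable monitoring.
voi_reliable = value_of_information(0.1, 1.0, 20.0, tpr=0.9, fpr=0.1)
# An uninformative signal (same alarm rate in both cases) should be worth nothing.
voi_useless = value_of_information(0.1, 1.0, 20.0, tpr=0.5, fpr=0.5)
```

Varying `tpr`, `fpr`, or the prior one at a time and recomputing the VOI gives a crude local sensitivity analysis in the spirit of the one the abstract describes.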
Schmid, Matthias; Wickler, Florian; Maloney, Kelly O.; Mitchell, Richard; Fenske, Nora; Mayr, Andreas
2013-01-01
Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fit a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established - yet unstable - approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression, estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures. PMID:23626706
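To make the beta regression setup in this abstract concrete, the Python sketch below writes the beta density in the commonly used mean-precision form (shape parameters mu*phi and (1-mu)*phi) with a logit link for the mean, and evaluates the negative log-likelihood that the classical maximum likelihood approach would minimize. This parameterization is an assumption on my part rather than something stated in the abstract, and the sketch does not implement the boosting algorithm the paper proposes; function names and data values are hypothetical.

```python
import math

def beta_logpdf(y, mu, phi):
    """Log-density of a beta variable with mean mu and precision phi.

    The usual shape parameters are a = mu * phi and b = (1 - mu) * phi.
    """
    a = mu * phi
    b = (1.0 - mu) * phi
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))

def beta_variance(mu, phi):
    """Variance implied by the mean-precision parameterization."""
    return mu * (1.0 - mu) / (1.0 + phi)

def inv_logit(eta):
    """Inverse logit link, mapping a linear predictor into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def neg_log_likelihood(ys, xs, b0, b1, phi):
    """Negative log-likelihood of a one-covariate beta regression."""
    return -sum(beta_logpdf(y, inv_logit(b0 + b1 * x), phi)
                for y, x in zip(ys, xs))

# Hypothetical percentage responses (as proportions) and one covariate.
ys = [0.30, 0.60]
xs = [0.0, 1.0]
nll_flat = neg_log_likelihood(ys, xs, b0=0.0, b1=0.0, phi=2.0)
```

With mu = 0.5 and phi = 2 the distribution is uniform on (0, 1), so the log-density is zero everywhere and the variance is 1/12, which gives a quick sanity check on the formulas.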
2011-01-01
Background: Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but has presently a limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning methods like Neural Networks, Support Vector Machines and Random Forests can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven non parametric classifiers derived from data mining methods (Multilayer Perceptrons Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, Area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using the Friedman's nonparametric test. Results: Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the larger overall classification accuracy (Median (Me) = 0.76) and area under the ROC (Me = 0.90). However this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73) with high area under the ROC (Me = 0.73), specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC (Me = 0.72), specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed
The Regression Trunk Approach to Discover Treatment Covariate Interaction
ERIC Educational Resources Information Center
Dusseldorp, Elise; Meulman, Jacqueline J.
2004-01-01
The regression trunk approach (RTA) is an integration of regression trees and multiple linear regression analysis. In this paper RTA is used to discover treatment covariate interactions, in the regression of one continuous variable on a treatment variable with "multiple" covariates. The performance of RTA is compared to the classical method of…
Data Mining within a Regression Framework
NASA Astrophysics Data System (ADS)
Berk, Richard A.
Regression analysis can imply a far wider range of statistical procedures than often appreciated. In this chapter, a number of common Data Mining procedures are discussed within a regression framework. These include non-parametric smoothers, classification and regression trees, bagging, and random forests. In each case, the goal is to characterize one or more of the distributional features of a response conditional on a set of predictors.
DIF Trees: Using Classification Trees to Detect Differential Item Functioning
ERIC Educational Resources Information Center
Vaughn, Brandon K.; Wang, Qiu
2010-01-01
A nonparametric tree classification procedure is used to detect differential item functioning for items that are dichotomously scored. Classification trees are shown to be an alternative procedure to detect differential item functioning other than the use of traditional Mantel-Haenszel and logistic regression analysis. A nonparametric…
Thompson, Robert S.; Anderson, Katherine H.; Bartlein, Patrick J.; Smith, Sharon A.
2000-01-01
This volume explores the continental-scale relations between climate and the geographic ranges of woody plant species in North America. A 25-km equal-area grid of modern climatic and bioclimatic parameters for North America was constructed from instrumental weather records. The geographic distributions of selected tree and shrub species were digitized, and the presence or absence of each species was determined for each cell on the 25-km grid, thus providing a basis for comparing climatic data and species' distribution.
ERIC Educational Resources Information Center
Pedrini, D. T.; Pedrini, Bonnie C.
Regression, another mechanism studied by Sigmund Freud, has had much research, e.g., hypnotic regression, frustration regression, schizophrenic regression, and infra-human-animal regression (often directly related to fixation). Many investigators worked with hypnotic age regression, which has a long history, going back to Russian reflexologists.…
ERIC Educational Resources Information Center
Dochinger, Leon S.
To help urban, suburban, and rural tree owners know about air pollution's effects on trees and their tolerance and intolerance to pollutants, the USDA Forest Service has prepared this booklet. It answers the following questions about atmospheric pollution: Where does it come from? What can it do to trees? and What can we do about it? In addition,…
ERIC Educational Resources Information Center
Nature Study, 1998
1998-01-01
Presents a Project Learning Tree (PLT) activity that has students investigate and compare the lifecycle of a tree to other living things and the tree's role in the ecosystem. Includes background material as well as step-by-step instructions, variation and enrichment ideas, assessment opportunities, and student worksheets. (SJR)
Lee, Myung Hee; Liu, Yufeng
2013-12-01
The continuum regression technique provides an appealing regression framework connecting ordinary least squares, partial least squares and principal component regression in one family. It offers some insight on the underlying regression model for a given application. Moreover, it helps to provide deep understanding of various regression techniques. Despite the useful framework, however, the current development on continuum regression is only for linear regression. In many applications, nonlinear regression is necessary. The extension of continuum regression from linear models to nonlinear models using kernel learning is considered. The proposed kernel continuum regression technique is quite general and can handle very flexible regression model estimation. An efficient algorithm is developed for fast implementation. Numerical examples have demonstrated the usefulness of the proposed technique. PMID:24058224
Predictive Classification Trees
NASA Astrophysics Data System (ADS)
Dlugosz, Stephan; Müller-Funk, Ulrich
CART (Breiman et al., Classification and Regression Trees, Chapman and Hall, New York, 1984) and (exhaustive) CHAID (Kass, Appl Stat 29:119-127, 1980) figure prominently among the procedures actually used in data-based management, etc. CART is a well-established procedure that produces binary trees. CHAID, in contrast, admits multiple splittings, a feature that allows the splitting variable to be exploited more extensively. On the other hand, that procedure depends on premises that are questionable in practical applications, because CHAID relies on simultaneous chi-square or F-tests. The null distribution of the second test statistic, for instance, relies on a normality assumption that is not plausible in a data mining context. Moreover, none of these procedures, as implemented in SPSS for instance, takes ordinal dependent variables into account. In the paper we suggest an alternative tree-algorithm that: Requires explanatory categorical variables
Wrong Signs in Regression Coefficients
NASA Technical Reports Server (NTRS)
McGee, Holly
1999-01-01
When using parametric cost estimation, it is important to note the possibility of the regression coefficients having the wrong sign. A wrong sign is defined as a sign on the regression coefficient opposite to the researcher's intuition and experience. Some possible causes for the wrong sign discussed in this paper are a small range of x's, leverage points, missing variables, multicollinearity, and computational error. Additionally, techniques for determining the cause of the wrong sign are given.
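One of the causes listed above, a missing variable, can be illustrated with a short Python sketch. The data are made up for the example (they are not from the paper): the response rises with both x1 and x2, while x1 and x2 are negatively correlated. All names and numbers are assumptions chosen for illustration.

```python
# Hypothetical data: y = 1*x1 + 2*x2 exactly, with x1 and x2 negatively correlated.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [10.0, 8.0, 7.0, 4.0, 2.0]
y = [a + 2.0 * b for a, b in zip(x1, x2)]

def centered(values):
    """Subtract the mean from every element."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(u, v):
    """Sum of element-wise products."""
    return sum(a * b for a, b in zip(u, v))

c1, c2, cy = centered(x1), centered(x2), centered(y)

# Simple regression of y on x1 only (x2 is the missing variable).
slope_x1_only = dot(c1, cy) / dot(c1, c1)

# Two-predictor least squares via the 2x2 normal equations on centered data.
s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
s1y, s2y = dot(c1, cy), dot(c2, cy)
det = s11 * s22 - s12 * s12
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det

print("x1 only:", slope_x1_only)
print("x1 and x2:", b1, b2)
```

With these numbers the x1-only slope comes out negative even though the true effect of x1 is positive; once x2 is included, the fitted coefficients return to roughly 1 and 2.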
Assessing visual green effects of individual urban trees using airborne Lidar data.
Chen, Ziyue; Xu, Bing; Gao, Bingbo
2015-12-01
Urban trees benefit people's daily life in terms of air quality, local climate, recreation and aesthetics. Among these functions, a growing number of studies have been conducted to understand the relationship between residents' preference towards local environments and visual green effects of urban greenery. However, except for on-site photography, there are few quantitative methods to calculate green visibility, especially tree green visibility, from viewers' perspectives. To fill this research gap, a case study was conducted in the city of Cambridge, which has a diversity of tree species, sizes and shapes. Firstly, a photograph-based survey was conducted to approximate the actual value of visual green effects of individual urban trees. In addition, small footprint airborne Lidar (Light detection and ranging) data was employed to measure the size and shape of individual trees. Next, correlations between visual tree green effects and tree structural parameters were examined. Through experiments and gradual refinement, a regression model with satisfactory R² and limited large errors is proposed. Considering the diversity of sample trees and the result of cross-validation, this model has the potential to be applied to other study sites. This research provides urban planners and decision makers with an innovative method to analyse and evaluate landscape patterns in terms of tree greenness. PMID:26218562
Abstract Expression Grammar Symbolic Regression
NASA Astrophysics Data System (ADS)
Korns, Michael F.
This chapter examines the use of Abstract Expression Grammars to perform the entire Symbolic Regression process without the use of Genetic Programming per se. The techniques explored produce a symbolic regression engine which has absolutely no bloat, which allows total user control of the search space and output formulas, and which is faster and more accurate than the engines produced in our previous papers using Genetic Programming. The genome is an all-vector structure with four chromosomes plus additional epigenetic and constraint vectors, allowing total user control of the search space and the final output formulas. A combination of specialized compiler techniques, genetic algorithms, particle swarm, aged layered populations, plus discrete and continuous differential evolution is used to produce an improved symbolic regression system. Nine base test cases from the literature are used to test the improvement in speed and accuracy. The improved results indicate that these techniques move us a big step closer toward future industrial-strength symbolic regression systems.
Time-Warped Geodesic Regression
Hong, Yi; Singh, Nikhil; Kwitt, Roland; Niethammer, Marc
2016-01-01
We consider geodesic regression with parametric time-warps. This makes it possible, for example, to capture saturation effects as typically observed during brain development or degeneration. While highly flexible models to analyze time-varying image and shape data based on generalizations of splines and polynomials have been proposed recently, they come at the cost of substantially more complex inference. Our focus in this paper is therefore to keep the model and its inference as simple as possible while still capturing the expected biological variation. We demonstrate that by augmenting geodesic regression with parametric time-warp functions, we can achieve comparable flexibility to more complex models while retaining model simplicity. In addition, the time-warp parameters provide useful information about underlying anatomical changes, as demonstrated for the analysis of corpora callosa and rat calvariae. We exemplify our strategy for shape regression on the Grassmann manifold, but note that the method is generally applicable for time-warped geodesic regression. PMID:25485368
ERIC Educational Resources Information Center
Tolman, Marvin
2005-01-01
Students love outdoor activities and will love them even more when they build confidence in their tree identification and measurement skills. Through these activities, students will learn to identify the major characteristics of trees and discover how the pace--a nonstandard measuring unit--can be used to estimate not only distances but also the…
ERIC Educational Resources Information Center
Center for Environmental Study, Grand Rapids, MI.
Tree Amigos is a special cross-cultural program that uses trees as a common bond to bring the people of the Americas together in unique partnerships to preserve and protect the shared global environment. It is a tangible program that embodies the philosophy that individuals, acting together, can make a difference. This resource book contains…
Phylogenetic trees in bioinformatics
Burr, Tom L
2008-01-01
Genetic data is often used to infer evolutionary relationships among a collection of viruses, bacteria, animal or plant species, or other operational taxonomic units (OTU). A phylogenetic tree depicts such relationships and provides a visual representation of the estimated branching order of the OTUs. Tree estimation is unique for several reasons, including: the types of data used to represent each OTU; the use of probabilistic nucleotide substitution models; the inference goals involving both tree topology and branch length; and the huge number of possible trees for a given sample of a very modest number of OTUs, which implies that finding the best tree(s) to describe the genetic data for each OTU is computationally demanding. Bioinformatics is too large a field to review here. We focus on that aspect of bioinformatics that includes study of similarities in genetic data from multiple OTUs. Although research questions are diverse, a common underlying challenge is to estimate the evolutionary history of the OTUs. Therefore, this paper reviews the role of phylogenetic tree estimation in bioinformatics, available methods and software, and identifies areas for additional research and development.
Bayesian Evidence Framework for Decision Tree Learning
NASA Astrophysics Data System (ADS)
Chatpatanasiri, Ratthachat; Kijsirikul, Boonserm
2005-11-01
This work is primarily interested in the problem of selecting a single decision (or classification) tree given the observed data. Although a single decision tree runs a high risk of overfitting, the induced tree is easily interpreted. Researchers have invented various methods such as tree pruning or tree averaging for preventing the induced tree from overfitting (and from underfitting) the data. In this paper, instead of using those conventional approaches, we apply the Bayesian evidence framework of Gull, Skilling and Mackay to a process of selecting a decision tree. We derive a formal function to measure `the fitness' for each decision tree given a set of observed data. Our method, in fact, is analogous to a well-known Bayesian model selection method for interpolating noisy continuous-value data. As in regression problems, given reasonable assumptions, this derived score function automatically quantifies the principle of Ockham's razor, and hence reasonably deals with the underfitting-overfitting tradeoff.
Eberly, Lynn E
2007-01-01
This chapter describes multiple linear regression, a statistical approach used to describe the simultaneous associations of several variables with one continuous outcome. Important steps in using this approach include estimation and inference, variable selection in model building, and assessing model fit. The special cases of regression with interactions among the variables, polynomial regression, regressions with categorical (grouping) variables, and separate slopes models are also covered. Examples in microbiology are used throughout. PMID:18450050
Energy Science and Technology Software Center (ESTSC)
2015-09-09
The NCCS Regression Test Harness is a software package that provides a framework to perform regression and acceptance testing on NCCS High Performance Computers. The package is written in Python, and its only dependency is a Subversion repository to store the regression tests.
Orthogonal Regression and Equivariance.
ERIC Educational Resources Information Center
Blankmeyer, Eric
Ordinary least-squares regression treats the variables asymmetrically, designating a dependent variable and one or more independent variables. When it is not obvious how to make this distinction, a researcher may prefer to use orthogonal regression, which treats the variables symmetrically. However, the usual procedure for orthogonal regression is…
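The asymmetry described above can be sketched for the two-variable case in Python. The closed-form slope below is the standard one for a line that minimizes perpendicular distances; it is offered as an assumption about what "orthogonal regression" means here, not as the paper's own procedure, and the data are invented.

```python
import math

def moments(x, y):
    """Return centered sums of squares and cross-products (Sxx, Syy, Sxy)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    sxx, _, sxy = moments(x, y)
    return sxy / sxx

def orthogonal_slope(x, y):
    """Slope minimizing perpendicular distances; assumes Sxy is nonzero."""
    sxx, syy, sxy = moments(x, y)
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)

# Hypothetical data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]

# Swapping the roles of x and y inverts the orthogonal slope exactly,
# whereas the two OLS slopes multiply to r squared, which is below 1 here.
print(orthogonal_slope(x, y) * orthogonal_slope(y, x))
print(ols_slope(x, y) * ols_slope(y, x))
```

The first product is 1 up to rounding, so the fitted line does not depend on which variable is called dependent; the second product falls short of 1, which is the asymmetry of ordinary least squares.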
Unitary Response Regression Models
ERIC Educational Resources Information Center
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
Rate of tree carbon accumulation increases continuously with tree size.
Stephenson, N L; Das, A J; Condit, R; Russo, S E; Baker, P J; Beckman, N G; Coomes, D A; Lines, E R; Morris, W K; Rüger, N; Alvarez, E; Blundo, C; Bunyavejchewin, S; Chuyong, G; Davies, S J; Duque, A; Ewango, C N; Flores, O; Franklin, J F; Grau, H R; Hao, Z; Harmon, M E; Hubbell, S P; Kenfack, D; Lin, Y; Makana, J-R; Malizia, A; Malizia, L R; Pabst, R J; Pongpattananurak, N; Su, S-H; Sun, I-F; Tan, S; Thomas, D; van Mantgem, P J; Wang, X; Wiser, S K; Zavala, M A
2014-03-01
Forests are major components of the global carbon cycle, providing substantial feedback to atmospheric greenhouse gas concentrations. Our ability to understand and predict changes in the forest carbon cycle--particularly net primary productivity and carbon storage--increasingly relies on models that represent biological processes across several scales of biological organization, from tree leaves to forest stands. Yet, despite advances in our understanding of productivity at the scales of leaves and stands, no consensus exists about the nature of productivity at the scale of the individual tree, in part because we lack a broad empirical assessment of whether rates of absolute tree mass growth (and thus carbon accumulation) decrease, remain constant, or increase as trees increase in size and age. Here we present a global analysis of 403 tropical and temperate tree species, showing that for most species mass growth rate increases continuously with tree size. Thus, large, old trees do not act simply as senescent carbon reservoirs but actively fix large amounts of carbon compared to smaller trees; at the extreme, a single big tree can add the same amount of carbon to the forest within a year as is contained in an entire mid-sized tree. The apparent paradoxes of individual tree growth increasing with tree size despite declining leaf-level and stand-level productivity can be explained, respectively, by increases in a tree's total leaf area that outpace declines in productivity per unit of leaf area and, among other factors, age-related reductions in population density. Our results resolve conflicting assumptions about the nature of tree growth, inform efforts to understand and model forest carbon dynamics, and have additional implications for theories of resource allocation and plant senescence. PMID:24429523
Rate of tree carbon accumulation increases continuously with tree size
NASA Astrophysics Data System (ADS)
Stephenson, N. L.; Das, A. J.; Condit, R.; Russo, S. E.; Baker, P. J.; Beckman, N. G.; Coomes, D. A.; Lines, E. R.; Morris, W. K.; Rüger, N.; Álvarez, E.; Blundo, C.; Bunyavejchewin, S.; Chuyong, G.; Davies, S. J.; Duque, Á.; Ewango, C. N.; Flores, O.; Franklin, J. F.; Grau, H. R.; Hao, Z.; Harmon, M. E.; Hubbell, S. P.; Kenfack, D.; Lin, Y.; Makana, J.-R.; Malizia, A.; Malizia, L. R.; Pabst, R. J.; Pongpattananurak, N.; Su, S.-H.; Sun, I.-F.; Tan, S.; Thomas, D.; van Mantgem, P. J.; Wang, X.; Wiser, S. K.; Zavala, M. A.
2014-03-01
Forests are major components of the global carbon cycle, providing substantial feedback to atmospheric greenhouse gas concentrations. Our ability to understand and predict changes in the forest carbon cycle--particularly net primary productivity and carbon storage--increasingly relies on models that represent biological processes across several scales of biological organization, from tree leaves to forest stands. Yet, despite advances in our understanding of productivity at the scales of leaves and stands, no consensus exists about the nature of productivity at the scale of the individual tree, in part because we lack a broad empirical assessment of whether rates of absolute tree mass growth (and thus carbon accumulation) decrease, remain constant, or increase as trees increase in size and age. Here we present a global analysis of 403 tropical and temperate tree species, showing that for most species mass growth rate increases continuously with tree size. Thus, large, old trees do not act simply as senescent carbon reservoirs but actively fix large amounts of carbon compared to smaller trees; at the extreme, a single big tree can add the same amount of carbon to the forest within a year as is contained in an entire mid-sized tree. The apparent paradoxes of individual tree growth increasing with tree size despite declining leaf-level and stand-level productivity can be explained, respectively, by increases in a tree's total leaf area that outpace declines in productivity per unit of leaf area and, among other factors, age-related reductions in population density. Our results resolve conflicting assumptions about the nature of tree growth, inform efforts to understand and model forest carbon dynamics, and have additional implications for theories of resource allocation and plant senescence.
Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling
NASA Astrophysics Data System (ADS)
Galelli, S.; Castelletti, A.
2013-07-01
Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modelling. In this paper, we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modelling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalisation property and tendency to overfit of traditional standalone decision trees (e.g. CART); (ii) is computationally efficient; and (iii) allows the relative importance of the input variables to be inferred, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analysed on two real-world case studies - Marina catchment (Singapore) and Canning River (Western Australia) - representing two different morphoclimatic contexts. The evaluation is performed against other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparably to the best of the benchmarks (i.e. M5) in both watersheds, while outperforming the other approaches in terms of computational requirement when adopted on large datasets. In addition, the ranking of the input variables provided can be given a physically meaningful interpretation.
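The randomized split step that gives Extra-Trees their name can be sketched as follows. This is a minimal illustration of the split rule as it is commonly described (a uniformly random cut-point for each of k randomly chosen inputs, keeping the candidate with the largest variance reduction), not the implementation used in the study. The function names, the parameter k, and the toy data are assumptions.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def extra_trees_split(rows, ys, k, rng):
    """Pick the best of k random (feature, cut-point) candidates for one node."""
    n_features = len(rows[0])
    best = None
    for j in rng.sample(range(n_features), k):
        column = [row[j] for row in rows]
        lo, hi = min(column), max(column)
        if lo == hi:
            continue  # a constant feature cannot split the node
        cut = rng.uniform(lo, hi)  # the cut-point is random, not optimized
        left = [y for row, y in zip(rows, ys) if row[j] < cut]
        right = [y for row, y in zip(rows, ys) if row[j] >= cut]
        score = sse(ys) - sse(left) - sse(right)  # variance reduction
        if best is None or score > best[0]:
            best = (score, j, cut)
    return best

# Hypothetical node data: two inputs per sample and one response.
rows = [(0.1, 5.0), (0.4, 3.0), (0.6, 4.0), (0.9, 1.0)]
ys = [1.0, 1.1, 3.0, 3.2]
best = extra_trees_split(rows, ys, k=2, rng=random.Random(0))
print(best)
```

Because no search over cut-points is performed, each node costs far less than an exhaustive CART-style scan, which is consistent with the computational efficiency claimed in the abstract.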
Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling
NASA Astrophysics Data System (ADS)
Galelli, S.; Castelletti, A.
2013-02-01
Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modeling. In this paper we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modeling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalization property and tendency to overfit of traditional standalone decision trees (e.g. CART); (ii) is computationally very efficient; and (iii) allows the relative importance of the input variables to be inferred, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analyzed on two real-world case studies, Marina catchment (Singapore) and Canning River (Western Australia), representing two different morphoclimatic contexts, in comparison with other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparably to the best of the benchmarks (i.e. M5) in both watersheds, while outperforming the other approaches in terms of computational requirement when adopted on large datasets. In addition, the ranking of the input variables provided can be given a physically meaningful interpretation.
NASA Astrophysics Data System (ADS)
Ormeño, M. I.; Faúndez-Abans, M.; Cavada, G.
2003-08-01
The importance of this work lies in the selection of objects that have not previously been treated specifically as a family and in the use of a robust statistical procedure that requires no prior assumptions or boundary conditions. It thus contributes to a better understanding of where Ringed Galaxies fit in the Hubble diagram, through classification and the study of subclasses. One hundred galaxies possessing two rings were selected from the Catalog of Southern Ringed Galaxies compiled by Ronald Buta, so as to build a sample that is complete in terms of knowledge of the semi-axes of the inner and outer rings projected on the plane of the sky. With a view to a possible classification of these normal ringed galaxies into families according to the geometric characteristics of the rings, Cluster Analysis (a classification tool based on similarity measurements in a two-dimensional space) was first used to explore the possible existence of families. The variables analysed were: the inner minor d(I) and major D(I) diameters, the outer minor d(E) and major D(E) diameters, and the inclination angles of the inner q(I) and outer q(E) semi-major axes of the rings. Classification Trees were built as the discrimination methodology. Classification trees are a discrimination method alternative to classical models such as Discriminant Analysis and Logistic Regression, in which a database is divided into partitions (subgroups) of the tree by the action of a predictor (a specific variable). The statistical packages used to process the information were SAS version 8.0 (Statistical Analysis System) and CART version 3.6.3. This statistical analysis suggests the existence of three possible families of double-ringed galaxies, based solely on the geometry of the rings. As an initial exploration of this result, the construction of a BT (total magnitude) diagram versus the
Decision tree modeling using R
2016-01-01
In machine learning, decision tree learners are powerful and easy to interpret. They employ a recursive binary partitioning algorithm that splits the sample on the partitioning variable with the strongest association with the response variable. The process continues until some stopping criterion is met. In the example I focus on the conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. Because a single tree is sensitive to small changes in the training data, the random forests procedure is introduced to address this problem. The diversity in random forests comes from random sampling and from the restricted set of input variables available for selection. Finally, I introduce R functions to perform model-based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building. PMID:27570769
Decision tree modeling using R.
Zhang, Zhongheng
2016-08-01
In machine learning, decision tree learners are powerful and easy to interpret. They employ a recursive binary partitioning algorithm that splits the sample on the partitioning variable with the strongest association with the response variable. The process continues until some stopping criterion is met. In the example I focus on the conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. Because a single tree is sensitive to small changes in the training data, the random forests procedure is introduced to address this problem. The diversity in random forests comes from random sampling and from the restricted set of input variables available for selection. Finally, I introduce R functions to perform model-based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building. PMID:27570769
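Recursive binary partitioning with a stopping criterion can be sketched in a few lines of Python. Note the substitution: the article works in R with conditional inference trees, which choose splits through association tests, whereas this sketch uses the simpler CART-style rule of minimizing the squared error on one numeric predictor. The function names, the `max_depth` and `min_leaf` parameters, and the data are all assumptions for illustration.

```python
def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(xs, ys, min_leaf=2):
    """Return (cost, threshold) of the best binary split, or None."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between equal predictor values
        cost = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if best is None or cost < best[0]:
            best = (cost, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best

def grow(xs, ys, depth=0, max_depth=2, min_leaf=2):
    """Recursively partition; a leaf is the mean response of its samples."""
    split = best_split(xs, ys, min_leaf) if depth < max_depth else None
    if split is None or split[0] >= sse(ys):
        return sum(ys) / len(ys)  # stopping criterion met
    threshold = split[1]
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    return {
        "threshold": threshold,
        "left": grow([x for x, _ in left], [y for _, y in left],
                     depth + 1, max_depth, min_leaf),
        "right": grow([x for x, _ in right], [y for _, y in right],
                      depth + 1, max_depth, min_leaf),
    }

def predict(tree, x):
    """Follow the splits down to a leaf value."""
    while isinstance(tree, dict):
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree

# Hypothetical data with a step between x = 4 and x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
tree = grow(xs, ys, max_depth=1)
print(tree, predict(tree, 2), predict(tree, 7))
```

Moving a single training point across the step would shift the chosen threshold, which is the sensitivity to small data changes that motivates averaging many trees in a random forest.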
Rate of tree carbon accumulation increases continuously with tree size
Stephenson, N.L.; Das, A.J.; Condit, R.; Russo, S.E.; Baker, P.J.; Beckman, N.G.; Coomes, D.A.; Lines, E.R.; Morris, W.K.; Rüger, N.; Álvarez, E.; Blundo, C.; Bunyavejchewin, S.; Chuyong, G.; Davies, S.J.; Duque, Á.; Ewango, C.N.; Flores, O.; Franklin, J.F.; Grau, H.R.; Hao, Z.; Harmon, M.E.; Hubbell, S.P.; Kenfack, D.; Lin, Y.; Makana, J.-R.; Malizia, A.; Malizia, L.R.; Pabst, R.J.; Pongpattananurak, N.; Su, S.-H.; Sun, I-F.; Tan, S.; Thomas, D.; van Mantgem, P.J.; Wang, X.; Wiser, S.K.; Zavala, M.A.
2014-01-01
Forests are major components of the global carbon cycle, providing substantial feedback to atmospheric greenhouse gas concentrations. Our ability to understand and predict changes in the forest carbon cycle—particularly net primary productivity and carbon storage—increasingly relies on models that represent biological processes across several scales of biological organization, from tree leaves to forest stands. Yet, despite advances in our understanding of productivity at the scales of leaves and stands, no consensus exists about the nature of productivity at the scale of the individual tree, in part because we lack a broad empirical assessment of whether rates of absolute tree mass growth (and thus carbon accumulation) decrease, remain constant, or increase as trees increase in size and age. Here we present a global analysis of 403 tropical and temperate tree species, showing that for most species mass growth rate increases continuously with tree size. Thus, large, old trees do not act simply as senescent carbon reservoirs but actively fix large amounts of carbon compared to smaller trees; at the extreme, a single big tree can add the same amount of carbon to the forest within a year as is contained in an entire mid-sized tree. The apparent paradoxes of individual tree growth increasing with tree size despite declining leaf-level and stand-level productivity can be explained, respectively, by increases in a tree’s total leaf area that outpace declines in productivity per unit of leaf area and, among other factors, age-related reductions in population density. Our results resolve conflicting assumptions about the nature of tree growth, inform efforts to understand and model forest carbon dynamics, and have additional implications for theories of resource allocation and plant senescence.
NASA Technical Reports Server (NTRS)
Martensen, Anna L.; Butler, Ricky W.
1987-01-01
The Fault Tree Compiler Program is a new reliability tool used to predict the top event probability for a fault tree. Five different gate types are allowed in the fault tree: AND, OR, EXCLUSIVE OR, INVERT, and M OF N gates. The high-level input language is easy to understand and use when describing the system tree. In addition, the use of the hierarchical fault tree capability can simplify the tree description and decrease program execution time. The current solution technique provides an answer that is precise (within the limits of double precision floating point arithmetic) to the five digits reported. The user may vary one failure rate or failure probability over a range of values and plot the results for sensitivity analyses. The solution technique is implemented in FORTRAN; the remaining program code is implemented in Pascal. The program is written to run on a Digital Equipment Corporation VAX with the VMS operating system.
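To make the five gate types concrete, the following Python sketch computes a top event probability for a small tree. It rests on assumptions the abstract does not state: all basic events are independent, no event feeds more than one gate, and M OF N means "at least M of the N inputs occur". It does not reproduce the program's actual solution technique or input language, and the event probabilities are invented.

```python
from math import prod

def gate_and(probs):
    """All independent inputs occur."""
    return prod(probs)

def gate_or(probs):
    """At least one independent input occurs."""
    return 1.0 - prod(1.0 - p for p in probs)

def gate_xor(p, q):
    """Exactly one of two independent inputs occurs."""
    return p * (1.0 - q) + q * (1.0 - p)

def gate_invert(p):
    """The input does not occur."""
    return 1.0 - p

def gate_m_of_n(m, probs):
    """At least m of the independent inputs occur (assumed semantics)."""
    dist = [1.0]  # dist[k] = probability that exactly k inputs have occurred
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        dist = nxt
    return sum(dist[m:])

# Hypothetical system: the top event occurs if both pumps fail
# or if at least two of three sensors fail.
pumps = [0.1, 0.2]
sensors = [0.1, 0.2, 0.3]
top = gate_or([gate_and(pumps), gate_m_of_n(2, sensors)])
print(top)
```

Sweeping one of the probabilities over a range and recomputing `top` gives the kind of sensitivity curve the abstract mentions.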
Prediction in Multiple Regression.
ERIC Educational Resources Information Center
Osborne, Jason W.
2000-01-01
Presents the concept of prediction via multiple regression (MR) and discusses the assumptions underlying multiple regression analyses. Also discusses shrinkage, cross-validation, and double cross-validation of prediction equations and describes how to calculate confidence intervals around individual predictions. (SLD)
Improved Regression Calibration
ERIC Educational Resources Information Center
Skrondal, Anders; Kuha, Jouni
2012-01-01
The likelihood for generalized linear models with covariate measurement error cannot in general be expressed in closed form, which makes maximum likelihood estimation taxing. A popular alternative is regression calibration which is computationally efficient at the cost of inconsistent estimation. We propose an improved regression calibration…
Hanover, J.W.; Hart, J.W.
1980-05-09
Michigan State University has been conducting research on growth control of woody plants with emphasis on commercial plantations. The objective was to develop the optimum levels for the major factors that affect tree seedling growth and development so that high quality plants can be produced for a specific use. This article describes the accelerated-optimal-growth (AOG) concept, describes precautions to take in its application, and shows ways to maximize the potential of AOG for producing ornamental trees. Factors considered were container growing system; protective culture including light, temperature, mineral nutrients, water, carbon dioxide, growth regulators, mycorrhizae, growing media, competition, and pests; size of seedlings; and acclimation. 1 table. (DP)
ERIC Educational Resources Information Center
National Audubon Society, New York, NY.
Included are an illustrated student reader, "The Story of Trees," a leaders' guide, and a large tree chart with 37 colored pictures. The student reader reviews several aspects of trees: a definition of a tree; where and how trees grow; flowers, pollination and seed production; how trees make their food; how to recognize trees; seasonal changes;…
Visualizing phylogenetic trees using TreeView.
Page, Roderic D M
2002-08-01
TreeView provides a simple way to view the phylogenetic trees produced by a range of programs, such as PAUP*, PHYLIP, TREE-PUZZLE, and ClustalX. While some phylogenetic programs (such as the Macintosh version of PAUP*) have excellent tree printing facilities, many programs do not have the ability to generate publication quality trees. TreeView addresses this need. The program can read and write a range of tree file formats, display trees in a variety of styles, print trees, and save the tree as a graphic file. Protocols in this unit cover both displaying and printing a tree. Support protocols describe how to download and install TreeView, and how to display bootstrap values in trees generated by ClustalX and PAUP*. PMID:18792942
Shi, Tao
2016-03-01
A complicated history of gene duplication and loss poses a challenge to molecular phylogenetic inference, especially in deep phylogenies. However, phylogenomic approaches such as gene tree parsimony (GTP) have an advantage over some other approaches in their ability to use gene families with duplications. GTP searches for the 'optimal' species tree by minimizing the total cost of biological events such as duplications, but the accuracy of GTP and the phylogenetic signal in the context of different gene families with distinct histories of duplication and loss are unclear. To evaluate how the evolutionary properties of different gene families affect species tree inference, 3900 gene families from seven angiosperms, encompassing a wide range of gene content and lineage-specific expansions and contractions, were analyzed. It was found that the gene content and total duplication number in a gene family strongly influence species tree inference accuracy, with the highest accuracy achieved at either very low or very high gene content (or duplication number) and the lowest accuracy centered on intermediate gene content (or duplication number); the relationship can be fitted by a binomial regression. In addition, among gene families with a similar level of average gene content, those with relatively higher lineage-specific expansion or duplication rates tend to show lower accuracy. Additional correlation tests support the view that high accuracy for gene families with large gene content may rely on abundant ancestral copies providing many subtrees to resolve conflicts, whereas high accuracy for single- or low-copy gene families depends only on sequence substitution per se. The very low accuracy of gene families with intermediate gene content or duplication number may be due to insufficient subtrees to resolve the conflicts arising from loss of alternative copies. As these evolutionary properties can significantly influence species tree accuracy, I discussed the potential weighting of the duplication cost by
On Tree-Based Phylogenetic Networks.
Zhang, Louxin
2016-07-01
A large class of phylogenetic networks can be obtained from trees by the addition of horizontal edges between the tree edges. These networks are called tree-based networks. We present a simple necessary and sufficient condition for tree-based networks and prove that, for any number of taxa, there exists a universal tree-based network that contains as its base every phylogenetic tree on the same set of taxa. This answers two problems recently posed by Francis and Steel. A byproduct is a computer program for generating random binary phylogenetic networks under the uniform distribution model. PMID:27228397
Factors Governing Stemflow Production from Plantation Grown Teak Trees in Thailand
NASA Astrophysics Data System (ADS)
Tanaka, N.; Levia, D. F., Jr.; Igarashi, Y.; Yoshifuji, N.; Tanaka, K.; Chatchai, T.; Nanko, K.; Suzuki, M.; Kumagai, T.
2015-12-01
Stemflow (SF) is recognized as an important process delivering water, solute, and particulate fluxes to spatially localized areas of the forest floor. Using both long-term SF data from nine even-aged deciduous teak trees grown in the same plantation and meteorological data from a nearby tower, this study seeks to better understand how: (1) specific biotic and abiotic factors control stand-scale SF production of teak; and (2) various biotic and abiotic factors affect tree-to-tree variations in teak SF production. A conventional regression analysis of SF volume against rainfall indicates that, for five of the nine individuals, SF was produced more efficiently in the leafless state than in the leafed state. However, for the other individuals, there was no such relation, suggesting tree-to-tree variation in the response of SF to canopy status. A boosted regression tree (BRT) analysis, setting the daily SF funneling ratios (SFF) of the nine trees as dependent variables, indicates that SFF was intricately controlled by a variety of biotic and abiotic factors. The top six influential factors were, in descending order, rainfall duration, tree height, rainfall intensity, air temperature, wind speed, and antecedent dry period length, having positive, negative, positive, negative, positive, and negative influences on SFF, respectively. Although teak exhibits drastic intra-annual changes in leaf phenology, leaf area index (LAI) had an unexpectedly small influence on SFF on a stand scale. Additional BRT analyses focusing on the individuals with the maximum and the minimum SFF values (among the nine individuals) showed that there was considerable tree-to-tree variation in the array of influential variables for SFF, even though the trees were planted in the same year and grown in the same plot. In addition to this difference, the BRT analyses also showed that the response of SFF to LAI differs between the two individuals. The differing responses to LAI depending on the individual may be the
George: Gaussian Process regression
NASA Astrophysics Data System (ADS)
Foreman-Mackey, Daniel
2015-11-01
George is a fast and flexible library, implemented in C++ with Python bindings, for Gaussian Process regression useful for accounting for correlated noise in astronomical datasets, including those for transiting exoplanet discovery and characterization and stellar population modeling.
Multivariate Regression with Calibration*
Liu, Han; Wang, Lie; Zhao, Tuo
2014-01-01
We propose a new method named calibrated multivariate regression (CMR) for fitting high dimensional multivariate regression models. Compared to existing methods, CMR calibrates the regularization for each regression task with respect to its noise level so that it is simultaneously tuning insensitive and achieves an improved finite-sample performance. Computationally, we develop an efficient smoothed proximal gradient algorithm which has a worst-case iteration complexity O(1/ε), where ε is a pre-specified numerical accuracy. Theoretically, we prove that CMR achieves the optimal rate of convergence in parameter estimation. We illustrate the usefulness of CMR by thorough numerical simulations and show that CMR consistently outperforms other high dimensional multivariate regression methods. We also apply CMR on a brain activity prediction problem and find that CMR is as competitive as the handcrafted model created by human experts. PMID:25620861
Image segmentation via piecewise constant regression
NASA Astrophysics Data System (ADS)
Acton, Scott T.; Bovik, Alan C.
1994-09-01
We introduce a novel unsupervised image segmentation technique that is based on piecewise constant (PICO) regression. Given an input image, a PICO output image for a specified feature size (scale) is computed via nonlinear regression. The regression effectively provides the constant region segmentation of the input image that has a minimum deviation from the input image. PICO regression-based segmentation avoids the problems of region merging, poor localization, region boundary ambiguity, and region fragmentation. Additionally, our segmentation method is particularly well-suited for corrupted (noisy) input data. An application to segmentation and classification of remotely sensed imagery is provided.
Regression based modeling of vegetation and climate variables for the Amazon rainforests
NASA Astrophysics Data System (ADS)
Kodali, A.; Khandelwal, A.; Ganguly, S.; Bongard, J.; Das, K.
2015-12-01
Both short-term (weather) and long-term (climate) variations in the atmosphere directly impact various ecosystems on earth. Forest ecosystems, especially tropical forests, are crucial as they are the largest reserves of terrestrial carbon sink. For example, the Amazon forests are a critical component of global carbon cycle storing about 100 billion tons of carbon in its woody biomass. There is a growing concern that these forests could succumb to precipitation reduction in a progressively warming climate, leading to release of significant amount of carbon in the atmosphere. Therefore, there is a need to accurately quantify the dependence of vegetation growth on different climate variables and obtain better estimates of drought-induced changes to atmospheric CO2. The availability of globally consistent climate and earth observation datasets have allowed global scale monitoring of various climate and vegetation variables such as precipitation, radiation, surface greenness, etc. Using these diverse datasets, we aim to quantify the magnitude and extent of ecosystem exposure, sensitivity and resilience to droughts in forests. The Amazon rainforests have undergone severe droughts twice in last decade (2005 and 2010), which makes them an ideal candidate for the regional scale analysis. Current studies on vegetation and climate relationships have mostly explored linear dependence due to computational and domain knowledge constraints. We explore a modeling technique called symbolic regression based on evolutionary computation that allows discovery of the dependency structure without any prior assumptions. In symbolic regression the population of possible solutions is defined via trees structures. Each tree represents a mathematical expression that includes pre-defined functions (mathematical operators) and terminal sets (independent variables from data). Selection of these sets is critical to computational efficiency and model accuracy. In this work we investigate
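The tree representation described in this abstract can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function set `FUNCTIONS`, the tuple encoding of a tree, the terminal names `precip` and `radiation`, and all data values are hypothetical and chosen for the example.

```python
import operator

# Hypothetical function set (the "pre-defined functions" of the abstract).
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def evaluate(tree, row):
    """Evaluate an expression tree on one observation.

    A tree is a tuple (function_name, left, right), a variable name (str)
    drawn from the terminal set, or a numeric constant.
    """
    if isinstance(tree, tuple):
        name, left, right = tree
        return FUNCTIONS[name](evaluate(left, row), evaluate(right, row))
    if isinstance(tree, str):
        return row[tree]
    return float(tree)


def mean_squared_error(tree, rows, targets):
    """Fitness of a candidate tree on the data: lower is better."""
    errors = [(evaluate(tree, r) - t) ** 2 for r, t in zip(rows, targets)]
    return sum(errors) / len(errors)


# Candidate expression: precip * radiation + 1.0 (made-up terminals).
candidate = ("add", ("mul", "precip", "radiation"), 1.0)
rows = [{"precip": 2.0, "radiation": 3.0}, {"precip": 0.5, "radiation": 4.0}]
targets = [7.0, 3.0]  # made-up targets generated by the candidate itself
print(mean_squared_error(candidate, rows, targets))
```

An evolutionary search would repeatedly mutate and recombine such trees and keep those with the lowest error; that loop is omitted here.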
Cascades of Regression Tree Fields for Image Restoration.
Schmidt, Uwe; Jancsary, Jeremy; Nowozin, Sebastian; Roth, Stefan; Rother, Carsten
2016-04-01
Conditional random fields (CRFs) are popular discriminative models for computer vision and have been successfully applied in the domain of image restoration, especially to image denoising. For image deblurring, however, discriminative approaches have been mostly lacking. We posit two reasons for this: First, the blur kernel is often only known at test time, requiring any discriminative approach to cope with considerable variability. Second, given this variability it is quite difficult to construct suitable features for discriminative prediction. To address these challenges we first show a connection between common half-quadratic inference for generative image priors and Gaussian CRFs. Based on this analysis, we then propose a cascade model for image restoration that consists of a Gaussian CRF at each stage. Each stage of our cascade is semi-parametric, i.e., it depends on the instance-specific parameters of the restoration problem, such as the blur kernel. We train our model by loss minimization with synthetically generated training data. Our experiments show that when applied to non-blind image deblurring, the proposed approach is efficient and yields state-of-the-art restoration quality on images corrupted with synthetic and real blur. Moreover, we demonstrate its suitability for image denoising, where we achieve competitive results for grayscale and color images. PMID:26959673
Zong, Shengwei; Wu, Zhengfang; Xu, Jiawei; Li, Ming; Gao, Xiaofeng; He, Hongshi; Du, Haibo; Wang, Lei
2014-01-01
The tree line ecotone in the Changbai Mountains has undergone large changes in the past decades. Tree locations vary among the four sides of the mountains, especially on the northern and western sides, and this variation has not been fully explained. Previous studies attributed such variations to variations in temperature. However, in this study, we hypothesized that topographic controls were responsible for the variations in tree locations in the tree line ecotone of the Changbai Mountains. To test the hypothesis, we used IKONOS images and a WorldView-1 image to identify the tree locations and developed a logistic regression model using topographical variables to identify the dominant controls of the tree locations. The results showed that aspect, wetness, and slope were the dominant controls for tree locations on the western side of the mountains, whereas altitude, SPI, and aspect were the dominant factors on the northern side. The highest altitude a tree can currently reach was 2140 m asl on the northern side and 2060 m asl on the western side. The model predictions showed that habitats above the current tree line on both sides were available for trees. Tree recruitment below the current tree line may take advantage of the available habitats at higher elevations based on the current tree locations. Our research confirmed the controlling effects of topography on tree locations in the tree line ecotone of the Changbai Mountains and suggested that it is essential to assess the tree response to topography in research on the tree line ecotone. PMID:25170918
Regression versus No Regression in the Autistic Disorder: Developmental Trajectories
ERIC Educational Resources Information Center
Bernabei, P.; Cerquiglini, A.; Cortesi, F.; D' Ardia, C.
2007-01-01
Developmental regression is a complex phenomenon which occurs in 20-49% of the autistic population. The aim of the study was to assess possible differences in the development of regressed and non-regressed autistic preschoolers. We longitudinally studied 40 autistic children (18 regressed, 22 non-regressed) aged 2-6 years. The following developmental…
ERIC Educational Resources Information Center
Jenkins, Peter
Tree climbing offers a safe, inexpensive adventure sport that can be performed almost anywhere. Using standard procedures practiced in tree surgery or rock climbing, almost any tree can be climbed. Tree climbing provides challenge and adventure as well as a vigorous upper-body workout. Tree Climbers International classifies trees using a system…
NASA Astrophysics Data System (ADS)
Darnah
2016-04-01
Poisson regression is used when the response variable is count data that follow the Poisson distribution. The Poisson distribution assumes equidispersion (the variance equals the mean). In practice, count data are often overdispersed or underdispersed, in which case Poisson regression is inappropriate because it may underestimate the standard errors and overstate the significance of the regression parameters, consequently giving misleading inference about them. This paper suggests the generalized Poisson regression model for handling overdispersion and underdispersion in the Poisson regression model. The Poisson regression model and the generalized Poisson regression model are applied to the number of filariasis cases in East Java. Based on the Poisson regression model, the factors influencing filariasis are the percentage of families who do not practice clean and healthy living and the percentage of families who do not have a healthy house. Because the Poisson regression model exhibits overdispersion, the generalized Poisson regression model is used. In the best generalized Poisson regression model, the factor influencing filariasis is the percentage of families who do not have a healthy house. The model is interpreted as follows: each additional 1 percentage point of families who do not have a healthy house adds 1 filariasis patient.
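A quick way to see whether count data depart from the equal-dispersion assumption is to compare the sample variance with the sample mean; for Poisson counts without covariates the ratio should be close to 1. The sketch below is a rough diagnostic only, not the generalized Poisson regression of the paper, and the `cases` values are made up for illustration rather than taken from the East Java data.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean for a list of counts."""
    return variance(counts) / mean(counts)


# Made-up district-level case counts, for illustration only.
cases = [0, 1, 0, 12, 3, 0, 25, 2]
ratio = dispersion_index(cases)
label = "overdispersed" if ratio > 1 else "not overdispersed"
print(f"variance/mean = {ratio:.2f} ({label})")
```

A ratio well above 1 suggests overdispersion and a ratio well below 1 suggests underdispersion; with covariates in the model, the analogous check is usually based on the Pearson statistic of the fitted regression.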
Practical Session: Logistic Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
An exercise is proposed to illustrate logistic regression. It investigates the risk factors for the onset of coronary heart disease. The exercise was proposed in Chapter 5 of the book by D.G. Kleinbaum and M. Klein, "Logistic Regression", Statistics for Biology and Health, Springer Science Business Media, LLC (2010) and also by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr341.pdf). This example is based on data given in the file evans.txt, available from http://www.sph.emory.edu/dkleinb/logreg3.htm#data.
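For a single binary risk factor, the fitted slope of a logistic regression with an intercept equals the log odds ratio of the corresponding 2x2 table, so the simplest case can be checked by hand. The sketch below assumes a hypothetical table of counts; the numbers are made up and are NOT taken from evans.txt, and the function name `log_odds_ratio` is introduced here for illustration.

```python
import math


def log_odds_ratio(exposed_cases, exposed_noncases,
                   unexposed_cases, unexposed_noncases):
    """Log odds ratio from a 2x2 table of counts (all counts must be > 0)."""
    odds_exposed = exposed_cases / exposed_noncases
    odds_unexposed = unexposed_cases / unexposed_noncases
    return math.log(odds_exposed / odds_unexposed)


# Made-up counts, NOT taken from evans.txt.
beta = log_odds_ratio(30, 70, 15, 85)
print(f"slope = {beta:.3f}, odds ratio = {math.exp(beta):.2f}")
```

With several risk factors, or continuous ones, the coefficients no longer have this closed form and are estimated iteratively by maximum likelihood.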
Smit, Gert N
2005-01-01
Background: The investigation was conducted in a savanna area covered by what was considered an undesirably dense stand of Colophospermum mopane trees, mainly because such a dense stand of trees often results in the suppression of herbaceous plants. The objectives of this study were to determine the influence of intensity of tree thinning on the dry matter yield of herbaceous plants (notably grasses) and to investigate differences in herbaceous species composition between defined subhabitats (under tree canopies, between tree canopies and where trees have been removed). Seven plots (65 × 180 m) were subjected to different intensities of tree thinning, ranging from a totally cleared plot (0%) to plots thinned to the equivalent of 10%, 20%, 35%, 50% and 75% of the leaf biomass of a control plot (100%) with a tree density of 2711 plants ha⁻¹. The establishment of herbaceous plants (grasses and forbs) in response to reduced competition from the woody plants was measured during three full growing seasons following the thinning treatments. Results: The grass component reacted positively to the tree thinning in terms of total dry matter (DM) yield, but forbs were negatively influenced. Rainfall interacted with tree density and the differences between grass DM yields in thinned plots during years of below average rainfall were substantially higher than those of the control. At high tree densities, yields differed little between seasons of varying rainfall. The relation between grass DM yield and tree biomass was curvilinear, best described by the exponential regression equation. Subhabitat differentiation by C. mopane trees did provide some qualitative benefits, with certain desirable grass species showing a preference for the subhabitat under tree canopies. Conclusion: While it can be concluded from this study that high tree densities suppress herbaceous production, the decision to clear/thin the C. mopane trees should include additional considerations. Thinning of C
Extensions and applications of ensemble-of-trees methods in machine learning
NASA Astrophysics Data System (ADS)
Bleich, Justin
Ensemble-of-trees algorithms have moved to the forefront of machine learning due to their ability to generate high forecasting accuracy for a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) focus on an underlying Bayesian probability model to generate the fits. These new probability model-based approaches show much promise versus their algorithmic counterparts, but also offer substantial room for improvement. The first part of this thesis focuses on methodological advances for ensemble-of-trees techniques with an emphasis on the more recent Bayesian approaches. In particular, we focus on extensions of BART in four distinct ways. First, we develop a more robust implementation of BART for both research and application. We then develop a principled approach to variable selection for BART as well as the ability to naturally incorporate prior information on important covariates into the algorithm. Next, we propose a method for handling missing data that relies on the recursive structure of decision trees and does not require imputation. Last, we relax the assumption of homoskedasticity in the BART model to allow for parametric modeling of heteroskedasticity. The second part of this thesis returns to the classic algorithmic approaches in the context of classification problems with asymmetric costs of forecasting errors. First we consider the performance of RF and SGB more broadly and demonstrate their superiority to logistic regression for applications in criminology with asymmetric costs. Next, we use RF to forecast unplanned hospital readmissions upon patient discharge with asymmetric costs taken into account. Finally, we explore the construction of stable decision trees for forecasts of
Springer, Mark S; Gatesy, John
2016-01-01
Higher-level relationships among placental mammals are mostly resolved, but several polytomies remain contentious. Song et al. (2012) claimed to have resolved three of these using shortcut coalescence methods (MP-EST, STAR) and further concluded that these methods, which assume no within-locus recombination, are required to unravel deep-level phylogenetic problems that have stymied concatenation. Here, we reanalyze Song et al.'s (2012) data and leverage these re-analyses to explore key issues in systematics including the recombination ratchet, gene tree stoichiometry, the proportion of gene tree incongruence that results from deep coalescence versus other factors, and simulations that compare the performance of coalescence and concatenation methods in species tree estimation. Song et al. (2012) reported an average locus length of 3.1 kb for the 447 protein-coding genes in their phylogenomic dataset, but the true mean length of these loci (start codon to stop codon) is 139.6 kb. Empirical estimates of recombination breakpoints in primates, coupled with consideration of the recombination ratchet, suggest that individual coalescence genes (c-genes) approach ∼12 bp or less for Song et al.'s (2012) dataset, three to four orders of magnitude shorter than the c-genes reported by these authors. This result has general implications for the application of coalescence methods in species tree estimation. We contend that it is illogical to apply coalescence methods to complete protein-coding sequences. Such analyses amalgamate c-genes with different evolutionary histories (i.e., exons separated by >100,000 bp), distort true gene tree stoichiometry that is required for accurate species tree inference, and contradict the central rationale for applying coalescence methods to difficult phylogenetic problems. In addition, Song et al.'s (2012) dataset of 447 genes includes 21 loci with switched taxonomic names, eight duplicated loci, 26 loci with non-homologous sequences that are
MixtureTree annotator: a program for automatic colorization and visual annotation of MixtureTree.
Chen, Shu-Chuan; Ogata, Aaron
2015-01-01
The MixtureTree Annotator, written in JAVA, allows the user to automatically color any phylogenetic tree in Newick format generated from any phylogeny reconstruction program and output the Nexus file. By providing the ability to automatically color the tree by sequence name, the MixtureTree Annotator provides a unique advantage over any other programs which perform a similar function. In addition, the MixtureTree Annotator is the only package that can efficiently annotate the output produced by MixtureTree with mutation information and coalescent time information. In order to visualize the resulting output file, a modified version of FigTree is used. Certain popular methods, which lack good built-in visualization tools, for example, MEGA, Mesquite, PHY-FI, TreeView, treeGraph and Geneious, may give results with human errors due to either manually adding colors to each node or with other limitations, for example only using color based on a number, such as branch length, or by taxonomy. In addition to allowing the user to automatically color any given Newick tree by sequence name, the MixtureTree Annotator is the only method that allows the user to automatically annotate the resulting tree created by the MixtureTree program. The MixtureTree Annotator is fast and easy-to-use, while still allowing the user full control over the coloring and annotating process. PMID:25826378
Understanding Boswellia papyrifera tree secondary metabolites through bark spectral analysis
NASA Astrophysics Data System (ADS)
Girma, Atkilt; Skidmore, Andrew K.; de Bie, C. A. J. M.; Bongers, Frans
2015-07-01
Decision makers must decide whether to tap or rest Boswellia papyrifera trees. Tapping for the production of frankincense is known to deplete carbon reserves from the tree, leading to production of less viable seeds, tree carbon starvation and ultimately tree mortality. Decision makers rely on traditional experience without considering the amount of metabolites stored in or depleted from the stem bark of the tree. This research was designed to develop a non-destructive, spectroscopy-based technique for estimating B. papyrifera tree metabolites that is relevant for management. The concentration of biochemicals (metabolites) found in the tree bark was estimated through spectral analysis. Initially, a random sample of 33 trees was selected, and the spectra of the bark were measured with an Analytical Spectral Device (ASD) spectrometer. Bark samples were air dried and ground. Then, 10 g of sample was soaked in petroleum ether to extract crude metabolites. Further chemical analysis was conducted to quantify and isolate pure metabolite compounds such as incensole acetate and boswellic acid. The crude metabolites, which relate to frankincense production, were compared to plant properties (such as diameter and crown area) and reflectance spectra of the bark. Moreover, the extract was compared to the ASD spectra using partial least squares regression (PLSR) and continuum-removed spectral analysis. The continuum-removed spectral analysis was performed on two wavelength regions (1275-1663 and 1836-2217) identified through PLSR, using absorption features such as band depth, area, position, asymmetry and width to characterize the spectra and relate them to the bark extracts. The results show that tree properties such as diameter at breast height (DBH) and the crown area of untapped and healthy trees were strongly correlated with the amount of stored crude metabolites. In addition, the PLSR technique applied to the first derivative transformation of the reflectance spectrum was found to estimate the
Explorations in Statistics: Regression
ERIC Educational Resources Information Center
Curran-Everett, Douglas
2011-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This seventh installment of "Explorations in Statistics" explores regression, a technique that estimates the nature of the relationship between two things for which we may only surmise a mechanistic or predictive connection.…
Modern Regression Discontinuity Analysis
ERIC Educational Resources Information Center
Bloom, Howard S.
2012-01-01
This article provides a detailed discussion of the theory and practice of modern regression discontinuity (RD) analysis for estimating the effects of interventions or treatments. Part 1 briefly chronicles the history of RD analysis and summarizes its past applications. Part 2 explains how in theory an RD analysis can identify an average effect of…
Webcast entitled Statistical Tools for Making Sense of Data, by the National Nutrient Criteria Support Center, N-STEPS (Nutrients-Scientific Technical Exchange Partnership). The section "Correlation and Regression" provides an overview of these two techniques in the context of nut...
Multiple linear regression analysis
NASA Technical Reports Server (NTRS)
Edwards, T. R.
1980-01-01
Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
Partial covariate adjusted regression
Şentürk, Damla; Nguyen, Danh V.
2008-01-01
Covariate adjusted regression (CAR) is a recently proposed adjustment method for regression analysis where both the response and predictors are not directly observed (Şentürk and Müller, 2005). The available data has been distorted by unknown functions of an observable confounding covariate. CAR provides consistent estimators for the coefficients of the regression between the variables of interest, adjusted for the confounder. We develop a broader class of partial covariate adjusted regression (PCAR) models to accommodate both distorted and undistorted (adjusted/unadjusted) predictors. The PCAR model allows for unadjusted predictors, such as age, gender and demographic variables, which are common in the analysis of biomedical and epidemiological data. The available estimation and inference procedures for CAR are shown to be invalid for the proposed PCAR model. We propose new estimators and develop new inference tools for the more general PCAR setting. In particular, we establish the asymptotic normality of the proposed estimators and propose consistent estimators of their asymptotic variances. Finite sample properties of the proposed estimators are investigated using simulation studies and the method is also illustrated with a Pima Indians diabetes data set. PMID:20126296
Mechanisms of neuroblastoma regression
Brodeur, Garrett M.; Bagatell, Rochelle
2014-01-01
Recent genomic and biological studies of neuroblastoma have shed light on the dramatic heterogeneity in the clinical behaviour of this disease, which spans from spontaneous regression or differentiation in some patients, to relentless disease progression in others, despite intensive multimodality therapy. This evidence also suggests several possible mechanisms to explain the phenomena of spontaneous regression in neuroblastomas, including neurotrophin deprivation, humoral or cellular immunity, loss of telomerase activity and alterations in epigenetic regulation. A better understanding of the mechanisms of spontaneous regression might help to identify optimal therapeutic approaches for patients with these tumours. Currently, the most druggable mechanism is the delayed activation of developmentally programmed cell death regulated by the tropomyosin receptor kinase A pathway. Indeed, targeted therapy aimed at inhibiting neurotrophin receptors might be used in lieu of conventional chemotherapy or radiation in infants with biologically favourable tumours that require treatment. Alternative approaches consist of breaking immune tolerance to tumour antigens or activating neurotrophin receptor pathways to induce neuronal differentiation. These approaches are likely to be most effective against biologically favourable tumours, but they might also provide insights into treatment of biologically unfavourable tumours. We describe the different mechanisms of spontaneous neuroblastoma regression and the consequent therapeutic approaches. PMID:25331179
Bayesian ARTMAP for regression.
Sasu, L M; Andonie, R
2013-10-01
Bayesian ARTMAP (BA) is a recently introduced neural architecture which uses a combination of Fuzzy ARTMAP competitive learning and Bayesian learning. Training is generally performed online, in a single-epoch. During training, BA creates input data clusters as Gaussian categories, and also infers the conditional probabilities between input patterns and categories, and between categories and classes. During prediction, BA uses Bayesian posterior probability estimation. So far, BA was used only for classification. The goal of this paper is to analyze the efficiency of BA for regression problems. Our contributions are: (i) we generalize the BA algorithm using the clustering functionality of both ART modules, and name it BA for Regression (BAR); (ii) we prove that BAR is a universal approximator with the best approximation property. In other words, BAR approximates arbitrarily well any continuous function (universal approximation) and, for every given continuous function, there is one in the set of BAR approximators situated at minimum distance (best approximation); (iii) we experimentally compare the online trained BAR with several neural models, on the following standard regression benchmarks: CPU Computer Hardware, Boston Housing, Wisconsin Breast Cancer, and Communities and Crime. Our results show that BAR is an appropriate tool for regression tasks, both for theoretical and practical reasons. PMID:23665468
Atlas of United States Trees, Volume 2: Alaska Trees and Common Shrubs.
ERIC Educational Resources Information Center
Viereck, Leslie A.; Little, Elbert L., Jr.
This volume is the second in a series of atlases describing the natural distribution or range of native tree species in the United States. The 82 species maps include 32 of trees in Alaska, 6 of shrubs rarely reaching tree size, and 44 more of common shrubs. More than 20 additional maps summarize environmental factors and furnish general…
Ridge Regression: A Regression Procedure for Analyzing Correlated Independent Variables.
ERIC Educational Resources Information Center
Rakow, Ernest A.
Ridge regression is presented as an analytic technique to be used when predictor variables in a multiple linear regression situation are highly correlated, a situation which may result in unstable regression coefficients and difficulties in interpretation. Ridge regression avoids the problem of selection of variables that may occur in stepwise…
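The stabilizing effect of the ridge penalty can be sketched for the smallest interesting case, two correlated predictors. The code below is an illustrative assumption rather than anything from the cited document: it takes two centered predictors with no intercept, solves the penalized normal equations (X'X + lam I) b = X'y in closed form, and uses made-up data; the name `ridge_two_predictors` and the penalty symbol `lam` are introduced here.

```python
def ridge_two_predictors(x1, x2, y, lam):
    """Ridge coefficients for two centered predictors and no intercept.

    Solves the 2x2 system (X'X + lam * I) b = X'y by Cramer's rule.
    """
    a = sum(u * u for u in x1) + lam         # X'X[0][0] + lam
    b = sum(u * v for u, v in zip(x1, x2))   # X'X[0][1]
    d = sum(v * v for v in x2) + lam         # X'X[1][1] + lam
    g1 = sum(u * t for u, t in zip(x1, y))   # X'y[0]
    g2 = sum(v * t for v, t in zip(x2, y))   # X'y[1]
    det = a * d - b * b
    return ((d * g1 - b * g2) / det, (a * g2 - b * g1) / det)


# Made-up, nearly collinear predictors (each centered at zero).
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.0, 1.1, 1.9]
y = [-4.2, -1.8, 0.1, 2.1, 3.8]
for lam in (0.0, 1.0, 10.0):
    print(lam, ridge_two_predictors(x1, x2, y, lam))
```

Setting `lam` to zero gives ordinary least squares; increasing it trades a small bias for coefficients that vary less when the predictors are highly correlated.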
NASA Astrophysics Data System (ADS)
Cho, Junghwan; Li, Xiaopeng; Gu, Zhiyong; Kurup, Pradeep
2011-09-01
This paper aims to classify explosive precursors and estimate their concentrations using a nanowire sensor array and a decision tree learning algorithm. The nanowire sensor array consists of tin oxide sensors with four different additives: platinum (Pt), copper (Cu), indium (In), and nickel (Ni). The nanowire sensor array was tested using the vapors from four explosive precursors (acetone, nitrobenzene, nitrotoluene, and octane) at 10 different concentration levels each. A pattern recognition technique based on decision tree learning was applied to classify the explosive precursors and estimate their concentrations. Classification and regression tree (CART) analysis was used for classification. CART was also used for structure identification in a Sugeno fuzzy inference system (FIS) for estimating the concentration of the precursors. Two CARTs were trained and their testing results were investigated.
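The core step of growing a CART-style regression tree is choosing the threshold that best separates the responses. The sketch below shows that step for a single feature, using the sum of squared errors as the split criterion. It is a generic illustration, not the paper's pipeline (which uses CART for classification and for structuring a Sugeno FIS); the function names and any sensor values passed to them are hypothetical.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, sse) of the best single split on one feature.

    Returns (None, inf) when all feature values are identical.
    """
    pairs = sorted(zip(xs, ys))
    best_threshold, best_sse = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        sse = sum_squared_error(left) + sum_squared_error(right)
        if sse < best_sse:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_sse = sse
    return best_threshold, best_sse


# Made-up sensor responses (xs) and concentration levels (ys).
print(best_split([1, 2, 3, 10, 11, 12], [1, 1, 1, 5, 5, 5]))
```

A full tree applies this search recursively to each resulting subset and over every feature; a classification tree would replace the squared-error criterion with an impurity measure such as the Gini index.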
Ridge Regression Signal Processing
NASA Technical Reports Server (NTRS)
Kuhl, Mark R.
1990-01-01
The introduction of the Global Positioning System (GPS) into the National Airspace System (NAS) necessitates the development of Receiver Autonomous Integrity Monitoring (RAIM) techniques. In order to guarantee a certain level of integrity, a thorough understanding of modern estimation techniques applied to navigational problems is required. The extended Kalman filter (EKF) is derived and analyzed under poor geometry conditions. It was found that the performance of the EKF is difficult to predict, since the EKF is designed for a Gaussian environment. A novel approach is implemented which incorporates ridge regression to explain the behavior of an EKF in the presence of dynamics under poor geometry conditions. The basic principles of ridge regression theory are presented, followed by the derivation of a linearized recursive ridge estimator. Computer simulations are performed to confirm the underlying theory and to provide a comparative analysis of the EKF and the recursive ridge estimator.
Fast Censored Linear Regression
HUANG, YIJIAN
2013-01-01
The weighted log-rank estimating function has become a standard estimation method for the censored linear regression model, or the accelerated failure time model. Well established statistically, the estimator defined as a consistent root has, however, rather poor computational properties because the estimating function is neither continuous nor, in general, monotone. We propose a computationally efficient estimator through an asymptotics-guided Newton algorithm, in which censored quantile regression methods are tailored to yield an initial consistent estimate and a consistent derivative estimate of the limiting estimating function. We also develop fast interval estimation with a new proposal for sandwich variance estimation. The proposed estimator is asymptotically equivalent to the consistent root estimator and barely distinguishable in samples of practical size. However, computation time is typically reduced by two to three orders of magnitude for point estimation alone. Illustrations with clinical applications are provided. PMID:24347802
ERIC Educational Resources Information Center
Smithyman, S. J.
This manual is designed to prepare students for entry-level positions as tree care professionals. Addressed in the individual chapters of the guide are the following topics: the tree service industry; clothing, equipment, and tools; tree workers; basic tree anatomy; techniques of pruning; procedures for climbing and working in the tree; aerial…
1999-07-01
This article discusses women's land rights in the context of the findings of the paper, "Women's Land Rights in the Transition to Individualized Ownership: Implications for Tree Resource Management in Western Ghana." The study showed that customary land tenure institutions have evolved toward individualized systems, which provide incentives to invest in tree planting. In effect, individualization of land tenure had strengthened women's land rights through inter vivos gifts. However, transferring of land ownership to women is unlikely to raise productivity if access to and use of other inputs remains unequal. This suggests that attempts to equalize land rights of men and women are unlikely to lead to gender equity and improved efficiency and productivity of women farmers unless other constraints faced by women are also addressed. The article also documents comments, suggestions, and recommendations in response to the summary of the paper. In addition, the different practices of guaranteeing land ownership for women in some countries of Africa are presented. PMID:12295514
Orthogonal Regression: A Teaching Perspective
ERIC Educational Resources Information Center
Carr, James R.
2012-01-01
A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
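The two fits named in this abstract have standard closed-form slopes in terms of the centered sums of squares and cross-products. The Python sketch below is an illustration under that assumption, not the article's own presentation; the function names are hypothetical, and the major-axis formula requires a nonzero cross-product sum.

```python
import math

def _moments(x, y):
    """Means and centered sums of squares / cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy

def major_axis(x, y):
    """Orthogonal (major axis) regression: minimizes perpendicular distances."""
    mx, my, sxx, syy, sxy = _moments(x, y)
    # Assumes sxy != 0; otherwise the axis is horizontal or vertical.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx

def reduced_major_axis(x, y):
    """Reduced major axis: slope is the ratio of spreads, signed by sxy."""
    mx, my, sxx, syy, sxy = _moments(x, y)
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    return slope, my - slope * mx
```

When the points lie exactly on a line, both functions recover that line; they differ from ordinary least squares only when there is scatter, because they treat x and y symmetrically.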
Environmental conditions for alternative tree cover states in high latitudes
NASA Astrophysics Data System (ADS)
Abis, Beniamino; Brovkin, Victor
2016-04-01
Previous analysis of the vegetation cover from remote sensing revealed the existence of three alternative modes in the frequency distribution of boreal tree cover: a sparsely vegetated treeless state, a savanna-like state, and a forest state. Identifying which are the regions subject to multimodality, and assessing which are the main factors underlying their existence, is important to project future change of natural vegetation cover and its effect on climate. We study the impact on the forest cover fraction distribution of seven globally-observed environmental factors: mean annual rainfall, mean minimum temperature, growing degree days above 0, permafrost distribution, soil moisture, wildfire occurrence frequency, and thawing depth. Through the use of generalised additive models, regression trees, and conditional histograms, we find that the main factors determining the forest distribution in high latitudes are: permafrost distribution, mean annual rainfall, mean minimum temperature, soil moisture, and wildfire frequency. Additionally, we find differences between regions within the boreal area, such as Eurasia, Eastern North America, and Western North America. Furthermore, using a classification based on these factors, we show the existence and location of alternative tree cover states under the same climate conditions in the boreal region. These are areas of potential interest for a more detailed analysis of land-atmosphere interactions.
Correlation and simple linear regression.
Eberly, Lynn E
2007-01-01
This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression. PMID:18450049
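The connection between correlation, the fitted line, and the ANOVA decomposition can be checked numerically. The following Python sketch uses hypothetical function names and assumes ordinary least squares with an intercept; for that case the squared Pearson correlation equals the fraction of total variation explained by the line.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(x, y):
    """1 - SS_residual / SS_total, the ANOVA-style proportion explained."""
    slope, intercept = fit_line(x, y)
    my = _mean(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot
```

For any data set with nonconstant x and y, pearson_r(x, y) ** 2 and r_squared(x, y) agree up to rounding error.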
Yaussy, D.A.
1993-03-01
The generalized logistic regression was used to distribute trees into four potential tree grades for 20 northeastern species groups. The potential tree grade is defined as the tree grade based on the length and amount of clear cuttings and defects only, disregarding minimum grading diameter. The algorithms described use site index and tree diameter as the predictive variables, allowing the equations to be incorporated into individual-tree growth and yield simulators such as NE-TWIGS.
Digression and Value Concatenation to Enable Privacy-Preserving Regression
Li, Xiao-Bai; Sarkar, Sumit
2015-01-01
Regression techniques can be used not only for legitimate data analysis, but also to infer private information about individuals. In this paper, we demonstrate that regression trees, a popular data-analysis and data-mining technique, can be used to effectively reveal individuals’ sensitive data. This problem, which we call a “regression attack,” has not been addressed in the data privacy literature, and existing privacy-preserving techniques are not appropriate in coping with this problem. We propose a new approach to counter regression attacks. To protect against privacy disclosure, our approach introduces a novel measure, called digression, which assesses the sensitive value disclosure risk in the process of building a regression tree model. Specifically, we develop an algorithm that uses the measure for pruning the tree to limit disclosure of sensitive data. We also propose a dynamic value-concatenation method for anonymizing data, which better preserves data utility than a user-defined generalization scheme commonly used in existing approaches. Our approach can be used for anonymizing both numeric and categorical data. An experimental study is conducted using real-world financial, economic and healthcare data. The results of the experiments demonstrate that the proposed approach is very effective in protecting data privacy while preserving data quality for research and analysis. PMID:26752802
Steganalysis using logistic regression
NASA Astrophysics Data System (ADS)
Lubenko, Ivans; Ker, Andrew D.
2011-02-01
We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods (it estimates class probabilities as well as providing a simple classification) and can be adapted more easily and efficiently for multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows comparable classification accuracy to SVM methods. This work is a case study, comparing accuracy and speed of SVM and LR classifiers in detection of LSB Matching and other related spatial-domain image steganography, through the state-of-the-art 686-dimensional SPAM feature set, in three image sets.
Estimating Scots Pine Tree Mortality Using High Resolution Multispectral Images
NASA Astrophysics Data System (ADS)
Buriak, L.; Sukhinin, A. I.; Conard, S. G.; Ivanova, G. A.; McRae, D. J.; Soja, A. J.; Okhotkina, E.
2010-12-01
Scots pine (Pinus sylvestris) forest stands of central Siberia are characterized by a mixed-severity fire regime that is dominated by low- to high-severity surface fires, with crown fires occurring less frequently. The purpose of this study was to link ground measurements with air-borne and satellite observations of active wildfires and older fire scars to better estimate tree mortality remotely. Data from field sampling on experimental fires and wildfires were linked with intermediate-resolution satellite (Landsat Enhanced Thematic Mapper) data to estimate fire severity and carbon emissions. Results are being applied to Advanced Very High Resolution Radiometer (AVHRR) and Moderate Resolution Imaging Spectroradiometer (MODIS) imagery, MERIS, Landsat-ETM, SPOT (i.e., low, middle and high spatial resolution), to understand their remote-sensing capability for mapping fire severity, as indicated by tree mortality. Tree mortality depends on fireline intensity, residence time, and the physiological effects on the cambium layer, foliage and roots. We have correlated tree mortality measured after fires of varying severity with NDVI and other Chlorophyll Indexes to model tree mortality on a landscape scale. The field data obtained on experimental fires and wildfires are being analyzed and compared with intermediate-resolution satellite data (Landsat7-ETM) to help estimate fire severity, emissions, and carbon balance. In addition, they are being used to monitor immediate ecosystem fire effects (e.g., tree mortality) and long-term postfire vegetation recovery. These data are also being used to validate AVHRR, MODIS, and MERIS estimates of burn area. We studied burned areas in the Angara Region of central Siberia (northeast of Lake Baikal) for which both ground data and satellite data (ENVISAT-MERIS, Spot4, Landsat5, Landsat7-ETM) were available for the 2003-2004 and 2006-2008 periods. Ground validation was conducted on seventy sample plots established on burned sites differing in
Using tree diversity to compare phylogenetic heuristics
Sul, Seung-Jin; Matthews, Suzanne; Williams, Tiffani L
2009-01-01
Background Evolutionary trees are family trees that represent the relationships between a group of organisms. Phylogenetic heuristics are used to search stochastically for the best-scoring trees in tree space. Given that better tree scores are believed to be better approximations of the true phylogeny, traditional evaluation techniques have used tree scores to determine the heuristics that find the best scores in the fastest time. We develop new techniques to evaluate phylogenetic heuristics based on both tree scores and topologies to compare Pauprat and Rec-I-DCM3, two popular Maximum Parsimony search algorithms. Results Our results show that although Pauprat and Rec-I-DCM3 find the trees with the same best scores, topologically these trees are quite different. Furthermore, the Rec-I-DCM3 trees cluster distinctly from the Pauprat trees. In addition to our heatmap visualizations of using parsimony scores and the Robinson-Foulds distance to compare best-scoring trees found by the two heuristics, we also develop entropy-based methods to show the diversity of the trees found. Overall, Pauprat identifies more diverse trees than Rec-I-DCM3. Conclusion Overall, our work shows that there is value to comparing heuristics beyond the parsimony scores that they find. Pauprat is a slower heuristic than Rec-I-DCM3. However, our work shows that there is tremendous value in using Pauprat to reconstruct trees—especially since it finds identical scoring but topologically distinct trees. Hence, instead of discounting Pauprat, effort should go in improving its implementation. Ultimately, improved performance measures lead to better phylogenetic heuristics and will result in better approximations of the true evolutionary history of the organisms of interest. PMID:19426451
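To make the topological comparison concrete, here is a minimal Python sketch of a Robinson-Foulds style distance. It is a simplification under stated assumptions: trees are rooted and written as nested tuples of leaf names, and the distance counts clades present in one tree but not the other. The abstract's analysis concerns phylogenies that are usually treated as unrooted, where bipartitions replace clades; the function names are hypothetical.

```python
def clades(tree):
    """Return (leaf set, set of clades) for a rooted nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)  # the clade rooted at this internal node
    return leaves, found

def rf_distance(tree_a, tree_b):
    """Number of clades found in exactly one of the two trees."""
    return len(clades(tree_a)[1] ^ clades(tree_b)[1])
```

Two trees can have identical parsimony scores yet a large distance under this measure, which is the situation the abstract describes for Pauprat and Rec-I-DCM3.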
The influence of tree morphology on stemflow generation in a tropical lowland rainforest
NASA Astrophysics Data System (ADS)
Uber, Magdalena; Levia, Delphis F.; Zimmermann, Beate; Zimmermann, Alexander
2014-05-01
Even though stemflow usually accounts for only a small proportion of rainfall, it is an important point source of water and ion input to forest floors and may, for instance, influence soil moisture patterns and groundwater recharge. Previous studies showed that the generation of stemflow depends on a multitude of meteorological and biological factors. Interestingly, despite the tremendous progress in stemflow research during the last decades it is still largely unknown which combination of tree characteristics determines stemflow volumes in species-rich tropical forests. This knowledge gap motivated us to analyse the influence of tree characteristics on stemflow volumes in a 1 hectare plot located in a Panamanian lowland rainforest. Our study comprised stemflow measurements in six randomly selected 10 m by 10 m subplots. In each subplot we measured stemflow of all trees with a diameter at breast height (DBH) > 5 cm on an event-basis for a period of six weeks. Additionally, we identified all tree species and determined a set of tree characteristics including DBH, crown diameter, bark roughness, bark furrowing, epiphyte coverage, tree architecture, stem inclination, and crown position. During the sampling period, we collected 985 L of stemflow (0.98 % of total rainfall). Based on regression analyses and comparisons among plant functional groups we show that palms were most efficient in yielding stemflow due to their large inclined fronds. Trees with large emergent crowns also produced relatively large amounts of stemflow. Due to their abundance, understory trees contribute much to stemflow yield not on individual but on the plot scale. Even though parameters such as crown diameter, branch inclination and position of the crown influence stemflow generation to some extent, these parameters explain less than 30 % of the variation in stemflow volumes. In contrast to published results from temperate forests, we did not detect a negative correlation between bark roughness
Brands, K.W.; Ball, I.G.; Cegielski, E.J.; Gresham, J.S.; Saunders, D.N.
1982-09-01
This paper outlines the overall project for development and installation of a low-profile, caisson-installed subsea Christmas tree. After various design studies and laboratory and field tests of key components, a system for installation inside a 30-in. conductor was ordered in July 1978 from Cameron Iron Works Inc. The system is designed to have all critical-pressure-containing components below the mudline and, with the reduced profile (height) above seabed, provides for improved safety of satellite underwater wells from damage by anchors, trawl boards, and even icebergs. In addition to the innovative nature of the tree design, the completion includes improved 3 1/2-in. through flowline (TFL) pumpdown completion equipment with deep set safety valves and a dual detachable packer head for simplified workover capability. The all-hydraulic control system incorporates a new design of sequencing valve for both Christmas tree control and remote flowline connection. A semisubmersible drilling rig was used to initiate the first end flowline connection at the wellhead for subsequent tie-in to the prelaid, surface-towed, all-welded subsea pipeline bundle.
NASA Technical Reports Server (NTRS)
Kuhl, Mark R.
1990-01-01
Current navigation requirements depend on a geometric dilution of precision (GDOP) criterion. As long as the GDOP stays below a specific value, navigation requirements are met. The GDOP will exceed the specified value when the measurement geometry becomes too collinear. A new signal processing technique, called Ridge Regression Processing, can reduce the effects of nearly collinear measurement geometry; thereby reducing the inflation of the measurement errors. It is shown that the Ridge signal processor gives a consistently better mean squared error (MSE) in position than the Ordinary Least Mean Squares (OLS) estimator. The applicability of this technique is currently being investigated to improve the following areas: receiver autonomous integrity monitoring (RAIM), coverage requirements, availability requirements, and precision approaches.
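The quantities in this abstract can be sketched for a two-unknown problem in plain Python. This is an illustrative sketch only: the geometry matrices in the usage note are made up, the ridge parameter k is a free choice, and GDOP is taken here as the square root of the trace of the inverse of H^T H, which is an assumption about the report's definition.

```python
import math

def normal_matrix(H, k=0.0):
    """Entries (a, b, d) of the symmetric 2x2 matrix H^T H + k*I."""
    a = sum(r[0] * r[0] for r in H) + k
    b = sum(r[0] * r[1] for r in H)
    d = sum(r[1] * r[1] for r in H) + k
    return a, b, d

def inverse_2x2_sym(a, b, d):
    """Inverse of [[a, b], [b, d]], returned as (a', b', d')."""
    det = a * d - b * b
    return d / det, -b / det, a / det

def gdop(H):
    """Square root of trace((H^T H)^-1); grows as the rows become collinear."""
    ia, _, idd = inverse_2x2_sym(*normal_matrix(H))
    return math.sqrt(ia + idd)

def ridge_estimate(H, y, k=0.0):
    """Solve (H^T H + k*I) x = H^T y; k = 0 gives ordinary least squares."""
    ia, ib, idd = inverse_2x2_sym(*normal_matrix(H, k))
    g0 = sum(r[0] * yi for r, yi in zip(H, y))
    g1 = sum(r[1] * yi for r, yi in zip(H, y))
    return ia * g0 + ib * g1, ib * g0 + idd * g1
```

With rows such as [1, 0], [1, 0.05], [1, -0.05] the measurement directions are nearly collinear and gdop is much larger than for rows [1, 0], [0, 1], [1, 1]. In that poorly conditioned case a small positive k shrinks the estimate and damps the noise amplification, at the cost of some bias.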
NASA Astrophysics Data System (ADS)
Vogt, Peter R.
2004-09-01
Nature often replicates her processes at different scales of space and time in differing media. Here a tree-trunk cross section I am preparing for a dendrochronological display at the Battle Creek Cypress Swamp Nature Sanctuary (Calvert County, Maryland) dried and cracked in a way that replicates practically all the planform features found along the Mid-Oceanic Ridge (see Figure 1). The left-lateral offset of saw marks, contrasting with the right-lateral "rift" offset, even illustrates the distinction between transcurrent (strike-slip) and transform faults, the latter only recognized as a geologic feature, by J. Tuzo Wilson, in 1965. However, wood cracking is but one of many examples of natural processes that replicate one or several elements of lithospheric plate tectonics. Many of these examples occur in everyday venues and thus make great teaching aids, "teachable" from primary school to university levels. Plate tectonics, the dominant process of Earth geology, also occurs in miniature on the surface of some lava lakes, and as "ice plate tectonics" on our frozen seas and lakes. Ice tectonics also happens at larger spatial and temporal scales on the Jovian moons Europa and perhaps Ganymede. Tabletop plate tectonics, in which a molten-paraffin "asthenosphere" is surfaced by a skin of congealing wax "plates," first replicated Mid-Oceanic Ridge type seafloor spreading more than three decades ago. A seismologist (J. Brune, personal communication, 2004) discovered wax plate tectonics by casually and serendipitously pulling a stick across a container of molten wax his wife and daughters had used in making candles. Brune and his student D. Oldenburg followed up and mirabile dictu published the results in Science (178, 301-304).
ERIC Educational Resources Information Center
Boyd, Amy E.; Cooper, Jim
2004-01-01
Tree rings can be used not only to look at plant growth, but also to make connections between plant growth and resource availability. In this lesson, students in 2nd-4th grades use role-play to become familiar with basic requirements of trees and how availability of those resources is related to tree ring sizes and tree growth. These concepts can…
Recursive Algorithm For Linear Regression
NASA Technical Reports Server (NTRS)
Varanasi, S. V.
1988-01-01
Order of model determined easily. Linear-regression algorithm includes recursive equations for coefficients of model of increased order. Algorithm eliminates duplicative calculations, facilitates search for minimum order of linear-regression model fitting set of data satisfactorily.
Multipolar consensus for phylogenetic trees.
Bonnard, Cécile; Berry, Vincent; Lartillot, Nicolas
2006-10-01
Collections of phylogenetic trees are usually summarized using consensus methods. These methods build a single tree, supposed to be representative of the collection. However, in the case of heterogeneous collections of trees, the resulting consensus may be poorly resolved (strict consensus, majority-rule consensus, ...), or may perform arbitrary choices among mutually incompatible clades, or splits (greedy consensus). Here, we propose an alternative method, which we call the multipolar consensus (MPC). Its aim is to display all the splits having a support above a predefined threshold, in a minimum number of consensus trees, or poles. We show that the problem is equivalent to a graph-coloring problem, and propose an implementation of the method. Finally, we apply the MPC to real data sets. Our results indicate that, typically, all the splits down to a weight of 10% can be displayed in no more than 4 trees. In addition, in some cases, biologically relevant secondary signals, which would not have been present in any of the classical consensus trees, are indeed captured by our method, indicating that the MPC provides a convenient exploratory method for phylogenetic analysis. The method was implemented in a package freely available at http://www.lirmm.fr/~cbonnard/MPC.html PMID:17060203
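The graph-coloring view can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: a split is represented by one side of the bipartition, two splits are compatible when at least one of the four pairwise intersections of their sides is empty, and a simple greedy pass assigns splits to poles. Greedy coloring does not guarantee the minimum number of poles, and all names are hypothetical.

```python
def compatible(s1, s2, taxa):
    """Two splits can coexist in one tree iff some side-intersection is empty."""
    c1, c2 = taxa - s1, taxa - s2
    return not (s1 & s2) or not (s1 & c2) or not (c1 & s2) or not (c1 & c2)

def greedy_poles(weighted_splits, taxa, threshold):
    """Group splits with support >= threshold into mutually compatible poles.

    weighted_splits is a list of (frozenset_of_taxa, support) pairs.
    Splits are placed in decreasing order of support.
    """
    kept = sorted(
        (ws for ws in weighted_splits if ws[1] >= threshold),
        key=lambda ws: -ws[1],
    )
    poles = []
    for split, _ in kept:
        for pole in poles:
            if all(compatible(split, other, taxa) for other in pole):
                pole.append(split)
                break
        else:
            poles.append([split])  # conflicts with every pole so far
    return poles
```

For taxa A to E, the splits AB|CDE and AC|BDE conflict and so land in different poles, while DE|ABC is compatible with both and joins the first pole it fits.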
Childers, Carl C; Ueckermann, Eduard A
2015-03-01
Seven citrus orchards on reduced- to no-pesticide spray programs in central and south central Florida were sampled for non-phytoseiid mesostigmatid mites. Inner and outer canopy leaves, fruits, twigs and trunk scrapings were sampled monthly between August 1994 and January 1996. Open flowers were sampled in March from five of the sites. A total of 431 samples from one or more of 82 vine or ground cover plants were sampled monthly in five of the seven orchards. Two of the seven orchards (Mixon I and II) were on full herbicide programs and vines and ground cover plants were absent. A total of 2,655 mites (26 species) within the families: Ascidae, Blattisociidae, Laelapidae, Macrochelidae, Melicharidae, Pachylaelapidae and Parasitidae were identified. A total of 685 mites in the genus Asca (nine species: family Ascidae) were collected from within tree samples, 79 from vine or ground cover plants. Six species of Blattisociidae were collected: Aceodromus convolvuli, Blattisocius dentriticus, B. keegani, Cheiroseius sp. near jamaicensis, Lasioseius athiashenriotae and L. dentatus. A total of 485 Blattisociidae were collected from within tree samples compared with 167 from vine or ground cover plants. Low numbers of Laelapidae and Macrochelidae were collected from within tree samples. One Zygoseius furciger (Pachylaelapidae) was collected from Eleusine indica. Four species of Melicharidae were identified from 34 mites collected from within tree samples and 1,190 from vine or ground cover plants: Proctolaelaps lobatus was the most abundant species with 1,177 specimens collected from seven ground cover plants. One Phorytocarpais fimetorum (Parasitidae) was collected from inner leaves and four from twigs. Species of Ascidae, Blattisociidae, Melicharidae, Laelapidae and Pachylaelapidae were collected from 31 of the 82 vine or ground cover plants sampled, representing only a small fraction of the total number of Phytoseiidae collected from the same plants. Including the
Stadler, Tanja; Degnan, James H.; Rosenberg, Noah A.
2016-01-01
Classic null models for speciation and extinction give rise to phylogenies that differ in distribution from empirical phylogenies. In particular, empirical phylogenies are less balanced and have branching times closer to the root compared to phylogenies predicted by common null models. This difference might be due to null models of the speciation and extinction process being too simplistic, or due to the empirical datasets not being representative of random phylogenies. A third possibility arises because phylogenetic reconstruction methods often infer gene trees rather than species trees, producing an incongruity between models that predict species tree patterns and empirical analyses that consider gene trees. We investigate the extent to which the difference between gene trees and species trees under a combined birth–death and multispecies coalescent model can explain the difference in empirical trees and birth–death species trees. We simulate gene trees embedded in simulated species trees and investigate their difference with respect to tree balance and branching times. We observe that the gene trees are less balanced and typically have branching times closer to the root than the species trees. Empirical trees from TreeBase are also less balanced than our simulated species trees, and model gene trees can explain an imbalance increase of up to 8% compared to species trees. However, we see a much larger imbalance increase in empirical trees, about 100%, meaning that additional features must also be causing imbalance in empirical trees. This simulation study highlights the necessity of revisiting the assumptions made in phylogenetic analyses, as these assumptions, such as equating the gene tree with the species tree, might lead to a biased conclusion. PMID:26968785
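Tree balance can be quantified in several ways, and the abstract does not name the statistic used. As one common choice, the Colless index sums, over all internal nodes of a binary tree, the absolute difference between the numbers of leaves in the two subtrees. The Python sketch below assumes binary trees written as nested 2-tuples of leaf names; the function name is hypothetical.

```python
def colless(tree):
    """Return (number of leaves, Colless index) for a binary nested-tuple tree."""
    if isinstance(tree, str):
        return 1, 0
    left, right = tree
    n_left, c_left = colless(left)
    n_right, c_right = colless(right)
    return n_left + n_right, c_left + c_right + abs(n_left - n_right)
```

A fully balanced four-leaf tree scores 0 and the four-leaf caterpillar tree scores 3, so larger values indicate less balanced trees.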
Stadler, Tanja; Degnan, James H; Rosenberg, Noah A
2016-07-01
Classic null models for speciation and extinction give rise to phylogenies that differ in distribution from empirical phylogenies. In particular, empirical phylogenies are less balanced and have branching times closer to the root compared to phylogenies predicted by common null models. This difference might be due to null models of the speciation and extinction process being too simplistic, or due to the empirical datasets not being representative of random phylogenies. A third possibility arises because phylogenetic reconstruction methods often infer gene trees rather than species trees, producing an incongruity between models that predict species tree patterns and empirical analyses that consider gene trees. We investigate the extent to which the difference between gene trees and species trees under a combined birth-death and multispecies coalescent model can explain the difference in empirical trees and birth-death species trees. We simulate gene trees embedded in simulated species trees and investigate their difference with respect to tree balance and branching times. We observe that the gene trees are less balanced and typically have branching times closer to the root than the species trees. Empirical trees from TreeBase are also less balanced than our simulated species trees, and model gene trees can explain an imbalance increase of up to 8% compared to species trees. However, we see a much larger imbalance increase in empirical trees, about 100%, meaning that additional features must also be causing imbalance in empirical trees. This simulation study highlights the necessity of revisiting the assumptions made in phylogenetic analyses, as these assumptions, such as equating the gene tree with the species tree, might lead to a biased conclusion. PMID:26968785
Decision Tree Modeling for Ranking Data
NASA Astrophysics Data System (ADS)
Yu, Philip L. H.; Wan, Wai Ming; Lee, Paul H.
Ranking/preference data arise in many applications in marketing, psychology, and politics. We establish a new decision tree model for the analysis of ranking data by adopting the concept of the classification and regression tree. The existing splitting criteria are modified in a way that allows them to precisely measure the impurity of a set of ranking data. Two types of impurity measures for ranking data are introduced, namely g-wise and top-k measures. Theoretical results show that the new measures exhibit properties of impurity functions. In model assessment, the area under the ROC curve (AUC) is applied to evaluate the tree performance. Experiments are carried out to investigate the predictive performance of the tree model for complete and partially ranked data, and promising results are obtained. Finally, a real-world application of the proposed methodology to analyze a set of political rankings data is presented.
Multinomial logistic regression ensembles.
Lee, Kyewon; Ahn, Hongshik; Moon, Hojin; Kodell, Ralph L; Chen, James J
2013-05-01
This article proposes a method for multiclass classification problems using ensembles of multinomial logistic regression models. A multinomial logit model is used as a base classifier in ensembles from random partitions of predictors. The multinomial logit model can be applied to each mutually exclusive subset of the feature space without variable selection. By combining multiple models the proposed method can handle a huge database without a constraint needed for analyzing high-dimensional data, and the random partition can improve the prediction accuracy by reducing the correlation among base classifiers. The proposed method is implemented using R, and the performance including overall prediction accuracy, sensitivity, and specificity for each category is evaluated on two real data sets and simulation data sets. To investigate the quality of prediction in terms of sensitivity and specificity, the area under the receiver operating characteristic (ROC) curve (AUC) is also examined. The performance of the proposed model is compared to a single multinomial logit model and it shows a substantial improvement in overall prediction accuracy. The proposed method is also compared with other classification methods such as the random forest, support vector machines, and random multinomial logit model. PMID:23611203
Bayesian Spatial Quantile Regression
Reich, Brian J.; Fuentes, Montserrat; Dunson, David B.
2013-01-01
Tropospheric ozone is one of the six criteria pollutants regulated by the United States Environmental Protection Agency under the Clean Air Act and has been linked with several adverse health effects, including mortality. Due to the strong dependence on weather conditions, ozone may be sensitive to climate change and there is great interest in studying the potential effect of climate change on ozone, and how this change may affect public health. In this paper we develop a Bayesian spatial model to predict ozone under different meteorological conditions, and use this model to study spatial and temporal trends and to forecast ozone concentrations under different climate scenarios. We develop a spatial quantile regression model that does not assume normality and allows the covariates to affect the entire conditional distribution, rather than just the mean. The conditional distribution is allowed to vary from site-to-site and is smoothed with a spatial prior. For extremely large datasets our model is computationally infeasible, and we develop an approximate method. We apply the approximate version of our model to summer ozone from 1997–2005 in the Eastern U.S., and use deterministic climate models to project ozone under future climate conditions. Our analysis suggests that holding all other factors fixed, an increase in daily average temperature will lead to the largest increase in ozone in the Industrial Midwest and Northeast. PMID:23459794
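Quantile regression replaces squared error with the asymmetric check (pinball) loss, which is why it can describe the whole conditional distribution rather than just the mean. The Python sketch below shows only that building block for a constant fit, without covariates, spatial priors, or any Bayesian machinery; the function names are hypothetical.

```python
def check_loss(u, tau):
    """Pinball loss: weight tau on positive residuals, 1 - tau on negative ones."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def total_loss(q, data, tau):
    """Sum of check losses when every observation is predicted by q."""
    return sum(check_loss(y - q, tau) for y in data)

def sample_quantile(data, tau):
    """Observed value that minimizes the total check loss at level tau."""
    return min(sorted(data), key=lambda q: total_loss(q, data, tau))
```

With tau = 0.5 the minimizer is a median, so a single extreme observation barely moves it. In a regression setting the constant q is replaced by a function of the covariates, fitted separately or jointly across quantile levels.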
Bayesian Spatial Quantile Regression.
Reich, Brian J; Fuentes, Montserrat; Dunson, David B
2011-03-01
Tropospheric ozone is one of the six criteria pollutants regulated by the United States Environmental Protection Agency under the Clean Air Act and has been linked with several adverse health effects, including mortality. Due to the strong dependence on weather conditions, ozone may be sensitive to climate change and there is great interest in studying the potential effect of climate change on ozone, and how this change may affect public health. In this paper we develop a Bayesian spatial model to predict ozone under different meteorological conditions, and use this model to study spatial and temporal trends and to forecast ozone concentrations under different climate scenarios. We develop a spatial quantile regression model that does not assume normality and allows the covariates to affect the entire conditional distribution, rather than just the mean. The conditional distribution is allowed to vary from site-to-site and is smoothed with a spatial prior. For extremely large datasets our model is computationally infeasible, and we develop an approximate method. We apply the approximate version of our model to summer ozone from 1997-2005 in the Eastern U.S., and use deterministic climate models to project ozone under future climate conditions. Our analysis suggests that holding all other factors fixed, an increase in daily average temperature will lead to the largest increase in ozone in the Industrial Midwest and Northeast. PMID:23459794
Luo, Chongliang; Liu, Jin; Dey, Dipak K; Chen, Kun
2016-07-01
In many fields, multi-view datasets, measuring multiple distinct but interrelated sets of characteristics on the same set of subjects, together with data on certain outcomes or phenotypes, are routinely collected. The objective in such a problem is often two-fold: both to explore the association structures of multiple sets of measurements and to develop a parsimonious model for predicting the future outcomes. We study a unified canonical variate regression framework to tackle the two problems simultaneously. The proposed criterion integrates multiple canonical correlation analysis with predictive modeling, balancing between the association strength of the canonical variates and their joint predictive power on the outcomes. Moreover, the proposed criterion seeks multiple sets of canonical variates simultaneously to enable the examination of their joint effects on the outcomes, and is able to handle multivariate and non-Gaussian outcomes. An efficient algorithm based on variable splitting and Lagrangian multipliers is proposed. Simulation studies show the superior performance of the proposed approach. We demonstrate the effectiveness of the proposed approach in an [Formula: see text] intercross mice study and an alcohol dependence study. PMID:26861909
Fuzzy tree automata and syntactic pattern recognition.
Lee, E T
1982-04-01
An approach of representing patterns by trees and processing these trees by fuzzy tree automata is described. Fuzzy tree automata are defined and investigated. The results include that the class of fuzzy root-to-frontier recognizable Σ-trees is closed under intersection, union, and complementation. Thus, the class of fuzzy root-to-frontier recognizable Σ-trees forms a Boolean algebra. Fuzzy tree automata are applied to processing fuzzy tree representation of patterns based on syntactic pattern recognition. The grade of acceptance is defined and investigated. Quantitative measures of "approximate isosceles triangle," "approximate elongated isosceles triangle," "approximate rectangle," and "approximate cross" are defined and used in the illustrative examples of this approach. By using these quantitative measures, a house, a house with high roof, and a church are also presented as illustrative examples. In addition, three fuzzy tree automata are constructed which have the capability of processing the fuzzy tree representations of "fuzzy houses," "houses with high roofs," and "fuzzy churches," respectively. The results may have useful applications in pattern recognition, image processing, artificial intelligence, pattern database design and processing, image science, and pictorial information systems. PMID:21869062
Trees grow on money: urban tree canopy cover and environmental justice.
Schwarz, Kirsten; Fragkias, Michail; Boone, Christopher G; Zhou, Weiqi; McHale, Melissa; Grove, J Morgan; O'Neil-Dunne, Jarlath; McFadden, Joseph P; Buckley, Geoffrey L; Childers, Dan; Ogden, Laura; Pincetl, Stephanie; Pataki, Diane; Whitmer, Ali; Cadenasso, Mary L
2015-01-01
This study examines the distributional equity of urban tree canopy (UTC) cover for Baltimore, MD, Los Angeles, CA, New York, NY, Philadelphia, PA, Raleigh, NC, Sacramento, CA, and Washington, D.C. using high spatial resolution land cover data and census data. Data are analyzed at the Census Block Group levels using Spearman's correlation, ordinary least squares regression (OLS), and a spatial autoregressive model (SAR). Across all cities there is a strong positive correlation between UTC cover and median household income. Negative correlations between race and UTC cover exist in bivariate models for some cities, but they are generally not observed using multivariate regressions that include additional variables on income, education, and housing age. SAR models result in higher r-square values compared to the OLS models across all cities, suggesting that spatial autocorrelation is an important feature of our data. Similarities among cities can be found based on shared characteristics of climate, race/ethnicity, and size. Our findings suggest that a suite of variables, including income, contribute to the distribution of UTC cover. These findings can help target simultaneous strategies for UTC goals and environmental justice concerns. PMID:25830303
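As a rough illustration of the bivariate step mentioned above, Spearman's correlation can be computed as the Pearson correlation of average ranks. This is a minimal sketch, not the authors' code; the function names and the toy income and canopy values are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical block-group data: median income and percent canopy cover.
income = [31000, 42000, 58000, 75000, 120000]
canopy = [8.0, 12.5, 11.0, 24.0, 35.0]
rho = spearman(income, canopy)
```

Because only ranks enter the calculation, the coefficient measures monotone association and is unaffected by the skew typical of income data.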
Linear regression in astronomy. I
NASA Technical Reports Server (NTRS)
Isobe, Takashi; Feigelson, Eric D.; Akritas, Michael G.; Babu, Gutti Jogesh
1990-01-01
Five methods for obtaining linear regression fits to bivariate data with unknown or insignificant measurement errors are discussed: ordinary least-squares (OLS) regression of Y on X, OLS regression of X on Y, the bisector of the two OLS lines, orthogonal regression, and 'reduced major-axis' regression. These methods have been used by various researchers in observational astronomy, most importantly in cosmic distance scale applications. Formulas for calculating the slope and intercept coefficients and their uncertainties are given for all the methods, including a new general form of the OLS variance estimates. The accuracy of the formulas was confirmed using numerical simulations. The applicability of the procedures is discussed with respect to their mathematical properties, the nature of the astronomical data under consideration, and the scientific purpose of the regression. It is found that, for problems needing symmetrical treatment of the variables, the OLS bisector performs significantly better than orthogonal or reduced major-axis regression.
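The five slope estimators listed above can be written directly in terms of the sample sums of squares and cross-products. The sketch below follows the standard closed forms for these estimators and omits the variance formulas; the function name and dictionary keys are illustrative choices, and the code assumes a nonzero sample covariance.

```python
import math

def five_slopes(x, y):
    """Return five slope estimates (dy/dx) for bivariate data.

    Assumes the sample covariance of x and y is nonzero.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sign = math.copysign(1.0, sxy)

    b1 = sxy / sxx  # OLS regression of Y on X
    b2 = syy / sxy  # OLS regression of X on Y, expressed as dy/dx
    # Line bisecting the angle between the two OLS lines.
    b3 = (b1 * b2 - 1.0 + math.sqrt((1.0 + b1 ** 2) * (1.0 + b2 ** 2))) / (b1 + b2)
    # Orthogonal regression (minimizes perpendicular distances).
    d = b2 - 1.0 / b1
    b4 = 0.5 * (d + sign * math.sqrt(4.0 + d ** 2))
    # Reduced major axis: geometric mean of the two OLS slopes.
    b5 = sign * math.sqrt(b1 * b2)
    return {
        "ols_y_on_x": b1,
        "ols_x_on_y": b2,
        "bisector": b3,
        "orthogonal": b4,
        "rma": b5,
    }

def intercept(x, y, slope):
    """Every one of the five lines passes through the centroid of the data."""
    return sum(y) / len(y) - slope * sum(x) / len(x)
```

For perfectly collinear data all five slopes coincide; with scatter, the two OLS slopes bracket the other three, which is why the choice of method matters when the variables are to be treated symmetrically.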
Large unbalanced credit scoring using Lasso-logistic regression ensemble.
Wang, Hong; Xu, Qingsong; Zhou, Lifeng
2015-01-01
Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988
Haasl, D.F.; Roberts, N.H.; Vesely, W.E.; Goldberg, F.F.
1981-01-01
This handbook describes a methodology for reliability analysis of complex systems such as those which comprise the engineered safety features of nuclear power generating stations. After an initial overview of the available system analysis approaches, the handbook focuses on a description of the deductive method known as fault tree analysis. The following aspects of fault tree analysis are covered: basic concepts for fault tree analysis; basic elements of a fault tree; fault tree construction; probability, statistics, and Boolean algebra for the fault tree analyst; qualitative and quantitative fault tree evaluation techniques; and computer codes for fault tree evaluation. Also discussed are several example problems illustrating the basic concepts of fault tree construction and evaluation.
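Quantitative fault tree evaluation of the kind described above can be illustrated with AND and OR gates over basic events. This is a minimal sketch that assumes statistically independent basic events; the gate functions, event names, and probabilities are hypothetical and are not taken from the handbook.

```python
import math

def gate_and(probs):
    """Output event occurs only if all independent input events occur."""
    return math.prod(probs)

def gate_or(probs):
    """Output event occurs if at least one independent input event occurs."""
    return 1.0 - math.prod(1.0 - p for p in probs)

# Hypothetical system: cooling is lost if both redundant pumps fail
# (AND gate) or if the single shared valve fails (OR gate at the top).
p_pump_a = 0.1
p_pump_b = 0.1
p_valve = 0.01

p_both_pumps = gate_and([p_pump_a, p_pump_b])
p_top = gate_or([p_both_pumps, p_valve])
```

Here the top-event probability is 1 - (1 - 0.01)(1 - 0.01) = 0.0199. The independence assumption fails when a basic event feeds more than one gate, in which case the tree is usually reduced to minimal cut sets with Boolean algebra first.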
Evaluating differential effects using regression interactions and regression mixture models
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This paper focuses on understanding regression mixture models, a relatively new statistical method for assessing differential effects, by comparing results to using an interactive term in linear regression. The research questions which each model answers, their formulation, and their assumptions are compared using Monte Carlo simulations and real data analysis. The capabilities of regression mixture models are described and specific issues to be addressed when conducting regression mixtures are proposed. The paper aims to clarify the role that regression mixtures can take in the estimation of differential effects and increase awareness of the benefits and potential pitfalls of this approach. Regression mixture models are shown to be a potentially effective exploratory method for finding differential effects when these effects can be defined by a small number of classes of respondents who share a typical relationship between a predictor and an outcome. It is also shown that the comparison between regression mixture models and interactions becomes substantially more complex as the number of classes increases. It is argued that regression interactions are well suited for direct tests of specific hypotheses about differential effects and regression mixtures provide a useful approach for exploring effect heterogeneity given adequate samples and study design. PMID:26556903
Categorizing Ideas about Trees: A Tree of Trees
Fisler, Marie; Lecointre, Guillaume
2013-01-01
The aim of this study is to explore whether matrices and MP trees used to produce systematic categories of organisms could be useful to produce categories of ideas in history of science. We study the history of the use of trees in systematics to represent the diversity of life from 1766 to 1991. We apply to those ideas a method inspired from coding homologous parts of organisms. We discretize conceptual parts of ideas, writings and drawings about trees contained in 41 main writings; we detect shared parts among authors and code them into a 91-characters matrix and use a tree representation to show who shares what with whom. In other words, we propose a hierarchical representation of the shared ideas about trees among authors: this produces a “tree of trees.” Then, we categorize schools of tree-representations. Classical schools like “cladists” and “pheneticists” are recovered but others are not: “gradists” are separated into two blocks, one of them being called here “grade theoreticians.” We propose new interesting categories like the “buffonian school,” the “metaphoricians,” and those using “strictly genealogical classifications.” We consider that networks are not useful to represent shared ideas at the present step of the study. A cladogram is made for showing who is sharing what with whom, but also heterobathmy and homoplasy of characters. The present cladogram is not modelling processes of transmission of ideas about trees, and here it is mostly used to test for proximity of ideas of the same age and for categorization. PMID:23950877
Forest Management Intensity Affects Aquatic Communities in Artificial Tree Holes
Petermann, Jana S.; Rohland, Anja; Sichardt, Nora; Lade, Peggy; Guidetti, Brenda; Weisser, Wolfgang W.; Gossner, Martin M.
2016-01-01
Forest management could potentially affect organisms in all forest habitats. However, aquatic communities in water-filled tree-holes may be especially sensitive because of small population sizes, the risk of drought and potential dispersal limitation. We set up artificial tree holes in forest stands subject to different management intensities in two regions in Germany and assessed the influence of local environmental properties (tree-hole opening type, tree diameter, water volume and water temperature) as well as regional drivers (forest management intensity, tree-hole density) on tree-hole insect communities (not considering other organisms such as nematodes or rotifers), detritus content, oxygen and nutrient concentrations. In addition, we compared data from artificial tree holes with data from natural tree holes in the same area to evaluate the methodological approach of using tree-hole analogues. We found that forest management had strong effects on communities in artificial tree holes in both regions and across the season. Abundance and species richness declined, community composition shifted and detritus content declined with increasing forest management intensity. Environmental variables, such as tree-hole density and tree diameter partly explained these changes. However, dispersal limitation, indicated by effects of tree-hole density, generally showed rather weak impacts on communities. Artificial tree holes had higher water temperatures (on average 2°C higher) and oxygen concentrations (on average 25% higher) than natural tree holes. The abundance of organisms was higher but species richness was lower in artificial tree holes. Community composition differed between artificial and natural tree holes. Negative management effects were detectable in both tree-hole systems, despite their abiotic and biotic differences. Our results indicate that forest management has substantial and pervasive effects on tree-hole communities and may alter their structure and
Heritability Estimation using Regression Models for Correlation
Lee, Hye-Seung; Paik, Myunghee Cho; Rundek, Tatjana; Sacco, Ralph L; Dong, Chuanhui; Krischer, Jeffrey P
2012-01-01
Heritability estimates a polygenic effect on a trait for a population. Reliable interpretation of heritability is critical in planning further genetic studies to locate a gene responsible for the trait. This study accommodates both single and multiple trait cases by employing regression models for the correlation parameter to infer the heritability. Sharing the properties of the regression approach, the proposed methods are flexible enough to incorporate non-genetic and/or non-additive genetic information in the analysis. The performance of the proposed methods is compared with that of the likelihood approach through simulations and an analysis of carotid intima-media thickness from the Northern Manhattan Family Study. PMID:22457844
Improving phylogenetic regression under complex evolutionary models.
Mazel, Florent; Davies, T Jonathan; Georges, Damien; Lavergne, Sébastien; Thuiller, Wilfried; Peres-Neto, Pedro R
2016-02-01
Phylogenetic Generalized Least Square (PGLS) is the tool of choice among phylogenetic comparative methods to measure the correlation between species features such as morphological and life-history traits or niche characteristics. In its usual form, it assumes that the residual variation follows a homogenous model of evolution across the branches of the phylogenetic tree. Since a homogenous model of evolution is unlikely to be realistic in nature, we explored the robustness of the phylogenetic regression when this assumption is violated. We did so by simulating a set of traits under various heterogeneous models of evolution, and evaluating the statistical performance (type I error [the percentage of tests based on samples that incorrectly rejected a true null hypothesis] and power [the percentage of tests that correctly rejected a false null hypothesis]) of classical phylogenetic regression. We found that PGLS has good power but unacceptable type I error rates. This finding is important since this method has been increasingly used in comparative analyses over the last decade. To address this issue, we propose a simple solution based on transforming the underlying variance-covariance matrix to adjust for model heterogeneity within PGLS. We suggest that heterogeneous rates of evolution might be particularly prevalent in large phylogenetic trees, while most current approaches assume a homogenous rate of evolution. Our analysis demonstrates that overlooking rate heterogeneity can result in inflated type I errors, thus misleading comparative analyses. We show that it is possible to correct for this bias even when the underlying model of evolution is not known a priori. PMID:27145604
Linear regression in astronomy. II
NASA Technical Reports Server (NTRS)
Feigelson, Eric D.; Babu, Gutti J.
1992-01-01
A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
Quantile regression for climate data
NASA Astrophysics Data System (ADS)
Marasinghe, Dilhani Shalika
Quantile regression is a developing statistical tool used to explain the relationship between response and predictor variables. This thesis describes two climatological examples using quantile regression. Our main goal is to estimate derivatives of a conditional mean and/or conditional quantile function. We introduce a method to handle autocorrelation in the framework of quantile regression and use it with the temperature data. We also explain some properties of the tornado data, which are non-normally distributed. Even though quantile regression provides a more comprehensive view, when the residuals satisfy the normality and constant-variance assumptions we prefer least-squares regression for our temperature analysis. When those assumptions do not hold, quantile regression is a better candidate for estimating the derivative.
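Quantile regression replaces the squared-error criterion of least squares with an asymmetric absolute loss, often called the check or pinball loss. The sketch below, with hypothetical function names and toy data, shows the simplest case of fitting a constant: the minimizer is a sample quantile, which is far less sensitive to an outlier than the mean.

```python
def pinball_loss(residual, tau):
    """Check loss: weight tau on positive residuals, (1 - tau) on negative ones."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def total_loss(y, q, tau):
    """Total check loss of predicting the constant q for every observation."""
    return sum(pinball_loss(yi - q, tau) for yi in y)

def best_constant(y, tau):
    """Constant minimizing the check loss; a minimizer always lies at a data point."""
    return min(sorted(y), key=lambda q: total_loss(y, q, tau))

# Toy data with one extreme value (hypothetical).
y = [1, 2, 3, 4, 100]
median_fit = best_constant(y, 0.5)
upper_fit = best_constant(y, 0.9)
mean_fit = sum(y) / len(y)
```

With tau = 0.5 the fit is the median, 3, while the mean is pulled to 22 by the single large value. A full quantile regression minimizes the same loss over the coefficients of a linear predictor rather than over a constant.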
The Allometry of Coarse Root Biomass: Log-Transformed Linear Regression or Nonlinear Regression?
Lai, Jiangshan; Yang, Bo; Lin, Dunmei; Kerkhoff, Andrew J.; Ma, Keping
2013-01-01
Precise estimation of root biomass is important for understanding carbon stocks and dynamics in forests. Traditionally, biomass estimates are based on allometric scaling relationships between stem diameter and coarse root biomass calculated using linear regression (LR) on log-transformed data. Recently, it has been suggested that nonlinear regression (NLR) is a preferable fitting method for scaling relationships. But while this claim has been contested on both theoretical and empirical grounds, and statistical methods have been developed to aid in choosing between the two methods in particular cases, few studies have examined the ramifications of erroneously applying NLR. Here, we use direct measurements of 159 trees belonging to three locally dominant species in east China to compare the LR and NLR models of diameter-root biomass allometry. We then contrast model predictions by estimating stand coarse root biomass based on census data from the nearby 24-ha Gutianshan forest plot and by testing the ability of the models to predict known root biomass values measured on multiple tropical species at the Pasoh Forest Reserve in Malaysia. Based on likelihood estimates for model error distributions, as well as the accuracy of extrapolative predictions, we find that LR on log-transformed data is superior to NLR for fitting diameter-root biomass scaling models. More importantly, inappropriately using NLR leads to grossly inaccurate stand biomass estimates, especially for stands dominated by smaller trees. PMID:24116197
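The linear regression (LR) approach above fits the power law biomass = a * diameter^b by ordinary least squares on the log-log scale. This is a minimal sketch of that step only, with a hypothetical function name and synthetic, noise-free data; it is not the authors' analysis and does not cover the nonlinear fit or any back-transformation bias correction.

```python
import math

def fit_loglog(diameter, biomass):
    """Fit log(biomass) = log(a) + b * log(diameter) by OLS; return (a, b)."""
    lx = [math.log(d) for d in diameter]
    ly = [math.log(m) for m in biomass]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum(
        (u - mx) ** 2 for u in lx
    )
    log_a = my - b * mx
    return math.exp(log_a), b

# Synthetic example: diameters in cm, biomass generated from a known power law.
diam = [5.0, 10.0, 20.0, 40.0]
mass = [0.05 * d ** 2.5 for d in diam]
a_hat, b_hat = fit_loglog(diam, mass)
```

Fitting on the log scale assumes multiplicative error, so small and large trees carry comparable weight; a nonlinear fit on the original scale assumes additive error and is dominated by the largest trees.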
Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models
ERIC Educational Resources Information Center
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…
Retro-regression--another important multivariate regression improvement.
Randić, M
2001-01-01
We review the serious problem associated with instabilities of the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". This is manifested when in a stepwise regression a descriptor is included or excluded from a regression. The consequence is an unpredictable change of the coefficients of the descriptors that remain in the regression equation. We follow with consideration of an even more serious problem, referred to as the MRA "nightmare of the second kind", arising when optimal descriptors are selected from a large pool of descriptors. This process typically causes at different steps of the stepwise regression a replacement of several previously used descriptors by new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on boiling points of nonanes which are considered (1) by using an ordered connectivity basis; (2) by using an ordering resulting from application of greedy algorithm; and (3) by using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined showing how it resolves the ambiguities associated with both "nightmares" of the first and the second kind of MRA. PMID:11410035
Decision Tree Approach for Soil Liquefaction Assessment
Gandomi, Amir H.; Fridline, Mark M.; Roke, David A.
2013-01-01
In the current study, the performances of some decision tree (DT) techniques are evaluated for postearthquake soil liquefaction assessment. A database containing 620 records of seismic parameters and soil properties is used in this study. Three decision tree techniques are used here in two different ways, considering statistical and engineering points of view, to develop decision rules. The DT results are compared to the logistic regression (LR) model. The results of this study indicate that the DTs not only successfully predict liquefaction but they can also outperform the LR model. The best DT models are interpreted and evaluated based on an engineering point of view. PMID:24489498
Raven, John A; Andrews, Mitchell
2010-09-01
Using a broad definition of trees, the evolutionary origins of trees in a nutritional context are considered using data from the fossil record and molecular phylogeny. Trees are first known from the Late Devonian, about 380 million years ago, and originated polyphyletically at the pteridophyte grade of organization; the earliest gymnosperms were trees, and trees are polyphyletic in the angiosperms. Nutrient transporters, assimilatory pathways, homoiohydry (cuticle, intercellular gas spaces, stomata, endohydric water transport systems including xylem and phloem-like tissue) and arbuscular mycorrhizas preceded the origin of trees. Nutritional innovations that began uniquely in trees were the seed habit and, certainly (but not necessarily uniquely) in trees, ectomycorrhizas, cyanobacterial, actinorhizal and rhizobial (Parasponia, some legumes) diazotrophic symbioses and cluster roots. PMID:20581011
NASA Technical Reports Server (NTRS)
Buntine, Wray
1993-01-01
This paper introduces the IND Tree Package to prospective users. IND does supervised learning using classification trees. This learning task is a basic tool used in the development of diagnosis, monitoring and expert systems. The IND Tree Package was developed as part of a NASA project to semi-automate the development of data analysis and modelling algorithms using artificial intelligence techniques. The IND Tree Package integrates features from CART and C4 with newer Bayesian and minimum encoding methods for growing classification trees and graphs. The IND Tree Package also provides an experimental control suite on top. The newer features give improved probability estimates often required in diagnostic and screening tasks. The package comes with a manual, Unix 'man' entries, and a guide to tree methods and research. The IND Tree Package is implemented in C under Unix and was beta-tested at university and commercial research laboratories in the United States.
ERIC Educational Resources Information Center
Barry, Dana M.
1997-01-01
Provides details on the chemical composition of trees including a definition of wood. Also includes an activity on anthocyanins as well as a discussion of the resistance of wood to solvents and chemicals. Lists interesting products from trees. (DDR)
Category of trees in representation theory of quantum algebras
Moskaliuk, N. M.; Moskaliuk, S. S.
2013-10-15
New applications of categorical methods are connected with new additional structures on categories. One such structure in the representation theory of quantum algebras, the category of Kuznetsov-Smorodinsky-Vilenkin-Smirnov (KSVS) trees, is constructed; its objects are finite rooted KSVS trees and its morphisms are generated by the transition from one KSVS tree to another.
The space of ultrametric phylogenetic trees.
Gavryushkin, Alex; Drummond, Alexei J
2016-08-21
The reliability of a phylogenetic inference method from genomic sequence data is ensured by its statistical consistency. Bayesian inference methods produce a sample of phylogenetic trees from the posterior distribution given sequence data. Hence the question of statistical consistency of such methods is equivalent to the consistency of the summary of the sample. More generally, statistical consistency is ensured by the tree space used to analyse the sample. In this paper, we consider two standard parameterisations of phylogenetic time-trees used in evolutionary models: inter-coalescent interval lengths and absolute times of divergence events. For each of these parameterisations we introduce a natural metric space on ultrametric phylogenetic trees. We compare the introduced spaces with existing models of tree space, and we formulate and justify several formal requirements that a metric space on phylogenetic trees must satisfy in order to be suitable for statistical analysis. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce and that the choice between metric spaces requires additional properties to be considered. In particular, the summary tree minimising the square distance to the trees from the sample might differ between parameterisations. This suggests that further fundamental insight is needed into the problem of statistical consistency of phylogenetic inference methods. PMID:27188249
NASA Technical Reports Server (NTRS)
Buntine, Wray
1994-01-01
IND computer program introduces Bayesian and Markov/maximum-likelihood (MML) methods and more-sophisticated methods of searching in growing trees. Produces more-accurate class-probability estimates important in applications like diagnosis. Provides range of features and styles with convenience for casual user, fine-tuning for advanced user or for those interested in research. Consists of four basic kinds of routines: data-manipulation, tree-generation, tree-testing, and tree-display. Written in C language.
Dhanya, S; Kumari Roshni, V S
2016-01-01
Textures play an important role in image classification. This paper proposes a high performance texture classification method using a combination of a multiresolution analysis tool and linear regression modelling by channel elimination. The correlation between different frequency regions has been validated as a sort of effective texture characteristic. This method is motivated by the observation that there exists a distinctive correlation between the image samples belonging to the same kind of texture, at different frequency regions obtained by a wavelet transform. Experimentally, it is observed that this correlation differs across textures. The linear regression modelling is employed to analyze this correlation and extract texture features that characterize the samples. Our method considers not only the frequency regions but also the correlation between these regions. This paper primarily focuses on applying the Dual Tree Complex Wavelet Packet Transform and the Linear Regression model for classification of the obtained texture features. The paper also compares the classification results of this method with those obtained using two other wavelet transforms, namely the Discrete Wavelet Transform and the Discrete Wavelet Packet Transform. PMID:26835234
Wehenkel, Louis; Babu, M. Madan; Geurts, Pierre
2015-01-01
Networks are ubiquitous in biology, and computational approaches have been widely investigated for their inference. In particular, supervised machine learning methods can be used to complete a partially known network by integrating various measurements. Two main supervised frameworks have been proposed: the local approach, which trains a separate model for each network node, and the global approach, which trains a single model over pairs of nodes. Here, we systematically investigate, theoretically and empirically, the exploitation of tree-based ensemble methods in the context of these two approaches for biological network inference. We first formalize the problem of network inference as a classification of pairs, unifying in the process homogeneous and bipartite graphs and discussing two main sampling schemes. We then present the global and the local approaches, extending the latter for the prediction of interactions between two unseen network nodes, and discuss their specializations to tree-based ensemble methods, highlighting their interpretability and drawing links with clustering techniques. Extensive computational experiments carried out with these methods on various biological networks clearly show that they are competitive with existing methods. PMID:26008881
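The difference between the local and global approaches described in this abstract comes down to how the training data are laid out. The sketch below shows only that layout, on a hypothetical three-node homogeneous network with made-up feature vectors. It does not train any model; any classifier (for example a tree-based ensemble) could be fitted to the resulting datasets, and the sampling schemes discussed in the paper are not reproduced here.

```python
from itertools import permutations

# Hypothetical toy data: per-node feature vectors and a partially known network.
features = {"a": [0.1, 1.0], "b": [0.4, 0.0], "c": [0.9, 1.0]}
known_edges = {("a", "b"), ("b", "c")}

def has_edge(u, v):
    """Undirected lookup in the known part of the network."""
    return (u, v) in known_edges or (v, u) in known_edges

def global_training_set(nodes):
    """Global approach: a single dataset over ordered pairs of distinct nodes.

    Each row is (concatenated features of the pair, 1 if the pair is linked else 0).
    """
    return [(features[u] + features[v], int(has_edge(u, v)))
            for u, v in permutations(nodes, 2)]

def local_training_sets(nodes):
    """Local approach: one dataset per node.

    For node u, each row is (features of another node v, 1 if v links to u else 0).
    """
    return {u: [(features[v], int(has_edge(u, v))) for v in nodes if v != u]
            for u in nodes}

nodes = sorted(features)
global_rows = global_training_set(nodes)
local_sets = local_training_sets(nodes)

print(len(global_rows), {u: len(rows) for u, rows in local_sets.items()})
```

The global layout yields one model whose inputs describe a pair, while the local layout yields one small classification problem per node.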
ERIC Educational Resources Information Center
Sweeney, Debra; Rounds, Judy
2011-01-01
Trees are great inspiration for artists. Many art teachers find themselves inspired and maybe somewhat obsessed with the natural beauty and elegance of the lofty tree, and how it changes through the seasons. One such tree that grows in several regions and always looks magnificent, regardless of the time of year, is the birch. In this article, the…
Max, N
2002-08-19
This paper is a survey of the author's work on illumination and shadows under trees, including the effects of sky illumination, sun penumbras, scattering in a misty atmosphere below the trees, and multiple scattering and transmission between leaves. It also describes a hierarchical image-based rendering method for trees.
Minnesota's Forest Trees. Revised.
ERIC Educational Resources Information Center
Miles, William R.; Fuller, Bruce L.
This bulletin describes 46 of the more common trees found in Minnesota's forests and windbreaks. The bulletin contains two tree keys, a summer key and a winter key, to help the reader identify these trees. Besides the two keys, the bulletin includes an introduction, instructions for key use, illustrations of leaf characteristics and twig…