Sample records for conditional inference tree

  1. Obesity as a risk factor for developing functional limitation among older adults: A conditional inference tree analysis

    USDA-ARS's Scientific Manuscript database

    Objective: To examine the risk factors of developing functional decline and make probabilistic predictions by using a tree-based method that allows higher order polynomials and interactions of the risk factors. Methods: The conditional inference tree analysis, a data mining approach, was used to con...

  2. Decision tree modeling using R.

    PubMed

    Zhang, Zhongheng

    2016-08-01

    In machine learning, the decision tree learner is powerful and easy to interpret. It employs a recursive binary partitioning algorithm that splits the sample on the partitioning variable with the strongest association with the response variable. The process continues until some stopping criterion is met. In the example I focus on the conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. Because a single tree is sensitive to small changes in the training data, the random forests procedure is introduced to address this problem. The sources of diversity for random forests are the random sampling of the data and the restricted set of input variables available for selection. Finally, I introduce R functions to perform model-based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building.
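
    The variable-selection step described in this abstract (split on the covariate most strongly associated with the response, and stop when no association is significant) can be sketched as follows. This is a minimal Python illustration, not the R functions the article introduces. The absolute Pearson correlation as the association statistic, the permutation test, the Bonferroni adjustment, and all names (`select_split_variable`, `n_perm`, `alpha`) are assumptions made for the example.

```python
import math
import random


def pearson_abs(x, y):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm, rng):
    """P-value for 'x and y are unrelated', obtained by shuffling y."""
    observed = pearson_abs(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        # Small tolerance so floating-point noise does not hide ties.
        if pearson_abs(x, y_perm) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def select_split_variable(columns, y, alpha=0.05, n_perm=200, seed=0):
    """Return (index of the covariate to split on, list of p-values).

    The index is None when even the smallest Bonferroni-adjusted
    p-value exceeds alpha, which serves as the stopping criterion.
    """
    rng = random.Random(seed)
    p_values = [permutation_p_value(col, y, n_perm, rng) for col in columns]
    best = min(range(len(columns)), key=lambda j: p_values[j])
    adjusted = min(1.0, p_values[best] * len(columns))
    if adjusted > alpha:
        return None, p_values
    return best, p_values
```

    A full tree would then choose a cut point on the selected covariate and recurse on the two resulting subsamples. Separating variable selection from cut-point selection in this way is the design choice that distinguishes this family of methods from exhaustive-search trees.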

  3. Determinants of establishment survival for residential trees in Sacramento County, CA

    Treesearch

    Lara A. Roman; John J. Battles; Joe R. McBride

    2014-01-01

    Urban forests can provide ecosystem services that motivate tree planting campaigns, and tree survival is a key element of program success and projected benefits. We studied survival in a shade tree give-away program in Sacramento, CA, monitoring a cohort of young trees for five years on single-family residential properties. We used conditional inference trees to...

  4. Consequences of Common Topological Rearrangements for Partition Trees in Phylogenomic Inference.

    PubMed

    Chernomor, Olga; Minh, Bui Quang; von Haeseler, Arndt

    2015-12-01

    In phylogenomic analysis the collection of trees with identical score (maximum likelihood or parsimony score) may hamper tree search algorithms. Such collections are coined phylogenetic terraces. For sparse supermatrices with a lot of missing data, the number of terraces and the number of trees on the terraces can be very large. If terraces are not taken into account, a lot of computation time might be unnecessarily spent to evaluate many trees that in fact have identical score. To save computation time during the tree search, it is worthwhile to quickly identify such cases. The score of a species tree is the sum of scores for all the so-called induced partition trees. Therefore, if the topological rearrangement applied to a species tree does not change the induced partition trees, the score of these partition trees is unchanged. Here, we provide the conditions under which the three most widely used topological rearrangements (nearest neighbor interchange, subtree pruning and regrafting, and tree bisection and reconnection) change the topologies of induced partition trees. During the tree search, these conditions allow us to quickly identify whether we can save computation time on the evaluation of newly encountered trees. We also introduce the concept of partial terraces and demonstrate that they occur more frequently than the original "full" terrace. Hence, partial terrace is the more important factor of timesaving compared to full terrace. Therefore, taking into account the above conditions and the partial terrace concept will help to speed up the tree search in phylogenomic inference.
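
    The observation that a species tree's score is the sum of the scores of its induced partition trees suggests a simple caching scheme, sketched below in Python. This is an illustration under simplifying assumptions, not the authors' algorithm: trees are rooted nested tuples of taxon names (phylogenomic trees are usually unrooted), `score_partition` is a hypothetical user-supplied scoring function, and the induced tree is recomputed for every candidate, whereas the paper derives conditions that avoid even that step.

```python
def induced_tree(tree, taxa):
    """Restrict a rooted tree (nested tuples of taxon names) to `taxa`.

    Returns a canonical nested tuple, a single taxon name, or None
    when no taxon of the subtree belongs to `taxa`.
    """
    if isinstance(tree, str):
        return tree if tree in taxa else None
    kept = [sub for sub in (induced_tree(child, taxa) for child in tree)
            if sub is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # suppress nodes left with a single child
    # Sorting by repr gives one canonical ordering of the children.
    return tuple(sorted(kept, key=repr))


class PartitionScoreCache:
    """Sum per-partition scores, re-evaluating only unseen induced trees."""

    def __init__(self, partitions, score_partition):
        self.partitions = [frozenset(p) for p in partitions]
        self.score_partition = score_partition  # hypothetical scorer
        self.cache = {}
        self.evaluations = 0

    def score(self, species_tree):
        total = 0.0
        for index, taxa in enumerate(self.partitions):
            key = (index, induced_tree(species_tree, taxa))
            if key not in self.cache:
                self.cache[key] = self.score_partition(index, key[1])
                self.evaluations += 1
            total += self.cache[key]
        return total
```

    When a rearrangement moves only taxa that are missing from a partition, the induced tree for that partition is unchanged and its cached score is reused.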

  5. Consequences of Common Topological Rearrangements for Partition Trees in Phylogenomic Inference

    PubMed Central

    Minh, Bui Quang; von Haeseler, Arndt

    2015-01-01

    In phylogenomic analysis the collection of trees with identical score (maximum likelihood or parsimony score) may hamper tree search algorithms. Such collections are coined phylogenetic terraces. For sparse supermatrices with a lot of missing data, the number of terraces and the number of trees on the terraces can be very large. If terraces are not taken into account, a lot of computation time might be unnecessarily spent to evaluate many trees that in fact have identical score. To save computation time during the tree search, it is worthwhile to quickly identify such cases. The score of a species tree is the sum of scores for all the so-called induced partition trees. Therefore, if the topological rearrangement applied to a species tree does not change the induced partition trees, the score of these partition trees is unchanged. Here, we provide the conditions under which the three most widely used topological rearrangements (nearest neighbor interchange, subtree pruning and regrafting, and tree bisection and reconnection) change the topologies of induced partition trees. During the tree search, these conditions allow us to quickly identify whether we can save computation time on the evaluation of newly encountered trees. We also introduce the concept of partial terraces and demonstrate that they occur more frequently than the original “full” terrace. Hence, partial terrace is the more important factor of timesaving compared to full terrace. Therefore, taking into account the above conditions and the partial terrace concept will help to speed up the tree search in phylogenomic inference. PMID:26448206

  6. Calibrated birth-death phylogenetic time-tree priors for Bayesian inference.

    PubMed

    Heled, Joseph; Drummond, Alexei J

    2015-05-01

    Here we introduce a general class of multiple calibration birth-death tree priors for use in Bayesian phylogenetic inference. All tree priors in this class separate ancestral node heights into a set of "calibrated nodes" and "uncalibrated nodes" such that the marginal distribution of the calibrated nodes is user-specified whereas the density ratio of the birth-death prior is retained for trees with equal values for the calibrated nodes. We describe two formulations, one in which the calibration information informs the prior on ranked tree topologies, through the (conditional) prior, and the other which factorizes the prior on divergence times and ranked topologies, thus allowing uniform, or any arbitrary prior distribution on ranked topologies. Although the first of these formulations has some attractive properties, the algorithm we present for computing its prior density is computationally intensive. However, the second formulation is always faster and computationally efficient for up to six calibrations. We demonstrate the utility of the new class of multiple-calibration tree priors using both small simulations and a real-world analysis and compare the results to existing schemes. The two new calibrated tree priors described in this article offer greater flexibility and control of prior specification in calibrated time-tree inference and divergence time dating, and will remove the need for indirect approaches to the assessment of the combined effect of calibration densities and tree priors in Bayesian phylogenetic inference. © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.

  7. Obesity as a risk factor for developing functional limitation among older adults: A conditional inference tree analysis.

    PubMed

    Cheng, Feon W; Gao, Xiang; Bao, Le; Mitchell, Diane C; Wood, Craig; Sliwinski, Martin J; Smiciklas-Wright, Helen; Still, Christopher D; Rolston, David D K; Jensen, Gordon L

    2017-07-01

    To examine the risk factors of developing functional decline and make probabilistic predictions by using a tree-based method that allows higher order polynomials and interactions of the risk factors. The conditional inference tree analysis, a data mining approach, was used to construct a risk stratification algorithm for developing functional limitation based on BMI and other potential risk factors for disability in 1,951 older adults without functional limitations at baseline (baseline age 73.1 ± 4.2 y). We also analyzed the data with multivariate stepwise logistic regression and compared the two approaches (e.g., cross-validation). Over a mean of 9.2 ± 1.7 years of follow-up, 221 individuals developed functional limitation. Higher BMI, age, and comorbidity were consistently identified as significant risk factors for functional decline using both methods. Based on these factors, individuals were stratified into four risk groups via the conditional inference tree analysis. Compared to the low-risk group, all other groups had a significantly higher risk of developing functional limitation. The odds ratio comparing two extreme categories was 9.09 (95% confidence interval: 4.68, 17.6). Higher BMI, age, and comorbid disease were consistently identified as significant risk factors for functional decline among older individuals across all approaches and analyses. © 2017 The Obesity Society.
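
    For readers unfamiliar with how an odds ratio and its confidence interval relate, the following Python sketch computes both from a 2x2 table using the standard log-scale (Woolf) interval. The abstract does not report cell counts or say how its interval was obtained, so the function name, argument layout, and any counts passed to it are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald interval on the log scale.

    a = high-risk group, developed limitation
    b = high-risk group, did not
    c = low-risk group, developed limitation
    d = low-risk group, did not
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log)
    upper = math.exp(log_or + z * se_log)
    return odds_ratio, lower, upper
```

    An interval built this way is symmetric on the log scale, so the geometric mean of its limits equals the point estimate. The limits reported above behave accordingly: the square root of 4.68 × 17.6 is about 9.08, close to the reported odds ratio of 9.09.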

  8. Learning Extended Finite State Machines

    NASA Technical Reports Server (NTRS)

    Cassel, Sofia; Howar, Falk; Jonsson, Bengt; Steffen, Bernhard

    2014-01-01

    We present an active learning algorithm for inferring extended finite state machines (EFSMs), combining data flow and control behavior. Key to our learning technique is a novel learning model based on so-called tree queries. The learning algorithm uses the tree queries to infer symbolic data constraints on parameters, e.g., sequence numbers, time stamps, identifiers, or even simple arithmetic. We describe sufficient conditions for the properties that the symbolic constraints provided by a tree query in general must have to be usable in our learning model. We have evaluated our algorithm in a black-box scenario, where tree queries are realized through (black-box) testing. Our case studies include connection establishment in TCP and a priority queue from the Java Class Library.

  9. Design and implementation of the tree-based fuzzy logic controller.

    PubMed

    Liu, B D; Huang, C Y

    1997-01-01

    In this paper, a tree-based approach is proposed to design the fuzzy logic controller. Based on the proposed methodology, the fuzzy logic controller has the following merits: the fuzzy control rule can be extracted automatically from the input-output data of the system and the extraction process can be done in one-pass; owing to the fuzzy tree inference structure, the search spaces of the fuzzy inference process are largely reduced; the operation of the inference process can be simplified as a one-dimensional matrix operation because of the fuzzy tree approach; and the controller has regular and modular properties, so it is easy to be implemented by hardware. Furthermore, the proposed fuzzy tree approach has been applied to design the color reproduction system for verifying the proposed methodology. The color reproduction system is mainly used to obtain a color image through the printer that is identical to the original one. In addition to the software simulation, an FPGA is used to implement the prototype hardware system for real-time application. Experimental results show that the effect of color correction is quite good and that the prototype hardware system can operate correctly under the condition of 30 MHz clock rate.

  10. Anchoring quartet-based phylogenetic distances and applications to species tree reconstruction.

    PubMed

    Sayyari, Erfan; Mirarab, Siavash

    2016-11-11

    Inferring species trees from gene trees using the coalescent-based summary methods has been the subject of much attention, yet new scalable and accurate methods are needed. We introduce DISTIQUE, a new statistically consistent summary method for inferring species trees from gene trees under the coalescent model. We generalize our results to arbitrary phylogenetic inference problems; we show that two arbitrarily chosen leaves, called anchors, can be used to estimate relative distances between all other pairs of leaves by inferring relevant quartet trees. This results in a family of distance-based tree inference methods, with running times ranging from quadratic to quartic in the number of leaves. We show in simulated studies that DISTIQUE has comparable accuracy to leading coalescent-based summary methods and reduced running times.

  11. A new fast method for inferring multiple consensus trees using k-medoids.

    PubMed

    Tahiri, Nadia; Willems, Matthieu; Makarenkov, Vladimir

    2018-04-05

    Gene trees carry important information about specific evolutionary patterns which characterize the evolution of the corresponding gene families. However, a reliable species consensus tree cannot be inferred from a multiple sequence alignment of a single gene family or from the concatenation of alignments corresponding to gene families having different evolutionary histories. These evolutionary histories can be quite different due to horizontal transfer events or to ancient gene duplications which cause the emergence of paralogs within a genome. Many methods have been proposed to infer a single consensus tree from a collection of gene trees. Still, the application of these tree merging methods can lead to the loss of specific evolutionary patterns which characterize some gene families or some groups of gene families. Thus, the problem of inferring multiple consensus trees from a given set of gene trees becomes relevant. We describe a new fast method for inferring multiple consensus trees from a given set of phylogenetic trees (i.e. additive trees or X-trees) defined on the same set of species (i.e. objects or taxa). The traditional consensus approach yields a single consensus tree. We use the popular k-medoids partitioning algorithm to divide a given set of trees into several clusters of trees. We propose novel versions of the well-known Silhouette and Caliński-Harabasz cluster validity indices that are adapted for tree clustering with k-medoids. The efficiency of the new method was assessed using both synthetic and real data, such as a well-known phylogenetic dataset consisting of 47 gene trees inferred for 14 archaeal organisms. The method described here allows inference of multiple consensus trees from a given set of gene trees. It can be used to identify groups of gene trees having similar intragroup and different intergroup evolutionary histories. 
    The main advantage of our method is that it is much faster than the existing tree clustering approaches, while providing similar or better clustering results in most cases. This makes it particularly well suited for the analysis of large genomic and phylogenetic datasets.
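
    The partitioning step named in this abstract can be illustrated with a basic alternating k-medoids routine in Python. This is a generic sketch, not the authors' method: it assumes a precomputed matrix of pairwise tree distances (the abstract does not name the metric), uses a naive deterministic initialization, and omits the adapted Silhouette and Caliński-Harabasz indices that the paper proposes for choosing the number of clusters.

```python
def assign_labels(dist, medoids):
    """Assign each item to the cluster of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a symmetric distance matrix.

    dist[i][j] is the distance between items i and j (here: trees).
    Returns (medoid indices, cluster label for each item).
    """
    medoids = list(range(k))  # naive start: first k items (an assumption)
    for _ in range(max_iter):
        labels = assign_labels(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                new_medoids.append(medoids[c])  # keep an emptied cluster's medoid
                continue
            # The medoid is the member with the smallest total distance
            # to the other members of its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign_labels(dist, medoids)
```

    Each medoid is itself one of the input trees, so it can serve directly as the representative of its cluster, and a consensus tree can then be computed separately within each cluster.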

  12. The effects of urban warming on herbivore abundance and street tree condition.

    PubMed

    Dale, Adam G; Frank, Steven D

    2014-01-01

    Trees are essential to urban habitats because they provide services that benefit the environment and improve human health. Unfortunately, urban trees often have more herbivorous insect pests than rural trees but the mechanisms and consequences of these infestations are not well documented. Here, we examine how temperature affects the abundance of a scale insect, Melanaspis tenebricosa (Comstock) (Hemiptera: Diaspididae), on one of the most commonly planted street trees in the eastern U.S. Next, we examine how both pest abundance and temperature are associated with water stress, growth, and condition of 26 urban street trees. Although trees in the warmest urban sites grew the most, they were more water stressed and in worse condition than trees in cooler sites. Our analyses indicate that visible declines in tree condition were best explained by scale-insect infestation rather than temperature. To test the broader relevance of these results, we extend our analysis to a database of more than 2700 Raleigh, US street trees. Plotting these trees on a Landsat thermal image of Raleigh, we found that warmer sites had over 70% more trees in poor condition than those in cooler sites. Our results support previous studies linking warmer urban habitats to greater pest abundance and extend this association to show its effect on street tree condition. Our results suggest that street tree condition and ecosystem services may decline as urban expansion and global warming exacerbate the urban heat island effect. Although our non-probability sampling method limits our scope of inference, our results present a gloomy outlook for urban forests and emphasize the need for management tools. Existing urban tree inventories and thermal maps could be used to identify species that would be most suitable for urban conditions.

  13. An empirical evaluation of two-stage species tree inference strategies using a multilocus dataset from North American pines

    PubMed Central

    2014-01-01

    Background: As it becomes increasingly possible to obtain DNA sequences of orthologous genes from diverse sets of taxa, species trees are frequently being inferred from multilocus data. However, the behavior of many methods for performing this inference has remained largely unexplored. Some methods have been proven to be consistent given certain evolutionary models, whereas others rely on criteria that, although appropriate for many parameter values, have peculiar zones of the parameter space in which they fail to converge on the correct estimate as data sets increase in size. Results: Here, using North American pines, we empirically evaluate the behavior of 24 strategies for species tree inference using three alternative outgroups (72 strategies total). The data consist of 120 individuals sampled in eight ingroup species from subsection Strobus and three outgroup species from subsection Gerardianae, spanning ∼47 kilobases of sequence at 121 loci. Each “strategy” for inferring species trees consists of three features: a species tree construction method, a gene tree inference method, and a choice of outgroup. We use multivariate analysis techniques such as principal components analysis and hierarchical clustering to identify tree characteristics that are robustly observed across strategies, as well as to identify groups of strategies that produce trees with similar features. We find that strategies that construct species trees using only topological information cluster together and that strategies that use additional non-topological information (e.g., branch lengths) also cluster together. Strategies that utilize more than one individual within a species to infer gene trees tend to produce estimates of species trees that contain clades present in trees estimated by other strategies. Strategies that use the minimize-deep-coalescences criterion to construct species trees tend to produce species tree estimates that contain clades that are not present in trees estimated by the Concatenation, RTC, SMRT, STAR, and STEAC methods, and that in general are more balanced than those inferred by these other strategies. Conclusions: When constructing a species tree from a multilocus set of sequences, our observations provide a basis for interpreting differences in species tree estimates obtained via different approaches that have a two-stage structure in common, one step for gene tree estimation and a second step for species tree estimation. The methods explored here employ a number of distinct features of the data, and our analysis suggests that recovery of the same results from multiple methods that tend to differ in their patterns of inference can be a valuable tool for obtaining reliable estimates. PMID:24678701

  14. An empirical evaluation of two-stage species tree inference strategies using a multilocus dataset from North American pines.

    PubMed

    DeGiorgio, Michael; Syring, John; Eckert, Andrew J; Liston, Aaron; Cronn, Richard; Neale, David B; Rosenberg, Noah A

    2014-03-29

    As it becomes increasingly possible to obtain DNA sequences of orthologous genes from diverse sets of taxa, species trees are frequently being inferred from multilocus data. However, the behavior of many methods for performing this inference has remained largely unexplored. Some methods have been proven to be consistent given certain evolutionary models, whereas others rely on criteria that, although appropriate for many parameter values, have peculiar zones of the parameter space in which they fail to converge on the correct estimate as data sets increase in size. Here, using North American pines, we empirically evaluate the behavior of 24 strategies for species tree inference using three alternative outgroups (72 strategies total). The data consist of 120 individuals sampled in eight ingroup species from subsection Strobus and three outgroup species from subsection Gerardianae, spanning ∼47 kilobases of sequence at 121 loci. Each "strategy" for inferring species trees consists of three features: a species tree construction method, a gene tree inference method, and a choice of outgroup. We use multivariate analysis techniques such as principal components analysis and hierarchical clustering to identify tree characteristics that are robustly observed across strategies, as well as to identify groups of strategies that produce trees with similar features. We find that strategies that construct species trees using only topological information cluster together and that strategies that use additional non-topological information (e.g., branch lengths) also cluster together. Strategies that utilize more than one individual within a species to infer gene trees tend to produce estimates of species trees that contain clades present in trees estimated by other strategies. 
    Strategies that use the minimize-deep-coalescences criterion to construct species trees tend to produce species tree estimates that contain clades that are not present in trees estimated by the Concatenation, RTC, SMRT, STAR, and STEAC methods, and that in general are more balanced than those inferred by these other strategies. When constructing a species tree from a multilocus set of sequences, our observations provide a basis for interpreting differences in species tree estimates obtained via different approaches that have a two-stage structure in common, one step for gene tree estimation and a second step for species tree estimation. The methods explored here employ a number of distinct features of the data, and our analysis suggests that recovery of the same results from multiple methods that tend to differ in their patterns of inference can be a valuable tool for obtaining reliable estimates.

  15. Tree growth inference and prediction from diameter censuses and ring widths

    Treesearch

    James S. Clark; Michael Wolosin; Michael Dietze; Ines Ibanez; Shannon LaDeau; Miranda Welsh; Brian Kloeppel

    2007-01-01

    Knowledge of tree growth is needed to understand population dynamics (Condit et al. 1993, Fastie 1995, Frelich and Reich 1995, Clark and Clark 1999, Wyckoff and Clark 2002, 2005, Webster and Lorimer 2005), species interactions (Swetnam and Lynch 1993), carbon sequestration (DeLucia et al. 1999, Casperson et al. 2000), forest response to climate change (Cook 1987,...

  16. Coalescent-based species tree inference from gene tree topologies under incomplete lineage sorting by maximum likelihood.

    PubMed

    Wu, Yufeng

    2012-03-01

    Incomplete lineage sorting can cause incongruence between the phylogenetic history of genes (the gene tree) and that of the species (the species tree), which can complicate the inference of phylogenies. In this article, I present a new coalescent-based algorithm for species tree inference with maximum likelihood. I first describe an improved method for computing the probability of a gene tree topology given a species tree, which is much faster than an existing algorithm by Degnan and Salter (2005). Based on this method, I develop a practical algorithm that takes a set of gene tree topologies and infers species trees with maximum likelihood. This algorithm searches for the best species tree by starting from initial species trees and performing heuristic search to obtain better trees with higher likelihood. This algorithm, called STELLS (which stands for Species Tree InfErence with Likelihood for Lineage Sorting), has been implemented in a program that is downloadable from the author's web page. The simulation results show that the STELLS algorithm is more accurate than an existing maximum likelihood method for many datasets, especially when there is noise in gene trees. I also show that the STELLS algorithm is efficient and can be applied to real biological datasets. © 2011 The Author. Evolution© 2011 The Society for the Study of Evolution.

  17. Faults Discovery By Using Mined Data

    NASA Technical Reports Server (NTRS)

    Lee, Charles

    2005-01-01

    Fault discovery in complex systems consists of model-based reasoning, fault tree analysis, rule-based inference methods, and other approaches. Model-based reasoning builds models of the systems either by mathematical formulation or by experimental modeling. Fault tree analysis shows the possible causes of a system malfunction by enumerating the suspect components and their respective failure modes that may have induced the problem. Rule-based inference builds the model from expert knowledge. These models and methods have one thing in common: they presume some prior conditions. Complex systems often use fault trees to analyze faults. Fault diagnosis, when an error occurs, is performed by engineers and analysts who extensively examine all data gathered during the mission. The International Space Station (ISS) control center operates on data feedback from the system, and decisions are made based on threshold values by using fault trees. Since these decision-making tasks are safety critical and must be done promptly, the engineers who manually analyze the data face a time challenge. To automate this process, this paper presents an approach that uses decision trees to discover faults from data in real time and captures the contents of fault trees as the initial state of the trees.

  18. Accounting for Uncertainty in Gene Tree Estimation: Summary-Coalescent Species Tree Inference in a Challenging Radiation of Australian Lizards.

    PubMed

    Blom, Mozes P K; Bragg, Jason G; Potter, Sally; Moritz, Craig

    2017-05-01

    Accurate gene tree inference is an important aspect of species tree estimation in a summary-coalescent framework. Yet, in empirical studies, inferred gene trees differ in accuracy due to stochastic variation in phylogenetic signal between targeted loci. Empiricists should, therefore, examine the consistency of species tree inference, while accounting for the observed heterogeneity in gene tree resolution of phylogenomic data sets. Here, we assess the impact of gene tree estimation error on summary-coalescent species tree inference by screening ∼2000 exonic loci based on gene tree resolution prior to phylogenetic inference. We focus on a phylogenetically challenging radiation of Australian lizards (genus Cryptoblepharus, Scincidae) and explore effects on topology and support. We identify a well-supported topology based on all loci and find that a relatively small number of high-resolution gene trees can be sufficient to converge on the same topology. Adding gene trees with decreasing resolution produced a generally consistent topology, and increased confidence for specific bipartitions that were poorly supported when using a small number of informative loci. This corroborates coalescent-based simulation studies that have highlighted the need for a large number of loci to confidently resolve challenging relationships and refutes the notion that low-resolution gene trees introduce phylogenetic noise. Further, our study also highlights the value of quantifying changes in nodal support across locus subsets of increasing size (but decreasing gene tree resolution). Such detailed analyses can reveal anomalous fluctuations in support at some nodes, suggesting the possibility of model violation. By characterizing the heterogeneity in phylogenetic signal among loci, we can account for uncertainty in gene tree estimation and assess its effect on the consistency of the species tree estimate. We suggest that the evaluation of gene tree resolution should be incorporated in the analysis of empirical phylogenomic data sets. This will ultimately increase our confidence in species tree estimation using summary-coalescent methods and enable us to exploit genomic data for phylogenetic inference. [Coalescence; concatenation; Cryptoblepharus; exon capture; gene tree; phylogenomics; species tree.] © The authors 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.

  19. PhySIC_IST: cleaning source trees to infer more informative supertrees

    PubMed Central

    Scornavacca, Celine; Berry, Vincent; Lefort, Vincent; Douzery, Emmanuel JP; Ranwez, Vincent

    2008-01-01

    Background: Supertree methods combine phylogenies with overlapping sets of taxa into a larger one. Topological conflicts frequently arise among source trees for methodological or biological reasons, such as long branch attraction, lateral gene transfers, gene duplication/loss or deep gene coalescence. When topological conflicts occur among source trees, liberal methods infer supertrees containing the most frequent alternative, while veto methods infer supertrees not contradicting any source tree, i.e. discard all conflicting resolutions. When the source trees host a significant number of topological conflicts or have a small taxon overlap, supertree methods of both kinds can propose poorly resolved, hence uninformative, supertrees. Results: To overcome this problem, we propose to infer non-plenary supertrees, i.e. supertrees that do not necessarily contain all the taxa present in the source trees, discarding those whose position greatly differs among source trees or for which insufficient information is provided. We detail a variant of the PhySIC veto method called PhySIC_IST that can infer non-plenary supertrees. PhySIC_IST aims at inferring supertrees that satisfy the same appealing theoretical properties as with PhySIC, while being as informative as possible under this constraint. The informativeness of a supertree is estimated using a variation of the CIC (Cladistic Information Content) criterion, that takes into account both the presence of multifurcations and the absence of some taxa. Additionally, we propose a statistical preprocessing step called STC (Source Trees Correction) to correct the source trees prior to the supertree inference. STC is a liberal step that removes the parts of each source tree that significantly conflict with other source trees. Combining STC with a veto method allows an explicit trade-off between veto and liberal approaches, tuned by a single parameter. Performing large-scale simulations, we observe that STC+PhySIC_IST infers much more informative supertrees than PhySIC, while preserving low type I error compared to the well-known MRP method. Two biological case studies on animals confirm that the STC preprocess successfully detects anomalies in the source trees while STC+PhySIC_IST provides well-resolved supertrees agreeing with current knowledge in systematics. Conclusion: The paper introduces and tests two new methodologies, PhySIC_IST and STC, that demonstrate the interest in inferring non-plenary supertrees as well as preprocessing the source trees. An implementation of the methods is available at: http://www.atgc-montpellier.fr/physic_ist/. PMID:18834542

  20. PhySIC_IST: cleaning source trees to infer more informative supertrees.

    PubMed

    Scornavacca, Celine; Berry, Vincent; Lefort, Vincent; Douzery, Emmanuel J P; Ranwez, Vincent

    2008-10-04

    Supertree methods combine phylogenies with overlapping sets of taxa into a larger one. Topological conflicts frequently arise among source trees for methodological or biological reasons, such as long branch attraction, lateral gene transfers, gene duplication/loss or deep gene coalescence. When topological conflicts occur among source trees, liberal methods infer supertrees containing the most frequent alternative, while veto methods infer supertrees not contradicting any source tree, i.e. discard all conflicting resolutions. When the source trees host a significant number of topological conflicts or have a small taxon overlap, supertree methods of both kinds can propose poorly resolved, hence uninformative, supertrees. To overcome this problem, we propose to infer non-plenary supertrees, i.e. supertrees that do not necessarily contain all the taxa present in the source trees, discarding those whose position greatly differs among source trees or for which insufficient information is provided. We detail a variant of the PhySIC veto method called PhySIC_IST that can infer non-plenary supertrees. PhySIC_IST aims at inferring supertrees that satisfy the same appealing theoretical properties as with PhySIC, while being as informative as possible under this constraint. The informativeness of a supertree is estimated using a variation of the CIC (Cladistic Information Content) criterion, that takes into account both the presence of multifurcations and the absence of some taxa. Additionally, we propose a statistical preprocessing step called STC (Source Trees Correction) to correct the source trees prior to the supertree inference. STC is a liberal step that removes the parts of each source tree that significantly conflict with other source trees. 
Combining STC with a veto method allows an explicit trade-off between veto and liberal approaches, tuned by a single parameter. Performing large-scale simulations, we observe that STC+PhySIC_IST infers much more informative supertrees than PhySIC, while preserving low type I error compared to the well-known MRP method. Two biological case studies on animals confirm that the STC preprocess successfully detects anomalies in the source trees while STC+PhySIC_IST provides well-resolved supertrees agreeing with current knowledge in systematics. The paper introduces and tests two new methodologies, PhySIC_IST and STC, that demonstrate the interest in inferring non-plenary supertrees as well as preprocessing the source trees. An implementation of the methods is available at: http://www.atgc-montpellier.fr/physic_ist/.

  1. Tree-Structured Infinite Sparse Factor Model

    PubMed Central

    Zhang, XianXing; Dunson, David B.; Carin, Lawrence

    2013-01-01

    A tree-structured multiplicative gamma process (TMGP) is developed, for inferring the depth of a tree-based factor-analysis model. This new model is coupled with the nested Chinese restaurant process, to nonparametrically infer the depth and width (structure) of the tree. In addition to developing the model, theoretical properties of the TMGP are addressed, and a novel MCMC sampler is developed. The structure of the inferred tree is used to learn relationships between high-dimensional data, and the model is also applied to compressive sensing and interpolation of incomplete images. PMID:25279389

  2. Ultrasonographic Diagnosis of Biliary Atresia Based on a Decision-Making Tree Model.

    PubMed

    Lee, So Mi; Cheon, Jung-Eun; Choi, Young Hun; Kim, Woo Sun; Cho, Hyun-Hae; Cho, Hyun-Hye; Kim, In-One; You, Sun Kyoung

    2015-01-01

    To assess the diagnostic value of various ultrasound (US) findings and to make a decision-tree model for US diagnosis of biliary atresia (BA). From March 2008 to January 2014, the following US findings were retrospectively evaluated in 100 infants with cholestatic jaundice (BA, n = 46; non-BA, n = 54): length and morphology of the gallbladder, triangular cord thickness, hepatic artery and portal vein diameters, and visualization of the common bile duct. Logistic regression analyses were performed to determine the features that would be useful in predicting BA. Conditional inference tree analysis was used to generate a decision-making tree for classifying patients into the BA or non-BA groups. Multivariate logistic regression analysis showed that abnormal gallbladder morphology and greater triangular cord thickness were significant predictors of BA (p = 0.003 and 0.001; adjusted odds ratio: 345.6 and 65.6, respectively). In the decision-making tree using conditional inference tree analysis, gallbladder morphology and triangular cord thickness (optimal cutoff value of triangular cord thickness, 3.4 mm) were also selected as significant discriminators for differential diagnosis of BA, and gallbladder morphology was the first discriminator. The diagnostic performance of the decision-making tree was excellent, with sensitivity of 100% (46/46), specificity of 94.4% (51/54), and overall accuracy of 97% (97/100). Abnormal gallbladder morphology and greater triangular cord thickness (> 3.4 mm) were the most useful predictors of BA on US. We suggest that the gallbladder morphology should be evaluated first and that triangular cord thickness should be evaluated subsequently in cases with normal gallbladder morphology.
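
    The diagnostic performance figures quoted above follow directly from the reported counts. As a minimal sketch (the function name `diagnostic_metrics` is an illustrative assumption, not something from the study), the arithmetic can be reproduced from the 46/46 and 51/54 fractions given in the abstract:

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity and accuracy as fractions."""
    sensitivity = tp / (tp + fn)            # true positives among all diseased
    specificity = tn / (tn + fp)            # true negatives among all non-diseased
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Counts from the abstract: 46 of 46 BA cases detected,
# 51 of 54 non-BA cases correctly classified (so 3 false positives).
sens, spec, acc = diagnostic_metrics(tp=46, fn=0, tn=51, fp=3)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} accuracy={acc:.1%}")
```

    With these counts the three values come out at 100%, 94.4% and 97%, matching the figures in the abstract.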

  3. STRIDE: Species Tree Root Inference from Gene Duplication Events.

    PubMed

    Emms, David M; Kelly, Steven

    2017-12-01

    The correct interpretation of any phylogenetic tree is dependent on that tree being correctly rooted. We present STRIDE, a fast, effective, and outgroup-free method for identification of gene duplication events and species tree root inference in large-scale molecular phylogenetic analyses. STRIDE identifies sets of well-supported in-group gene duplication events from a set of unrooted gene trees, and analyses these events to infer a probability distribution over an unrooted species tree for the location of its root. We show that STRIDE correctly identifies the root of the species tree in multiple large-scale molecular phylogenetic data sets spanning a wide range of timescales and taxonomic groups. We demonstrate that the novel probability model implemented in STRIDE can accurately represent the ambiguity in species tree root assignment for data sets where information is limited. Furthermore, application of STRIDE to outgroup-free inference of the origin of the eukaryotic tree resulted in a root probability distribution that provides additional support for leading hypotheses for the origin of the eukaryotes. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  4. Evaluating Fast Maximum Likelihood-Based Phylogenetic Programs Using Empirical Phylogenomic Data Sets

    PubMed Central

    Zhou, Xiaofan; Shen, Xing-Xing; Hittinger, Chris Todd

    2018-01-01

    Abstract The sizes of the data matrices assembled to resolve branches of the tree of life have increased dramatically, motivating the development of programs for fast, yet accurate, inference. For example, several different fast programs have been developed in the very popular maximum likelihood framework, including RAxML/ExaML, PhyML, IQ-TREE, and FastTree. Although these programs are widely used, a systematic evaluation and comparison of their performance using empirical genome-scale data matrices has so far been lacking. To address this question, we evaluated these four programs on 19 empirical phylogenomic data sets with hundreds to thousands of genes and up to 200 taxa with respect to likelihood maximization, tree topology, and computational speed. For single-gene tree inference, we found that the more exhaustive and slower strategies (ten searches per alignment) outperformed faster strategies (one tree search per alignment) using RAxML, PhyML, or IQ-TREE. Interestingly, single-gene trees inferred by the three programs yielded comparable coalescent-based species tree estimations. For concatenation-based species tree inference, IQ-TREE consistently achieved the best-observed likelihoods for all data sets, and RAxML/ExaML was a close second. In contrast, PhyML often failed to complete concatenation-based analyses, whereas FastTree was the fastest but generated lower likelihood values and more dissimilar tree topologies in both types of analyses. Finally, data matrix properties, such as the number of taxa and the strength of phylogenetic signal, sometimes substantially influenced the programs’ relative performance. Our results provide real-world gene and species tree phylogenetic inference benchmarks to inform the design and execution of large-scale phylogenomic data analyses. PMID:29177474

  5. Inferring species trees from incongruent multi-copy gene trees using the Robinson-Foulds distance

    PubMed Central

    2013-01-01

    Background Constructing species trees from multi-copy gene trees remains a challenging problem in phylogenetics. One difficulty is that the underlying genes can be incongruent due to evolutionary processes such as gene duplication and loss, deep coalescence, or lateral gene transfer. Gene tree estimation errors may further exacerbate the difficulties of species tree estimation. Results We present a new approach for inferring species trees from incongruent multi-copy gene trees that is based on a generalization of the Robinson-Foulds (RF) distance measure to multi-labeled trees (mul-trees). We prove that it is NP-hard to compute the RF distance between two mul-trees; however, it is easy to calculate this distance between a mul-tree and a singly-labeled species tree. Motivated by this, we formulate the RF problem for mul-trees (MulRF) as follows: Given a collection of multi-copy gene trees, find a singly-labeled species tree that minimizes the total RF distance from the input mul-trees. We develop and implement a fast SPR-based heuristic algorithm for the NP-hard MulRF problem. We compare the performance of the MulRF method (available at http://genome.cs.iastate.edu/CBL/MulRF/) with several gene tree parsimony approaches using gene tree simulations that incorporate gene tree error, gene duplications and losses, and/or lateral transfer. The MulRF method produces more accurate species trees than gene tree parsimony approaches. We also demonstrate that the MulRF method infers in minutes a credible plant species tree from a collection of nearly 2,000 gene trees. Conclusions Our new phylogenetic inference method, based on a generalized RF distance, makes it possible to quickly estimate species trees from large genomic data sets. 
Since the MulRF method, unlike gene tree parsimony, is based on a generic tree distance measure, it is appealing for analyses of genomic data sets, in which many processes such as deep coalescence, recombination, gene duplication and losses as well as phylogenetic error may contribute to gene tree discord. In experiments, the MulRF method estimated species trees accurately and quickly, demonstrating MulRF as an efficient alternative approach for phylogenetic inference from large-scale genomic data sets. PMID:24180377
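
    For readers unfamiliar with the Robinson-Foulds distance that MulRF generalizes, the following is a minimal sketch of the ordinary rooted RF distance between two singly-labeled trees on the same taxa. It is not the MulRF algorithm and does not handle mul-trees; the nested-tuple tree encoding and the names `clades` and `rf_distance` are assumptions made for illustration only.

```python
def clades(tree):
    """Collect the leaf set under every internal node of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):           # a leaf is a taxon name
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Rooted RF distance: number of clades present in exactly one tree."""
    return len(clades(tree_a) ^ clades(tree_b))


# Two hypothetical five-taxon trees that disagree only on the placement of B and C.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(t1, t2))  # clades {A,B} and {A,C} are each unique to one tree
```

    The distance here is 2 because each tree contains exactly one clade the other lacks. A species-tree search of the kind described above would sum such distances over all input gene trees and look for the candidate tree that minimizes the total.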

  6. Examining the influences of tree-to-tree competition and climate on size-growth relationships in hydric, multi-aged Fraxinus nigra stands

    Treesearch

    Christopher E. Looney; Anthony W. D'Amato; Shawn Fraver; Brian J. Palik; Michael R. Reinikainen

    2016-01-01

    Most research on tree-tree competition and size-growth relationship (SGR – a stand-level metric that infers the relative efficiency with which different sized trees utilize available resources) has focused on upland systems. It is unclear if inferences from these studies extend to wetland forests. Moreover, no study to date has thoroughly investigated the relationship...

  7. A tree-parenchyma coupled model for lung ventilation simulation.

    PubMed

    Pozin, Nicolas; Montesantos, Spyridon; Katz, Ira; Pichelin, Marine; Vignon-Clementel, Irene; Grandmont, Céline

    2017-11-01

    In this article, we develop a lung ventilation model. The parenchyma is described as an elastic homogenized media. It is irrigated by a space-filling dyadic resistive pipe network, which represents the tracheobronchial tree. In this model, the tree and the parenchyma are strongly coupled. The tree induces an extra viscous term in the system constitutive relation, which leads, in the finite element framework, to a full matrix. We consider an efficient algorithm that takes advantage of the tree structure to enable a fast matrix-vector product computation. This framework can be used to model both free and mechanically induced respiration, in health and disease. Patient-specific lung geometries acquired from computed tomography scans are considered. Realistic Dirichlet boundary conditions can be deduced from surface registration on computed tomography images. The model is compared to a more classical exit compartment approach. Results illustrate the coupling between the tree and the parenchyma, at global and regional levels, and how conditions for the purely 0D model can be inferred. Different types of boundary conditions are tested, including a nonlinear Robin model of the surrounding lung structures. Copyright © 2017 John Wiley & Sons, Ltd.

  8. Comparing species tree estimation with large anchored phylogenomic and small Sanger-sequenced molecular datasets: an empirical study on Malagasy pseudoxyrhophiine snakes.

    PubMed

    Ruane, Sara; Raxworthy, Christopher J; Lemmon, Alan R; Lemmon, Emily Moriarty; Burbrink, Frank T

    2015-10-12

    Using molecular data generated by high throughput next generation sequencing (NGS) platforms to infer phylogeny is becoming common as costs go down and the ability to capture loci from across the genome goes up. While there is a general consensus that greater numbers of independent loci should result in more robust phylogenetic estimates, few studies have compared phylogenies resulting from smaller datasets for commonly used genetic markers with the large datasets captured using NGS. Here, we determine how a 5-locus Sanger dataset compares with a 377-locus anchored genomics dataset for understanding the evolutionary history of the pseudoxyrhophiine snake radiation centered in Madagascar. The Pseudoxyrhophiinae comprise ~86 % of Madagascar's serpent diversity, yet they are poorly known with respect to ecology, behavior, and systematics. Using the 377-locus NGS dataset and the summary statistics species-tree methods STAR and MP-EST, we estimated a well-supported species tree that provides new insights concerning intergeneric relationships for the pseudoxyrhophiines. We also compared how these and other methods performed with respect to estimating tree topology using datasets with varying numbers of loci. Using Sanger sequencing and an anchored phylogenomics approach, we sequenced datasets comprised of 5 and 377 loci, respectively, for 23 pseudoxyrhophiine taxa. For each dataset, we estimated phylogenies using both gene-tree (concatenation) and species-tree (STAR, MP-EST) approaches. We determined the similarity of resulting tree topologies from the different datasets using Robinson-Foulds distances. 
In addition, we examined how subsets of these data performed compared to the complete Sanger and anchored datasets for phylogenetic accuracy using the same tree inference methodologies, as well as the program *BEAST to determine if a full coalescent model for species tree estimation could generate robust results with fewer loci compared to the summary statistics species tree approaches. We also examined the individual gene trees in comparison to the 377-locus species tree using the program MetaTree. Using the full anchored dataset under a variety of methods gave us the same, well-supported phylogeny for pseudoxyrhophiines. The African pseudoxyrhophiine Duberria is the sister taxon to the Malagasy pseudoxyrhophiine genera, providing evidence for a monophyletic radiation in Madagascar. In addition, within Madagascar, the two major clades inferred correspond largely to the aglyphous and opisthoglyphous genera, suggesting that feeding specializations associated with tooth venom delivery may have played a major role in the early diversification of this radiation. The comparison of tree topologies from the concatenated and species-tree methods using different datasets indicated the 5-locus dataset cannot be used to infer a correct phylogeny for the pseudoxyrhophiines under any method tested here and that summary statistics methods require 50 or more loci to consistently recover the species-tree inferred using the complete anchored dataset. However, as few as 15 loci may infer the correct topology when using the full coalescent species tree method *BEAST. MetaTree analyses of each gene tree from the Sanger and anchored datasets found that none of the individual gene trees matched the 377-locus species tree, and that no gene trees were identical with respect to topology. 
Our results suggest that ≥50 loci may be necessary to confidently infer phylogenies when using summary species-tree methods, but that the coalescent-based method *BEAST consistently recovers the same topology using only 15 loci. These results reinforce that datasets with small numbers of markers may result in misleading topologies, and further, that the method of inference used to generate a phylogeny also has a major influence on the number of loci necessary to infer robust species trees.

  9. Recursive algorithms for phylogenetic tree counting.

    PubMed

    Gavryushkina, Alexandra; Welch, David; Drummond, Alexei J

    2013-10-28

    In Bayesian phylogenetic inference we are interested in distributions over a space of trees. The number of trees in a tree space is an important characteristic of the space and is useful for specifying prior distributions. When all samples come from the same time point and no prior information is available on divergence times, the tree counting problem is easy. However, when fossil evidence is used in the inference to constrain the tree or data are sampled serially, new tree spaces arise and counting the number of trees is more difficult. We describe an algorithm that is polynomial in the number of sampled individuals for counting resolutions of a constraint tree, assuming that the number of constraints is fixed. We generalise this algorithm to counting resolutions of a fully ranked constraint tree. We describe a quadratic algorithm for counting the number of possible fully ranked trees on n sampled individuals. We introduce a new type of tree, called a fully ranked tree with sampled ancestors, and describe a cubic time algorithm for counting the number of such trees on n sampled individuals. These algorithms should be employed for Bayesian Markov chain Monte Carlo inference when fossil data are included or data are serially sampled.
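
    The "easy" case mentioned above, where all samples come from the same time point, has classical closed-form counts. The sketch below shows those two standard formulas only; it does not implement the paper's algorithms for constrained or serially sampled trees, and the function names are illustrative assumptions.

```python
from math import comb, prod


def rooted_binary_topologies(n):
    """(2n-3)!! rooted binary tree topologies on n labelled leaves (n >= 2)."""
    return prod(range(1, 2 * n - 2, 2))


def ranked_trees(n):
    """Ranked trees on n contemporaneous leaves: product of C(k, 2), k = 2..n.

    Going back in time, each coalescence picks one of C(k, 2) pairs
    among the k lineages that remain.
    """
    return prod(comb(k, 2) for k in range(2, n + 1))


for n in range(2, 6):
    print(n, rooted_binary_topologies(n), ranked_trees(n))
```

    For n = 4 this gives 15 topologies but 18 ranked trees, because the balanced topology admits two orderings of its internal nodes. The gap widens quickly with n, which is one reason the choice of tree space matters when specifying priors.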

  10. Pollen reconstructions, tree-rings and early climate data from Minnesota, USA: a cautionary tale of bias and signal attenuation

    NASA Astrophysics Data System (ADS)

    St-Jacques, J. M.; Cumming, B. F.; Smol, J. P.; Sauchyn, D.

    2015-12-01

    High-resolution proxy reconstructions are essential to assess the rate and magnitude of anthropogenic global warming. High-resolution pollen records are being critically examined for the production of accurate climate reconstructions of the last millennium, often as extensions of tree-ring records. Past climate inference from a sedimentary pollen record depends upon the stationarity of the pollen-climate relationship. However, humans have directly altered vegetation, and hence modern pollen deposition is a product of landscape disturbance and climate, unlike in the past with its dominance of climate-derived processes. This could cause serious bias in pollen reconstructions. In the US Midwest, direct human impacts have greatly altered the vegetation and pollen rain since Euro-American settlement in the mid-19th century. Using instrumental climate data from the early 1800s from Fort Snelling (Minnesota), we assessed the bias from the conventional method of inferring climate from pollen assemblages in comparison to a calibration set from pre-settlement pollen assemblages and the earliest instrumental climate data. The pre-settlement calibration set provides more accurate reconstructions of 19th century temperature than the modern set does. When both calibration sets are used to reconstruct temperatures since AD 1116 from a varve-dated pollen record from Lake Mina, Minnesota, the conventional method produces significant low-frequency (centennial-scale) signal attenuation and positive bias of 0.8-1.7 °C, resulting in an overestimation of Little Ice Age temperature and an underestimation of anthropogenic warming. We also compared the pollen-inferred moisture reconstruction to a four-century tree-ring-inferred moisture record from Minnesota and Dakotas, which shows that the tree-ring reconstruction is biased towards dry conditions and records wet periods relatively poorly, giving a false impression of regional aridity. 
The tree-ring chronology also suggests varve chronology problems. It remains to be explored how widespread this landscape disturbance problem is when conventional pollen-based inference methods are used, and consequently how seriously regional manifestations of global warming might have been underestimated with traditional pollen-based techniques.

  11. Inferring patterns in mitochondrial DNA sequences through hypercube independent spanning trees.

    PubMed

    Silva, Eduardo Sant Ana da; Pedrini, Helio

    2016-03-01

    Given a graph G, a set of spanning trees rooted at a vertex r of G is said to be vertex/edge independent if, for each vertex v of G, v≠r, the paths from r to v in any pair of trees are vertex/edge disjoint. Independent spanning trees (ISTs) provide a number of advantages in data broadcasting due to their fault-tolerant properties. For this reason, some studies have addressed the issue by providing mechanisms for constructing independent spanning trees efficiently. In this work, we investigate how to construct independent spanning trees on hypercubes, which are generated based upon spanning binomial trees, and how to use them to predict mitochondrial DNA sequence parts through paths on the hypercube. The prediction works both for inferring mitochondrial DNA sequences comprised of six bases and for inferring anomalies that probably should not belong to the mitochondrial DNA standard. Copyright © 2016 Elsevier Ltd. All rights reserved.

  12. Short Tree, Long Tree, Right Tree, Wrong Tree: New Acquisition Bias Corrections for Inferring SNP Phylogenies

    PubMed Central

    Leaché, Adam D.; Banbury, Barbara L.; Felsenstein, Joseph; de Oca, Adrián nieto-Montes; Stamatakis, Alexandros

    2015-01-01

    Single nucleotide polymorphisms (SNPs) are useful markers for phylogenetic studies owing in part to their ubiquity throughout the genome and ease of collection. Restriction site associated DNA sequencing (RADseq) methods are becoming increasingly popular for SNP data collection, but an assessment of the best practises for using these data in phylogenetics is lacking. We use computer simulations, and new double digest RADseq (ddRADseq) data for the lizard family Phrynosomatidae, to investigate the accuracy of RAD loci for phylogenetic inference. We compare the two primary ways RAD loci are used during phylogenetic analysis, including the analysis of full sequences (i.e., SNPs together with invariant sites), or the analysis of SNPs on their own after excluding invariant sites. We find that using full sequences rather than just SNPs is preferable from the perspectives of branch length and topological accuracy, but not of computational time. We introduce two new acquisition bias corrections for dealing with alignments composed exclusively of SNPs, a conditional likelihood method and a reconstituted DNA approach. The conditional likelihood method conditions on the presence of variable characters only (the number of invariant sites that are unsampled but known to exist is not considered), while the reconstituted DNA approach requires the user to specify the exact number of unsampled invariant sites prior to the analysis. Under simulation, branch length biases increase with the amount of missing data for both acquisition bias correction methods, but branch length accuracy is much improved in the reconstituted DNA approach compared to the conditional likelihood approach. 
Phylogenetic analyses of the empirical data using concatenation or a coalescent-based species tree approach provide strong support for many of the accepted relationships among phrynosomatid lizards, suggesting that RAD loci contain useful phylogenetic signal across a range of divergence times despite the presence of missing data. Phylogenetic analysis of RAD loci requires careful attention to model assumptions, especially if downstream analyses depend on branch lengths. PMID:26227865

  13. Disentangling methodological and biological sources of gene tree discordance on Oryza (Poaceae) chromosome 3.

    PubMed

    Zwickl, Derrick J; Stein, Joshua C; Wing, Rod A; Ware, Doreen; Sanderson, Michael J

    2014-09-01

    We describe new methods for characterizing gene tree discordance in phylogenomic data sets, which screen for deviations from neutral expectations, summarize variation in statistical support among gene trees, and allow comparison of the patterns of discordance induced by various analysis choices. Using an exceptionally complete set of genome sequences for the short arm of chromosome 3 in Oryza (rice) species, we applied these methods to identify the causes and consequences of differing patterns of discordance in the sets of gene trees inferred using a panel of 20 distinct analysis pipelines. We found that discordance patterns were strongly affected by aspects of data selection, alignment, and alignment masking. Unusual patterns of discordance evident when using certain pipelines were reduced or eliminated by using alternative pipelines, suggesting that they were the product of methodological biases rather than evolutionary processes. In some cases, once such biases were eliminated, evolutionary processes such as introgression could be implicated. Additionally, patterns of gene tree discordance had significant downstream impacts on species tree inference. For example, inference from supermatrices was positively misleading when pipelines that led to biased gene trees were used. Several results may generalize to other data sets: we found that gene tree and species tree inference gave more reasonable results when intron sequence was included during sequence alignment and tree inference, the alignment software PRANK was used, and detectable "block-shift" alignment artifacts were removed. We discuss our findings in the context of well-established relationships in Oryza and continuing controversies regarding the domestication history of O. sativa. © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  14. Clustering Genes of Common Evolutionary History

    PubMed Central

    Gori, Kevin; Suchan, Tomasz; Alvarez, Nadir; Goldman, Nick; Dessimoz, Christophe

    2016-01-01

    Phylogenetic inference can potentially result in a more accurate tree using data from multiple loci. However, if the loci are incongruent—due to events such as incomplete lineage sorting or horizontal gene transfer—it can be misleading to infer a single tree. To address this, many previous contributions have taken a mechanistic approach, by modeling specific processes. Alternatively, one can cluster loci without assuming how these incongruencies might arise. Such “process-agnostic” approaches typically infer a tree for each locus and cluster these. There are, however, many possible combinations of tree distance and clustering methods; their comparative performance in the context of tree incongruence is largely unknown. Furthermore, because standard model selection criteria such as AIC cannot be applied to problems with a variable number of topologies, the issue of inferring the optimal number of clusters is poorly understood. Here, we perform a large-scale simulation study of phylogenetic distances and clustering methods to infer loci of common evolutionary history. We observe that the best-performing combinations are distances accounting for branch lengths followed by spectral clustering or Ward’s method. We also introduce two statistical tests to infer the optimal number of clusters and show that they strongly outperform the silhouette criterion, a general-purpose heuristic. We illustrate the usefulness of the approach by 1) identifying errors in a previous phylogenetic analysis of yeast species and 2) identifying topological incongruence among newly sequenced loci of the globeflower fly genus Chiastocheta. We release treeCl, a new program to cluster genes of common evolutionary history (http://git.io/treeCl). PMID:26893301

  15. Inferring biome-scale net primary productivity from tree-ring isotopes

    NASA Astrophysics Data System (ADS)

    Pederson, N.; Levesque, M.; Williams, A. P.; Hobi, M. L.; Smith, W. K.; Andreu-Hayles, L.

    2017-12-01

    Satellite estimates of vegetation growth (net primary productivity; NPP), tree-ring records, and forest inventories indicate that ongoing climate change and rising atmospheric CO2 concentration are altering productivity and carbon storage of forests worldwide. The impact of global change on the trends of NPP, however, remain unknown because of the lack of long-term high-resolution NPP data. For the first time, we tested if annually resolved carbon (δ13C) and oxygen (δ18O) stable isotopes from the cellulose of tree rings from trees in temperate regions could be used as a tool for inferring NPP across spatiotemporal scales. We compared satellite NPP estimates from the moderate-resolution imaging spectroradiometer sensor (MODIS, product MOD17A) and a newly developed global NPP dataset derived from the Global Inventory Modeling and Mapping Studies (GIMMS) dataset to annually resolved tree-ring width and δ13C and δ18O records from four sites along a hydroclimatic gradient in Eastern and Central United States. We found strong correlations across large geographical regions between satellite-derived NPP and tree-ring isotopes that ranged from -0.40 to -0.91. Notably, tree-ring derived δ18O had the strongest relation to climate. The results were consistent among the studied tree species (Quercus rubra and Liriodendron tulipifera) and along the hydroclimatic conditions of our network. Our study indicates that tree-ring isotopes can potentially be used to reconstruct NPP in time and space. As such, our findings represent an important breakthrough for estimating long-term changes in vegetation productivity at the biome scale.

  16. The Reliability and Stability of an Inferred Phylogenetic Tree from Empirical Data.

    PubMed

    Katsura, Yukako; Stanley, Craig E; Kumar, Sudhir; Nei, Masatoshi

    2017-03-01

    The reliability of a phylogenetic tree obtained from empirical data is usually measured by the bootstrap probability (Pb) of interior branches of the tree. If the bootstrap probability is high for most branches, the tree is considered to be reliable. If some interior branches show relatively low bootstrap probabilities, we are not sure that the inferred tree is really reliable. Here, we propose another quantity measuring the reliability of the tree called the stability of a subtree. This quantity refers to the probability (Ps) of obtaining a subtree of an inferred tree. We then show that if the tree is to be reliable, both Pb and Ps must be high. We also show that Ps is given by a bootstrap probability of the subtree with the closest outgroup sequence, and a computer program, RESTA, for computing the Pb and Ps values is presented. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  17. The effects of inference method, population sampling, and gene sampling on species tree inferences: an empirical study in slender salamanders (Plethodontidae: Batrachoseps).

    PubMed

    Jockusch, Elizabeth L; Martínez-Solano, Iñigo; Timpe, Elizabeth K

    2015-01-01

    Species tree methods are now widely used to infer the relationships among species from multilocus data sets. Many methods have been developed, which differ in whether gene and species trees are estimated simultaneously or sequentially, and in how gene trees are used to infer the species tree. While these methods perform well on simulated data, less is known about what impacts their performance on empirical data. We used a data set including five nuclear genes and one mitochondrial gene for 22 species of Batrachoseps to compare the effects of method of analysis, within-species sampling and gene sampling on species tree inferences. For this data set, the choice of inference method had the largest effect on the species tree topology. Exclusion of individual loci had large effects in *BEAST and STEM, but not in MP-EST. Different loci carried the greatest leverage in these different methods, showing that the causes of their disproportionate effects differ. Even though substantial information was present in the nuclear loci, the mitochondrial gene dominated the *BEAST species tree. This leverage is inherent to the mtDNA locus and results from its high variation and lower assumed ploidy. This mtDNA leverage may be problematic when mtDNA has undergone introgression, as is likely in this data set. By contrast, the leverage of RAG1 in STEM analyses does not reflect properties inherent to the locus, but rather results from a gene tree that is strongly discordant with all others, and is best explained by introgression between distantly related species. Within-species sampling was also important, especially in *BEAST analyses, as shown by differences in tree topology across 100 subsampled data sets. Despite the sensitivity of the species tree methods to multiple factors, five species groups, the relationships among these, and some relationships within them, are generally consistently resolved for Batrachoseps. © The Author(s) 2014. 
    Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved.

  18. Microclimate and ecological threshold responses in a warming and wetting experiment following whole-tree harvest

    USDA-ARS?s Scientific Manuscript database

    Ecosystem climate manipulation experiments (ECMEs) are a key tool for predicting the effects of climate on ecosystems. However, the strength of inferences drawn from these experiments depends on whether the manipulated conditions mimic future climate changes. While ECMEs have examined mean tempera...

  19. Climate variability at the onset of the Younger Dryas as reflected in annually resolved tree-ring stable isotope chronologies

    NASA Astrophysics Data System (ADS)

    Pieper, H.; Helle, G.; Brauer, A.; Kaiser, K. F.; Miramont, C.

    2013-12-01

    The Younger Dryas interval during the Last Glacial Termination was an abrupt return to glacial-like conditions punctuating the transition to a warmer, interglacial climate. Despite recent advances in the layer counting of ice-core records of the termination, the timing and length of the Younger Dryas remain controversial. Late Glacial and early Holocene tree-ring chronologies are rare; however, they contain valuable information about past environmental conditions at annual time resolution. Changes in tree-ring growth rates can be related to past climate anomalies, and changes in the carbon and oxygen isotope composition of tree-ring cellulose reflect atmospheric and hydrospheric changes. We are investigating an 860-year (13200-12340 cal BP) dated dendrochronological record of Late Glacial and Early Holocene chronologies of Scots pine (Pinus sylvestris L.) from subfossil tree remnants from Barbiers River (Moyenne Durance, Southern French Alps), as well as from Swiss (Dättnau, Landikon and Gänziloh) sites. Dendro-ecological parameters, such as ring width and stable isotope variations (δ13C and δ18O), are used to infer past environmental conditions. We will present our first carbon and oxygen isotope records from tree rings reflecting the environmental changes at the Alleröd/Younger Dryas transition.

  20. PhyloTreePruner: A Phylogenetic Tree-Based Approach for Selection of Orthologous Sequences for Phylogenomics.

    PubMed

    Kocot, Kevin M; Citarella, Mathew R; Moroz, Leonid L; Halanych, Kenneth M

    2013-01-01

    Molecular phylogenetics relies on accurate identification of orthologous sequences among the taxa of interest. Most orthology inference programs available for use in phylogenomics rely on small sets of pre-defined orthologs from model organisms or phenetic approaches such as all-versus-all sequence comparisons followed by Markov graph-based clustering. Such approaches have high sensitivity but may erroneously include paralogous sequences. We developed PhyloTreePruner, a software utility that uses a phylogenetic approach to refine orthology inferences made using phenetic methods. PhyloTreePruner checks single-gene trees for evidence of paralogy and generates a new alignment for each group containing only sequences inferred to be orthologs. Importantly, PhyloTreePruner takes into account support values on the tree and avoids unnecessarily deleting sequences in cases where a weakly supported tree topology incorrectly indicates paralogy. A test of PhyloTreePruner on a dataset generated from 11 completely sequenced arthropod genomes identified 2,027 orthologous groups sampled for all taxa. Phylogenetic analysis of the concatenated supermatrix yielded a generally well-supported topology that was consistent with the current understanding of arthropod phylogeny. PhyloTreePruner is freely available from http://sourceforge.net/projects/phylotreepruner/.

  1. RENT+: an improved method for inferring local genealogical trees from haplotypes with recombination

    PubMed Central

    Mirzaei, Sajad; Wu, Yufeng

    2017-01-01

    Abstract Motivation: Haplotypes from one or multiple related populations share a common genealogical history. If this shared genealogy can be inferred from haplotypes, it can be very useful for many population genetics problems. However, with the presence of recombination, the genealogical history of haplotypes is complex and cannot be represented by a single genealogical tree. Therefore, inference of genealogical history with recombination is much more challenging than the case of no recombination. Results: In this paper, we present a new approach called RENT+ for the inference of local genealogical trees from haplotypes with the presence of recombination. RENT+ builds on a previous genealogy inference approach called RENT, which infers a set of related genealogical trees at different genomic positions. RENT+ represents a significant improvement over RENT in the sense that it is more effective in extracting information contained in the haplotype data about the underlying genealogy than RENT. The key components of RENT+ are several greatly enhanced genealogy inference rules. Through simulation, we show that RENT+ is more efficient and accurate than several existing genealogy inference methods. As an application, we apply RENT+ in the inference of population demographic history from haplotypes, which outperforms several existing methods. Availability and Implementation: RENT+ is implemented in Java, and is freely available for download from: https://github.com/SajadMirzaei/RentPlus. Contacts: sajad@engr.uconn.edu or ywu@engr.uconn.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28065901

  2. An empirical evaluation of two-stage species tree inference strategies using a multilocus dataset from North American pines

    Treesearch

    Michael DeGiorgio; John Syring; Andrew J. Eckert; Aaron Liston; Richard Cronn; David B. Neale; Noah A. Rosenberg

    2014-01-01

    Background: As it becomes increasingly possible to obtain DNA sequences of orthologous genes from diverse sets of taxa, species trees are frequently being inferred from multilocus data. However, the behavior of many methods for performing this inference has remained largely unexplored. Some methods have been proven to be consistent given certain evolutionary models,...

  3. The prevalence of terraced treescapes in analyses of phylogenetic data sets.

    PubMed

    Dobrin, Barbara H; Zwickl, Derrick J; Sanderson, Michael J

    2018-04-04

    The pattern of data availability in a phylogenetic data set may lead to the formation of terraces, collections of equally optimal trees. Terraces can arise in tree space if trees are scored with parsimony or with partitioned, edge-unlinked maximum likelihood. Theory predicts that terraces can be large, but their prevalence in contemporary data sets has never been surveyed. We selected 26 data sets and phylogenetic trees reported in recent literature and investigated the terraces to which the trees would belong, under a common set of inference assumptions. We examined terrace size as a function of the sampling properties of the data sets, including taxon coverage density (the proportion of taxon-by-gene positions with any data present) and a measure of gene sampling "sufficiency". We evaluated each data set in relation to the theoretical minimum gene sampling depth needed to reduce terrace size to a single tree, and explored the impact of the terraces found in replicate trees in bootstrap methods. Terraces were identified in nearly all data sets with taxon coverage densities < 0.90. They were not found, however, in high-coverage-density (i.e., ≥ 0.94) transcriptomic and genomic data sets. The terraces could be very large, and size varied inversely with taxon coverage density and with gene sampling sufficiency. Few data sets achieved a theoretical minimum gene sampling depth needed to reduce terrace size to a single tree. Terraces found during bootstrap resampling reduced overall support. If certain inference assumptions apply, trees estimated from empirical data sets often belong to large terraces of equally optimal trees. Terrace size correlates to data set sampling properties. Data sets seldom include enough genes to reduce terrace size to one tree. When bootstrap replicate trees lie on a terrace, statistical support for phylogenetic hypotheses may be reduced. 
    Although some of the published analyses surveyed were conducted with edge-linked inference models (which do not induce terraces), unlinked models have been used and advocated. The present study describes the potential impact of that inference assumption on phylogenetic inference in the context of the kinds of multigene data sets now widely assembled for large-scale tree construction.
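
    The abstract above defines taxon coverage density as the proportion of taxon-by-gene positions with any data present. A minimal sketch of that calculation, assuming the data set is summarized as a presence/absence matrix with one row per taxon and one column per gene; the function name and the example matrix are hypothetical.

```python
def taxon_coverage_density(matrix):
    """Proportion of taxon-by-gene cells that contain any data.

    `matrix` is a list of rows (taxa); each row is a list of
    truthy/falsy values (genes) marking presence of data.
    """
    cells = [cell for row in matrix for cell in row]
    if not cells:
        raise ValueError("matrix must contain at least one cell")
    return sum(1 for cell in cells if cell) / len(cells)


# Hypothetical data set: 3 taxa x 4 genes, 8 of 12 cells filled.
example = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
]
density = taxon_coverage_density(example)
```

    The example yields a density of 8/12, well below the 0.90 level under which the study reports terraces in nearly all data sets.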

  4. Inferring duplications, losses, transfers and incomplete lineage sorting with nonbinary species trees.

    PubMed

    Stolzer, Maureen; Lai, Han; Xu, Minli; Sathaye, Deepa; Vernot, Benjamin; Durand, Dannie

    2012-09-15

    Gene duplication (D), transfer (T), loss (L) and incomplete lineage sorting (I) are crucial to the evolution of gene families and the emergence of novel functions. The history of these events can be inferred via comparison of gene and species trees, a process called reconciliation, yet current reconciliation algorithms model only a subset of these evolutionary processes. We present an algorithm to reconcile a binary gene tree with a nonbinary species tree under a DTLI parsimony criterion. This is the first reconciliation algorithm to capture all four evolutionary processes driving tree incongruence and the first to reconcile non-binary species trees with a transfer model. Our algorithm infers all optimal solutions and reports complete, temporally feasible event histories, giving the gene and species lineages in which each event occurred. It is fixed-parameter tractable, with polytime complexity when the maximum species outdegree is fixed. Application of our algorithms to prokaryotic and eukaryotic data shows that use of an incomplete event model has substantial impact on the events inferred and resulting biological conclusions. Our algorithms have been implemented in Notung, a freely available phylogenetic reconciliation software package, available at http://www.cs.cmu.edu/~durand/Notung. Contact: mstolzer@andrew.cmu.edu.

  5. Forward modeling of tree-ring data: a case study with a global network

    NASA Astrophysics Data System (ADS)

    Breitenmoser, P. D.; Frank, D.; Brönnimann, S.

    2012-04-01

    Information derived from tree-rings is one of the most powerful tools presently available for studying past climatic variability as well as identifying fundamental relationships between tree-growth and climate. Climate reconstructions are typically performed by extending linear relationships, established during the overlapping period of instrumental and climate proxy archives into the past. Such analyses, however, are limited by methodological assumptions, including stationarity and linearity of the climate-proxy relationship. We investigate climate and tree-ring data using the Vaganov-Shashkin-Lite (VS-Lite) forward model of tree-ring width formation to examine the relations among actual tree growth and climate (as inferred from the simulated chronologies) to reconstruct past climate variability. The VS-Lite model has been shown to produce skill comparable to that achieved using classical dendrochronological statistical modeling techniques when applied on simulations of a network of North American tree-ring chronologies. Although the detailed mechanistic processes such as photosynthesis, storage, or cell processes are not modeled directly, the net effect of the dominating nonlinear climatic controls on tree-growth is implemented in the model by the principle of limiting factors and threshold growth response functions. The VS-Lite model requires as inputs only latitude, monthly mean temperature and monthly accumulated precipitation. Hence, this simple, process-based model enables ring-width simulation at any location where monthly climate records exist. In this study, we analyse the growth response of simulated tree-rings to monthly climate conditions obtained from the 20th century reanalysis project back to 1871. These simulated tree-ring chronologies are compared to the climate-driven variability in worldwide observed tree-ring chronologies from the International Tree Ring Database. Results point toward the suitability of the relationship among actual tree growth and climate (as inferred from the simulated chronologies) for use in global palaeoclimate reconstructions.
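
    The abstract above names two ingredients, threshold growth response functions and the principle of limiting factors. The sketch below illustrates how those two ideas can be combined; it is not the VS-Lite code. The ramp shape, the threshold values, the direct use of a moisture value (rather than a soil-moisture model driven by precipitation) and the `insolation` scaling are all assumptions made for illustration.

```python
def ramp(x, lower, upper):
    """Threshold response: 0 below `lower`, 1 above `upper`, linear between."""
    if x <= lower:
        return 0.0
    if x >= upper:
        return 1.0
    return (x - lower) / (upper - lower)


def monthly_growth(temp_c, moisture,
                   t_lo=4.0, t_hi=17.0,   # hypothetical temperature thresholds
                   m_lo=0.02, m_hi=0.30,  # hypothetical moisture thresholds
                   insolation=1.0):       # hypothetical latitude-dependent scaling
    """Growth limited by whichever of temperature or moisture is scarcer."""
    g_temp = ramp(temp_c, t_lo, t_hi)
    g_moist = ramp(moisture, m_lo, m_hi)
    # Principle of limiting factors: the smaller response controls growth.
    return insolation * min(g_temp, g_moist)


cold_month = monthly_growth(temp_c=0.0, moisture=0.5)
mild_month = monthly_growth(temp_c=10.5, moisture=0.5)
dry_month = monthly_growth(temp_c=20.0, moisture=0.16)
```

    With these assumed thresholds a cold month gives zero growth regardless of moisture, while a warm but dry month is limited by moisture alone.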

  6. Representations of the language recognition problem for a theorem prover

    NASA Technical Reports Server (NTRS)

    Minker, J.; Vanderbrug, G. J.

    1972-01-01

    Two representations of the language recognition problem for a theorem prover in first order logic are presented and contrasted. One of the representations is based on the familiar method of generating sentential forms of the language, and the other is based on the Cocke parsing algorithm. An augmented theorem prover is described which permits recognition of recursive languages. The state-transformation method developed by Cordell Green to construct problem solutions in resolution-based systems can be used to obtain the parse tree. In particular, the end-order traversal of the parse tree is derived in one of the representations. An inference system, termed the cycle inference system, is defined which makes it possible for the theorem prover to model the method on which the representation is based. The general applicability of the cycle inference system to state space problems is discussed. Given an unsatisfiable set S, where each clause has at most one positive literal, it is shown that there exists an input proof. The clauses for the two representations satisfy these conditions, as do many state space problems.

  7. Non-monophyly and intricate morphological evolution within the avian family Cettiidae revealed by multilocus analysis of a taxonomically densely sampled dataset

    PubMed Central

    2011-01-01

    Background: The avian family Cettiidae, including the genera Cettia, Urosphena, Tesia, Abroscopus and Tickellia and Orthotomus cucullatus, has recently been proposed based on analysis of a small number of loci and species. The close relationship of most of these taxa was unexpected, and called for a comprehensive study based on multiple loci and dense taxon sampling. In the present study, we infer the relationships of all except one of the species in this family using one mitochondrial and three nuclear loci. We use traditional gene tree methods (Bayesian inference, maximum likelihood bootstrapping, parsimony bootstrapping), as well as a recently developed Bayesian species tree approach (*BEAST) that accounts for lineage sorting processes that might produce discordance between gene trees. We also analyse mitochondrial DNA for a larger sample, comprising multiple individuals and a large number of subspecies of polytypic species. Results: There are many topological incongruences among the single-locus trees, although none of these is strongly supported. The multi-locus tree inferred using concatenated sequences and the species tree agree well with each other, and are overall well resolved and well supported by the data. The main discrepancy between these trees concerns the most basal split. Both methods infer the genus Cettia to be highly non-monophyletic, as it is scattered across the entire family tree. Deep intraspecific divergences are revealed, and one or two species and one subspecies are inferred to be non-monophyletic (differences between methods). Conclusions: The molecular phylogeny presented here is strongly inconsistent with the traditional, morphology-based classification. The remarkably high degree of non-monophyly in the genus Cettia is likely to be one of the most extraordinary examples of misconceived relationships in an avian genus. The phylogeny suggests instances of parallel evolution, as well as highly unequal rates of morphological divergence in different lineages. This complex morphological evolution apparently misled earlier taxonomists. These results underscore the well-known but still often neglected problem of basing classifications on overall morphological similarity. Based on the molecular data, a revised taxonomy is proposed. Although the traditional and species tree methods inferred much the same tree in the present study, the assumption by species tree methods that all species are monophyletic is a limitation in these methods, as some currently recognized species might have more complex histories. PMID:22142197

  8. SWPhylo - A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees.

    PubMed

    Yu, Xiaoyu; Reva, Oleg N

    2018-01-01

    Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, inappropriateness of standard phylogenetic tools to process genome-wide data, and lack of reliable substitution models which correlates with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent to the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, e.g., GyrA.
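
    The alignment-free comparison described above rests on genome-wide tetranucleotide frequencies. A minimal sketch of how such a frequency profile can be computed and two profiles compared; the function names are hypothetical, and the half-Manhattan distance used here is a stand-in, since the abstract does not state which distance SWPhylo applies to OUPs.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=4):
    """Relative frequency of every A/C/G/T k-mer, in a fixed order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows containing other symbols (e.g. N) are ignored.
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def profile_distance(seq_a, seq_b, k=4):
    """Half the Manhattan distance between two profiles (0 = same, 1 = disjoint)."""
    fa = kmer_frequencies(seq_a, k)
    fb = kmer_frequencies(seq_b, k)
    return sum(abs(x - y) for x, y in zip(fa, fb)) / 2


profile = kmer_frequencies("ACGTACGTACGT")
same = profile_distance("ACGTACGTACGT", "ACGTACGTACGT")
disjoint = profile_distance("AAAAAAAA", "CCCCCCCC")
```

    A matrix of such pairwise distances between genomes could then be passed to any distance-based tree-building method.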

  9. SWPhylo – A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees

    PubMed Central

    Yu, Xiaoyu; Reva, Oleg N

    2018-01-01

    Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, inappropriateness of standard phylogenetic tools to process genome-wide data, and lack of reliable substitution models which correlates with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent to the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, e.g., GyrA. PMID:29511354

  10. Probabilistic Graphical Model Representation in Phylogenetics

    PubMed Central

    Höhna, Sebastian; Heath, Tracy A.; Boussau, Bastien; Landis, Michael J.; Ronquist, Fredrik; Huelsenbeck, John P.

    2014-01-01

    Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (i) reproducibility of an analysis, (ii) model development, and (iii) software design. Moreover, a unified, clear and understandable framework for model representation lowers the barrier for beginners and nonspecialists to grasp complex phylogenetic models, including their assumptions and parameter/variable dependencies. Graphical modeling is a unifying framework that has gained in popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions. The strength lies in the comprehensibility, flexibility, and adaptability of this formalism, and the large body of computational work based on it. Graphical models are well-suited to teach statistical models, to facilitate communication among phylogeneticists and in the development of generic software for simulation and statistical inference. Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and introduce modules to simplify the representation of standard components in large and complex models. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using, for example, Metropolis–Hastings or Gibbs sampling of the posterior distribution. [Computation; graphical models; inference; modularization; statistical phylogenetics; tree plate.] PMID:24951559

  11. RENT+: an improved method for inferring local genealogical trees from haplotypes with recombination.

    PubMed

    Mirzaei, Sajad; Wu, Yufeng

    2017-04-01

    Motivation: Haplotypes from one or multiple related populations share a common genealogical history. If this shared genealogy can be inferred from haplotypes, it can be very useful for many population genetics problems. However, with the presence of recombination, the genealogical history of haplotypes is complex and cannot be represented by a single genealogical tree. Therefore, inference of genealogical history with recombination is much more challenging than the case of no recombination. Results: In this paper, we present a new approach called RENT+ for the inference of local genealogical trees from haplotypes with the presence of recombination. RENT+ builds on a previous genealogy inference approach called RENT, which infers a set of related genealogical trees at different genomic positions. RENT+ represents a significant improvement over RENT in the sense that it is more effective in extracting information contained in the haplotype data about the underlying genealogy than RENT. The key components of RENT+ are several greatly enhanced genealogy inference rules. Through simulation, we show that RENT+ is more efficient and accurate than several existing genealogy inference methods. As an application, we apply RENT+ in the inference of population demographic history from haplotypes, which outperforms several existing methods. Availability and Implementation: RENT+ is implemented in Java, and is freely available for download from: https://github.com/SajadMirzaei/RentPlus. Contacts: sajad@engr.uconn.edu or ywu@engr.uconn.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  12. An Improved Binary Differential Evolution Algorithm to Infer Tumor Phylogenetic Trees.

    PubMed

    Liang, Ying; Liao, Bo; Zhu, Wen

    2017-01-01

    Tumourigenesis is a mutation accumulation process, which is likely to start with a mutated founder cell. The evolutionary nature of tumor development makes phylogenetic models suitable for inferring tumor evolution through genetic variation data. Copy number variation (CNV) is the major genetic marker of the genome with more genes, disease loci, and functional elements involved. Fluorescence in situ hybridization (FISH) accurately measures the copy numbers of multiple genes in hundreds of single cells. We propose an improved binary differential evolution algorithm, BDEP, to infer tumor phylogenetic trees based on the FISH platform. The topology analysis of the tumor progression tree shows that the pathway of tumor subcell expansion varies greatly during different stages of tumor formation. The classification experiment shows that tree-based features are better than data-based features in distinguishing tumors. The constructed phylogenetic trees perform well in characterizing the tumor development process, and the approach outperforms other similar algorithms.

  13. Exact solutions for species tree inference from discordant gene trees.

    PubMed

    Chang, Wen-Chieh; Górecki, Paweł; Eulenstein, Oliver

    2013-10-01

    Phylogenetic analysis has to overcome the grand challenge of inferring accurate species trees from evolutionary histories of gene families (gene trees) that are discordant with the species tree along whose branches they have evolved. Two well-studied approaches to cope with this challenge are to solve either biologically informed gene tree parsimony (GTP) problems under gene duplication, gene loss, and deep coalescence, or the classic RF supertree problem that does not rely on any biological model. Despite the potential of these problems to infer credible species trees, they are NP-hard. Therefore, these problems are addressed by heuristics that typically lack any provable accuracy and precision. We describe fast dynamic programming algorithms that solve the GTP problems and the RF supertree problem exactly, and demonstrate that our algorithms can solve instances with data sets consisting of as many as 22 taxa. Extensions of our algorithms can also report the number of all optimal species trees, as well as the trees themselves. To better assess the quality of the resulting species trees that best fit the given gene trees, we also compute the worst case species trees, their numbers, and optimization score for each of the computational problems. Finally, we demonstrate the performance of our exact algorithms using empirical and simulated data sets, and analyze the quality of heuristic solutions for the studied problems by contrasting them with our exact solutions.
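
    The RF supertree problem mentioned above scores a candidate tree by its Robinson-Foulds (RF) distance to the input trees. The sketch below shows only that distance, in its rooted, clade-based form, for two trees on the same taxa; it is not the dynamic programming algorithm of the paper. Representing trees as nested tuples and the function names are assumptions made for illustration.

```python
def clades(tree):
    """Non-trivial clades of a rooted tree given as nested tuples of leaf names."""
    found = set()

    def leaves(node):
        if isinstance(node, str):          # a leaf
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)                 # record the clade under this node
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)              # the root clade is in every tree
    return found


def rf_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


t1 = ((("A", "B"), "C"), "D")
t2 = ((("A", "C"), "B"), "D")
d_same = rf_distance(t1, t1)
d_diff = rf_distance(t1, t2)
```

    Here the two trees share the clade {A, B, C} but disagree on {A, B} versus {A, C}, so their distance is 2.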

  14. Alternative methods of phylogenetic inference for the Patagonian lizard group Liolaemus elongatus-kriegi (Iguania: Liolaemini) based on mitochondrial and nuclear markers.

    PubMed

    Medina, Cintia Débora; Avila, Luciano Javier; Sites, Jack Walter; Santos, Juan; Morando, Mariana

    2018-03-01

    We present different approaches to a multi-locus phylogeny for the Liolaemus elongatus-kriegi group, including almost all species and recognized lineages. We sequenced two mitochondrial and five nuclear gene regions for 123 individuals from 35 taxa, and compared relationships resolved from concatenated and species tree methods. The L. elongatus-kriegi group was inferred as monophyletic in three of the five analyses (concatenated mitochondrial, concatenated mitochondrial + nuclear gene trees, and SVD quartet species tree). The mitochondrial gene tree resolved four haploclades, three corresponding to the previously recognized complexes: L. elongatus, L. kriegi and L. petrophilus complexes, and the L. punmahuida group. The BEAST species tree approach included the L. punmahuida group within the L. kriegi complex, but the SVD quartet method placed it as sister to the L. elongatus-kriegi group. BEAST inferred species of the L. elongatus and L. petrophilus complexes as one clade, while SVDquartet inferred these two complexes as monophyletic (although with no statistical support for the L. petrophilus complex). The species tree approach also included the L. punmahuida group as part of the L. elongatus-kriegi group. Our study provides detailed multilocus phylogenetic hypotheses for the L. elongatus-kriegi group, and we discuss possible reasons for differences in the concatenation and species tree methods. Copyright © 2017 Elsevier Inc. All rights reserved.

  15. Tree growth-climate relationships in a forest-plot network on Mediterranean mountains.

    PubMed

    Fyllas, Nikolaos M; Christopoulou, Anastasia; Galanidis, Alexandros; Michelaki, Chrysanthi Z; Dimitrakopoulos, Panayiotis G; Fulé, Peter Z; Arianoutsou, Margarita

    2017-11-15

    In this study we analysed a novel tree-growth dataset, inferred from annual ring-width measurements, of 7 forest tree species from 12 mountain regions in Greece, in order to identify tree growth - climate relationships. The tree species of interest were: Abies cephalonica, Abies borisii-regis, Picea abies, Pinus nigra, Pinus sylvestris, Fagus sylvatica and Quercus frainetto growing across a gradient of climate conditions with mean annual temperature ranging from 5.7 to 12.6°C and total annual precipitation from 500 to 950 mm. In total, 344 tree cores (one per tree) were analysed across a network of 20 study sites. We found that water availability during the summer period (May-August) was a strong predictor of interannual variation in tree growth for all study species. Across species and sites, annual tree growth was positively related to summer season precipitation (P_SP). The responsiveness of annual growth to P_SP was tightly related to species and site specific measurements of instantaneous photosynthetic water use efficiency (WUE), suggesting that the growth of species with efficient water use is more responsive to variations in precipitation during the dry months of the year. Our findings support the importance of water availability for the growth of mountainous Mediterranean tree species and highlight that future reductions in precipitation are likely to lead to reduced tree-growth under climate change conditions. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. Does the choice of nucleotide substitution models matter topologically?

    PubMed

    Hoff, Michael; Orf, Stefan; Riehm, Benedikt; Darriba, Diego; Stamatakis, Alexandros

    2016-03-24

    In the context of a master level programming practical at the computer science department of the Karlsruhe Institute of Technology, we developed and make available an open-source code for testing all 203 possible nucleotide substitution models in the Maximum Likelihood (ML) setting under the common Akaike, corrected Akaike, and Bayesian information criteria. We address the question of whether model selection matters topologically, that is, whether conducting ML inferences under the optimal model, instead of a standard General Time Reversible model, yields different tree topologies. We also assess to what degree the models selected and the trees inferred under the three standard criteria (AIC, AICc, BIC) differ. Finally, we assess whether the definition of the sample size (#sites versus #sites × #taxa) yields different models and, as a consequence, different tree topologies. We find that all three factors (by order of impact: nucleotide model selection, information criterion used, sample size definition) can yield substantially different final tree topologies (topological difference exceeding 10 %) for approximately 5 % of the tree inferences conducted on the 39 empirical datasets used in our study. We find that using the best-fit nucleotide substitution model may change the final ML tree topology compared to an inference under a default GTR model. The effect is less pronounced when comparing distinct information criteria. Nonetheless, in some cases we did obtain substantial topological differences.
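
    As an illustration of the three criteria named in this abstract, the following minimal sketch computes AIC, AICc, and BIC from a log-likelihood using their standard textbook definitions. It is not the authors' code; the log-likelihood, parameter count, and alignment dimensions are hypothetical values chosen only to show how the two sample-size definitions (#sites versus #sites × #taxa) change AICc and BIC while leaving AIC untouched.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion for k free parameters."""
    return 2 * k - 2 * log_lik

def aicc(log_lik, k, n):
    """AIC with the small-sample correction; n is the sample size."""
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)

def bic(log_lik, k, n):
    """Bayesian information criterion; n is the sample size."""
    return k * math.log(n) - 2 * log_lik

# Hypothetical values for illustration only (not taken from the study).
log_lik = -12345.6   # maximised log-likelihood of one candidate model
k = 10               # number of free model parameters
n_sites, n_taxa = 1000, 20

for label, n in (("#sites", n_sites), ("#sites x #taxa", n_sites * n_taxa)):
    print(f"{label:>15}: AIC={aic(log_lik, k):.2f} "
          f"AICc={aicc(log_lik, k, n):.2f} BIC={bic(log_lik, k, n):.2f}")
```

    Lower scores indicate the preferred model under each criterion. Because the BIC penalty grows with log(n), the larger sample-size definition penalises parameter-rich models more heavily, which is one way the choice of definition can change the selected model.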

  17. Condition trees as a mechanism for communicating the meaning of uncertainties

    NASA Astrophysics Data System (ADS)

    Beven, Keith

    2015-04-01

    Uncertainty communication for environmental problems is fraught with difficulty for good epistemic reasons. The fact that most sources of uncertainty are subject to, and often dominated by, epistemic uncertainties means that the unthinking use of probability theory might actually be misleading and lead to false inference (even in some cases where the assumptions of a probabilistic error model might seem to be reasonably valid). This therefore creates problems in communicating the meaning of probabilistic uncertainties of model predictions to potential users (there are many examples in hydrology, hydraulics, climate change and other domains). It is suggested that one way of being more explicit about the meaning of uncertainties is to associate each type of application with a condition tree of assumptions that need to be made in producing an estimate of uncertainty. The condition tree then provides a basis for discussion and communication of assumptions about uncertainties with users. Agreement of assumptions (albeit generally at some institutional level) will provide some buy-in on the part of users, and a basis for commissioning of future studies. Even in some relatively well-defined problems, such as mapping flood risk, such a condition tree can be rather extensive, but by making each step in the tree explicit then an audit trail is established for future reference. This can act to provide focus in the exercise of agreeing more realistic assumptions.

  18. Despite available habitat at range edge, yellow-cedar migration is punctuated with a past pulse tied to colder conditions

    Treesearch

    John Krapek; Paul E. Hennon; David V. D' Amore; Brian Buma

    2017-01-01

    Aim: To explore the recent (past ~1,000 year) migration history of yellow-cedar (Callitropsis nootkatensis), a climate-threatened tree, which appears to lag behind its potential climatic niche at a leading northern range edge, and infer its continued migration potential under changing climate. Location:...

  19. Irrational exuberance for resolved species trees.

    PubMed

    Hahn, Matthew W; Nakhleh, Luay

    2016-01-01

    Phylogenomics has largely succeeded in its aim of accurately inferring species trees, even when there are high levels of discordance among individual gene trees. These resolved species trees can be used to ask many questions about trait evolution, including the direction of change and number of times traits have evolved. However, the mapping of traits onto trees generally uses only a single representation of the species tree, ignoring variation in the gene trees used to construct it. Recognizing that genes underlie traits, these results imply that many traits follow topologies that are discordant with the species topology. As a consequence, standard methods for character mapping will incorrectly infer the number of times a trait has evolved. This phenomenon, dubbed "hemiplasy," poses many problems in analyses of character evolution. Here we outline these problems, explaining where and when they are likely to occur. We offer several ways in which the possible presence of hemiplasy can be diagnosed, and discuss multiple approaches to dealing with the problems presented by underlying gene tree discordance when carrying out character mapping. Finally, we discuss the implications of hemiplasy for general phylogenetic inference, including the possible drawbacks of the widespread push for "resolved" species trees. © 2015 The Author(s). Evolution © 2015 The Society for the Study of Evolution.

  20. Phylodynamic Inference with Kernel ABC and Its Application to HIV Epidemiology.

    PubMed

    Poon, Art F Y

    2015-09-01

    The shapes of phylogenetic trees relating virus populations are determined by the adaptation of viruses within each host, and by the transmission of viruses among hosts. Phylodynamic inference attempts to reverse this flow of information, estimating parameters of these processes from the shape of a virus phylogeny reconstructed from a sample of genetic sequences from the epidemic. A key challenge to phylodynamic inference is quantifying the similarity between two trees in an efficient and comprehensive way. In this study, I demonstrate that a new distance measure, based on a subset tree kernel function from computational linguistics, confers a significant improvement over previous measures of tree shape for classifying trees generated under different epidemiological scenarios. Next, I incorporate this kernel-based distance measure into an approximate Bayesian computation (ABC) framework for phylodynamic inference. ABC bypasses the need for an analytical solution of model likelihood, as it only requires the ability to simulate data from the model. I validate this "kernel-ABC" method for phylodynamic inference by estimating parameters from data simulated under a simple epidemiological model. Results indicate that kernel-ABC attained greater accuracy for parameters associated with virus transmission than leading software on the same data sets. Finally, I apply the kernel-ABC framework to study a recent outbreak of a recombinant HIV subtype in China. Kernel-ABC provides a versatile framework for phylodynamic inference because it can fit a broader range of models than methods that rely on the computation of exact likelihoods. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
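
    The rejection form of ABC described in this abstract can be sketched in a few lines. This is a toy illustration under stated assumptions, not the kernel-ABC method itself: the model is a simple exponential distribution rather than an epidemic simulation, the kernel is a Gaussian kernel on sample means rather than a subset tree kernel, and the distance is assumed to be one minus the normalised kernel score. All function names, the bandwidth, the prior, and the tolerance are hypothetical.

```python
import math
import random

def simulate(theta, n, rng):
    """Toy model: n draws from an exponential distribution with rate theta."""
    return [rng.expovariate(theta) for _ in range(n)]

def kernel(x, y, bandwidth=0.1):
    """Gaussian kernel on sample means (a stand-in for a tree kernel)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return math.exp(-((mean_x - mean_y) ** 2) / (2 * bandwidth ** 2))

def kernel_distance(x, y):
    """Assumed distance: one minus the normalised kernel score."""
    return 1.0 - kernel(x, y) / math.sqrt(kernel(x, x) * kernel(y, y))

def abc_rejection(observed, prior_sampler, n_draws, epsilon, rng):
    """Keep the prior draws whose simulated data lie within epsilon of the data."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        simulated = simulate(theta, len(observed), rng)
        if kernel_distance(observed, simulated) < epsilon:
            accepted.append(theta)
    return accepted

rng = random.Random(1)
observed = simulate(2.0, 200, rng)  # pretend the true rate (2.0) is unknown
posterior = abc_rejection(observed, lambda r: r.uniform(0.1, 5.0), 5000, 0.05, rng)
print(len(posterior), sum(posterior) / len(posterior))
```

    No likelihood is evaluated anywhere in the sketch; only the ability to simulate from the model is needed, which is the property the abstract highlights.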

  1. Faster Mass Spectrometry-based Protein Inference: Junction Trees are More Efficient than Sampling and Marginalization by Enumeration

    PubMed Central

    Serang, Oliver; Noble, William Stafford

    2012-01-01

    The problem of identifying the proteins in a complex mixture using tandem mass spectrometry can be framed as an inference problem on a graph that connects peptides to proteins. Several existing protein identification methods make use of statistical inference methods for graphical models, including expectation maximization, Markov chain Monte Carlo, and full marginalization coupled with approximation heuristics. We show that, for this problem, the majority of the cost of inference usually comes from a few highly connected subgraphs. Furthermore, we evaluate three different statistical inference methods using a common graphical model, and we demonstrate that junction tree inference substantially improves rates of convergence compared to existing methods. The python code used for this paper is available at http://noble.gs.washington.edu/proj/fido. PMID:22331862

  2. Salient measures of inhibition and switching are associated with frontal lobe gray matter volume in healthy middle-aged and older adults.

    PubMed

    Adólfsdóttir, Steinunn; Haász, Judit; Wehling, Eike; Ystad, Martin; Lundervold, Arvid; Lundervold, Astri J

    2014-11-01

    To investigate brain-behavior relationships between morphometric brain measures and salient executive function (EF) measures of inhibition and switching. One hundred participants (49-80 years) performed the Color Word Interference Test from the Delis-Kaplan Executive Function System (D-KEFS). Salient measures of EF components of inhibition and switching, of which the effect of more fundamental skills were regressed out, were analyzed using linear models and a conditional inference trees analysis taking intercorrelations between predictor variables (brain volumes, age, gender, and education) into account. The conditional inference trees analysis demonstrated a primary role of the middle frontal gyrus (MFG) in explaining variations in the salient EF measure of switching and combined inhibition/switching. Age predicted measures of inhibition. The study highlights the importance of considering fundamental cognitive skills and the use of a statistical method taking possible complex relationships between predictor variables into account when interpreting standard EF test results. Further studies should include MRI measures representing neural networks that may relate to CWIT performance, and longitudinal studies are required to investigate any causal relationships. PsycINFO Database Record (c) 2014 APA, all rights reserved.

  3. The probabilistic convolution tree: efficient exact Bayesian inference for faster LC-MS/MS protein inference.

    PubMed

    Serang, Oliver

    2014-01-01

    Exact Bayesian inference can sometimes be performed efficiently for special cases where a function has commutative and associative symmetry of its inputs (called "causal independence"). For this reason, it is desirable to exploit such symmetry on big data sets. Here we present a method to exploit a general form of this symmetry on probabilistic adder nodes by transforming those probabilistic adder nodes into a probabilistic convolution tree with which dynamic programming computes exact probabilities. A substantial speedup is demonstrated using an illustration example that can arise when identifying splice forms with bottom-up mass spectrometry-based proteomics. On this example, even state-of-the-art exact inference algorithms require a runtime more than exponential in the number of splice forms considered. By using the probabilistic convolution tree, we reduce the runtime to O(k log(k)^2) and the space to O(k log(k)) where k is the number of variables joined by an additive or cardinal operator. This approach, which can also be used with junction tree inference, is applicable to graphs with arbitrary dependency on counting variables or cardinalities and can be used on diverse problems and fields like forward error correcting codes, elemental decomposition, and spectral demixing. The approach also trivially generalizes to multiple dimensions.
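
    The core idea of arranging an adder node as a tree of pairwise convolutions can be illustrated with a minimal sketch. It covers only the forward pass (the exact distribution of a sum of independent discrete variables) and uses direct convolution, so it does not reach the runtime quoted in the abstract, which would require a faster convolution routine; it also omits the backward pass that yields posteriors on the inputs. The function names are hypothetical.

```python
def convolve(p, q):
    """Distribution of X + Y for independent X ~ p and Y ~ q (index = value)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out

def sum_distribution(pmfs):
    """Combine the input distributions pairwise, layer by layer, like a balanced tree.

    Expects at least one probability mass function.
    """
    layer = list(pmfs)
    while len(layer) > 1:
        merged = [convolve(layer[i], layer[i + 1])
                  for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2 == 1:
            merged.append(layer[-1])  # odd one out moves up unchanged
        layer = merged
    return layer[0]

# Four fair binary variables: the sum follows a Binomial(4, 0.5) distribution.
total = sum_distribution([[0.5, 0.5]] * 4)
print(total)
```

    Each layer halves the number of distributions, so the tree has about log2(k) layers for k inputs.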

  4. The Probabilistic Convolution Tree: Efficient Exact Bayesian Inference for Faster LC-MS/MS Protein Inference

    PubMed Central

    Serang, Oliver

    2014-01-01

    Exact Bayesian inference can sometimes be performed efficiently for special cases where a function has commutative and associative symmetry of its inputs (called “causal independence”). For this reason, it is desirable to exploit such symmetry on big data sets. Here we present a method to exploit a general form of this symmetry on probabilistic adder nodes by transforming those probabilistic adder nodes into a probabilistic convolution tree with which dynamic programming computes exact probabilities. A substantial speedup is demonstrated using an illustration example that can arise when identifying splice forms with bottom-up mass spectrometry-based proteomics. On this example, even state-of-the-art exact inference algorithms require a runtime more than exponential in the number of splice forms considered. By using the probabilistic convolution tree, we reduce the runtime to O(k log(k)^2) and the space to O(k log(k)) where k is the number of variables joined by an additive or cardinal operator. This approach, which can also be used with junction tree inference, is applicable to graphs with arbitrary dependency on counting variables or cardinalities and can be used on diverse problems and fields like forward error correcting codes, elemental decomposition, and spectral demixing. The approach also trivially generalizes to multiple dimensions. PMID:24626234

  5. On the Accuracy of Language Trees

    PubMed Central

    Pompei, Simone; Loreto, Vittorio; Tria, Francesca

    2011-01-01

    Historical linguistics aims at inferring the most likely language phylogenetic tree starting from information concerning the evolutionary relatedness of languages. The available information typically consists of lists of homologous (lexical, phonological, syntactic) features or characters for many different languages: a set of parallel corpora whose compilation represents a paramount achievement in linguistics. From this perspective the reconstruction of language trees is an example of an inverse problem: starting from present, incomplete and often noisy information, one aims at inferring the most likely past evolutionary history. A fundamental issue in inverse problems is the evaluation of the inference made. A standard way of dealing with this question is to generate data with artificial models in order to have full access to the evolutionary process one is going to infer. This procedure presents an intrinsic limitation: when dealing with real data sets, one typically does not know which model of evolution is the most suitable for them. A possible way out is to compare algorithmic inference with expert classifications. This is the point of view we take here by conducting a thorough survey of the accuracy of reconstruction methods as compared with the Ethnologue expert classifications. We focus in particular on state-of-the-art distance-based methods for phylogeny reconstruction using worldwide linguistic databases. In order to assess the accuracy of the inferred trees we introduce and characterize two generalizations of standard definitions of distances between trees. Based on these scores we quantify the relative performances of the distance-based algorithms considered. Further we quantify how the completeness and the coverage of the available databases affect the accuracy of the reconstruction. Finally we draw some conclusions about where the accuracy of the reconstructions in historical linguistics stands and about the leading directions to improve it. PMID:21674034

  6. From cacti to carnivores: Improved phylotranscriptomic sampling and hierarchical homology inference provide further insight into the evolution of Caryophyllales.

    PubMed

    Walker, Joseph F; Yang, Ya; Feng, Tao; Timoneda, Alfonso; Mikenas, Jessica; Hutchison, Vera; Edwards, Caroline; Wang, Ning; Ahluwalia, Sonia; Olivieri, Julia; Walker-Hale, Nathanael; Majure, Lucas C; Puente, Raúl; Kadereit, Gudrun; Lauterbach, Maximilian; Eggli, Urs; Flores-Olvera, Hilda; Ochoterena, Helga; Brockington, Samuel F; Moore, Michael J; Smith, Stephen A

    2018-03-01

    The Caryophyllales contain ~12,500 species and are known for their cosmopolitan distribution, convergence of trait evolution, and extreme adaptations. Some relationships within the Caryophyllales, like those of many large plant clades, remain unclear, and phylogenetic studies often recover alternative hypotheses. We explore the utility of broad and dense transcriptome sampling across the order for resolving evolutionary relationships in Caryophyllales. We generated 84 transcriptomes and combined these with 224 publicly available transcriptomes to perform a phylogenomic analysis of Caryophyllales. To overcome the computational challenge of ortholog detection in such a large data set, we developed an approach for clustering gene families that allowed us to analyze >300 transcriptomes and genomes. We then inferred the species relationships using multiple methods and performed gene-tree conflict analyses. Our phylogenetic analyses resolved many clades with strong support, but also showed significant gene-tree discordance. This discordance is not only a common feature of phylogenomic studies, but also represents an opportunity to understand processes that have structured phylogenies. We also found taxon sampling influences species-tree inference, highlighting the importance of more focused studies with additional taxon sampling. Transcriptomes are useful both for species-tree inference and for uncovering evolutionary complexity within lineages. Through analyses of gene-tree conflict and multiple methods of species-tree inference, we demonstrate that phylogenomic data can provide unparalleled insight into the evolutionary history of Caryophyllales. We also discuss a method for overcoming computational challenges associated with homolog clustering in large data sets. © 2018 The Authors. American Journal of Botany is published by Wiley Periodicals, Inc. on behalf of the Botanical Society of America.

  7. Pollution and Climate Effects on Tree-Ring Nitrogen Isotopes

    NASA Astrophysics Data System (ADS)

    Savard, M. M.; Bégin, C.; Marion, J.; Smirnoff, A.

    2009-04-01

    BACKGROUND Monitoring of nitrous oxide concentration only started during the last 30 years in North America, but anthropogenic atmospheric nitrogen has been significantly emitted over the last 150 years. Can geochemical characteristics of tree rings be used to infer past changes in the nitrogen cycle of temperate regions? To address this question we use nitrogen stable isotopes in 125 years-long ring series from beech specimens (Fagus grandifolia) of the Georgian Bay Islands National Park (eastern Ontario), and pine (Pinus strobus) and beech trees of the Arboretum Morgan near Montreal (western Quebec). To evaluate the reliability of the N stable isotopes in wood treated for removal of soluble materials, we tested both tree species from the Montreal area. The reproducibility from tree to tree was excellent for both pine and beech trees, the isotopic trends were strongly concordant, and they were not influenced by the heartwood-sapwood transition zone. The coherence of changes of the isotopic series observed for the two species suggests that their tree-ring N isotopic values can serve as environmental indicator.

    RESULTS AND INTERPRETATION In Montreal and Georgian Bay, the N isotopes show strong and similar parallel agreement (Gleichläufigkeit test) with the climatic parameters. So in fact, the short-term isotopic fluctuations correlate directly with summer precipitation and inversely with summer and spring temperature. A long-term decreasing isotope trend in Montreal indicates progressive changes in soil chemistry after 1951. A pedochemical change is also inferred for the Georgian Bay site on the basis of a positive N isotopic trend initiated after 1971. At both sites, the long-term ^15N series correlate with a proxy for NOx emissions (Pearson correlation), and carbon-isotope ring series suggest that the same trees have been impacted by phytotoxic pollutants (Savard et al., 2009a). We propose that the contrasted long-term nitrogen-isotope changes of Montreal and Georgian Bay reflect deposition of NOx emissions from cars and coal-power plants, with higher proportions from coal burning in Georgian Bay (Savard et al., 2009b). This interpretation is conceivable because recent monitoring indicates that coal-power plant NOx emissions play an important role in the annual N budget in Ontario, but they are negligible on the Quebec side.

    CONCLUSION Interpretations of long tree-ring N isotopic series in terms of effects generated by airborne N-species have been previously advocated. Here we further propose that the contrasted isotopic trends obtained for wood samples from two regions reflect different regional anthropogenic N deposition combined with variations of climatic conditions. This research suggests that nitrogen tree-ring series may record both regional climatic conditions and anthropogenic perturbations of the N cycle.

    REFERENCES
    Savard, M.M., Bégin, C., Marion, J., Aznar, J.-C., Smirnoff, A., 2009a. Changes of Air Quality in an urban region as inferred from tree-ring width and stable isotopes. Chapter 9 in "Relating Atmospheric Source Apportionment to Vegetation Effects: Establishing Cause Effect Relationships" (A. Legge ed.). Elsevier, Amsterdam; doi: 10.1016/S1474-8177(08)00209x.
    Savard, M.M., Bégin, C., Smirnoff, A., Marion, J., Rioux-Paquette, E., 2009b. Tree-ring nitrogen isotopes reflect climatic effects and anthropogenic NOx emissions. Env. Sci. Tech (doi: 10.1021/es802437k).
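
    The Gleichläufigkeit statistic mentioned in this abstract measures how often two series move in the same direction from one year to the next. The sketch below follows the classic sign-based definition used in dendrochronology, in which a year where exactly one series is unchanged contributes one half; conventions for ties vary between implementations, and nothing here is taken from the authors' analysis. The function name and the example series are hypothetical.

```python
def gleichlaeufigkeit(series_a, series_b):
    """Fraction of year-to-year intervals in which both series change in the same direction."""
    if len(series_a) != len(series_b) or len(series_a) < 2:
        raise ValueError("series must have equal length of at least 2")

    def sign(x):
        return (x > 0) - (x < 0)

    score = 0.0
    for i in range(1, len(series_a)):
        step_a = sign(series_a[i] - series_a[i - 1])
        step_b = sign(series_b[i] - series_b[i - 1])
        # Same direction -> 1, opposite -> 0, exactly one flat -> 0.5, both flat -> 0.
        score += abs(0.5 * step_a + 0.5 * step_b)
    return score / (len(series_a) - 1)

# Hypothetical isotope and precipitation values for four consecutive years.
isotopes = [1.0, 2.0, 3.0, 2.0]
rainfall = [0.0, 1.0, 0.0, -1.0]
print(gleichlaeufigkeit(isotopes, rainfall))
```

    A value near 1 indicates parallel year-to-year behaviour, a value near 0.5 is what unrelated series would typically give, and a value near 0 indicates opposite behaviour.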

  8. The space of ultrametric phylogenetic trees.

    PubMed

    Gavryushkin, Alex; Drummond, Alexei J

    2016-08-21

    The reliability of a phylogenetic inference method from genomic sequence data is ensured by its statistical consistency. Bayesian inference methods produce a sample of phylogenetic trees from the posterior distribution given sequence data. Hence the question of statistical consistency of such methods is equivalent to the consistency of the summary of the sample. More generally, statistical consistency is ensured by the tree space used to analyse the sample. In this paper, we consider two standard parameterisations of phylogenetic time-trees used in evolutionary models: inter-coalescent interval lengths and absolute times of divergence events. For each of these parameterisations we introduce a natural metric space on ultrametric phylogenetic trees. We compare the introduced spaces with existing models of tree space and formulate several formal requirements that a metric space on phylogenetic trees must possess in order to be a satisfactory space for statistical analysis, and justify them. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce and that the choice between metric spaces requires additional properties to be considered. In particular, the summary tree minimising the square distance to the trees from the sample might be different for different parameterisations. This suggests that further fundamental insight is needed into the problem of statistical consistency of phylogenetic inference methods. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.

  9. OncoNEM: inferring tumor evolution from single-cell sequencing data.

    PubMed

    Ross, Edith M; Markowetz, Florian

    2016-04-15

    Single-cell sequencing promises a high-resolution view of genetic heterogeneity and clonal evolution in cancer. However, methods to infer tumor evolution from single-cell sequencing data lag behind methods developed for bulk-sequencing data. Here, we present OncoNEM, a probabilistic method for inferring intra-tumor evolutionary lineage trees from somatic single nucleotide variants of single cells. OncoNEM identifies homogeneous cellular subpopulations and infers their genotypes as well as a tree describing their evolutionary relationships. In simulation studies, we assess OncoNEM's robustness and benchmark its performance against competing methods. Finally, we show its applicability in case studies of muscle-invasive bladder cancer and essential thrombocythemia.

  10. Inferring Phylogenetic Networks Using PhyloNet.

    PubMed

    Wen, Dingqiao; Yu, Yun; Zhu, Jiafan; Nakhleh, Luay

    2018-07-01

    PhyloNet was released in 2008 as a software package for representing and analyzing phylogenetic networks. At the time of its release, the main functionalities in PhyloNet consisted of measures for comparing network topologies and a single heuristic for reconciling gene trees with a species tree. Since then, PhyloNet has grown significantly. The software package now includes a wide array of methods for inferring phylogenetic networks from data sets of unlinked loci while accounting for both reticulation (e.g., hybridization) and incomplete lineage sorting. In particular, PhyloNet now allows for maximum parsimony, maximum likelihood, and Bayesian inference of phylogenetic networks from gene tree estimates. Furthermore, Bayesian inference directly from sequence data (sequence alignments or biallelic markers) is implemented. Maximum parsimony is based on an extension of the "minimizing deep coalescences" criterion to phylogenetic networks, whereas maximum likelihood and Bayesian inference are based on the multispecies network coalescent. All methods allow for multiple individuals per species. As computing the likelihood of a phylogenetic network is computationally hard, PhyloNet allows for evaluation and inference of networks using a pseudolikelihood measure. PhyloNet summarizes the results of the various analyses and generates phylogenetic networks in the extended Newick format that is readily viewable by existing visualization software.

  11. Integrated pipeline for inferring the evolutionary history of a gene family embedded in the species tree: a case study on the STIMATE gene family.

    PubMed

    Song, Jia; Zheng, Sisi; Nguyen, Nhung; Wang, Youjun; Zhou, Yubin; Lin, Kui

    2017-10-03

    Because phylogenetic inference is an important basis for answering many evolutionary problems, a large number of algorithms have been developed. Some of these algorithms have been improved by integrating gene evolution models with the expectation of accommodating the hierarchy of evolutionary processes. To the best of our knowledge, however, there still is no single unifying model or algorithm that can take all evolutionary processes into account through a stepwise or simultaneous method. On the basis of three existing phylogenetic inference algorithms, we built an integrated pipeline for inferring the evolutionary history of a given gene family; this pipeline can model gene sequence evolution, gene duplication-loss, gene transfer and multispecies coalescent processes. As a case study, we applied this pipeline to the STIMATE (TMEM110) gene family, which has recently been reported to play an important role in store-operated Ca2+ entry (SOCE) mediated by ORAI and STIM proteins. We inferred their phylogenetic trees in 69 sequenced chordate genomes. By integrating three tree reconstruction algorithms with diverse evolutionary models, a pipeline for inferring the evolutionary history of a gene family was developed, and its application was demonstrated.

  12. Increased tree-ring network density reveals more precise estimations of sub-regional hydroclimate variability and climate dynamics in the Midwest, USA

    NASA Astrophysics Data System (ADS)

    Maxwell, Justin T.; Harley, Grant L.

    2017-08-01

    Understanding the historic variability in the hydroclimate provides important information on possible extreme dry or wet periods that in turn inform water management plans. Tree rings have long provided historical context of hydroclimate variability of the U.S. However, the tree-ring network used to create these countrywide gridded reconstructions is sparse in certain locations, such as the Midwest. Here, we increase (n = 20) the spatial resolution of the tree-ring network in southern Indiana and compare a summer (June-August) Palmer Drought Severity Index (PDSI) reconstruction to existing gridded reconstructions of PDSI for this region. We find both droughts and pluvials that were previously unknown that rival the most intense PDSI values during the instrumental period. Additionally, historical drought occurred in Indiana that eclipsed instrumental conditions with regard to severity and duration. During the period 1962-2004 CE, we find that teleconnections of drought conditions through the Atlantic Meridional Overturning Circulation have a strong influence (r = -0.60, p < 0.01) on secondary tree growth in this region for the late spring-early summer season. These findings highlight the importance of continuing to increase the spatial resolution of the tree-ring network used to infer past climate dynamics to capture the sub-regional spatial variability. Increasing the spatial resolution of the tree-ring network for a given region can better identify sub-regional variability, improve the accuracy of regional tree-ring PDSI reconstructions, and provide better information for climatic teleconnections.

  13. Floodplain ecohydrology: Climatic, anthropogenic, and local physical controls on partitioning of water sources to riparian trees.

    PubMed

    Singer, Michael Bliss; Sargeant, Christopher I; Piégay, Hervé; Riquier, Jérémie; Wilson, Rob J S; Evans, Cristina M

    2014-05-01

    Seasonal and annual partitioning of water within river floodplains has important implications for ecohydrologic links between the water cycle and tree growth. Climatic and hydrologic shifts alter water distribution between floodplain storage reservoirs (e.g., vadose, phreatic), affecting water availability to tree roots. Water partitioning is also dependent on the physical conditions that control tree rooting depth (e.g., gravel layers that impede root growth), the sources of contributing water, the rate of water drainage, and water residence times within particular storage reservoirs. We employ instrumental climate records alongside oxygen isotopes within tree rings and regional source waters, as well as topographic data and soil depth measurements, to infer the water sources used over several decades by two co-occurring tree species within a riparian floodplain along the Rhône River in France. We find that water partitioning to riparian trees is influenced by annual (wet versus dry years) and seasonal (spring snowmelt versus spring rainfall) fluctuations in climate. This influence depends strongly on local (tree level) conditions including floodplain surface elevation and subsurface gravel layer elevation. The latter represents the upper limit of the phreatic zone and therefore controls access to shallow groundwater. The difference between them, the thickness of the vadose zone, controls total soil moisture retention capacity. These factors thus modulate the climatic influence on tree ring isotopes. Additionally, we identified growth signatures and tree ring isotope changes associated with recent restoration of minimum streamflows in the Rhône, which made new phreatic water sources available to some trees in otherwise dry years. Water shifts due to climatic fluctuations between floodplain storage reservoirs. Anthropogenic changes to hydrology directly impact water available to trees. Ecohydrologic approaches to integration of hydrology afford new possibilities.

  14. Floodplain ecohydrology: Climatic, anthropogenic, and local physical controls on partitioning of water sources to riparian trees

    PubMed Central

    Singer, Michael Bliss; Sargeant, Christopher I; Piégay, Hervé; Riquier, Jérémie; Wilson, Rob J S; Evans, Cristina M

    2014-01-01

    Seasonal and annual partitioning of water within river floodplains has important implications for ecohydrologic links between the water cycle and tree growth. Climatic and hydrologic shifts alter water distribution between floodplain storage reservoirs (e.g., vadose, phreatic), affecting water availability to tree roots. Water partitioning is also dependent on the physical conditions that control tree rooting depth (e.g., gravel layers that impede root growth), the sources of contributing water, the rate of water drainage, and water residence times within particular storage reservoirs. We employ instrumental climate records alongside oxygen isotopes within tree rings and regional source waters, as well as topographic data and soil depth measurements, to infer the water sources used over several decades by two co-occurring tree species within a riparian floodplain along the Rhône River in France. We find that water partitioning to riparian trees is influenced by annual (wet versus dry years) and seasonal (spring snowmelt versus spring rainfall) fluctuations in climate. This influence depends strongly on local (tree level) conditions including floodplain surface elevation and subsurface gravel layer elevation. The latter represents the upper limit of the phreatic zone and therefore controls access to shallow groundwater. The difference between them, the thickness of the vadose zone, controls total soil moisture retention capacity. These factors thus modulate the climatic influence on tree ring isotopes. Additionally, we identified growth signatures and tree ring isotope changes associated with recent restoration of minimum streamflows in the Rhône, which made new phreatic water sources available to some trees in otherwise dry years. 
Key Points: Water shifts due to climatic fluctuations between floodplain storage reservoirs. Anthropogenic changes to hydrology directly impact water available to trees. Ecohydrologic approaches to integration of hydrology afford new possibilities. PMID:25506099

  15. Maximum parsimony, substitution model, and probability phylogenetic trees.

    PubMed

    Weng, J F; Thomas, D A; Mareels, I

    2011-01-01

    The problem of inferring phylogenies (phylogenetic trees) is one of the main problems in computational biology. There are three main methods for inferring phylogenies: Maximum Parsimony (MP), Distance Matrix (DM) and Maximum Likelihood (ML), of which the MP method is the best-studied and most popular. In the MP method the optimization criterion is the number of substitutions of the nucleotides computed by the differences in the investigated nucleotide sequences. However, the MP method is often criticized as it only counts the substitutions observable at the current time, and all the unobservable substitutions that really occur in the evolutionary history are omitted. In order to take into account the unobservable substitutions, some substitution models have been established and they are now widely used in the DM and ML methods, but these substitution models cannot be used within the classical MP method. Recently the authors proposed a probability representation model for phylogenetic trees, and the reconstructed trees in this model are called probability phylogenetic trees. One of the advantages of the probability representation model is that it can include a substitution model to infer phylogenetic trees based on the MP principle. In this paper we explain how to use a substitution model in the reconstruction of probability phylogenetic trees and show the advantage of this approach with examples.
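    The classical MP criterion described above, the number of observable substitutions on a given tree, can be illustrated with Fitch's algorithm. This is a minimal sketch of the classical criterion only, not the authors' probability representation model; the function names, the nested-tuple tree encoding, and the toy alignment are assumptions made for illustration.

```python
def fitch_site_score(tree, leaf_states):
    """Minimum number of substitutions for one site on a rooted binary tree.

    tree: nested 2-tuples whose leaves are taxon names (hypothetical encoding).
    leaf_states: dict mapping taxon name -> observed nucleotide at this site.
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):           # leaf: its state set is the observed base
            return {leaf_states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:                          # children agree on at least one state
            return common
        changes += 1                        # disjoint sets imply one substitution
        return left | right

    visit(tree)
    return changes


def parsimony_length(tree, alignment):
    """Sum the Fitch score over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site_score(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(n_sites)
    )


# Toy alignment (invented data): the tree with the smaller length is preferred under MP.
alignment = {"A": "AAG", "B": "AAG", "C": "ACG", "D": "ACT"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
length_1 = parsimony_length(tree_1, alignment)
length_2 = parsimony_length(tree_2, alignment)
```

    Because only observed differences are counted, multiple hits on the same branch go unseen, which is the criticism the abstract raises.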

  16. Efficiency of the neighbor-joining method in reconstructing deep and shallow evolutionary relationships in large phylogenies.

    PubMed

    Kumar, S; Gadagkar, S R

    2000-12-01

    The neighbor-joining (NJ) method is widely used in reconstructing large phylogenies because of its computational speed and the high accuracy in phylogenetic inference as revealed in computer simulation studies. However, most computer simulation studies have quantified the overall performance of the NJ method in terms of the percentage of branches inferred correctly or the percentage of replications in which the correct tree is recovered. We have examined other aspects of its performance, such as the relative efficiency in correctly reconstructing shallow (close to the external branches of the tree) and deep branches in large phylogenies; the contribution of zero-length branches to topological errors in the inferred trees; and the influence of increasing the tree size (number of sequences), evolutionary rate, and sequence length on the efficiency of the NJ method. Results show that the correct reconstruction of deep branches is no more difficult than that of shallower branches. The presence of zero-length branches in realized trees contributes significantly to the overall error observed in the NJ tree, especially in large phylogenies or slowly evolving genes. Furthermore, the tree size does not influence the efficiency of NJ in reconstructing shallow and deep branches in our simulation study, in which the evolutionary process is assumed to be homogeneous in all lineages.
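    For readers unfamiliar with NJ, the pair-selection step at the heart of the method can be sketched as follows. This is an illustrative sketch of the standard Q-criterion only, not the simulation setup of the study; the function names and the five-taxon distance matrix are assumptions chosen for the example.

```python
def nj_q_matrix(dist):
    """Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)."""
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    return [
        [0 if i == j else (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
         for j in range(n)]
        for i in range(n)
    ]


def pair_to_join(dist):
    """Return the index pair (i, j), i < j, with the smallest Q value."""
    q = nj_q_matrix(dist)
    n = len(dist)
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    return min(pairs, key=lambda p: q[p[0]][p[1]])


# Hypothetical symmetric distance matrix for five taxa (indices 0..4).
distances = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
q = nj_q_matrix(distances)
first_join = pair_to_join(distances)
```

    Note that the pair with the smallest raw distance (taxa 3 and 4) is not the one joined first here; the Q-criterion corrects each distance by the taxa's average divergence from all others.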

  17. iNJclust: Iterative Neighbor-Joining Tree Clustering Framework for Inferring Population Structure.

    PubMed

    Limpiti, Tulaya; Amornbunchornvej, Chainarong; Intarapanich, Apichart; Assawamakin, Anunchai; Tongsima, Sissades

    2014-01-01

    Understanding genetic differences among populations is one of the most important issues in population genetics. Genetic variations, e.g., single nucleotide polymorphisms, are used to characterize commonality and difference of individuals from various populations. This paper presents an efficient graph-based clustering framework which operates iteratively on the Neighbor-Joining (NJ) tree called the iNJclust algorithm. The framework uses well-known genetic measurements, namely the allele-sharing distance, the neighbor-joining tree, and the fixation index. The behavior of the fixation index is utilized in the algorithm's stopping criterion. The algorithm provides an estimated number of populations, individual assignments, and relationships between populations as outputs. The clustering result is reported in the form of a binary tree, whose terminal nodes represent the final inferred populations and the tree structure preserves the genetic relationships among them. The clustering performance and the robustness of the proposed algorithm are tested extensively using simulated and real data sets from bovine, sheep, and human populations. The result indicates that the number of populations within each data set is reasonably estimated, the individual assignment is robust, and the structure of the inferred population tree corresponds to the intrinsic relationships among populations within the data.
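    The allele-sharing distance mentioned above can be sketched under one common definition: per SNP, two individuals score 0 if they share both alleles, 1 if they share one, and 2 if they share none, and the scores are averaged over SNPs. The 0/1/2 genotype coding and the function name below are assumptions; the paper's exact normalization may differ.

```python
def allele_sharing_distance(genotypes_a, genotypes_b):
    """Average per-SNP allele-sharing distance between two individuals.

    Genotypes are assumed to be coded as counts (0, 1, or 2) of one reference
    allele, so the per-SNP distance is the absolute difference of the counts.
    """
    if len(genotypes_a) != len(genotypes_b):
        raise ValueError("genotype vectors must have the same length")
    per_snp = [abs(a - b) for a, b in zip(genotypes_a, genotypes_b)]
    return sum(per_snp) / len(per_snp)


# Invented genotypes for two individuals at four SNPs.
asd = allele_sharing_distance([0, 1, 2, 1], [0, 1, 0, 2])
```

    A matrix of such pairwise distances is the kind of input from which a neighbor-joining tree over individuals can be built.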

  18. Long-Branch Attraction Bias and Inconsistency in Bayesian Phylogenetics

    PubMed Central

    Kolaczkowski, Bryan; Thornton, Joseph W.

    2009-01-01

    Bayesian inference (BI) of phylogenetic relationships uses the same probabilistic models of evolution as its precursor maximum likelihood (ML), so BI has generally been assumed to share ML's desirable statistical properties, such as largely unbiased inference of topology given an accurate model and increasingly reliable inferences as the amount of data increases. Here we show that BI, unlike ML, is biased in favor of topologies that group long branches together, even when the true model and prior distributions of evolutionary parameters over a group of phylogenies are known. Using experimental simulation studies and numerical and mathematical analyses, we show that this bias becomes more severe as more data are analyzed, causing BI to infer an incorrect tree as the maximum a posteriori phylogeny with asymptotically high support as sequence length approaches infinity. BI's long branch attraction bias is relatively weak when the true model is simple but becomes pronounced when sequence sites evolve heterogeneously, even when this complexity is incorporated in the model. This bias—which is apparent under both controlled simulation conditions and in analyses of empirical sequence data—also makes BI less efficient and less robust to the use of an incorrect evolutionary model than ML. Surprisingly, BI's bias is caused by one of the method's stated advantages—that it incorporates uncertainty about branch lengths by integrating over a distribution of possible values instead of estimating them from the data, as ML does. Our findings suggest that trees inferred using BI should be interpreted with caution and that ML may be a more reliable framework for modern phylogenetic analysis. PMID:20011052

  19. Long-branch attraction bias and inconsistency in Bayesian phylogenetics.

    PubMed

    Kolaczkowski, Bryan; Thornton, Joseph W

    2009-12-09

    Bayesian inference (BI) of phylogenetic relationships uses the same probabilistic models of evolution as its precursor maximum likelihood (ML), so BI has generally been assumed to share ML's desirable statistical properties, such as largely unbiased inference of topology given an accurate model and increasingly reliable inferences as the amount of data increases. Here we show that BI, unlike ML, is biased in favor of topologies that group long branches together, even when the true model and prior distributions of evolutionary parameters over a group of phylogenies are known. Using experimental simulation studies and numerical and mathematical analyses, we show that this bias becomes more severe as more data are analyzed, causing BI to infer an incorrect tree as the maximum a posteriori phylogeny with asymptotically high support as sequence length approaches infinity. BI's long branch attraction bias is relatively weak when the true model is simple but becomes pronounced when sequence sites evolve heterogeneously, even when this complexity is incorporated in the model. This bias--which is apparent under both controlled simulation conditions and in analyses of empirical sequence data--also makes BI less efficient and less robust to the use of an incorrect evolutionary model than ML. Surprisingly, BI's bias is caused by one of the method's stated advantages--that it incorporates uncertainty about branch lengths by integrating over a distribution of possible values instead of estimating them from the data, as ML does. Our findings suggest that trees inferred using BI should be interpreted with caution and that ML may be a more reliable framework for modern phylogenetic analysis.

  20. Gene trees, species trees, and morphology converge on a similar phylogeny of living gars (Actinopterygii: Holostei: Lepisosteidae), an ancient clade of ray-finned fishes.

    PubMed

    Wright, Jeremy J; David, Solomon R; Near, Thomas J

    2012-06-01

    Extant gars represent the remaining members of a formerly diverse assemblage of ancient ray-finned fishes and have been the subject of multiple phylogenetic analyses using morphological data. Here, we present the first hypothesis of phylogenetic relationships among living gar species based on molecular data, through the examination of gene tree heterogeneity and coalescent species tree analyses of a portion of one mitochondrial (COI) and seven nuclear (ENC1, myh6, plagl2, S7 ribosomal protein intron 1, sreb2, tbr1, and zic1) genes. Individual gene trees displayed varying degrees of resolution with regards to species-level relationships, and the gene trees inferred from COI and the S7 intron were the only two that were completely resolved. Coalescent species tree analyses of nuclear genes resulted in a well-resolved and strongly supported phylogenetic tree of living gar species, for which Bayesian posterior node support was further improved by the inclusion of the mitochondrial gene. Species-level relationships among gars inferred from our molecular data set were highly congruent with previously published morphological phylogenies, with the exception of the placement of two species, Lepisosteus osseus and L. platostomus. Re-examination of the character coding used by previous authors provided partial resolution of this topological discordance, resulting in broad concordance in the phylogenies inferred from individual genes, the coalescent species tree analysis, and morphology. The completely resolved phylogeny inferred from the molecular data set with strong Bayesian posterior support at all nodes provided insights into the potential for introgressive hybridization and patterns of allopatric speciation in the evolutionary history of living gars, as well as a solid foundation for future examinations of functional diversification and evolutionary stasis in a "living fossil" lineage. Copyright © 2012 Elsevier Inc. All rights reserved.

  1. Revisiting the phylogeny of Zoanthidea (Cnidaria: Anthozoa): Staggered alignment of hypervariable sequences improves species tree inference.

    PubMed

    Swain, Timothy D

    2018-01-01

    The recent rapid proliferation of novel taxon identification in the Zoanthidea has been accompanied by a parallel propagation of gene trees as a tool of species discovery, but not a corresponding increase in our understanding of phylogeny. This disparity is caused by the trade-off between the capabilities of automated DNA sequence alignment and data content of genes applied to phylogenetic inference in this group. Conserved genes or segments are easily aligned across the order, but produce poorly resolved trees; hypervariable genes or segments contain the evolutionary signal necessary for resolution and robust support, but sequence alignment is daunting. Staggered alignments are a form of phylogeny-informed sequence alignment composed of a mosaic of local and universal regions that allow phylogenetic inference to be applied to all nucleotides from both hypervariable and conserved gene segments. Comparisons between species tree phylogenies inferred from all data (staggered alignment) and hypervariable-excluded data (standard alignment) demonstrate improved confidence and greater topological agreement with other sources of data for the complete-data tree. This novel phylogeny is the most comprehensive to date (in terms of taxa and data) and can serve as an expandable tool for evolutionary hypothesis testing in the Zoanthidea. Spanish language abstract available in Text S1. Translation by L. O. Swain, DePaul University, Chicago, Illinois, 60604, USA. Copyright © 2017 Elsevier Inc. All rights reserved.

  2. Gene tree rooting methods give distributions that mimic the coalescent process.

    PubMed

    Tian, Yuan; Kubatko, Laura S

    2014-01-01

    Multi-locus phylogenetic inference is commonly carried out via models that incorporate the coalescent process to model the possibility that incomplete lineage sorting leads to incongruence between gene trees and the species tree. An interesting question that arises in this context is whether data "fit" the coalescent model. Previous work (Rosenfeld et al., 2012) has suggested that rooting of gene trees may account for variation in empirical data that has been previously attributed to the coalescent process. We examine this possibility using simulated data. We show that, in the case of four taxa, the distribution of gene trees observed from rooting estimated gene trees with either the molecular clock or with outgroup rooting can be closely matched by the distribution predicted by the coalescent model with specific choices of species tree branch lengths. We apply commonly-used coalescent-based methods of species tree inference to assess their performance in these situations. Copyright © 2013 Elsevier Inc. All rights reserved.

  3. Reasoning about Evolution's Grand Patterns: College Students' Understanding of the Tree of Life

    ERIC Educational Resources Information Center

    Novick, Laura R.; Catley, Kefyn M.

    2013-01-01

    Tree thinking involves using cladograms, hierarchical diagrams depicting the evolutionary history of a set of taxa, to reason about evolutionary relationships and support inferences. Tree thinking is indispensable in modern science. College students' tree-thinking skills were investigated using tree (much more common in professional biology) and…

  4. SILVA tree viewer: interactive web browsing of the SILVA phylogenetic guide trees.

    PubMed

    Beccati, Alan; Gerken, Jan; Quast, Christian; Yilmaz, Pelin; Glöckner, Frank Oliver

    2017-09-30

    Phylogenetic trees are an important tool to study the evolutionary relationships among organisms. The huge amount of available taxa poses difficulties in their interactive visualization. This hampers the interaction with the users to provide feedback for the further improvement of the taxonomic framework. The SILVA Tree Viewer is a web application designed for visualizing large phylogenetic trees without requiring the download of any software tool or data files. The SILVA Tree Viewer is based on Web Geographic Information Systems (Web-GIS) technology with a PostgreSQL backend. It enables zoom and pan functionalities similar to Google Maps. The SILVA Tree Viewer enables access to two phylogenetic (guide) trees provided by the SILVA database: the SSU Ref NR99 inferred from high-quality, full-length small subunit sequences, clustered at 99% sequence identity and the LSU Ref inferred from high-quality, full-length large subunit sequences. The Tree Viewer provides tree navigation, search and browse tools as well as an interactive feedback system to collect any kinds of requests ranging from taxonomy to data curation and improving the tool itself.

  5. Reconstruction of full glacial environments and summer temperatures from Lago della Costa, a refugial site in Northern Italy

    NASA Astrophysics Data System (ADS)

    Samartin, Stéphanie; Heiri, Oliver; Kaltenrieder, Petra; Kühl, Norbert; Tinner, Willy

    2016-07-01

    Vegetation and climate during the last ice age and the Last Glacial Maximum (LGM, ∼23,000-19,000 cal BP) were considerably different than during the current interglacial (Holocene). Cold climatic conditions and growing ice-sheets during the last glaciation radically reduced forest extent in Europe to a restricted number of so-called "refugia", mostly located in the southern part of the continent. On the basis of paleobotanical analyses the Euganean Hills (Colli Euganei) in northeastern Italy have previously been proposed as one of the northernmost refugia of temperate trees (e.g. deciduous Quercus, Tilia, Ulmus, Fraxinus excelsior, Acer, Abies alba, Fagus sylvatica, Carpinus and Castanea) in Europe. In this study we provide the first quantitative, vegetation independent summer air temperature reconstruction for Northern Italy spanning the time ∼31,000-17,000 cal yr BP, which covers the coldest periods of the last glacial, including the LGM and Heinrich stadials 1 to 3. Chironomids preserved in a lake sediment core from Lago della Costa (7 m a.s.l.), a small lake at the south-eastern edge of the Euganean Hills, allowed quantitative reconstruction of Full and Late Glacial summer air temperatures using a combined Swiss-Norwegian temperature inference model based on chironomid assemblages from 274 lakes. Chironomid and pollen evidence from Lago della Costa derives from finely stratified autochthonous organic gyttja sediments, which excludes major sediment mixing or reworking. After reconstructing paleo-temperatures, we address the question whether climate conditions were warm enough to permit the local survival of temperate tree species during the LGM and whether local expansions and pollen-inferred contractions of temperate tree taxa coincided with chironomid-inferred climatic changes. Our results suggest that chironomids at Lago della Costa have responded to major climatic fluctuations such as temperature decreases during the LGM and Heinrich stadials. 
The vegetation of the Euganean Hills shows responses to these climatic oscillations although the effects of temperature changes were probably also strongly influenced by changes in humidity. Reconstructed July air temperatures at Lago della Costa never fell below 10-13 °C (error range of reconstruction ∼ ±1.5-1.6 °C), which is considerably above the limit considered necessary for forest growth (8-10 °C). Instead rather mild climatic conditions prevailed ∼31,000-17,000 cal yr BP with average summer temperatures between ∼12 and 16 °C, which most likely allowed survival of temperate tree taxa in the warmest (and moistest) microhabitats of the Euganean Hills during the LGM. Only assuming local survival is it possible to explain the repeated expansions and collapses of temperate trees at Lago della Costa which faithfully accompanied the climatic oscillations.

  6. Signal, Uncertainty, and Conflict in Phylogenomic Data for a Diverse Lineage of Microbial Eukaryotes (Diatoms, Bacillariophyta)

    PubMed Central

    Parks, Matthew B; Wickett, Norman J; Alverson, Andrew J

    2018-01-01

    Diatoms (Bacillariophyta) are a species-rich group of eukaryotic microbes diverse in morphology, ecology, and metabolism. Previous reconstructions of the diatom phylogeny based on one or a few genes have resulted in inconsistent resolution or low support for critical nodes. We applied phylogenetic paralog pruning techniques to a data set of 94 diatom genomes and transcriptomes to infer perennially difficult species relationships, using concatenation and summary-coalescent methods to reconstruct species trees from data sets spanning a wide range of thresholds for taxon and column occupancy in gene alignments. Conflicts between gene and species trees decreased with both increasing taxon occupancy and bootstrap cutoffs applied to gene trees. Concordance between gene and species trees was lowest for short internodes and increased logarithmically with increasing edge length, suggesting that incomplete lineage sorting disproportionately affects species tree inference at short internodes, which are a common feature of the diatom phylogeny. Although species tree topologies were largely consistent across many data treatments, concatenation methods appeared to outperform summary-coalescent methods for sparse alignments. Our results underscore that approaches to species-tree inference based on few loci are likely to be misled by unrepresentative sampling of gene histories, particularly in lineages that may have diversified rapidly. In addition, phylogenomic studies of diatoms, and potentially other hyperdiverse groups, should maximize the number of gene trees with high taxon occupancy, though there is clearly a limit to how many of these genes will be available. PMID:29040712

  7. Basal jawed vertebrate phylogeny inferred from multiple nuclear DNA-coded genes

    PubMed Central

    Kikugawa, Kanae; Katoh, Kazutaka; Kuraku, Shigehiro; Sakurai, Hiroshi; Ishida, Osamu; Iwabe, Naoyuki; Miyata, Takashi

    2004-01-01

    Background: Phylogenetic analyses of jawed vertebrates based on mitochondrial sequences often result in confusing inferences which are obviously inconsistent with generally accepted trees. In particular, in a hypothesis by Rasmussen and Arnason based on mitochondrial trees, cartilaginous fishes have a terminal position in a paraphyletic cluster of bony fishes. No previous analysis based on nuclear DNA-coded genes could significantly reject the mitochondrial trees of jawed vertebrates. Results: We have cloned and sequenced seven nuclear DNA-coded genes from 13 vertebrate species. These sequences, together with sequences available from databases including 13 jawed vertebrates from eight major groups (cartilaginous fishes, bichir, chondrosteans, gar, bowfin, teleost fishes, lungfishes and tetrapods) and an outgroup (a cyclostome and a lancelet), have been subjected to phylogenetic analyses based on the maximum likelihood method. Conclusion: Cartilaginous fishes have been inferred to be basal to other jawed vertebrates, which is consistent with the generally accepted view. The minimum log-likelihood difference between the maximum likelihood tree and trees not supporting the basal position of cartilaginous fishes is 18.3 ± 13.1. The hypothesis by Rasmussen and Arnason has been significantly rejected with the minimum log-likelihood difference of 123 ± 23.3. Our tree has also shown that living holosteans, comprising bowfin and gar, form a monophyletic group which is the sister group to teleost fishes. This is consistent with a formerly prevalent view of vertebrate classification, although inconsistent with both of the current morphology-based and mitochondrial sequence-based trees. Furthermore, the bichir has been shown to be the basal ray-finned fish. Tetrapods and lungfish have formed a monophyletic cluster in the tree inferred from the concatenated alignment, being consistent with the currently prevalent view. 
It also remains possible that tetrapods are more closely related to ray-finned fishes than to lungfishes. PMID:15070407

  8. Resolving Evolutionary Relationships in Closely Related Species with Whole-Genome Sequencing Data

    PubMed Central

    Nater, Alexander; Burri, Reto; Kawakami, Takeshi; Smeds, Linnéa; Ellegren, Hans

    2015-01-01

    Using genetic data to resolve the evolutionary relationships of species is of major interest in evolutionary and systematic biology. However, reconstructing the sequence of speciation events, the so-called species tree, in closely related and potentially hybridizing species is very challenging. Processes such as incomplete lineage sorting and interspecific gene flow result in local gene genealogies that differ in their topology from the species tree, and analyses of few loci with a single sequence per species are likely to produce conflicting or even misleading results. To study these phenomena on a full phylogenomic scale, we use whole-genome sequence data from 200 individuals of four black-and-white flycatcher species with so far unresolved phylogenetic relationships to infer gene tree topologies and visualize genome-wide patterns of gene tree incongruence. Using phylogenetic analysis in nonoverlapping 10-kb windows, we show that gene tree topologies are extremely diverse and change on a very small physical scale. Moreover, we find strong evidence for gene flow among flycatcher species, with distinct patterns of reduced introgression on the Z chromosome. To resolve species relationships on the background of widespread gene tree incongruence, we used four complementary coalescent-based methods for species tree reconstruction, including complex modeling approaches that incorporate post-divergence gene flow among species. This allowed us to infer the most likely species tree with high confidence. Based on this finding, we show that regions of reduced effective population size, which have been suggested as particularly useful for species tree inference, can produce positively misleading species tree topologies. 
Our findings disclose the pitfalls of using loci potentially under selection as phylogenetic markers and highlight the potential of modeling approaches to disentangle species relationships in systems with large effective population sizes and post-divergence gene flow. PMID:26187295

  9. The Probability of a Gene Tree Topology within a Phylogenetic Network with Applications to Hybridization Detection

    PubMed Central

    Yu, Yun; Degnan, James H.; Nakhleh, Luay

    2012-01-01

    Gene tree topologies have proven a powerful data source for various tasks, including species tree inference and species delimitation. Consequently, methods for computing probabilities of gene trees within species trees have been developed and widely used in probabilistic inference frameworks. All these methods assume an underlying multispecies coalescent model. However, when reticulate evolutionary events such as hybridization occur, these methods are inadequate, as they do not account for such events. Methods that account for both hybridization and deep coalescence in computing the probability of a gene tree topology currently exist for very limited cases. However, no such methods exist for general cases, owing primarily to the fact that it is currently unknown how to compute the probability of a gene tree topology within the branches of a phylogenetic network. Here we present a novel method for computing the probability of gene tree topologies on phylogenetic networks and demonstrate its application to the inference of hybridization in the presence of incomplete lineage sorting. We reanalyze a Saccharomyces species data set for which multiple analyses had converged on a species tree candidate. Using our method, though, we show that an evolutionary hypothesis involving hybridization in this group has better support than one of strict divergence. A similar reanalysis on a group of three Drosophila species shows that the data is consistent with hybridization. Further, using extensive simulation studies, we demonstrate the power of gene tree topologies at obtaining accurate estimates of branch lengths and hybridization probabilities of a given phylogenetic network. Finally, we discuss identifiability issues with detecting hybridization, particularly in cases that involve extinction or incomplete sampling of taxa. PMID:22536161
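    As background for the tree-based probabilities that this work generalizes, the simplest case can be sketched directly: three species, one sampled lineage each, on the species tree ((A,B),C) under the multispecies coalescent. This sketch covers only that tree case, not the network method of the paper; the function name is an assumption, and the internal branch length t is assumed to be in coalescent units.

```python
import math


def three_taxon_gene_tree_probs(t):
    """Gene tree topology probabilities on species tree ((A,B),C).

    t: length of the internal branch in coalescent units (assumed input).
    With probability exp(-t) the A and B lineages fail to coalesce on that
    branch; the three lineages then coalesce in random order in the ancestor,
    so each topology receives one third of that mass.
    """
    discordant = math.exp(-t) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,  # matches the species tree
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


probs_short = three_taxon_gene_tree_probs(0.0)   # no time to sort: all equal
probs_long = three_taxon_gene_tree_probs(5.0)    # long branch: mostly concordant
```

    On a tree the two discordant topologies are always equally probable, so a marked asymmetry between them in real data is one signal that points toward reticulation such as hybridization.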

  10. Comparing nonparametric Bayesian tree priors for clonal reconstruction of tumors.

    PubMed

    Deshwar, Amit G; Vembu, Shankar; Morris, Quaid

    2015-01-01

    Statistical machine learning methods, especially nonparametric Bayesian methods, have become increasingly popular to infer clonal population structure of tumors. Here we describe the treeCRP, an extension of the Chinese restaurant process (CRP), a popular construction used in nonparametric mixture models, to infer the phylogeny and genotype of major subclonal lineages represented in the population of cancer cells. We also propose new split-merge updates tailored to the subclonal reconstruction problem that improve the mixing time of Markov chains. In comparisons with the tree-structured stick breaking prior used in PhyloSub, we demonstrate superior mixing and running time using the treeCRP with our new split-merge procedures. We also show that given the same number of samples, TSSB and treeCRP have similar ability to recover the subclonal structure of a tumor…

  11. MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods

    PubMed Central

    Tamura, Koichiro; Peterson, Daniel; Peterson, Nicholas; Stecher, Glen; Nei, Masatoshi; Kumar, Sudhir

    2011-01-01

    Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of Molecular Evolutionary Genetics Analysis version 5 (MEGA5), which is a user-friendly software for mining online databases, building sequence alignments and phylogenetic trees, and using methods of evolutionary bioinformatics in basic biology, biomedicine, and evolution. The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models (nucleotide or amino acid), inferring ancestral states and sequences (along with probabilities), and estimating evolutionary rates site-by-site. In computer simulation analyses, ML tree inference algorithms in MEGA5 compared favorably with other software packages in terms of computational efficiency and the accuracy of the estimates of phylogenetic trees, substitution parameters, and rate variation among sites. The MEGA user interface has now been enhanced to be activity driven to make it easier for the use of both beginners and experienced scientists. This version of MEGA is intended for the Windows platform, and it has been configured for effective use on Mac OS X and Linux desktops. It is available free of charge from http://www.megasoftware.net. PMID:21546353

  12. The Impact of Reconstruction Methods, Phylogenetic Uncertainty and Branch Lengths on Inference of Chromosome Number Evolution in American Daisies (Melampodium, Asteraceae)

    PubMed Central

    McCann, Jamie; Stuessy, Tod F.; Villaseñor, Jose L.; Weiss-Schneeweiss, Hanna

    2016-01-01

    Chromosome number change (polyploidy and dysploidy) plays an important role in plant diversification and speciation. Investigating chromosome number evolution commonly entails ancestral state reconstruction performed within a phylogenetic framework, which is, however, prone to uncertainty, whose effects on evolutionary inferences are insufficiently understood. Using the chromosomally diverse plant genus Melampodium (Asteraceae) as model group, we assess the impact of reconstruction method (maximum parsimony, maximum likelihood, Bayesian methods), branch length model (phylograms versus chronograms) and phylogenetic uncertainty (topological and branch length uncertainty) on the inference of chromosome number evolution. We also address the suitability of the maximum clade credibility (MCC) tree as single representative topology for chromosome number reconstruction. Each of the listed factors causes considerable incongruence among chromosome number reconstructions. Discrepancies between inferences on the MCC tree from those made by integrating over a set of trees are moderate for ancestral chromosome numbers, but severe for the difference of chromosome gains and losses, a measure of the directionality of dysploidy. Therefore, reliance on single trees, such as the MCC tree, is strongly discouraged and model averaging, taking both phylogenetic and model uncertainty into account, is recommended. For studying chromosome number evolution, dedicated models implemented in the program ChromEvol and ordered maximum parsimony may be most appropriate. Chromosome number evolution in Melampodium follows a pattern of bidirectional dysploidy (starting from x = 11 to x = 9 and x = 14, respectively) with no prevailing direction. PMID:27611687

  13. The Impact of Reconstruction Methods, Phylogenetic Uncertainty and Branch Lengths on Inference of Chromosome Number Evolution in American Daisies (Melampodium, Asteraceae).

    PubMed

    McCann, Jamie; Schneeweiss, Gerald M; Stuessy, Tod F; Villaseñor, Jose L; Weiss-Schneeweiss, Hanna

    2016-01-01

    Chromosome number change (polyploidy and dysploidy) plays an important role in plant diversification and speciation. Investigating chromosome number evolution commonly entails ancestral state reconstruction performed within a phylogenetic framework, which is, however, prone to uncertainty, whose effects on evolutionary inferences are insufficiently understood. Using the chromosomally diverse plant genus Melampodium (Asteraceae) as model group, we assess the impact of reconstruction method (maximum parsimony, maximum likelihood, Bayesian methods), branch length model (phylograms versus chronograms) and phylogenetic uncertainty (topological and branch length uncertainty) on the inference of chromosome number evolution. We also address the suitability of the maximum clade credibility (MCC) tree as single representative topology for chromosome number reconstruction. Each of the listed factors causes considerable incongruence among chromosome number reconstructions. Discrepancies between inferences on the MCC tree from those made by integrating over a set of trees are moderate for ancestral chromosome numbers, but severe for the difference of chromosome gains and losses, a measure of the directionality of dysploidy. Therefore, reliance on single trees, such as the MCC tree, is strongly discouraged and model averaging, taking both phylogenetic and model uncertainty into account, is recommended. For studying chromosome number evolution, dedicated models implemented in the program ChromEvol and ordered maximum parsimony may be most appropriate. Chromosome number evolution in Melampodium follows a pattern of bidirectional dysploidy (starting from x = 11 to x = 9 and x = 14, respectively) with no prevailing direction.

  14. Phylo.io: Interactive Viewing and Comparison of Large Phylogenetic Trees on the Web.

    PubMed

    Robinson, Oscar; Dylus, David; Dessimoz, Christophe

    2016-08-01

    Phylogenetic trees are pervasively used to depict evolutionary relationships. Increasingly, researchers need to visualize large trees and compare multiple large trees inferred for the same set of taxa (reflecting uncertainty in the tree inference or genuine discordance among the loci analyzed). Existing tree visualization tools are however not well suited to these tasks. In particular, side-by-side comparison of trees can prove challenging beyond a few dozen taxa. Here, we introduce Phylo.io, a web application to visualize and compare phylogenetic trees side-by-side. Its distinctive features are: highlighting of similarities and differences between two trees, automatic identification of the best matching rooting and leaf order, scalability to large trees, high usability, multiplatform support via standard HTML5 implementation, and possibility to store and share visualizations. The tool can be freely accessed at http://phylo.io and can easily be embedded in other web servers. The code for the associated JavaScript library is available at https://github.com/DessimozLab/phylo-io under an MIT open source license. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  15. Inferences from growing trees backwards

    Treesearch

    David W. Green; Kent A. McDonald

    1997-01-01

    The objective of this paper is to illustrate how longitudinal stress wave techniques can be useful in tracking the future quality of a growing tree. Monitoring the quality of selected trees in a plantation forest could provide early input to decisions on the effectiveness of management practices, or future utilization options, for trees in a plantation. There will...

  16. Phylogenetic trees in bioinformatics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Burr, Tom L

    2008-01-01

    Genetic data is often used to infer evolutionary relationships among a collection of viruses, bacteria, animal or plant species, or other operational taxonomic units (OTU). A phylogenetic tree depicts such relationships and provides a visual representation of the estimated branching order of the OTUs. Tree estimation is unique for several reasons, including: the types of data used to represent each OTU; the use of probabilistic nucleotide substitution models; the inference goals involving both tree topology and branch length; and the huge number of possible trees for a given sample of a very modest number of OTUs, which implies that finding the best tree(s) to describe the genetic data for each OTU is computationally demanding. Bioinformatics is too large a field to review here. We focus on that aspect of bioinformatics that includes study of similarities in genetic data from multiple OTUs. Although research questions are diverse, a common underlying challenge is to estimate the evolutionary history of the OTUs. Therefore, this paper reviews the role of phylogenetic tree estimation in bioinformatics, available methods and software, and identifies areas for additional research and development.

  17. Identifying the rooted species tree from the distribution of unrooted gene trees under the coalescent.

    PubMed

    Allman, Elizabeth S; Degnan, James H; Rhodes, John A

    2011-06-01

    Gene trees are evolutionary trees representing the ancestry of genes sampled from multiple populations. Species trees represent populations of individuals (each with many genes) splitting into new populations or species. The coalescent process, which models ancestry of gene copies within populations, is often used to model the probability distribution of gene trees given a fixed species tree. This multispecies coalescent model provides a framework for phylogeneticists to infer species trees from gene trees using maximum likelihood or Bayesian approaches. Because the coalescent models a branching process over time, all trees are typically assumed to be rooted in this setting. Often, however, gene trees inferred by traditional phylogenetic methods are unrooted. We investigate probabilities of unrooted gene trees under the multispecies coalescent model. We show that when there are four species with one gene sampled per species, the distribution of unrooted gene tree topologies identifies the unrooted species tree topology and some, but not all, information in the species tree edges (branch lengths). The location of the root on the species tree is not identifiable in this situation. However, for 5 or more species with one gene sampled per species, we show that the distribution of unrooted gene tree topologies identifies the rooted species tree topology and all its internal branch lengths. The length of any pendant branch leading to a leaf of the species tree is also identifiable for any species from which more than one gene is sampled.

  18. Inference of Transmission Network Structure from HIV Phylogenetic Trees

    DOE PAGES

    Giardina, Federica; Romero-Severson, Ethan Obie; Albert, Jan; ...

    2017-01-13

    Phylogenetic inference is an attractive means to reconstruct transmission histories and epidemics. However, there is not a perfect correspondence between transmission history and virus phylogeny. Both node height and topological differences may occur, depending on the interaction between within-host evolutionary dynamics and between-host transmission patterns. To investigate these interactions, we added a within-host evolutionary model in epidemiological simulations and examined if the resulting phylogeny could recover different types of contact networks. To further improve realism, we also introduced patient-specific differences in infectivity across disease stages, and on the epidemic level we considered incomplete sampling and the age of the epidemic. Second, we implemented an inference method based on approximate Bayesian computation (ABC) to discriminate among three well-studied network models and jointly estimate both network parameters and key epidemiological quantities such as the infection rate. Our ABC framework used both topological and distance-based tree statistics for comparison between simulated and observed trees. Overall, our simulations showed that a virus time-scaled phylogeny (genealogy) may be substantially different from the between-host transmission tree. This has important implications for the interpretation of what a phylogeny reveals about the underlying epidemic contact network. In particular, we found that while the within-host evolutionary process obscures the transmission tree, the diversification process and infectivity dynamics also add discriminatory power to differentiate between different types of contact networks. We also found that the possibility to differentiate contact networks depends on how far an epidemic has progressed, where distance-based tree statistics have more power early in an epidemic. Finally, we applied our ABC inference on two different outbreaks from the Swedish HIV-1 epidemic.

  19. Inference of Transmission Network Structure from HIV Phylogenetic Trees

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Giardina, Federica; Romero-Severson, Ethan Obie; Albert, Jan

    Phylogenetic inference is an attractive means to reconstruct transmission histories and epidemics. However, there is not a perfect correspondence between transmission history and virus phylogeny. Both node height and topological differences may occur, depending on the interaction between within-host evolutionary dynamics and between-host transmission patterns. To investigate these interactions, we added a within-host evolutionary model in epidemiological simulations and examined if the resulting phylogeny could recover different types of contact networks. To further improve realism, we also introduced patient-specific differences in infectivity across disease stages, and on the epidemic level we considered incomplete sampling and the age of the epidemic. Second, we implemented an inference method based on approximate Bayesian computation (ABC) to discriminate among three well-studied network models and jointly estimate both network parameters and key epidemiological quantities such as the infection rate. Our ABC framework used both topological and distance-based tree statistics for comparison between simulated and observed trees. Overall, our simulations showed that a virus time-scaled phylogeny (genealogy) may be substantially different from the between-host transmission tree. This has important implications for the interpretation of what a phylogeny reveals about the underlying epidemic contact network. In particular, we found that while the within-host evolutionary process obscures the transmission tree, the diversification process and infectivity dynamics also add discriminatory power to differentiate between different types of contact networks. We also found that the possibility to differentiate contact networks depends on how far an epidemic has progressed, where distance-based tree statistics have more power early in an epidemic. Finally, we applied our ABC inference on two different outbreaks from the Swedish HIV-1 epidemic.

  20. Inference Based on Transitive Relation in Tree Shrews ("Tupaia belangeri") and Rats ("Rattus norvegicus") on a Spatial Discrimination Task

    ERIC Educational Resources Information Center

    Takahashi, Makoto; Ushitani, Tomokazu; Fujita, Kazuo

    2008-01-01

    Six tree shrews and 8 rats were tested for their ability to infer transitively in a spatial discrimination task. The apparatus was a semicircular radial-arm maze with 8 arms labeled A through H. In Experiment 1, the animals were first trained in sequence on 4 discriminations to enter 1 of the paired adjacent arms, AB, BC, CD, and DE, with right…

  1. Environmental and biological context modulates the physiological stress response of bats to human disturbance.

    PubMed

    Phelps, Kendra L; Kingston, Tigga

    2018-06-01

    Environmental and biological context play significant roles in modulating physiological stress responses of individuals in wildlife populations yet are often overlooked when evaluating consequences of human disturbance on individual health and fitness. Furthermore, most studies gauge individual stress responses based on a single physiological biomarker, typically circulating glucocorticoid concentrations, which limits interpretation of the complex, multifaceted responses of individuals to stressors. We selected four physiological biomarkers to capture short-term and prolonged stress responses in a widespread cave-roosting bat, Hipposideros diadema, across multiple gradients of human disturbance in and around caves in the Philippines. We used conditional inference trees and random forest analysis to determine the role of environmental quality (cave complexity, available roosting area), assemblage composition (intra- and interspecific associations and species richness), and intrinsic characteristics of individuals (sex and reproductive status) in modulating responses to disturbance. Direct cave disturbance (hunting pressure and human visitation) was the primary driver of neutrophil-to-lymphocyte ratios, with lower ratios associated with increased disturbance, while context-specific factors were more important in explaining total leukocyte count, body condition, and ectoparasite load. Moreover, conditional inference trees revealed complex interactions among human disturbance and modulating factors. Cave complexity often ameliorated individual responses to human disturbance, whereas conspecific abundance often compounded responses. Our study demonstrates the importance of an integrated approach that incorporates environmental and biological context when identifying drivers of physiological responses, and that assesses responses to gradients of direct and indirect disturbance using multiple complementary biomarkers.

  2. Species tree inference by minimizing deep coalescences.

    PubMed

    Than, Cuong; Nakhleh, Luay

    2009-09-01

    In a 1997 seminal paper, W. Maddison proposed minimizing deep coalescences, or MDC, as an optimization criterion for inferring the species tree from a set of incongruent gene trees, assuming the incongruence is exclusively due to lineage sorting. In a subsequent paper, Maddison and Knowles provided and implemented a search heuristic for optimizing the MDC criterion, given a set of gene trees. However, the heuristic is not guaranteed to compute optimal solutions, and its hill-climbing search makes it slow in practice. In this paper, we provide two exact solutions to the problem of inferring the species tree from a set of gene trees under the MDC criterion. In other words, our solutions are guaranteed to find the tree that minimizes the total number of deep coalescences from a set of gene trees. One solution is based on a novel integer linear programming (ILP) formulation, and another is based on a simple dynamic programming (DP) approach. Powerful ILP solvers, such as CPLEX, make the first solution appealing, particularly for very large-scale instances of the problem, whereas the DP-based solution eliminates dependence on proprietary tools, and its simplicity makes it easy to integrate with other genomic events that may cause gene tree incongruence. Using the exact solutions, we analyze a data set of 106 loci from eight yeast species, a data set of 268 loci from eight Apicomplexan species, and several simulated data sets. We show that the MDC criterion provides very accurate estimates of the species tree topologies, and that our solutions are very fast, thus allowing for the accurate analysis of genome-scale data sets. Further, the efficiency of the solutions allows for quick exploration of sub-optimal solutions, which is important for a parsimony-based criterion such as MDC, as we show.
We show that searching for the species tree in the compatibility graph of the clusters induced by the gene trees may be sufficient in practice, a finding that helps ameliorate the computational requirements of optimization solutions. Further, we study the statistical consistency and convergence rate of the MDC criterion, as well as its optimality in inferring the species tree. Finally, we show how our solutions can be used to identify potential horizontal gene transfer events that may have caused some of the incongruence in the data, thus augmenting Maddison's original framework. We have implemented our solutions in the PhyloNet software package, which is freely available at: http://bioinfo.cs.rice.edu/phylonet.
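
    The MDC score described in this record can be illustrated with a small sketch. It only scores one candidate species tree against one gene tree by counting extra lineages; it is not the ILP or DP search from the paper, and it does not use PhyloNet. The nested-tuple tree encoding, the function names, and the one-gene-per-species simplification are all assumptions made for illustration.

```python
# Illustrative sketch: count "extra lineages" (deep coalescences) of a gene
# tree inside a candidate species tree. Trees are nested tuples whose leaves
# are species names (assumed encoding, one sampled gene per species).

def leaves(tree):
    """Return the set of leaf labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def lineages_in(gene_tree, cluster):
    """Number of maximal gene-tree clades whose leaves all lie in `cluster`.

    This equals the number of gene lineages leaving the species-tree branch
    above `cluster` when coalescences happen as recently as possible.
    """
    if leaves(gene_tree) <= cluster:
        return 1
    if isinstance(gene_tree, str):
        return 0
    return sum(lineages_in(child, cluster) for child in gene_tree)

def species_clusters(species_tree):
    """Yield the leaf set of every node of the species tree."""
    yield leaves(species_tree)
    if not isinstance(species_tree, str):
        for child in species_tree:
            yield from species_clusters(child)

def extra_lineages(species_tree, gene_tree):
    """Sum over species-tree branches of (lineages leaving the branch - 1)."""
    return sum(
        max(lineages_in(gene_tree, cluster) - 1, 0)
        for cluster in species_clusters(species_tree)
    )

species = (("A", "B"), ("C", "D"))
congruent = (("A", "B"), ("C", "D"))
discordant = (("A", "C"), ("B", "D"))
print(extra_lineages(species, congruent), extra_lineages(species, discordant))
```

    Under the MDC criterion one would sum this count over all gene trees and prefer the species tree with the smallest total; the search over species trees is the part the record's exact methods address.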

  3. The origin of parasitism gene in nematodes: evolutionary analysis through the construction of domain trees.

    PubMed

    Yang, Yizi; Luo, Damin

    2013-01-01

    Inferring evolutionary history of parasitism genes is important to understand how evolutionary mechanisms affect the occurrences of parasitism genes. In this study, we constructed multiple domain trees for parasitism genes and genes under free-living conditions. Further analyses of horizontal gene transfer (HGT)-like phylogenetic incongruences, duplications, and speciations were performed based on these trees. By comparing these analyses, the contributions of pre-adaptations were found to be more important to the evolution of parasitism genes than those of duplications, and pre-adaptations are as crucial as previously reported HGTs to parasitism. Furthermore, speciation may also affect the evolution of parasitism genes. In addition, Pristionchus pacificus was suggested to be a common model organism for studies of parasitic nematodes, including root-knot species. These analyses provided information regarding mechanisms that may have contributed to the evolution of parasitism genes.

  4. CellTree: an R/bioconductor package to infer the hierarchical structure of cell populations from single-cell RNA-seq data.

    PubMed

    duVerle, David A; Yotsukura, Sohiya; Nomura, Seitaro; Aburatani, Hiroyuki; Tsuda, Koji

    2016-09-13

    Single-cell RNA sequencing is fast becoming one of the standard methods for gene expression measurement, providing unique insights into cellular processes. A number of methods, based on general dimensionality reduction techniques, have been suggested to help infer and visualise the underlying structure of cell populations from single-cell expression levels, yet their models generally lack proper biological grounding and struggle at identifying complex differentiation paths. Here we introduce cellTree: an R/Bioconductor package that uses a novel statistical approach, based on document analysis techniques, to produce tree structures outlining the hierarchical relationship between single-cell samples, while identifying latent groups of genes that can provide biological insights. With cellTree, we provide experimentalists with an easy-to-use tool, based on statistically and biologically-sound algorithms, to efficiently explore and visualise single-cell RNA data. The cellTree package is publicly available in the online Bioconductor repository at: http://bioconductor.org/packages/cellTree/ .

  5. Species trees for the tree swallows (Genus Tachycineta): an alternative phylogenetic hypothesis to the mitochondrial gene tree.

    PubMed

    Dor, Roi; Carling, Matthew D; Lovette, Irby J; Sheldon, Frederick H; Winkler, David W

    2012-10-01

    The New World swallow genus Tachycineta comprises nine species that collectively have a wide geographic distribution and remarkable variation both within- and among-species in ecologically important traits. Existing phylogenetic hypotheses for Tachycineta are based on mitochondrial DNA sequences, thus they provide estimates of a single gene tree. In this study we sequenced multiple individuals from each species at 16 nuclear intron loci. We used gene concatenated approaches (Bayesian and maximum likelihood) as well as coalescent-based species tree inference to reconstruct phylogenetic relationships of the genus. We examined the concordance and conflict between the nuclear and mitochondrial trees and between concatenated and coalescent-based inferences. Our results provide an alternative phylogenetic hypothesis to the existing mitochondrial DNA estimate of phylogeny. This new hypothesis provides a more accurate framework in which to explore trait evolution and examine the evolution of the mitochondrial genome in this group. Copyright © 2012 Elsevier Inc. All rights reserved.

  6. Modeling individual tree growth by fusing diameter tape and increment core data

    Treesearch

    Erin M. Schliep; Tracy Qi Dong; Alan E. Gelfand; Fan Li

    2014-01-01

    Tree growth estimation is a challenging task as difficulties associated with data collection and inference often result in inaccurate estimates. Two main methods for tree growth estimation are diameter tape measurements and increment cores. The former involves repeatedly measuring tree diameters with a cloth or metal tape whose scale has been adjusted to give diameter...

  7. Decision trees in epidemiological research.

    PubMed

    Venkatasubramaniam, Ashwini; Wolfson, Julian; Mitchell, Nathan; Barnes, Timothy; JaKa, Meghan; French, Simone

    2017-01-01

    In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.
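
    The hypothesis-testing step that this record credits to CTree can be sketched roughly as follows. This is a simplified illustration, not the CTree algorithm as implemented in any package: it uses a plain covariance-type statistic with a Monte Carlo permutation p-value and a Bonferroni adjustment, and the function names, the default of 999 permutations, and the 0.05 threshold are assumptions.

```python
# Simplified sketch of test-based split-variable selection: pick the
# covariate most strongly associated with the response, or stop if no
# adjusted p-value falls below alpha. Names and defaults are hypothetical.
import random

def association(x, y):
    """Absolute centered cross-product between covariate x and response y."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return abs(sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)))

def permutation_p_value(x, y, n_perm=999, rng=None):
    """Monte Carlo p-value for the null of no association between x and y."""
    rng = rng or random.Random(0)
    observed = association(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if association(x, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def select_split_variable(covariates, y, alpha=0.05, n_perm=999, seed=0):
    """Return (name, adjusted p) of the best covariate, or None to stop."""
    rng = random.Random(seed)
    adjusted = {
        name: min(1.0, permutation_p_value(x, y, n_perm, rng) * len(covariates))
        for name, x in covariates.items()
    }
    best = min(adjusted, key=adjusted.get)
    return (best, adjusted[best]) if adjusted[best] <= alpha else None

x1 = list(range(20))        # covariate that drives the response
x2 = [0, 1] * 10            # covariate unrelated to the response
y = [2 * v + 1 for v in x1]
print(select_split_variable({"x1": x1, "x2": x2}, y))
```

    In a full tree learner this selection would be followed by a search for the best cut point in the chosen variable and then repeated in each child node; returning None corresponds to declaring the node a leaf.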

  8. Construction of phylogenetic trees by kernel-based comparative analysis of metabolic networks.

    PubMed

    Oh, S June; Joung, Je-Gun; Chang, Jeong-Ho; Zhang, Byoung-Tak

    2006-06-06

    To infer the tree of life requires knowledge of the common characteristics of each species descended from a common ancestor as the measuring criteria and a method to calculate the distance between the resulting values of each measure. Conventional phylogenetic analysis based on genomic sequences provides information about the genetic relationships between different organisms. In contrast, comparative analysis of metabolic pathways in different organisms can yield insights into their functional relationships under different physiological conditions. However, evaluating the similarities or differences between metabolic networks is a computationally challenging problem, and systematic methods of doing this are desirable. Here we introduce a graph-kernel method for computing the similarity between metabolic networks in polynomial time, and use it to profile metabolic pathways and to construct phylogenetic trees. To compare the structures of metabolic networks in organisms, we adopted the exponential graph kernel, which is a kernel-based approach with a labeled graph that includes a label matrix and an adjacency matrix. To construct the phylogenetic trees, we used an unweighted pair-group method with arithmetic mean, i.e., a hierarchical clustering algorithm. We applied the kernel-based network profiling method in a comparative analysis of nine carbohydrate metabolic networks from 81 biological species encompassing Archaea, Eukaryota, and Eubacteria. The resulting phylogenetic hierarchies generally support the tripartite scheme of three domains rather than the two domains of prokaryotes and eukaryotes. By combining the kernel machines with metabolic information, the method infers the context of biosphere development that covers physiological events required for adaptation by genetic reconstruction. 
The results show that one may obtain a global view of the tree of life by comparing the metabolic pathway structures using meta-level information rather than sequence information. This method may yield further information about biological evolution, such as the history of horizontal transfer of each gene, by studying the detailed structure of the phylogenetic tree constructed by the kernel-based method.
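
    The tree-building step named in this record, the unweighted pair-group method with arithmetic mean (UPGMA), can be sketched as below. The sketch assumes a distance table is already available; how the authors turn graph-kernel similarities into distances is not stated here, so the input values, the nested-tuple output, and the function name are illustrative assumptions.

```python
# Minimal UPGMA sketch: repeatedly merge the two closest clusters and update
# distances as size-weighted averages. Input format and names are assumed.

def upgma(labels, dist):
    """Build a nested-tuple tree from a symmetric distance table.

    `dist[a][b]` is the distance between labels a and b.
    """
    # Each active cluster id maps to (subtree, number of leaves).
    clusters = {i: (label, 1) for i, label in enumerate(labels)}
    d = {
        frozenset((i, j)): dist[labels[i]][labels[j]]
        for i in range(len(labels))
        for j in range(i + 1, len(labels))
    }
    next_id = len(labels)
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of active clusters
        i, j = sorted(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        del d[pair]
        for k in clusters:
            d_ik = d.pop(frozenset((i, k)))
            d_jk = d.pop(frozenset((j, k)))
            # Arithmetic mean over all leaf pairs, weighted by cluster size.
            d[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]

labels = ["A", "B", "C", "D"]
dist = {
    "A": {"A": 0, "B": 2, "C": 6, "D": 10},
    "B": {"A": 2, "B": 0, "C": 6, "D": 10},
    "C": {"A": 6, "B": 6, "C": 0, "D": 10},
    "D": {"A": 10, "B": 10, "C": 10, "D": 0},
}
print(upgma(labels, dist))
```

    With these made-up distances, A and B join first, then C, then D, which mirrors how a hierarchy of organisms would be read off pairwise network distances.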

  9. The Inference of Gene Trees with Species Trees

    PubMed Central

    Szöllősi, Gergely J.; Tannier, Eric; Daubin, Vincent; Boussau, Bastien

    2015-01-01

    This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree–species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree–species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution. PMID:25070970

  10. Challenges in Species Tree Estimation Under the Multispecies Coalescent Model

    PubMed Central

    Xu, Bo; Yang, Ziheng

    2016-01-01

    The multispecies coalescent (MSC) model has emerged as a powerful framework for inferring species phylogenies while accounting for ancestral polymorphism and gene tree-species tree conflict. A number of methods have been developed in the past few years to estimate the species tree under the MSC. The full likelihood methods (including maximum likelihood and Bayesian inference) average over the unknown gene trees and accommodate their uncertainties properly but involve intensive computation. The approximate or summary coalescent methods are computationally fast and are applicable to genomic datasets with thousands of loci, but do not make an efficient use of information in the multilocus data. Most of them take the two-step approach of reconstructing the gene trees for multiple loci by phylogenetic methods and then treating the estimated gene trees as observed data, without accounting for their uncertainties appropriately. In this article we review the statistical nature of the species tree estimation problem under the MSC, and explore the conceptual issues and challenges of species tree estimation by focusing mainly on simple cases of three or four closely related species. We use mathematical analysis and computer simulation to demonstrate that large differences in statistical performance may exist between the two classes of methods. We illustrate that several counterintuitive behaviors may occur with the summary methods but they are due to inefficient use of information in the data by summary methods and vanish when the data are analyzed using full-likelihood methods. These include (i) unidentifiability of parameters in the model, (ii) inconsistency in the so-called anomaly zone, (iii) singularity on the likelihood surface, and (iv) deterioration of performance upon addition of more data. 
We discuss the challenges and strategies of species tree inference for distantly related species when the molecular clock is violated, and highlight the need for improving the computational efficiency and model realism of the likelihood methods as well as the statistical efficiency of the summary methods. PMID:27927902

  11. Genomic Infectious Disease Epidemiology in Partially Sampled and Ongoing Outbreaks

    PubMed Central

    Didelot, Xavier; Fraser, Christophe; Gardy, Jennifer; Colijn, Caroline

    2017-01-01

    Genomic data are increasingly being used to understand infectious disease epidemiology. Isolates from a given outbreak are sequenced, and the patterns of shared variation are used to infer which isolates within the outbreak are most closely related to each other. Unfortunately, the phylogenetic trees typically used to represent this variation are not directly informative about who infected whom: a phylogenetic tree is not a transmission tree. However, a transmission tree can be inferred from a phylogeny while accounting for within-host genetic diversity by coloring the branches of a phylogeny according to which host those branches were in. Here we extend this approach and show that it can be applied to partially sampled and ongoing outbreaks. This requires computing the correct probability of an observed transmission tree and we herein demonstrate how to do this for a large class of epidemiological models. We also demonstrate how the branch coloring approach can incorporate a variable number of unique colors to represent unsampled intermediates in transmission chains. The resulting algorithm is a reversible jump Monte–Carlo Markov Chain, which we apply to both simulated data and real data from an outbreak of tuberculosis. By accounting for unsampled cases and an outbreak which may not have reached its end, our method is uniquely suited to use in a public health environment during real-time outbreak investigations. We implemented this transmission tree inference methodology in an R package called TransPhylo, which is freely available from https://github.com/xavierdidelot/TransPhylo. PMID:28100788

  12. A method of alignment masking for refining the phylogenetic signal of multiple sequence alignments.

    PubMed

    Rajan, Vaibhav

    2013-03-01

    Inaccurate inference of positional homologies in multiple sequence alignments and systematic errors introduced by alignment heuristics obfuscate phylogenetic inference. Alignment masking, the elimination of phylogenetically uninformative or misleading sites from an alignment before phylogenetic analysis, is a common practice in phylogenetic analysis. Although masking is often done manually, automated methods are necessary to handle the much larger data sets being prepared today. In this study, we introduce the concept of subsplits and demonstrate their use in extracting phylogenetic signal from alignments. We design a clustering approach for alignment masking where each cluster contains similar columns, with similarity defined on the basis of compatible subsplits; our approach then identifies noisy clusters and eliminates them. Trees inferred from the columns in the retained clusters are found to be topologically closer to the reference trees. We test our method on numerous standard benchmarks (both synthetic and biological data sets) and compare its performance with other methods of alignment masking. We find that our method can eliminate sites more accurately than other methods, particularly on divergent data, and can improve the topologies of the inferred trees in likelihood-based analyses. Software available upon request from the author.

  13. Automatic Inference of Cryptographic Key Length Based on Analysis of Proof Tightness

    DTIC Science & Technology

    2016-06-01

    within an attack tree structure, then expand attack tree methodology to include cryptographic reductions. We then provide the algorithms for...maintaining and automatically reasoning about these expanded attack trees. We provide a software tool that utilizes machine-readable proof and attack metadata...and the attack tree methodology to provide rapid and precise answers regarding security parameters and effective security. This eliminates the need

  14. An algorithm for computing the gene tree probability under the multispecies coalescent and its application in the inference of population tree

    PubMed Central

    2016-01-01

    Motivation: Gene tree represents the evolutionary history of gene lineages that originate from multiple related populations. Under the multispecies coalescent model, lineages may coalesce outside the species (population) boundary. Given a species tree (with branch lengths), the gene tree probability is the probability of observing a specific gene tree topology under the multispecies coalescent model. There are two existing algorithms for computing the exact gene tree probability. The first algorithm is due to Degnan and Salter, where they enumerate all the so-called coalescent histories for the given species tree and the gene tree topology. Their algorithm runs in exponential time in the number of gene lineages in general. The second algorithm is the STELLS algorithm (2012), which is usually faster but also runs in exponential time in almost all the cases. Results: In this article, we present a new algorithm, called CompactCH, for computing the exact gene tree probability. This new algorithm is based on the notion of compact coalescent histories: multiple coalescent histories are represented by a single compact coalescent history. The key advantage of our new algorithm is that it runs in polynomial time in the number of gene lineages if the number of populations is fixed to be a constant. The new algorithm is more efficient than the STELLS algorithm both in theory and in practice when the number of populations is small and there are multiple gene lineages from each population. As an application, we show that CompactCH can be applied in the inference of population tree (i.e. the population divergence history) from population haplotypes. Simulation results show that the CompactCH algorithm enables efficient and accurate inference of population trees with much more haplotypes than a previous approach. Availability: The CompactCH algorithm is implemented in the STELLS software package, which is available for download at http://www.engr.uconn.edu/ywu/STELLS.html. 
Contact: ywu@engr.uconn.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307621
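    To make the notion of a gene tree probability concrete, the smallest case can be written in closed form. The sketch below is not the CompactCH or STELLS algorithm; it assumes one sampled lineage per species, a rooted species tree ((A,B),C), and an internal branch length t measured in coalescent units, and uses the standard three-taxon result that the concordant topology has probability 1 - (2/3)exp(-t). The function name and the tree strings are illustrative only.

```python
import math


def triplet_gene_tree_probs(t):
    """Gene tree topology probabilities for species tree ((A,B),C).

    t is the internal branch length in coalescent units (assumed input);
    one lineage is sampled per species.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # Lineages A and B fail to coalesce on the internal branch with
    # probability exp(-t); the three topologies are then equally likely.
    p_discordant = math.exp(-t) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * p_discordant,  # matches the species tree
        "((A,C),B)": p_discordant,
        "((B,C),A)": p_discordant,
    }


if __name__ == "__main__":
    for t in (0.0, 0.5, 2.0):
        print(t, triplet_gene_tree_probs(t))
```

    With larger trees and several lineages per population, the number of coalescent histories to sum over grows quickly, which is the cost the exact algorithms described above are designed to manage.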

  15. [Phylogeny of protostome moulting animals (Ecdysozoa) inferred from 18 and 28S rRNA gene sequences].

    PubMed

    Petrov, N B; Vladychenskaia, N S

    2005-01-01

    The reliability of reconstructing phylogenetic relationships within a group of protostome moulting animals was evaluated by comparing sets of 18S and 28S rRNA gene sequences, both taken separately and combined. Reliability was assessed by the bootstrap support of major phylogenetic tree nodes and by the degree of congruence of phylogenetic trees inferred by various methods. By both criteria, phylogenetic trees reconstructed from the combined 18S and 28S rRNA gene sequences were better than those inferred from the 18S and 28S sequences taken separately. The results are consistent with the phylogenetic hypothesis separating protostome animals into two major clades, moulting Ecdysozoa (Priapulida + Kinorhyncha, Nematoda + Nematomorpha, Onychophora + Tardigrada, Myriapoda + Chelicerata, Crustacea + Hexapoda) and unmoulting Lophotrochozoa (Plathelminthes, Nemertini, Annelida, Mollusca, Echiura, Sipuncula). Clade Cephalorhyncha does not include nematomorphs (Nematomorpha). We conclude that combined 18S and 28S data should be used in phylogenetic studies.

  16. Impact of Ice Ages on the genetic structure of trees and shrubs.

    PubMed Central

    Lascoux, Martin; Palmé, Anna E; Cheddadi, Rachid; Latta, Robert G

    2004-01-01

    Data on the genetic structure of tree and shrub populations on the continental scale have accumulated dramatically over the past decade. However, our ability to make inferences on the impact of the last ice age still depends crucially on the availability of informative palaeoecological data. This is well illustrated by the results from a recent project, during which new pollen fossil maps were established and the variation in chloroplast DNA was studied in 22 European species of trees and shrubs. Species exhibit very different levels of genetic variation between and within populations, and obviously went through very different histories after Ice Ages. However, when palaeoecological data are non-informative, inferences on past history are difficult to draw from entirely genetic data. On the other hand, as illustrated by a study in ponderosa pine, when we can infer the species' history with some certainty, coalescent simulations can be used and new hypotheses can be tested. PMID:15101576

  17. Inferring explicit weighted consensus networks to represent alternative evolutionary histories

    PubMed Central

    2013-01-01

    Background: The advent of molecular biology techniques and constant increase in availability of genetic material have triggered the development of many phylogenetic tree inference methods. However, several reticulate evolution processes, such as horizontal gene transfer and hybridization, have been shown to blur the species evolutionary history by causing discordance among phylogenies inferred from different genes. Methods: To tackle this problem, we hereby describe a new method for inferring and representing alternative (reticulate) evolutionary histories of species as an explicit weighted consensus network which can be constructed from a collection of gene trees with or without prior knowledge of the species phylogeny. Results: We provide a way of building a weighted phylogenetic network for each of the following reticulation mechanisms: diploid hybridization, intragenic recombination and complete or partial horizontal gene transfer. We successfully tested our method on some synthetic and real datasets to infer the above-mentioned evolutionary events which may have influenced the evolution of many species. Conclusions: Our weighted consensus network inference method allows one to infer, visualize and validate statistically major conflicting signals induced by the mechanisms of reticulate evolution. The results provided by the new method can be used to represent the inferred conflicting signals by means of explicit and easy-to-interpret phylogenetic networks. PMID:24359207

  18. Towards a more molecular taxonomy of disease.

    PubMed

    Park, Jisoo; Hescott, Benjamin J; Slonim, Donna K

    2017-07-27

    Disease taxonomies have been designed for many applications, but they tend not to fully incorporate the growing amount of molecular-level knowledge of disease processes, inhibiting research efforts. Understanding the degree to which we can infer disease relationships from molecular data alone may yield insights into how to ultimately construct more modern taxonomies that integrate both physiological and molecular information. We introduce a new technique we call Parent Promotion to infer hierarchical relationships between disease terms using disease-gene data. We compare this technique with both an established ontology inference method (CliXO) and a minimum weight spanning tree approach. Because there is no gold standard molecular disease taxonomy available, we compare our inferred hierarchies to both the Medical Subject Headings (MeSH) category C forest of diseases and to subnetworks of the Disease Ontology (DO). This comparison provides insights about the inference algorithms, choices of evaluation metrics, and the existing molecular content of various subnetworks of MeSH and the DO. Our results suggest that the Parent Promotion method performs well in most cases. Performance across MeSH trees is also correlated between inference methods. Specifically, inferred relationships are more consistent with those in smaller MeSH disease trees than larger ones, but there are some notable exceptions that may correlate with higher molecular content in MeSH. Our experiments provide insights about learning relationships between diseases from disease genes alone. Future work should explore the prospect of disease term discovery from molecular data and how best to integrate molecular data with anatomical and clinical knowledge. This study nonetheless suggests that disease gene information has the potential to form an important part of the foundation for future representations of the disease landscape.

  19. Sampling diverse characters improves phylogenies: Craniodental and postcranial characters of vertebrates often imply different trees.

    PubMed

    Mounce, Ross C P; Sansom, Robert; Wills, Matthew A

    2016-03-01

    Morphological cladograms of vertebrates are often inferred from greater numbers of characters describing the skull and teeth than from postcranial characters. This is either because the skull is believed to yield characters with a stronger phylogenetic signal (i.e., contain less homoplasy), because morphological variation therein is more readily atomized, or because craniodental material is more widely available (particularly in the palaeontological case). An analysis of 85 vertebrate datasets published between 2000 and 2013 confirms that craniodental characters are significantly more numerous than postcranial characters, but finds no evidence that levels of homoplasy differ in the two partitions. However, a new partition test, based on tree-to-tree distances (as measured by the Robinson Foulds metric) rather than tree length, reveals that relationships inferred from the partitions are significantly different about one time in three, much more often than expected. Such differences may reflect divergent selective pressures in different body regions, resulting in different localized patterns of homoplasy. Most systematists attempt to sample characters broadly across body regions, but this is not always possible. We conclude that trees inferred largely from either craniodental or postcranial characters in isolation may differ significantly from those that would result from a more holistic approach. We urge the latter. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.
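    The Robinson-Foulds metric mentioned above counts the bipartitions (splits) found in one tree but not the other. The following is a minimal sketch, not the authors' partition test: it assumes trees are written as nested Python tuples of leaf names, treats them as unrooted, and the function names are hypothetical.

```python
def leaf_set(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions of the tree, viewed as unrooted.

    Each split is stored as the side that does NOT contain the
    alphabetically first leaf, so both orientations compare equal.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if anchor in clade else clade
        # Skip trivial splits that separate zero or one leaf from the rest.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return clade

    visit(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same leaf set")
    return len(splits(tree_a) ^ splits(tree_b))


if __name__ == "__main__":
    cranial = ((("A", "B"), "C"), ("D", "E"))
    postcranial = ((("A", "C"), "B"), ("D", "E"))
    print(robinson_foulds(cranial, postcranial))
```

    Because splits are compared rather than rooted clades, two trees that differ only in where the root is drawn have distance zero.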

  20. Relating phylogenetic trees to transmission trees of infectious disease outbreaks.

    PubMed

    Ypma, Rolf J F; van Ballegooijen, W Marijn; Wallinga, Jacco

    2013-11-01

    Transmission events are the fundamental building blocks of the dynamics of any infectious disease. Much about the epidemiology of a disease can be learned when these individual transmission events are known or can be estimated. Such estimations are difficult and generally feasible only when detailed epidemiological data are available. The genealogy estimated from genetic sequences of sampled pathogens is another rich source of information on transmission history. Optimal inference of transmission events calls for the combination of genetic data and epidemiological data into one joint analysis. A key difficulty is that the transmission tree, which describes the transmission events between infected hosts, differs from the phylogenetic tree, which describes the ancestral relationships between pathogens sampled from these hosts. The trees differ both in timing of the internal nodes and in topology. These differences become more pronounced when a higher fraction of infected hosts is sampled. We show how the phylogenetic tree of sampled pathogens is related to the transmission tree of an outbreak of an infectious disease, by the within-host dynamics of pathogens. We provide a statistical framework to infer key epidemiological and mutational parameters by simultaneously estimating the phylogenetic tree and the transmission tree. We test the approach using simulations and illustrate its use on an outbreak of foot-and-mouth disease. The approach unifies existing methods in the emerging field of phylodynamics with transmission tree reconstruction methods that are used in infectious disease epidemiology.

  1. Displayed Trees Do Not Determine Distinguishability Under the Network Multispecies Coalescent

    PubMed Central

    Zhu, Sha; Degnan, James H.

    2017-01-01

    Recent work in estimating species relationships from gene trees has included inferring networks assuming that past hybridization has occurred between species. Probabilistic models using the multispecies coalescent can be used in this framework for likelihood-based inference of both network topologies and parameters, including branch lengths and hybridization parameters. A difficulty for such methods is that it is not always clear whether, or to what extent, networks are identifiable, that is, whether there could be two distinct networks that lead to the same distribution of gene trees. For cases in which incomplete lineage sorting occurs in addition to hybridization, we demonstrate a new representation of the species network likelihood that expresses the probability distribution of the gene tree topologies as a linear combination of gene tree distributions given a set of species trees. This representation makes it clear that in some cases in which two distinct networks give the same distribution of gene trees when sampling one allele per species, the two networks can be distinguished theoretically when multiple individuals are sampled per species. This result means that network identifiability is not only a function of the trees displayed by the networks but also depends on allele sampling within species. We additionally give an example in which two networks that display exactly the same trees can be distinguished from their gene trees even when there is only one lineage sampled per species. PMID:27780899

  2. Two C++ Libraries for Counting Trees on a Phylogenetic Terrace.

    PubMed

    Biczok, R; Bozsoky, P; Eisenmann, P; Ernst, J; Ribizel, T; Scholz, F; Trefzer, A; Weber, F; Hamann, M; Stamatakis, A

    2018-05-08

    The presence of terraces in phylogenetic tree space, that is, a potentially large number of distinct tree topologies that have exactly the same analytical likelihood score, was first described by Sanderson et al. (2011). However, popular software tools for maximum likelihood and Bayesian phylogenetic inference do not yet routinely report whether inferred phylogenies reside on a terrace. We believe this is due to the lack of an efficient library to (i) determine if a tree resides on a terrace, (ii) calculate how many trees reside on a terrace, and (iii) enumerate all trees on a terrace. In our bioinformatics practical, which is set up as a programming contest, we developed two efficient and independent C++ implementations of the SUPERB algorithm by Constantinescu and Sankoff (1995) for counting and enumerating trees on a terrace. Both implementations yield exactly the same results, are more than one order of magnitude faster, and require one order of magnitude less memory than a previous third-party Python implementation. The source codes are available under GNU GPL at https://github.com/terraphast. Contact: Alexandros.Stamatakis@h-its.org. Supplementary data are available at Bioinformatics online.

  3. DupTree: a program for large-scale phylogenetic analyses using gene tree parsimony.

    PubMed

    Wehe, André; Bansal, Mukul S; Burleigh, J Gordon; Eulenstein, Oliver

    2008-07-01

    DupTree is a new software program for inferring rooted species trees from collections of gene trees using the gene tree parsimony approach. The program implements a novel algorithm that significantly improves upon the run time of standard search heuristics for gene tree parsimony, and enables the first truly genome-scale phylogenetic analyses. In addition, DupTree allows users to examine alternate rootings and to weight the reconciliation costs for gene trees. DupTree is an open source project written in C++. DupTree for Mac OS X, Windows, and Linux along with a sample dataset and an on-line manual are available at http://genome.cs.iastate.edu/CBL/DupTree
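    The quantity that gene tree parsimony minimizes can be illustrated on a toy case. The sketch below is not DupTree's search algorithm; it only counts the duplications implied by reconciling one gene tree with one fixed species tree via the usual least-common-ancestor mapping. It assumes rooted trees written as nested tuples and gene-tree leaves labeled directly with species names; all function names are hypothetical.

```python
def collect_clades(tree, out):
    """Add every clade (leaf set) of a rooted nested-tuple tree to `out`."""
    if isinstance(tree, str):
        clade = frozenset([tree])
    else:
        clade = frozenset().union(*(collect_clades(c, out) for c in tree))
    out.add(clade)
    return clade


def duplication_cost(gene_tree, species_tree):
    """Count gene duplications implied by the LCA reconciliation."""
    species_clades = set()
    collect_clades(species_tree, species_clades)

    def lca(species):
        # Smallest species-tree clade that contains all given species.
        return min((c for c in species_clades if species <= c), key=len)

    count = 0

    def visit(node):
        nonlocal count
        if isinstance(node, str):
            return lca(frozenset([node]))
        child_maps = [visit(child) for child in node]
        here = lca(frozenset().union(*child_maps))
        # A node mapped to the same species-tree node as one of its
        # children must be explained by a duplication.
        if here in child_maps:
            count += 1
        return here

    visit(gene_tree)
    return count


if __name__ == "__main__":
    species = (("A", "B"), "C")
    print(duplication_cost((("A", "C"), "B"), species))
```

    A gene tree parsimony search would sum this cost over all input gene trees and look for the species tree with the smallest total.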

  4. A Tree-Ring Chronology and Paleoclimate Record for the Younger Dryas-Early Holocene Transition from Northeastern North America

    NASA Technical Reports Server (NTRS)

    Griggs, Carol; Peteet, Dorothy; Kromer, Bernd; Grote, Todd; Southon, John

    2017-01-01

    Spruce and tamarack logs dating from the Younger Dryas and Early Holocene (YDEH; approx. 12.9-11.3k cal a BP) were found at Bell Creek in the Lake Ontario lowlands of the Great Lakes region, North America. A 211-year tree-ring chronology dates to approx. 11 755-11 545 cal a BP, across the YDEH transition. A 23-year period of higher year-to-year ring-width variability dates to around 11 650 cal a BP, implies strong regional climatic perturbations, and may represent the end of the YD. Tamarack and spruce were dominant species throughout the YD-EH interval at the site, indicating that boreal conditions persisted into the EH, in contrast to geographical regions immediately south and east of the lowlands, but consistent with the Great Lakes interior lowlands. This implies that Bell Creek was at the eastern boundary of a boreal ecotone, perhaps a result of its lower elevation and the non-analog dynamics of the Laurentide Ice Sheet. This finding suggests that the ecotone boundary extended farther east during the YD-EH transition than previously thought.

  5. Metabarcoding of marine nematodes – evaluation of reference datasets used in tree-based taxonomy assignment approach

    PubMed Central

    2016-01-01

    Background: Metabarcoding is becoming a common tool used to assess and compare diversity of organisms in environmental samples. Identification of OTUs is one of the critical steps in the process and several taxonomy assignment methods were proposed to accomplish this task. This publication evaluates the quality of reference datasets, alongside with several alignment and phylogeny inference methods used in one of the taxonomy assignment methods, called tree-based approach. This approach assigns anonymous OTUs to taxonomic categories based on relative placements of OTUs and reference sequences on the cladogram and support that these placements receive. New information: In tree-based taxonomy assignment approach, reliable identification of anonymous OTUs is based on their placement in monophyletic and highly supported clades together with identified reference taxa. Therefore, it requires high quality reference dataset to be used. Resolution of phylogenetic trees is strongly affected by the presence of erroneous sequences as well as alignment and phylogeny inference methods used in the process. Two preparation steps are essential for the successful application of tree-based taxonomy assignment approach. Curated collections of genetic information do include erroneous sequences. These sequences have detrimental effect on the resolution of cladograms used in tree-based approach. They must be identified and excluded from the reference dataset beforehand. Various combinations of multiple sequence alignment and phylogeny inference methods provide cladograms with different topology and bootstrap support. These combinations of methods need to be tested in order to determine the one that gives highest resolution for the particular reference dataset. Completing the above mentioned preparation steps is expected to decrease the number of unassigned OTUs and thus improve the results of the tree-based taxonomy assignment approach. PMID:27932919

  6. Phylogeny of the cycads based on multiple single-copy nuclear genes: congruence of concatenated parsimony, likelihood and species tree inference methods.

    PubMed

    Salas-Leiva, Dayana E; Meerow, Alan W; Calonje, Michael; Griffith, M Patrick; Francisco-Ortega, Javier; Nakamura, Kyoko; Stevenson, Dennis W; Lewis, Carl E; Namoff, Sandra

    2013-11-01

    Despite a recent new classification, a stable phylogeny for the cycads has been elusive, particularly regarding resolution of Bowenia, Stangeria and Dioon. In this study, five single-copy nuclear genes (SCNGs) are applied to the phylogeny of the order Cycadales. The specific aim is to evaluate several gene tree-species tree reconciliation approaches for developing an accurate phylogeny of the order, to contrast them with concatenated parsimony analysis and to resolve the erstwhile problematic phylogenetic position of these three genera. DNA sequences of five SCNGs were obtained for 20 cycad species representing all ten genera of Cycadales. These were analysed with parsimony, maximum likelihood (ML) and three Bayesian methods of gene tree-species tree reconciliation, using Cycas as the outgroup. A calibrated date estimation was developed with Bayesian methods, and biogeographic analysis was also conducted. Concatenated parsimony, ML and three species tree inference methods resolve exactly the same tree topology with high support at most nodes. Dioon and Bowenia are the first and second branches of Cycadales after Cycas, respectively, followed by an encephalartoid clade (Macrozamia-Lepidozamia-Encephalartos), which is sister to a zamioid clade, of which Ceratozamia is the first branch, and in which Stangeria is sister to Microcycas and Zamia. A single, well-supported phylogenetic hypothesis of the generic relationships of the Cycadales is presented. However, massive extinction events inferred from the fossil record that eliminated broader ancestral distributions within Zamiaceae compromise accurate optimization of ancestral biogeographical areas for that hypothesis. While major lineages of Cycadales are ancient, crown ages of all modern genera are no older than 12 million years, supporting a recent hypothesis of mostly Miocene radiations. This phylogeny can contribute to an accurate infrafamilial classification of Zamiaceae.

  7. Metabarcoding of marine nematodes - evaluation of reference datasets used in tree-based taxonomy assignment approach.

    PubMed

    Holovachov, Oleksandr

    2016-01-01

    Metabarcoding is becoming a common tool used to assess and compare diversity of organisms in environmental samples. Identification of OTUs is one of the critical steps in the process and several taxonomy assignment methods were proposed to accomplish this task. This publication evaluates the quality of reference datasets, alongside with several alignment and phylogeny inference methods used in one of the taxonomy assignment methods, called tree-based approach. This approach assigns anonymous OTUs to taxonomic categories based on relative placements of OTUs and reference sequences on the cladogram and support that these placements receive. In tree-based taxonomy assignment approach, reliable identification of anonymous OTUs is based on their placement in monophyletic and highly supported clades together with identified reference taxa. Therefore, it requires high quality reference dataset to be used. Resolution of phylogenetic trees is strongly affected by the presence of erroneous sequences as well as alignment and phylogeny inference methods used in the process. Two preparation steps are essential for the successful application of tree-based taxonomy assignment approach. Curated collections of genetic information do include erroneous sequences. These sequences have detrimental effect on the resolution of cladograms used in tree-based approach. They must be identified and excluded from the reference dataset beforehand. Various combinations of multiple sequence alignment and phylogeny inference methods provide cladograms with different topology and bootstrap support. These combinations of methods need to be tested in order to determine the one that gives highest resolution for the particular reference dataset. Completing the above mentioned preparation steps is expected to decrease the number of unassigned OTUs and thus improve the results of the tree-based taxonomy assignment approach.

  8. Radial patterns of tree-ring chemical element concentration in two Appalachian hardwood stands

    Treesearch

    D.R. Dewalle; B.R. Swistock; W.E. Sharpe

    1991-01-01

    Radial patterns in tree-ring chemical element concentration in red oak (Quercus rubra L.) and black cherry (Prunus serotina Ehrh.) were analyzed to infer past environmental changes at two mature Appalachian forest sites.

  9. Coupled effects of wind-storms and drought on tree mortality across 115 forest stands from the Western Alps and the Jura mountains.

    PubMed

    Csilléry, Katalin; Kunstler, Georges; Courbaud, Benoît; Allard, Denis; Lassègues, Pierre; Haslinger, Klaus; Gardiner, Barry

    2017-12-01

    Damage due to wind-storms and droughts is increasing in many temperate forests, yet little is known about the long-term roles of these key climatic factors in forest dynamics and in the carbon budget. The objective of this study was to estimate individual and coupled effects of droughts and wind-storms on adult tree mortality across a 31-year period in 115 managed, mixed coniferous forest stands from the Western Alps and the Jura mountains. For each stand, yearly mortality was inferred from management records, yearly drought from interpolated fields of monthly temperature, precipitation and soil water holding capacity, and wind-storms from interpolated fields of daily maximum wind speed. We performed a thorough model selection based on a leave-one-out cross-validation of the time series. We compared different critical wind speeds (CWSs) for damage, wind-storm, and stand variables and statistical models. We found that a model including stand characteristics, drought, and storm strength using a CWS of 25 m/s performed the best across most stands. Using this best model, we found that drought increased damage risk only in the most southerly forests, and its effect is generally maintained for up to 2 years. Storm strength increased damage risk in all forests in a relatively uniform way. In some stands, we found positive interaction between drought and storm strength, most likely because drought weakens trees, making them more prone to stem breakage under wind-loading. In other stands, we found negative interaction between drought and storm strength, where excessive rain likely leads to soil water saturation making trees more susceptible to overturning in a wind-storm. Our results stress that temporal data are essential to make valid inferences about ecological impacts of disturbance events, and that making inferences about disturbance agents separately can be of limited validity.
Under projected future climatic conditions, the direction and strength of these ecological interactions could also change. © 2017 John Wiley & Sons Ltd.

  10. Using intra annual density fluctuations and d13C to assess the impact of summer drought on Mediterranean ecosystem

    NASA Astrophysics Data System (ADS)

    Battipaglia, G.; Brand, W. A.; Linke, P.; Schaefer, I.; Noetzli, M.; Cherubini, P.

    2009-04-01

    Tree-ring growth and wood density have been used extensively as indicators of climate change, and tree rings have been commonly applied as a proxy estimate for seasonal integration of temperatures and precipitation with annual resolution (Hughes 2002). While these relationships have been well established in temperate ecosystems (Fritts, 1976; Schweingruber, 1988; Briffa et al., 1998, 2004), in the Mediterranean region dendrochronological studies are still scarce (Cherubini et al., 2003). In Mediterranean environments, trees may form intra-annual density fluctuations, also called "false rings" or "double rings" (Tingley 1937; Schulman 1938). They are usually induced by sudden drought events occurring during the vegetative period and, because they allow intra-annual resolution, they may provide detailed information at a seasonal level, as well as on species-specific sensitivity to drought. We investigated the variability of tree-ring width and carbon stable isotopes of a Mediterranean species, Arbutus unedo L., sampled on Elba island (Tuscany, Italy). The samples were taken at two different sites, one characterized by wet and one by dry conditions. d13C was measured using Laser Ablation-Combustion-GC-IRMS. Here, we present first results showing the impact of drought on tree growth and on false ring formation at the different sites and we underline the importance of using Laser Ablation to infer drought impact at the intra-annual level. Briffa KR, Schweingruber FH, Jones PD, Osborn TJ, Harris IC, Shiyatov SG, Vaganov EA, Grudd H (1998) Trees tell of past climates: but are they speaking less clearly today? Phil Transact Royal Soc London 353:65-73 Briffa KR, Osborn TJ, Schweingruber FH (2004) Large-scale temperature inferences from tree rings: a review. Glob Planet Change 40:11-26 Cherubini, P., B.L. Gartner, R. Tognetti, O.U. Bräker, W. Schoch & J.L. Innes. 2003. Identification, measurement and interpretation of tree rings in woody species from Mediterranean climates. Biol.
Rev. 78: 119-14 Fritts, H.C. 1976. Tree rings and climate. Academic Press, London, UK. Hughes, M.K. 2002. Dendrochronology in climatology - the state of the art. Dendrochronologia 20: 95-116. Schulman, E. 1938. Classification of false annual rings in Monterey pine. Tree-Ring Bull. 4:4-7 Schweingruber FH (1988) Tree-ring: Basics and applications of dendrochronology. Reidel. Publ., Dordrecht, 276 p Tingley, M.A. 1937. Double growth rings in Red Astrachan. Proc. Am. Soc. Hort. Sci. 34: 61.

  11. Effects of Natural and Experimental Drought on Growth and Water Use Efficiency in Amazon trees

    NASA Astrophysics Data System (ADS)

    Vadeboncoeur, M. A.; Brum, M., Jr.; Oliveira, R. S.; Moutinho, V. H. P.; Flores, C. F.; Llerena, C. A.; Palace, M. W.; Asbjornsen, H.

    2016-12-01

    Severe regional droughts in the Amazon basin, mostly associated with El Niño events, have attracted considerable attention over the past decade, especially with regard to their effects on tree mortality, vulnerability to fire, and changes in the terrestrial budgets of carbon, water, and energy. Understanding the complex responses of forest ecosystems to such droughts is key to predicting how these globally critical forest ecosystems will respond to a changing climate with higher temperatures and greater precipitation variability. Though tree rings are not formed by all tropical tree species, they offer a unique retrospective approach for investigating patterns of climatic responses in both carbon cycling (primary production inferred from diameter growth) and water cycling (water use efficiency calculated from stable C isotope ratios). We sampled increment cores from 40 tree species at the Tapajos National Forest in Brazil, as well as the Cocha Cashu Biological Station in Peru, for an isotopic dendrochronological investigation into the effects of past droughts on the growth and water-use efficiency of canopy and mid-story tree species. We found that many but not all trees responded to drought years with periods of reduced growth lasting 2-3 years. Forthcoming data on carbon isotope ratios will allow us to compare the sensitivity of species and sites in terms of water use under drought conditions.

  12. Twisted trees and inconsistency of tree estimation when gaps are treated as missing data - The impact of model mis-specification in distance corrections.

    PubMed

    McTavish, Emily Jane; Steel, Mike; Holder, Mark T

    2015-12-01

    Statistically consistent estimation of phylogenetic trees or gene trees is possible if pairwise sequence dissimilarities can be converted to a set of distances that are proportional to the true evolutionary distances. Susko et al. (2004) reported some strikingly broad results about the forms of inconsistency in tree estimation that can arise if corrected distances are not proportional to the true distances. They showed that if the corrected distance is a concave function of the true distance, then inconsistency due to long branch attraction will occur. If these functions are convex, then two "long branch repulsion" trees will be preferred over the true tree - though these two incorrect trees are expected to be tied as the preferred tree. Here we extend their results, and demonstrate the existence of a tree shape (which we refer to as a "twisted Farris-zone" tree) for which a single incorrect tree topology will be guaranteed to be preferred if the corrected distance function is convex. We also report that the standard practice of treating gaps in sequence alignments as missing data is sufficient to produce non-linear corrected distance functions if the substitution process is not independent of the insertion/deletion process. Taken together, these results imply inconsistent tree inference under mild conditions. For example, if some positions in a sequence are constrained to be free of substitutions and insertion/deletion events while the remaining sites evolve with independent substitutions and insertion/deletion events, then the distances obtained by treating gaps as missing data can support an incorrect tree topology even given an unlimited amount of data. Copyright © 2015 Elsevier Inc. All rights reserved.
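    As a rough illustration of what a "distance correction" does, the sketch below uses the Jukes-Cantor (JC69) model. This model is an assumption chosen for simplicity and is not the one analyzed in the paper; the function names are hypothetical. Under JC69 the raw proportion of differing sites is a concave function of the true distance, and the correction inverts that function so the result is again proportional to the true distance.

```python
import math


def expected_p_distance(d):
    """Expected proportion of differing sites at true distance d under JC69."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_distance(p):
    """JC69-corrected distance for an observed proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 correction is defined only for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # The raw p-distance flattens out as d grows; the correction undoes this.
    for d in (0.1, 0.5, 1.0, 2.0):
        p = expected_p_distance(d)
        print(f"true d = {d:.1f}  raw p = {p:.4f}  corrected = {jc69_distance(p):.4f}")
```

    If the model used for the correction does not match the process that generated the data, the corrected value is no longer proportional to the true distance, which is the situation the abstract is concerned with.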

  13. A comparison of high-resolution pollen-inferred climate data from central Minnesota, USA, to 19th century US military fort climate data and tree-ring inferred climate reconstructions

    NASA Astrophysics Data System (ADS)

    St Jacques, J.; Cumming, B. F.; Sauchyn, D.; Vanstone, J. R.; Dickenson, J.; Smol, J. P.

    2013-12-01

    A vital component of paleoclimatology is the validation of paleoclimatological reconstructions. Unfortunately, there is scant instrumental data prior to the 20th century available for this. Hence, typically, we can only do long-term validation using other proxy-inferred climate reconstructions. Minnesota, USA, with its long military fort climate records beginning in 1820 and early dense network of climate stations, offers a rare opportunity for proxy validation. We compare a high-resolution (4-year), millennium-scale, pollen-inferred paleoclimate record derived from varved Lake Mina in central Minnesota to early military fort records and dendroclimatological records. When inferring a paleoclimate record from a pollen record, we rely upon the pollen-climate relationship being constant in time. However, massive human impacts have significantly altered vegetation; and the relationship between modern instrumental climate data and the modern pollen rain becomes altered from what it was in the past. In the Midwest, selective logging, fire suppression, deforestation and agriculture have strongly influenced the modern pollen rain since Euro-American settlement in the mid-1800s. We assess the signal distortion introduced by using the conventional method of modern post-settlement pollen and climate calibration sets to infer climate at Lake Mina from pre-settlement pollen data. Our first February and May temperature reconstructions are based on a pollen dataset contemporaneous with early settlement to which corresponding climate data from the earliest instrumental records has been added to produce a 'pre-settlement' calibration set. The second February and May temperature reconstructions are based on a conventional 'modern' pollen-climate dataset from core-top pollen samples and modern climate normals. 
The temperature reconstructions are then compared to the earliest instrumental records from Fort Snelling, Minnesota, and it is shown that the reconstructions based on the pre-settlement calibration set give much more credible reconstructions. We then compare the temperature reconstructions based upon the two calibration sets for AD 1116-2002. Significant signal flattening and bias exist when using the conventional modern pollen-climate calibration set rather than the pre-settlement pollen-climate calibration set, resulting in an overestimation of Little Ice Age monthly mean temperatures of 0.5-1.5 °C. Therefore, regional warming from anthropogenic global warming is significantly underestimated when using the conventional method of building pollen-climate calibration sets. We also compare the Lake Mina pollen-inferred effective moisture record to early 19th century climate data and to a four-century tree-ring inferred moisture reconstruction based upon sites in Minnesota and the Dakotas. This comparison shows that regional tree-ring reconstructions are biased towards dry conditions and record wet periods poorly relative to high-resolution pollen reconstructions, giving a false impression of regional aridity. It also suggests that varve chronologies should be based upon cross-dating to ensure a more accurate chronology.

  14. 1,500 year quantitative reconstruction of winter precipitation in the Pacific Northwest

    PubMed Central

    Steinman, Byron A.; Abbott, Mark B.; Mann, Michael E.; Stansell, Nathan D.; Finney, Bruce P.

    2012-01-01

    Multiple paleoclimate proxies are required for robust assessment of past hydroclimatic conditions. Currently, estimates of drought variability over the past several thousand years are based largely on tree-ring records. We produced a 1,500-y record of winter precipitation in the Pacific Northwest using a physical model-based analysis of lake sediment oxygen isotope data. Our results indicate that during the Medieval Climate Anomaly (MCA) (900–1300 AD) the Pacific Northwest experienced exceptional wetness in winter and that during the Little Ice Age (LIA) (1450–1850 AD) conditions were drier, contrasting with hydroclimatic anomalies in the desert Southwest and consistent with climate dynamics related to the El Niño Southern Oscillation (ENSO) and the Pacific Decadal Oscillation (PDO). These findings are somewhat discordant with drought records from tree rings, suggesting that differences in seasonal sensitivity between the two proxies allow a more complete understanding of the climate system and likely explain disparities in inferred climate trends over centennial timescales. PMID:22753510

  15. Evolution of prokaryote and eukaryote lines inferred from sequence evidence

    NASA Technical Reports Server (NTRS)

    Hunt, L. T.; George, D. G.; Yeh, L.-S.; Dayhoff, M. O.

    1984-01-01

    This paper describes the evolution of prokaryotes and early eukaryotes, including their symbiotic relationships, as inferred from phylogenetic trees of bacterial ferredoxin, 5S ribosomal RNA, ribulose-1,5-biphosphate carboxylase large chain, and mitochondrial cytochrome oxidase polypeptide II.

  16. Topology, divergence dates, and macroevolutionary inferences vary between different tip-dating approaches applied to fossil theropods (Dinosauria).

    PubMed

    Bapst, D W; Wright, A M; Matzke, N J; Lloyd, G T

    2016-07-01

    Dated phylogenies of fossil taxa allow palaeobiologists to estimate the timing of major divergences and placement of extinct lineages, and to test macroevolutionary hypotheses. Recently developed Bayesian 'tip-dating' methods simultaneously infer and date the branching relationships among fossil taxa, and infer putative ancestral relationships. Using a previously published dataset for extinct theropod dinosaurs, we contrast the dated relationships inferred by several tip-dating approaches and evaluate potential downstream effects on phylogenetic comparative methods. We also compare tip-dating analyses to maximum-parsimony trees time-scaled via alternative a posteriori approaches including via the probabilistic cal3 method. Among tip-dating analyses, we find opposing but strongly supported relationships, despite similarity in inferred ancestors. Overall, tip-dating methods infer divergence dates often millions (or tens of millions) of years older than the earliest stratigraphic appearance of that clade. Model-comparison analyses of the pattern of body-size evolution found that the support for evolutionary mode can vary across and between tree samples from cal3 and tip-dating approaches. These differences suggest that model and software choice in dating analyses can have a substantial impact on the dated phylogenies obtained and broader evolutionary inferences. © 2016 The Author(s).

  17. Mesoscale disturbance and ecological response to decadal climatic variability in the American Southwest

    USGS Publications Warehouse

    Swetnam, T.W.; Betancourt, J.L.

    1998-01-01

    Ecological responses to climatic variability in the Southwest include regionally synchronized fires, insect outbreaks, and pulses in tree demography (births and deaths). Multicentury, tree-ring reconstructions of drought, disturbance history, and tree demography reveal climatic effects across scales, from annual to decadal, and from local (<10² km²) to mesoscale (10⁴-10⁶ km²). Climate-disturbance relations are more variable and complex than previously assumed. During the past three centuries, mesoscale outbreaks of the western spruce budworm (Choristoneura occidentalis) were associated with wet, not dry episodes, contrary to conventional wisdom. Regional fires occur during extreme droughts but, in some ecosystems, antecedent wet conditions play a secondary role by regulating accumulation of fuels. Interdecadal changes in fire-climate associations parallel other evidence for shifts in the frequency or amplitude of the Southern Oscillation (SO) during the past three centuries. High interannual, fire-climate correlations (r = 0.7 to 0.9) during specific decades (i.e., circa 1740-80 and 1830-60) reflect periods of high amplitude in the SO and rapid switching from extreme wet to dry years in the Southwest, thereby entraining fire occurrence across the region. Weak correlations from 1780 to 1830 correspond with a decrease in SO frequency or amplitude inferred from independent tree-ring width, ice core, and coral isotope reconstructions. Episodic dry and wet episodes have altered age structures and species composition of woodland and conifer forests. The scarcity of old, living conifers established before circa 1600 suggests that the extreme drought of 1575-95 had pervasive effects on tree populations. The most extreme drought of the past 400 years occurred in the mid-twentieth century (1942-57). This drought resulted in broadscale plant dieoffs in shrublands, woodlands, and forests and accelerated shrub invasion of grasslands. 
Drought conditions were broken by the post-1976 shift to the negative SO phase and wetter cool seasons in the Southwest. The post-1976 period shows up as an unprecedented surge in tree-ring growth within millennia-length chronologies. This unusual episode may have produced a pulse in tree recruitment and improved rangeland conditions (e.g., higher grass production), though additional study is needed to disentangle the interacting roles of land use and climate. The 1950s drought and the post-1976 wet period and their aftermaths offer natural experiments to study long-term ecosystem response to interdecadal climate variability.

  18. Chloroplast Phylogenomics Indicates that Ginkgo biloba Is Sister to Cycads

    PubMed Central

    Wu, Chung-Shien; Chaw, Shu-Miaw; Huang, Ya-Yi

    2013-01-01

    Molecular phylogenetic studies have not yet reached a consensus on the placement of Ginkgoales, which is represented by the only living species, Ginkgo biloba (common name: ginkgo). At least six discrepant placements of ginkgo have been proposed. This study aimed to use the chloroplast phylogenomic approach to examine possible factors that lead to such disagreeing placements. We found the sequence types used in the analyses as the most critical factor in the conflicting placements of ginkgo. In addition, the placement of ginkgo varied in the trees inferred from nucleotide (NU) sequences, which notably depended on breadth of taxon sampling, tree-building methods, codon positions, positions of Gnetopsida (common name: gnetophytes), and including or excluding gnetophytes in data sets. In contrast, the trees inferred from amino acid (AA) sequences congruently supported the monophyly of a ginkgo and Cycadales (common name: cycads) clade, regardless of which factors were examined. Our site-stripping analysis further revealed that the high substitution saturation of NU sequences mainly derived from the third codon positions and contributed to the variable placements of ginkgo. In summary, the factors we surveyed did not affect results inferred from analyses of AA sequences. Congruent topologies in our AA trees give more confidence in supporting the ginkgo–cycad sister-group hypothesis. PMID:23315384

  19. Phylogenic inference using alignment-free methods for applications in microbial community surveys using 16s rRNA gene

    PubMed Central

    2017-01-01

    The diversity of microbiota is best explored by understanding the phylogenetic structure of the microbial communities. Traditionally, sequence alignment has been used for phylogenetic inference. However, alignment-based approaches come with significant challenges and limitations when massive amounts of data are analyzed. In the recent decade, alignment-free approaches have enabled genome-scale phylogenetic inference. Here we evaluate three alignment-free methods: ACS, CVTree, and Kr for phylogenetic inference with 16s rRNA gene data. We use a taxonomic gold standard to compare the accuracy of alignment-free phylogenetic inference with that of common microbiome-wide phylogenetic inference pipelines based on PyNAST and MUSCLE alignments with FastTree and RAxML. We re-simulate fecal communities from Human Microbiome Project data to evaluate the performance of the methods on datasets with properties of real data. Our comparisons show that alignment-free methods are not inferior to alignment-based methods in giving accurate and robust phylogenic trees. Moreover, consensus ensembles of alignment-free phylogenies are superior to those built from alignment-based methods in their ability to highlight community differences in low power settings. In addition, the overall running times of alignment-based and alignment-free phylogenetic inference are comparable. Taken together our empirical results suggest that alignment-free methods provide a viable approach for microbiome-wide phylogenetic inference. PMID:29136663

  20. Taxonomic relationships among Phenacomys voles as inferred by cytochrome b.

    Treesearch

    M. Renee Bellinger; Susan M. Haig; Eric D. Forsmann; Thomas D. Mullins

    2005-01-01

    Taxonomic relationships among red tree voles (Phenacomys longicaudus longicaudus, P. l. silvicola), the Sonoma tree vole (P. pomo), the white-footed vole (P. albipes), and the heather vole (P. intermedius) were examined using 664 base pairs of the mitochondrial...

  1. Tree-average distances on certain phylogenetic networks have their weights uniquely determined.

    PubMed

    Willson, Stephen J

    2012-01-01

    A phylogenetic network N has vertices corresponding to species and arcs corresponding to direct genetic inheritance from the species at the tail to the species at the head. Measurements of DNA are often made on species in the leaf set, and one seeks to infer properties of the network, possibly including the graph itself. In the case of phylogenetic trees, distances between extant species are frequently used to infer the phylogenetic trees by methods such as neighbor-joining. This paper proposes a tree-average distance for networks more general than trees. The notion requires a weight on each arc measuring the genetic change along the arc. For each displayed tree the distance between two leaves is the sum of the weights along the path joining them. At a hybrid vertex, each character is inherited from one of its parents. We will assume that for each hybrid there is a probability that the inheritance of a character is from a specified parent. Assume that the inheritance events at different hybrids are independent. Then for each displayed tree there will be a probability that the inheritance of a given character follows the tree; this probability may be interpreted as the probability of the tree. The tree-average distance between the leaves is defined to be the expected value of their distance in the displayed trees. For a class of rooted networks that includes rooted trees, it is shown that the weights and the probabilities at each hybrid vertex can be calculated given the network and the tree-average distances between the leaves. Hence these weights and probabilities are uniquely determined. The hypotheses on the networks include that hybrid vertices have indegree exactly 2 and that vertices that are not leaves have a tree-child.
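    The definition in this abstract can be sketched in a few lines of Python: enumerate the displayed trees by choosing one parent for each hybrid vertex, multiply the corresponding inheritance probabilities, and take the expected path length between each pair of leaves. The function names, the toy network, and its weights and probabilities are all hypothetical and chosen only for illustration; this is not the author's code.

```python
from itertools import product


def path_lengths(adj, source):
    """Path lengths from source in an undirected tree given as an adjacency dict."""
    dist = {source: 0.0}
    stack = [source]
    while stack:
        node = stack.pop()
        for neighbour, weight in adj.get(node, []):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + weight
                stack.append(neighbour)
    return dist


def tree_average_distances(arcs, hybrid_probs, leaves):
    """Expected leaf-to-leaf distance over all displayed trees.

    arcs: {(tail, head): weight}
    hybrid_probs: {hybrid: {parent: probability of inheriting from that parent}}
    """
    hybrids = list(hybrid_probs)
    totals = {}
    # One displayed tree per choice of a single parent for every hybrid vertex.
    for choice in product(*(hybrid_probs[h].items() for h in hybrids)):
        prob = 1.0
        kept_parent = {}
        for h, (parent, p) in zip(hybrids, choice):
            kept_parent[h] = parent
            prob *= p  # inheritance events at different hybrids are independent
        adj = {}
        for (tail, head), weight in arcs.items():
            if head in kept_parent and kept_parent[head] != tail:
                continue  # arc into a hybrid that this displayed tree drops
            adj.setdefault(tail, []).append((head, weight))
            adj.setdefault(head, []).append((tail, weight))
        for i, a in enumerate(leaves):
            dist = path_lengths(adj, a)
            for b in leaves[i + 1:]:
                totals[(a, b)] = totals.get((a, b), 0.0) + prob * dist[b]
    return totals


# Toy network: root r, tree vertices u and v, hybrid h with parents u and v.
ARCS = {("r", "u"): 1, ("r", "v"): 1, ("u", "A"): 1, ("v", "C"): 1,
        ("u", "h"): 2, ("v", "h"): 3, ("h", "B"): 1}
HYBRID_PROBS = {"h": {"u": 0.25, "v": 0.75}}

if __name__ == "__main__":
    print(tree_average_distances(ARCS, HYBRID_PROBS, ["A", "B", "C"]))
```

    Enumerating displayed trees grows exponentially with the number of hybrid vertices, so this brute-force form is only practical for small networks.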

  2. Integrated Automatic Workflow for Phylogenetic Tree Analysis Using Public Access and Local Web Services.

    PubMed

    Damkliang, Kasikrit; Tandayya, Pichaya; Sangket, Unitsa; Pasomsub, Ekawat

    2016-11-28

    At the present, coding sequence (CDS) has been discovered and larger CDS is being revealed frequently. Approaches and related tools have also been developed and upgraded concurrently, especially for phylogenetic tree analysis. This paper proposes an integrated automatic Taverna workflow for the phylogenetic tree inferring analysis using public access web services at European Bioinformatics Institute (EMBL-EBI) and Swiss Institute of Bioinformatics (SIB), and our own deployed local web services. The workflow input is a set of CDS in the Fasta format. The workflow supports 1,000 to 20,000 numbers in bootstrapping replication. The workflow performs the tree inferring such as Parsimony (PARS), Distance Matrix - Neighbor Joining (DIST-NJ), and Maximum Likelihood (ML) algorithms of EMBOSS PHYLIPNEW package based on our proposed Multiple Sequence Alignment (MSA) similarity score. The local web services are implemented and deployed into two types using the Soaplab2 and Apache Axis2 deployment. There are SOAP and Java Web Service (JWS) providing WSDL endpoints to Taverna Workbench, a workflow manager. The workflow has been validated, the performance has been measured, and its results have been verified. Our workflow's execution time is less than ten minutes for inferring a tree with 10,000 replicates of the bootstrapping numbers. This paper proposes a new integrated automatic workflow which will be beneficial to the bioinformaticians with an intermediate level of knowledge and experiences. All local services have been deployed at our portal http://bioservices.sci.psu.ac.th.

  3. Integrated Automatic Workflow for Phylogenetic Tree Analysis Using Public Access and Local Web Services.

    PubMed

    Damkliang, Kasikrit; Tandayya, Pichaya; Sangket, Unitsa; Pasomsub, Ekawat

    2016-03-01

    At the present, coding sequence (CDS) has been discovered and larger CDS is being revealed frequently. Approaches and related tools have also been developed and upgraded concurrently, especially for phylogenetic tree analysis. This paper proposes an integrated automatic Taverna workflow for the phylogenetic tree inferring analysis using public access web services at European Bioinformatics Institute (EMBL-EBI) and Swiss Institute of Bioinformatics (SIB), and our own deployed local web services. The workflow input is a set of CDS in the Fasta format. The workflow supports 1,000 to 20,000 numbers in bootstrapping replication. The workflow performs the tree inferring such as Parsimony (PARS), Distance Matrix - Neighbor Joining (DIST-NJ), and Maximum Likelihood (ML) algorithms of EMBOSS PHYLIPNEW package based on our proposed Multiple Sequence Alignment (MSA) similarity score. The local web services are implemented and deployed into two types using the Soaplab2 and Apache Axis2 deployment. There are SOAP and Java Web Service (JWS) providing WSDL endpoints to Taverna Workbench, a workflow manager. The workflow has been validated, the performance has been measured, and its results have been verified. Our workflow's execution time is less than ten minutes for inferring a tree with 10,000 replicates of the bootstrapping numbers. This paper proposes a new integrated automatic workflow which will be beneficial to the bioinformaticians with an intermediate level of knowledge and experiences. All local services have been deployed at our portal http://bioservices.sci.psu.ac.th.

  4. Is the 20th century warming unprecedented in the Siberian north?

    NASA Astrophysics Data System (ADS)

    Sidorova, Olga V.; Saurer, Matthias; Andreev, Andrei; Fritzsche, Diedrich; Opel, Thomas; Naurzbaev, Mukhtar M.; Siegwolf, Rolf

    2013-08-01

    To answer the question "Has the recent warming no analogues in the Siberian north?" we analyzed larch tree samples (Larix gmelinii Rupr.) from permafrost zone in the eastern Taimyr (TAY) (72°N, 102°E) using tree-ring and stable isotope analyses for the Climatic Optimum Period (COP) 4111-3806 BC and Medieval Warm Period (MWP) 917-1150 AD, in comparison to the recent period (RP) 1791-2008 AD. We developed a description of the climatic and environmental changes in the eastern Taimyr using tree-ring width and stable isotope (δ13C, δ18O) data based on statistical verification of the relationships to climatic parameters (temperature and precipitation). Additionally, we compared our new tree-ring and stable isotope data sets with earlier published July temperature and precipitation reconstructions inferred from pollen data of the Lama Lake, Taimyr Peninsula, δ18O ice core data from Akademii Nauk ice cap on Severnaya Zemlya (SZ) and δ18O ice core data from Greenland (GISP2), as well as tree-ring width and stable carbon and oxygen isotope data from northeastern Yakutia (YAK). We found that the COP in TAY was warmer and drier compared to the MWP but rather similar to the RP. Our results indicate that the MWP in TAY started earlier and was wetter than in YAK. July precipitation reconstructions obtained from pollen data of the Lama Lake, oxygen isotope data from SZ and our carbon isotopes in tree cellulose agree well and indicate wetter climate conditions during the MWP. Consistent large-scale patterns were reflected in significant links between oxygen isotope data in tree cellulose from TAY and YAK, and oxygen isotope data from SZ and GISP2 during the MWP and the RP. Finally, we showed that the recent warming is not unprecedented in the Siberian north. Similar climate conditions were recorded by tree-rings, stable isotopes, pollen, and ice core data 6000 years ago.

  5. DendroBLAST: approximate phylogenetic trees in the absence of multiple sequence alignments.

    PubMed

    Kelly, Steven; Maini, Philip K

    2013-01-01

    The rapidly growing availability of genome information has created considerable demand for both fast and accurate phylogenetic inference algorithms. We present a novel method called DendroBLAST for reconstructing phylogenetic dendrograms/trees from protein sequences using BLAST. This method differs from other methods by incorporating a simple model of sequence evolution to test the effect of introducing sequence changes on the reliability of the bipartitions in the inferred tree. Using realistic simulated sequence data we demonstrate that this method produces phylogenetic trees that are more accurate than other commonly-used distance based methods though not as accurate as maximum likelihood methods from good quality multiple sequence alignments. In addition to tests on simulated data, we use DendroBLAST to generate input trees for a supertree reconstruction of the phylogeny of the Archaea. This independent analysis produces an approximate phylogeny of the Archaea that has both high precision and recall when compared to previously published analysis of the same dataset using conventional methods. Taken together these results demonstrate that approximate phylogenetic trees can be produced in the absence of multiple sequence alignments, and we propose that these trees will provide a platform for improving and informing downstream bioinformatic analysis. A web implementation of the DendroBLAST method is freely available for use at http://www.dendroblast.com/.

  6. Tropical rain forest tree growth and atmospheric carbon dynamics linked to interannual temperature variation during 1984–2000

    PubMed Central

    Clark, D. A.; Piper, S. C.; Keeling, C. D.; Clark, D. B.

    2003-01-01

    During 1984–2000, canopy tree growth in old-growth tropical rain forest at La Selva, Costa Rica, varied >2-fold among years. The trees' annual diameter increments in this 16-yr period were negatively correlated with annual means of daily minimum temperatures. The tree growth variations also negatively covaried with the net carbon exchange of the terrestrial tropics as a whole, as inferred from nearly pole-to-pole measurements of atmospheric carbon dioxide (CO2) interpreted by an inverse tracer–transport model. Strong reductions in tree growth and large inferred tropical releases of CO2 to the atmosphere occurred during the record-hot 1997–1998 El Niño. These and other recent findings are consistent with decreased net primary production in tropical forests in the warmer years of the last two decades. As has been projected by recent process model studies, such a sensitivity of tropical forest productivity to on-going climate change would accelerate the rate of atmospheric CO2 accumulation. PMID:12719545

  7. Maximum likelihood inference implies a high, not a low, ancestral haploid chromosome number in Araceae, with a critique of the bias introduced by ‘x’

    PubMed Central

    Cusimano, Natalie; Sousa, Aretuza; Renner, Susanne S.

    2012-01-01

    Background and Aims For 84 years, botanists have relied on calculating the highest common factor for series of haploid chromosome numbers to arrive at a so-called basic number, x. This was done without consistent (reproducible) reference to species relationships and frequencies of different numbers in a clade. Likelihood models that treat polyploidy, chromosome fusion and fission as events with particular probabilities now allow reconstruction of ancestral chromosome numbers in an explicit framework. We have used a modelling approach to reconstruct chromosome number change in the large monocot family Araceae and to test earlier hypotheses about basic numbers in the family. Methods Using a maximum likelihood approach and chromosome counts for 26 % of the 3300 species of Araceae and representative numbers for each of the other 13 families of Alismatales, polyploidization events and single chromosome changes were inferred on a genus-level phylogenetic tree for 113 of the 117 genera of Araceae. Key Results The previously inferred basic numbers x = 14 and x = 7 are rejected. Instead, maximum likelihood optimization revealed an ancestral haploid chromosome number of n = 16, Bayesian inference of n = 18. Chromosome fusion (loss) is the predominant inferred event, whereas polyploidization events occurred less frequently and mainly towards the tips of the tree. Conclusions The bias towards low basic numbers (x) introduced by the algebraic approach to inferring chromosome number changes, prevalent among botanists, may have contributed to an unrealistic picture of ancestral chromosome numbers in many plant clades. The availability of robust quantitative methods for reconstructing ancestral chromosome numbers on molecular phylogenetic trees (with or without branch length information), with confidence statistics, makes the calculation of x an obsolete approach, at least when applied to large clades. PMID:22210850
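    The algebraic approach described under Background and Aims, taking the highest common factor of a series of haploid chromosome numbers, can be sketched as follows. The counts are made-up illustrative values, not data from the study, and `basic_number` is a hypothetical helper name.

```python
from functools import reduce
from math import gcd


def basic_number(haploid_counts):
    """Highest common factor of a series of haploid chromosome numbers."""
    return reduce(gcd, haploid_counts)


if __name__ == "__main__":
    print(basic_number([14, 28, 42]))      # hypothetical counts sharing the factor 14
    print(basic_number([14, 21, 28]))      # hypothetical counts sharing the factor 7
    print(basic_number([14, 28, 42, 15]))  # one additional count changes the result
```

    The calculation uses only the set of counts: it takes no account of how often each number occurs or of how the species are related, which is the limitation the abstract points to.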

  8. Phylogenics & Tree-Thinking

    ERIC Educational Resources Information Center

    Baum, David A.; Offner, Susan

    2008-01-01

    Phylogenetic trees, which are depictions of the inferred evolutionary relationships among a set of species, now permeate almost all branches of biology and are appearing in increasing numbers in biology textbooks. While few state standards explicitly require knowledge of phylogenetics, most require some knowledge of evolutionary biology, and many…

  9. Gas exchange parameters inferred from δ13C of conifer annual rings throughout the 20th century

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Marshall, J.D.; Monserud, R.A.

    1995-12-31

    In this study the stable isotopes of carbon in plant tissue provided a means of inferring the proportional decrease in carbon dioxide concentration across the stomata, which is closely related to photosynthetic water-use efficiency. The authors analyzed the stable carbon isotope composition of tree rings laid down over the past 80 years to determine whether the proportional decrease in CO2 concentration across the stomata had increased. Dominant and codominant trees of western white pine (Pinus monticola), ponderosa pine (P. ponderosa), and Douglas-fir (Pseudotsuga menziesii var. glauca) growing at the Priest River Experimental Forest, in northern Idaho, were analyzed. To avoid confounding age and year, the authors compared the innermost rings of mature trees to trees of intermediate age and to saplings. The isotopic data were corrected for changes in isotopic composition and carbon dioxide concentration using published data from ice cores.
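    The link between tissue δ13C and the CO2 drop across the stomata is commonly expressed with the simplified Farquhar discrimination model. The sketch below assumes that model with its conventional constants (a = 4.4 per mil for diffusion, b = 27 per mil for carboxylation). The abstract does not state which formulation or constants the authors used, and the input values are illustrative, not measurements from the study.

```python
A_DIFFUSION = 4.4       # per mil; fractionation during diffusion through stomata (assumed)
B_CARBOXYLATION = 27.0  # per mil; fractionation during carboxylation (assumed)


def discrimination(delta_air, delta_plant):
    """Carbon isotope discrimination (per mil) from air and plant delta-13C values."""
    return (delta_air - delta_plant) / (1.0 + delta_plant / 1000.0)


def proportional_co2_decrease(delta_air, delta_plant):
    """Proportional CO2 drop across the stomata, 1 - ci/ca, under the simplified model."""
    big_delta = discrimination(delta_air, delta_plant)
    ci_over_ca = (big_delta - A_DIFFUSION) / (B_CARBOXYLATION - A_DIFFUSION)
    return 1.0 - ci_over_ca


# Illustrative inputs only: air at -8 per mil, wood cellulose at -26 per mil.
drop = proportional_co2_decrease(-8.0, -26.0)
print(round(drop, 3))
```

    Under these assumptions a less negative plant δ13C implies a larger proportional decrease, which is why the air value has to be corrected for its own change over time, as the abstract notes.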

  10. Climatic and physiological effects on leaf and tree-ring stable isotopes in California redwoods

    NASA Astrophysics Data System (ADS)

    Ambrose, A. R.; Baxter, W.; Wong, C.; Dawson, T. E.; Carroll, A.; Voelker, S.

    2016-12-01

    Variation in the stable isotope composition of organic matter can provide important information about environmental change and biological responses to it. We analyzed the stable carbon (d13C) and oxygen (d18O) isotope ratios of leaves and of the cellulose from individual tree-rings of California's two redwood species to examine how these trees have responded to environmental variation and change in both time and space. Analyses of leaf d13C for both coast redwood (Sequoia sempervirens) and giant sequoia (Sequoiadendron giganteum) from throughout their geographical ranges show a marked gradient with tree height for trees of all sizes and ages but no clear difference among species or populations. The gradient is best explained by tree response to changes in both microenvironment and physiology that are known to change with height. In contrast, leaf d18O for both species showed no clear relationship with height but very clear differences between species and populations, with giant sequoia displaying a much stronger inferred leaf-level response to the higher evaporative conditions present in the Sierra Nevada mountains as compared to the coast. Both species showed population-level differences, with the driest and warmest sites most distinct from all of the others. Intra-annual analyses of d13C and d18O in tree-rings over a 21-year period (1974-1994) were also used to explore how climate and tree response to climate were recorded for both species. These analyses revealed unique (local) climatic effects and responses to climate for each species and population. Most pronounced was a significant increase in intrinsic Water Use Efficiency (iWUE) derived from d13C data over the study period in both species, and a distinct d18O response in relation to drought (e.g., 1976/1977) and to warmer days and nights and above-average precipitation (e.g., 1982-1985).
Patterns of co-variation in d13C and d18O in both species suggest that during dry and also warm periods these trees appear to first down-regulate their water use and secondly their carbon fixation and that high evaporative conditions drive some of the most marked changes in both variables. This information should be useful for efforts to conserve and protect both redwood species under novel environmental conditions expected in the coming decades.

  11. Rooting phylogenetic trees under the coalescent model using site pattern probabilities.

    PubMed

    Tian, Yuan; Kubatko, Laura

    2017-12-19

    Phylogenetic tree inference is a fundamental tool to estimate ancestor-descendant relationships among different species. In phylogenetic studies, identification of the root - the most recent common ancestor of all sampled organisms - is essential for complete understanding of the evolutionary relationships. Rooted trees benefit most downstream applications of phylogenies, such as species classification or the study of adaptation. Often, trees can be rooted by using outgroups, which are species that are known to be more distantly related to the sampled organisms than any other species in the phylogeny. However, outgroups are not always available in evolutionary research. In this study, we develop a new method for rooting species trees under the coalescent model, by developing a series of hypothesis tests for rooting quartet phylogenies using site pattern probabilities. The power of this method is examined by simulation studies and by application to an empirical North American rattlesnake data set. The method shows high accuracy across the simulation conditions considered, and performs well for the rattlesnake data. Thus, it provides a computationally efficient way to accurately root species-level phylogenies that incorporates the coalescent process. The method is robust to variation in substitution model, but is sensitive to the assumption of a molecular clock. Our study establishes a computationally practical method for rooting species trees that is more efficient than traditional methods. The method will benefit numerous evolutionary studies that require rooting a phylogenetic tree without having to specify outgroups.

  12. MRI-based decision tree model for diagnosis of biliary atresia.

    PubMed

    Kim, Yong Hee; Kim, Myung-Joon; Shin, Hyun Joo; Yoon, Haesung; Han, Seok Joo; Koh, Hong; Roh, Yun Ho; Lee, Mi-Jung

    2018-02-23

    To evaluate MRI findings and to generate a decision tree model for diagnosis of biliary atresia (BA) in infants with jaundice. We retrospectively reviewed features of MRI and ultrasonography (US) performed in infants with jaundice between January 2009 and June 2016 under approval of the institutional review board, including the maximum diameter of periportal signal change on MRI (MR triangular cord thickness, MR-TCT) or US (US-TCT), visibility of common bile duct (CBD) and abnormality of gallbladder (GB). Hepatic subcapsular flow was reviewed on Doppler US. We performed conditional inference tree analysis using MRI findings to generate a decision tree model. A total of 208 infants were included, 112 in the BA group and 96 in the non-BA group. Mean age at the time of MRI was 58.7 ± 36.6 days. Visibility of CBD, abnormality of GB and MR-TCT were good discriminators for the diagnosis of BA and the MRI-based decision tree using these findings with MR-TCT cut-off 5.1 mm showed 97.3 % sensitivity, 94.8 % specificity and 96.2 % accuracy. MRI-based decision tree model reliably differentiates BA in infants with jaundice. MRI can be an objective imaging modality for the diagnosis of BA. • MRI-based decision tree model reliably differentiates biliary atresia in neonatal cholestasis. • Common bile duct, gallbladder and periportal signal changes are the discriminators. • MRI has comparable performance to ultrasonography for diagnosis of biliary atresia.
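    The reported performance figures can be checked against the group sizes given in the abstract. The sketch below is a back-calculation, not data from the paper: the abstract gives only percentages, so the counts of correctly classified infants (109 and 91) are assumed here as the whole numbers that reproduce those percentages.

```python
n_ba, n_non_ba = 112, 96  # group sizes reported in the abstract

# Assumed counts, back-calculated from the reported percentages.
true_pos = 109  # BA infants classified as BA
true_neg = 91   # non-BA infants classified as non-BA

sensitivity = true_pos / n_ba
specificity = true_neg / n_non_ba
accuracy = (true_pos + true_neg) / (n_ba + n_non_ba)

print(f"{sensitivity:.1%} {specificity:.1%} {accuracy:.1%}")
```

    With these counts the three figures agree with one another: 200 of 208 infants classified correctly corresponds to the stated accuracy.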

  13. PAH dissipation in spiked soil: impacts of bioavailability, microbial activity, and trees.

    PubMed

    Mueller, Kevin E; Shann, Jodi R

    2006-08-01

    While trees have demonstrated potential in phytoremediation of several organic contaminants, little is known regarding their ability to impact the common soil contaminant PAHs. Several species of native North American trees were planted in soil artificially contaminated with three PAHs. Plant biomass, PAH dissipation, and microbial mineralization were monitored over the course of one year and environmental conditions were allowed to follow typical seasonal patterns. PAH dissipation and mineralization were not affected by planting. Extensive and rapid loss of PAHs was observed and attributed to high bioavailability and microbial activity in all treatments. The rate of this loss may have masked any significant planting effects. Anthracene was found to be more recalcitrant than pyrene or phenanthrene. Parallel soil aging studies indicated that sequestration to soil components was minimal. Contrary to common inferences in literature, amendment with decaying fine roots inhibited PAH degradation by the soil microbial community. Seasonal variation in environmental factors and rhizosphere dynamics may have also reduced or negated the effect of planting and should be taken into account in future phytoremediation trials. The unique root traits of trees may pose a challenge to traditional thought regarding PAH dissipation in the rhizosphere of plants.

  14. Stable Isotope (delta OXYGEN-18, Delta Deuterium, Delta CARBON-13) Dendroclimatological Studies in the Waterloo Region of Southern Ontario, Canada, Between AD 1610 and 1990.

    NASA Astrophysics Data System (ADS)

    Buhay, William Mark

    Oxygen (δ18O), hydrogen (δ2H) and carbon (δ13C) isotopes were measured in wood cellulose from elm, white pine and maple trees that grew in southwestern Ontario, Canada. The measured oxygen and hydrogen isotopic data were used for model-based reconstructions of δ18O(meteoric water), mean annual temperature (MAT) and relative humidity for a period, AD 1610 to 1880, that precedes instrumental records of climate. The carbon isotope measurements were compared with the Cellulose Model inferred climate data to reveal additional environmental information. Modifications made to the Cellulose Model focused on the dynamics of oxygen and hydrogen isotopic fractionation in plants during evapotranspiration and photosynthetic assimilation. For instance, kinetic fractionation of 18O was found to be predictable from theoretical considerations of leaf energy balance and boundary layer dynamics. Kinetic fractionation during evapotranspiration is sensitive to the nature of the boundary layer, which is controlled by leaf size and morphology. Generally, plants with small segmented leaves have a lower component of turbidity in the leaf boundary layer, which results in higher kinetic fractionation values, than do plants having large simple leaves and more turbulent boundary layers. Kinetic 2H enrichment in plant leaf water can also be rationalized in terms of leaf size and morphology when an apparent temperature-dependent isotope effect, acting in opposition to evaporative enrichment, is taken into account. Accounting for this temperature-dependent isotope effect helps to: (1) reconcile hydrogen kinetic fractionation inconsistencies for different leaves; (2) explain a temperature effect previously attributed to variable biochemical fractionation during cellulose synthesis; and (3) verify hydrogen biochemical effects in plants.
This improved characterization of the oxygen and hydrogen isotopic effects in plants, using the modified Cellulose Model, helped to constrain the paleoclimate interpretations from three species of trees that grew in different hydrologic settings. The inferred climate data, integrated with the hydrological setting of the trees and various climate modifying factors in the Great Lakes basin, generated an independent interpretation of summer and winter conditions in southwestern Ontario for the past 380 years. The inferred evidence indicates that conditions in southwestern Ontario between 1610 and 1750 typified those of "Little Ice Age" Europe by being cooler and drier than present. This probably resulted from a southerly positioning of the Polar Front, with respect to southwestern Ontario, which allowed sub-polar airmasses to dominantly influence this region. A subsequent retreat of the Polar Front north after 1750 allowed for a predominance of sub-tropical airmasses that resulted in warm-moist conditions and an increase in winter precipitation in this area between 1750 and 1850. Another advance of the Polar Front position south, sometime after 1850, renewed cool-dry conditions and reduced winter precipitation amounts in southwestern Ontario until the early twentieth century, after which time climate ameliorated progressively. Typical of the findings in previous studies, a significant correlation between climate parameters and δ13C(cellulose) values is observed for a tree (maple) from a groundwater recharge setting. The correlation is best between MAT and δ13C(cellulose) values between 1610 and 1850. The breakdown of this correlation after 1850, due to enriched δ13C(cellulose) values, could indicate that the tree is responding to an alteration in soil chemistry occurring due to the fallout of anthropogenically produced atmospheric pollutants.
This is because the effects of depleted soil nutrients and/or leached phytotoxins on δ13C(cellulose) values in wood cellulose are similar to ones seen in trees that regularly experience drought stress.

  15. Pacific southwest United States Holocene summer paleoclimate inferred from sediment calcite oxygen isotopes (Lake Elsinore, CA)

    NASA Astrophysics Data System (ADS)

    Kirby, M.; Patterson, W. P.; Lachniet, M. S.; Anderson, M.; Noblet, J. A.

    2017-12-01

    Records of past climate inform on the natural range and mechanisms of climate change. In the arid Pacific southwest United States (pswUS), there exist a variety of Holocene records that infer past winter conditions (moisture and/or temperature). Holocene records of summer climate, however, are rare excepting short-lived (<500-1000 yrs) tree ring PDSIs and some pollen-inferred temperature reconstructions. As climate changes due to anthropogenic forcing, the severity of drought is expected to increase in the already water-stressed pswUS. Hot droughts are of considerable concern as summer temperatures rise. As a result, understanding how summer conditions changed in the past is critical to understanding future predictions under varied climate forcings. Here, we present a 9800 year delta-18O(calcite) record from Lake Elsinore, CA. This isotope record is interpreted to reflect late-spring to summer conditions, especially evaporation. Modern water isotope data support this interpretation. Our results reveal a three-part Holocene consisting of a highly evaporative early Holocene, a cooler mid-Holocene, and evaporative late Holocene. Coupled with an inferred winter wetness (run-off) record from Kirby et al. (2010), we estimate the severity of centennial scale Holocene dryness (i.e. dry winters plus hot summers = severe drought). The most severe droughts occur in the early Holocene, decline in the mid-Holocene, and return in the late Holocene. An independently dated isotope record from Lake Elsinore's littoral zone (Kirby et al. 2004) shows similar changes providing confidence in our longer record. Various forcing mechanisms are examined to explain the Elsinore summer record including insolation, Pacific SSTs, and trace gas radiative forcing.

  16. Environmental and climatic conditions at a potential Glacial refugial site of tree species near the Southern Alpine glaciers. New insights from multiproxy sedimentary studies at Lago della Costa (Euganean Hills, Northeastern Italy)

    NASA Astrophysics Data System (ADS)

    Kaltenrieder, Petra; Belis, Claudio A.; Hofstetter, Simone; Ammann, Brigitta; Ravazzi, Cesare; Tinner, Willy

    2009-12-01

    It has been hypothesized that refugia of thermophilous tree species were located in Northern Italy very close to the Alps, though this hypothesis has yet to be tested thoroughly. In contrast to Central and Southern Italy with its relative wealth of data, only a few fragmentary records are currently available from Northern Italy for the last Glacial (Würm, Weichselian). Our new study site Lago della Costa lies adjacent to the catchment of the megafans of the Alpine forelands and the braided rivers of the Northeastern Po Plain that have so far inhibited the recovery of continuous Glacial and Late-Glacial records. We analyze pollen, plant macrofossils, charcoal and ostracods to reconstruct the vegetation, fire and lake history for the period 33,000-16,000 cal. BP. We compare our data with Glacial records from Southern Europe to discuss similarities and dissimilarities between these potential refugial areas. A comparison with independent paleoclimatic proxies allows us to assess potential linkages between environmental and climatic variability. New macrofossil and pollen data at Lago della Costa unambiguously document the local persistence of boreal tree taxa such as Larix decidua and Betula tree species around the study site during the last Glacial. The regular occurrence of pollen of temperate trees in the organic lake sediments (fine-detritus calcareous gyttja) suggests that temperate taxa such as Corylus avellana, Quercus deciduous, Tilia, Ulmus, Fraxinus excelsior, Carpinus, Abies alba and Fagus sylvatica most likely survived the Last Glacial Maximum (LGM) at favorable sites in the Euganean Hills. The percentage values of temperate trees are comparable with those from Southern Europe (e.g. Monticchio in Southern Italy). We conclude that the Euganean Hills were one of the northernmost refugial areas of temperate taxa in Europe. However, the relative and absolute abundances of pollen of temperate trees are highly variable.
Pollen-inferred declines of temperate tree communities (e.g. Quercetum mixtum) and low ostracod-inferred water levels at Lago della Costa correspond to the cold Heinrich events H-2 (LGM; 23,000-19,000 cal. BP) and H-3 (around 28,000 cal. BP), as recorded in the marine sediments of the North Atlantic. Similar patterns of significant temperate tree population collapses during cold Heinrich events are recorded at southern Mediterranean sites (e.g. Monticchio and the Alboran Sea). These findings suggest close linkages between Northern Atlantic and South-Central European climates during the past Glacial.

  17. Phylogenetics of moth-like butterflies (Papilionoidea: Hedylidae) based on a new 13-locus target capture probe set.

    PubMed

    Kawahara, Akito Y; Breinholt, Jesse W; Espeland, Marianne; Storer, Caroline; Plotkin, David; Dexter, Kelly M; Toussaint, Emmanuel F A; St Laurent, Ryan A; Brehm, Gunnar; Vargas, Sergio; Forero, Dimitri; Pierce, Naomi E; Lohman, David J

    2018-06-11

    The Neotropical moth-like butterflies (Hedylidae) are perhaps the most unusual butterfly family. In addition to being species-poor, this family is predominantly nocturnal and has anti-bat ultrasound hearing organs. Evolutionary relationships among the 36 described species are largely unexplored. A new, target capture, anchored hybrid enrichment probe set ('BUTTERFLY2.0') was developed to infer relationships of hedylids and some of their butterfly relatives. The probe set includes 13 genes that have historically been used in butterfly phylogenetics. Our dataset comprised up to 10,898 aligned base pairs from 22 hedylid species and 19 outgroups. Eleven of the thirteen loci were successfully captured from all samples, and the remaining loci were captured from ≥94% of samples. The inferred phylogeny was consistent with recent molecular studies by placing Hedylidae sister to Hesperiidae, and the tree had robust support for 80% of nodes. Our results are also consistent with morphological studies, with Macrosoma tipulata as the sister species to all remaining hedylids, followed by M. semiermis sister to the remaining species in the genus. We tested the hypothesis that nocturnality evolved once from diurnality in Hedylidae, and demonstrate that the ancestral condition was likely diurnal, with a shift to nocturnality early in the diversification of this family. The BUTTERFLY2.0 probe set includes standard butterfly phylogenetics markers, captures sequences from decades-old museum specimens, and is a cost-effective technique to infer phylogenetic relationships of the butterfly tree of life. Copyright © 2018 Elsevier Inc. All rights reserved.

  18. Towards the harmonization between National Forest Inventory and Forest Condition Monitoring. Consistency of plot allocation and effect of tree selection methods on sample statistics in Italy.

    PubMed

    Gasparini, Patrizia; Di Cosmo, Lucio; Cenni, Enrico; Pompei, Enrico; Ferretti, Marco

    2013-07-01

    In the frame of a process aiming at harmonizing National Forest Inventory (NFI) and ICP Forests Level I Forest Condition Monitoring (FCM) in Italy, we investigated (a) the long-term consistency between FCM sample points (a subsample of the first NFI, 1985, NFI_1) and recent forest area estimates (after the second NFI, 2005, NFI_2) and (b) the effect of tree selection method (tree-based or plot-based) on sample composition and defoliation statistics. The two investigations were carried out on 261 and 252 FCM sites, respectively. Results show that some individual forest categories (larch and stone pine, Norway spruce, other coniferous, beech, temperate oaks and cork oak forests) are over-represented and others (hornbeam and hophornbeam, other deciduous broadleaved and holm oak forests) are under-represented in the FCM sample. This is probably due to a change in forest cover, which has increased by 1,559,200 ha from 1985 to 2005. In case of a shift from a tree-based to a plot-based selection method, 3,130 (46.7%) of the original 6,703 sample trees will be abandoned, and 1,473 new trees will be selected. The balance between exclusion of former sample trees and inclusion of new ones will be particularly unfavourable for conifers (with only 16.4% of excluded trees replaced by new ones) and less for deciduous broadleaves (with 63.5% of excluded trees replaced). The total number of tree species surveyed will not be impacted, while the number of trees per species will, and the resulting (plot-based) sample composition will have a much larger frequency of deciduous broadleaved trees. The newly selected trees have, in general, smaller diameter at breast height (DBH) and defoliation scores. Given the larger rate of turnover, the deciduous broadleaved part of the sample will be more impacted.
Our results suggest that both a revision of FCM network to account for forest area change and a plot-based approach to permit statistical inference and avoid bias in the tree sample composition in terms of DBH (and likely age and structure) are desirable in Italy. As the adoption of a plot-based approach will keep a large share of the trees formerly selected, direct tree-by-tree comparison will remain possible, thus limiting the impact on the time series comparability. In addition, the plot-based design will favour the integration with NFI_2.
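    The sample turnover described in this abstract follows from three reported tree counts. The sketch below only redoes that arithmetic; the retained count, the resulting plot-based sample size, and the overall replacement ratio are derived here and are not stated in the abstract.

```python
original_trees = 6703  # trees in the tree-based sample (reported)
abandoned = 3130       # trees dropped under plot-based selection (reported)
newly_selected = 1473  # trees added under plot-based selection (reported)

share_abandoned = abandoned / original_trees
retained = original_trees - abandoned
plot_based_sample = retained + newly_selected    # derived, not reported
replacement_ratio = newly_selected / abandoned   # derived, not reported

print(f"{share_abandoned:.1%} abandoned, {retained} retained, "
      f"{plot_based_sample} in the plot-based sample, "
      f"{replacement_ratio:.1%} of abandoned trees replaced")
```

    The overall replacement ratio computed this way lies between the 16.4% reported for conifers and the 63.5% reported for deciduous broadleaves, as expected for a pooled figure.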

  19. Is the extremely rare Iberian endemic plant species Castrilanthemum debeauxii (Compositae, Anthemideae) a 'living fossil'? Evidence from a multi-locus species tree reconstruction.

    PubMed

    Tomasello, Salvatore; Álvarez, Inés; Vargas, Pablo; Oberprieler, Christoph

    2015-01-01

    The present study provides results of multi-species coalescent species tree analyses of DNA sequences sampled from multiple nuclear and plastid regions to infer the phylogenetic relationships among the members of the subtribe Leucanthemopsidinae (Compositae, Anthemideae). Besides the annual Castrilanthemum debeauxii (Degen, Hervier & É.Rev.) Vogt & Oberp., one of the rarest flowering plant species of the Iberian Peninsula, the subtribe comprises two other unispecific genera (Hymenostemma, Prolongoa) and the polyploid complex of the genus Leucanthemopsis. Based on sequence information from two single- to low-copy nuclear regions (C16, D35, characterised by Chapman et al. (2007)), the multi-copy region of the nrDNA internal transcribed spacer regions ITS1 and ITS2, and two intergenic spacer regions of the cpDNA, gene trees were reconstructed using Bayesian inference methods. For the reconstruction of a multi-locus species tree we applied three different methods: (a) analysis of concatenated sequences using Bayesian inference (MrBayes), (b) a tree reconciliation approach by minimizing the number of deep coalescences (PhyloNet), and (c) a coalescent-based species-tree method in a Bayesian framework (*BEAST). All three species tree reconstruction methods unequivocally support the close relationship of the subtribe with the hitherto unclassified genus Phalacrocarpum, the sister-group relationship of Castrilanthemum with the three remaining genera of the subtribe, and the further sister-group relationship of the clade of Hymenostemma+Prolongoa with a monophyletic genus Leucanthemopsis.
Dating of the *BEAST phylogeny supports the long-lasting (Early Miocene, 15-22 Ma) taxonomical independence and the switch from the plesiomorphic perennial to the apomorphic annual life-form assumed for the Castrilanthemum lineage that may have occurred not earlier than in the Pliocene (3 Ma) when the establishment of a Mediterranean climate with summer droughts triggered evolution towards annuality. Copyright © 2014 Elsevier Inc. All rights reserved.

  20. A Six Nuclear Gene Phylogeny of Citrus (Rutaceae) Taking into Account Hybridization and Lineage Sorting

    PubMed Central

    Keremane, Manjunath L.; Lee, Richard F.; Maureira-Butler, Ivan J.; Roose, Mikeal L.

    2013-01-01

    Background Genus Citrus (Rutaceae) comprises many important cultivated species that generally hybridize easily. Phylogenetic study of a group showing extensive hybridization is challenging. Since the genus Citrus has diverged recently (4–12 Ma), incomplete lineage sorting of ancestral polymorphisms is also likely to cause discrepancies among genes in phylogenetic inferences. Incongruence of gene trees is observed and it is essential to unravel the processes that cause inconsistencies in order to understand the phylogenetic relationships among the species. Methodology and Principal Findings (1) We generated phylogenetic trees using haplotype sequences of six low copy nuclear genes. (2) Published simple sequence repeat data were re-analyzed to study population structure and the results were compared with the phylogenetic trees constructed using sequence data and coalescence simulations. (3) To distinguish between hybridization and incomplete lineage sorting, we developed and utilized a coalescence simulation approach. In other studies, species trees have been inferred despite the possibility of hybridization having occurred and used to generate null distributions of the effect of lineage sorting alone (by coalescent simulation). Since this is problematic, we instead generate these distributions directly from observed gene trees. Of the six trees generated, we used the most resolved three to detect hybrids. We found that 11 of 33 samples appear to be affected by historical hybridization. Analysis of the remaining three genes supported the conclusions from the hybrid detection test. Conclusions We have identified or confirmed probable hybrid origins for several Citrus cultivars using three different approaches–gene phylogenies, population structure analysis and coalescence simulation. Hybridization and incomplete lineage sorting were identified primarily based on differences among gene phylogenies with reference to null expectations via coalescence simulations. 
We conclude that identifying hybridization as a frequent cause of incongruence among gene trees is critical to correctly infer the phylogeny among species of Citrus. PMID:23874615

  1. Toward the Decision Tree for Inferring Requirements Maturation Types

    NASA Astrophysics Data System (ADS)

    Nakatani, Takako; Kondo, Narihito; Shirogane, Junko; Kaiya, Haruhiko; Hori, Shozo; Katamine, Keiichi

    Requirements are elicited step by step during the requirements engineering (RE) process. However, some types of requirements are elicited completely only after the scheduled requirements elicitation process is finished. Such a situation is regarded as problematic. In our study, the difficulties of eliciting various kinds of requirements are observed by components. We refer to the components as observation targets (OTs) and introduce the term “requirements maturation,” meaning when and how requirements are elicited completely in the project. Requirements maturation is discussed for physical and logical OTs. OTs viewed from a logical viewpoint are called logical OTs, e.g. quality requirements. The requirements of physical OTs, e.g., modules, components, subsystems, etc., include functional and non-functional requirements. They are influenced by their requesters' environmental changes, as well as developers' technical changes. In order to infer the requirements maturation period of each OT, we need to know how much these factors influence the OTs' requirements maturation. Based on observation of actual past projects, we defined the PRINCE (Pre Requirements Intelligence Net Consideration and Evaluation) model. It aims to guide developers in their observation of the requirements maturation of OTs. We quantitatively analyzed actual cases with their requirements elicitation processes and extracted essential factors that influence requirements maturation. The results of interviews with project managers were analyzed with WEKA, a data mining system, from which the decision tree was derived. This paper introduces the PRINCE model and the category of logical OTs to be observed. The decision tree that helps developers infer the maturation type of an OT is also described. We evaluate the tree through real projects and discuss its ability to infer the requirements maturation types.

  2. Iteratively Refined Guide Trees Help Improving Alignment and Phylogenetic Inference in the Mushroom Family Bolbitiaceae

    PubMed Central

    Tóth, Annamária; Hausknecht, Anton; Krisai-Greilhuber, Irmgard; Papp, Tamás; Vágvölgyi, Csaba; Nagy, László G.

    2013-01-01

    Reconciling traditional classifications, morphology, and the phylogenetic relationships of brown-spored agaric mushrooms has proven difficult in many groups, due to extensive convergence in morphological features. Here, we address the monophyly of the Bolbitiaceae, a family with over 700 described species and examine the higher-level relationships within the family using a newly constructed multilocus dataset (ITS, nrLSU rDNA and EF1-alpha). We tested whether the fast-evolving Internal Transcribed Spacer (ITS) sequences can be accurately aligned across the family, by comparing the outcome of two iterative alignment refining approaches (an automated and a manual) and various indel-treatment strategies. We used PRANK to align sequences in both cases. Our results suggest that – although PRANK successfully evades overmatching of gapped sites, referred previously to as alignment overmatching – it infers an unrealistically high number of indel events with natively generated guide-trees. This 'alignment undermatching' could be avoided by using more rigorous (e.g. ML) guide trees. The trees inferred in this study support the monophyly of the core Bolbitiaceae, with the exclusion of Panaeolus, Agrocybe, and some of the genera formerly placed in the family. Bolbitius and Conocybe were found monophyletic, however, Pholiotina and Galerella require redefinition. The phylogeny revealed that stipe coverage type is a poor predictor of phylogenetic relationships, indicating the need for a revision of the intrageneric relationships within Conocybe. PMID:23418526

  3. CARBON ISOTOPE DISCRIMINATION AND GROWTH RESPONSE TO STAND DENSITY REDUCTIONS IN OLD PINUS PONDEROSA TREES

    EPA Science Inventory

    Carbon isotope ratios (δ13C) of tree rings are commonly used for paleoclimatic reconstruction and for inferring canopy water-use efficiency (WUE). However, the responsiveness of carbon isotope discrimination (Δ) to site disturbance and resource availability has only rarely been ...

  4. AST: an automated sequence-sampling method for improving the taxonomic diversity of gene phylogenetic trees.

    PubMed

    Zhou, Chan; Mao, Fenglou; Yin, Yanbin; Huang, Jinling; Gogarten, Johann Peter; Xu, Ying

    2014-01-01

    A challenge in phylogenetic inference of gene trees is how to properly sample a large pool of homologous sequences to derive a good representative subset of sequences. Such a need arises in various applications, e.g. when (1) accuracy-oriented phylogenetic reconstruction methods may not be able to deal with a large pool of sequences due to their high demand in computing resources; (2) applications analyzing a collection of gene trees may prefer to use trees with fewer operational taxonomic units (OTUs), for instance for the detection of horizontal gene transfer events by identifying phylogenetic conflicts; and (3) the pool of available sequences is biased towards extensively studied species. In the past, the creation of subsamples often relied on manual selection. Here we present an Automated sequence-Sampling method for improving the Taxonomic diversity of gene phylogenetic trees, AST, to obtain representative sequences that maximize the taxonomic diversity of the sampled sequences. To demonstrate the effectiveness of AST, we have tested it to solve four problems, namely, inference of the evolutionary histories of the small ribosomal subunit protein S5 of E. coli, 16 S ribosomal RNAs and glycosyl-transferase gene family 8, and a study of ancient horizontal gene transfers from bacteria to plants. Our results show that the resolution of our computational results is almost as good as that of manual inference by domain experts, hence making the tool generally useful to phylogenetic studies by non-phylogeny specialists. The program is available at http://csbl.bmb.uga.edu/~zhouchan/AST.php.
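
    The idea of sampling a subset of sequences so that many taxa are represented can be illustrated with a small sketch. This is not the AST algorithm, whose details the abstract does not give; it is a simple round-robin pick across taxa, and the function name `pick_diverse`, the record layout, and the example labels are all hypothetical.

```python
def pick_diverse(records, k):
    """Pick up to k sequence ids, visiting taxa in round-robin order.

    records: list of (sequence_id, taxon) pairs (hypothetical layout).
    One sequence is taken from every taxon before any taxon is used twice,
    so over-represented taxa cannot crowd out the rest.
    """
    groups = {}
    for seq_id, taxon in records:
        groups.setdefault(taxon, []).append(seq_id)

    queues = list(groups.values())  # insertion order of taxa is preserved
    chosen = []
    depth = 0
    while len(chosen) < k and any(depth < len(q) for q in queues):
        for q in queues:
            if depth < len(q) and len(chosen) < k:
                chosen.append(q[depth])
        depth += 1
    return chosen


# Illustrative input: one heavily sampled taxon and two rare ones.
records = [
    ("s1", "taxon_A"), ("s2", "taxon_A"), ("s3", "taxon_A"),
    ("s4", "taxon_B"), ("s5", "taxon_C"),
]
subset = pick_diverse(records, 3)
```

    With three picks, each of the three taxa contributes one sequence, even though most of the pool belongs to a single taxon.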

  5. AST: An Automated Sequence-Sampling Method for Improving the Taxonomic Diversity of Gene Phylogenetic Trees

    PubMed Central

    Zhou, Chan; Mao, Fenglou; Yin, Yanbin; Huang, Jinling; Gogarten, Johann Peter; Xu, Ying

    2014-01-01

    A challenge in phylogenetic inference of gene trees is how to properly sample a large pool of homologous sequences to derive a good representative subset of sequences. Such a need arises in various applications, e.g. when (1) accuracy-oriented phylogenetic reconstruction methods may not be able to deal with a large pool of sequences due to their high demand in computing resources; (2) applications analyzing a collection of gene trees may prefer to use trees with fewer operational taxonomic units (OTUs), for instance for the detection of horizontal gene transfer events by identifying phylogenetic conflicts; and (3) the pool of available sequences is biased towards extensively studied species. In the past, the creation of subsamples often relied on manual selection. Here we present an Automated sequence-Sampling method for improving the Taxonomic diversity of gene phylogenetic trees, AST, to obtain representative sequences that maximize the taxonomic diversity of the sampled sequences. To demonstrate the effectiveness of AST, we have tested it to solve four problems, namely, inference of the evolutionary histories of the small ribosomal subunit protein S5 of E. coli, 16 S ribosomal RNAs and glycosyl-transferase gene family 8, and a study of ancient horizontal gene transfers from bacteria to plants. Our results show that the resolution of our computational results is almost as good as that of manual inference by domain experts, hence making the tool generally useful to phylogenetic studies by non-phylogeny specialists. The program is available at http://csbl.bmb.uga.edu/~zhouchan/AST.php. PMID:24892935

  6. Evaluating Great Lakes bald eagle nesting habitat with Bayesian inference

    Treesearch

    Teryl G. Grubb; William W. Bowerman; Allen J. Bath; John P. Giesy; D. V. Chip Weseloh

    2003-01-01

    Bayesian inference facilitated structured interpretation of a nonreplicated, experience-based survey of potential nesting habitat for bald eagles (Haliaeetus leucocephalus) along the five Great Lakes shorelines. We developed a pattern recognition (PATREC) model of our aerial search image with six habitat attributes: (a) tree cover, (b) proximity and...

  7. Evolutionary inference via the Poisson Indel Process.

    PubMed

    Bouchard-Côté, Alexandre; Jordan, Michael I

    2013-01-22

    We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classic evolutionary process, the TKF91 model [Thorne JL, Kishino H, Felsenstein J (1991) J Mol Evol 33(2):114-124] is a continuous-time Markov chain model composed of insertion, deletion, and substitution events. Unfortunately, this model gives rise to an intractable computational problem: The computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa. In this work, we present a stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The Poisson Indel Process is closely related to the TKF91 model, differing only in its treatment of insertions, but it has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared with separate inference of phylogenies and alignments.

  8. Evolutionary inference via the Poisson Indel Process

    PubMed Central

    Bouchard-Côté, Alexandre; Jordan, Michael I.

    2013-01-01

    We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classic evolutionary process, the TKF91 model [Thorne JL, Kishino H, Felsenstein J (1991) J Mol Evol 33(2):114–124] is a continuous-time Markov chain model composed of insertion, deletion, and substitution events. Unfortunately, this model gives rise to an intractable computational problem: The computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa. In this work, we present a stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The Poisson Indel Process is closely related to the TKF91 model, differing only in its treatment of insertions, but it has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared with separate inference of phylogenies and alignments. PMID:23275296

  9. Old Black Hills ponderosa pines tell a story

    Treesearch

    Matthew J. Bunkers; L. Ronald Johnson; James R. Miller; Carolyn Hull Sieg

    1999-01-01

    A single ponderosa pine tree found in the central Black Hills of South Dakota revealed its age of more than 700 years by its tree rings taken from coring in 1992. The purpose of this study was to examine historic climatic patterns from the 13th century through most of the 20th century as inferred from ring widths of this and other nearby trees. The steep, rocky site...

  10. Phylogenomic analyses data of the avian phylogenomics project.

    PubMed

    Jarvis, Erich D; Mirarab, Siavash; Aberer, Andre J; Li, Bo; Houde, Peter; Li, Cai; Ho, Simon Y W; Faircloth, Brant C; Nabholz, Benoit; Howard, Jason T; Suh, Alexander; Weber, Claudia C; da Fonseca, Rute R; Alfaro-Núñez, Alonzo; Narula, Nitish; Liu, Liang; Burt, Dave; Ellegren, Hans; Edwards, Scott V; Stamatakis, Alexandros; Mindell, David P; Cracraft, Joel; Braun, Edward L; Warnow, Tandy; Jun, Wang; Gilbert, M Thomas Pius; Zhang, Guojie

    2015-01-01

    Determining the evolutionary relationships among the major lineages of extant birds has been one of the biggest challenges in systematic biology. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders. We used these genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomic analyses. Here we present the datasets associated with the phylogenomic analyses, which include sequence alignment files consisting of nucleotides, amino acids, indels, and transposable elements, as well as tree files containing gene trees and species trees. Inferring an accurate phylogeny required generating: 1) A well annotated data set across species based on genome synteny; 2) Alignments with unaligned or incorrectly overaligned sequences filtered out; and 3) Diverse data sets, including genes and their inferred trees, indels, and transposable elements. Our total evidence nucleotide tree (TENT) data set (consisting of exons, introns, and UCEs) gave what we consider our most reliable species tree when using the concatenation-based ExaML algorithm or when using statistical binning with the coalescence-based MP-EST algorithm (which we refer to as MP-EST*). Other data sets, such as the coding sequence of some exons, revealed other properties of genome evolution, namely convergence. The Avian Phylogenomics Project is the largest vertebrate phylogenomics project to date that we are aware of. The sequence, alignment, and tree data are expected to accelerate analyses in phylogenomics and other related areas.

  11. Tree-ring based history of climate and disease in western Oregon forests

    EPA Science Inventory

    Annual tree-ring width data are often used to make inferences of past climate and the spatiotemporal climate-growth relationships. However, the climatic signal may be confounded with non-climatic signals such as disease or pest disturbances at unknown times in the past. Signal e...

  12. Yleaf: Software for Human Y-Chromosomal Haplogroup Inference from Next-Generation Sequencing Data.

    PubMed

    Ralf, Arwin; Montiel González, Diego; Zhong, Kaiyin; Kayser, Manfred

    2018-05-01

    Next-generation sequencing (NGS) technologies offer immense possibilities given the large amount of genomic data they simultaneously deliver. The human Y-chromosome serves as a good example of how NGS benefits various applications in evolution, anthropology, genealogy, and forensics. Prior to NGS, the Y-chromosome phylogenetic tree consisted of a few hundred branches; based on NGS data, it now contains many thousands. The complexity of both the Y tree and NGS data poses challenges for haplogroup assignment. For effective analysis and interpretation of Y-chromosome NGS data, we present Yleaf, a publicly available, automated, user-friendly software for high-resolution Y-chromosome haplogroup inference independent of library and sequencing methods.

  13. Birth-death prior on phylogeny and speed dating

    PubMed Central

    2008-01-01

    Background: In recent years there has been a trend of leaving the strict molecular clock in order to infer dating of speciations and other evolutionary events. Explicit modeling of substitution rates and divergence times makes formulation of informative prior distributions for branch lengths possible. Models with birth-death priors on tree branching and auto-correlated or iid substitution rates among lineages have been proposed, enabling simultaneous inference of substitution rates and divergence times. This problem has, however, mainly been analysed in the Markov chain Monte Carlo (MCMC) framework, an approach requiring computation times of hours or days when applied to large phylogenies. Results: We demonstrate that a hill-climbing maximum a posteriori (MAP) adaptation of the MCMC scheme results in considerable gain in computational efficiency. We demonstrate also that a novel dynamic programming (DP) algorithm for branch length factorization, useful both in the hill-climbing and in the MCMC setting, further reduces computation time. For the problem of inferring rates and times parameters on a fixed tree, we perform simulations, comparisons between hill-climbing and MCMC on a plant rbcL gene dataset, and dating analysis on an animal mtDNA dataset, showing that our methodology enables efficient, highly accurate analysis of very large trees. Datasets requiring a computation time of several days with MCMC can with our MAP algorithm be accurately analysed in less than a minute. From the results of our example analyses, we conclude that our methodology generally avoids getting trapped early in local optima. For the cases where this nevertheless can be a problem, for instance when we in addition to the parameters also infer the tree topology, we show that the problem can be evaded by using a simulated-annealing like (SAL) method in which we favour tree swaps early in the inference while biasing our focus towards rate and time parameter changes later on. Conclusion: Our contribution leaves the field open for fast and accurate dating analysis of nucleotide sequence data. Modeling branch substitutions rates and divergence times separately allows us to include birth-death priors on the times without the assumption of a molecular clock. The methodology is easily adapted to take data from fossil records into account and it can be used together with a broad range of rate and substitution models. PMID:18318893

  14. Birth-death prior on phylogeny and speed dating.

    PubMed

    Akerborg, Orjan; Sennblad, Bengt; Lagergren, Jens

    2008-03-04

    In recent years there has been a trend of leaving the strict molecular clock in order to infer dating of speciations and other evolutionary events. Explicit modeling of substitution rates and divergence times makes formulation of informative prior distributions for branch lengths possible. Models with birth-death priors on tree branching and auto-correlated or iid substitution rates among lineages have been proposed, enabling simultaneous inference of substitution rates and divergence times. This problem has, however, mainly been analysed in the Markov chain Monte Carlo (MCMC) framework, an approach requiring computation times of hours or days when applied to large phylogenies. We demonstrate that a hill-climbing maximum a posteriori (MAP) adaptation of the MCMC scheme results in considerable gain in computational efficiency. We demonstrate also that a novel dynamic programming (DP) algorithm for branch length factorization, useful both in the hill-climbing and in the MCMC setting, further reduces computation time. For the problem of inferring rates and times parameters on a fixed tree, we perform simulations, comparisons between hill-climbing and MCMC on a plant rbcL gene dataset, and dating analysis on an animal mtDNA dataset, showing that our methodology enables efficient, highly accurate analysis of very large trees. Datasets requiring a computation time of several days with MCMC can with our MAP algorithm be accurately analysed in less than a minute. From the results of our example analyses, we conclude that our methodology generally avoids getting trapped early in local optima. For the cases where this nevertheless can be a problem, for instance when we in addition to the parameters also infer the tree topology, we show that the problem can be evaded by using a simulated-annealing like (SAL) method in which we favour tree swaps early in the inference while biasing our focus towards rate and time parameter changes later on. 
Our contribution leaves the field open for fast and accurate dating analysis of nucleotide sequence data. Modeling branch substitutions rates and divergence times separately allows us to include birth-death priors on the times without the assumption of a molecular clock. The methodology is easily adapted to take data from fossil records into account and it can be used together with a broad range of rate and substitution models.
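
    The contrast between sampling a posterior and climbing it can be made concrete on a toy problem. The sketch below is not the authors' phylogenetic method: it assumes a one-parameter model (Normal(mu, 1) likelihood with a Normal(0, 1) prior on mu), chosen only because its MAP estimate has the closed form sum(data) / (len(data) + 1), so the hill-climbing result can be checked. All names and data values are hypothetical.

```python
def log_posterior(mu, data):
    """Unnormalised log posterior for the assumed toy model."""
    log_prior = -0.5 * mu ** 2                                # Normal(0, 1) prior
    log_likelihood = -0.5 * sum((x - mu) ** 2 for x in data)  # Normal(mu, 1)
    return log_prior + log_likelihood


def hill_climb_map(data, start=0.0, step=1.0, tol=1e-8, max_moves=100_000):
    """Greedy hill climbing: accept a neighbour only if it raises the posterior.

    When neither neighbour improves the score, the step size is halved;
    the search stops once the step falls below tol.
    """
    mu = start
    best = log_posterior(mu, data)
    for _ in range(max_moves):
        if step < tol:
            break
        for candidate in (mu + step, mu - step):
            score = log_posterior(candidate, data)
            if score > best:
                mu, best = candidate, score
                break
        else:
            step /= 2.0  # no uphill neighbour at this scale, so refine
    return mu


data = [1.2, 0.8, 1.5, 1.1]           # illustrative observations
estimate = hill_climb_map(data)
closed_form = sum(data) / (len(data) + 1)
```

    Unlike MCMC, this search returns a single point estimate and no measure of uncertainty, and on a multimodal posterior it can stop at a local optimum, which is the risk the abstract addresses with its simulated-annealing-like variant.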

  15. Visual exploration of parameter influence on phylogenetic trees.

    PubMed

    Hess, Martin; Bremm, Sebastian; Weissgraeber, Stephanie; Hamacher, Kay; Goesele, Michael; Wiemeyer, Josef; von Landesberger, Tatiana

    2014-01-01

    Evolutionary relationships between organisms are frequently derived as phylogenetic trees inferred from multiple sequence alignments (MSAs). The MSA parameter space is exponentially large, so tens of thousands of potential trees can emerge for each dataset. A proposed visual-analytics approach can reveal the parameters' impact on the trees. Given input trees created with different parameter settings, it hierarchically clusters the trees according to their structural similarity. The most important clusters of similar trees are shown together with their parameters. This view offers interactive parameter exploration and automatic identification of relevant parameters. Biologists applied this approach to real data of 16S ribosomal RNA and protein sequences of ion channels. It revealed which parameters affected the tree structures. This led to a more reliable selection of the best trees.
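
    Clustering trees by structural similarity requires a pairwise distance between trees. The abstract does not say which measure the authors use; as an assumption, the sketch below uses a Robinson-Foulds-style distance on rooted trees, counting the clades present in one tree but not the other. Trees are written as nested tuples of leaf names, a representation chosen here purely for illustration.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree as frozensets of leaves.

    tree: nested tuples of leaf-name strings, e.g. (("A", "B"), ("C", "D")).
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    root = leaves(tree)
    found.discard(root)  # the full leaf set is shared by every tree
    return found


def rf_distance(tree_a, tree_b):
    """Number of clades found in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Three illustrative trees on the same four leaves.
balanced = (("A", "B"), ("C", "D"))
swapped = (("A", "C"), ("B", "D"))
ladder = ((("A", "B"), "C"), "D")

trees = [balanced, swapped, ladder]
matrix = [[rf_distance(s, t) for t in trees] for s in trees]
```

    A distance matrix of this kind is the usual input to a hierarchical clustering routine, which would then group trees that share most of their clades.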

  16. An Efficient Independence Sampler for Updating Branches in Bayesian Markov chain Monte Carlo Sampling of Phylogenetic Trees.

    PubMed

    Aberer, Andre J; Stamatakis, Alexandros; Ronquist, Fredrik

    2016-01-01

    Sampling tree space is the most challenging aspect of Bayesian phylogenetic inference. The sheer number of alternative topologies is problematic by itself. In addition, the complex dependency between branch lengths and topology increases the difficulty of moving efficiently among topologies. Current tree proposals are fast but sample new trees using primitive transformations or re-mappings of old branch lengths. This reduces acceptance rates and presumably slows down convergence and mixing. Here, we explore branch proposals that do not rely on old branch lengths but instead are based on approximations of the conditional posterior. Using a diverse set of empirical data sets, we show that most conditional branch posteriors can be accurately approximated via a [Formula: see text] distribution. We empirically determine the relationship between the logarithmic conditional posterior density, its derivatives, and the characteristics of the branch posterior. We use these relationships to derive an independence sampler for proposing branches with an acceptance ratio of ~90% on most data sets. This proposal samples branches between 2× and 3× more efficiently than traditional proposals with respect to the effective sample size per unit of runtime. We also compare the performance of standard topology proposals with hybrid proposals that use the new independence sampler to update those branches that are most affected by the topological change. Our results show that hybrid proposals can sometimes noticeably decrease the number of generations necessary for topological convergence. Inconsistent performance gains indicate that branch updates are not the limiting factor in improving topological convergence for the currently employed set of proposals. However, our independence sampler might be essential for the construction of novel tree proposals that apply more radical topology changes. © The Author(s) 2015. 
Published by Oxford University Press, on behalf of the Society of Systematic Biologists.

  17. Inferring the palaeoenvironment of ancient bacteria on the basis of resurrected proteins

    NASA Technical Reports Server (NTRS)

    Gaucher, Eric A.; Thomson, J. Michael; Burgan, Michelle F.; Benner, Steven A.

    2003-01-01

    Features of the physical environment surrounding an ancestral organism can be inferred by reconstructing sequences of ancient proteins made by those organisms, resurrecting these proteins in the laboratory, and measuring their properties. Here, we resurrect candidate sequences for elongation factors of the Tu family (EF-Tu) found at ancient nodes in the bacterial evolutionary tree, and measure their activities as a function of temperature. The ancient EF-Tu proteins have temperature optima of 55-65 degrees C. This value seems to be robust with respect to uncertainties in the ancestral reconstruction. This suggests that the ancient bacteria that hosted these particular genes were thermophiles, and neither hyperthermophiles nor mesophiles. This conclusion can be compared and contrasted with inferences drawn from an analysis of the lengths of branches in trees joining proteins from contemporary bacteria, the distribution of thermophily in derived bacterial lineages, the inferred G + C content of ancient ribosomal RNA, and the geological record combined with assumptions concerning molecular clocks. The study illustrates the use of experimental palaeobiochemistry and assumptions about deep phylogenetic relationships between bacteria to explore the character of ancient life.

  18. Inference of population splits and mixtures from genome-wide allele frequency data.

    PubMed

    Pickrell, Joseph K; Pritchard, Jonathan K

    2012-01-01

    Many aspects of the historical relationships between populations in a species are reflected in genetic data. Inferring these relationships from genetic data, however, remains a challenging task. In this paper, we present a statistical model for inferring the patterns of population splits and mixtures in multiple populations. In our model, the sampled populations in a species are related to their common ancestor through a graph of ancestral populations. Using genome-wide allele frequency data and a Gaussian approximation to genetic drift, we infer the structure of this graph. We applied this method to a set of 55 human populations and a set of 82 dog breeds and wild canids. In both species, we show that a simple bifurcating tree does not fully describe the data; in contrast, we infer many migration events. While some of the migration events that we find have been detected previously, many have not. For example, in the human data, we infer that Cambodians trace approximately 16% of their ancestry to a population ancestral to other extant East Asian populations. In the dog data, we infer that both the boxer and basenji trace a considerable fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to domestication and that East Asian toy breeds (the Shih Tzu and the Pekingese) result from admixture between modern toy breeds and "ancient" Asian breeds. Software implementing the model described here, called TreeMix, is available at http://treemix.googlecode.com.

  19. How to infer relative fitness from a sample of genomic sequences.

    PubMed

    Dayarian, Adel; Shraiman, Boris I

    2014-07-01

    Mounting evidence suggests that natural populations can harbor extensive fitness diversity with numerous genomic loci under selection. It is also known that genealogical trees for populations under selection are quantifiably different from those expected under neutral evolution and described statistically by Kingman's coalescent. While differences in the statistical structure of genealogies have long been used as a test for the presence of selection, the full extent of the information that they contain has not been exploited. Here we demonstrate that the shape of the reconstructed genealogical tree for a moderately large number of random genomic samples taken from a fitness diverse, but otherwise unstructured, asexual population can be used to predict the relative fitness of individuals within the sample. To achieve this we define a heuristic algorithm, which we test in silico, using simulations of a Wright-Fisher model for a realistic range of mutation rates and selection strength. Our inferred fitness ranking is based on a linear discriminator that identifies rapidly coalescing lineages in the reconstructed tree. Inferred fitness ranking correlates strongly with actual fitness, with a genome in the top 10% ranked being in the top 20% fittest with false discovery rate of 0.1-0.3, depending on the mutation/selection parameters. The ranking also enables us to predict the genotypes that future populations inherit from the present one. While the inference accuracy increases monotonically with sample size, samples of 200 nearly saturate the performance. We propose that our approach can be used for inferring relative fitness of genomes obtained in single-cell sequencing of tumors and in monitoring viral outbreaks. Copyright © 2014 by the Genetics Society of America.

  20. How fresh is maple syrup? Sugar maple trees mobilize carbon stored several years previously during early springtime sap-ascent.

    PubMed

    Muhr, Jan; Messier, Christian; Delagrange, Sylvain; Trumbore, Susan; Xu, Xiaomei; Hartmann, Henrik

    2016-03-01

    While trees store substantial amounts of nonstructural carbon (NSC) for later use, storage regulation and mobilization of stored NSC in long-lived organisms like trees are still not well understood. At two different sites with sugar maple (Acer saccharum), we investigated ascending sap (sugar concentration, δ(13) C, Δ(14) C) as the mobilized component of stored stem NSC during early springtime. Using the bomb-spike radiocarbon approach we were able to estimate the average time elapsed since the mobilized carbon (C) was originally fixed from the atmosphere and to infer the turnover time of stem storage. Sites differed in concentration dynamics and overall δ(13) C, indicating different growing conditions. The absence of temporal trends for δ(13) C and Δ(14) C indicated sugar mobilization from a well-mixed pool with average Δ(14) C consistent with a mean turnover time (TT) of three to five years for this pool, with only minor differences between the sites. Sugar maple trees hence appear well buffered against single or even several years of negative plant C balance from environmental stress such as drought or repeated defoliation by insects. Manipulative investigations (e.g. starvation via girdling) combined with Δ(14) C measurements of this mobilized storage pool will provide further new insights into tree storage regulation and functioning. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.

  1. CGBayesNets: Conditional Gaussian Bayesian Network Learning and Inference with Mixed Discrete and Continuous Data

    PubMed Central

    Weiss, Scott T.

    2014-01-01

    Bayesian Networks (BN) have been a popular predictive modeling formalism in bioinformatics, but their application in modern genomics has been slowed by an inability to cleanly handle domains with mixed discrete and continuous variables. Existing free BN software packages either discretize continuous variables, which can lead to information loss, or do not include inference routines, which makes prediction with the BN impossible. We present CGBayesNets, a BN package focused around prediction of a clinical phenotype from mixed discrete and continuous variables, which fills these gaps. CGBayesNets implements Bayesian likelihood and inference algorithms for the conditional Gaussian Bayesian network (CGBNs) formalism, one appropriate for predicting an outcome of interest from, e.g., multimodal genomic data. We provide four different network learning algorithms, each making a different tradeoff between computational cost and network likelihood. CGBayesNets provides a full suite of functions for model exploration and verification, including cross validation, bootstrapping, and AUC manipulation. We highlight several results obtained previously with CGBayesNets, including predictive models of wood properties from tree genomics, leukemia subtype classification from mixed genomic data, and robust prediction of intensive care unit mortality outcomes from metabolomic profiles. We also provide detailed example analysis on public metabolomic and gene expression datasets. CGBayesNets is implemented in MATLAB and available as MATLAB source code, under an Open Source license and anonymous download at http://www.cgbayesnets.com. PMID:24922310

  2. CGBayesNets: conditional Gaussian Bayesian network learning and inference with mixed discrete and continuous data.

    PubMed

    McGeachie, Michael J; Chang, Hsun-Hsien; Weiss, Scott T

    2014-06-01

    Bayesian Networks (BN) have been a popular predictive modeling formalism in bioinformatics, but their application in modern genomics has been slowed by an inability to cleanly handle domains with mixed discrete and continuous variables. Existing free BN software packages either discretize continuous variables, which can lead to information loss, or do not include inference routines, which makes prediction with the BN impossible. We present CGBayesNets, a BN package focused around prediction of a clinical phenotype from mixed discrete and continuous variables, which fills these gaps. CGBayesNets implements Bayesian likelihood and inference algorithms for the conditional Gaussian Bayesian network (CGBNs) formalism, one appropriate for predicting an outcome of interest from, e.g., multimodal genomic data. We provide four different network learning algorithms, each making a different tradeoff between computational cost and network likelihood. CGBayesNets provides a full suite of functions for model exploration and verification, including cross validation, bootstrapping, and AUC manipulation. We highlight several results obtained previously with CGBayesNets, including predictive models of wood properties from tree genomics, leukemia subtype classification from mixed genomic data, and robust prediction of intensive care unit mortality outcomes from metabolomic profiles. We also provide detailed example analysis on public metabolomic and gene expression datasets. CGBayesNets is implemented in MATLAB and available as MATLAB source code, under an Open Source license and anonymous download at http://www.cgbayesnets.com.

  3. Mountain pine beetle population sampling: inferences from Lindgren pheromone traps and tree emergence cages

    Treesearch

    Barbara J. Bentz

    2006-01-01

    Lindgren pheromone traps baited with a mountain pine beetle (Dendroctonus ponderosae Hopkins (Coleoptera: Curculionidae, Scolytinae)) lure were deployed for three consecutive years in lodgepole pine stands in central Idaho. Mountain pine beetle emergence was also monitored each year using cages on infested trees. Distributions of beetles caught in...

  4. Phylogenetic prediction of Alternaria leaf blight resistance in wild and cultivated species of carrots (Daucus, Apiaceae)

    USDA-ARS?s Scientific Manuscript database

    Plant scientists make inferences and predictions from phylogenetic trees to solve scientific problems. Crop losses due to disease damage is an important problem that many plant breeders would like to solve, so the ability to predict traits like disease resistance from phylogenetic trees derived from...

  5. The impact of within-herd genetic variation upon inferred transmission trees for foot-and-mouth disease virus.

    PubMed

    Valdazo-González, Begoña; Kim, Jan T; Soubeyrand, Samuel; Wadsworth, Jemma; Knowles, Nick J; Haydon, Daniel T; King, Donald P

    2015-06-01

    Full-genome sequences have been used to monitor the fine-scale dynamics of epidemics caused by RNA viruses. However, the ability of this approach to confidently reconstruct transmission trees is limited by the knowledge of the genetic diversity of viruses that exist within different epidemiological units. In order to address this question, this study investigated the variability of 45 foot-and-mouth disease virus (FMDV) genome sequences (from 33 animals) that were collected during 2007 from eight premises (10 different herds) in the United Kingdom. Bayesian and statistical parsimony analysis demonstrated that these sequences exhibited clustering which was consistent with a transmission scenario describing herd-to-herd spread of the virus. As an alternative to analysing all of the available samples in future epidemics, the impact of randomly selecting one sequence from each of these herds was used to assess cost-effective methods that might be used to infer transmission trees during FMD outbreaks. Using these approaches, 85% and 91% of the resulting topologies were either identical or differed by only one edge from a reference tree comprising all of the sequences generated within the outbreak. The sequence distances that accrued during sequential transmission events between epidemiological units was estimated to be 4.6 nucleotides, although the genetic variability between viruses recovered from chronic carrier animals was higher than between viruses from animals with acute-stage infection: an observation which poses challenges for the use of simple approaches to infer transmission trees. This study helps to develop strategies for sampling during FMD outbreaks, and provides data that will guide the development of further models to support control policies in the event of virus incursions into FMD free countries. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.

  6. Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks

    PubMed Central

    2017-01-01

    Whole-genome sequencing of pathogens from host samples becomes more and more routine during infectious disease outbreaks. These data provide information on possible transmission events which can be used for further epidemiologic analyses, such as identification of risk factors for infectivity and transmission. However, the relationship between transmission events and sequence data is obscured by uncertainty arising from four largely unobserved processes: transmission, case observation, within-host pathogen dynamics and mutation. To properly resolve transmission events, these processes need to be taken into account. Recent years have seen much progress in theory and method development, but existing applications make simplifying assumptions that often break up the dependency between the four processes, or are tailored to specific datasets with matching model assumptions and code. To obtain a method with wider applicability, we have developed a novel approach to reconstruct transmission trees with sequence data. Our approach combines elementary models for transmission, case observation, within-host pathogen dynamics, and mutation, under the assumption that the outbreak is over and all cases have been observed. We use Bayesian inference with MCMC for which we have designed novel proposal steps to efficiently traverse the posterior distribution, taking account of all unobserved processes at once. This allows for efficient sampling of transmission trees from the posterior distribution, and robust estimation of consensus transmission trees. We implemented the proposed method in a new R package phybreak. The method performs well in tests of both new and published simulated data. We apply the model to five datasets on densely sampled infectious disease outbreaks, covering a wide range of epidemiological settings. 
Using only sampling times and sequences as data, our analyses confirmed the original results or improved on them: the more realistic infection times place more confidence in the inferred transmission trees. PMID:28545083

  7. Recapitulating phylogenies using k-mers: from trees to networks.

    PubMed

    Bernard, Guillaume; Ragan, Mark A; Chan, Cheong Xin

    2016-01-01

    Ernst Haeckel based his landmark Tree of Life on the supposed ontogenic recapitulation of phylogeny, i.e. that successive embryonic stages during the development of an organism re-trace the morphological forms of its ancestors over the course of evolution. Much of this idea has since been discredited. Today, phylogenies are often based on families of molecular sequences. The standard approach starts with a multiple sequence alignment, in which the sequences are arranged relative to each other in a way that maximises a measure of similarity position-by-position along their entire length. A tree (or sometimes a network) is then inferred. Rigorous multiple sequence alignment is computationally demanding, and evolutionary processes that shape the genomes of many microbes (bacteria, archaea and some morphologically simple eukaryotes) can add further complications. In particular, recombination, genome rearrangement and lateral genetic transfer undermine the assumptions that underlie multiple sequence alignment, and imply that a tree-like structure may be too simplistic. Here, using genome sequences of 143 bacterial and archaeal genomes, we construct a network of phylogenetic relatedness based on the number of shared k-mers (subsequences at fixed length k). Our findings suggest that the network captures not only key aspects of microbial genome evolution as inferred from a tree, but also features that are not treelike. The method is highly scalable, allowing for investigation of genome evolution across a large number of genomes. Instead of using specific regions or sequences from genome sequences, or indeed Haeckel's idea of ontogeny, we argue that genome phylogenies can be inferred using k-mers from whole-genome sequences. Representing these networks dynamically allows biological questions of interest to be formulated and addressed quickly and in a visually intuitive manner.
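
    As a rough illustration of the idea of relating sequences by shared k-mers, the sketch below counts the distinct k-mers two sequences have in common. It is a minimal example, not the authors' method: the function names `kmers` and `shared_kmer_count`, the use of sets (distinct k-mers rather than occurrence counts), and the toy sequences are all assumptions made for illustration.

```python
def kmers(seq, k):
    """Return the set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_count(seq_a, seq_b, k):
    """Count distinct k-mers present in both sequences (hypothetical helper)."""
    return len(kmers(seq_a, k) & kmers(seq_b, k))


# Toy sequences, chosen only for illustration.
a = "ACGTAC"
b = "TTGTAC"
print(shared_kmer_count(a, b, 3))  # GTA and TAC are shared
```

    Pairwise counts of this kind could serve as edge weights in a relatedness network; how the article normalises or thresholds them is not stated in the abstract.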

  8. Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks.

    PubMed

    Klinkenberg, Don; Backer, Jantien A; Didelot, Xavier; Colijn, Caroline; Wallinga, Jacco

    2017-05-01

    Whole-genome sequencing of pathogens from host samples becomes more and more routine during infectious disease outbreaks. These data provide information on possible transmission events which can be used for further epidemiologic analyses, such as identification of risk factors for infectivity and transmission. However, the relationship between transmission events and sequence data is obscured by uncertainty arising from four largely unobserved processes: transmission, case observation, within-host pathogen dynamics and mutation. To properly resolve transmission events, these processes need to be taken into account. Recent years have seen much progress in theory and method development, but existing applications make simplifying assumptions that often break up the dependency between the four processes, or are tailored to specific datasets with matching model assumptions and code. To obtain a method with wider applicability, we have developed a novel approach to reconstruct transmission trees with sequence data. Our approach combines elementary models for transmission, case observation, within-host pathogen dynamics, and mutation, under the assumption that the outbreak is over and all cases have been observed. We use Bayesian inference with MCMC for which we have designed novel proposal steps to efficiently traverse the posterior distribution, taking account of all unobserved processes at once. This allows for efficient sampling of transmission trees from the posterior distribution, and robust estimation of consensus transmission trees. We implemented the proposed method in a new R package phybreak. The method performs well in tests of both new and published simulated data. We apply the model to five datasets on densely sampled infectious disease outbreaks, covering a wide range of epidemiological settings. 
Using only sampling times and sequences as data, our analyses confirmed the original results or improved on them: the more realistic infection times place more confidence in the inferred transmission trees.

  9. Linking Tree Growth Response to Measured Microclimate - A Field Based Approach

    NASA Astrophysics Data System (ADS)

    Martin, J. T.; Hoylman, Z. H.; Looker, N. T.; Jencso, K. G.; Hu, J.

    2015-12-01

    The general relationship between climate and tree growth is a well established and important tenet shaping both paleo and future perspectives of forest ecosystem growth dynamics. Across much of the American west, water limits growth via physiological mechanisms that tie regional and local climatic conditions to forest productivity in a relatively predictable way, and these growth responses are clearly evident in tree ring records. However, within the annual cycle of a forest landscape, water availability varies across both time and space, and interacts with other potentially growth limiting factors such as temperature, light, and nutrients. In addition, tree growth responses may lag climate drivers and may vary in terms of where in a tree carbon is allocated. As such, determining when and where water actually limits forest growth in real time can be a significant challenge. Despite these challenges, we present data suggestive of real-time growth limitation driven by soil moisture supply and atmospheric water demand reflected in high frequency field measurements of stem radii and cell structure across ecological gradients. The experiment was conducted at the Lubrecht Experimental Forest in western Montana where, over two years, we observed intra-annual growth rates of four dominant conifer species: Douglas fir, Ponderosa Pine, Engelmann Spruce and Western Larch using point dendrometers and microcores. In all four species studied, compensatory use of stored water (inferred from stem water deficit) appears to exhibit a threshold relationship with a critical balance point between water supply and demand. The occurrence of this point in time coincided with a decrease in stem growth rates, and while the timing varied up to one month across topographic and elevational gradients, the onset date of growth limitation was a reliable predictor of overall annual growth. 
Our findings support previous model-based observations of nonlinearity in the relationship between climate and annual ring formation, and suggest a rather immediate growth response to critical micro-meteorological conditions occurring at different times across the landscape by linking the timing and magnitude of tree growth responses to in situ measurements of environmental conditions.

  10. Slowdowns in diversification rates from real phylogenies may not be real.

    PubMed

    Cusimano, Natalie; Renner, Susanne S

    2010-07-01

    Studies of diversification patterns often find a slowing in lineage accumulation toward the present. This seemingly pervasive pattern of rate downturns has been taken as evidence for adaptive radiations, density-dependent regulation, and metacommunity species interactions. The significance of rate downturns is evaluated with statistical tests (the gamma statistic and Monte Carlo constant rates (MCCR) test; birth-death likelihood models and Akaike Information Criterion [AIC] scores) that rely on null distributions, which assume that the included species are a random sample of the entire clade. Sampling in real phylogenies, however, often is nonrandom because systematists try to include early-diverging species or representatives of previous intrataxon classifications. We studied the effects of biased sampling, structured sampling, and random sampling by experimentally pruning simulated trees (60 and 150 species) as well as a completely sampled empirical tree (58 species) and then applying the gamma statistic/MCCR test and birth-death likelihood models/AIC scores to assess rate changes. For trees with random species sampling, the true model (i.e., the one fitting the complete phylogenies) could be inferred in most cases. Oversampling deep nodes, however, strongly biases inferences toward downturns, with simulations of structured and biased sampling suggesting that this occurs when sampling percentages drop below 80%. The magnitude of the effect and the sensitivity of diversification rate models is such that a useful rule of thumb may be not to infer rate downturns from real trees unless they have >80% species sampling.

  11. The influence of molecular markers and methods on inferring the phylogenetic relationships between the representatives of the Arini (parrots, Psittaciformes), determined on the basis of their complete mitochondrial genomes.

    PubMed

    Urantowka, Adam Dawid; Kroczak, Aleksandra; Mackiewicz, Paweł

    2017-07-14

    Conures are a morphologically diverse group of Neotropical parrots classified as members of the tribe Arini, which has recently been subjected to a taxonomic revision. The previously broadly defined Aratinga genus of this tribe has been split into the 'true' Aratinga and three additional genera, Eupsittula, Psittacara and Thectocercus. Popular markers used in the reconstruction of the parrots' phylogenies derive from mitochondrial DNA. However, current phylogenetic analyses seem to indicate conflicting relationships between Aratinga and other conures, and also among other Arini members. Therefore, it is not clear if the mtDNA phylogenies can reliably define the species tree. The inconsistencies may result from the variable evolution rate of the markers used or their weak phylogenetic signal. To resolve these controversies and to assess to what extent the phylogenetic relationships in the tribe Arini can be inferred from mitochondrial genomes, we compared representative Arini mitogenomes as well as examined the usefulness of the individual mitochondrial markers and the efficiency of various phylogenetic methods. Single molecular markers produced inconsistent tree topologies, while different methods offered various topologies even for the same marker. A significant disagreement in these tree topologies occurred for cytb, nd2 and nd6 genes, which are commonly used in parrot phylogenies. The strongest phylogenetic signal was found in the control region and RNA genes. However, these markers cannot be used alone in inferring Arini phylogenies because they do not provide fully resolved trees. The most reliable phylogeny of the parrots under study is obtained only on the concatenated set of all mitochondrial markers. The analyses established significantly resolved relationships within the former Aratinga representatives and the main genera of the tribe Arini. 
Such mtDNA phylogeny can be in agreement with the species tree, owing to its match with synapomorphic features in plumage colouration. Phylogenetic relationships inferred from single mitochondrial markers can be incorrect and contradictory. Therefore, such phylogenies should be considered with caution. Reliable results can be produced by concatenated sets of all or at least the majority of mitochondrial genes and the control region. The results advance a new view on the relationships among the main genera of Arini and resolve the inconsistencies between the taxa that were previously classified as the broadly defined genus Aratinga. Although gene and species trees do not always have to be consistent, the mtDNA phylogenies for Arini can reflect the species tree.

  12. An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea.

    PubMed

    McDonald, Daniel; Price, Morgan N; Goodrich, Julia; Nawrocki, Eric P; DeSantis, Todd Z; Probst, Alexander; Andersen, Gary L; Knight, Rob; Hugenholtz, Philip

    2012-03-01

    Reference phylogenies are crucial for providing a taxonomic framework for interpretation of marker gene and metagenomic surveys, which continue to reveal novel species at a remarkable rate. Greengenes is a dedicated full-length 16S rRNA gene database that provides users with a curated taxonomy based on de novo tree inference. We developed a 'taxonomy to tree' approach for transferring group names from an existing taxonomy to a tree topology, and used it to apply the Greengenes, National Center for Biotechnology Information (NCBI) and cyanoDB (Cyanobacteria only) taxonomies to a de novo tree comprising 408,315 sequences. We also incorporated explicit rank information provided by the NCBI taxonomy to group names (by prefixing rank designations) for better user orientation and classification consistency. The resulting merged taxonomy improved the classification of 75% of the sequences by one or more ranks relative to the original NCBI taxonomy with the most pronounced improvements occurring in under-classified environmental sequences. We also assessed candidate phyla (divisions) currently defined by NCBI and present recommendations for consolidation of 34 redundantly named groups. All intermediate results from the pipeline, which includes tree inference, jackknifing and transfer of a donor taxonomy to a recipient tree (tax2tree) are available for download. The improved Greengenes taxonomy should provide important infrastructure for a wide range of megasequencing projects studying ecosystems on scales ranging from our own bodies (the Human Microbiome Project) to the entire planet (the Earth Microbiome Project). The implementation of the software can be obtained from http://sourceforge.net/projects/tax2tree/.

  13. An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea

    PubMed Central

    McDonald, Daniel; Price, Morgan N; Goodrich, Julia; Nawrocki, Eric P; DeSantis, Todd Z; Probst, Alexander; Andersen, Gary L; Knight, Rob; Hugenholtz, Philip

    2012-01-01

    Reference phylogenies are crucial for providing a taxonomic framework for interpretation of marker gene and metagenomic surveys, which continue to reveal novel species at a remarkable rate. Greengenes is a dedicated full-length 16S rRNA gene database that provides users with a curated taxonomy based on de novo tree inference. We developed a 'taxonomy to tree' approach for transferring group names from an existing taxonomy to a tree topology, and used it to apply the Greengenes, National Center for Biotechnology Information (NCBI) and cyanoDB (Cyanobacteria only) taxonomies to a de novo tree comprising 408,315 sequences. We also incorporated explicit rank information provided by the NCBI taxonomy to group names (by prefixing rank designations) for better user orientation and classification consistency. The resulting merged taxonomy improved the classification of 75% of the sequences by one or more ranks relative to the original NCBI taxonomy with the most pronounced improvements occurring in under-classified environmental sequences. We also assessed candidate phyla (divisions) currently defined by NCBI and present recommendations for consolidation of 34 redundantly named groups. All intermediate results from the pipeline, which includes tree inference, jackknifing and transfer of a donor taxonomy to a recipient tree (tax2tree) are available for download. The improved Greengenes taxonomy should provide important infrastructure for a wide range of megasequencing projects studying ecosystems on scales ranging from our own bodies (the Human Microbiome Project) to the entire planet (the Earth Microbiome Project). The implementation of the software can be obtained from http://sourceforge.net/projects/tax2tree/. PMID:22134646

  14. One tree to link them all: a phylogenetic dataset for the European tetrapoda.

    PubMed

    Roquet, Cristina; Lavergne, Sébastien; Thuiller, Wilfried

    2014-08-08

    With the ever-increasing availability of phylogenetically informative data, the last decade has seen an upsurge of ecological studies incorporating information on evolutionary relationships among species. However, detailed species-level phylogenies are still lacking for many large groups and regions, which are necessary for comprehensive large-scale eco-phylogenetic analyses. Here, we provide a dataset of 100 dated phylogenetic trees for all European tetrapods based on a mixture of supermatrix and supertree approaches. Phylogenetic inference was performed separately for each of the main Tetrapoda groups of Europe except mammals (i.e. amphibians, birds, squamates and turtles) by means of maximum likelihood (ML) analyses of a supermatrix, applying a tree constraint at the family (amphibians and squamates) or order (birds and turtles) levels based on consensus knowledge. For each group, we inferred 100 ML trees to be able to provide a phylogenetic dataset that accounts for phylogenetic uncertainty, and assessed node support with bootstrap analyses. Each tree was dated using penalized-likelihood and fossil calibration. The trees obtained were well-supported by existing knowledge and previous phylogenetic studies. For mammals, we modified the most complete supertree dataset available in the literature to include a recent update of the Carnivora clade. As a final step, we merged the phylogenetic trees of all groups to obtain a set of 100 phylogenetic trees for all European Tetrapoda species for which data were available (91%). We provide this phylogenetic dataset (100 chronograms) for the purpose of comparative analyses, macro-ecological or community ecology studies aiming to incorporate phylogenetic information while accounting for phylogenetic uncertainty.

  15. Deduction of probable events of lateral gene transfer through comparison of phylogenetic trees by recursive consolidation and rearrangement

    PubMed Central

    MacLeod, Dave; Charlebois, Robert L; Doolittle, Ford; Bapteste, Eric

    2005-01-01

    Background: When organismal phylogenies based on sequences of single marker genes are poorly resolved, a logical approach is to add more markers, on the assumption that weak but congruent phylogenetic signal will be reinforced in such multigene trees. Such approaches are valid only when the several markers indeed have identical phylogenies, an issue which many multigene methods (such as the use of concatenated gene sequences or the assembly of supertrees) do not directly address. Indeed, even when the true history is a mixture of vertical descent for some genes and lateral gene transfer (LGT) for others, such methods produce unique topologies. Results: We have developed software that aims to extract evidence for vertical and lateral inheritance from a set of gene trees compared against an arbitrary reference tree. This evidence is then displayed as a synthesis showing support over the tree for vertical inheritance, overlaid with explicit lateral gene transfer (LGT) events inferred to have occurred over the history of the tree. Like splits-tree methods, one can thus identify nodes at which conflict occurs. Additionally one can make reasonable inferences about vertical and lateral signal, assigning putative donors and recipients. Conclusion: A tool such as ours can serve to explore the reticulated dimensionality of molecular evolution, by dissecting vertical and lateral inheritance at high resolution. By this, we mean that individual nodes can be examined not only for congruence, but also for coherence in light of LGT. We assert that our tools will facilitate the comparison of phylogenetic trees, and the interpretation of conflicting data. PMID:15819979

  16. Increased stem density and competition may diminish the positive effects of warming at alpine treeline.

    PubMed

    Wang, Yafeng; Pederson, Neil; Ellison, Aaron M; Buckley, Hannah L; Case, Bradley S; Liang, Eryuan; Julio Camarero, J

    2016-07-01

    The most widespread response to global warming among alpine treeline ecotones is not an upward shift, but an increase in tree density. However, the impact of increasing density on interactions among trees at treeline is not well understood. Here, we test if treeline densification induced by climatic warming leads to increasing intraspecific competition. We mapped and measured the size and age of Smith fir trees growing in two treelines located in the southeastern Tibetan Plateau. We used spatial point-pattern and codispersion analyses to describe the spatial association and covariation among seedlings, juveniles, and adults grouped in 30-yr age classes from the 1860s to the present. Effects of competition on tree height and regeneration were inferred from bivariate mark-correlations. Since the 1950s, a rapid densification occurred at both sites in response to climatic warming. Competition between adults and juveniles or seedlings at small scales intensified as density increased. Encroachment negatively affected height growth and further reduced recruitment around mature trees. We infer that tree recruitment at the studied treelines was more cold-limited prior to 1950 and shifted to a less temperature-constrained regime in response to climatic warming. Therefore, the ongoing densification and encroachment of alpine treelines could alter the way climate drives their transitions toward subalpine forests. © 2016 by the Ecological Society of America.

  17. How does climate influence xylem morphogenesis over the growing season? Insights from long-term intra-ring anatomy in Picea abies

    PubMed Central

    Fonti, Patrick; von Arx, Georg; Carrer, Marco

    2017-01-01

    Background and Aims: During the growing season, the cambium of conifer trees produces successive rows of xylem cells, the tracheids, that sequentially pass through the phases of enlargement and secondary wall thickening before dying and becoming functional. Climate variability can strongly influence the kinetics of morphogenetic processes, eventually affecting tracheid shape and size. This study investigates xylem anatomical structure in the stem of Picea abies to retrospectively infer how, in the long term, climate affects the processes of cell enlargement and wall thickening. Methods: Tracheid anatomical traits related to the phases of enlargement (diameter) and wall thickening (wall thickness) were innovatively inspected at the intra-ring level on 87-year-long tree-ring series in Picea abies trees along a 900 m elevation gradient in the Italian Alps. Anatomical traits in ten successive tree-ring sectors were related to daily temperature and precipitation data using running correlations. Key Results: Close to the altitudinal tree limit, low early-summer temperature negatively affected cell enlargement. At lower elevation, water availability in early summer was positively related to cell diameter. The timing of these relationships shifted forward by about 20 (high elevation) to 40 (low elevation) d from the first to the last tracheids in the ring. Cell wall thickening was affected by climate in a different period in the season. In particular, wall thickness of late-formed tracheids was strongly positively related to August–September temperature at high elevation. Conclusions: Morphogenesis of tracheids sequentially formed in the growing season is influenced by climate conditions in successive periods. The distinct climate impacts on cell enlargement and wall thickening indicate that different morphogenetic mechanisms are responsible for different tracheid traits. 
Our approach of long-term and high-resolution analysis of xylem anatomy can support and extend short-term xylogenesis observations, and increase our understanding of climate control of tree growth and functioning under different environmental conditions. PMID:28130220

  18. Understanding the Scalability of Bayesian Network Inference Using Clique Tree Growth Curves

    NASA Technical Reports Server (NTRS)

    Mengshoel, Ole J.

    2010-01-01

    One of the main approaches to performing computation in Bayesian networks (BNs) is clique tree clustering and propagation. The clique tree approach consists of propagation in a clique tree compiled from a Bayesian network, and while it was introduced in the 1980s, there is still a lack of understanding of how clique tree computation time depends on variations in BN size and structure. In this article, we improve this understanding by developing an approach to characterizing clique tree growth as a function of parameters that can be computed in polynomial time from BNs, specifically: (i) the ratio of the number of a BN's non-root nodes to the number of root nodes, and (ii) the expected number of moral edges in their moral graphs. Analytically, we partition the set of cliques in a clique tree into different sets, and introduce a growth curve for the total size of each set. For the special case of bipartite BNs, there are two sets and two growth curves, a mixed clique growth curve and a root clique growth curve. In experiments, where random bipartite BNs generated using the BPART algorithm are studied, we systematically increase the out-degree of the root nodes in bipartite Bayesian networks, by increasing the number of leaf nodes. Surprisingly, root clique growth is well-approximated by Gompertz growth curves, an S-shaped family of curves that has previously been used to describe growth processes in biology, medicine, and neuroscience. We believe that this research improves the understanding of the scaling behavior of clique tree clustering for a certain class of Bayesian networks; presents an aid for trade-off studies of clique tree clustering using growth curves; and ultimately provides a foundation for benchmarking and developing improved BN inference and machine learning algorithms.
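
    For readers unfamiliar with the Gompertz family mentioned above, one common parameterisation is f(x) = a * exp(-b * exp(-c * x)), where a is the upper asymptote, b shifts the curve along the x-axis and c sets the growth rate. The sketch below evaluates that form; the parameter names and the sample values are assumptions chosen for illustration and are not taken from the report.

```python
import math


def gompertz(x, a, b, c):
    """Gompertz curve a * exp(-b * exp(-c * x)) with a, b, c > 0."""
    return a * math.exp(-b * math.exp(-c * x))


# Illustrative parameters only (not fitted to any clique tree data).
a, b, c = 100.0, 5.0, 0.5
for x in (0, 2, 4, 8, 16):
    print(x, round(gompertz(x, a, b, c), 3))
```

    The printed values rise slowly, then steeply, then level off towards a, which is the S shape the abstract refers to.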

  19. Efficient Inference for Trees and Alignments: Modeling Monolingual and Bilingual Syntax with Hard and Soft Constraints and Latent Variables

    ERIC Educational Resources Information Center

    Smith, David Arthur

    2010-01-01

    Much recent work in natural language processing treats linguistic analysis as an inference problem over graphs. This development opens up useful connections between machine learning, graph theory, and linguistics. The first part of this dissertation formulates syntactic dependency parsing as a dynamic Markov random field with the novel…

  20. Hydrogen isotope fractionation in wood-producing avocado seedlings: Biological constraints to paleoclimatic interpretations of δD values in tree ring cellulose nitrate

    NASA Astrophysics Data System (ADS)

    Terwilliger, Valery J.; Deniro, Michael J.

    1995-12-01

    Climatic reconstructions from the δD values of wood cellulose nitrate have been compromised because it is unclear whether the isotopic ratios are affected only by temperature or by temperature and humidity. To quantify the effect of humidity on the δD values of leaf and wood cellulose nitrate, we grew avocados (Persea americana Mill. cv. Mexican) from seed at high and low humidities until they set wood. The source water for seed production was isotopically the same as the source water for seedling propagation. The δD values of leaf cellulose nitrate were related to those of leaf water, which were, in turn, influenced by humidity (P < 0.01). The δD values of wood cellulose nitrate were unrelated to those of leaf water or any other indicator of humidity, but were related to the δD values of water in wood (P ⩽ 0.05). The δD values of wood cellulose nitrate were identical in three out of five pairs of low and high humidity treatments. These results suggest humidity cannot be reliably inferred from δD values in wood cellulose nitrate. The δD values of cellulose nitrate in both leaves and wood appear to have been influenced by the incorporation of stored metabolites into cellulose. Trees, like avocado seedlings, have considerable post-photosynthetic organic reserves that can be tapped for growth. Conditions that stimulate use of post-photosynthetic carbon reserves are varied for trees. Significant contributions from these reserves could lead to erroneous temperature inferences from δD values of wood cellulose nitrate.

  1. Fundamentals and Recent Developments in Approximate Bayesian Computation

    PubMed Central

    Lintusaari, Jarno; Gutmann, Michael U.; Dutta, Ritabrata; Kaski, Samuel; Corander, Jukka

    2017-01-01

    Bayesian inference plays an important role in phylogenetics, evolutionary biology, and in many other branches of science. It provides a principled framework for dealing with uncertainty and quantifying how it changes in the light of new evidence. For many complex models and inference problems, however, only approximate quantitative answers are obtainable. Approximate Bayesian computation (ABC) refers to a family of algorithms for approximate inference that makes a minimal set of assumptions by only requiring that sampling from a model is possible. We explain here the fundamentals of ABC, review the classical algorithms, and highlight recent developments. [ABC; approximate Bayesian computation; Bayesian inference; likelihood-free inference; phylogenetics; simulator-based models; stochastic simulation models; tree-based models.] PMID:28175922
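
    The core idea described above (inference that only requires sampling from the model) can be sketched with the simplest classical variant, rejection ABC. This is a minimal illustration, not code from the article: the Gaussian simulator, the Uniform(-5, 5) prior, the sample-mean summary statistic, and the tolerance `epsilon` are all assumptions chosen for the example.

```python
import random
import statistics


def simulate(mu, n, rng):
    """Toy simulator (assumed for illustration): n draws from Normal(mu, 1)."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]


def abc_rejection(observed, n_draws, epsilon, rng):
    """Rejection ABC: keep prior draws whose simulated summary statistic
    falls within epsilon of the observed summary statistic."""
    obs_summary = statistics.fmean(observed)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-5.0, 5.0)  # draw from the assumed Uniform(-5, 5) prior
        simulated = simulate(mu, len(observed), rng)
        if abs(statistics.fmean(simulated) - obs_summary) <= epsilon:
            accepted.append(mu)  # accepted draws approximate the posterior
    return accepted


rng = random.Random(0)
observed = simulate(2.0, 50, rng)  # pretend these are the real data
posterior_sample = abc_rejection(observed, n_draws=20000, epsilon=0.1, rng=rng)
print(len(posterior_sample), statistics.fmean(posterior_sample))
```

    No likelihood is evaluated anywhere; the only model access is through `simulate`. Shrinking `epsilon` tightens the approximation but lowers the acceptance rate, which is the trade-off that more refined ABC algorithms try to improve.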

  2. SubClonal Hierarchy Inference from Somatic Mutations: Automatic Reconstruction of Cancer Evolutionary Trees from Multi-region Next Generation Sequencing

    PubMed Central

    Niknafs, Noushin; Beleva-Guthrie, Violeta; Naiman, Daniel Q.; Karchin, Rachel

    2015-01-01

    Recent improvements in next-generation sequencing of tumor samples and the ability to identify somatic mutations at low allelic fractions have opened the way for new approaches to model the evolution of individual cancers. The power and utility of these models is increased when tumor samples from multiple sites are sequenced. Temporal ordering of the samples may provide insight into the etiology of both primary and metastatic lesions and rationalizations for tumor recurrence and therapeutic failures. Additional insights may be provided by temporal ordering of evolving subclones—cellular subpopulations with unique mutational profiles. Current methods for subclone hierarchy inference tightly couple the problem of temporal ordering with that of estimating the fraction of cancer cells harboring each mutation. We present a new framework that includes a rigorous statistical hypothesis test and a collection of tools that make it possible to decouple these problems, which we believe will enable substantial progress in the field of subclone hierarchy inference. The methods presented here can be flexibly combined with methods developed by others addressing either of these problems. We provide tools to interpret hypothesis test results, which inform phylogenetic tree construction, and we introduce the first genetic algorithm designed for this purpose. The utility of our framework is systematically demonstrated in simulations. For most tested combinations of tumor purity, sequencing coverage, and tree complexity, good power (≥ 0.8) can be achieved and Type 1 error is well controlled when at least three tumor samples are available from a patient. 
Using data from three published multi-region tumor sequencing studies of (murine) small cell lung cancer, acute myeloid leukemia, and chronic lymphocytic leukemia, in which the authors reconstructed subclonal phylogenetic trees by manual expert curation, we show how different configurations of our tools can identify either a single tree in agreement with the authors, or a small set of trees, which include the authors’ preferred tree. Our results have implications for improved modeling of tumor evolution and the importance of multi-region tumor sequencing. PMID:26436540

  3. Graphical models for optimal power flow

    DOE PAGES

    Dvijotham, Krishnamurthy; Chertkov, Michael; Van Hentenryck, Pascal; ...

    2016-09-13

    Optimal power flow (OPF) is the central optimization problem in electric power grids. Although solved routinely in the course of power grid operations, it is known to be strongly NP-hard in general, and weakly NP-hard over tree networks. In this paper, we formulate the optimal power flow problem over tree networks as an inference problem over a tree-structured graphical model where the nodal variables are low-dimensional vectors. We adapt the standard dynamic programming algorithm for inference over a tree-structured graphical model to the OPF problem. Combining this with an interval discretization of the nodal variables, we develop an approximation algorithm for the OPF problem. Further, we use techniques from constraint programming (CP) to perform interval computations and adaptive bound propagation to obtain practically efficient algorithms. Compared to previous algorithms that solve OPF with optimality guarantees using convex relaxations, our approach is able to work for arbitrary tree-structured distribution networks and handle mixed-integer optimization problems. Further, it can be implemented in a distributed message-passing fashion that is scalable and is suitable for “smart grid” applications like control of distributed energy resources. In conclusion, numerical evaluations on several benchmark networks show that practical OPF problems can be solved effectively using this approach.
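
    The "standard dynamic programming algorithm for inference over a tree-structured graphical model" mentioned above can be sketched as a generic min-sum recursion over discretized node states. This is not the paper's OPF formulation: the three-node chain, the integer state grid, and the cost functions below are hypothetical stand-ins for the discretized nodal variables and power-flow constraints.

```python
def min_sum_tree(children, root, states, node_cost, edge_cost):
    """Minimum total cost over all assignments of one state per node,
    computed leaf-to-root on a rooted tree (min-sum dynamic programming)."""

    def best(node):
        # table[s] = cheapest cost of the subtree at `node` if node takes state s
        table = {s: node_cost(node, s) for s in states}
        for child in children.get(node, []):
            child_table = best(child)
            for s in states:
                # Message from child: best child state given the parent state s
                table[s] += min(edge_cost(s, t) + child_table[t] for t in states)
        return table

    return min(best(root).values())


# Hypothetical toy problem: a chain 0 -> 1 -> 2 with states on a coarse grid.
children = {0: [1], 1: [2]}
states = [0, 1, 2]
targets = {0: 0, 1: 2, 2: 0}  # each node prefers its own target value


def node_cost(node, s):
    return (s - targets[node]) ** 2


def edge_cost(s, t):
    return abs(s - t)  # neighbouring nodes are penalised for disagreeing


optimum = min_sum_tree(children, 0, states, node_cost, edge_cost)
print(optimum)
```

    Each node is visited once and each edge costs a pass over all state pairs, so the work grows linearly with the number of nodes rather than exponentially, which is what makes tree networks tractable once the variables are discretized.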

  4. Phylogeny of the cycads based on multiple single copy nuclear genes: congruence of concatenation and species tree inference methods

    USDA-ARS?s Scientific Manuscript database

    Despite a recent new classification, a stable tree of life for the cycads has been elusive, particularly regarding resolution of Bowenia, Stangeria and Dioon. In this study we apply five single copy nuclear genes (SCNGs) to the phylogeny of the order Cycadales. We specifically aim to evaluate seve...

  5. Multispectral color photography for mineral exploration by the remote sensing of biogeochemical anomalies

    NASA Technical Reports Server (NTRS)

    Yost, E.

    1975-01-01

    Selected band multispectral photography was evaluated as a mineral exploration tool by detecting stress on trees caused by underground mineralization. Ground truth consisted of two test sites in the Prescott National Forest within which the mineralization had been established by a drilling program. Species of trees were categorized as background, intermediate, and anomalous based upon where they grew with respect to this underlying mineralization. Soil geochemistry and the metal content of ashed samples of the trees were studied in relation to the inferred locus of mineralization. Computer analysis of the reflectance spectra of mineralized trees confirmed that the relative percent reflectance differences of trees growing in anomalous areas was less than that of the same tree species growing in background areas.

  6. Molecular basis of angiosperm tree architecture.

    PubMed

    Hollender, Courtney A; Dardick, Chris

    2015-04-01

    The architecture of trees greatly impacts the productivity of orchards and forestry plantations. Amassing greater knowledge on the molecular genetics that underlie tree form can benefit these industries, as well as contribute to basic knowledge of plant developmental biology. This review describes the fundamental components of branch architecture, a prominent aspect of tree structure, as well as genetic and hormonal influences inferred from studies in model plant systems and from trees with non-standard architectures. The bulk of the molecular and genetic data described here is from studies of fruit trees and poplar, as these species have been the primary subjects of investigation in this field of science. No claim to original US Government works. New Phytologist © 2014 New Phytologist Trust.

  7. A Systematic Bayesian Integration of Epidemiological and Genetic Data

    PubMed Central

    Lau, Max S. Y.; Marion, Glenn; Streftaris, George; Gibson, Gavin

    2015-01-01

    Genetic sequence data on pathogens have great potential to inform inference of their transmission dynamics ultimately leading to better disease control. Where genetic change and disease transmission occur on comparable timescales additional information can be inferred via the joint analysis of such genetic sequence data and epidemiological observations based on clinical symptoms and diagnostic tests. Although recently introduced approaches represent substantial progress, for computational reasons they approximate genuine joint inference of disease dynamics and genetic change in the pathogen population, capturing partially the joint epidemiological-evolutionary dynamics. Improved methods are needed to fully integrate such genetic data with epidemiological observations, for achieving a more robust inference of the transmission tree and other key epidemiological parameters such as latent periods. Here, building on current literature, a novel Bayesian framework is proposed that infers simultaneously and explicitly the transmission tree and unobserved transmitted pathogen sequences. Our framework facilitates the use of realistic likelihood functions and enables systematic and genuine joint inference of the epidemiological-evolutionary process from partially observed outbreaks. Using simulated data it is shown that this approach is able to infer accurately joint epidemiological-evolutionary dynamics, even when pathogen sequences and epidemiological data are incomplete, and when sequences are available for only a fraction of exposures. These results also characterise and quantify the value of incomplete and partial sequence data, which has important implications for sampling design, and demonstrate the abilities of the introduced method to identify multiple clusters within an outbreak. The framework is used to analyse an outbreak of foot-and-mouth disease in the UK, enhancing current understanding of its transmission dynamics and evolutionary process. PMID:26599399

  8. Live phylogeny with polytomies: Finding the most compact parsimonious trees.

    PubMed

    Papamichail, D; Huang, A; Kennedy, E; Ott, J-L; Miller, A; Papamichail, G

    2017-08-01

    Construction of phylogenetic trees has traditionally focused on binary trees where all species appear on leaves, a problem for which numerous efficient solutions have been developed. Certain application domains though, such as viral evolution and transmission, paleontology, linguistics, and phylogenetic stemmatics, often require phylogeny inference that involves placing input species on ancestral tree nodes (live phylogeny), and polytomies. These requirements, despite their prevalence, lead to computationally harder algorithmic solutions and have been sparsely examined in the literature to date. In this article we prove some unique properties of most parsimonious live phylogenetic trees with polytomies, and their mapping to traditional binary phylogenetic trees. We show that our problem reduces to finding the most compact parsimonious tree for n species, and describe a novel efficient algorithm to find such trees without resorting to exhaustive enumeration of all possible tree topologies. Copyright © 2017 Elsevier Ltd. All rights reserved.

  9. Polynomial-Time Algorithms for Building a Consensus MUL-Tree

    PubMed Central

    Cui, Yun; Jansson, Jesper

    2012-01-01

    A multi-labeled phylogenetic tree, or MUL-tree, is a generalization of a phylogenetic tree that allows each leaf label to be used many times. MUL-trees have applications in biogeography, the study of host–parasite cospeciation, gene evolution studies, and computer science. Here, we consider the problem of inferring a consensus MUL-tree that summarizes a given set of conflicting MUL-trees, and present the first polynomial-time algorithms for solving it. In particular, we give a straightforward, fast algorithm for building a strict consensus MUL-tree for any input set of MUL-trees with identical leaf label multisets, as well as a polynomial-time algorithm for building a majority rule consensus MUL-tree for the special case where every leaf label occurs at most twice. We also show that, although it is NP-hard to find a majority rule consensus MUL-tree in general, the variant that we call the singular majority rule consensus MUL-tree can be constructed efficiently whenever it exists. PMID:22963134

  10. Polynomial-time algorithms for building a consensus MUL-tree.

    PubMed

    Cui, Yun; Jansson, Jesper; Sung, Wing-Kin

    2012-09-01

    A multi-labeled phylogenetic tree, or MUL-tree, is a generalization of a phylogenetic tree that allows each leaf label to be used many times. MUL-trees have applications in biogeography, the study of host-parasite cospeciation, gene evolution studies, and computer science. Here, we consider the problem of inferring a consensus MUL-tree that summarizes a given set of conflicting MUL-trees, and present the first polynomial-time algorithms for solving it. In particular, we give a straightforward, fast algorithm for building a strict consensus MUL-tree for any input set of MUL-trees with identical leaf label multisets, as well as a polynomial-time algorithm for building a majority rule consensus MUL-tree for the special case where every leaf label occurs at most twice. We also show that, although it is NP-hard to find a majority rule consensus MUL-tree in general, the variant that we call the singular majority rule consensus MUL-tree can be constructed efficiently whenever it exists.

  11. So many genes, so little time: A practical approach to divergence-time estimation in the genomic era

    PubMed Central

    2018-01-01

    Phylogenomic datasets have been successfully used to address questions involving evolutionary relationships, patterns of genome structure, signatures of selection, and gene and genome duplications. However, despite the recent explosion in genomic and transcriptomic data, the utility of these data sources for efficient divergence-time inference remains unexamined. Phylogenomic datasets pose two distinct problems for divergence-time estimation: (i) the volume of data makes inference of the entire dataset intractable, and (ii) the extent of underlying topological and rate heterogeneity across genes makes model mis-specification a real concern. “Gene shopping”, wherein a phylogenomic dataset is winnowed to a set of genes with desirable properties, represents an alternative approach that holds promise in alleviating these issues. We implemented an approach for phylogenomic datasets (available in SortaDate) that filters genes by three criteria: (i) clock-likeness, (ii) reasonable tree length (i.e., discernible information content), and (iii) least topological conflict with a focal species tree (presumed to have already been inferred). Such a winnowing procedure ensures that errors associated with model (both clock and topology) mis-specification are minimized, therefore reducing error in divergence-time estimation. We demonstrated the efficacy of this approach through simulation and applied it to published animal (Aves, Diplopoda, and Hymenoptera) and plant (carnivorous Caryophyllales, broad Caryophyllales, and Vitales) phylogenomic datasets. By quantifying rate heterogeneity across both genes and lineages we found that every empirical dataset examined included genes with clock-like, or nearly clock-like, behavior. Moreover, many datasets had genes that were clock-like, exhibited reasonable evolutionary rates, and were mostly compatible with the species tree. 
We identified overlap in age estimates when analyzing these filtered genes under strict clock and uncorrelated lognormal (UCLN) models. However, this overlap was often due to imprecise estimates from the UCLN model. We find that “gene shopping” can be an efficient approach to divergence-time inference for phylogenomic datasets that may otherwise be characterized by extensive gene tree heterogeneity. PMID:29772020

  12. A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference.

    PubMed

    Shen, Xing-Xing; Salichos, Leonidas; Rokas, Antonis

    2016-09-02

    Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, alongside with number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. 
These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal and could be useful in guiding the choice of phylogenetic markers. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  13. So many genes, so little time: A practical approach to divergence-time estimation in the genomic era.

    PubMed

    Smith, Stephen A; Brown, Joseph W; Walker, Joseph F

    2018-01-01

    Phylogenomic datasets have been successfully used to address questions involving evolutionary relationships, patterns of genome structure, signatures of selection, and gene and genome duplications. However, despite the recent explosion in genomic and transcriptomic data, the utility of these data sources for efficient divergence-time inference remains unexamined. Phylogenomic datasets pose two distinct problems for divergence-time estimation: (i) the volume of data makes inference of the entire dataset intractable, and (ii) the extent of underlying topological and rate heterogeneity across genes makes model mis-specification a real concern. "Gene shopping", wherein a phylogenomic dataset is winnowed to a set of genes with desirable properties, represents an alternative approach that holds promise in alleviating these issues. We implemented an approach for phylogenomic datasets (available in SortaDate) that filters genes by three criteria: (i) clock-likeness, (ii) reasonable tree length (i.e., discernible information content), and (iii) least topological conflict with a focal species tree (presumed to have already been inferred). Such a winnowing procedure ensures that errors associated with model (both clock and topology) mis-specification are minimized, therefore reducing error in divergence-time estimation. We demonstrated the efficacy of this approach through simulation and applied it to published animal (Aves, Diplopoda, and Hymenoptera) and plant (carnivorous Caryophyllales, broad Caryophyllales, and Vitales) phylogenomic datasets. By quantifying rate heterogeneity across both genes and lineages we found that every empirical dataset examined included genes with clock-like, or nearly clock-like, behavior. Moreover, many datasets had genes that were clock-like, exhibited reasonable evolutionary rates, and were mostly compatible with the species tree. 
We identified overlap in age estimates when analyzing these filtered genes under strict clock and uncorrelated lognormal (UCLN) models. However, this overlap was often due to imprecise estimates from the UCLN model. We find that "gene shopping" can be an efficient approach to divergence-time inference for phylogenomic datasets that may otherwise be characterized by extensive gene tree heterogeneity.

  14. Combining morphometrics with molecular taxonomy: how different are similar foliose keratose sponges from the Australian tropics?

    PubMed

    Abdul Wahab, M A; Fromont, J; Whalan, S; Webster, N; Andreakis, N

    2014-04-01

    Sponge taxonomy can be challenging as many groups exhibit extreme morphological plasticity induced by local environmental conditions. Foliose keratose sponges of the sub-family Phyllospongiinae (Dictyoceratida, Thorectidae: Strepsichordaia, Phyllospongia and Carteriospongia) are commonly found in intertidal and subtidal habitats of the Indo-Pacific. Lacking spicules, these sponges can be difficult to differentiate due to the lack of reliable morphological characters for species delineation. We use molecular phylogenies inferred from the nuclear Internal Transcribed Spacer 2 region (ITS2) and morphometrics (19 characters; 52 character states) to identify evolutionarily significant units (ESUs; sensu Moritz) within foliose Phyllosponginiids collected from seven geographic locations across tropical eastern and Western Australia. The ITS2 topology was congruent with the tree derived from Bayesian inference of discrete morphological characters supporting expected taxonomic relationships at the genus level and the identification of five ESUs. However, phylogenies inferred from the ITS2 marker revealed multiple sequence clusters, some of which were characterised by distinct morphological features and specific geographic ranges. Our results are discussed in light of taxonomic incongruences within this study, hidden sponge diversity and the role of vicariant events in influencing present day distribution patterns. Copyright © 2014 Elsevier Inc. All rights reserved.

  15. EAPhy: A Flexible Tool for High-throughput Quality Filtering of Exon-alignments and Data Processing for Phylogenetic Methods.

    PubMed

    Blom, Mozes P K

    2015-08-05

    Recently developed molecular methods enable geneticists to target and sequence thousands of orthologous loci and infer evolutionary relationships across the tree of life. Large numbers of genetic markers benefit species tree inference but visual inspection of alignment quality, as traditionally conducted, is challenging with thousands of loci. Furthermore, due to the impracticality of repeated visual inspection with alternative filtering criteria, the potential consequences of using datasets with different degrees of missing data remain nominally explored in most empirical phylogenomic studies. In this short communication, I describe a flexible high-throughput pipeline designed to assess alignment quality and filter exonic sequence data for subsequent inference. The stringency criteria for alignment quality and missing data can be adapted based on the expected level of sequence divergence. Each alignment is automatically evaluated based on the stringency criteria specified, significantly reducing the number of alignments that require visual inspection. By developing a rapid method for alignment filtering and quality assessment, the consistency of phylogenetic estimation based on exonic sequence alignments can be further explored across distinct inference methods, while accounting for different degrees of missing data.

  16. Inference of Genotype–Phenotype Relationships in the Antigenic Evolution of Human Influenza A (H3N2) Viruses

    PubMed Central

    Steinbrück, Lars; McHardy, Alice Carolyn

    2012-01-01

    Distinguishing mutations that determine an organism's phenotype from (near-) neutral ‘hitchhikers’ is a fundamental challenge in genome research, and is relevant for numerous medical and biotechnological applications. For human influenza viruses, recognizing changes in the antigenic phenotype and a strain's capability to evade pre-existing host immunity is important for the production of efficient vaccines. We have developed a method for inferring ‘antigenic trees’ for the major viral surface protein hemagglutinin. In the antigenic tree, antigenic weights are assigned to all tree branches, which allows us to resolve the antigenic impact of the associated amino acid changes. Our technique predicted antigenic distances with comparable accuracy to antigenic cartography. Additionally, it identified both known and novel sites, and amino acid changes with antigenic impact in the evolution of influenza A (H3N2) viruses from 1968 to 2003. The technique can also be applied for inference of ‘phenotype trees’ and genotype–phenotype relationships from other types of pairwise phenotype distances. PMID:22532796

  17. Universal artifacts affect the branching of phylogenetic trees, not universal scaling laws.

    PubMed

    Altaba, Cristian R

    2009-01-01

    The superficial resemblance of phylogenetic trees to other branching structures allows searching for macroevolutionary patterns. However, such trees are just statistical inferences of particular historical events. Recent meta-analyses report finding regularities in the branching pattern of phylogenetic trees. But is this supported by evidence, or are such regularities just methodological artifacts? If so, is there any signal in a phylogeny? In order to evaluate the impact of polytomies and imbalance on tree shape, the distribution of all binary and polytomic trees of up to 7 taxa was assessed in tree-shape space. The relationship between the proportion of outgroups and the amount of imbalance introduced with them was assessed applying four different tree-building methods to 100 combinations from a set of 10 ingroup and 9 outgroup species, and performing covariance analyses. The relevance of this analysis was explored taking 61 published phylogenies, based on nucleic acid sequences and involving various taxa, taxonomic levels, and tree-building methods. All methods of phylogenetic inference are quite sensitive to the artifacts introduced by outgroups. However, published phylogenies appear to be subject to a rather effective, albeit rather intuitive control against such artifacts. The data and methods used to build phylogenetic trees are varied, so any meta-analysis is subject to pitfalls due to their uneven intrinsic merits, which translate into artifacts in tree shape. The binary branching pattern is an imposition of methods, and seldom reflects true relationships in intraspecific analyses, yielding artifactual polytomies in short trees. Above the species level, the departure of real trees from simplistic random models is caused at least by two natural factors--uneven speciation and extinction rates; and artifacts such as choice of taxa included in the analysis, and imbalance introduced by outgroups and basal paraphyletic taxa. 
This artifactual imbalance accounts for tree shape convergence of large trees. There is no evidence for any universal scaling in the tree of life. Instead, there is a need for improved methods of tree analysis that can be used to discriminate the noise due to outgroups from the phylogenetic signal within the taxon of interest, and to evaluate realistic models of evolution, correcting the retrospective perspective and explicitly recognizing extinction as a driving force. Artifacts are pervasive, and can only be overcome through understanding the structure and biological meaning of phylogenetic trees. Catalan Abstract in Translation S1.

  18. Genome-wide SNP data suggest complex ancestry of sympatric North Pacific killer whale ecotypes.

    PubMed

    Foote, A D; Morin, P A

    2016-11-01

    Three ecotypes of killer whale occur in partial sympatry in the North Pacific. Individuals assortatively mate within the same ecotype, resulting in correlated ecological and genetic differentiation. A key question is whether this pattern of evolutionary divergence is an example of incipient sympatric speciation from a single panmictic ancestral population, or whether sympatry could have resulted from multiple colonisations of the North Pacific and secondary contact between ecotypes. Here, we infer multilocus coalescent trees from >1000 nuclear single-nucleotide polymorphisms (SNPs) and find evidence of incomplete lineage sorting so that the genealogies of SNPs do not all conform to a single topology. To disentangle whether uncertainty in the phylogenetic inference of the relationships among ecotypes could also result from ancestral admixture events we reconstructed the relationship among the ecotypes as an admixture graph and estimated f4-statistics using TreeMix. The results were consistent with episodes of admixture between two of the North Pacific ecotypes and the two outgroups (populations from the Southern Ocean and the North Atlantic). Gene flow may have occurred via unsampled 'ghost' populations rather than directly between the populations sampled here. Our results indicate that because of ancestral admixture events and incomplete lineage sorting, a single bifurcating tree does not fully describe the relationship among these populations. The data are therefore most consistent with the genomic variation among North Pacific killer whale ecotypes resulting from multiple colonisation events, and secondary contact may have facilitated evolutionary divergence. Thus, the present-day populations of North Pacific killer whale ecotypes have a complex ancestry, confounding the tree-based inference of ancestral geography.
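
    For readers unfamiliar with the four-population statistic mentioned above, its usual point estimate is the average over SNPs of (pA - pB)(pC - pD), where p is the allele frequency in each population. The sketch below computes only that average; it is not TreeMix's implementation, it omits the block-jackknife standard errors used in practice, and the allele frequencies are invented for illustration.

```python
def f4(p_a, p_b, p_c, p_d):
    """Point estimate of f4(A, B; C, D): the mean over SNPs of
    (pA - pB) * (pC - pD), given per-SNP allele frequencies."""
    if not p_a or not (len(p_a) == len(p_b) == len(p_c) == len(p_d)):
        raise ValueError("need four non-empty frequency lists of equal length")
    products = [(a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d)]
    return sum(products) / len(products)


# Invented allele frequencies at two SNPs for four hypothetical populations.
pop_a = [0.1, 0.2]
pop_b = [0.3, 0.1]
pop_c = [0.5, 0.5]
pop_d = [0.4, 0.7]

print(f4(pop_a, pop_b, pop_c, pop_d))
```

    If the four populations fit an unadmixed tree ((A, B), (C, D)), the frequency differences A-B and C-D are uncorrelated and the statistic is expected to be zero; a consistent departure from zero is the kind of signal interpreted as admixture.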

  19. Seasonal climate signals from multiple tree ring metrics: A case study of Pinus ponderosa in the upper Columbia River Basin

    NASA Astrophysics Data System (ADS)

    Dannenberg, Matthew P.; Wise, Erika K.

    2016-04-01

    Projected changes in the seasonality of hydroclimatic regimes are likely to have important implications for water resources and terrestrial ecosystems in the U.S. Pacific Northwest. The tree ring record, which has frequently been used to position recent changes in a longer-term context, typically relies on signals embedded in the total ring width of tree rings. Additional climatic inferences at a subannual temporal scale can be made using alternative tree ring metrics such as earlywood and latewood widths and the density of tree ring latewood. Here we examine seasonal precipitation and temperature signals embedded in total ring width, earlywood width, adjusted latewood width, and blue intensity chronologies from a network of six Pinus ponderosa sites in and surrounding the upper Columbia River Basin of the U.S. Pacific Northwest. We also evaluate the potential for combining multiple tree ring metrics together in reconstructions of past cool- and warm-season precipitation. The common signal among all metrics and sites is related to warm-season precipitation. Earlywood and latewood widths differ primarily in their sensitivity to conditions in the year prior to growth. Total and earlywood widths from the lowest elevation sites also reflect cool-season moisture. Effective correlation analyses and composite-plus-scale tests suggest that combining multiple tree ring metrics together may improve reconstructions of warm-season precipitation. For cool-season precipitation, total ring width alone explains more variance than any other individual metric or combination of metrics. The composite-plus-scale tests show that variance-scaled precipitation reconstructions in the upper Columbia River Basin may be asymmetric in their ability to capture extreme events.

  20. Multi-gene phylogeny of jacks and pompanos (Carangidae), including placement of monotypic vadigo Campogramma glaycos.

    PubMed

    Damerau, M; Freese, M; Hanel, R

    2018-01-01

    In this study, the phylogenetic trees of jacks and pompanos (Carangidae), an ecologically and morphologically diverse, globally distributed fish family, are inferred from a complete, concatenated data set of two mitochondrial (cytochrome c oxidase I, cytochrome b) loci and one nuclear (myosin heavy chain 6) locus. Maximum likelihood and Bayesian inferences are largely congruent and show a clear separation of Carangidae into the four subfamilies: Scomberoidinae, Trachinotinae, Naucratinae and Caranginae. The inclusion of the carangid sister lineages Coryphaenidae (dolphinfishes) and Rachycentridae (cobia), however, renders Carangidae paraphyletic. The phylogenetic trees also show with high statistical support that the monotypic vadigo Campogramma glaycos is the sister to all other species within the Naucratinae. © 2017 The Fisheries Society of the British Isles.

  1. Prehistoric human influence on the abundance and distribution of deadwood in alpine landscapes

    Treesearch

    Donald K. Grayson; Constance I. Millar

    2008-01-01

    Scientists have long inferred the locations of past treelines from the distribution of deadwood above modern tree boundaries. Although it is recognized that deadwood above treeline may have decayed, the absence of such wood is routinely taken to imply the absence of trees for periods ranging from the past few millennia to the entire Holocene. Reconstructed treeline...

  2. Intelligent Diagnostic Assistant for Complicated Skin Diseases through C5's Algorithm.

    PubMed

    Jeddi, Fatemeh Rangraz; Arabfard, Masoud; Kermany, Zahra Arab

    2017-09-01

    An intelligent diagnostic assistant can be used for the diagnosis of complicated skin diseases, which are among the most common causes of disability. The aim of this study was to design and implement a computerized intelligent diagnostic assistant for complicated skin diseases through C5's Algorithm. An applied-developmental study was done in 2015. The knowledge base was developed based on interviews with dermatologists through questionnaires and checklists. Knowledge representation was obtained from the training data in the database using Microsoft Office Excel. Clementine software and C5's Algorithm were applied to draw the decision tree. Analysis of test accuracy was performed based on rules extracted using inference chains. The rules extracted from the decision tree were entered into the CLIPS programming environment, and the intelligent diagnostic assistant was then designed. The rules were defined using the forward chaining inference technique and were entered into the CLIPS programming environment as RULE. The accuracy and error rates obtained in the training phase from the decision tree were 99.56% and 0.44%, respectively. The accuracy of the decision tree was 98% and the error was 2% in the test phase. The intelligent diagnostic assistant can be used as a reliable system with high accuracy, sensitivity, specificity, and agreement.

  3. Phylogenetic Invariants for Metazoan Mitochondrial Genome Evolution.

    PubMed

    Sankoff; Blanchette

    1998-01-01

    The method of phylogenetic invariants was developed to apply to aligned sequence data generated, according to a stochastic substitution model, for N species related through an unknown phylogenetic tree. The invariants are functions of the probabilities of the observable N-tuples, which are identically zero, over all choices of branch length, for some trees. Evaluating the invariants associated with all possible trees, using observed N-tuple frequencies over all sequence positions, enables us to rapidly infer the generating tree. An aspect of evolution at the genomic level much studied recently is the rearrangements of gene order along the chromosome from one species to another. Instead of the substitutions responsible for sequence evolution, we examine the non-local processes responsible for genome rearrangements such as inversion of arbitrarily long segments of chromosomes. By treating the potential adjacency of each possible pair of genes as a "position", an appropriate "substitution" model can be recognized as governing the rearrangement process, and a probabilistically principled phylogenetic inference can be set up. We calculate the invariants for this process for N=5, and apply them to mitochondrial genome data from coelomate metazoans, showing how they resolve key aspects of branching order.

  4. Sorting through the chaff, nDNA gene trees for phylogenetic inference and hybrid identification of annual sunflowers (Helianthus sect. Helianthus).

    PubMed

    Moody, Michael L; Rieseberg, Loren H

    2012-07-01

    The annual sunflowers (Helianthus sect. Helianthus) present a formidable challenge for phylogenetic inference because of ancient hybrid speciation, recent introgression, and suspected issues with deep coalescence. Here we analyze sequence data from 11 nuclear DNA (nDNA) genes for multiple genotypes of species within the section to (1) reconstruct the phylogeny of this group, (2) explore the utility of nDNA gene trees for detecting hybrid speciation and introgression; and (3) test an empirical method of hybrid identification based on the phylogenetic congruence of nDNA gene trees from tightly linked genes. We uncovered considerable topological heterogeneity among gene trees with or without three previously identified hybrid species included in the analyses, as well as a general lack of reciprocal monophyly of species. Nonetheless, partitioned Bayesian analyses provided strong support for the reciprocal monophyly of all species except H. annuus (0.89 PP), the most widespread and abundant annual sunflower. Previous hypotheses of relationships among taxa were generally strongly supported (1.0 PP), except among taxa typically associated with H. annuus, apparently due to the paraphyly of the latter in all gene trees. While the individual nDNA gene trees provided a useful means for detecting recent hybridization, identification of ancient hybridization was problematic for all ancient hybrid species, even when linkage was considered. We discuss biological factors that affect the efficacy of phylogenetic methods for hybrid identification.

  5. Understanding the Scalability of Bayesian Network Inference using Clique Tree Growth Curves

    NASA Technical Reports Server (NTRS)

    Mengshoel, Ole Jakob

    2009-01-01

    Bayesian networks (BNs) are used to represent and efficiently compute with multi-variate probability distributions in a wide range of disciplines. One of the main approaches to perform computation in BNs is clique tree clustering and propagation. In this approach, BN computation consists of propagation in a clique tree compiled from a Bayesian network. There is a lack of understanding of how clique tree computation time, and BN computation time more generally, depends on variations in BN size and structure. On the one hand, complexity results tell us that many interesting BN queries are NP-hard or worse to answer, and it is not hard to find application BNs where the clique tree approach in practice cannot be used. On the other hand, it is well-known that tree-structured BNs can be used to answer probabilistic queries in polynomial time. In this article, we develop an approach to characterizing clique tree growth as a function of parameters that can be computed in polynomial time from BNs, specifically: (i) the ratio of the number of a BN's non-root nodes to the number of root nodes, or (ii) the expected number of moral edges in their moral graphs. Our approach is based on combining analytical and experimental results. Analytically, we partition the set of cliques in a clique tree into different sets, and introduce a growth curve for each set. For the special case of bipartite BNs, we consequently have two growth curves, a mixed clique growth curve and a root clique growth curve. In experiments, we systematically increase the degree of the root nodes in bipartite Bayesian networks, and find that root clique growth is well-approximated by Gompertz growth curves.
It is believed that this research improves the understanding of the scaling behavior of clique tree clustering, provides a foundation for benchmarking and developing improved BN inference and machine learning algorithms, and presents an aid for analytical trade-off studies of clique tree clustering using growth curves.
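
For readers unfamiliar with the Gompertz growth curve named in the record above, the following is a minimal sketch of the standard three-parameter form, g(x) = a * exp(-b * exp(-c * x)). The function name and the parameter names (asymptote, displacement, rate) are illustrative assumptions; they are not taken from the report, and the abstract does not state which parameterization or fitted values the author used.

```python
import math


def gompertz(x, asymptote, displacement, rate):
    """Standard Gompertz curve: asymptote * exp(-displacement * exp(-rate * x)).

    Parameter names are hypothetical labels for the usual a, b, c.
    For positive displacement and rate the curve rises monotonically
    from asymptote * exp(-displacement) at x = 0 toward the asymptote.
    """
    return asymptote * math.exp(-displacement * math.exp(-rate * x))


# Illustrative values only; these are not results from the report.
curve = [gompertz(x, asymptote=100.0, displacement=2.0, rate=0.5) for x in range(10)]
```

The curve is S-shaped and saturating, which is the qualitative behaviour the abstract reports for root clique growth as the degree of root nodes increases.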

  6. Phylogenetic affinity of tree shrews to Glires is attributed to fast evolution rate.

    PubMed

    Lin, Jiannan; Chen, Guangfeng; Gu, Liang; Shen, Yuefeng; Zheng, Meizhu; Zheng, Weisheng; Hu, Xinjie; Zhang, Xiaobai; Qiu, Yu; Liu, Xiaoqing; Jiang, Cizhong

    2014-02-01

    Previous phylogenetic analyses have led to incongruent evolutionary relationships between tree shrews and other suborders of Euarchontoglires. What caused the incongruence remains elusive. In this study, we identified 6845 orthologous genes between seventeen placental mammals. Tree shrews and Primates were monophyletic in the phylogenetic trees derived from the first and/or second codon positions, whereas tree shrews and Glires formed a monophyly in the trees derived from the third or all codon positions. The same topology was obtained in the phylogeny inference using the slow- and fast-evolving genes, respectively. This incongruence was likely attributable to the fast substitution rate in tree shrews and Glires. Notably, sequence GC content alone was not informative for resolving the controversial phylogenetic relationships between tree shrews, Glires, and Primates. Finally, estimation of the confidence of the tree selection strongly supported the phylogenetic affiliation of tree shrews to Primates as a monophyly. Copyright © 2013 Elsevier Inc. All rights reserved.

  7. Scale dependence of disease impacts on quaking aspen (Populus tremuloides) mortality in the southwestern United States

    USGS Publications Warehouse

    Bell, David M.; Bradford, John B.; Lauenroth, William K.

    2015-01-01

    By examining variation in disease prevalence, mortality of healthy trees, and mortality of diseased trees, we showed that the role of disease in aspen tree mortality depended on the scale of inference. For variation among individuals in diameter, disease tended to expose intermediate-size trees experiencing moderate risk to greater risk. For spatial variation in summer temperature, disease exposed lower risk populations to greater mortality probabilities, but the magnitude of this exposure depended on summer precipitation. Furthermore, the importance of diameter and slenderness in mediating responses to climate supports the increasing emphasis on trait variation in studies of ecological responses to global change.

  8. aLeaves facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity.

    PubMed

    Kuraku, Shigehiro; Zmasek, Christian M; Nishimura, Osamu; Katoh, Kazutaka

    2013-07-01

    We report a new web server, aLeaves (http://aleaves.cdb.riken.jp/), for homologue collection from diverse animal genomes. In molecular comparative studies involving multiple species, orthology identification is the basis on which most subsequent biological analyses rely. It can be achieved most accurately by explicit phylogenetic inference. More and more species are subjected to large-scale sequencing, but the resultant resources are scattered in independent project-based, and multi-species, but separate, web sites. This complicates data access and is becoming a serious barrier to the comprehensiveness of molecular phylogenetic analysis. aLeaves, launched to overcome this difficulty, collects sequences similar to an input query sequence from various data sources. The collected sequences can be passed on to the MAFFT sequence alignment server (http://mafft.cbrc.jp/alignment/server/), which has been significantly improved in interactivity. This update enables switching between (i) sequence selection using the Archaeopteryx tree viewer, (ii) multiple sequence alignment and (iii) tree inference. This can be performed as a loop until one reaches a sensible data set, which minimizes redundancy for better visibility and handling in phylogenetic inference while covering relevant taxa. The work flow achieved by the seamless link between aLeaves and MAFFT provides a convenient online platform to address various questions in zoology and evolutionary biology.

  9. aLeaves facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity

    PubMed Central

    Kuraku, Shigehiro; Zmasek, Christian M.; Nishimura, Osamu; Katoh, Kazutaka

    2013-01-01

    We report a new web server, aLeaves (http://aleaves.cdb.riken.jp/), for homologue collection from diverse animal genomes. In molecular comparative studies involving multiple species, orthology identification is the basis on which most subsequent biological analyses rely. It can be achieved most accurately by explicit phylogenetic inference. More and more species are subjected to large-scale sequencing, but the resultant resources are scattered in independent project-based, and multi-species, but separate, web sites. This complicates data access and is becoming a serious barrier to the comprehensiveness of molecular phylogenetic analysis. aLeaves, launched to overcome this difficulty, collects sequences similar to an input query sequence from various data sources. The collected sequences can be passed on to the MAFFT sequence alignment server (http://mafft.cbrc.jp/alignment/server/), which has been significantly improved in interactivity. This update enables switching between (i) sequence selection using the Archaeopteryx tree viewer, (ii) multiple sequence alignment and (iii) tree inference. This can be performed as a loop until one reaches a sensible data set, which minimizes redundancy for better visibility and handling in phylogenetic inference while covering relevant taxa. The work flow achieved by the seamless link between aLeaves and MAFFT provides a convenient online platform to address various questions in zoology and evolutionary biology. PMID:23677614

  10. A matter of phylogenetic scale: Distinguishing incomplete lineage sorting from lateral gene transfer as the cause of gene tree discord in recent versus deep diversification histories.

    PubMed

    Knowles, L Lacey; Huang, Huateng; Sukumaran, Jeet; Smith, Stephen A

    2018-03-01

    Discordant gene trees are commonly encountered when sequences from thousands of loci are applied to estimate phylogenetic relationships. Several processes contribute to this discord. Yet, we have no methods that jointly model different sources of conflict when estimating phylogenies. An alternative to analyzing entire genomes or all the sequenced loci is to identify a subset of loci for phylogenetic analysis. If we can identify data partitions that are most likely to reflect descent from a common ancestor (i.e., discordant loci that indeed reflect incomplete lineage sorting [ILS], as opposed to some other process, such as lateral gene transfer [LGT]), we can analyze this subset using powerful coalescent-based species-tree approaches. Test data sets were simulated where discord among loci could arise from ILS and LGT. Data sets were analyzed using the newly developed program CLASSIPHY (Huang et al., ) to assess whether our ability to distinguish the cause of discord among loci varied when ILS and LGT occurred in the recent versus deep past and whether the accuracy of these inferences was affected by the mutational process. We show that accuracy of probabilistic classification of individual loci by the cause of discord differed when ILS and LGT events occurred more recently compared with the distant past and that the signal-to-noise ratio arising from the mutational process contributes to difficulties in inferring LGT data partitions. We discuss our findings in terms of the promise and limitations of identifying subsets of loci for species-tree inference that will not violate the underlying coalescent model (i.e., data partitions in which ILS, and not LGT, contributes to discord). We also discuss the empirical implications of our work given the many recalcitrant nodes in the tree of life (e.g., origins of angiosperms, amniotes, or Neoaves), and recent arguments for concatenating loci. © 2018 Botanical Society of America.

  11. Phylogeny of the cycads based on multiple single-copy nuclear genes: congruence of concatenated parsimony, likelihood and species tree inference methods

    PubMed Central

    Salas-Leiva, Dayana E.; Meerow, Alan W.; Calonje, Michael; Griffith, M. Patrick; Francisco-Ortega, Javier; Nakamura, Kyoko; Stevenson, Dennis W.; Lewis, Carl E.; Namoff, Sandra

    2013-01-01

    Background and aims Despite a recent new classification, a stable phylogeny for the cycads has been elusive, particularly regarding resolution of Bowenia, Stangeria and Dioon. In this study, five single-copy nuclear genes (SCNGs) are applied to the phylogeny of the order Cycadales. The specific aim is to evaluate several gene tree–species tree reconciliation approaches for developing an accurate phylogeny of the order, to contrast them with concatenated parsimony analysis and to resolve the erstwhile problematic phylogenetic position of these three genera. Methods DNA sequences of five SCNGs were obtained for 20 cycad species representing all ten genera of Cycadales. These were analysed with parsimony, maximum likelihood (ML) and three Bayesian methods of gene tree–species tree reconciliation, using Cycas as the outgroup. A calibrated date estimation was developed with Bayesian methods, and biogeographic analysis was also conducted. Key Results Concatenated parsimony, ML and three species tree inference methods resolve exactly the same tree topology with high support at most nodes. Dioon and Bowenia are the first and second branches of Cycadales after Cycas, respectively, followed by an encephalartoid clade (Macrozamia–Lepidozamia–Encephalartos), which is sister to a zamioid clade, of which Ceratozamia is the first branch, and in which Stangeria is sister to Microcycas and Zamia. Conclusions A single, well-supported phylogenetic hypothesis of the generic relationships of the Cycadales is presented. However, massive extinction events inferred from the fossil record that eliminated broader ancestral distributions within Zamiaceae compromise accurate optimization of ancestral biogeographical areas for that hypothesis. While major lineages of Cycadales are ancient, crown ages of all modern genera are no older than 12 million years, supporting a recent hypothesis of mostly Miocene radiations. 
This phylogeny can contribute to an accurate infrafamilial classification of Zamiaceae. PMID:23997230

  12. Response of sphagnum peatland testate amoebae to a 1-year transplantation experiment along an artificial hydrological gradient.

    PubMed

    Marcisz, Katarzyna; Fournier, Bertrand; Gilbert, Daniel; Lamentowicz, Mariusz; Mitchell, Edward A D

    2014-05-01

    Peatland testate amoebae (TA) are well-established bioindicators for depth to water table (DWT), but effects of hydrological changes on TA communities have never been tested experimentally. We tested this in a field experiment by placing Sphagnum carpets (15 cm diameter) collected in hummock, lawn and pool microsites (origin) at three local conditions (dry, moist and wet) using trenches dug in a peatland. One series of samples was seeded with microorganism extract from all microsites. TA communities were analysed at T0: 8-2008, T1: 5-2009 and T2: 8-2009. We analysed the data using conditional inference trees, principal response curves (PRC) and DWT inferred from TA communities using a transfer function used for paleoecological reconstruction. Density declined from T0 to T1 and then increased sharply by T2. Species richness, Simpson diversity and Simpson evenness were lower at T2 than at T0 and T1. Seeded communities had higher species richness in pool samples at T0. Pool samples tended to have higher density and lower species richness, Simpson diversity and Simpson evenness than hummock and/or lawn samples until T1. In the PRC, the effect of origin was significant at T0 and T1, but the effect faded away by T2. The seeding effect was strongest at T1 and had vanished by T2. The local condition effect was strong but not in line with the wetness gradient at T1, and started to reflect it by T2. Likewise, TA-inferred DWT started to match the experimental conditions by T2, but more so in hummock and lawn samples than in pool samples. This study confirmed that TA respond to hydrological changes over a 1-year period. However, sensitivity of TA to hydrological fluctuations, and thus the accuracy of inferred DWT changes, was habitat specific, pool TA communities being least responsive to environmental changes. Lawns and hummocks may thus be better suited than pools for paleoecological reconstructions.
This, however, contrasts with the higher prediction error and species' tolerance for DWT with increasing dryness observed in transfer function models.

  13. Using a stochastic model and cross-scale analysis to evaluate controls on historical low-severity fire regimes

    Treesearch

    Maureen C. Kennedy; Donald McKenzie

    2010-01-01

    Fire-scarred trees provide a deep temporal record of historical fire activity, but identifying the mechanisms therein that controlled landscape fire patterns is not straightforward. We use a spatially correlated metric for fire co-occurrence between pairs of trees (the Sørensen distance variogram), with output from a neutral model for fire history, to infer the...

  14. Summer drought reconstruction in northeastern Spain inferred from a tree ring latewood network since 1734

    NASA Astrophysics Data System (ADS)

    Tejedor, E.; Saz, M. A.; Esper, J.; Cuadrat, J. M.; de Luis, M.

    2017-08-01

    Drought recurrence in the Mediterranean is regarded as a fundamental factor for socioeconomic development and the resilience of natural systems in the context of global change. However, knowledge of past droughts has been hampered by the absence of high-resolution proxies. We present a drought reconstruction for the northeast of the Iberian Peninsula based on a new dendrochronology network considering the Standardized Precipitation Evapotranspiration Index (SPEI). A total of 774 latewood width series from 387 trees of P. sylvestris and P. uncinata was combined in an interregional chronology. The new chronology, calibrated against gridded climate data, reveals a robust relationship with the SPEI representing drought conditions of July and August. We developed a summer drought reconstruction for the period 1734-2013 representative for the northeastern and central Iberian Peninsula. We identified 16 extremely dry and 17 extremely wet summers and four decadal scale dry and wet periods, including 2003-2013 as the driest episode of the reconstruction.

  15. The evolutionary history of ferns inferred from 25 low-copy nuclear genes.

    PubMed

    Rothfels, Carl J; Li, Fay-Wei; Sigel, Erin M; Huiet, Layne; Larsson, Anders; Burge, Dylan O; Ruhsam, Markus; Deyholos, Michael; Soltis, Douglas E; Stewart, C Neal; Shaw, Shane W; Pokorny, Lisa; Chen, Tao; dePamphilis, Claude; DeGironimo, Lisa; Chen, Li; Wei, Xiaofeng; Sun, Xiao; Korall, Petra; Stevenson, Dennis W; Graham, Sean W; Wong, Gane K-S; Pryer, Kathleen M

    2015-07-01

    • Understanding fern (monilophyte) phylogeny and its evolutionary timescale is critical for broad investigations of the evolution of land plants, and for providing the point of comparison necessary for studying the evolution of the fern sister group, seed plants. Molecular phylogenetic investigations have revolutionized our understanding of fern phylogeny; however, to date, these studies have relied almost exclusively on plastid data. • Here we take a curated phylogenomics approach to infer the first broad fern phylogeny from multiple nuclear loci, by combining broad taxon sampling (73 ferns and 12 outgroup species) with focused character sampling (25 loci comprising 35877 bp), along with rigorous alignment, orthology inference and model selection. • Our phylogeny corroborates some earlier inferences and provides novel insights; in particular, we find strong support for Equisetales as sister to the rest of ferns, Marattiales as sister to leptosporangiate ferns, and Dennstaedtiaceae as sister to the eupolypods. Our divergence-time analyses reveal that divergences among the extant fern orders all occurred prior to ∼200 MYA. Finally, our species-tree inferences are congruent with analyses of concatenated data, but generally with lower support. Those cases where species-tree support values are higher than expected involve relationships that have been supported by smaller plastid datasets, suggesting that deep coalescence may be reducing support from the concatenated nuclear data. • Our study demonstrates the utility of a curated phylogenomics approach to inferring fern phylogeny, and highlights the need to consider underlying data characteristics, along with data quantity, in phylogenetic studies. © 2015 Botanical Society of America, Inc.

  16. System and method for responding to ground and flight system malfunctions

    NASA Technical Reports Server (NTRS)

    Anderson, Julie J. (Inventor); Fussell, Ronald M. (Inventor)

    2010-01-01

    A system for on-board anomaly resolution for a vehicle has a data repository. The data repository stores data related to different systems, subsystems, and components of the vehicle. The data stored is encoded in a tree-based structure. A query engine is coupled to the data repository. The query engine provides a user and automated interface and provides contextual query to the data repository. An inference engine is coupled to the query engine. The inference engine compares current anomaly data to contextual data stored in the data repository using inference rules. The inference engine generates a potential solution to the current anomaly by referencing the data stored in the data repository.

  17. Large-scale inference of gene function through phylogenetic annotation of Gene Ontology terms: case study of the apoptosis and autophagy cellular processes.

    PubMed

    Feuermann, Marc; Gaudet, Pascale; Mi, Huaiyu; Lewis, Suzanna E; Thomas, Paul D

    2016-01-01

    We previously reported a paradigm for large-scale phylogenomic analysis of gene families that takes advantage of the large corpus of experimentally supported Gene Ontology (GO) annotations. This 'GO Phylogenetic Annotation' approach integrates GO annotations from evolutionarily related genes across ∼100 different organisms in the context of a gene family tree, in which curators build an explicit model of the evolution of gene functions. GO Phylogenetic Annotation models the gain and loss of functions in a gene family tree, which is used to infer the functions of uncharacterized (or incompletely characterized) gene products, even for human proteins that are relatively well studied. Here, we report our results from applying this paradigm to two well-characterized cellular processes, apoptosis and autophagy. This revealed several important observations with respect to GO annotations and how they can be used for function inference. Notably, we applied only a small fraction of the experimentally supported GO annotations to infer function in other family members. The majority of other annotations describe indirect effects, phenotypes or results from high throughput experiments. In addition, we show here how feedback from phylogenetic annotation leads to significant improvements in the PANTHER trees, the GO annotations and GO itself. Thus GO phylogenetic annotation both increases the quantity and improves the accuracy of the GO annotations provided to the research community. We expect these phylogenetically based annotations to be of broad use in gene enrichment analysis as well as other applications of GO annotations. Database URL: http://amigo.geneontology.org/amigo. © The Author(s) 2016. Published by Oxford University Press.

  18. Projecting range-wide sun bear population trends using tree cover and camera-trap bycatch data.

    PubMed

    Scotson, Lorraine; Fredriksson, Gabriella; Ngoprasert, Dusit; Wong, Wai-Ming; Fieberg, John

    2017-01-01

    Monitoring population trends of threatened species requires standardized techniques that can be applied over broad areas and repeated through time. Sun bears Helarctos malayanus are a forest dependent tropical bear found throughout most of Southeast Asia. Previous estimates of global population trends have relied on expert opinion and cannot be systematically replicated. We combined data from 1,463 camera traps within 31 field sites across sun bear range to model the relationship between photo catch rates of sun bears and tree cover. Sun bears were detected in all levels of tree cover above 20%, and the probability of presence was positively associated with the amount of tree cover within a 6-km2 buffer of the camera traps. We used the relationship between catch rates and tree cover across space to infer temporal trends in sun bear abundance in response to tree cover loss at country and global-scales. Our model-based projections based on this "space for time" substitution suggested that sun bear population declines associated with tree cover loss between 2000-2014 in mainland southeast Asia were ~9%, with declines highest in Cambodia and lowest in Myanmar. During the same period, sun bear populations in insular southeast Asia (Malaysia, Indonesia and Brunei) were projected to have declined at a much higher rate (22%). Cast forward over 30-years, from the year 2000, by assuming a constant rate of change in tree cover, we projected population declines in the insular region that surpassed 50%, meeting the IUCN criteria for endangered if sun bears were listed on the population level. Although this approach requires several assumptions, most notably that trends in abundance across space can be used to infer temporal trends, population projections using remotely sensed tree cover data may serve as a useful alternative (or supplement) to expert opinion. 
The advantages of this approach are that it is objective, data-driven, and repeatable, and that it requires all assumptions to be clearly stated.

  19. Comparative phylogeography of a coevolved community: concerted population expansions in Joshua trees and four yucca moths

    USGS Publications Warehouse

    Smith, Christopher Irwin; Tank, Shantel; Godsoe, William; Levenick, Jim; Strand, Eva; Esque, Todd C.; Pellmyr, Olle

    2011-01-01

    Comparative phylogeographic studies have had mixed success in identifying common phylogeographic patterns among co-distributed organisms. Whereas some have found broadly similar patterns across a diverse array of taxa, others have found that the histories of different species are more idiosyncratic than congruent. The variation in the results of comparative phylogeographic studies could indicate that the extent to which sympatrically-distributed organisms share common biogeographic histories varies depending on the strength and specificity of ecological interactions between them. To test this hypothesis, we examined demographic and phylogeographic patterns in a highly specialized, coevolved community – Joshua trees (Yucca brevifolia) and their associated yucca moths. This tightly-integrated, mutually interdependent community is known to have experienced significant range changes at the end of the last glacial period, so there is a strong a priori expectation that these organisms will show common signatures of demographic and distributional changes over time. Using a database of >5000 GPS records for Joshua trees, and multi-locus DNA sequence data from the Joshua tree and four species of yucca moth, we combined palaeodistribution modeling with coalescent-based analyses of demographic and phylogeographic history. We extensively evaluated the power of our methods to infer past population size and distributional changes by evaluating the effect of different inference procedures on our results, comparing our palaeodistribution models to Pleistocene-aged packrat midden records, and simulating DNA sequence data under a variety of alternative demographic histories. Together the results indicate that these organisms have shared a common history of population expansion, and that these expansions were broadly coincident in time.
However, contrary to our expectations, none of our analyses indicated significant range or population size reductions at the end of the last glacial period, and the inferred demographic changes substantially predate Holocene climate changes.

  20. Comparative phylogeography of a coevolved community: Concerted population expansions in Joshua trees and four Yucca moths

    USGS Publications Warehouse

    Smith, C.I.; Tank, S.; Godsoe, W.; Levenick, J.; Strand, E.; Esque, T.; Pellmyr, O.

    2011-01-01

    Comparative phylogeographic studies have had mixed success in identifying common phylogeographic patterns among co-distributed organisms. Whereas some have found broadly similar patterns across a diverse array of taxa, others have found that the histories of different species are more idiosyncratic than congruent. The variation in the results of comparative phylogeographic studies could indicate that the extent to which sympatrically-distributed organisms share common biogeographic histories varies depending on the strength and specificity of ecological interactions between them. To test this hypothesis, we examined demographic and phylogeographic patterns in a highly specialized, coevolved community - Joshua trees (Yucca brevifolia) and their associated yucca moths. This tightly-integrated, mutually interdependent community is known to have experienced significant range changes at the end of the last glacial period, so there is a strong a priori expectation that these organisms will show common signatures of demographic and distributional changes over time. Using a database of >5000 GPS records for Joshua trees, and multi-locus DNA sequence data from the Joshua tree and four species of yucca moth, we combined palaeodistribution modeling with coalescent-based analyses of demographic and phylogeographic history. We extensively evaluated the power of our methods to infer past population size and distributional changes by evaluating the effect of different inference procedures on our results, comparing our palaeodistribution models to Pleistocene-aged packrat midden records, and simulating DNA sequence data under a variety of alternative demographic histories. Together the results indicate that these organisms have shared a common history of population expansion, and that these expansions were broadly coincident in time.
However, contrary to our expectations, none of our analyses indicated significant range or population size reductions at the end of the last glacial period, and the inferred demographic changes substantially predate Holocene climate changes.

  1. Genome-wide SNP data suggest complex ancestry of sympatric North Pacific killer whale ecotypes

    PubMed Central

    Foote, A D; Morin, P A

    2016-01-01

    Three ecotypes of killer whale occur in partial sympatry in the North Pacific. Individuals assortatively mate within the same ecotype, resulting in correlated ecological and genetic differentiation. A key question is whether this pattern of evolutionary divergence is an example of incipient sympatric speciation from a single panmictic ancestral population, or whether sympatry could have resulted from multiple colonisations of the North Pacific and secondary contact between ecotypes. Here, we infer multilocus coalescent trees from >1000 nuclear single-nucleotide polymorphisms (SNPs) and find evidence of incomplete lineage sorting so that the genealogies of SNPs do not all conform to a single topology. To disentangle whether uncertainty in the phylogenetic inference of the relationships among ecotypes could also result from ancestral admixture events we reconstructed the relationship among the ecotypes as an admixture graph and estimated f4-statistics using TreeMix. The results were consistent with episodes of admixture between two of the North Pacific ecotypes and the two outgroups (populations from the Southern Ocean and the North Atlantic). Gene flow may have occurred via unsampled ‘ghost' populations rather than directly between the populations sampled here. Our results indicate that because of ancestral admixture events and incomplete lineage sorting, a single bifurcating tree does not fully describe the relationship among these populations. The data are therefore most consistent with the genomic variation among North Pacific killer whale ecotypes resulting from multiple colonisation events, and secondary contact may have facilitated evolutionary divergence. Thus, the present-day populations of North Pacific killer whale ecotypes have a complex ancestry, confounding the tree-based inference of ancestral geography. PMID:27485668

  2. Species trees from consensus single nucleotide polymorphism (SNP) data: Testing phylogenetic approaches with simulated and empirical data.

    PubMed

    Schmidt-Lebuhn, Alexander N; Aitken, Nicola C; Chuah, Aaron

    2017-11-01

    Datasets of hundreds or thousands of SNPs (Single Nucleotide Polymorphisms) from multiple individuals per species are increasingly used to study population structure, species delimitation and shallow phylogenetics. The principal software tool to infer species or population trees from SNP data is currently the BEAST template SNAPP which uses a Bayesian coalescent analysis. However, it is computationally extremely demanding and tolerates only small amounts of missing data. We used simulated and empirical SNPs from plants (Australian Craspedia, Asteraceae, and Pelargonium, Geraniaceae) to compare species trees produced (1) by SNAPP, (2) using SVD quartets, and (3) using Bayesian and parsimony analysis with several different approaches to summarising data from multiple samples into one set of traits per species. Our aims were to explore the impact of tree topology and missing data on the results, and to test which data summarising and analyses approaches would best approximate the results obtained from SNAPP for empirical data. SVD quartets retrieved the correct topology from simulated data, as did SNAPP except in the case of a very unbalanced phylogeny. Both methods failed to retrieve the correct topology when large amounts of data were missing. Bayesian analysis of species level summary data scoring the two alleles of each SNP as independent characters and parsimony analysis of data scoring each SNP as one character produced trees with branch length distributions closest to the true trees on which SNPs were simulated. For empirical data, Bayesian inference and Dollo parsimony analysis of data scored allele-wise produced phylogenies most congruent with the results of SNAPP. 
In the case of study groups divergent enough for missing data to be phylogenetically informative (because of additional mutations preventing amplification of genomic fragments or bioinformatic establishment of homology), scoring of SNP data as a presence/absence matrix irrespective of allele content might be an additional option. As this depends on sampling across species being reasonably even and a random distribution of non-informative instances of missing data, however, further exploration of this approach is needed. Properly chosen data summary approaches to inferring species trees from SNP data may represent a potential alternative to currently available individual-level coalescent analyses especially for quick data exploration and when dealing with computationally demanding or patchy datasets. Crown Copyright © 2017. Published by Elsevier Inc. All rights reserved.

  3. Data set for phylogenetic tree and RAMPAGE Ramachandran plot analysis of SODs in Gossypium raimondii and G. arboreum.

    PubMed

    Wang, Wei; Xia, Minxuan; Chen, Jie; Deng, Fenni; Yuan, Rui; Zhang, Xiaopei; Shen, Fafu

    2016-12-01

    The data presented in this paper support the research article "Genome-Wide Analysis of Superoxide Dismutase Gene Family in Gossypium raimondii and G. arboreum" [1]. In this data article, we present a phylogenetic tree showing a dichotomy with two different clusters of SODs inferred by the Bayesian method of MrBayes (version 3.2.4), "Bayesian phylogenetic inference under mixed models" [2], Ramachandran plots of G. raimondii and G. arboreum SODs, the protein sequence used to generate the 3D structure of proteins and the template accession via the SWISS-MODEL server, "SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information." [3] and motif sequences of SODs identified by InterProScan (version 4.8) with the Pfam database, "Pfam: the protein families database" [4].

  4. Phylogenetic search through partial tree mixing

    PubMed Central

    2012-01-01

    Background: Recent advances in sequencing technology have created large data sets upon which phylogenetic inference can be performed. Current research is limited by the prohibitive time necessary to perform tree search on a reasonable number of individuals. This research develops new phylogenetic algorithms that can operate on tens of thousands of species in a reasonable amount of time through several innovative search techniques. Results: When compared to popular phylogenetic search algorithms, better trees are found much more quickly for large data sets. These algorithms are incorporated in the PSODA application available at http://dna.cs.byu.edu/psoda Conclusions: The use of Partial Tree Mixing in a partition based tree space allows the algorithm to quickly converge on near optimal tree regions. These regions can then be searched in a methodical way to determine the overall optimal phylogenetic solution. PMID:23320449

  5. Rapid Multi-Locus Sequence Typing Using Microfluidic Biochips

    DTIC Science & Technology

    2010-05-12

    Sequence Types. The evolutionary history of all the B. cereus MLST concatenated Sequence Types (545 taxa, 2,394 nucleotide positions) was inferred using...the Neighbor-Joining method [28]. The bootstrap consensus tree inferred from 100 replicates was taken to represent the evolutionary history of the... Chlamydia (manuscript in preparation) and performed pilot studies on Staphylococcus aureus and Streptococcus pneumoniae (Data S4 and Text S2). Another potential

  6. PhyloExplorer: a web server to validate, explore and query phylogenetic trees

    PubMed Central

    Ranwez, Vincent; Clairon, Nicolas; Delsuc, Frédéric; Pourali, Saeed; Auberval, Nicolas; Diser, Sorel; Berry, Vincent

    2009-01-01

    Background Many important problems in evolutionary biology require molecular phylogenies to be reconstructed. Phylogenetic trees must then be manipulated for subsequent inclusion in publications or analyses such as supertree inference and tree comparisons. However, no tool is currently available to facilitate the management of tree collections providing, for instance: standardisation of taxon names among trees with respect to a reference taxonomy; selection of relevant subsets of trees or sub-trees according to a taxonomic query; or simply computation of descriptive statistics on the collection. Moreover, although several databases of phylogenetic trees exist, there is currently no easy way to find trees that are both relevant and complementary to a given collection of trees. Results We propose a tool to facilitate assessment and management of phylogenetic tree collections. Given an input collection of rooted trees, PhyloExplorer provides facilities for obtaining statistics describing the collection, correcting invalid taxon names, extracting taxonomically relevant parts of the collection using a dedicated query language, and identifying related trees in the TreeBASE database. Conclusion PhyloExplorer is a simple and interactive website implemented through underlying Python libraries and MySQL databases. It is available at: and the source code can be downloaded from: . PMID:19450253

  7. Mountain landscapes offer few opportunities for high-elevation tree species migration

    USGS Publications Warehouse

    Bell, David M.; Bradford, John B.; Lauenroth, William K.

    2014-01-01

    Climate change is anticipated to alter plant species distributions. Regional context, notably the spatial complexity of climatic gradients, may influence species migration potential. While high-elevation species may benefit from steep climate gradients in mountain regions, their persistence may be threatened by limited suitable habitat as land area decreases with elevation. To untangle these apparently contradictory predictions for mountainous regions, we evaluated the climatic suitability of four coniferous forest tree species of the western United States based on species distribution modeling (SDM) and examined changes in climatically suitable areas under predicted climate change. We used forest structural information relating to tree species dominance, productivity, and demography from an extensive forest inventory system to assess the strength of inferences made with a SDM approach. We found that tree species dominance, productivity, and recruitment were highest where climatic suitability (i.e., probability of species occurrence under certain climate conditions) was high, supporting the use of predicted climatic suitability in examining species risk to climate change. By predicting changes in climatic suitability over the next century, we found that climatic suitability will likely decline, both in areas currently occupied by each tree species and in nearby unoccupied areas to which species might migrate in the future. These trends were most dramatic for high elevation species. Climatic changes predicted over the next century will dramatically reduce climatically suitable areas for high-elevation tree species while a lower elevation species, Pinus ponderosa, will be well positioned to shift upslope across the region. Reductions in suitable area for high-elevation species imply that even unlimited migration would be insufficient to offset predicted habitat loss, underscoring the vulnerability of these high-elevation species to climatic changes.

  8. Insights into plant water uptake from xylem-water isotope measurements in two tropical catchments with contrasting moisture conditions

    USGS Publications Warehouse

    Evaristo, Jaivime; McDonnell, Jeffrey J.; Scholl, Martha A.; Bruijnzeel, L. Adrian; Chun, Kwok P.

    2016-01-01

    Water transpired by trees has long been assumed to be sourced from the same subsurface water stocks that contribute to groundwater recharge and streamflow. However, recent investigations using dual water stable isotopes have shown an apparent ecohydrological separation between tree-transpired water and stream water. Here we present evidence for such ecohydrological separation in two tropical environments in Puerto Rico where precipitation seasonality is relatively low and where precipitation is positively correlated with primary productivity. We determined the stable isotope signature of xylem water of 30 mahogany (Swietenia spp.) trees sampled during two periods with contrasting moisture status. Our results suggest that the separation between transpiration water and groundwater recharge/streamflow water might be related less to the temporal phasing of hydrologic inputs and primary productivity, and more to the fundamental processes that drive evaporative isotopic enrichment of residual soil water within the soil matrix. The lack of an evaporative signature of both groundwater and streams in the study area suggests that these water balance components have a water source that is transported quickly to deeper subsurface storage compared to waters that trees use. A Bayesian mixing model used to partition source water proportions of xylem water showed that groundwater contribution was greater for valley-bottom, riparian trees than for ridge-top trees. Groundwater contribution was also greater at the xeric site than at the mesic–hydric site. These model results (1) underline the utility of a simple linear mixing model, implemented in a Bayesian inference framework, in quantifying source water contributions at sites with contrasting physiographic characteristics, and (2) highlight the informed judgement that should be made in interpreting mixing model results, of import particularly in surveying groundwater use patterns by vegetation from regional to global scales. 

  9. Factors affecting spruce establishment and recruitment near western treeline, Alaska

    NASA Astrophysics Data System (ADS)

    Miller, A. E.; Sherriff, R.; Wilson, T. L.

    2015-12-01

    Regional warming and increases in tree growth are contributing to increased productivity near the western forest margin in Alaska. The effects of warming on seedling recruitment have received little attention, in spite of forecasted forest expansion near western treeline. Here, we used stand structure and environmental data from white spruce (Picea glauca) stands (n = 95) sampled across a longitudinal gradient to explore factors influencing white spruce growth, establishment and recruitment in southwest Alaska. Using tree-ring chronologies developed from a subset of the plots (n = 30), we estimated establishment dates and basal area increment (BAI) for trees of all age classes across a range of site conditions. We used GLMs (generalized linear models) to explore the relationship between tree growth and temperature in undisturbed, low elevation sites along the gradient, using BAI averaged over the years 1975-2000. In addition, we examined the relationship between growing degree days (GDD) and seedling establishment over the previous three decades. We used total counts of live seedlings, saplings and live and dead trees, representing four cohorts, to evaluate whether geospatial, climate, and measured plot covariates predicted abundance of the different size classes. We hypothesized that the relationship between abundance and longitude would vary by size class, and that this relationship would be mediated by growing season temperature. We found that mean BAI for trees in undisturbed, low elevation sites increased with July maximum temperature, and that the slope of the relationship with temperature changed with longitude (interaction significant with 90% confidence). White spruce establishment was positively associated with longer summers and/or greater heat accumulation, as inferred from GDD. Seedling, sapling and tree abundance were also positively correlated with temperature across the study area.
The response to longitude was mixed, with smaller size classes (seedlings, small saplings) most abundant at the western end of the gradient, and larger size classes (trees) most abundant to the east, suggesting a moving front of white spruce establishment near western treeline.

  10. On the pull-out of fibers with fractal-tree structure and the inference of strength and fracture toughness of composites

    NASA Astrophysics Data System (ADS)

    Fu, Shaoyun; Zhou, Benlian; Lung, Chiwei

    1992-06-01

    An approximate theory of the pull-out of a fiber with fractal-tree structure from a matrix is developed with the aim of quantifying the effects of the fractal-tree structure of the fiber. In the experimental investigation of the pull-out of the synthetic fiber with fractal-tree structure, it was generally observed that the force and energy of fiber pull-out increase with the branching angle. The application of this theory to experiment is successful. The strength and fracture toughness of composites reinforced by this kind of fiber are inferred to be greater than those of composites reinforced by plain fibers.

  11. Inferring phylogenetic trees from the knowledge of rare evolutionary events.

    PubMed

    Hellmuth, Marc; Hernandez-Rosales, Maribel; Long, Yangjing; Stadler, Peter F

    2018-06-01

    Rare events have played an increasing role in molecular phylogenetics as potentially homoplasy-poor characters. In this contribution we analyze the phylogenetic information content from a combinatorial point of view by considering the binary relation on the set of taxa defined by the existence of a single event separating two taxa. We show that the graph-representation of this relation must be a tree. Moreover, we characterize completely the relationship between the tree of such relations and the underlying phylogenetic tree. With directed operations such as tandem-duplication-random-loss events in mind we demonstrate how non-symmetric information constrains the position of the root in the partially reconstructed phylogeny.

  12. Alignment-free inference of hierarchical and reticulate phylogenomic relationships.

    PubMed

    Bernard, Guillaume; Chan, Cheong Xin; Chan, Yao-Ban; Chua, Xin-Yi; Cong, Yingnan; Hogan, James M; Maetschke, Stefan R; Ragan, Mark A

    2017-06-30

    We are amidst an ongoing flood of sequence data arising from the application of high-throughput technologies, and a concomitant fundamental revision in our understanding of how genomes evolve individually and within the biosphere. Workflows for phylogenomic inference must accommodate data that are not only much larger than before, but often more error prone and perhaps misassembled, or not assembled in the first place. Moreover, genomes of microbes, viruses and plasmids evolve not only by tree-like descent with modification but also by incorporating stretches of exogenous DNA. Thus, next-generation phylogenomics must address computational scalability while rethinking the nature of orthogroups, the alignment of multiple sequences and the inference and comparison of trees. New phylogenomic workflows have begun to take shape based on so-called alignment-free (AF) approaches. Here, we review the conceptual foundations of AF phylogenetics for the hierarchical (vertical) and reticulate (lateral) components of genome evolution, focusing on methods based on k-mers. We reflect on what seems to be successful, and on where further development is needed. © The Author 2017. Published by Oxford University Press.

  13. Applying a multiobjective metaheuristic inspired by honey bees to phylogenetic inference.

    PubMed

    Santander-Jiménez, Sergio; Vega-Rodríguez, Miguel A

    2013-10-01

    The development of increasingly popular multiobjective metaheuristics has allowed bioinformaticians to deal with optimization problems in computational biology where multiple objective functions must be taken into account. One of the most relevant research topics that can benefit from these techniques is phylogenetic inference. Throughout the years, different researchers have proposed their own view about the reconstruction of ancestral evolutionary relationships among species. As a result, biologists often report different phylogenetic trees from the same dataset when considering distinct optimality principles. In this work, we detail a multiobjective swarm intelligence approach based on the novel Artificial Bee Colony algorithm for inferring phylogenies. The aim of this paper is to propose a complementary view of phylogenetics according to the maximum parsimony and maximum likelihood criteria, in order to generate a set of phylogenetic trees that represent a compromise between these principles. Experimental results on a variety of nucleotide data sets and statistical studies highlight the relevance of the proposal with regard to other multiobjective algorithms and state-of-the-art biological methods. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  14. How does climate influence xylem morphogenesis over the growing season? Insights from long-term intra-ring anatomy in Picea abies.

    PubMed

    Castagneri, Daniele; Fonti, Patrick; von Arx, Georg; Carrer, Marco

    2017-04-01

    During the growing season, the cambium of conifer trees produces successive rows of xylem cells, the tracheids, that sequentially pass through the phases of enlargement and secondary wall thickening before dying and becoming functional. Climate variability can strongly influence the kinetics of morphogenetic processes, eventually affecting tracheid shape and size. This study investigates xylem anatomical structure in the stem of Picea abies to retrospectively infer how, in the long term, climate affects the processes of cell enlargement and wall thickening. Tracheid anatomical traits related to the phases of enlargement (diameter) and wall thickening (wall thickness) were innovatively inspected at the intra-ring level on 87-year-long tree-ring series in Picea abies trees along a 900 m elevation gradient in the Italian Alps. Anatomical traits in ten successive tree-ring sectors were related to daily temperature and precipitation data using running correlations. Close to the altitudinal tree limit, low early-summer temperature negatively affected cell enlargement. At lower elevation, water availability in early summer was positively related to cell diameter. The timing of these relationships shifted forward by about 20 (high elevation) to 40 (low elevation) d from the first to the last tracheids in the ring. Cell wall thickening was affected by climate in a different period in the season. In particular, wall thickness of late-formed tracheids was strongly positively related to August-September temperature at high elevation. Morphogenesis of tracheids sequentially formed in the growing season is influenced by climate conditions in successive periods. The distinct climate impacts on cell enlargement and wall thickening indicate that different morphogenetic mechanisms are responsible for different tracheid traits. 
Our approach of long-term and high-resolution analysis of xylem anatomy can support and extend short-term xylogenesis observations, and increase our understanding of climate control of tree growth and functioning under different environmental conditions.

  15. The Phylogeny and Biogeographic History of Ashes (Fraxinus, Oleaceae) Highlight the Roles of Migration and Vicariance in the Diversification of Temperate Trees

    PubMed Central

    Hinsinger, Damien Daniel; Basak, Jolly; Gaudeul, Myriam; Cruaud, Corinne; Bertolino, Paola; Frascaria-Lacoste, Nathalie; Bousquet, Jean

    2013-01-01

    The cosmopolitan genus Fraxinus, which comprises about 40 species of temperate trees and shrubs occupying various habitats in the Northern Hemisphere, represents a useful model to study speciation in long-lived angiosperms. We used nuclear external transcribed spacers (nETS), phantastica gene sequences, and two chloroplast loci (trnH-psbA and rpl32-trnL) in combination with previously published and newly obtained nITS sequences to produce a time-calibrated multi-locus phylogeny of the genus. We then inferred the biogeographic history and evolution of floral morphology. An early dispersal event could be inferred from North America to Asia during the Oligocene, leading to the diversification of the section Melioides sensu lato. Another intercontinental dispersal originating from the Eurasian section of Fraxinus could be dated from the Miocene and resulted in the speciation of F. nigra in North America. In addition, vicariance was inferred to account for the distribution of the other Old World species (sections Sciadanthus, Fraxinus and Ornus). Geographic speciation likely involving dispersal and vicariance could also be inferred from the phylogenetic grouping of geographically close taxa. Molecular dating suggested that the initial divergence of the taxonomical sections occurred during the middle and late Eocene and Oligocene periods, whereas diversification within sections occurred mostly during the late Oligocene and Miocene, which is consistent with the climate warming and accompanying large distributional changes observed during these periods. These various results underline the importance of dispersal and vicariance in promoting geographic speciation and diversification in Fraxinus. Similarities in life history, reproductive and demographic attributes as well as geographical distribution patterns suggest that many other temperate trees should exhibit similar speciation patterns.
On the other hand, the observed parallel evolution and reversions in floral morphology would imply a major influence of environmental pressure. The phylogeny obtained and its biogeographical implications should facilitate future studies on the evolution of complex adaptive characters, such as habitat preference, and their possible roles in promoting divergent evolution in trees. PMID:24278282

  16. The phylogeny and biogeographic history of ashes (Fraxinus, Oleaceae) highlight the roles of migration and vicariance in the diversification of temperate trees.

    PubMed

    Hinsinger, Damien Daniel; Basak, Jolly; Gaudeul, Myriam; Cruaud, Corinne; Bertolino, Paola; Frascaria-Lacoste, Nathalie; Bousquet, Jean

    2013-01-01

    The cosmopolitan genus Fraxinus, which comprises about 40 species of temperate trees and shrubs occupying various habitats in the Northern Hemisphere, represents a useful model to study speciation in long-lived angiosperms. We used nuclear external transcribed spacers (nETS), phantastica gene sequences, and two chloroplast loci (trnH-psbA and rpl32-trnL) in combination with previously published and newly obtained nITS sequences to produce a time-calibrated multi-locus phylogeny of the genus. We then inferred the biogeographic history and evolution of floral morphology. An early dispersal event could be inferred from North America to Asia during the Oligocene, leading to the diversification of the section Melioides sensu lato. Another intercontinental dispersal originating from the Eurasian section of Fraxinus could be dated from the Miocene and resulted in the speciation of F. nigra in North America. In addition, vicariance was inferred to account for the distribution of the other Old World species (sections Sciadanthus, Fraxinus and Ornus). Geographic speciation likely involving dispersal and vicariance could also be inferred from the phylogenetic grouping of geographically close taxa. Molecular dating suggested that the initial divergence of the taxonomical sections occurred during the middle and late Eocene and Oligocene periods, whereas diversification within sections occurred mostly during the late Oligocene and Miocene, which is consistent with the climate warming and accompanying large distributional changes observed during these periods. These various results underline the importance of dispersal and vicariance in promoting geographic speciation and diversification in Fraxinus. Similarities in life history, reproductive and demographic attributes as well as geographical distribution patterns suggest that many other temperate trees should exhibit similar speciation patterns.
On the other hand, the observed parallel evolution and reversions in floral morphology would imply a major influence of environmental pressure. The phylogeny obtained and its biogeographical implications should facilitate future studies on the evolution of complex adaptive characters, such as habitat preference, and their possible roles in promoting divergent evolution in trees.

  17. Delimiting Coalescence Genes (C-Genes) in Phylogenomic Data Sets.

    PubMed

    Springer, Mark S; Gatesy, John

    2018-02-26

    Summary coalescence methods have emerged as a popular alternative for inferring species trees with large genomic datasets, because these methods explicitly account for incomplete lineage sorting. However, statistical consistency of summary coalescence methods is not guaranteed unless several model assumptions are true, including the critical assumption that recombination occurs freely among but not within coalescence genes (c-genes), which are the fundamental units of analysis for these methods. Each c-gene has a single branching history, and large sets of these independent gene histories should be the input for genome-scale coalescence estimates of phylogeny. By contrast, numerous studies have reported the results of coalescence analyses in which complete protein-coding sequences are treated as c-genes even though exons for these loci can span more than a megabase of DNA. Empirical estimates of recombination breakpoints suggest that c-genes may be much shorter, especially when large clades with many species are the focus of analysis. Although this idea has been challenged recently in the literature, the inverse relationship between c-gene size and increased taxon sampling in a dataset-the 'recombination ratchet'-is a fundamental property of c-genes. For taxonomic groups characterized by genes with long intron sequences, complete protein-coding sequences are likely not valid c-genes and are inappropriate units of analysis for summary coalescence methods unless they occur in recombination deserts that are devoid of incomplete lineage sorting (ILS). Finally, it has been argued that coalescence methods are robust when the no-recombination within loci assumption is violated, but recombination must matter at some scale because ILS, a by-product of recombination, is the raison d'etre for coalescence methods. That is, extensive recombination is required to yield the large number of independently segregating c-genes used to infer a species tree.
If coalescent methods are powerful enough to infer the correct species tree for difficult phylogenetic problems in the anomaly zone, where concatenation is expected to fail because of ILS, then there should be a decreasing probability of inferring the correct species tree using longer loci with many intralocus recombination breakpoints (i.e., increased levels of concatenation).

  18. Delimiting Coalescence Genes (C-Genes) in Phylogenomic Data Sets

    PubMed Central

    Springer, Mark S.; Gatesy, John

    2018-01-01

    Summary coalescence methods have emerged as a popular alternative for inferring species trees with large genomic datasets, because these methods explicitly account for incomplete lineage sorting. However, statistical consistency of summary coalescence methods is not guaranteed unless several model assumptions are true, including the critical assumption that recombination occurs freely among but not within coalescence genes (c-genes), which are the fundamental units of analysis for these methods. Each c-gene has a single branching history, and large sets of these independent gene histories should be the input for genome-scale coalescence estimates of phylogeny. By contrast, numerous studies have reported the results of coalescence analyses in which complete protein-coding sequences are treated as c-genes even though exons for these loci can span more than a megabase of DNA. Empirical estimates of recombination breakpoints suggest that c-genes may be much shorter, especially when large clades with many species are the focus of analysis. Although this idea has been challenged recently in the literature, the inverse relationship between c-gene size and increased taxon sampling in a dataset—the ‘recombination ratchet’—is a fundamental property of c-genes. For taxonomic groups characterized by genes with long intron sequences, complete protein-coding sequences are likely not valid c-genes and are inappropriate units of analysis for summary coalescence methods unless they occur in recombination deserts that are devoid of incomplete lineage sorting (ILS). Finally, it has been argued that coalescence methods are robust when the no-recombination within loci assumption is violated, but recombination must matter at some scale because ILS, a by-product of recombination, is the raison d’etre for coalescence methods. That is, extensive recombination is required to yield the large number of independently segregating c-genes used to infer a species tree. 
If coalescent methods are powerful enough to infer the correct species tree for difficult phylogenetic problems in the anomaly zone, where concatenation is expected to fail because of ILS, then there should be a decreasing probability of inferring the correct species tree using longer loci with many intralocus recombination breakpoints (i.e., increased levels of concatenation). PMID:29495400

  19. A machine-learning approach reveals that alignment properties alone can accurately predict inference of lateral gene transfer from discordant phylogenies.

    PubMed

    Roettger, Mayo; Martin, William; Dagan, Tal

    2009-09-01

    Among the methods currently used in phylogenomic practice to detect the presence of lateral gene transfer (LGT), one of the most frequently employed is the comparison of gene tree topologies for different genes. In cases where the phylogenies for different genes are incompatible, or discordant, for well-supported branches there are three simple interpretations for the result: 1) gene duplications (paralogy) followed by many independent gene losses have occurred, 2) LGT has occurred, or 3) the phylogeny is well supported but for reasons unknown is nonetheless incorrect. Here, we focus on the third possibility by examining the properties of 22,437 published multiple sequence alignments, the Bayesian maximum likelihood trees for which either do or do not suggest the occurrence of LGT by the criterion of discordant branches. The alignments that produce discordant phylogenies differ significantly in several salient alignment properties from those that do not. Using a support vector machine, we were able to predict the inference of discordant tree topologies with up to 80% accuracy from alignment properties alone.

  20. Genealogy and gene trees.

    PubMed

    Rasmuson, Marianne

    2008-02-01

    Heredity can be followed in persons or in genes. Persons can be identified only a few generations back, but simplified models indicate that universal ancestors of all now-living persons existed in the past. Genetic variability can be characterized as variants of DNA sequences. Data are available only from living persons, but from the pattern of variation gene trees can be inferred by means of coalescence models. The merging of lines backwards in time leads to an MRCA (most recent common ancestor). The time and place of living for this inferred person can give insights into human evolutionary history. Demographic processes are incorporated in the model, but since culture and customs are known to influence demography, the models used ought to be tested against available genealogy. The Icelandic database offers a possibility to do so and points to some discrepancies. Mitochondrial DNA and Y chromosome patterns give a rather consistent view of human evolutionary history during the latest 100 000 years, but the earlier epochs of human evolution demand gene trees with longer branches. The results of such studies reveal as yet unsolved problems about the sources of our genome.

  1. A parsimonious tree-grow method for haplotype inference.

    PubMed

    Li, Zhenping; Zhou, Wenfeng; Zhang, Xiang-Sun; Chen, Luonan

    2005-09-01

    Haplotype information has become increasingly important in analyzing fine-scale molecular genetics data, for applications such as disease gene mapping and drug design. Parsimony haplotyping is one of the haplotyping problems belonging to the NP-hard class. In this paper, we aim to develop a novel algorithm for the haplotype inference problem with the parsimony criterion, based on a parsimonious tree-grow method (PTG). PTG is a heuristic algorithm that can find the minimum number of distinct haplotypes based on the criterion of keeping all genotypes resolved during the tree-grow process. In addition, a block-partitioning method is also proposed to improve the computational efficiency. We show that the proposed approach is not only effective with a high accuracy, but also very efficient, with a computational complexity on the order of O(m^2 n) time for n single nucleotide polymorphism sites in m individual genotypes. The software is available upon request from the authors, or from http://zhangroup.aporc.org/bioinfo/ptg/. Contact: chen@elec.osaka-sandai.ac.jp. Supporting materials are available from http://zhangroup.aporc.org/bioinfo/ptg/bti572supplementary.pdf
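
    The parsimony criterion in the abstract above can be made concrete with a small sketch. This is not the authors' PTG heuristic: it is a brute-force search that is only practical for a handful of short genotypes. The 0/1/2 genotype encoding (2 = heterozygous site) and all function names are assumptions made for illustration.

```python
from itertools import product


def resolves(h1, h2, genotype):
    """Return True if haplotypes h1 and h2 together explain the genotype.

    Assumed encoding: 0 = homozygous for allele 0, 1 = homozygous for
    allele 1, 2 = heterozygous (the two haplotypes must differ there).
    """
    for a, b, g in zip(h1, h2, genotype):
        if g == 2:
            if a == b:
                return False
        elif not (a == b == g):
            return False
    return True


def candidate_pairs(genotype):
    """Enumerate every unordered haplotype pair that resolves a genotype."""
    het_sites = [i for i, g in enumerate(genotype) if g == 2]
    pairs = set()
    for bits in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(genotype), list(genotype)
        for site, bit in zip(het_sites, bits):
            h1[site], h2[site] = bit, 1 - bit
        pairs.add(frozenset((tuple(h1), tuple(h2))))
    return pairs


def min_haplotypes(genotypes):
    """Exhaustively find a smallest haplotype set resolving all genotypes."""
    best = None
    for choice in product(*(candidate_pairs(g) for g in genotypes)):
        used = set().union(*choice)
        if best is None or len(used) < len(best):
            best = used
    return best
```

    For the genotypes (2,2), (0,0) and (1,1), the search returns two haplotypes, (0,0) and (1,1), rather than the four that the alternative resolution of (2,2) would need. The search space grows exponentially with the number of heterozygous sites, which is the cost a heuristic such as the one described in the abstract aims to avoid.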

  2. Phylogenomics with paralogs

    PubMed Central

    Hellmuth, Marc; Wieseke, Nicolas; Lechner, Marcus; Lenhof, Hans-Peter; Middendorf, Martin; Stadler, Peter F.

    2015-01-01

    Phylogenomics heavily relies on well-curated sequence data sets that comprise, for each gene, exclusively 1:1 orthologs. Paralogs are treated as a dangerous nuisance that has to be detected and removed. We show here that this severe restriction of the data sets is not necessary. Building upon recent advances in mathematical phylogenetics, we demonstrate that gene duplications convey meaningful phylogenetic information and allow the inference of plausible phylogenetic trees, provided orthologs and paralogs can be distinguished with a degree of certainty. Starting from tree-free estimates of orthology, cograph editing can sufficiently reduce the noise to find correct event-annotated gene trees. The information of gene trees can then directly be translated into constraints on the species trees. Although the resolution is very poor for individual gene families, we show that genome-wide data sets are sufficient to generate fully resolved phylogenetic trees, even in the presence of horizontal gene transfer. PMID:25646426

  3. The transmission process: A combinatorial stochastic process for the evolution of transmission trees over networks.

    PubMed

    Sainudiin, Raazesh; Welch, David

    2016-12-07

    We derive a combinatorial stochastic process for the evolution of the transmission tree over the infected vertices of a host contact network in a susceptible-infected (SI) model of an epidemic. Models of transmission trees are crucial to understanding the evolution of pathogen populations. We provide an explicit description of the transmission process on the product state space of (rooted planar ranked labelled) binary transmission trees and labelled host contact networks with SI-tags as a discrete-state continuous-time Markov chain. We give the exact probability of any transmission tree when the host contact network is a complete, star or path network - three illustrative examples. We then develop a biparametric Beta-splitting model that directly generates transmission trees with exact probabilities as a function of the model parameters, but without explicitly modelling the underlying contact network, and show that for specific values of the parameters we can recover the exact probabilities for our three example networks through the Markov chain construction that explicitly models the underlying contact network. We use the maximum likelihood estimator (MLE) to consistently infer the two parameters driving the transmission process based on observations of the transmission trees and use the exact MLE to characterize equivalence classes over the space of contact networks with a single initial infection. An exploratory simulation study of the MLEs from transmission trees sampled from three other deterministic and four random families of classical contact networks is conducted to shed light on the relation between the MLEs of these families with some implications for statistical inference along with pointers to further extensions of our models. 
The insights developed here are also applicable to the simplest models of "meme" evolution in online social media networks through transmission events that can be distilled from observable actions such as "likes", "mentions", "retweets" and "+1s" along with any concomitant comments. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.

  4. Synthesis of phylogeny and taxonomy into a comprehensive tree of life

    PubMed Central

    Hinchliff, Cody E.; Smith, Stephen A.; Allman, James F.; Burleigh, J. Gordon; Chaudhary, Ruchi; Coghill, Lyndon M.; Crandall, Keith A.; Deng, Jiabin; Drew, Bryan T.; Gazis, Romina; Gude, Karl; Hibbett, David S.; Katz, Laura A.; Laughinghouse, H. Dail; McTavish, Emily Jane; Midford, Peter E.; Owen, Christopher L.; Ree, Richard H.; Rees, Jonathan A.; Soltis, Douglas E.; Williams, Tiffani; Cranston, Karen A.

    2015-01-01

    Reconstructing the phylogenetic relationships that unite all lineages (the tree of life) is a grand challenge. The paucity of homologous character data across disparately related lineages currently renders direct phylogenetic inference untenable. To reconstruct a comprehensive tree of life, we therefore synthesized published phylogenies, together with taxonomic classifications for taxa never incorporated into a phylogeny. We present a draft tree containing 2.3 million tips—the Open Tree of Life. Realization of this tree required the assembly of two additional community resources: (i) a comprehensive global reference taxonomy and (ii) a database of published phylogenetic trees mapped to this taxonomy. Our open source framework facilitates community comment and contribution, enabling the tree to be continuously updated when new phylogenetic and taxonomic data become digitally available. Although data coverage and phylogenetic conflict across the Open Tree of Life illuminate gaps in both the underlying data available for phylogenetic reconstruction and the publication of trees as digital objects, the tree provides a compelling starting point for community contribution. This comprehensive tree will fuel fundamental research on the nature of biological diversity, ultimately providing up-to-date phylogenies for downstream applications in comparative biology, ecology, conservation biology, climate change, agriculture, and genomics. PMID:26385966

  5. Synthesis of phylogeny and taxonomy into a comprehensive tree of life.

    PubMed

    Hinchliff, Cody E; Smith, Stephen A; Allman, James F; Burleigh, J Gordon; Chaudhary, Ruchi; Coghill, Lyndon M; Crandall, Keith A; Deng, Jiabin; Drew, Bryan T; Gazis, Romina; Gude, Karl; Hibbett, David S; Katz, Laura A; Laughinghouse, H Dail; McTavish, Emily Jane; Midford, Peter E; Owen, Christopher L; Ree, Richard H; Rees, Jonathan A; Soltis, Douglas E; Williams, Tiffani; Cranston, Karen A

    2015-10-13

    Reconstructing the phylogenetic relationships that unite all lineages (the tree of life) is a grand challenge. The paucity of homologous character data across disparately related lineages currently renders direct phylogenetic inference untenable. To reconstruct a comprehensive tree of life, we therefore synthesized published phylogenies, together with taxonomic classifications for taxa never incorporated into a phylogeny. We present a draft tree containing 2.3 million tips, the Open Tree of Life. Realization of this tree required the assembly of two additional community resources: (i) a comprehensive global reference taxonomy and (ii) a database of published phylogenetic trees mapped to this taxonomy. Our open source framework facilitates community comment and contribution, enabling the tree to be continuously updated when new phylogenetic and taxonomic data become digitally available. Although data coverage and phylogenetic conflict across the Open Tree of Life illuminate gaps in both the underlying data available for phylogenetic reconstruction and the publication of trees as digital objects, the tree provides a compelling starting point for community contribution. This comprehensive tree will fuel fundamental research on the nature of biological diversity, ultimately providing up-to-date phylogenies for downstream applications in comparative biology, ecology, conservation biology, climate change, agriculture, and genomics.

  6. Is multiple-sequence alignment required for accurate inference of phylogeny?

    PubMed

    Höhl, Michael; Ragan, Mark A

    2007-04-01

    The process of inferring phylogenetic trees from molecular sequences almost always starts with a multiple alignment of these sequences but can also be based on methods that do not involve multiple sequence alignment. Very little is known about the accuracy with which such alignment-free methods recover the correct phylogeny or about the potential for increasing their accuracy. We conducted a large-scale comparison of ten alignment-free methods, among them one new approach that does not calculate distances and a faster variant of our pattern-based approach; all distance-based alignment-free methods are freely available from http://www.bioinformatics.org.au (as Python package decaf+py). We show that most methods exhibit a higher overall reconstruction accuracy in the presence of high among-site rate variation. Under all conditions that we considered, variants of the pattern-based approach were significantly better than the other alignment-free methods. The new pattern-based variant achieved a speed-up of an order of magnitude in the distance calculation step, accompanied by a small loss of tree reconstruction accuracy. A method of Bayesian inference from k-mers did not improve on classical alignment-free (and distance-based) methods but may still offer other advantages due to its Bayesian nature. We found the optimal word length k of word-based methods to be stable across various data sets, and we provide parameter ranges for two different alphabets. The influence of these alphabets was analyzed to reveal a trade-off in reconstruction accuracy between long and short branches. We have mapped the phylogenetic accuracy for many alignment-free methods, among them several recently introduced ones, and increased our understanding of their behavior in response to biologically important parameters. In all experiments, the pattern-based approach emerged as superior, at the expense of higher resource consumption. 
Nonetheless, no alignment-free method that we examined recovers the correct phylogeny as accurately as does an approach based on maximum-likelihood distance estimates of multiply aligned sequences.
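
    As an illustration of what a word-based, alignment-free distance looks like, the sketch below compares two sequences by the Euclidean distance between their k-mer frequency vectors. This is one classical measure chosen for brevity; it is not claimed to be any of the ten methods evaluated in the study, and the function names and the default k = 3 are assumptions.

```python
from collections import Counter
from math import sqrt


def kmer_freqs(seq, k):
    """Relative frequency of every overlapping word of length k in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def kmer_distance(a, b, k=3):
    """Euclidean distance between the k-mer frequency vectors of a and b."""
    fa, fb = kmer_freqs(a, k), kmer_freqs(b, k)
    words = set(fa) | set(fb)
    return sqrt(sum((fa.get(w, 0.0) - fb.get(w, 0.0)) ** 2 for w in words))
```

    No alignment is computed at any point: each sequence is reduced to a word-frequency profile, and a matrix of such pairwise distances could then be passed to any distance-based tree-building method. The word length k is the main tuning parameter, which is why the abstract reports on its optimal range.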

  7. Optimization of analytical parameters for inferring relationships among Escherichia coli isolates from repetitive-element PCR by maximizing correspondence with multilocus sequence typing data.

    PubMed

    Goldberg, Tony L; Gillespie, Thomas R; Singer, Randall S

    2006-09-01

    Repetitive-element PCR (rep-PCR) is a method for genotyping bacteria based on the selective amplification of repetitive genetic elements dispersed throughout bacterial chromosomes. The method has great potential for large-scale epidemiological studies because of its speed and simplicity; however, objective guidelines for inferring relationships among bacterial isolates from rep-PCR data are lacking. We used multilocus sequence typing (MLST) as a "gold standard" to optimize the analytical parameters for inferring relationships among Escherichia coli isolates from rep-PCR data. We chose 12 isolates from a large database to represent a wide range of pairwise genetic distances, based on the initial evaluation of their rep-PCR fingerprints. We conducted MLST with these same isolates and systematically varied the analytical parameters to maximize the correspondence between the relationships inferred from rep-PCR and those inferred from MLST. Methods that compared the shapes of densitometric profiles ("curve-based" methods) yielded consistently higher correspondence values between data types than did methods that calculated indices of similarity based on shared and different bands (maximum correspondences of 84.5% and 80.3%, respectively). Curve-based methods were also markedly more robust in accommodating variations in user-specified analytical parameter values than were "band-sharing coefficient" methods, and they enhanced the reproducibility of rep-PCR. Phylogenetic analyses of rep-PCR data yielded trees with high topological correspondence to trees based on MLST and high statistical support for major clades. These results indicate that rep-PCR yields accurate information for inferring relationships among E. coli isolates and that accuracy can be enhanced with the use of analytical methods that consider the shapes of densitometric profiles.

  8. GIGA: a simple, efficient algorithm for gene tree inference in the genomic age

    PubMed Central

    2010-01-01

    Background: Phylogenetic relationships between genes are not only of theoretical interest: they enable us to learn about human genes through the experimental work on their relatives in numerous model organisms from bacteria to fruit flies and mice. Yet the most commonly used computational algorithms for reconstructing gene trees can be inaccurate for numerous reasons, both algorithmic and biological. Additional information beyond gene sequence data has been shown to improve the accuracy of reconstructions, though at great computational cost. Results: We describe a simple, fast algorithm for inferring gene phylogenies, which makes use of information that was not available prior to the genomic age: namely, a reliable species tree spanning much of the tree of life, and knowledge of the complete complement of genes in a species' genome. The algorithm, called GIGA, constructs trees agglomeratively from a distance matrix representation of sequences, using simple rules to incorporate this genomic age information. GIGA makes use of a novel conceptualization of gene trees as being composed of orthologous subtrees (containing only speciation events), which are joined by other evolutionary events such as gene duplication or horizontal gene transfer. An important innovation in GIGA is that, at every step in the agglomeration process, the tree is interpreted/reinterpreted in terms of the evolutionary events that created it. Remarkably, GIGA performs well even when using a very simple distance metric (pairwise sequence differences) and no distance averaging over clades during the tree construction process. Conclusions: GIGA is efficient, allowing phylogenetic reconstruction of very large gene families and determination of orthologs on a large scale. It is exceptionally robust to adding more gene sequences, opening up the possibility of creating stable identifiers for referring to not only extant genes, but also their common ancestors. 
We compared trees produced by GIGA to those in the TreeFam database, and they were very similar in general, with most differences likely due to poor alignment quality. However, some remaining differences are algorithmic, and can be explained by the fact that GIGA tends to put a larger emphasis on minimizing gene duplication and deletion events. PMID:20534164

  9. GIGA: a simple, efficient algorithm for gene tree inference in the genomic age.

    PubMed

    Thomas, Paul D

    2010-06-09

    Phylogenetic relationships between genes are not only of theoretical interest: they enable us to learn about human genes through the experimental work on their relatives in numerous model organisms from bacteria to fruit flies and mice. Yet the most commonly used computational algorithms for reconstructing gene trees can be inaccurate for numerous reasons, both algorithmic and biological. Additional information beyond gene sequence data has been shown to improve the accuracy of reconstructions, though at great computational cost. We describe a simple, fast algorithm for inferring gene phylogenies, which makes use of information that was not available prior to the genomic age: namely, a reliable species tree spanning much of the tree of life, and knowledge of the complete complement of genes in a species' genome. The algorithm, called GIGA, constructs trees agglomeratively from a distance matrix representation of sequences, using simple rules to incorporate this genomic age information. GIGA makes use of a novel conceptualization of gene trees as being composed of orthologous subtrees (containing only speciation events), which are joined by other evolutionary events such as gene duplication or horizontal gene transfer. An important innovation in GIGA is that, at every step in the agglomeration process, the tree is interpreted/reinterpreted in terms of the evolutionary events that created it. Remarkably, GIGA performs well even when using a very simple distance metric (pairwise sequence differences) and no distance averaging over clades during the tree construction process. GIGA is efficient, allowing phylogenetic reconstruction of very large gene families and determination of orthologs on a large scale. It is exceptionally robust to adding more gene sequences, opening up the possibility of creating stable identifiers for referring to not only extant genes, but also their common ancestors. 
We compared trees produced by GIGA to those in the TreeFam database, and they were very similar in general, with most differences likely due to poor alignment quality. However, some remaining differences are algorithmic, and can be explained by the fact that GIGA tends to put a larger emphasis on minimizing gene duplication and deletion events.
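
    The "very simple distance metric (pairwise sequence differences)" mentioned in the abstract can be sketched as the proportion of differing sites between two aligned sequences, followed by picking the closest pair as the first candidate for agglomeration. This is not the GIGA algorithm itself, which additionally applies species-tree rules at every step; the gap handling and the function names here are assumptions.

```python
from itertools import combinations


def p_distance(a, b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are skipped (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0


def closest_pair(seqs):
    """Return the pair of sequence names with the smallest p-distance."""
    return min(
        combinations(seqs, 2),
        key=lambda pair: p_distance(seqs[pair[0]], seqs[pair[1]]),
    )
```

    An agglomerative method repeats this closest-pair step, merging clusters until a single tree remains. According to the abstract, GIGA differs from a plain clustering scheme in that it reinterprets the tree in terms of evolutionary events after each merge.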

  10. What is the danger of the anomaly zone for empirical phylogenetics?

    PubMed

    Huang, Huateng; Knowles, L Lacey

    2009-10-01

    The increasing number of observations of gene trees with discordant topologies in phylogenetic studies has raised awareness about the problems of incongruence between species trees and gene trees. Moreover, theoretical treatments focusing on the impact of coalescent variance on phylogenetic study have also identified situations where the most probable gene trees are ones that do not match the underlying species tree (i.e., anomalous gene trees [AGTs]). However, although the theoretical proof of the existence of AGTs is alarming, the actual risk that AGTs pose to empirical phylogenetic study is far from clear. Establishing the conditions (i.e., the branch lengths in a species tree) for which AGTs are possible does not address the critical issue of how prevalent they might be. Furthermore, theoretical characterization of the species trees for which AGTs may pose a problem (i.e., the anomaly zone or the species histories for which AGTs are theoretically possible) is based on consideration of just one source of variance that contributes to species tree and gene tree discord: gene lineage coalescence. Yet, empirical data contain another important stochastic component: mutational variance. Estimated gene trees will differ from the underlying gene trees (i.e., the actual genealogy) because of the random process of mutation. Here, we take a simulation approach to investigate the prevalence of AGTs, among estimated gene trees, thereby characterizing the boundaries of the anomaly zone taking into account both coalescent and mutational variances. We also determine the frequency of realized AGTs, which is critical to putting the theoretical work on AGTs into a realistic biological context. Two salient results emerge from this investigation. First, our results show that mutational variance can indeed expand the parameter space (i.e., the relative branch lengths in a species tree) where AGTs might be observed in empirical data. 
By exploring the underlying cause for the expanded anomaly zone, we identify aspects of empirical data relevant to avoiding the problems that AGTs pose for species tree inference from multilocus data. Second, for the empirical species histories where AGTs are possible, unresolved trees, not AGTs, predominate the pool of estimated gene trees. This result suggests that the risk of AGTs, while they exist in theory, may rarely be realized in practice. By considering the biological realities of both mutational and coalescent variances, the study has refined, and redefined, what the actual challenges are for empirical phylogenetic study of recently diverged taxa that have speciated rapidly: AGTs themselves are unlikely to pose a significant danger to empirical phylogenetic study.

  11. Dominant controls of transpiration along a hillslope transect inferred from ecohydrological measurements and thermodynamic limits

    NASA Astrophysics Data System (ADS)

    Renner, Maik; Hassler, Sibylle K.; Blume, Theresa; Weiler, Markus; Hildebrandt, Anke; Guderle, Marcus; Schymanski, Stanislaus J.; Kleidon, Axel

    2016-05-01

    We combine ecohydrological observations of sap flow and soil moisture with thermodynamically constrained estimates of atmospheric evaporative demand to infer the dominant controls of forest transpiration in complex terrain. We hypothesize that daily variations in transpiration are dominated by variations in atmospheric demand, while site-specific controls, including limiting soil moisture, act on longer timescales. We test these hypotheses with data of a measurement setup consisting of five sites along a valley cross section in Luxembourg. Both hillslopes are covered by forest dominated by European beech (Fagus sylvatica L.). Two independent measurements are used to estimate stand transpiration: (i) sap flow and (ii) diurnal variations in soil moisture, which were used to estimate the daily root water uptake. Atmospheric evaporative demand is estimated through thermodynamically constrained evaporation, which only requires absorbed solar radiation and temperature as input data without any empirical parameters. Both transpiration estimates are strongly correlated to atmospheric demand at the daily timescale. We find that neither vapor pressure deficit nor wind speed add to the explained variance, supporting the idea that they are dependent variables on land-atmosphere exchange and the surface energy budget. Estimated stand transpiration was in a similar range at the north-facing and the south-facing hillslopes despite the different aspect and the largely different stand composition. We identified an inverse relationship between sap flux density and the site-average sapwood area per tree as estimated by the site forest inventories. This suggests that tree hydraulic adaptation can compensate for heterogeneous conditions. However, during dry summer periods differences in topographic factors and stand structure can cause spatially variable transpiration rates. 
We conclude that absorption of solar radiation at the surface forms a dominant control for turbulent heat and mass exchange and that vegetation across the hillslope adjusts to this constraint at the tree and stand level. These findings should help to improve the description of land-surface-atmosphere exchange at regional scales.

  12. Breaking up and getting together: evolution of symbiosis and cloning by fission in sea anemones (Genus Anthopleura).

    PubMed

    Geller, J B; Walton, E D

    2001-09-01

    Clonal growth and symbiosis with photosynthetic zooxanthellae typify many genera of marine organisms, suggesting that these traits are usually conserved. However, some, such as Anthopleura, a genus of sea anemones, contain members lacking one or both of these traits. The evolutionary origins of these traits in 13 species of Anthopleura were inferred from a molecular phylogeny derived from 395 bp of the mitochondrial 16S rRNA gene and 410 bp of the mitochondrial cytochrome oxidase subunit III gene. Sequences from these genes were combined and analyzed by maximum-parsimony, maximum-likelihood, and neighbor-joining methods. Best trees from each method indicated a minimum of four changes in growth mode and that symbiosis with zooxanthellae has arisen independently in eastern and western Pacific species. Alternative trees in which species sharing growth modes or the symbiotic condition were constrained to be monophyletic were significantly worse than best trees. Although clade composition was mostly consistent with geographic sympatry, A. artemisia from California was included in the western Pacific clade. Likewise, A. midori from Japan was not placed in a clade containing only other Asian congeners. The history of Anthopleura includes repeated shifts between clonality and solitariness, repeated attainment of symbiosis with zooxanthellae, and intercontinental dispersal.

  13. SNPhylo: a pipeline to construct a phylogenetic tree from huge SNP data.

    PubMed

    Lee, Tae-Ho; Guo, Hui; Wang, Xiyin; Kim, Changsoo; Paterson, Andrew H

    2014-02-26

    Phylogenetic trees are widely used for genetic and evolutionary studies in various organisms. Advanced sequencing technology has dramatically enriched data available for constructing phylogenetic trees based on single nucleotide polymorphisms (SNPs). However, massive SNP data makes it difficult to perform reliable analysis, and there has been no ready-to-use pipeline to generate phylogenetic trees from these data. We developed a new pipeline, SNPhylo, to construct phylogenetic trees based on large SNP datasets. The pipeline may enable users to construct a phylogenetic tree from three representative SNP data file formats. In addition, in order to increase reliability of a tree, the pipeline has steps such as removing low quality data and considering linkage disequilibrium. A maximum likelihood method for the inference of phylogeny is also adopted in generation of a tree in our pipeline. Using SNPhylo, users can easily produce a reliable phylogenetic tree from a large SNP data file. Thus, this pipeline can help a researcher focus more on interpretation of the results of analysis of voluminous data sets, rather than manipulations necessary to accomplish the analysis.

  14. Exact Algorithms for Duplication-Transfer-Loss Reconciliation with Non-Binary Gene Trees.

    PubMed

    Kordi, Misagh; Bansal, Mukul S

    2017-06-01

    Duplication-Transfer-Loss (DTL) reconciliation is a powerful method for studying gene family evolution in the presence of horizontal gene transfer. DTL reconciliation seeks to reconcile gene trees with species trees by postulating speciation, duplication, transfer, and loss events. Efficient algorithms exist for finding optimal DTL reconciliations when the gene tree is binary. In practice, however, gene trees are often non-binary due to uncertainty in the gene tree topologies, and DTL reconciliation with non-binary gene trees is known to be NP-hard. In this paper, we present the first exact algorithms for DTL reconciliation with non-binary gene trees. Specifically, we (i) show that the DTL reconciliation problem for non-binary gene trees is fixed-parameter tractable in the maximum degree of the gene tree, (ii) present an exponential-time, but in-practice efficient, algorithm to track and enumerate all optimal binary resolutions of a non-binary input gene tree, and (iii) apply our algorithms to a large empirical data set of over 4700 gene trees from 100 species to study the impact of gene tree uncertainty on DTL-reconciliation and to demonstrate the applicability and utility of our algorithms. The new techniques and algorithms introduced in this paper will help biologists avoid incorrect evolutionary inferences caused by gene tree uncertainty.

  15. Linking wood anatomy and xylogenesis allows pinpointing of climate and drought influences on growth of coexisting conifers in continental Mediterranean climate.

    PubMed

    Pacheco, Arturo; Camarero, J Julio; Carrer, Marco

    2016-04-01

    Forecasted warmer and drier conditions will probably lead to reduced growth rates and decreased carbon fixation in long-term woody pools in drought-prone areas. We therefore need a better understanding of how climate stressors such as drought constrain wood formation and drive changes in wood anatomy. Drying trends could lead to reduced growth if they are more intense in spring, when radial growth rates of conifers in continental Mediterranean climates peak. Since tree species from the aforementioned areas have to endure dry summers and also cold winters, we chose two coexisting species: Aleppo pine (Pinus halepensis Mill., Pinaceae) and Spanish juniper (Juniperus thurifera L., Cupressaceae) (10 randomly selected trees per species), to analyze how growth (tree-ring width) and wood-anatomical traits (lumen transversal area, cell-wall thickness, and presence of intra-annual density fluctuations, or IADFs, in the latewood) responded to climatic variables (minimum and maximum temperatures, precipitation, soil moisture deficit) calculated for different time intervals. Tree-ring width and mean lumen area showed similar year-to-year variability, which indicates that they encoded similar climatic signals. Wet and cool late-winter to early-spring conditions increased lumen area expansion, particularly in pine. In juniper, cell-wall thickness increased when early summer conditions became drier and the frequency of latewood IADFs increased in parallel with late-summer to early-autumn wet conditions. Thus, latewood IADFs of the juniper capture increased water availability during the late growing season, which is reflected in larger tracheid lumens. Soil water availability was one of the main drivers of wood formation and radial growth for the two species. These analyses allow long-term (several decades) growth and wood-anatomical responses to climate to be inferred at intra-annual scales, which agree with the growing patterns already described by xylogenesis approaches for the same species. A plastic bimodal growth behavior, driven by dry summer conditions, is coherent with the presented wood-anatomical data. The different wood-anatomical responses to drought stress are observed as IADFs with contrasting characteristics and responses to climate. These different responses suggest distinct capacities to access soil water between the two conifer species. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  16. MulRF: a software package for phylogenetic analysis using multi-copy gene trees.

    PubMed

    Chaudhary, Ruchi; Fernández-Baca, David; Burleigh, John Gordon

    2015-02-01

    MulRF is a platform-independent software package for phylogenetic analysis using multi-copy gene trees. It seeks the species tree that minimizes the Robinson-Foulds (RF) distance to the input trees using a generalization of the RF distance to multi-labeled trees. The underlying generic tree distance measure and fast running time make MulRF useful for inferring phylogenies from large collections of gene trees, in which multiple evolutionary processes as well as phylogenetic error may contribute to gene tree discord. MulRF implements several features for customizing the species tree search and assessing the results, and it provides a user-friendly graphical user interface (GUI) with tree visualization. The species tree search is implemented in C++ and the GUI in Java Swing. MulRF's executable as well as sample datasets and manual are available at http://genome.cs.iastate.edu/CBL/MulRF/, and the source code is available at https://github.com/ruchiherself/MulRFRepo. Contact: ruchic@ufl.edu. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
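
    As an illustration of the distance that MulRF generalizes, the following is a minimal sketch of the standard Robinson-Foulds distance between two single-copy trees on the same leaf set. It is not MulRF's multi-labeled generalization and is not taken from the MulRF source; the nested-tuple tree encoding and the function names are assumptions made for this example.

```python
# Sketch only: standard RF distance for singly labeled trees.
# Trees are nested tuples with string leaves (a hypothetical encoding),
# e.g. (("A", "B"), ("C", "D")). This is NOT MulRF's generalization.

def leaf_set(tree):
    """Return the set of leaf labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions induced by the tree's edges.

    Each split is stored as the side that does not contain a fixed
    reference leaf, so the two descriptions of one split coincide.
    """
    all_leaves = leaf_set(tree)
    ref = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if ref in below else below
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return below

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same leaf set")
    return len(splits(tree_a) ^ splits(tree_b))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(t1, t2))
```

    The two example trees share the split separating D and E from the rest and differ in one split each, so the symmetric difference contains two splits.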

  17. Spatial analysis of Northern Goshawk Territories in the Black Hills, South Dakota

    USGS Publications Warehouse

    Klaver, Robert W.; Backlund, Douglas; Bartelt, Paul E.; Erickson, Michael G.; Knowles, Craig J.; Knowles, Pamela R.; Wimberly, Michael

    2012-01-01

    The Northern Goshawk (Accipiter gentilis) is the largest of the three North American species of Accipiter and is more closely associated with older forests than are the other species. Its reliance on older forests has resulted in concerns about its status, extensive research into its habitat relationships, and litigation. Our objective was to model the spatial patterns of goshawk territories in the Black Hills, South Dakota, to make inferences about the underlying processes. We used a modification of Ripley's K function that accounts for inhomogeneous intensity to determine whether territoriality or habitat determined the spacing of goshawks in the Black Hills, finding that habitat conditions rather than territoriality were the determining factor. A spatial model incorporating basal area of trees in a stand of forest, canopy cover, age of trees >23 cm in diameter, number of trees per hectare, and geographic coordinates provided good fit to the spatial patterns of territories. There was no indication of repulsion at close distances that would imply spacing was determined by territoriality. These findings contrast with those for the Kaibab Plateau, Arizona, where territoriality is an important limiting factor. Forest stands where the goshawk nested historically are now younger and have trees of smaller diameter, probably having been modified by logging, fire, and insects. These results have important implications for the goshawk's ecology in the Black Hills with respect to mortality, competition, forest fragmentation, and nest-territory protection.
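
    For readers unfamiliar with Ripley's K function, the sketch below computes the basic homogeneous estimator without edge correction. It is not the inhomogeneous modification used in the study; the function name, the toy coordinates, and the study-area value are assumptions for illustration only.

```python
import math

# Sketch only: naive homogeneous Ripley's K, no edge correction.
# The study above used a modification for inhomogeneous intensity,
# which this example does not implement.

def ripley_k(points, r, area):
    """Estimate K(r) = area / (n * (n - 1)) * #{ordered pairs i != j with d_ij <= r}."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    close_pairs = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= r:
                close_pairs += 1
    return area * close_pairs / (n * (n - 1))


# Hypothetical toy pattern: four points on the corners of a unit square,
# inside an assumed study area of 4 square units.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for radius in (0.5, 1.0, 1.5):
    print(radius, ripley_k(pts, radius, area=4.0))
```

    Under complete spatial randomness K(r) is approximately pi * r^2, so estimates well below that value at short distances would suggest repulsion between points, and estimates above it would suggest clustering.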

  18. Developing a statistically powerful measure for quartet tree inference using phylogenetic identities and Markov invariants.

    PubMed

    Sumner, Jeremy G; Taylor, Amelia; Holland, Barbara R; Jarvis, Peter D

    2017-12-01

    Recently there has been renewed interest in phylogenetic inference methods based on phylogenetic invariants, alongside the related Markov invariants. Broadly speaking, both these approaches give rise to polynomial functions of sequence site patterns that, in expectation value, either vanish for particular evolutionary trees (in the case of phylogenetic invariants) or have well understood transformation properties (in the case of Markov invariants). While both approaches have been valued for their intrinsic mathematical interest, it is not clear how they relate to each other, and to what extent they can be used as practical tools for inference of phylogenetic trees. In this paper, by focusing on the special case of binary sequence data and quartets of taxa, we are able to view these two different polynomial-based approaches within a common framework. To motivate the discussion, we present three desirable statistical properties that we argue any invariant-based phylogenetic method should satisfy: (1) sensible behaviour under reordering of input sequences; (2) stability as the taxa evolve independently according to a Markov process; and (3) explicit dependence on the assumption of a continuous-time process. Motivated by these statistical properties, we develop and explore several new phylogenetic inference methods. In particular, we develop a statistically bias-corrected version of the Markov invariants approach which satisfies all three properties. We also extend previous work by showing that the phylogenetic invariants can be implemented in such a way as to satisfy property (3). A simulation study shows that, in comparison to other methods, our new proposed approach based on bias-corrected Markov invariants is extremely powerful for phylogenetic inference. The binary case is of particular theoretical interest as, in this case only, the Markov invariants can be expressed as linear combinations of the phylogenetic invariants. A wider implication of this is that, for models with more than two states (for example, DNA sequence alignments with four-state models), we find that methods which rely on phylogenetic invariants are incapable of satisfying all three of the stated statistical properties. This is because in these cases the relevant Markov invariants belong to a class of polynomials independent from the phylogenetic invariants.

  19. Breastfeeding and the prevention of breast cancer: a retrospective review of clinical histories.

    PubMed

    González-Jiménez, Emilio; García, Pedro A; Aguilar, María José; Padilla, Carlos A; Álvarez, Judit

    2014-09-01

    To evaluate at what age parous and nonparous women were diagnosed with breast cancer. Factors taken into account for parous women were whether they had breastfed their children, and if so, the length of the lactation period. Other factors considered for both groups were obesity, family histories of cancer, smoking habits and alcohol consumption. Breast cancer is the most common form of cancer in younger women in Western countries. Its growing incidence as well as the increasingly early age of diagnosis led us to carefully analyse its possible causes and the preventive measures to be taken. This is a particularly important goal in epidemiological research. A retrospective study of the clinical histories of patients diagnosed with breast cancer at the San Cecilio University Hospital in Granada (Spain). In this study, we analysed 504 medical records of female patients, 19-91 years of age, who had been diagnosed and treated for breast cancer from 2004-2009 at the San Cecilio University Hospital in Granada (Spain). Relevant data (age of diagnosis, period of lactation, family history of cancer, obesity, alcohol consumption and smoking habits) were collected from the clinical histories of each patient and analysed. A conditional inference tree was used to relate the age of diagnosis to smoking habits and the length of the lactation period. The conditional inference tree identified significant differences between the age of the patients at breast cancer diagnosis, smoking habits (p < 0·001) and lactation period if the subjects had breastfed their children for more than six months (p = 0·006), regardless of whether they had a family history of cancer. Our study concluded that breastfeeding for over six months not only provides children with numerous health benefits, but also protects mothers from breast cancer when the mothers are nonsmokers. Nurses play a crucial role in encouraging new mothers to breastfeed their children, and this helps to prevent breast cancer. 
© 2013 John Wiley & Sons Ltd.

  20. Predictors of Sudden Cardiac Death in Doberman Pinschers with Dilated Cardiomyopathy.

    PubMed

    Klüser, L; Holler, P J; Simak, J; Tater, G; Smets, P; Rügamer, D; Küchenhoff, H; Wess, G

    2016-05-01

    Doberman Pinschers with dilated cardiomyopathy (DCM) are at high risk of sudden cardiac death (SCD). Risk factors for SCD are poorly defined. To assess cardiac biomarkers, Holter-ECG, echocardiographic variables and canine characteristics in a group of Doberman Pinschers with DCM dying of SCD and in a DCM control group to identify factors predicting SCD. A longitudinal prospective study was performed in 95 Doberman Pinschers with DCM. Forty-one dogs died within 3 months after the last cardiac examination (SCD-group) and were compared to 54 Doberman Pinschers with DCM surviving 1 year after inclusion. Holter-ECG, echocardiography, measurement of N-terminal prohormone of brain-natriuretic peptide (NT-proBNP), and cardiac Troponin I (cTnI) concentrations were recorded for all dogs. Volume overload of the left ventricle (left ventricular end-diastolic volume (LVEDV/BSA) > 91.3 mL/m²) was the single best variable to predict SCD. The probability of SCD increases 8.5-fold (CI0.95  = 0.8-35.3) for every 50 mL/m²-unit increment in LVEDV/BSA. Ejection fraction (EF), left ventricular end-systolic volume (LVESV/BSA) and NT-proBNP were highly correlated with LVEDV/BSA (r = -0.63, 0.96, 0.86, respectively). Generated conditional inference trees (CTREEs) revealed that the presence of ventricular tachycardia (VT), increased concentration of cTnI, and the fastest rate (FR) of ventricular premature complexes (VPC) ≥260 beats per minute (bpm) are additional important variables to predict SCD. Conditional inference trees provided in this study might be useful for risk assessment of SCD in Doberman Pinschers with DCM. Copyright © 2016 The Authors. Journal of Veterinary Internal Medicine published by Wiley Periodicals, Inc. on behalf of the American College of Veterinary Internal Medicine.

  1. Mobile Context Provider for Social Networking

    NASA Astrophysics Data System (ADS)

    Santos, André C.; Cardoso, João M. P.; Ferreira, Diogo R.; Diniz, Pedro C.

    The ability to infer user context based on a mobile device together with a set of external sensors opens up the way to new context-aware services and applications. In this paper, we describe a mobile context provider that makes use of sensors available in a smartphone as well as sensors externally connected via Bluetooth. We describe the system architecture from sensor data acquisition to feature extraction, context inference and the publication of context information to well-known social networking services such as Twitter and Hi5. In the current prototype, context inference is based on decision trees, but the middleware allows the integration of other inference engines. Experimental results suggest that the proposed solution is a promising approach to provide user context to both local and network-level services.

  2. Late Holocene forest dynamics, volcanism, and climate change at Whitewing Mountain and San Joaquin Ridge, Mono County, Sierra Nevada, CA, USA

    Treesearch

    Constance I. Millar; John C. King; Robert D. Westfall; Harry A. Alden; Diane L. Delany

    2006-01-01

    Deadwood tree stems scattered above treeline on tephra-covered slopes of Whitewing Mtn (3051 m) and San Joaquin Ridge (3122 m) show evidence of being killed in an eruption from adjacent Glass Creek Vent, Inyo Craters. Using tree-ring methods, we dated deadwood to AD 815-1350 and infer from death dates that the eruption occurred in late summer AD 1350. Based on wood...

  3. Phylogenetic framework for coevolutionary studies: a compass for exploring jungles of tangled trees.

    PubMed

    Martínez-Aquino, Andrés

    2016-08-01

    Phylogenetics is used to detect past evolutionary events, from how species originated to how their ecological interactions with other species arose, which can mirror cophylogenetic patterns. Cophylogenetic reconstructions uncover past ecological relationships between taxa through inferred coevolutionary events on trees, for example, codivergence, duplication, host-switching, and loss. These events can be detected by cophylogenetic analyses based on nodes and the length and branching pattern of the phylogenetic trees of symbiotic associations, for example, host-parasite. In the past 2 decades, algorithms have been developed for cophylogenetic analyses and implemented in different software, for example, statistical congruence index and event-based methods. Based on the combination of these approaches, it is possible to integrate temporal information into cophylogenetic inference, such as estimates of lineage divergence times between 2 taxa, for example, hosts and parasites. Additionally, the advances in phylogenetic biogeography applying methods based on parametric process models and combined Bayesian approaches, can be useful for interpreting coevolutionary histories in a scenario of biogeographical area connectivity through time. This article briefly reviews the basics of parasitology and provides an overview of software packages in cophylogenetic methods. Thus, the objective here is to present a phylogenetic framework for coevolutionary studies, with special emphasis on groups of parasitic organisms. Researchers wishing to undertake phylogeny-based coevolutionary studies can use this review as a "compass" when "walking" through jungles of tangled phylogenetic trees.

  4. A parametric method for assessing diversification-rate variation in phylogenetic trees.

    PubMed

    Shah, Premal; Fitzpatrick, Benjamin M; Fordyce, James A

    2013-02-01

    Phylogenetic hypotheses are frequently used to examine variation in rates of diversification across the history of a group. Patterns of diversification-rate variation can be used to infer underlying ecological and evolutionary processes responsible for patterns of cladogenesis. Most existing methods examine rate variation through time. Methods for examining differences in diversification among groups are more limited. Here, we present a new method, parametric rate comparison (PRC), that explicitly compares diversification rates among lineages in a tree using a variety of standard statistical distributions. PRC can identify subclades of the tree where diversification rates are at variance with the remainder of the tree. A randomization test can be used to evaluate how often such variance would appear by chance alone. The method also allows for comparison of diversification rate among a priori defined groups. Further, the application of the PRC method is not restricted to monophyletic groups. We examined the performance of PRC using simulated data, which showed that PRC has acceptable false-positive rates and statistical power to detect rate variation. We apply the PRC method to the well-studied radiation of North American Plethodon salamanders, and support the inference that the large-bodied Plethodon glutinosus clade has a higher historical rate of diversification compared to other Plethodon salamanders. © 2012 The Author(s). Evolution © 2012 The Society for the Study of Evolution.

  5. Phylogenetic framework for coevolutionary studies: a compass for exploring jungles of tangled trees

    PubMed Central

    2016-01-01

    Phylogenetics is used to detect past evolutionary events, from how species originated to how their ecological interactions with other species arose, which can mirror cophylogenetic patterns. Cophylogenetic reconstructions uncover past ecological relationships between taxa through inferred coevolutionary events on trees, for example, codivergence, duplication, host-switching, and loss. These events can be detected by cophylogenetic analyses based on nodes and the length and branching pattern of the phylogenetic trees of symbiotic associations, for example, host–parasite. In the past 2 decades, algorithms have been developed for cophylogenetic analyses and implemented in different software, for example, statistical congruence index and event-based methods. Based on the combination of these approaches, it is possible to integrate temporal information into cophylogenetic inference, such as estimates of lineage divergence times between 2 taxa, for example, hosts and parasites. Additionally, the advances in phylogenetic biogeography applying methods based on parametric process models and combined Bayesian approaches, can be useful for interpreting coevolutionary histories in a scenario of biogeographical area connectivity through time. This article briefly reviews the basics of parasitology and provides an overview of software packages in cophylogenetic methods. Thus, the objective here is to present a phylogenetic framework for coevolutionary studies, with special emphasis on groups of parasitic organisms. Researchers wishing to undertake phylogeny-based coevolutionary studies can use this review as a “compass” when “walking” through jungles of tangled phylogenetic trees. PMID:29491928

  6. Millennial Climatic Fluctuations Are Key to the Structure of Last Glacial Ecosystems

    PubMed Central

    Huntley, Brian; Allen, Judy R. M.; Collingham, Yvonne C.; Hickler, Thomas; Lister, Adrian M.; Singarayer, Joy; Stuart, Anthony J.; Sykes, Martin T.; Valdes, Paul J.

    2013-01-01

    Whereas fossil evidence indicates extensive treeless vegetation and diverse grazing megafauna in Europe and northern Asia during the last glacial, experiments combining vegetation models and climate models have to date simulated widespread persistence of trees. Resolving this conflict is key to understanding both last glacial ecosystems and extinction of most of the mega-herbivores. Using a dynamic vegetation model (DVM) we explored the implications of the differing climatic conditions generated by a general circulation model (GCM) in “normal” and “hosing” experiments. Whilst the former approximate interstadial conditions, the latter, designed to mimic Heinrich Events, approximate stadial conditions. The “hosing” experiments gave simulated European vegetation much closer in composition to that inferred from fossil evidence than did the “normal” experiments. Given the short duration of interstadials, and the rate at which forest cover expanded during the late-glacial and early Holocene, our results demonstrate the importance of millennial variability in determining the character of last glacial ecosystems. PMID:23613985

  7. Millennial climatic fluctuations are key to the structure of last glacial ecosystems.

    PubMed

    Huntley, Brian; Allen, Judy R M; Collingham, Yvonne C; Hickler, Thomas; Lister, Adrian M; Singarayer, Joy; Stuart, Anthony J; Sykes, Martin T; Valdes, Paul J

    2013-01-01

    Whereas fossil evidence indicates extensive treeless vegetation and diverse grazing megafauna in Europe and northern Asia during the last glacial, experiments combining vegetation models and climate models have to date simulated widespread persistence of trees. Resolving this conflict is key to understanding both last glacial ecosystems and extinction of most of the mega-herbivores. Using a dynamic vegetation model (DVM) we explored the implications of the differing climatic conditions generated by a general circulation model (GCM) in "normal" and "hosing" experiments. Whilst the former approximate interstadial conditions, the latter, designed to mimic Heinrich Events, approximate stadial conditions. The "hosing" experiments gave simulated European vegetation much closer in composition to that inferred from fossil evidence than did the "normal" experiments. Given the short duration of interstadials, and the rate at which forest cover expanded during the late-glacial and early Holocene, our results demonstrate the importance of millennial variability in determining the character of last glacial ecosystems.

  8. Reconciliation of Gene and Species Trees

    PubMed Central

    Rusin, L. Y.; Lyubetskaya, E. V.; Gorbunov, K. Y.; Lyubetsky, V. A.

    2014-01-01

    The first part of the paper briefly overviews the problem of gene and species tree reconciliation, with a focus on the definition and algorithmic construction of the evolutionary scenario. Basic ideas are discussed for the aspects of mapping definitions, costs of the mapping and evolutionary scenario, imposing time scales on a scenario, incorporating horizontal gene transfers, binarization and reconciliation of polytomous trees, and construction of species trees and scenarios. The review does not intend to cover the vast diversity of literature published on these subjects. Instead, the authors strive to overview the problem of the evolutionary scenario as a central concept in many areas of evolutionary research. The second part provides detailed mathematical proofs for the solutions of two problems: (i) inferring the evolution of a gene along a species tree, accounting for various types of evolutionary events, and (ii) reconciling trees into a single species tree when only gene duplications and losses are allowed. All proposed algorithms have cubic time complexity and are mathematically proved to find exact solutions. The algorithms for problem (ii) can be naturally extended to incorporate horizontal transfers, other evolutionary events, and time scales on the species tree. PMID:24800245
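
    To make the idea of mapping a gene tree onto a species tree concrete, here is a minimal sketch of the classic lowest-common-ancestor (LCA) mapping for the duplication-loss setting. It is a textbook technique, not the cubic-time algorithms proved in the paper, and it counts duplications only. The nested-tuple encoding, the convention that gene-tree leaves are labeled by species name, and all function names are assumptions for this example.

```python
# Sketch only: classic LCA reconciliation, counting duplications.
# Not the algorithms of the paper above. Trees are nested tuples with
# string leaves; gene-tree leaves carry the species they were sampled from.

def index_species_tree(species_tree):
    """Record the parent and depth of every node of the species tree."""
    parent, depth = {}, {}

    def visit(node, par, d):
        parent[node] = par
        depth[node] = d
        if not isinstance(node, str):
            for child in node:
                visit(child, node, d + 1)

    visit(species_tree, None, 0)
    return parent, depth


def lca(a, b, parent, depth):
    """Walk the deeper node upward until both nodes meet."""
    while a != b:
        if depth[a] >= depth[b]:
            a = parent[a]
        else:
            b = parent[b]
    return a


def count_duplications(gene_tree, species_tree):
    """Map each gene node to the LCA of its children's images.

    A gene node is called a duplication when it maps to the same
    species node as at least one of its children.
    """
    parent, depth = index_species_tree(species_tree)
    duplications = 0

    def visit(gene_node):
        nonlocal duplications
        if isinstance(gene_node, str):
            return gene_node  # a leaf maps to its own species
        child_images = [visit(child) for child in gene_node]
        image = child_images[0]
        for other in child_images[1:]:
            image = lca(image, other, parent, depth)
        if image in child_images:
            duplications += 1
        return image

    visit(gene_tree)
    return duplications


species = (("A", "B"), "C")
print(count_duplications((("A", "B"), "C"), species))         # congruent gene tree
print(count_duplications((("A", "B"), ("A", "C")), species))  # extra copy from species A
```

    The second gene tree contains two copies from species A on opposite sides of the root, so its root maps to the species-tree root together with one of its children and is scored as a duplication.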

  9. Variation across mitochondrial gene trees provides evidence for systematic error: How much gene tree variation is biological?

    PubMed

    Richards, Emilie J; Brown, Jeremy M; Barley, Anthony J; Chong, Rebecca A; Thomson, Robert C

    2018-02-19

    The use of large genomic datasets in phylogenetics has highlighted extensive topological variation across genes. Much of this discordance is assumed to result from biological processes. However, variation among gene trees can also be a consequence of systematic error driven by poor model fit, and the relative importance of biological versus methodological factors in explaining gene tree variation is a major unresolved question. Using mitochondrial genomes to control for biological causes of gene tree variation, we estimate the extent of gene tree discordance driven by systematic error and employ posterior prediction to highlight the role of model fit in producing this discordance. We find that the amount of discordance among mitochondrial gene trees is similar to the amount of discordance found in other studies that assume only biological causes of variation. This similarity suggests that the role of systematic error in generating gene tree variation is underappreciated and critical evaluation of fit between assumed models and the data used for inference is important for the resolution of unresolved phylogenetic questions.

  10. Individualistic and Time-Varying Tree-Ring Growth to Climate Sensitivity

    PubMed Central

    Carrer, Marco

    2011-01-01

    The development of dendrochronological time series in order to analyze climate-growth relationships usually involves first a rigorous selection of trees and then the computation of the mean tree-growth measurement series. This study suggests a change in the perspective, passing from an analysis of climate-growth relationships that typically focuses on the mean response of a species to investigating the whole range of individual responses among sample trees. Results highlight that this new approach, tested on a larch and stone pine tree-ring dataset, outperforms, in terms of information obtained, the classical one, with significant improvements regarding the strength, distribution and time-variability of the individual tree-ring growth response to climate. Moreover, a significant change over time of the tree sensitivity to climatic variability has been detected. Accordingly, the best-responder trees at any one time may not always have been the best-responders and may not continue to be so. With minor adjustments to current dendroecological protocol and adopting an individualistic approach, we can improve the quality and reliability of the ecological inferences derived from the climate-growth relationships. PMID:21829523

  11. A detailed phylogeny for the Methanomicrobiales

    NASA Technical Reports Server (NTRS)

    Rouviere, P.; Mandelco, L.; Winker, S.; Woese, C. R.

    1992-01-01

    The small subunit rRNA sequence of twenty archaea, members of the Methanomicrobiales, permits a detailed phylogenetic tree to be inferred for the group. The tree confirms earlier studies, based on far fewer sequences, in showing the group to be divided into two major clusters, temporarily designated the "methanosarcina" group and the "methanogenium" group. The tree also defines phylogenetic relationships within these two groups, which in some cases do not agree with the phylogenetic relationships implied by current taxonomic names--a problem most acute for the genus Methanogenium and its relatives. The present phylogenetic characterization provides the basis for a consistent taxonomic restructuring of this major methanogenic taxon.

  12. How Do Evergreens Stay Ever-Green? Hands on Science.

    ERIC Educational Resources Information Center

    Kepler, Lynne

    1993-01-01

    Provides instructional techniques, using samples from evergreen trees, to explain to school children the concept of adaptation. The techniques help children develop skills in observation, classification, communication, inferring, and predicting. A teacher's reproducible is included. (GLR)

  13. Habitat use affects morphological diversification in dragon lizards

    PubMed Central

    COLLAR, D C; SCHULTE, J A; O’MEARA, B C; LOSOS, J B

    2010-01-01

    Habitat use may lead to variation in diversity among evolutionary lineages because habitats differ in the variety of ways they allow for species to make a living. Here, we show that structural habitats contribute to differential diversification of limb and body form in dragon lizards (Agamidae). Based on phylogenetic analysis and ancestral state reconstructions for 90 species, we find that multiple lineages have independently adopted each of four habitat use types: rock-dwelling, terrestriality, semi-arboreality and arboreality. Given these reconstructions, we fit models of evolution to species’ morphological trait values and find that rock-dwelling and arboreality limit diversification relative to terrestriality and semi-arboreality. Models preferred by Akaike information criterion infer slower rates of size and shape evolution in lineages inferred to occupy rocks and trees, and model-averaged rate estimates are slowest for these habitat types. These results suggest that ground-dwelling facilitates ecomorphological differentiation and that use of trees or rocks impedes diversification. PMID:20345808
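
    The model comparison described above relies on the Akaike information criterion and on model-averaged estimates. A minimal sketch of how Akaike weights and a weighted average are conventionally computed follows; the AIC scores and rate estimates are made-up placeholder numbers, not values from the study, and the function names are assumptions.

```python
import math

# Sketch only: Akaike weights and a model-averaged estimate.
# All numeric inputs below are hypothetical placeholders.

def akaike_weights(aic_values):
    """w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2), delta_i = AIC_i - min AIC."""
    best = min(aic_values)
    relative = [math.exp(-(aic - best) / 2.0) for aic in aic_values]
    total = sum(relative)
    return [value / total for value in relative]


def model_average(estimates, weights):
    """Weighted mean of per-model parameter estimates."""
    return sum(e * w for e, w in zip(estimates, weights))


aic_scores = [100.0, 102.0, 110.0]   # hypothetical AIC scores for three models
rate_estimates = [0.8, 1.1, 1.5]     # hypothetical rate estimate from each model
weights = akaike_weights(aic_scores)
print(weights)
print(model_average(rate_estimates, weights))
```

    Models within a couple of AIC units of the best model keep substantial weight, which is why averaging across models can give more stable rate estimates than reporting the single best model.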

  14. Estimating diversifying selection and functional constraint in the presence of recombination.

    PubMed

    Wilson, Daniel J; McVean, Gilean

    2006-03-01

    Models of molecular evolution that incorporate the ratio of nonsynonymous to synonymous polymorphism (dN/dS ratio) as a parameter can be used to identify sites that are under diversifying selection or functional constraint in a sample of gene sequences. However, when there has been recombination in the evolutionary history of the sequences, reconstructing a single phylogenetic tree is not appropriate, and inference based on a single tree can give misleading results. In the presence of high levels of recombination, the identification of sites experiencing diversifying selection can suffer from a false-positive rate as high as 90%. We present a model that uses a population genetics approximation to the coalescent with recombination and use reversible-jump MCMC to perform Bayesian inference on both the dN/dS ratio and the recombination rate, allowing each to vary along the sequence. We demonstrate that the method has the power to detect variation in the dN/dS ratio and the recombination rate and does not suffer from a high false-positive rate. We use the method to analyze the porB gene of Neisseria meningitidis and verify the inferences using prior sensitivity analysis and model criticism techniques.

  15. Estimating Diversifying Selection and Functional Constraint in the Presence of Recombination

    PubMed Central

    Wilson, Daniel J.; McVean, Gilean

    2006-01-01

    Models of molecular evolution that incorporate the ratio of nonsynonymous to synonymous polymorphism (dN/dS ratio) as a parameter can be used to identify sites that are under diversifying selection or functional constraint in a sample of gene sequences. However, when there has been recombination in the evolutionary history of the sequences, reconstructing a single phylogenetic tree is not appropriate, and inference based on a single tree can give misleading results. In the presence of high levels of recombination, the identification of sites experiencing diversifying selection can suffer from a false-positive rate as high as 90%. We present a model that uses a population genetics approximation to the coalescent with recombination and use reversible-jump MCMC to perform Bayesian inference on both the dN/dS ratio and the recombination rate, allowing each to vary along the sequence. We demonstrate that the method has the power to detect variation in the dN/dS ratio and the recombination rate and does not suffer from a high false-positive rate. We use the method to analyze the porB gene of Neisseria meningitidis and verify the inferences using prior sensitivity analysis and model criticism techniques. PMID:16387887

  16. Analyzing contentious relationships and outlier genes in phylogenomics.

    PubMed

    Walker, Joseph F; Brown, Joseph W; Smith, Stephen A

    2018-06-08

    Recent studies have demonstrated that conflict is common among gene trees in phylogenomic studies, and that less than one percent of genes may ultimately drive species tree inference in supermatrix analyses. Here, we examined two datasets where supermatrix and coalescent-based species trees conflict. We identified two highly influential "outlier" genes in each dataset. When removed from each dataset, the inferred supermatrix trees matched the topologies obtained from coalescent analyses. We also demonstrate that, while the outlier genes in the vertebrate dataset have been shown in a previous study to be the result of errors in orthology detection, the outlier genes from a plant dataset did not exhibit any obvious systematic error and therefore may be the result of some biological process yet to be determined. While topological comparisons among a small set of alternate topologies can be helpful in discovering outlier genes, they can be limited in several ways, such as assuming all genes share the same topology. Coalescent species tree methods relax this assumption but do not explicitly facilitate the examination of specific edges. Coalescent methods often also assume that conflict is the result of incomplete lineage sorting (ILS). Here we explored a framework that allows for quickly examining alternative edges and support for large phylogenomic datasets that does not assume a single topology for all genes. For both datasets, these analyses provided detailed results confirming the support for coalescent-based topologies. This framework suggests that we can improve our understanding of the underlying signal in phylogenomic datasets by asking more targeted edge-based questions.

  17. Reconstruction of precipitation variability in Estonia since the eighteenth century, inferred from oak and spruce tree rings

    NASA Astrophysics Data System (ADS)

    Helama, Samuli; Sohar, Kristina; Läänelaid, Alar; Bijak, Szymon; Jaagus, Jaak

    2018-06-01

    There is plenty of evidence for intensification of the global hydrological cycle. In Europe, the northern areas are predicted to receive more precipitation in the future and observational evidence suggests a parallel trend over the past decades. As a consequence, it would be essential to place the recent trend in precipitation in the context of proxy-based estimates of reconstructed precipitation variability over the past centuries. Tree rings are frequently used as proxy data for palaeoclimate reconstructions. Here we use deciduous (Quercus robur) and coniferous (Picea abies) tree-ring width chronologies from western Estonia to deduce past early-summer (June) precipitation variability since 1771. A statistical model transforming our tree-ring data into estimates of precipitation sums explains 42% of the variance in instrumental variability. Comparisons with products of gridded reconstructions of soil moisture and summer precipitation illustrate robust correlations with soil moisture (Palmer Drought Severity Index), but lowered correlation with summer precipitation estimates prior to the mid-nineteenth century, these instabilities possibly reflecting the general uncertainties inherent to early meteorological and proxy data. Reconstructed precipitation variability was negatively correlated to the teleconnection indices of the North Atlantic Oscillation and the Scandinavia pattern, on annual to decadal and longer scales. These relationships demonstrate the positive precipitation anomalies to result from increase in zonal inflow and cyclonic activity, the negative anomalies being linked with the high pressure conditions enhanced during the atmospheric blocking episodes. Recently, the instrumental data have demonstrated a remarkable increase in summer (June) precipitation in the study region. Our tree-ring based reconstruction reproduces this trend in the context of precipitation history since the eighteenth century and quantifies the unprecedented abundance of June precipitation over the recent years.

  18. Process-based modeling of species' responses to climate change - a proof of concept using western North American trees

    NASA Astrophysics Data System (ADS)

    Evans, M. E.; Merow, C.; Record, S.; Menlove, J.; Gray, A.; Cundiff, J.; McMahon, S.; Enquist, B. J.

    2013-12-01

    Current attempts to forecast how species' distributions will change in response to climate change suffer under a fundamental trade-off: between modeling many species superficially vs. few species in detail (between correlative vs. mechanistic models). The goals of this talk are two-fold: first, we present a Bayesian multilevel modeling framework, dynamic range modeling (DRM), for building process-based forecasts of many species' distributions at a time, designed to address the trade-off between detail and number of distribution forecasts. In contrast to 'species distribution modeling' or 'niche modeling', which uses only species' occurrence data and environmental data, DRMs draw upon demographic data, abundance data, trait data, occurrence data, and GIS layers of climate in a single framework to account for two processes known to influence range dynamics - demography and dispersal. The vision is to use extensive databases on plant demography, distributions, and traits - in the Botanical Information and Ecology Network, the Forest Inventory and Analysis database (FIA), and the International Tree Ring Data Bank - to develop DRMs for North American trees. Second, we present preliminary results from building the core submodel of a DRM - an integral projection model (IPM) - for a sample of dominant tree species in western North America. IPMs are used to infer demographic niches - i.e., the set of environmental conditions under which population growth rate is positive - and project population dynamics through time. Based on >550,000 data points derived from FIA for nine tree species in western North America, we show IPM-based models of their current and future distributions, and discuss how IPMs can be used to forecast future forest productivity, mortality patterns, and inform efforts at assisted migration.

  19. A Well-Resolved Phylogeny of the Trees of Puerto Rico Based on DNA Barcode Sequence Data

    PubMed Central

    Muscarella, Robert; Uriarte, María; Erickson, David L.; Swenson, Nathan G.; Zimmerman, Jess K.; Kress, W. John

    2014-01-01

    Background: The use of phylogenetic information in community ecology and conservation has grown in recent years. Two key issues for community phylogenetics studies, however, are (i) low terminal phylogenetic resolution and (ii) arbitrarily defined species pools. Methodology/principal findings: We used three DNA barcodes (plastid DNA regions rbcL, matK, and trnH-psbA) to infer a phylogeny for 527 native and naturalized trees of Puerto Rico, representing the vast majority of the entire tree flora of the island (89%). We used a maximum likelihood (ML) approach with and without a constraint tree that enforced monophyly of recognized plant orders. Based on 50% consensus trees, the ML analyses improved phylogenetic resolution relative to a comparable phylogeny generated with Phylomatic (proportion of internal nodes resolved: constrained ML = 74%, unconstrained ML = 68%, Phylomatic = 52%). We quantified the phylogenetic composition of 15 protected forests in Puerto Rico using the constrained ML and Phylomatic phylogenies. We found some evidence that tree communities in areas of high water stress were relatively phylogenetically clustered. Reducing the scale at which the species pool was defined (from island to soil types) changed some of our results depending on which phylogeny (ML vs. Phylomatic) was used. Overall, the increased terminal resolution provided by the ML phylogeny revealed additional patterns that were not observed with a less-resolved phylogeny. Conclusions/significance: With the DNA barcode phylogeny presented here (based on an island-wide species pool), we show that a more fully resolved phylogeny increases power to detect nonrandom patterns of community composition in several Puerto Rican tree communities. Especially if combined with additional information on species functional traits and geographic distributions, this phylogeny will (i) facilitate stronger inferences about the role of historical processes in governing the assembly and composition of Puerto Rican forests, (ii) provide insight into Caribbean biogeography, and (iii) aid in incorporating evolutionary history into conservation planning. PMID:25386879

  20. A well-resolved phylogeny of the trees of Puerto Rico based on DNA barcode sequence data.

    PubMed

    Muscarella, Robert; Uriarte, María; Erickson, David L; Swenson, Nathan G; Zimmerman, Jess K; Kress, W John

    2014-01-01

    The use of phylogenetic information in community ecology and conservation has grown in recent years. Two key issues for community phylogenetics studies, however, are (i) low terminal phylogenetic resolution and (ii) arbitrarily defined species pools. We used three DNA barcodes (plastid DNA regions rbcL, matK, and trnH-psbA) to infer a phylogeny for 527 native and naturalized trees of Puerto Rico, representing the vast majority of the entire tree flora of the island (89%). We used a maximum likelihood (ML) approach with and without a constraint tree that enforced monophyly of recognized plant orders. Based on 50% consensus trees, the ML analyses improved phylogenetic resolution relative to a comparable phylogeny generated with Phylomatic (proportion of internal nodes resolved: constrained ML = 74%, unconstrained ML = 68%, Phylomatic = 52%). We quantified the phylogenetic composition of 15 protected forests in Puerto Rico using the constrained ML and Phylomatic phylogenies. We found some evidence that tree communities in areas of high water stress were relatively phylogenetically clustered. Reducing the scale at which the species pool was defined (from island to soil types) changed some of our results depending on which phylogeny (ML vs. Phylomatic) was used. Overall, the increased terminal resolution provided by the ML phylogeny revealed additional patterns that were not observed with a less-resolved phylogeny. With the DNA barcode phylogeny presented here (based on an island-wide species pool), we show that a more fully resolved phylogeny increases power to detect nonrandom patterns of community composition in several Puerto Rican tree communities. Especially if combined with additional information on species functional traits and geographic distributions, this phylogeny will (i) facilitate stronger inferences about the role of historical processes in governing the assembly and composition of Puerto Rican forests, (ii) provide insight into Caribbean biogeography, and (iii) aid in incorporating evolutionary history into conservation planning.

  1. A 3,500-year tree-ring record of annual precipitation on the northeastern Tibetan Plateau.

    PubMed

    Yang, Bao; Qin, Chun; Wang, Jianglin; He, Minhui; Melvin, Thomas M; Osborn, Timothy J; Briffa, Keith R

    2014-02-25

    An annually resolved and absolutely dated ring-width chronology spanning 4,500 y has been constructed using subfossil, archaeological, and living-tree juniper samples from the northeastern Tibetan Plateau. The chronology represents changing mean annual precipitation and is most reliable after 1500 B.C. Reconstructed precipitation for this period displays a trend toward more moist conditions: the last 10-, 25-, and 50-y periods all appear to be the wettest in at least three and a half millennia. Notable historical dry periods occurred in the 4th century BCE and in the second half of the 15th century CE. The driest individual year reconstructed (since 1500 B.C.) is 1048 B.C., whereas the wettest is 2010. Precipitation variability in this region appears not to be associated with inferred changes in Asian monsoon intensity during recent millennia. The chronology displays a statistical association with the multidecadal and longer-term variability of reconstructed mean Northern Hemisphere temperatures over the last two millennia. This suggests that any further large-scale warming might be associated with even greater moisture supply in this region.

  2. Phylogenetic relationships and timing of diversification in gonorynchiform fishes inferred using nuclear gene DNA sequences (Teleostei: Ostariophysi).

    PubMed

    Near, Thomas J; Dornburg, Alex; Friedman, Matt

    2014-11-01

    The Gonorynchiformes are the sister lineage of the species-rich Otophysi and provide important insights into the diversification of ostariophysan fishes. Phylogenies of gonorynchiforms inferred using morphological characters and mtDNA gene sequences provide differing resolutions with regard to the sister lineage of all other gonorynchiforms (Chanos vs. Gonorynchus) and support for monophyly of the two miniaturized lineages Cromeria and Grasseichthys. In this study the phylogeny and divergence times of gonorynchiforms are investigated with DNA sequences sampled from nine nuclear genes and a published morphological character matrix. Bayesian phylogenetic analyses reveal substantial congruence among individual gene trees with inferences from eight genes placing Gonorynchus as the sister lineage to all other gonorynchiforms. Seven gene trees resolve Cromeria and Grasseichthys as a clade, supporting previous inferences using morphological characters. Phylogenies resulting from either concatenating the nuclear genes, performing a multispecies coalescent species tree analysis, or combining the morphological and nuclear gene DNA sequences resolve Gonorynchus as the living sister lineage of all other gonorynchiforms, strongly support the monophyly of Cromeria and Grasseichthys, and resolve a clade containing Parakneria, Cromeria, and Grasseichthys. The morphological dataset, which includes 13 gonorynchiform fossil taxa that range in age from Early Cretaceous to Eocene, was analyzed in combination with DNA sequences from the nine nuclear genes and a relaxed molecular clock to estimate times of evolutionary divergence. This "tip dating" strategy accommodates uncertainty in the phylogenetic resolution of fossil taxa that provide calibration information in the relaxed molecular clock analysis. The estimated age of the most recent common ancestor (MRCA) of living gonorynchiforms is slightly older than estimates from previous node dating efforts, but the molecular tip dating estimated ages of Kneriinae (Kneria, Parakneria, Cromeria, and Grasseichthys) and the two paedomorphic lineages, Cromeria and Grasseichthys, are considerably younger. Copyright © 2014 Elsevier Inc. All rights reserved.

  3. Your place or mine? A phylogenetic comparative analysis of marital residence in Indo-European and Austronesian societies

    PubMed Central

    Fortunato, Laura; Jordan, Fiona

    2010-01-01

    Accurate reconstruction of prehistoric social organization is important if we are to put together satisfactory multidisciplinary scenarios about, for example, the dispersal of human groups. Such considerations apply in the case of Indo-European and Austronesian, two large-scale language families that are thought to represent Neolithic expansions. Ancestral kinship patterns have mostly been inferred through reconstruction of kin terminologies in ancestral proto-languages using the linguistic comparative method, and through geographical or distributional arguments based on the comparative patterns of kin terms and ethnographic kinship ‘facts’. While these approaches are detailed and valuable, the processes through which conclusions have been drawn from the data fail to provide explicit criteria for systematic testing of alternative hypotheses. Here, we use language trees derived using phylogenetic tree-building techniques on Indo-European and Austronesian vocabulary data. With these trees, ethnographic data and Bayesian phylogenetic comparative methods, we statistically reconstruct past marital residence and infer rates of cultural change between different residence forms, showing Proto-Indo-European to be virilocal and Proto-Malayo-Polynesian uxorilocal. The instability of uxorilocality and the rare loss of virilocality once gained emerge as common features of both families. PMID:21041215

  4. Targeting legume loci: A comparison of three methods for target enrichment bait design in Leguminosae phylogenomics.

    PubMed

    Vatanparast, Mohammad; Powell, Adrian; Doyle, Jeff J; Egan, Ashley N

    2018-03-01

    The development of pipelines for locus discovery has spurred the use of target enrichment for plant phylogenomics. However, few studies have compared pipelines from locus discovery and bait design, through validation, to tree inference. We compared three methods within Leguminosae (Fabaceae) and present a workflow for future efforts. Using 30 transcriptomes, we compared Hyb-Seq, MarkerMiner, and the Yang and Smith (Y&S) pipelines for locus discovery, validated 7501 baits targeting 507 loci across 25 genera via Illumina sequencing, and inferred gene and species trees via concatenation- and coalescent-based methods. Hyb-Seq discovered loci with the longest mean length. MarkerMiner discovered the most conserved loci with the least flagged as paralogous. Y&S offered the most parsimony-informative sites and putative orthologs. Target recovery averaged 93% across taxa. We optimized our targeted locus set based on a workflow designed to minimize paralog/ortholog conflation and thus present 423 loci for legume phylogenomics. Methods differed across criteria important for phylogenetic marker development. We recommend Hyb-Seq as a method that may be useful for most phylogenomic projects. Our targeted locus set is a resource for future, community-driven efforts to reconstruct the legume tree of life.

  5. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models.

    PubMed

    Stamatakis, Alexandros

    2006-11-01

    RAxML-VI-HPC (randomized axelerated maximum likelihood for high performance computing) is a sequential and parallel program for inference of large phylogenies with maximum likelihood (ML). Low-level technical optimizations, a modification of the search algorithm, and the use of the GTR+CAT approximation as replacement for GTR+Gamma yield a program that is between 2.7 and 52 times faster than the previous version of RAxML. A large-scale performance comparison with GARLI, PHYML, IQPNNI and MrBayes on real data containing 1000 up to 6722 taxa shows that RAxML requires at least 5.6 times less main memory and yields better trees in similar times than the best competing program (GARLI) on datasets up to 2500 taxa. On datasets ≥4000 taxa it also runs 2-3 times faster than GARLI. RAxML has been parallelized with MPI to conduct parallel multiple bootstraps and inferences on distinct starting trees. The program has been used to compute ML trees on two of the largest alignments to date containing 25,057 (1463 bp) and 2182 (51,089 bp) taxa, respectively. Availability: icwww.epfl.ch/~stamatak

  6. Evaluation of atpB nucleotide sequences for phylogenetic studies of ferns and other pteridophytes.

    PubMed

    Wolf, P

    1997-10-01

    Inferring basal relationships among vascular plants poses a major challenge to plant systematists. The divergence events that describe these relationships occurred long ago and considerable homoplasy has since accrued for both molecular and morphological characters. A potential solution is to examine phylogenetic analyses from multiple data sets. Here I present a new source of phylogenetic data for ferns and other pteridophytes. I sequenced the chloroplast gene atpB from 23 pteridophyte taxa and used maximum parsimony to infer relationships. A 588-bp region of the gene appeared to contain a statistically significant amount of phylogenetic signal and the resulting trees were largely congruent with similar analyses of nucleotide sequences from rbcL. However, a combined analysis of atpB plus rbcL produced a better resolved tree than did either data set alone. In the shortest trees, leptosporangiate ferns formed a monophyletic group. Also, I detected a well-supported clade of Psilotaceae (Psilotum and Tmesipteris) plus Ophioglossaceae (Ophioglossum and Botrychium). The demonstrated utility of atpB suggests that sequences from this gene should play a role in phylogenetic analyses that incorporate data from chloroplast genes, nuclear genes, morphology, and fossil data.

  7. Using Genotype Abundance to Improve Phylogenetic Inference

    PubMed Central

    Mesin, Luka; Victora, Gabriel D; Minin, Vladimir N; Matsen, Frederick A

    2018-01-01

    Abstract Modern biological techniques enable very dense genetic sampling of unfolding evolutionary histories, and thus frequently sample some genotypes multiple times. This motivates strategies to incorporate genotype abundance information in phylogenetic inference. In this article, we synthesize a stochastic process model with standard sequence-based phylogenetic optimality, and show that tree estimation is substantially improved by doing so. Our method is validated with extensive simulations and an experimental single-cell lineage tracing study of germinal center B cell receptor affinity maturation. PMID:29474671

  8. The contribution of respiration in tree-stems to the Dole Effect

    NASA Astrophysics Data System (ADS)

    Angert, A.; Muhr, J.; Negron Juarez, R.; Alegria Muñoz, W.; Kraemer, G.; Ramirez Santillan, J.; Chambers, J. Q.; Trumbore, S. E.

    2012-01-01

    Understanding the variability and the current value of the Dole Effect, which has been used to infer past changes in biospheric productivity, requires accurate information on the discrimination associated with respiratory oxygen consumption in each of the biosphere components. Respiration in tree stems is an important component of the land carbon cycle. Here we measured, for the first time, the discrimination associated with tree stem oxygen uptake. The measurements included tropical forest trees, which are major contributors to the global fluxes of carbon and oxygen. We found discrimination in the range of 12.6-21.5 ‰, indicating both diffusion limitation, resulting in O2 discrimination values below 20 ‰, and alternative oxidase respiration, which resulted in discrimination values greater than 20 ‰. Discrimination varied seasonally, between and within tree species. Calculations based on these results show that variability in woody plants discrimination can result in significant variations in the global Dole Effect.

  9. The contribution of respiration in tree stems to the Dole Effect

    NASA Astrophysics Data System (ADS)

    Angert, A.; Muhr, J.; Negron Juarez, R.; Alegria Muñoz, W.; Kraemer, G.; Ramirez Santillan, J.; Chambers, J. Q.; Trumbore, S. E.

    2012-10-01

    Understanding the variability and the current value of the Dole Effect, which has been used to infer past changes in biospheric productivity, requires accurate information on the isotopic discrimination associated with respiratory oxygen consumption in each of the biosphere components. Respiration in tree stems is an important component of the land carbon cycle. Here we measured, for the first time, the discrimination associated with tree stem oxygen uptake. The measurements included tropical forest trees, which are major contributors to the global fluxes of carbon and oxygen. We found discrimination in the range of 12.6-21.5‰, indicating both diffusion limitation, resulting in O2 discrimination values below 20‰, and alternative oxidase respiration, which resulted in discrimination values greater than 20‰. Discrimination varied seasonally, between and within tree species. Calculations based on these results show that variability in woody plants discrimination can result in significant variations in the global Dole Effect.

  10. Comparative Phylogeography of a Coevolved Community: Concerted Population Expansions in Joshua Trees and Four Yucca Moths

    PubMed Central

    Smith, Christopher Irwin; Tank, Shantel; Godsoe, William; Levenick, Jim; Strand, Eva; Esque, Todd; Pellmyr, Olle

    2011-01-01

    Comparative phylogeographic studies have had mixed success in identifying common phylogeographic patterns among co-distributed organisms. Whereas some have found broadly similar patterns across a diverse array of taxa, others have found that the histories of different species are more idiosyncratic than congruent. The variation in the results of comparative phylogeographic studies could indicate that the extent to which sympatrically-distributed organisms share common biogeographic histories varies depending on the strength and specificity of ecological interactions between them. To test this hypothesis, we examined demographic and phylogeographic patterns in a highly specialized, coevolved community – Joshua trees (Yucca brevifolia) and their associated yucca moths. This tightly-integrated, mutually interdependent community is known to have experienced significant range changes at the end of the last glacial period, so there is a strong a priori expectation that these organisms will show common signatures of demographic and distributional changes over time. Using a database of >5000 GPS records for Joshua trees, and multi-locus DNA sequence data from the Joshua tree and four species of yucca moth, we combined palaeodistribution modeling with coalescent-based analyses of demographic and phylogeographic history. We extensively evaluated the power of our methods to infer past population size and distributional changes by evaluating the effect of different inference procedures on our results, comparing our palaeodistribution models to Pleistocene-aged packrat midden records, and simulating DNA sequence data under a variety of alternative demographic histories. Together the results indicate that these organisms have shared a common history of population expansion, and that these expansions were broadly coincident in time. However, contrary to our expectations, none of our analyses indicated significant range or population size reductions at the end of the last glacial period, and the inferred demographic changes substantially predate Holocene climate changes. PMID:22028785

  11. Inference of Epidemiological Dynamics Based on Simulated Phylogenies Using Birth-Death and Coalescent Models

    PubMed Central

    Boskova, Veronika; Bonhoeffer, Sebastian; Stadler, Tanja

    2014-01-01

    Quantifying epidemiological dynamics is crucial for understanding and forecasting the spread of an epidemic. The coalescent and the birth-death model are used interchangeably to infer epidemiological parameters from the genealogical relationships of the pathogen population under study, which in turn are inferred from the pathogen genetic sequencing data. To compare the performance of these widely applied models, we performed a simulation study. We simulated phylogenetic trees under the constant rate birth-death model and the coalescent model with a deterministic exponentially growing infected population. For each tree, we re-estimated the epidemiological parameters using both a birth-death and a coalescent based method, implemented as an MCMC procedure in BEAST v2.0. In our analyses that estimate the growth rate of an epidemic based on simulated birth-death trees, the point estimates such as the maximum a posteriori/maximum likelihood estimates are not very different. However, the estimates of uncertainty are very different. The birth-death model had a higher coverage than the coalescent model, i.e. contained the true value in the highest posterior density (HPD) interval more often (2–13% vs. 31–75% error). The coverage of the coalescent decreases with decreasing basic reproductive ratio and increasing sampling probability of infecteds. We hypothesize that the biases in the coalescent are due to the assumption of deterministic rather than stochastic population size changes. Both methods performed reasonably well when analyzing trees simulated under the coalescent. The methods can also identify other key epidemiological parameters as long as one of the parameters is fixed to its true value. In summary, when using genetic data to estimate epidemic dynamics, our results suggest that the birth-death method will be less sensitive to population fluctuations of early outbreaks than the coalescent method that assumes a deterministic exponentially growing infected population. PMID:25375100

  12. Inferring gene regression networks with model trees

    PubMed Central

    2010-01-01

    Background: Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes, building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful for determining whether two genes have a strong global similarity but do not detect local similarities. Results: We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph of all the relationships among output and input genes is built, taking into account whether each pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: the Saccharomyces cerevisiae and E. coli data sets. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the Regulon database) is used as a control to compare the results to those of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions: REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques for estimating the numerical values of the target genes by linear regression functions. They are very often more precise than linear regression models because they can fit different linear regressions to separate areas of the search space, favoring localized similarities over a more global similarity. Furthermore, experimental results show the good performance of REGNET. PMID:20950452
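
    The abstract above mentions a statistical procedure to control the false discovery rate but does not name it. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure (an assumption, not necessarily what REGNET uses), candidate gene pairs could be filtered as follows. The function name and the threshold `q` are illustrative.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Assumed procedure: Benjamini-Hochberg step-up at FDR level q.
    This is an illustration, not the REGNET implementation.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for four candidate gene pairs.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

    Because the procedure is step-up, a p-value that fails its own threshold can still be rejected when a larger p-value passes a later one.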

  13. Do Macrophylogenies Yield Stable Macroevolutionary Inferences? An Example from Squamate Reptiles.

    PubMed

    Title, Pascal O; Rabosky, Daniel L

    2017-09-01

    Advances in the generation, retrieval, and analysis of phylogenetic data have enabled researchers to create phylogenies that contain many thousands of taxa. These "macrophylogenies"-large trees that typically derive from megaphylogeny, supermatrix, or supertree approaches-provide researchers with an unprecedented ability to conduct evolutionary analyses across broad phylogenetic scales. Many studies have now used these phylogenies to explore the dynamics of speciation, extinction, and phenotypic evolution across large swaths of the tree of life. These trees are characterized by substantial phylogenetic uncertainty on multiple levels, and the stability of macroevolutionary inferences from these data sets has not been rigorously explored. As a case study, we tested whether five recently published phylogenies for squamate reptiles-each consisting of more than 4000 species-yield congruent inferences about the processes that underlie variation in species richness across replicate evolutionary radiations of Australian snakes and lizards. We find discordance across the five focal phylogenies with respect to clade age and several diversification rate metrics, and in the effects of clade age on species richness. We also find that crown clade ages reported in the literature on these Australian groups are in conflict with all of the large phylogenies examined. Macrophylogenies offer an unprecedented opportunity to address evolutionary and ecological questions at broad phylogenetic scales, but accurately representing the uncertainty that is inherent to such analyses remains a critical challenge to our field. [Australia; macroevolution; macrophylogeny; squamates; time calibration.]. © The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  14. Current and historical composition and size structure of upland forests across a soil gradient in north Mississippi

    Treesearch

    Sherry B. Surrette; Steven M. Aquilani; J. Stephen Brewer

    2008-01-01

    Comparisons of current and historical tree species composition and size structure along natural productivity gradients are useful for inferring effects of disturbance regimes and productivity on patterns of succession.

  15. Five centuries of Czech May-June precipitation and drought variability inferred from instrumental measurements, tree rings and documentary archives

    NASA Astrophysics Data System (ADS)

    Brázdil, R.; Büntgen, U.; Dobrovolný, P.; Trnka, M.; Kyncl, T.

    2010-09-01

    Precipitation is one of the most important meteorological elements for different natural processes as well as for human society. Its long-term fluctuations in the Czech Lands (present-day Czech Republic) can be studied using long instrumental series (Brno since January 1803, Prague-Klementinum since May 1804), a tree-ring chronology from southern Moravian fir Abies alba Mill. developed from living and historical trees (since A.D. 1376), and monthly precipitation indices derived from documentary evidence (from A.D. 1500). The analysis focuses on May-June precipitation and drought patterns represented by the Z-index for the past 500 years, showing the highest response of the tree-ring chronology to the mentioned months in the calibration/verification period between 1803 and 1932. Tree-ring and documentary-based May-June Z-index reconstructions explaining ca. 30-40% of its variability are compared with existing reconstructions of hydroclimatic patterns of the Central European region. Uncertainties of tree-ring and documentary datasets and corresponding reconstructions are discussed.

  16. Taxonomic relationships among Phenacomys voles as inferred by cytochrome b

    USGS Publications Warehouse

    Bellinger, M.R.; Haig, S.M.; Forsman, E.D.; Mullins, T.D.

    2005-01-01

    Taxonomic relationships among red tree voles (Phenacomys longicaudus longicaudus, P. l. silvicola), the Sonoma tree vole (P. pomo), the white-footed vole (P. albipes), and the heather vole (P. intermedius) were examined using 664 base pairs of the mitochondrial cytochrome b gene. Results indicate specific differences among red tree voles, Sonoma tree voles, white-footed voles, and heather voles, but no clear difference between the 2 Oregon subspecies of red tree voles (P. l. longicaudus and P. l. silvicola). Our data further indicated a close relationship between tree voles and albipes, validating inclusion of albipes in the subgenus Arborimus. These 3 congeners shared a closer relationship to P. intermedius than to other arvicolids. A moderate association between pomo and albipes was indicated by maximum parsimony and neighbor-joining phylogenetic analyses. Molecular clock estimates suggest a Pleistocene radiation of the Arborimus clade, which is concordant with pulses of diversification observed in other murid rodents. The generic rank of Arborimus is subject to interpretation of data.

  17. Uncertain decision tree inductive inference

    NASA Astrophysics Data System (ADS)

    Zarban, L.; Jafari, S.; Fakhrahmad, S. M.

    2011-10-01

    Induction is the process of reasoning in which general rules are formulated based on limited observations of recurring phenomenal patterns. Decision tree learning is one of the most widely used and practical inductive methods, which represents the results in a tree scheme. Various decision tree algorithms have already been proposed, such as CLS, ID3, Assistant, C4.5, REPTree and Random Tree. These algorithms suffer from some major shortcomings. In this article, after discussing the main limitations of the existing methods, we introduce a new decision tree induction algorithm, which overcomes all the problems existing in its counterparts. The new method uses bit strings and maintains important information on them. This use of bit strings and logical operations on them yields high speed during the induction process. It also has several important features: it deals with inconsistencies in data, avoids overfitting and handles uncertainty. We also illustrate more advantages and the new features of the proposed method. The experimental results show the effectiveness of the method in comparison with other methods existing in the literature.

  18. Reversible polymorphism-aware phylogenetic models and their application to tree inference.

    PubMed

    Schrempf, Dominik; Minh, Bui Quang; De Maio, Nicola; von Haeseler, Arndt; Kosiol, Carolin

    2016-10-21

    We present a reversible Polymorphism-Aware Phylogenetic Model (revPoMo) for species tree estimation from genome-wide data. revPoMo enables the reconstruction of large scale species trees for many within-species samples. It expands the alphabet of DNA substitution models to include polymorphic states, thereby naturally accounting for incomplete lineage sorting. We implemented revPoMo in the maximum likelihood software IQ-TREE. A simulation study and an application to great apes data show that the runtimes of our approach and standard substitution models are comparable but that revPoMo has much better accuracy in estimating trees, divergence times and mutation rates. The advantage of revPoMo is that an increase of sample size per species improves estimations but does not increase runtime. Therefore, revPoMo is a valuable tool with several applications, from speciation dating to species tree reconstruction. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.

  19. Improved canopy reflectance modeling and scene inference through improved understanding of scene pattern

    NASA Technical Reports Server (NTRS)

    Franklin, Janet; Simonett, David

    1988-01-01

    The Li-Strahler reflectance model, driven by LANDSAT Thematic Mapper (TM) data, provided regional estimates of tree size and density within 20 percent of sampled values in two bioclimatic zones in West Africa. This model exploits tree geometry in an inversion technique to predict average tree size and density from reflectance data using a few simple parameters measured in the field (spatial pattern, shape, and size distribution of trees) and in the imagery (spectral signatures of scene components). Trees are treated as simply shaped objects, and multispectral reflectance of a pixel is assumed to be related only to the proportions of tree crown, shadow, and understory in the pixel. These, in turn, are a direct function of the number and size of trees, the solar illumination angle, and the spectral signatures of crown, shadow and understory. Given the variance in reflectance from pixel to pixel within a homogeneous area of woodland, caused by the variation in the number and size of trees, the model can be inverted to give estimates of average tree size and density. Because the inversion is sensitive to correct determination of component signatures, predictions are not accurate for small areas.

  20. Geochemical evidence for hydroclimatic variability over the last 2460 years from Crevice Lake in Yellowstone National Park, USA

    USGS Publications Warehouse

    Stevens, L.R.; Dean, W.E.

    2008-01-01

    A 2460-year-long hydroclimatic record for Crevice Lake, Yellowstone National Park, Montana was constructed from the δ18O values of endogenic carbonates. The δ18O record is compared to the Palmer Hydrologic Drought Index (PHDI) and Pacific Decadal Oscillation (PDO) indices, as well as inferred discharge of the Yellowstone River. During the last century, high δ18O values coincide with drought conditions and the warm phase of the PDO index. Low δ18O values coincide with wet years and a negative PDO index. Comparison of tree-ring inferred discharge of the Yellowstone River with the δ18O record over the last 300 years indicates that periods of high discharge (i.e., wet winters with significant snow pack) correspond with low δ18O values. Extrapolating this relationship we infer wet winters and high river discharge for the periods of 1090-1030, 970-870, 670-620, and 500-430 cal years BP. The wet intervals at 670 and 500 cal BP are synchronous with similar events in Banff, Canada and Walker Lake, Nevada. The wet intervals at 970 and 670 cal BP overlap with wet intervals at Walker Lake and major drought events identified in the western Great Basin. These results suggest that the northern border of Yellowstone National Park straddles the boundary between Northern Rocky Mountains and Great Basin climate regimes. © 2007 Elsevier Ltd and INQUA.

  1. Probabilistic inference using linear Gaussian importance sampling for hybrid Bayesian networks

    NASA Astrophysics Data System (ADS)

    Sun, Wei; Chang, K. C.

    2005-05-01

    Probabilistic inference for Bayesian networks is in general NP-hard using either exact algorithms or approximate methods. However, for very complex networks, only approximate methods such as stochastic sampling can provide a solution under a time constraint. There are several simulation methods currently available. They include logic sampling (the first proposed stochastic method for Bayesian networks), the likelihood weighting algorithm (the most commonly used simulation method because of its simplicity and efficiency), the Markov blanket scoring method, and the importance sampling algorithm. In this paper, we first briefly review and compare these available simulation methods, then we propose an improved importance sampling algorithm called linear Gaussian importance sampling algorithm for general hybrid model (LGIS). LGIS is aimed at hybrid Bayesian networks consisting of both discrete and continuous random variables with arbitrary distributions. It uses a linear function and Gaussian additive noise to approximate the true conditional probability distribution for a continuous variable given both its parents and evidence in a Bayesian network. One of the most important features of the newly developed method is that it can adaptively learn the optimal importance function from the previous samples. We test the inference performance of LGIS using a 16-node linear Gaussian model and a 6-node general hybrid model. The performance comparison with other well-known methods such as Junction tree (JT) and likelihood weighting (LW) shows that LGIS-GHM is very promising.
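
    To make the likelihood weighting baseline named in this abstract concrete, here is a minimal sketch on a hypothetical two-node discrete network (Rain -> WetGrass). The network, its probabilities, and all names are assumptions chosen for illustration. This is not the LGIS algorithm, which targets hybrid networks with continuous variables.

```python
import random

# Hypothetical two-node network: Rain -> WetGrass (illustrative numbers).
P_RAIN = 0.2
P_WET_GIVEN_RAIN = {True: 0.9, False: 0.1}


def likelihood_weighting(n_samples, seed=0):
    """Estimate P(Rain = True | WetGrass = True) by likelihood weighting."""
    rng = random.Random(seed)
    weight_rain = 0.0
    weight_total = 0.0
    for _ in range(n_samples):
        # Sample the non-evidence node from its prior.
        rain = rng.random() < P_RAIN
        # Do not sample the evidence node; weight the sample by the
        # likelihood of the observed evidence given its sampled parent.
        weight = P_WET_GIVEN_RAIN[rain]
        weight_total += weight
        if rain:
            weight_rain += weight
    return weight_rain / weight_total


def exact_posterior():
    """Exact answer by Bayes' rule, for comparison."""
    joint_rain = P_RAIN * P_WET_GIVEN_RAIN[True]
    joint_dry = (1 - P_RAIN) * P_WET_GIVEN_RAIN[False]
    return joint_rain / (joint_rain + joint_dry)


print(likelihood_weighting(20000), exact_posterior())
```

    Importance sampling generalizes this idea by drawing the non-evidence nodes from a proposal distribution other than the prior and correcting with the ratio of target to proposal density.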

  2. Statistical learning and selective inference.

    PubMed

    Taylor, Jonathan; Tibshirani, Robert J

    2015-06-23

    We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked"--searched for the strongest associations--means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.

  3. Context Inference for Mobile Applications in the UPCASE Project

    NASA Astrophysics Data System (ADS)

    Santos, André C.; Tarrataca, Luís; Cardoso, João M. P.; Ferreira, Diogo R.; Diniz, Pedro C.; Chainho, Paulo

    The growing processing capabilities of mobile devices coupled with portable and wearable sensors have enabled the development of context-aware services tailored to the user environment and its daily activities. The problem of determining the user context at each particular point in time is one of the main challenges in this area. In this paper, we describe the approach pursued in the UPCASE project, which makes use of sensors available in the mobile device as well as sensors externally connected via Bluetooth. We describe the system architecture from raw data acquisition to feature extraction and context inference. As a proof of concept, the inference of contexts is based on a decision tree to learn and identify contexts automatically and dynamically at runtime. Preliminary results suggest that this is a promising approach for context inference in several application scenarios.

  4. Implications of Liebig’s law of the minimum for tree-ring reconstructions of climate

    NASA Astrophysics Data System (ADS)

    Stine, A. R.; Huybers, P.

    2017-11-01

    A basic principle of ecology, known as Liebig’s Law of the Minimum, is that plant growth reflects the strongest limiting environmental factor. This principle implies that a limiting environmental factor can be inferred from historical growth and, in dendrochronology, such reconstruction is generally achieved by averaging collections of standardized tree-ring records. Averaging is optimal if growth reflects a single limiting factor and noise but not if growth also reflects locally variable stresses that intermittently limit growth. In this study a collection of Arctic tree-ring records is shown to follow scaling relationships that are inconsistent with the signal-plus-noise model of tree growth but consistent with Liebig’s Law acting at the local level. Also consistent with law-of-the-minimum behavior is that reconstructions based on the least-stressed trees in a given year follow variations in temperature better than typical approaches where all tree-ring records are averaged. Improvements in reconstruction skill occur across all frequencies, with the greatest increase at the lowest frequencies. More comprehensive statistical-ecological models of tree growth may offer further improvement in reconstruction skill.

  5. Inferring long-term carbon sequestration from tree rings at Harvard Forest: A calibration approach using tree ring widths and geochemistry / flux tower data

    NASA Astrophysics Data System (ADS)

    Belmecheri, S.; Maxwell, S.; Davis, K. J.; Alan, T. H.

    2012-12-01

    Improving the prediction skill of terrestrial carbon cycle models is important for reducing the uncertainties in global carbon cycle and climate projections. Additional evaluation and calibration of carbon models is required, using both observations and long-term proxy-derived data. Centennial-length data could be obtained from tree-ring archives that provide long continuous series of past forest growth changes with accurate annual resolution. Here we present results from a study conducted at Harvard Forest (Petersham, Massachusetts). The study examines the potential relationship between δ13C in dominant trees and GPP and/or NEE measured by the Harvard Forest flux tower (1992-2010). We have analyzed the δ13C composition of late wood-cellulose over the last 18 years from eastern hemlock (Tsuga canadensis) and northern red oak (Quercus rubra) trees growing in the flux tower footprint. δ13C values, corrected for the declining trend of atmospheric δ13C, show a decreasing trend from 1992 to 2010 and therefore a significant increase in discrimination (Δ). The intra-cellular CO2 (Ci) calculated from Δ shows a significant increase for both tree species and follows the same rate of atmospheric CO2 (Ca) increase (Ci/Ca increases). Interestingly, the net Ci and Δ increase observed for both species did not result in an increase of the iWUE. Ci/Ca is strongly related to the growing season Palmer Drought Severity Index (PDSI) for both species, thus indicating a significant relationship between soil moisture conditions and stomatal conductance. The Ci trend is interpreted as a result of higher CO2 assimilation in response to increasing soil moisture allowing a longer stomatal opening and therefore stimulating tree growth. This interpretation is consistent with the observed increase in GPP and the strengthening of the carbon sink (more negative NEE). Additionally, the decadal trends of basal area increment (BAI) calculated from tree-ring widths exhibit a positive trend over the last two decades. Tree-ring width and δ13C results show the potential of these parameters as proxies for reconstructions of past CO2 assimilation and carbon sequestration by woody biomass beyond the time span covered by calibration data, and extending to the centennial time scales encompassed by tree-ring records.
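
    The abstract derives Ci and iWUE from Δ without giving formulas. A minimal sketch follows, assuming the simple linear discrimination model commonly attributed to Farquhar and co-workers, with the frequently used fractionation constants a = 4.4 per mil (diffusion) and b = 27 per mil (carboxylation). The constants, function names, and input values are assumptions for illustration and may differ from what the authors used.

```python
# Assumed fractionation constants (per mil); commonly used values,
# not taken from the abstract.
A_DIFFUSION = 4.4
B_CARBOXYLATION = 27.0


def discrimination(delta13c_air, delta13c_plant):
    """Carbon isotope discrimination (per mil) from air and plant d13C."""
    return (delta13c_air - delta13c_plant) / (1 + delta13c_plant / 1000)


def leaf_internal_co2(ca_ppm, big_delta):
    """Ci (ppm) from the linear model: Delta = a + (b - a) * Ci / Ca."""
    return ca_ppm * (big_delta - A_DIFFUSION) / (B_CARBOXYLATION - A_DIFFUSION)


def intrinsic_wue(ca_ppm, big_delta):
    """iWUE = (Ca - Ci) / 1.6, written directly in terms of Delta."""
    return ca_ppm * (B_CARBOXYLATION - big_delta) / (
        1.6 * (B_CARBOXYLATION - A_DIFFUSION)
    )


# Hypothetical inputs: air at -8 per mil, cellulose at -26 per mil, Ca = 400 ppm.
d = discrimination(-8.0, -26.0)
print(d, leaf_internal_co2(400.0, d), intrinsic_wue(400.0, d))
```

    Under this model iWUE depends only on the difference Ca - Ci, so if Ci rises by about as much as Ca, iWUE stays roughly constant even though Ci/Ca and Δ increase, which is consistent with the pattern the abstract reports.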

  6. A Test of Carbon and Oxygen Stable Isotope Ratio Process Models in Tree Rings.

    NASA Astrophysics Data System (ADS)

    Roden, J. S.; Farquhar, G. D.

    2008-12-01

    Stable isotope ratios of carbon and oxygen in tree ring cellulose have been used to infer environmental change. Process-based models have been developed to clarify the potential of historic tree ring records for meaningful paleoclimatic reconstructions. However, isotopic variation can be influenced by multiple environmental factors, making simplistic interpretations problematic. Recently, the dual isotope approach, where the variation in one stable isotope ratio (e.g. oxygen) is used to constrain the interpretation of variation in another (e.g. carbon), has been shown to have the potential to de-convolute isotopic analysis. However, this approach requires further testing to determine its applicability for paleo-reconstructions using tree-ring time series. We present a study where the information needed to parameterize mechanistic models for both carbon and oxygen stable isotope ratios was collected in controlled environment chambers for two species (Pinus radiata and Eucalyptus globulus). The seedlings were exposed to treatments designed to modify leaf temperature, transpiration rates, stomatal conductance and photosynthetic capacity. Both species were grown for over 100 days under two humidity regimes that differed by 20%. Stomatal conductance was significantly different between species and for seedlings under drought conditions but not between other treatments or humidity regimes. The treatments produced large differences in transpiration rate and photosynthesis. Treatments that affected photosynthetic rates but not stomatal conductance influenced carbon isotope discrimination more than those that influenced primarily conductance. The various treatments produced a range in oxygen isotope ratios of 7 ‰. Process models predicted greater oxygen isotope enrichment in tree ring cellulose than observed. The oxygen isotope ratios of bulk leaf water were reasonably well predicted by current steady-state models. However, the fractional difference between models that predict bulk leaf water versus the site of evaporation did not increase with transpiration rates. In conclusion, although the dual isotope approach may better constrain interpretation of isotopic variation, more work is required before its predictive power can be applied to tree-ring archives.

  7. A novel prediction approach for antimalarial activities of Trimethoprim, Pyrimethamine, and Cycloguanil analogues using extremely randomized trees.

    PubMed

    Nattee, Cholwich; Khamsemanan, Nirattaya; Lawtrakul, Luckhana; Toochinda, Pisanu; Hannongbua, Supa

    2017-01-01

    Malaria is still one of the most serious diseases in tropical regions. This is due in part to the high resistance against available drugs for the inhibition of parasites, Plasmodium, the cause of the disease. New potent compounds with high clinical utility are urgently needed. In this work, we created a novel model using a regression tree to study structure-activity relationships and predict the inhibition constant, Ki, of three different antimalarial analogues (Trimethoprim, Pyrimethamine, and Cycloguanil) based on their molecular descriptors. To the best of our knowledge, this work is the first attempt to study the structure-activity relationships of all three analogues combined. The most relevant descriptors and appropriate parameters of the regression tree are harvested using extremely randomized trees. These descriptors are water accessible surface area, Log of the aqueous solubility, total hydrophobic van der Waals surface area, and molecular refractivity. Out of all possible combinations of these selected parameters and descriptors, the tree with the strongest coefficient of determination is selected to be our prediction model. Predicted Ki values from the proposed model show a strong coefficient of determination, R² = 0.996, to experimental Ki values. From the structure of the regression tree, compounds with high accessible surface area of all hydrophobic atoms (ASA_H) and low aqueous solubility of inhibitors (Log S) generally possess low Ki values. Our prediction model can also be utilized as a screening test for new antimalarial drug compounds which may reduce the time and expenses for new drug development. New compounds with high predicted Ki should be excluded from further drug development. It is also our inference that a threshold of ASA_H greater than 575.80 and Log S less than or equal to -4.36 is a sufficient condition for a new compound to possess a low Ki. Copyright © 2016 Elsevier Inc. All rights reserved.

  8. Does Gene Tree Discordance Explain the Mismatch between Macroevolutionary Models and Empirical Patterns of Tree Shape and Branching Times?

    PubMed Central

    Stadler, Tanja; Degnan, James H.; Rosenberg, Noah A.

    2016-01-01

    Classic null models for speciation and extinction give rise to phylogenies that differ in distribution from empirical phylogenies. In particular, empirical phylogenies are less balanced and have branching times closer to the root compared to phylogenies predicted by common null models. This difference might be due to null models of the speciation and extinction process being too simplistic, or due to the empirical datasets not being representative of random phylogenies. A third possibility arises because phylogenetic reconstruction methods often infer gene trees rather than species trees, producing an incongruity between models that predict species tree patterns and empirical analyses that consider gene trees. We investigate the extent to which the difference between gene trees and species trees under a combined birth–death and multispecies coalescent model can explain the difference in empirical trees and birth–death species trees. We simulate gene trees embedded in simulated species trees and investigate their difference with respect to tree balance and branching times. We observe that the gene trees are less balanced and typically have branching times closer to the root than the species trees. Empirical trees from TreeBase are also less balanced than our simulated species trees, and model gene trees can explain an imbalance increase of up to 8% compared to species trees. However, we see a much larger imbalance increase in empirical trees, about 100%, meaning that additional features must also be causing imbalance in empirical trees. This simulation study highlights the necessity of revisiting the assumptions made in phylogenetic analyses, as these assumptions, such as equating the gene tree with the species tree, might lead to a biased conclusion. PMID:26968785

  9. Interpreting the gamma statistic in phylogenetic diversification rate studies: a rate decrease does not necessarily indicate an early burst.

    PubMed

    Fordyce, James A

    2010-07-23

    Phylogenetic hypotheses are increasingly being used to elucidate historical patterns of diversification rate-variation. Hypothesis testing is often conducted by comparing the observed vector of branching times to a null, pure-birth expectation. A popular method for inferring a decrease in speciation rate, which might suggest an early burst of diversification followed by a decrease in diversification rate is the gamma statistic. Using simulations under varying conditions, I examine the sensitivity of gamma to the distribution of the most recent branching times. Using an exploratory data analysis tool for lineages through time plots, tree deviation, I identified trees with a significant gamma statistic that do not appear to have the characteristic early accumulation of lineages consistent with an early, rapid rate of cladogenesis. I further investigated the sensitivity of the gamma statistic to recent diversification by examining the consequences of failing to simulate the full time interval following the most recent cladogenic event. The power of gamma to detect rate decrease at varying times was assessed for simulated trees with an initial high rate of diversification followed by a relatively low rate. The gamma statistic is extraordinarily sensitive to recent diversification rates, and does not necessarily detect early bursts of diversification. This was true for trees of various sizes and completeness of taxon sampling. The gamma statistic had greater power to detect recent diversification rate decreases compared to early bursts of diversification. Caution should be exercised when interpreting the gamma statistic as an indication of early, rapid diversification.
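
    For readers unfamiliar with the statistic discussed above, here is a minimal sketch of the gamma statistic as usually defined (Pybus and Harvey's constant-rates statistic), computed from internode intervals. The function name and the input convention are assumptions made for this illustration. It ignores incomplete taxon sampling and is not the author's simulation code.

```python
import math


def gamma_statistic(internode_intervals):
    """Gamma statistic from internode intervals of an ultrametric tree.

    internode_intervals[i] is g_k, the time during which the reconstructed
    tree has k = i + 2 lineages, for k = 2..n (n = number of tips).
    Under a pure-birth process gamma is approximately standard normal.
    """
    g = list(internode_intervals)
    n = len(g) + 1  # number of tips
    if n < 3:
        raise ValueError("need at least 3 tips")
    # k * g_k for k = 2..n
    weighted = [k * gk for k, gk in zip(range(2, n + 1), g)]
    total = sum(weighted)  # T = sum_{j=2}^{n} j * g_j
    # Sum over i = 2..n-1 of the partial sums sum_{k=2}^{i} k * g_k.
    running = 0.0
    partial_total = 0.0
    for w in weighted[:-1]:
        running += w
        partial_total += running
    mean_partial = partial_total / (n - 2)
    return (mean_partial - total / 2) / (total * math.sqrt(1 / (12 * (n - 2))))


# Hypothetical 5-tip tree whose final interval (5 lineages) is very long.
print(gamma_statistic([0.1, 0.1, 0.1, 5.0]))
```

    In this hypothetical input the early intervals are short and equal, and only the most recent interval is stretched, yet gamma comes out strongly negative. That illustrates the abstract's point that the statistic is highly sensitive to the most recent branching times.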

  10. Binary Decision Trees for Preoperative Periapical Cyst Screening Using Cone-beam Computed Tomography.

    PubMed

    Pitcher, Brandon; Alaqla, Ali; Noujeim, Marcel; Wealleans, James A; Kotsakis, Georgios; Chrepa, Vanessa

    2017-03-01

    Cone-beam computed tomographic (CBCT) analysis allows for 3-dimensional assessment of periradicular lesions and may facilitate preoperative periapical cyst screening. The purpose of this study was to develop and assess the predictive validity of a cyst screening method based on CBCT volumetric analysis alone or combined with designated radiologic criteria. Three independent examiners evaluated 118 presurgical CBCT scans from cases that underwent apicoectomies and had an accompanying gold standard histopathological diagnosis of either a cyst or granuloma. Lesion volume, density, and specific radiologic characteristics were assessed using specialized software. Logistic regression models with histopathological diagnosis as the dependent variable were constructed for cyst prediction, and receiver operating characteristic curves were used to assess the predictive validity of the models. A conditional inference binary decision tree based on a recursive partitioning algorithm was constructed to facilitate preoperative screening. Interobserver agreement was excellent for volume and density, but it varied from poor to good for the radiologic criteria. Volume and root displacement were strong predictors for cyst screening in all analyses. The binary decision tree classifier determined that if the volume of the lesion was >247 mm³, there was 80% probability of a cyst. If volume was <247 mm³ and root displacement was present, cyst probability was 60% (78% accuracy). The good accuracy and high specificity of the decision tree classifier render it a useful preoperative cyst screening tool that can aid in clinical decision making but not a substitute for definitive histopathological diagnosis after biopsy. Confirmatory studies are required to validate the present findings. Published by Elsevier Inc.

  11. Calculating Higher-Order Moments of Phylogenetic Stochastic Mapping Summaries in Linear Time.

    PubMed

    Dhar, Amrit; Minin, Vladimir N

    2017-05-01

    Stochastic mapping is a simulation-based method for probabilistically mapping substitution histories onto phylogenies according to continuous-time Markov models of evolution. This technique can be used to infer properties of the evolutionary process on the phylogeny and, unlike parsimony-based mapping, conditions on the observed data to randomly draw substitution mappings that do not necessarily require the minimum number of events on a tree. Most stochastic mapping applications simulate substitution mappings only to estimate the mean and/or variance of two commonly used mapping summaries: the number of particular types of substitutions (labeled substitution counts) and the time spent in a particular group of states (labeled dwelling times) on the tree. Fast, simulation-free algorithms for calculating the mean of stochastic mapping summaries exist. Importantly, these algorithms scale linearly in the number of tips/leaves of the phylogenetic tree. However, to our knowledge, no such algorithm exists for calculating higher-order moments of stochastic mapping summaries. We present one such simulation-free dynamic programming algorithm that calculates prior and posterior mapping variances and scales linearly in the number of phylogeny tips. Our procedure suggests a general framework that can be used to efficiently compute higher-order moments of stochastic mapping summaries without simulations. We demonstrate the usefulness of our algorithm by extending previously developed statistical tests for rate variation across sites and for detecting evolutionarily conserved regions in genomic sequences.

  12. Calculating Higher-Order Moments of Phylogenetic Stochastic Mapping Summaries in Linear Time

    PubMed Central

    Dhar, Amrit

    2017-01-01

    Stochastic mapping is a simulation-based method for probabilistically mapping substitution histories onto phylogenies according to continuous-time Markov models of evolution. This technique can be used to infer properties of the evolutionary process on the phylogeny and, unlike parsimony-based mapping, conditions on the observed data to randomly draw substitution mappings that do not necessarily require the minimum number of events on a tree. Most stochastic mapping applications simulate substitution mappings only to estimate the mean and/or variance of two commonly used mapping summaries: the number of particular types of substitutions (labeled substitution counts) and the time spent in a particular group of states (labeled dwelling times) on the tree. Fast, simulation-free algorithms for calculating the mean of stochastic mapping summaries exist. Importantly, these algorithms scale linearly in the number of tips/leaves of the phylogenetic tree. However, to our knowledge, no such algorithm exists for calculating higher-order moments of stochastic mapping summaries. We present one such simulation-free dynamic programming algorithm that calculates prior and posterior mapping variances and scales linearly in the number of phylogeny tips. Our procedure suggests a general framework that can be used to efficiently compute higher-order moments of stochastic mapping summaries without simulations. We demonstrate the usefulness of our algorithm by extending previously developed statistical tests for rate variation across sites and for detecting evolutionarily conserved regions in genomic sequences. PMID:28177780

  13. 11-year cycle solar modulation of cosmic ray intensity inferred from C-14 content variation in dated tree rings

    NASA Technical Reports Server (NTRS)

    Fan, C. Y.; Chen, T. M.; Yun, S. X.; Dai, K. M.

    1983-01-01

    A liquid scintillation-photomultiplier tube counter system was used to measure the Delta-C-14 values of 60 tree rings, dating from 1866 to 1925, that were taken from a white spruce grown in Canada at 68 deg N, 130 deg W. A 10-percent variation is found which is anticorrelated with sunspot numbers, although the amplitude of the variation is 2-3 times higher than expected in trees grown at lower latitudes. A large dip in the data at about 1875 suggests an anomalously large modulation of cosmic ray intensity during the 1867-1878 AD solar cycle, which was the most active of the 19th century.

  14. Evolution of early life inferred from protein and ribonucleic acid sequences

    NASA Technical Reports Server (NTRS)

    Dayhoff, M. O.; Schwartz, R. M.

    1978-01-01

    The chemical structures of ferredoxin, 5S ribosomal RNA, and c-type cytochrome sequences have been employed to construct a phylogenetic tree which connects all major photosynthesizing organisms: the three types of bacteria, blue-green algae, and chloroplasts. Anaerobic and aerobic bacteria, eukaryotic cytoplasmic components and mitochondria are also included in the phylogenetic tree. Anaerobic nonphotosynthesizing bacteria similar to Clostridium were the earliest organisms, arising more than 3.2 billion years ago. Bacterial photosynthesis evolved nearly 3.0 billion years ago, while oxygen-evolving photosynthesis, originating in the blue-green algal line, came into being about 2.0 billion years ago. The phylogenetic tree supports the symbiotic theory of the origin of eukaryotes.

  15. Tree ring evidence for limited direct CO2 fertilization of forests over the 20th century

    NASA Astrophysics Data System (ADS)

    Gedalof, Ze'ev; Berg, Aaron A.

    2010-09-01

    The effect that rising atmospheric CO2 levels will have on forest productivity and water use efficiency remains uncertain, yet it has critical implications for future rates of carbon sequestration and forest distributions. Efforts to understand the effect that rising CO2 will have on forests are largely based on growth chamber studies of seedlings, and the relatively small number of FACE sites. Inferences from these studies are limited by their generally short durations, artificial growing conditions, unnatural step-increases in CO2 concentrations, and poor replication. Here we analyze the global record of annual radial tree growth, derived from the International Tree ring Data Bank (ITRDB), for evidence of increasing growth rates that cannot be explained by climatic change alone, and for evidence of decreasing sensitivity to drought. We find that approximately 20 percent of sites globally exhibit increasing trends in growth that cannot be attributed to climatic causes, nitrogen deposition, elevation, or latitude, which we attribute to a direct CO2 fertilization effect. No differences were found between species in their likelihood to exhibit growth increases attributable to CO2 fertilization, although Douglas-fir (Pseudotsuga menziesii) and ponderosa pine (Pinus ponderosa), the two most commonly sampled species in the ITRDB, exhibit a CO2 fertilization signal at frequencies very near their upper and lower confidence limits respectively. Overall these results suggest that CO2 fertilization of forests will not counteract emissions or slow warming in any substantial fashion, but do suggest that future forest dynamics may differ from those seen today depending on site conditions and individual species' responses to elevated CO2.

  16. MINER: exploratory analysis of gene interaction networks by machine learning from expression data.

    PubMed

    Kadupitige, Sidath Randeni; Leung, Kin Chun; Sellmeier, Julia; Sivieng, Jane; Catchpoole, Daniel R; Bain, Michael E; Gaëta, Bruno A

    2009-12-03

    The reconstruction of gene regulatory networks from high-throughput "omics" data has become a major goal in the modelling of living systems. Numerous approaches have been proposed, most of which attempt only "one-shot" reconstruction of the whole network with no intervention from the user, or offer only simple correlation analysis to infer gene dependencies. We have developed MINER (Microarray Interactive Network Exploration and Representation), an application that combines multivariate non-linear tree learning of individual gene regulatory dependencies, visualisation of these dependencies as both trees and networks, and representation of known biological relationships based on common Gene Ontology annotations. MINER allows biologists to explore the dependencies influencing the expression of individual genes in a gene expression data set in the form of decision, model or regression trees, using their domain knowledge to guide the exploration and formulate hypotheses. Multiple trees can then be summarised in the form of a gene network diagram. MINER is being adopted by several of our collaborators and has already led to the discovery of a new significant regulatory relationship with subsequent experimental validation. Unlike most gene regulatory network inference methods, MINER allows the user to start from genes of interest and build the network gene-by-gene, incorporating domain expertise in the process. This approach has been used successfully with RNA microarray data but is applicable to other quantitative data produced by high-throughput technologies such as proteomics and "next generation" DNA sequencing.

  17. Dissecting Molecular Evolution in the Highly Diverse Plant Clade Caryophyllales Using Transcriptome Sequencing

    PubMed Central

    Yang, Ya; Moore, Michael J.; Brockington, Samuel F.; Soltis, Douglas E.; Wong, Gane Ka-Shu; Carpenter, Eric J.; Zhang, Yong; Chen, Li; Yan, Zhixiang; Xie, Yinlong; Sage, Rowan F.; Covshoff, Sarah; Hibberd, Julian M.; Nelson, Matthew N.; Smith, Stephen A.

    2015-01-01

    Many phylogenomic studies based on transcriptomes have been limited to “single-copy” genes due to methodological challenges in homology and orthology inferences. Only a relatively small number of studies have explored analyses beyond reconstructing species relationships. We sampled 69 transcriptomes in the hyperdiverse plant clade Caryophyllales and 27 outgroups from annotated genomes across eudicots. Using a combined similarity- and phylogenetic tree-based approach, we recovered 10,960 homolog groups, where each was represented by at least eight ingroup taxa. By decomposing these homolog trees, and taking gene duplications into account, we obtained 17,273 ortholog groups, where each was represented by at least ten ingroup taxa. We reconstructed the species phylogeny using a 1,122-gene data set with a gene occupancy of 92.1%. From the homolog trees, we found that both synonymous and nonsynonymous substitution rates in herbaceous lineages are up to three times as fast as in their woody relatives. This is the first time such a pattern has been shown across thousands of nuclear genes with dense taxon sampling. We also pinpointed regions of the Caryophyllales tree that were characterized by relatively high frequencies of gene duplication, including three previously unrecognized whole-genome duplications. By further combining information from homolog tree topology and synonymous distance between paralog pairs, phylogenetic locations for 13 putative genome duplication events were identified. Genes that experienced the greatest gene family expansion were concentrated among those involved in signal transduction and oxidoreduction, including a cytochrome P450 gene that encodes a key enzyme in the betalain synthesis pathway. Our study demonstrates a new approach for functional phylogenomic analysis in nonmodel species that is based on homolog groups in addition to inferred ortholog groups. PMID:25837578

  18. Chloroplast variation is incongruent with classification of the Australian bloodwood eucalypts (genus Corymbia, family Myrtaceae)

    PubMed Central

    Schuster, Tanja M.; Setaro, Sabrina D.; Tibbits, Josquin F. G.; Batty, Erin L.; Fowler, Rachael M.; McLay, Todd G. B.; Wilcox, Stephen; Ades, Peter K.

    2018-01-01

    Previous molecular phylogenetic analyses have resolved the Australian bloodwood eucalypt genus Corymbia (~100 species) as either monophyletic or paraphyletic with respect to Angophora (9–10 species). Here we assess relationships of Corymbia and Angophora using a large dataset of chloroplast DNA sequences (121,016 base pairs; from 90 accessions representing 55 Corymbia and 8 Angophora species, plus 33 accessions of related genera), skimmed from high throughput sequencing of genomic DNA, and compare results with new analyses of nuclear ITS sequences (119 accessions) from previous studies. Maximum likelihood and maximum parsimony analyses of cpDNA resolve well supported trees with most nodes having >95% bootstrap support. These trees strongly reject monophyly of Corymbia, its two subgenera (Corymbia and Blakella), most taxonomic sections (Abbreviatae, Maculatae, Naviculares, Septentrionales), and several species. ITS trees weakly indicate paraphyly of Corymbia (bootstrap support <50% for maximum likelihood, and 71% for parsimony), but are highly incongruent with the cpDNA analyses, in that they support monophyly of both subgenera and some taxonomic sections of Corymbia. The striking incongruence between cpDNA trees and both morphological taxonomy and ITS trees is attributed largely to chloroplast introgression between taxa, because of geographic sharing of chloroplast clades across taxonomic groups. Such introgression has been widely inferred in studies of the related genus Eucalyptus. This is the first report of its likely prevalence in Corymbia and Angophora, but this is consistent with previous morphological inferences of hybridisation between species. Our findings (based on continent-wide sampling) highlight a need for more focussed studies to assess the extent of hybridisation and introgression in the evolutionary history of these genera, and indicate that critical testing of the classification of Corymbia and Angophora requires additional sequence data from nuclear genomes. PMID:29668710

  19. Anchored phylogenomics illuminates the skipper butterfly tree of life.

    PubMed

    Toussaint, Emmanuel F A; Breinholt, Jesse W; Earl, Chandra; Warren, Andrew D; Brower, Andrew V Z; Yago, Masaya; Dexter, Kelly M; Espeland, Marianne; Pierce, Naomi E; Lohman, David J; Kawahara, Akito Y

    2018-06-19

    Butterflies (Papilionoidea) are perhaps the most charismatic insect lineage, yet phylogenetic relationships among them remain incompletely studied and controversial. This is especially true for skippers (Hesperiidae), one of the most species-rich and poorly studied butterfly families. To infer a robust phylogenomic hypothesis for Hesperiidae, we sequenced nearly 400 loci using Anchored Hybrid Enrichment and sampled all tribes and more than 120 genera of skippers. Molecular datasets were analyzed using maximum-likelihood, parsimony and coalescent multi-species phylogenetic methods. All analyses converged on a novel, robust phylogenetic hypothesis for skippers. Different optimality criteria and methodologies recovered almost identical phylogenetic trees with strong nodal support at nearly all nodes and all taxonomic levels. Our results support Coeliadinae as the sister group to the remaining skippers, the monotypic Euschemoninae as the sister group to all other subfamilies but Coeliadinae, and the monophyly of Eudaminae plus Pyrginae. Within Pyrginae, Celaenorrhinini and Tagiadini are sister groups; the Neotropical firetips, Pyrrhopygini, are sister to all other tribes but Celaenorrhinini and Tagiadini. Achlyodini is recovered as the sister group to Carcharodini, and Erynnini as sister group to Pyrgini. Within the grass skippers (Hesperiinae), there is strong support for the monophyly of Aeromachini plus remaining Hesperiinae. The giant skippers (Agathymus and Megathymus), once classified as a subfamily, are recovered as monophyletic with strong support, but are deeply nested within Hesperiinae. Anchored Hybrid Enrichment sequencing resulted in a large amount of data that built the foundation for a new, robust evolutionary tree of skippers. The newly inferred phylogenetic tree resolves long-standing systematic issues and changes our understanding of the skipper tree of life. These results enhance understanding of the evolution of one of the most species-rich butterfly families.

  20. Early-branching euteleost relationships: areas of congruence between concatenation and coalescent model inferences.

    PubMed

    Campbell, Matthew A; Alfaro, Michael E; Belasco, Max; López, J Andrés

    2017-01-01

    Phylogenetic inference based on evidence from DNA sequences has led to significant strides in the development of a stable and robustly supported framework for the vertebrate tree of life. To date, the bulk of those advances have relied on sequence data from a small number of genome regions that have proven unable to produce satisfactory answers to consistently recalcitrant phylogenetic questions. Here, we re-examine phylogenetic relationships among early-branching euteleostean fish lineages classically grouped in the Protacanthopterygii using DNA sequence data surrounding ultraconserved elements. We report and examine a dataset of thirty-four OTUs with 17,957 aligned characters from fifty-three nuclear loci. Phylogenetic analysis is conducted in concatenated, joint gene trees and species tree estimation and summary coalescent frameworks. All analytical frameworks yield supporting evidence for existing hypotheses of relationship for the placement of Lepidogalaxias salamandroides, monophyly of the Stomiatii and the presence of an esociform + salmonid clade. Lepidogalaxias salamandroides and the Esociformes + Salmoniformes are successive sister lineages to all other euteleosts in the majority of analyses. The concatenated and joint gene trees and species tree analysis types produce high support values for this arrangement. However, inter-relationships of Argentiniformes, Stomiatii and Neoteleostei remain uncertain as they varied by analysis type while receiving strong and contradictory indices of support. Topological differences between analysis types are also apparent within the otomorph and the percomorph taxa in the data set. Our results identify concordant areas with strong support for relationships within and between early-branching euteleost lineages but they also reveal limitations in the ability of larger datasets to conclusively resolve other aspects of that phylogeny.

  1. Early-branching euteleost relationships: areas of congruence between concatenation and coalescent model inferences

    PubMed Central

    Alfaro, Michael E.; Belasco, Max; López, J. Andrés

    2017-01-01

    Phylogenetic inference based on evidence from DNA sequences has led to significant strides in the development of a stable and robustly supported framework for the vertebrate tree of life. To date, the bulk of those advances have relied on sequence data from a small number of genome regions that have proven unable to produce satisfactory answers to consistently recalcitrant phylogenetic questions. Here, we re-examine phylogenetic relationships among early-branching euteleostean fish lineages classically grouped in the Protacanthopterygii using DNA sequence data surrounding ultraconserved elements. We report and examine a dataset of thirty-four OTUs with 17,957 aligned characters from fifty-three nuclear loci. Phylogenetic analysis is conducted in concatenated, joint gene trees and species tree estimation and summary coalescent frameworks. All analytical frameworks yield supporting evidence for existing hypotheses of relationship for the placement of Lepidogalaxias salamandroides, monophyly of the Stomiatii and the presence of an esociform + salmonid clade. Lepidogalaxias salamandroides and the Esociformes + Salmoniformes are successive sister lineages to all other euteleosts in the majority of analyses. The concatenated and joint gene trees and species tree analysis types produce high support values for this arrangement. However, inter-relationships of Argentiniformes, Stomiatii and Neoteleostei remain uncertain as they varied by analysis type while receiving strong and contradictory indices of support. Topological differences between analysis types are also apparent within the otomorph and the percomorph taxa in the data set. Our results identify concordant areas with strong support for relationships within and between early-branching euteleost lineages but they also reveal limitations in the ability of larger datasets to conclusively resolve other aspects of that phylogeny. PMID:28929008

  2. Bayesian, maximum parsimony and UPGMA models for inferring the phylogenies of antelopes using mitochondrial markers.

    PubMed

    Khan, Haseeb A; Arif, Ibrahim A; Bahkali, Ali H; Al Farhan, Ahmad H; Al Homaidan, Ali A

    2008-10-06

    This investigation aimed to compare the inference of antelope phylogenies resulting from the 16S rRNA, cytochrome-b (cyt-b) and d-loop segments of mitochondrial DNA using three different computational models including Bayesian (BA), maximum parsimony (MP) and unweighted pair group method with arithmetic mean (UPGMA). The respective nucleotide sequences of three Oryx species (Oryx leucoryx, Oryx dammah and Oryx gazella) and an out-group (Addax nasomaculatus) were aligned and subjected to BA, MP and UPGMA models for comparing the topologies of respective phylogenetic trees. The 16S rRNA region possessed the highest frequency of conserved sequences (97.65%) followed by cyt-b (94.22%) and d-loop (87.29%). There were few transitions (2.35%) and no transversions in 16S rRNA as compared to cyt-b (5.61% transitions and 0.17% transversions) and d-loop (11.57% transitions and 1.14% transversions) while comparing the four taxa. All three mitochondrial segments clearly differentiated the genus Addax from Oryx using the BA or UPGMA models. The topologies of all the gamma-corrected Bayesian trees were identical irrespective of the marker type. The UPGMA trees resulting from 16S rRNA and d-loop sequences were also identical (Oryx dammah grouped with Oryx leucoryx) to Bayesian trees except that the UPGMA tree based on cyt-b showed a slightly different phylogeny (Oryx dammah grouped with Oryx gazella) with low bootstrap support. However, the MP model failed to differentiate the genus Addax from Oryx. These findings demonstrate the efficiency and robustness of BA and UPGMA methods for phylogenetic analysis of antelopes using mitochondrial markers.
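
    The UPGMA clustering named in this record can be sketched in a few lines. This is a generic textbook-style illustration under stated assumptions, not the software used in the study: the function name, the nested-tuple tree representation and the toy distances on placeholder taxa A to D are all hypothetical.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster taxa with UPGMA.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the pairwise distance.
    Returns the tree as nested tuples and the height of its root.
    """
    sizes = {name: 1 for name in names}       # leaf count of each live cluster
    heights = {name: 0.0 for name in names}   # height at which each cluster formed
    d = dict(dist)
    while len(sizes) > 1:
        # Join the two clusters that are currently closest.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        heights[merged] = d[frozenset((a, b))] / 2.0
        # Distance to every other cluster is the size-weighted average.
        for c in sizes:
            if c in (a, b):
                continue
            d[frozenset((merged, c))] = (
                sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
            ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes[a] + sizes[b]
        del sizes[a], sizes[b]
    (root,) = sizes
    return root, heights[root]

if __name__ == "__main__":
    # Made-up distances on placeholder taxa, for illustration only.
    toy = {
        frozenset("AB"): 2.0, frozenset("AC"): 6.0, frozenset("BC"): 6.0,
        frozenset("AD"): 10.0, frozenset("BD"): 10.0, frozenset("CD"): 10.0,
    }
    print(upgma(["A", "B", "C", "D"], toy))
```

    Because UPGMA always joins the closest pair and places the join at half their distance, it implicitly assumes a molecular clock; that assumption is one reason its trees can differ from Bayesian or parsimony trees built from the same alignment.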

  3. Bayesian, Maximum Parsimony and UPGMA Models for Inferring the Phylogenies of Antelopes Using Mitochondrial Markers

    PubMed Central

    Khan, Haseeb A.; Arif, Ibrahim A.; Bahkali, Ali H.; Al Farhan, Ahmad H.; Al Homaidan, Ali A.

    2008-01-01

    This investigation aimed to compare the inference of antelope phylogenies resulting from the 16S rRNA, cytochrome-b (cyt-b) and d-loop segments of mitochondrial DNA using three different computational models including Bayesian (BA), maximum parsimony (MP) and unweighted pair group method with arithmetic mean (UPGMA). The respective nucleotide sequences of three Oryx species (Oryx leucoryx, Oryx dammah and Oryx gazella) and an out-group (Addax nasomaculatus) were aligned and subjected to BA, MP and UPGMA models for comparing the topologies of respective phylogenetic trees. The 16S rRNA region possessed the highest frequency of conserved sequences (97.65%) followed by cyt-b (94.22%) and d-loop (87.29%). There were few transitions (2.35%) and no transversions in 16S rRNA as compared to cyt-b (5.61% transitions and 0.17% transversions) and d-loop (11.57% transitions and 1.14% transversions) while comparing the four taxa. All three mitochondrial segments clearly differentiated the genus Addax from Oryx using the BA or UPGMA models. The topologies of all the gamma-corrected Bayesian trees were identical irrespective of the marker type. The UPGMA trees resulting from 16S rRNA and d-loop sequences were also identical (Oryx dammah grouped with Oryx leucoryx) to Bayesian trees except that the UPGMA tree based on cyt-b showed a slightly different phylogeny (Oryx dammah grouped with Oryx gazella) with low bootstrap support. However, the MP model failed to differentiate the genus Addax from Oryx. These findings demonstrate the efficiency and robustness of BA and UPGMA methods for phylogenetic analysis of antelopes using mitochondrial markers. PMID:19204824

  4. Organellar Genomes from a ∼5,000-Year-Old Archaeological Maize Sample Are Closely Related to NB Genotype

    PubMed Central

    Pérez-Zamorano, Bernardo; Vallebueno-Estrada, Miguel; Martínez González, Javier; García Cook, Angel; Montiel, Rafael; Vielle-Calzada, Jean-Philippe

    2017-01-01

    The story of how pre-Columbian civilizations developed goes hand-in-hand with the process of plant domestication by Mesoamerican inhabitants. Here, we present the almost complete sequence of a mitochondrial genome and a partial chloroplast genome from an archaeological maize sample collected at the Valley of Tehuacán, México. Accelerator mass spectrometry dated the maize sample to be 5,040–5,300 years before present (95% probability). Phylogenetic analysis of the mitochondrial genome shows that the archaeological sample branches basal to the other Zea mays genomes, as expected. However, this analysis also indicates that fertile genotype NB is closely related to the archaeological maize sample and evolved before cytoplasmic male sterility genotypes (CMS-S, CMS-T, and CMS-C), thus contradicting previous phylogenetic analysis of mitochondrial genomes from maize. We show that maximum-likelihood infers a tree where CMS genotypes branch at the base of the tree when including sites that have a relatively fast rate of evolution, thus suggesting long-branch attraction. We also show that Bayesian analysis infers a topology where NB and the archaeological maize sample are at the base of the tree even when including faster sites. We therefore suggest that previous trees suffered from long-branch attraction. We also show that the phylogenetic analysis of the ancient chloroplast is congruent with genotype NB being more closely related to the archaeological maize sample. As shown here, the inclusion of ancient genomes on phylogenetic trees greatly improves our understanding of the domestication process of maize, one of the most important crops worldwide. PMID:28338960

  5. On the distribution of interspecies correlation for Markov models of character evolution on Yule trees.

    PubMed

    Mulder, Willem H; Crawford, Forrest W

    2015-01-07

    Efforts to reconstruct phylogenetic trees and understand evolutionary processes depend fundamentally on stochastic models of speciation and mutation. The simplest continuous-time model for speciation in phylogenetic trees is the Yule process, in which new species are "born" from existing lineages at a constant rate. Recent work has illuminated some of the structural properties of Yule trees, but it remains mostly unknown how these properties affect sequence and trait patterns observed at the tips of the phylogenetic tree. Understanding the interplay between speciation and mutation under simple models of evolution is essential for deriving valid phylogenetic inference methods and gives insight into the optimal design of phylogenetic studies. In this work, we derive the probability distribution of interspecies covariance under Brownian motion and Ornstein-Uhlenbeck models of phenotypic change on a Yule tree. We compute the probability distribution of the number of mutations shared between two randomly chosen taxa in a Yule tree under discrete Markov mutation models. Our results suggest summary measures of phylogenetic information content, illuminate the correlation between site patterns in sequences or traits of related organisms, and provide heuristics for experimental design and reconstruction of phylogenetic trees. Copyright © 2014 Elsevier Ltd. All rights reserved.
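
    The Yule process described in this record is easy to simulate, which gives a way to check analytic results numerically. The sketch below is an assumption-laden illustration rather than the authors' derivation: the function names and the tuple encoding of lineages are hypothetical. It relies on two standard facts: with k lineages each splitting at rate lambda, the waiting time to the next split is exponential with rate k * lambda, and under Brownian motion with rate sigma^2 the covariance between two tips equals sigma^2 times the time they shared before diverging.

```python
import random

def simulate_yule(n_tips, birth_rate, rng):
    """Grow a Yule tree from one lineage until it has n_tips lineages.

    Each tip is encoded as the tuple of (event_index, side) steps that lead
    to it from the origin; times[i] is the time of speciation event i.
    """
    lineages, times, t = [()], [], 0.0
    while len(lineages) < n_tips:
        # Next speciation arrives after an Exponential(k * birth_rate) wait.
        t += rng.expovariate(len(lineages) * birth_rate)
        parent = lineages.pop(rng.randrange(len(lineages)))  # uniform choice
        event = len(times)
        times.append(t)
        lineages += [parent + ((event, 0),), parent + ((event, 1),)]
    return lineages, times

def shared_time(tip_a, tip_b, times):
    """Time from the origin to the split that separated two distinct tips."""
    for step_a, step_b in zip(tip_a, tip_b):
        if step_a != step_b:
            return times[step_a[0]]
    raise ValueError("tips must be distinct")

if __name__ == "__main__":
    rng = random.Random(0)
    tips, times = simulate_yule(8, 1.0, rng)
    a, b = rng.sample(tips, 2)
    sigma2 = 1.0  # hypothetical Brownian-motion rate
    print("BM covariance of two random tips:", sigma2 * shared_time(a, b, times))
```

    Repeating the simulation many times and collecting the printed covariance yields an empirical distribution that could be compared against a closed-form result.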

  6. Student Interpretations of Phylogenetic Trees in an Introductory Biology Course

    PubMed Central

    Dees, Jonathan; Niemi, Jarad; Montplaisir, Lisa

    2014-01-01

    Phylogenetic trees are widely used visual representations in the biological sciences and the most important visual representations in evolutionary biology. Therefore, phylogenetic trees have also become an important component of biology education. We sought to characterize reasoning used by introductory biology students in interpreting taxa relatedness on phylogenetic trees, to measure the prevalence of correct taxa-relatedness interpretations, and to determine how student reasoning and correctness change in response to instruction and over time. Counting synapomorphies and nodes between taxa were the most common forms of incorrect reasoning, which presents a pedagogical dilemma concerning labeled synapomorphies on phylogenetic trees. Students also independently generated an alternative form of correct reasoning using monophyletic groups, the use of which decreased in popularity over time. Approximately half of all students were able to correctly interpret taxa relatedness on phylogenetic trees, and many memorized correct reasoning without understanding its application. Broad initial instruction that allowed students to generate inferences on their own contributed very little to phylogenetic tree understanding, while targeted instruction on evolutionary relationships improved understanding to some extent. Phylogenetic trees, which can directly affect student understanding of evolution, appear to offer introductory biology instructors a formidable pedagogical challenge. PMID:25452489

  7. Drought and reproductive effort interact to control growth of a temperate broadleaved tree species (Fagus sylvatica).

    PubMed

    Hacket-Pain, Andrew J; Lageard, Jonathan G A; Thomas, Peter A

    2017-06-01

    Interannual variation in radial growth is influenced by a range of physiological processes, including variation in annual reproductive effort, although the importance of reproductive allocation has rarely been quantified. In this study, we use long stand-level records of annual seed production, radial growth (tree ring width) and meteorological conditions to analyse the relative importance of summer drought and reproductive effort in controlling the growth of Fagus sylvatica L., a typical masting species. We show that both summer drought and reproductive effort (masting) influenced growth. Importantly, the effects of summer drought and masting were interactive, with the greatest reductions in growth found in years when high reproductive effort (i.e., mast years) coincided with summer drought. Conversely, mast years that coincided with non-drought summers were associated with little reduction in radial growth, as were drought years that did not coincide with mast years. The results show that the strength of an inferred trade-off between growth and reproduction in this species (the cost of reproduction) is dependent on environmental stress, with a stronger trade-off in years with more stressful growing conditions. These results have widespread implications for understanding interannual variability in growth, and observed relationships between growth and climate. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  8. Minimum variance rooting of phylogenetic trees and implications for species tree reconstruction.

    PubMed

    Mai, Uyen; Sayyari, Erfan; Mirarab, Siavash

    2017-01-01

    Phylogenetic trees inferred using commonly-used models of sequence evolution are unrooted, but the root position matters both for interpretation and downstream applications. This issue has been long recognized; however, whether the potential for discordance between the species tree and gene trees impacts methods of rooting a phylogenetic tree has not been extensively studied. In this paper, we introduce a new method of rooting a tree based on its branch length distribution; our method, which minimizes the variance of root to tip distances, is inspired by the traditional midpoint rerooting and is justified when deviations from the strict molecular clock are random. Like midpoint rerooting, the method can be implemented in a linear time algorithm. In extensive simulations that consider discordance between gene trees and the species tree, we show that the new method is more accurate than midpoint rerooting, but its relative accuracy compared to using outgroups to root gene trees depends on the size of the dataset and levels of deviations from the strict clock. We show high levels of error for all methods of rooting estimated gene trees due to factors that include effects of gene tree discordance, deviations from the clock, and gene tree estimation error. Our simulations, however, did not reveal significant differences between two equivalent methods for species tree estimation that use rooted and unrooted input, namely, STAR and NJst. Nevertheless, our results point to limitations of existing scalable rooting methods.
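
    The rooting criterion described in this record, minimizing the variance of root-to-tip distances, can be illustrated by brute force on a toy tree. This sketch is not the authors' linear-time algorithm: it only tries existing internal nodes as candidate roots, whereas the method summarized here is described as working from the branch length distribution more generally. The adjacency-dict representation, function names and branch lengths are all assumptions.

```python
from statistics import pvariance

def tip_distances(adj, root):
    """Path lengths from `root` to every tip of an unrooted tree.

    adj maps each node to a dict {neighbour: branch_length}.
    """
    dists, stack = [], [(root, None, 0.0)]
    while stack:
        node, parent, d = stack.pop()
        if len(adj[node]) == 1:               # a degree-1 node is a tip
            dists.append(d)
        for nbr, length in adj[node].items():
            if nbr != parent:
                stack.append((nbr, node, d + length))
    return dists

def best_internal_root(adj):
    """Internal node whose root-to-tip distances have the smallest variance."""
    internal = [n for n in adj if len(adj[n]) > 1]
    return min(internal, key=lambda n: pvariance(tip_distances(adj, n)))

if __name__ == "__main__":
    # Toy quartet ((a,b)u,(c,d)v) with made-up branch lengths.
    adj = {
        "a": {"u": 1.0}, "b": {"u": 1.0},
        "c": {"v": 3.0}, "d": {"v": 3.0},
        "u": {"a": 1.0, "b": 1.0, "v": 2.0},
        "v": {"c": 3.0, "d": 3.0, "u": 2.0},
    }
    print(best_internal_root(adj))  # prints v: every tip is 3.0 away from v
```

    In this toy case rooting at v makes the tree perfectly clock-like (variance zero), while rooting at u gives tip distances of 1, 1, 5 and 5.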

  9. Efficient FPT Algorithms for (Strict) Compatibility of Unrooted Phylogenetic Trees.

    PubMed

    Baste, Julien; Paul, Christophe; Sau, Ignasi; Scornavacca, Celine

    2017-04-01

    In phylogenetics, a central problem is to infer the evolutionary relationships between a set of species X; these relationships are often depicted via a phylogenetic tree (a tree having its leaves labeled bijectively by elements of X and without degree-2 nodes) called the "species tree." One common approach for reconstructing a species tree consists in first constructing several phylogenetic trees from primary data (e.g., DNA sequences originating from some species in X), and then constructing a single phylogenetic tree maximizing the "concordance" with the input trees. The obtained tree is our estimation of the species tree and, when the input trees are defined on overlapping (but not identical) sets of labels, is called "supertree." In this paper, we focus on two problems that are central when combining phylogenetic trees into a supertree: the compatibility and the strict compatibility problems for unrooted phylogenetic trees. These problems are strongly related, respectively, to the notions of "containing as a minor" and "containing as a topological minor" in the graph community. Both problems are known to be fixed parameter tractable in the number of input trees k, by using their expressibility in monadic second-order logic and a reduction to graphs of bounded treewidth. Motivated by the fact that the dependency on k of these algorithms is prohibitively large, we give the first explicit dynamic programming algorithms for solving these problems, both running in time [Formula: see text], where n is the total size of the input.

  10. Minimum variance rooting of phylogenetic trees and implications for species tree reconstruction

    PubMed Central

    Sayyari, Erfan; Mirarab, Siavash

    2017-01-01

    Phylogenetic trees inferred using commonly-used models of sequence evolution are unrooted, but the root position matters both for interpretation and downstream applications. This issue has been long recognized; however, whether the potential for discordance between the species tree and gene trees impacts methods of rooting a phylogenetic tree has not been extensively studied. In this paper, we introduce a new method of rooting a tree based on its branch length distribution; our method, which minimizes the variance of root to tip distances, is inspired by the traditional midpoint rerooting and is justified when deviations from the strict molecular clock are random. Like midpoint rerooting, the method can be implemented in a linear time algorithm. In extensive simulations that consider discordance between gene trees and the species tree, we show that the new method is more accurate than midpoint rerooting, but its relative accuracy compared to using outgroups to root gene trees depends on the size of the dataset and levels of deviations from the strict clock. We show high levels of error for all methods of rooting estimated gene trees due to factors that include effects of gene tree discordance, deviations from the clock, and gene tree estimation error. Our simulations, however, did not reveal significant differences between two equivalent methods for species tree estimation that use rooted and unrooted input, namely, STAR and NJst. Nevertheless, our results point to limitations of existing scalable rooting methods. PMID:28800608

  11. TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees.

    PubMed

    Mai, Uyen; Mirarab, Siavash

    2018-05-08

    Sequence data used in reconstructing phylogenetic trees may include various sources of error. Typically errors are detected at the sequence level, but when missed, the erroneous sequences often appear as unexpectedly long branches in the inferred phylogeny. We propose an automatic method to detect such errors. We build a phylogeny including all the data then detect sequences that artificially inflate the tree diameter. We formulate an optimization problem, called the k-shrink problem, that seeks to find k leaves that could be removed to maximally reduce the tree diameter. We present an algorithm to find the exact solution for this problem in polynomial time. We then use several statistical tests to find outlier species that have an unexpectedly high impact on the tree diameter. These tests can use a single tree or a set of related gene trees and can also adjust to species-specific patterns of branch length. The resulting method is called TreeShrink. We test our method on six phylogenomic biological datasets and an HIV dataset and show that the method successfully detects and removes long branches. TreeShrink removes sequences more conservatively than rogue taxon removal and often reduces gene tree discordance more than rogue taxon removal once the amount of filtering is controlled. TreeShrink is an effective method for detecting sequences that lead to unrealistically long branch lengths in phylogenetic trees. The tool is publicly available at https://github.com/uym2/TreeShrink .
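
    The k-shrink problem stated above (choose k leaves whose removal reduces the tree diameter the most) can be made concrete with a brute-force sketch. The abstract reports a polynomial-time exact algorithm; the code below is not that algorithm and is not taken from the TreeShrink tool. It simply enumerates all k-subsets of leaves, and the tree format and function names are assumptions.

```python
from itertools import combinations


def pairwise_leaf_distances(adj):
    """Return the leaves and a dict of path lengths between every ordered leaf pair."""
    leaves = [n for n in adj if len(adj[n]) == 1]
    dist = {}
    for src in leaves:
        stack = [(src, None, 0.0)]
        while stack:
            node, parent, d = stack.pop()
            if node != src and len(adj[node]) == 1:
                dist[(src, node)] = d
            for nbr, w in adj[node].items():
                if nbr != parent:
                    stack.append((nbr, node, d + w))
    return leaves, dist


def diameter(leaves, dist):
    """Largest leaf-to-leaf distance among the given leaves (0.0 if fewer than two)."""
    return max((dist[(a, b)] for a, b in combinations(leaves, 2)), default=0.0)


def k_shrink_bruteforce(adj, k):
    """Try every set of k leaves; return (smallest diameter, removed leaves)."""
    leaves, dist = pairwise_leaf_distances(adj)
    best = None
    for removed in combinations(leaves, k):
        kept = [leaf for leaf in leaves if leaf not in removed]
        d = diameter(kept, dist)
        if best is None or d < best[0]:
            best = (d, set(removed))
    return best


# Hypothetical star tree with one suspiciously long branch.
tree = {
    "hub": {"a": 1.0, "b": 1.0, "c": 1.0, "long": 10.0},
    "a": {"hub": 1.0}, "b": {"hub": 1.0},
    "c": {"hub": 1.0}, "long": {"hub": 10.0},
}
print(k_shrink_bruteforce(tree, 1))
```

    Removing a leaf does not change the path lengths between the remaining leaves, so the diameter of the pruned tree can be read off the original pairwise distances. In the toy example the full diameter is 11.0, and dropping the leaf named "long" brings it down to 2.0.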

  12. Utility of tree crown condition indicators to predict tree survival using remeasured Forest Inventory and Analysis data

    Treesearch

    Randall S. Morin; Jim Steinman; KaDonna C. Randolph

    2012-01-01

    The condition of tree crowns is an important indicator of tree and forest health. Crown conditions have been evaluated during surveys of Forest Inventory and Analysis (FIA) Phase 3 (P3) plots since 1999. In this study, remeasured data from 39,357 trees in the northern United States were used to assess the probability of survival among various tree species using the...

  13. Influence function for robust phylogenetic reconstructions.

    PubMed

    Bar-Hen, Avner; Mariadassou, Mahendra; Poursat, Marie-Anne; Vandenkoornhuyse, Philippe

    2008-05-01

    Based on the computation of the influence function, a tool to measure the impact of each piece of sampled data on the statistical inference of a parameter, we propose to analyze the support of the maximum-likelihood (ML) tree for each site. We provide a new tool for filtering data sets (nucleotides, amino acids, and others) in the context of ML phylogenetic reconstructions. Because different sites support different phylogenic topologies in different ways, outlier sites, that is, sites with a very negative influence value, are important: they can drastically change the topology resulting from the statistical inference. Therefore, these outlier sites must be clearly identified and their effects accounted for before drawing biological conclusions from the inferred tree. A matrix containing 158 fungal terminals all belonging to Chytridiomycota, Zygomycota, and Glomeromycota is analyzed. We show that removing the strongest outlier from the analysis strikingly modifies the ML topology, with a loss of as many as 20% of the internal nodes. As a result, estimating the topology on the filtered data set results in a topology with enhanced bootstrap support. From this analysis, the polyphyletic status of the fungal phyla Chytridiomycota and Zygomycota is reinforced, suggesting the necessity of revisiting the systematics of these fungal groups. We show the ability of influence function to produce new evolution hypotheses.

  14. Efficient Moment-Based Inference of Admixture Parameters and Sources of Gene Flow

    PubMed Central

    Levin, Alex; Reich, David; Patterson, Nick; Berger, Bonnie

    2013-01-01

    The recent explosion in available genetic data has led to significant advances in understanding the demographic histories of and relationships among human populations. It is still a challenge, however, to infer reliable parameter values for complicated models involving many populations. Here, we present MixMapper, an efficient, interactive method for constructing phylogenetic trees including admixture events using single nucleotide polymorphism (SNP) genotype data. MixMapper implements a novel two-phase approach to admixture inference using moment statistics, first building an unadmixed scaffold tree and then adding admixed populations by solving systems of equations that express allele frequency divergences in terms of mixture parameters. Importantly, all features of the model, including topology, sources of gene flow, branch lengths, and mixture proportions, are optimized automatically from the data and include estimates of statistical uncertainty. MixMapper also uses a new method to express branch lengths in easily interpretable drift units. We apply MixMapper to recently published data for Human Genome Diversity Cell Line Panel individuals genotyped on a SNP array designed especially for use in population genetics studies, obtaining confident results for 30 populations, 20 of them admixed. Notably, we confirm a signal of ancient admixture in European populations—including previously undetected admixture in Sardinians and Basques—involving a proportion of 20–40% ancient northern Eurasian ancestry. PMID:23709261

  15. Bioinformatic Workflows for Generating Complete Plastid Genome Sequences-An Example from Cabomba (Cabombaceae) in the Context of the Phylogenomic Analysis of the Water-Lily Clade.

    PubMed

    Gruenstaeudl, Michael; Gerschler, Nico; Borsch, Thomas

    2018-06-21

    The sequencing and comparison of plastid genomes are becoming a standard method in plant genomics, and many researchers are using this approach to infer plant phylogenetic relationships. Due to the widespread availability of next-generation sequencing, plastid genome sequences are being generated at breakneck pace. This trend towards massive sequencing of plastid genomes highlights the need for standardized bioinformatic workflows. In particular, documentation and dissemination of the details of genome assembly, annotation, alignment and phylogenetic tree inference are needed, as these processes are highly sensitive to the choice of software and the precise settings used. Here, we present the procedure and results of sequencing, assembling, annotating and quality-checking of three complete plastid genomes of the aquatic plant genus Cabomba as well as subsequent gene alignment and phylogenetic tree inference. We accompany our findings by a detailed description of the bioinformatic workflow employed. Importantly, we share a total of eleven software scripts for each of these bioinformatic processes, enabling other researchers to evaluate and replicate our analyses step by step. The results of our analyses illustrate that the plastid genomes of Cabomba are highly conserved in both structure and gene content.

  16. Hemispheric processing of predictive inferences during reading: The influence of negatively emotional valenced stimuli.

    PubMed

    Virtue, Sandra; Schutzenhofer, Michael; Tomkins, Blaine

    2017-07-01

    Although a left hemisphere advantage is usually evident during language processing, the right hemisphere is highly involved during the processing of weakly constrained inferences. However, currently little is known about how the emotional valence of environmental stimuli influences the hemispheric processing of these inferences. In the current study, participants read texts promoting either strongly or weakly constrained predictive inferences and performed a lexical decision task to inference-related targets presented to the left visual field-right hemisphere or the right visual field-left hemisphere. While reading these texts, participants either listened to dissonant music (i.e., the music condition) or did not listen to music (i.e., the no music condition). In the no music condition, the left hemisphere showed an advantage for strongly constrained inferences compared to weakly constrained inferences, whereas the right hemisphere showed high facilitation for both strongly and weakly constrained inferences. In the music condition, both hemispheres showed greater facilitation for strongly constrained inferences than for weakly constrained inferences. These results suggest that negatively valenced stimuli (such as dissonant music) selectively influence the right hemisphere's processing of weakly constrained inferences during reading.

  17. A Prize-Collecting Steiner Tree Approach for Transduction Network Inference

    NASA Astrophysics Data System (ADS)

    Bailly-Bechet, Marc; Braunstein, Alfredo; Zecchina, Riccardo

    Inside the cell, information from the environment is mainly propagated via signaling pathways, which form a transduction network. Here we propose a new algorithm to infer transduction networks from heterogeneous data, using both the protein interaction network and expression datasets. We formulate the inference problem as an optimization task, and develop a message-passing, probabilistic and distributed formalism to solve it. We apply our algorithm to the pheromone response in the baker’s yeast S. cerevisiae. We are able to find the backbone of the known structure of the MAPK cascade of pheromone response, validating our algorithm. More importantly, we make biological predictions about some proteins whose role could be at the interface between pheromone response and other cellular functions.

  18. Parametric inference for biological sequence analysis.

    PubMed

    Pachter, Lior; Sturmfels, Bernd

    2004-11-16

    One of the major successes in computational biology has been the unification, by using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences. Graphical models that have been applied to these problems include hidden Markov models for annotation, tree models for phylogenetics, and pair hidden Markov models for alignment. A single algorithm, the sum-product algorithm, solves many of the inference problems that are associated with different statistical models. This article introduces the polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model. This algorithm is a geometric version of the sum-product algorithm and is used to analyze the parametric behavior of maximum a posteriori inference calculations for graphical models.

  19. A Critical Review on the Use of Support Values in Tree Viewers and Bioinformatics Toolkits.

    PubMed

    Czech, Lucas; Huerta-Cepas, Jaime; Stamatakis, Alexandros

    2017-06-01

    Phylogenetic trees are routinely visualized to present and interpret the evolutionary relationships of species. Most empirical evolutionary data studies contain a visualization of the inferred tree with branch support values. Ambiguous semantics in tree file formats can lead to erroneous tree visualizations and therefore to incorrect interpretations of phylogenetic analyses. Here, we discuss problems that arise when displaying branch values on trees after rerooting. Branch values are typically stored as node labels in the widely-used Newick tree format. However, such values are attributes of branches. Storing them as node labels can therefore yield errors when rerooting trees. This depends on the mostly implicit semantics that tools deploy to interpret node labels. We reviewed ten tree viewers and ten bioinformatics toolkits that can display and reroot trees. We found that 14 out of 20 of these tools do not permit users to select the semantics of node labels. Thus, unaware users might obtain incorrect results when rooting trees. We illustrate such incorrect mappings for several test cases and real examples taken from the literature. This review has already led to improvements in eight tools. We suggest tools should provide options that explicitly force users to define the semantics of node labels. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  20. Illusory inferences from a disjunction of conditionals: a new mental models account.

    PubMed

    Barrouillet, P; Lecas, J F

    2000-08-14

    Johnson-Laird, P.N., & Savary, F. (1999, Illusory inferences: a novel class of erroneous deductions. Cognition, 71, 191-229) have recently presented a mental models account, based on the so-called principle of truth, for the occurrence of inferences that are compelling but invalid. This article presents an alternative account of the illusory inferences resulting from a disjunction of conditionals. In accordance with our modified theory of mental models of the conditional, we show that the way individuals represent conditionals leads them to misinterpret the locus of the disjunction and prevents them from drawing conclusions from a false conditional, thus accounting for the compelling character of the illusory inference.

  1. Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling

    NASA Astrophysics Data System (ADS)

    Galelli, S.; Castelletti, A.

    2013-02-01

    Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modeling. In this paper we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modeling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalization property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally very efficient; and, (iii) allows to infer the relative importance of the input variables, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analyzed on two real-world case studies (Marina catchment (Singapore) and Canning River (Western Australia)) representing two different morphoclimatic contexts comparatively with other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparatively well to the best of the benchmarks (i.e. M5) in both the watersheds, while outperforming the other approaches in terms of computational requirement when adopted on large datasets. In addition, the ranking of the input variable provided can be given a physically meaningful interpretation.
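
    The randomization that gives Extra-Trees their name happens at split selection: candidate features are drawn at random and each cut-point is drawn uniformly from the feature's range instead of being optimised. The sketch below shows that single step for a regression node. It is a minimal illustration and not the implementation used in the study; the function names and the toy data are assumptions.

```python
import random


def variance(ys):
    """Population variance of a non-empty list of numbers."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)


def extra_trees_split(X, y, n_candidates, rng):
    """Return (variance reduction, feature index, threshold), or None if no split works."""
    best = None
    for f in rng.sample(range(len(X[0])), n_candidates):   # random feature subset
        col = [row[f] for row in X]
        lo, hi = min(col), max(col)
        if lo == hi:
            continue                       # constant feature, nothing to split on
        t = rng.uniform(lo, hi)            # cut-point drawn at random, not optimised
        left = [yi for xi, yi in zip(col, y) if xi < t]
        right = [yi for xi, yi in zip(col, y) if xi >= t]
        if not left or not right:
            continue
        # Score each random candidate by variance reduction and keep the best one.
        score = variance(y) - (
            len(left) * variance(left) + len(right) * variance(right)
        ) / len(y)
        if best is None or score > best[0]:
            best = (score, f, t)
    return best


# Hypothetical toy data: one input feature, step-shaped response.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 0.0, 10.0, 10.0]
print(extra_trees_split(X, y, n_candidates=1, rng=random.Random(0)))
```

    Because no threshold search is performed, each node costs little to build, which is consistent with the computational efficiency the abstract attributes to the method. Averaging many such randomized trees is what offsets the noise of individual splits.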

  2. Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling

    NASA Astrophysics Data System (ADS)

    Galelli, S.; Castelletti, A.

    2013-07-01

    Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modelling. In this paper, we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modelling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalisation property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally efficient; and, (iii) allows to infer the relative importance of the input variables, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analysed on two real-world case studies - Marina catchment (Singapore) and Canning River (Western Australia) - representing two different morphoclimatic contexts. The evaluation is performed against other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparatively well to the best of the benchmarks (i.e. M5) in both the watersheds, while outperforming the other approaches in terms of computational requirement when adopted on large datasets. In addition, the ranking of the input variable provided can be given a physically meaningful interpretation.

  3. Elevated Extinction Rates as a Trigger for Diversification Rate Shifts: Early Amniotes as a Case Study

    PubMed Central

    Brocklehurst, Neil; Ruta, Marcello; Müller, Johannes; Fröbisch, Jörg

    2015-01-01

    Tree shape analyses are frequently used to infer the location of shifts in diversification rate within the Tree of Life. Many studies have supported a causal relationship between shifts and temporally coincident events such as the evolution of “key innovations”. However, the evidence for such relationships is circumstantial. We investigated patterns of diversification during the early evolution of Amniota from the Carboniferous to the Triassic, subjecting a new supertree to analyses of tree balance in order to infer the timing and location of diversification shifts. We investigated how uneven origination and extinction rates drive diversification shifts, and use two case studies (herbivory and an aquatic lifestyle) to examine whether shifts tend to be contemporaneous with evolutionary novelties. Shifts within amniotes tend to occur during periods of elevated extinction, with mass extinctions coinciding with numerous and larger shifts. Diversification shifts occurring in clades that possess evolutionary innovations do not coincide temporally with the appearance of those innovations, but are instead deferred to periods of high extinction rate. We suggest such innovations did not cause increases in the rate of cladogenesis, but allowed clades to survive extinction events. We highlight the importance of examining general patterns of diversification before interpreting specific shifts. PMID:26592209

  4. ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules

    PubMed Central

    Ashkenazy, Haim; Abadi, Shiran; Martz, Eric; Chay, Ofer; Mayrose, Itay; Pupko, Tal; Ben-Tal, Nir

    2016-01-01

    The degree of evolutionary conservation of an amino acid in a protein or a nucleic acid in DNA/RNA reflects a balance between its natural tendency to mutate and the overall need to retain the structural integrity and function of the macromolecule. The ConSurf web server (http://consurf.tau.ac.il), established over 15 years ago, analyses the evolutionary pattern of the amino/nucleic acids of the macromolecule to reveal regions that are important for structure and/or function. Starting from a query sequence or structure, the server automatically collects homologues, infers their multiple sequence alignment and reconstructs a phylogenetic tree that reflects their evolutionary relations. These data are then used, within a probabilistic framework, to estimate the evolutionary rates of each sequence position. Here we introduce several new features into ConSurf, including automatic selection of the best evolutionary model used to infer the rates, the ability to homology-model query proteins, prediction of the secondary structure of query RNA molecules from sequence, the ability to view the biological assembly of a query (in addition to the single chain), mapping of the conservation grades onto 2D RNA models and an advanced view of the phylogenetic tree that enables interactively rerunning ConSurf with the taxa of a sub-tree. PMID:27166375

  5. Using single cell sequencing data to model the evolutionary history of a tumor.

    PubMed

    Kim, Kyung In; Simon, Richard

    2014-01-24

    The introduction of next-generation sequencing (NGS) technology has made it possible to detect genomic alterations within tumor cells on a large scale. However, most applications of NGS show the genetic content of mixtures of cells. Recently developed single cell sequencing technology can identify variation within a single cell. Characterization of multiple samples from a tumor using single cell sequencing can potentially provide information on the evolutionary history of that tumor. This may facilitate understanding how key mutations accumulate and evolve in lineages to form a heterogeneous tumor. We provide a computational method to infer an evolutionary mutation tree based on single cell sequencing data. Our approach differs from traditional phylogenetic tree approaches in that our mutation tree directly describes temporal order relationships among mutation sites. Our method also accommodates sequencing errors. Furthermore, we provide a method for estimating the proportion of time from the earliest mutation event of the sample to the most recent common ancestor of the sample of cells. Finally, we discuss current limitations on modeling with single cell sequencing data and possible improvements under those limitations. Inferring the temporal ordering of mutational sites using current single cell sequencing data is a challenge. Our proposed method may help elucidate relationships among key mutations and their role in tumor progression.

  6. Rooting depth varies differentially in trees and grasses as a function of mean annual rainfall in an African savanna.

    PubMed

    Holdo, Ricardo M; Nippert, Jesse B; Mack, Michelle C

    2018-01-01

    A significant fraction of the terrestrial biosphere comprises biomes containing tree-grass mixtures. Forecasting vegetation dynamics in these environments requires a thorough understanding of how trees and grasses use and compete for key belowground resources. There is disagreement about the extent to which tree-grass vertical root separation occurs in these ecosystems, how this overlap varies across large-scale environmental gradients, and what these rooting differences imply for water resource availability and tree-grass competition and coexistence. To assess the extent of tree-grass rooting overlap and how tree and grass rooting patterns vary across resource gradients, we examined landscape-level patterns of tree and grass functional rooting depth along a mean annual precipitation (MAP) gradient extending from ~450 to ~750 mm year⁻¹ in Kruger National Park, South Africa. We used stable isotopes from soil and stem water to make inferences about relative differences in rooting depth between these two functional groups. We found clear differences in rooting depth between grasses and trees across the MAP gradient, with grasses generally exhibiting shallower rooting profiles than trees. We also found that trees tended to become more shallow-rooted as a function of MAP, to the point that trees and grasses largely overlapped in terms of rooting depth at the wettest sites. Our results reconcile previously conflicting evidence for rooting overlap in this system, and have important implications for understanding tree-grass dynamics under altered precipitation scenarios.

  7. Accumulation and long-term decline of radiocaesium contamination in tropical fruit trees

    NASA Astrophysics Data System (ADS)

    Anjos, R. M.; Mosquera, B.; Carvalho, C.; Sanches, N.; Bastos, J.; Gomes, P. R. S.; Macario, K.

    2007-09-01

    The accumulation of 137Cs, 40K and NH4+ in several organs of tropical plant species was studied by measuring their concentrations in mango, avocado, guava, papaya, banana and chili pepper trees. Our goal was to infer differences in the uptake and translocation of these ions to the aboveground plant parts and to establish the suitability of radiocaesium as a tracer for plant uptake of nutrients. The results indicate that Cs+ is a better tracer for K+ than for NH4+.

  8. Population Structure and History in Developing Core Sets in Wild Germplasm

    USDA-ARS?s Scientific Manuscript database

    Accurate inference of genetic discontinuities between populations is an essential component in studies of intraspecific biodiversity and evolution, as well as associative genetics. Multi-locus genotypes were amplified from 949 individuals representing seedling trees from 88 half-sib families from ei...

  9. Population Structure And History In Developing Core Sets In Wild Germplasm.

    USDA-ARS?s Scientific Manuscript database

    Accurate inference of genetic discontinuities between populations is an essential component in studies of intraspecific biodiversity and evolution, as well as associative genetics. Multi-locus genotypes were amplified from 949 individuals representing seedling trees from 88 half-sib families from ei...

  10. Climate Controls on Tree Growth Across Species and Sites in Northeastern Arizona

    NASA Astrophysics Data System (ADS)

    Schwan, M. R.; Guiterman, C. H.; Anchukaitis, K. J.

    2016-12-01

    Understanding how forests will respond to ongoing climate change is important for conservation and resource management. Conifer forests in the US Southwest are predicted to be particularly at risk from increased drought and higher temperatures projected to occur in the region. Tree-ring studies shed light on how trees respond to climate, but there remains considerable uncertainty as to which climate factors are most important, and which species are most at risk. Confounding climate and environmental factors, biological differences among species, and biogeography often complicate cross-species analysis. Here we present a multi-species, multivariate analysis of tree growth response to climate variability. We analyze data from three coexisting conifer tree species at two sites near Canyon de Chelly, Arizona. We use a high-resolution PRISM gridded climate dataset to determine the growth responses across species and sites to temperature and precipitation. We identify both common and differential responses in our data and use these to infer possible risks these forest communities may face under a changing climate.

  11. Phylogenetic classification and the universal tree.

    PubMed

    Doolittle, W F

    1999-06-25

    From comparative analyses of the nucleotide sequences of genes encoding ribosomal RNAs and several proteins, molecular phylogeneticists have constructed a "universal tree of life," taking it as the basis for a "natural" hierarchical classification of all living things. Although confidence in some of the tree's early branches has recently been shaken, new approaches could still resolve many methodological uncertainties. More challenging is evidence that most archaeal and bacterial genomes (and the inferred ancestral eukaryotic nuclear genome) contain genes from multiple sources. If "chimerism" or "lateral gene transfer" cannot be dismissed as trivial in extent or limited to special categories of genes, then no hierarchical universal classification can be taken as natural. Molecular phylogeneticists will have failed to find the "true tree," not because their methods are inadequate or because they have chosen the wrong genes, but because the history of life cannot properly be represented as a tree. However, taxonomies based on molecular sequences will remain indispensable, and understanding of the evolutionary process will ultimately be enriched, not impoverished.

  12. Modeling of Water Flow Processes in the Soil-Plant-Atmosphere System: The Soil-Tree-Atmosphere Continuum Model

    NASA Astrophysics Data System (ADS)

    Massoud, E. C.; Vrugt, J. A.

    2015-12-01

    Trees and forests play a key role in controlling the water and energy balance at the land-air surface. This study reports on the calibration of an integrated soil-tree-atmosphere continuum (STAC) model using Bayesian inference with the DREAM algorithm and temporal observations of soil moisture content, matric head, sap flux, and leaf water potential from the King's River Experimental Watershed (KREW) in the southern Sierra Nevada mountain range in California. Water flow through the coupled system is described using the Richards' equation with both the soil and tree modeled as a porous medium with nonlinear soil and tree water relationships. Most of the model parameters appear to be reasonably well defined by calibration against the observed data. The posterior mean simulation reproduces the observed soil and tree data quite accurately, but a systematic mismatch is observed between early afternoon measured and simulated sap fluxes. We will show how this points to a structural error in the STAC-model and suggest and test an alternative hypothesis for root water uptake that alleviates this problem.

  13. Comparing methods for estimation of heterogeneous treatment effects using observational data from health care databases.

    PubMed

    Wendling, T; Jung, K; Callahan, A; Schuler, A; Shah, N H; Gallego, B

    2018-06-03

    There is growing interest in using routinely collected data from health care databases to study the safety and effectiveness of therapies in "real-world" conditions, as it can provide complementary evidence to that of randomized controlled trials. Causal inference from health care databases is challenging because the data are typically noisy, high dimensional, and most importantly, observational. It requires methods that can estimate heterogeneous treatment effects while controlling for confounding in high dimensions. Bayesian additive regression trees, causal forests, causal boosting, and causal multivariate adaptive regression splines are off-the-shelf methods that have shown good performance for estimation of heterogeneous treatment effects in observational studies of continuous outcomes. However, it is not clear how these methods would perform in health care database studies where outcomes are often binary and rare and data structures are complex. In this study, we evaluate these methods in simulation studies that recapitulate key characteristics of comparative effectiveness studies. We focus on the conditional average effect of a binary treatment on a binary outcome using the conditional risk difference as an estimand. To emulate health care database studies, we propose a simulation design where real covariate and treatment assignment data are used and only outcomes are simulated based on nonparametric models of the real outcomes. We apply this design to 4 published observational studies that used records from 2 major health care databases in the United States. Our results suggest that Bayesian additive regression trees and causal boosting consistently provide low bias in conditional risk difference estimates in the context of health care database studies. Copyright © 2018 John Wiley & Sons, Ltd.
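
    For readers unfamiliar with the estimand named above, the conditional risk difference for a binary treatment and a binary outcome can be written as follows. The symbols (Y for the outcome, T for the treatment indicator, X for the covariates) are notation assumed here and are not taken from the paper.

```latex
\[
  \mathrm{CRD}(x) \;=\; \Pr\left(Y = 1 \mid T = 1,\; X = x\right)
                  \;-\; \Pr\left(Y = 1 \mid T = 0,\; X = x\right)
\]
```

    Read causally, this difference equals the conditional average treatment effect at X = x only under assumptions such as no unmeasured confounding given X, which is why control of confounding is central to the comparison.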

  14. Climatic changes during the early Medieval and recent periods inferred from δ13C and δ18O of Siberian larch trees

    NASA Astrophysics Data System (ADS)

    Sidorova, O. V.; Matthias Saurer, Rolf Siegwolf

    2010-12-01

    We report unique isotope datasets for δ13C and δ18O of wood and cellulose of larch trees (Larix cajanderi Mayr.) from Northeastern Yakutia [70°N-148°E] for the early Medieval (AD 900-1000) and recent (AD 1880-2004) periods. During the recent period June, July, and August air temperatures were positively correlated with δ13C and δ18O of wood and cellulose, while July precipitation was negatively correlated. The positive correlation with one of the warmest months (July) in Northeastern Yakutia could indicate high photosynthetic capacity, because warm and dry conditions cause stomatal closure and lower the isotopic fractionation, leading to less negative δ13C values. Because during July, the soil water is still frozen at a soil depth of 20-30 cm, the water accessibility for trees is limited, which can lead to drought situations. An increase in water availability allows for a higher stomatal conductance, resulting in lower δ13C values, leading to a negative relationship with summer precipitation. Furthermore, the vapor pressure deficit of July and August was significantly correlated with δ13C of wood and cellulose, indicating decreased stomatal conductance, an expression of moderate drought. This leads to reduced 13CO2 discrimination and less negative δ13C values. The simultaneous increase of δ18O also indicates a reduction in stomatal conductance under rather dry conditions or drought. Comparative analyses between mean isotope values for the AD 900-1000 and AD 1880-2004 periods indicate similar ranges of climatic conditions, with the exception of the period AD 1950-2004, which is characterized by increased summer drought. Whilst isotopic ratios in cellulose are reliably related to climatic variables, those in whole wood showed even stronger relationships during some periods. 
Strong positive correlations between δ18O of cellulose and Greenland ice-core data were detected for the beginning of the Medieval period (r=0.86; p<0.05), indicating the reliability of isotope signals in tree rings for large-scale reconstructions. The recovery of multiple climate proxies from one archive, in this case annual tree rings, has the potential to identify more specific mechanistic links between the archive and varying climate. In this case, we enhance the existing quantitative reconstruction of early summer temperature from northeastern Yakutia with isotopic data, and gain a wider insight into the conditions under which the rings were formed. The multiple signals stored in tree rings, in particular isotope data, have the potential to increase our understanding of the influence of permafrost and precipitation on the mechanism of plant growth, and their response to this harsh climate in the vast Boreal zone. Acknowledgements: This work was supported by Marie Curie International Incoming Fellowship (FP7-235122), grants RFBR 09-05-98015 r_sibir_a. Thanks to Mukhtar Naurzbaev for sampling of the tree-ring material. Grants to Malcolm K Hughes, University of Arizona from the US National Science Foundation (9413327 and 0308525) supported the collection, dating, and ring-width measurement of material used in this study.

  15. Hypersaline sapropels act as hotspots for microbial dark matter

    DOE PAGES

    Andrei, Adrian-Stefan; Baricz, Andreea; Robeson, Michael Scott; ...

    2017-07-21

    Present-day terrestrial analogue sites are crucial ground truth proxies for studying life in geochemical conditions close to those assumed to be present on early Earth or inferred to exist on other celestial bodies (e.g. Mars, Europa). Although hypersaline sapropels are border-of-life habitats with moderate occurrence, their microbiological and physicochemical characterization lags behind. Here, we study the diversity of life under low water activity by describing the prokaryotic communities from two disparate hypersaline sapropels (Transylvanian Basin, Romania) in relation to geochemical milieu and pore water chemistry, while inferring their role in carbon cycling by matching taxa to known taxon-specific biogeochemical functions. Furthermore, the polyphasic approach combined deep coverage SSU rRNA gene amplicon sequencing and bioinformatics with RT-qPCR and physicochemical investigations. We found that sapropels developed an analogous elemental milieu and harbored prokaryotes affiliated with fifty-nine phyla, among which the most abundant were Proteobacteria, Bacteroidetes and Chloroflexi. Containing thirty-two candidate divisions and possibly undocumented prokaryotic lineages, the hypersaline sapropels were found to accommodate one of the most diverse and novel ecosystems reported to date and may contribute to completing the phylogenetic branching of the tree of life.

  16. Hypersaline sapropels act as hotspots for microbial dark matter

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Andrei, Adrian-Stefan; Baricz, Andreea; Robeson, Michael Scott

    Present-day terrestrial analogue sites are crucial ground truth proxies for studying life in geochemical conditions close to those assumed to be present on early Earth or inferred to exist on other celestial bodies (e.g. Mars, Europa). Although hypersaline sapropels are border-of-life habitats with moderate occurrence, their microbiological and physicochemical characterization lags behind. Here, we study the diversity of life under low water activity by describing the prokaryotic communities from two disparate hypersaline sapropels (Transylvanian Basin, Romania) in relation to geochemical milieu and pore water chemistry, while inferring their role in carbon cycling by matching taxa to known taxon-specific biogeochemical functions. Furthermore, the polyphasic approach combined deep coverage SSU rRNA gene amplicon sequencing and bioinformatics with RT-qPCR and physicochemical investigations. We found that sapropels developed an analogous elemental milieu and harbored prokaryotes affiliated with fifty-nine phyla, among which the most abundant were Proteobacteria, Bacteroidetes and Chloroflexi. Containing thirty-two candidate divisions and possibly undocumented prokaryotic lineages, the hypersaline sapropels were found to accommodate one of the most diverse and novel ecosystems reported to date and may contribute to completing the phylogenetic branching of the tree of life.

  17. Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width.

    PubMed

    De Sa, Christopher; Zhang, Ce; Olukotun, Kunle; Ré, Christopher

    2015-12-01

    Gibbs sampling on factor graphs is a widely used inference technique, which often produces good empirical results. Theoretical guarantees for its performance are weak: even for tree structured graphs, the mixing time of Gibbs may be exponential in the number of variables. To help understand the behavior of Gibbs sampling, we introduce a new (hyper)graph property, called hierarchy width. We show that under suitable conditions on the weights, bounded hierarchy width ensures polynomial mixing time. Our study of hierarchy width is in part motivated by a class of factor graph templates, hierarchical templates, which have bounded hierarchy width regardless of the data used to instantiate them. We demonstrate a rich application from natural language processing in which Gibbs sampling provably mixes rapidly and achieves accuracy that exceeds human volunteers.
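
    As a rough illustration of the technique this record is about, the sketch below runs Gibbs sampling on a tiny chain-shaped factor graph of +1/-1 variables. It is a toy under stated assumptions, not the authors' method: the function name `gibbs_chain`, the pairwise factor exp(coupling * x_i * x_j), and all parameter values are hypothetical, and hierarchy width itself is not computed.

```python
import math
import random


def gibbs_chain(n_vars=5, coupling=0.8, n_sweeps=2000, seed=0):
    """Gibbs sampling on a chain factor graph of +1/-1 variables.

    Assumed model (illustrative only): each pair of neighbours (i, i+1)
    shares one factor with weight exp(coupling * x_i * x_{i+1}).
    Returns the estimated probability that two neighbours agree.
    """
    rng = random.Random(seed)
    x = [rng.choice([-1, 1]) for _ in range(n_vars)]
    agree = 0
    total = 0
    for _ in range(n_sweeps):
        for i in range(n_vars):
            # Sum of the neighbouring states that share a factor with x_i.
            field = 0
            if i > 0:
                field += x[i - 1]
            if i < n_vars - 1:
                field += x[i + 1]
            # Conditional P(x_i = +1 | neighbours) for the assumed factors.
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * coupling * field))
            x[i] = 1 if rng.random() < p_plus else -1
        # Record how many neighbouring pairs agree after this sweep.
        for i in range(n_vars - 1):
            agree += int(x[i] == x[i + 1])
            total += 1
    return agree / total
```

    For this particular open chain the exact agreement probability is 1 / (1 + exp(-2 * coupling)), so the estimate can be checked against it. The abstract's warning still applies: on other graphs, including some tree-structured ones, the same sampler can mix very slowly.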

  18. Historical reconstruction of climatic and elevation preferences and the evolution of cloud forest-adapted tree ferns in Mesoamerica.

    PubMed

    Sosa, Victoria; Ornelas, Juan Francisco; Ramírez-Barahona, Santiago; Gándara, Etelvina

    2016-01-01

    Cloud forests, characterized by a persistent, frequent or seasonal low-level cloud cover and fragmented distribution, are one of the most threatened habitats, especially in the Neotropics. Tree ferns are among the most conspicuous elements in these forests, and ferns are restricted to regions in which minimum temperatures rarely drop below freezing and rainfall is high and evenly distributed around the year. Current phylogeographic data suggest that some of the cloud forest-adapted species remained in situ or expanded to the lowlands during glacial cycles and contracted allopatrically during the interglacials. Although the observed genetic signals of population size changes of cloud forest-adapted species including tree ferns correspond to predicted changes by Pleistocene climate change dynamics, the observed patterns of intraspecific lineage divergence showed temporal incongruence. Here we combined phylogenetic analyses, ancestral area reconstruction, and divergence time estimates with climatic and altitudinal data (environmental space) for phenotypic traits of tree fern species to make inferences about evolutionary processes in deep time. We used phylogenetic Bayesian inference and geographic and altitudinal distribution of tree ferns to investigate ancestral area and elevation and environmental preferences of Mesoamerican tree ferns. The phylogeny was then used to estimate divergence times and ask whether the ancestral area and elevation and environmental shifts were linked to climatic events and historical climatic preferences. Bayesian trees retrieved Cyathea, Alsophila, Gymnosphaera and Sphaeropteris in monophyletic clades. Splits for species in these genera found in Mesoamerican cloud forests are recent, from the Neogene to the Quaternary. Australia was identified as the ancestral area for the clades of these genera, except for Gymnosphaera, for which it was Mesoamerica. 
Climate tolerance was not divergent from hypothesized ancestors for the most significant variables or elevation. For elevational shifts, we found repeated change from low to high elevations. Our data suggest that representatives of the main Cyatheaceae lineages migrated from Australia to Mesoamerican cloud forests at different times and have persisted in these environmentally unstable areas, but extant species diverged recently from their ancestors.

  19. From learning taxonomies to phylogenetic learning: integration of 16S rRNA gene data into FAME-based bacterial classification.

    PubMed

    Slabbinck, Bram; Waegeman, Willem; Dawyndt, Peter; De Vos, Paul; De Baets, Bernard

    2010-01-30

    Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. 
Secondly, the hierarchical classification structure makes it easy to evaluate and visualize the resolution of FAME data for the discrimination of bacterial species. In summary, phylogenetic learning allows us to situate and evaluate FAME-based bacterial species classification in a more informative context.

  20. From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification

    PubMed Central

    2010-01-01

    Background: Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. Results: In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. Conclusions: FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. 
Secondly, the hierarchical classification structure makes it easy to evaluate and visualize the resolution of FAME data for the discrimination of bacterial species. In summary, phylogenetic learning allows us to situate and evaluate FAME-based bacterial species classification in a more informative context. PMID:20113515

  1. Historical reconstruction of climatic and elevation preferences and the evolution of cloud forest-adapted tree ferns in Mesoamerica

    PubMed Central

    2016-01-01

    Background: Cloud forests, characterized by a persistent, frequent or seasonal low-level cloud cover and fragmented distribution, are one of the most threatened habitats, especially in the Neotropics. Tree ferns are among the most conspicuous elements in these forests, and ferns are restricted to regions in which minimum temperatures rarely drop below freezing and rainfall is high and evenly distributed around the year. Current phylogeographic data suggest that some of the cloud forest-adapted species remained in situ or expanded to the lowlands during glacial cycles and contracted allopatrically during the interglacials. Although the observed genetic signals of population size changes of cloud forest-adapted species including tree ferns correspond to predicted changes by Pleistocene climate change dynamics, the observed patterns of intraspecific lineage divergence showed temporal incongruence. Methods: Here we combined phylogenetic analyses, ancestral area reconstruction, and divergence time estimates with climatic and altitudinal data (environmental space) for phenotypic traits of tree fern species to make inferences about evolutionary processes in deep time. We used phylogenetic Bayesian inference and geographic and altitudinal distribution of tree ferns to investigate ancestral area and elevation and environmental preferences of Mesoamerican tree ferns. The phylogeny was then used to estimate divergence times and ask whether the ancestral area and elevation and environmental shifts were linked to climatic events and historical climatic preferences. Results: Bayesian trees retrieved Cyathea, Alsophila, Gymnosphaera and Sphaeropteris in monophyletic clades. Splits for species in these genera found in Mesoamerican cloud forests are recent, from the Neogene to the Quaternary. Australia was identified as the ancestral area for the clades of these genera, except for Gymnosphaera, for which it was Mesoamerica. 
Climate tolerance was not divergent from hypothesized ancestors for the most significant variables or elevation. For elevational shifts, we found repeated change from low to high elevations. Conclusions: Our data suggest that representatives of the main Cyatheaceae lineages migrated from Australia to Mesoamerican cloud forests at different times and have persisted in these environmentally unstable areas, but extant species diverged recently from their ancestors. PMID:27896030

  2. Phylogenetic rooting using minimal ancestor deviation.

    PubMed

    Tria, Fernando Domingues Kümmel; Landan, Giddy; Dagan, Tal

    2017-06-19

    Ancestor-descendent relations play a cardinal role in evolutionary theory. Those relations are determined by rooting phylogenetic trees. Existing rooting methods are hampered by evolutionary rate heterogeneity or the unavailability of auxiliary phylogenetic information. Here we present a rooting approach, the minimal ancestor deviation (MAD) method, which accommodates heterotachy by using all pairwise topological and metric information in unrooted trees. We demonstrate the performance of the method, in comparison to existing rooting methods, by the analysis of phylogenies from eukaryotes and prokaryotes. MAD correctly recovers the known root of eukaryotes and uncovers evidence for the origin of cyanobacteria in the ocean. MAD is more robust and consistent than existing methods, provides measures of the root inference quality and is applicable to any tree with branch lengths.

  3. Signatures of microevolutionary processes in phylogenetic patterns.

    PubMed

    Costa, Carolina L N; Lemos-Costa, Paula; Marquitti, Flavia M D; Fernandes, Lucas D; Ramos, Marlon F; Schneider, David M; Martins, Ayana B; Aguiar, Marcus A M

    2018-06-23

    Phylogenetic trees are representations of evolutionary relationships among species and contain signatures of the processes responsible for the speciation events they display. Inferring processes from tree properties, however, is challenging. To address this problem we analysed a spatially-explicit model of speciation where genome size and mating range can be controlled. We simulated parapatric and sympatric (narrow and wide mating range, respectively) radiations and constructed their phylogenetic trees, computing structural properties such as tree balance and speed of diversification. We showed that parapatric and sympatric speciation are well separated by these structural tree properties. Balanced trees with constant rates of diversification only originate in sympatry and genome size affected both the balance and the speed of diversification of the simulated trees. Comparison with empirical data showed that most of the evolutionary radiations considered to have developed in parapatry or sympatry are in good agreement with model predictions. Even though additional forces other than spatial restriction of gene flow, genome size, and genetic incompatibilities, do play a role in the evolution of species formation, the microevolutionary processes modeled here capture signatures of the diversification pattern of evolutionary radiations, regarding the symmetry and speed of diversification of lineages.

  4. The impact of phenotypic and molecular data on the inference of Colletotrichum diversity associated with Musa.

    PubMed

    Vieira, Willie A S; Lima, Waléria G; Nascimento, Eduardo S; Michereff, Sami J; Câmara, Marcos P S; Doyle, Vinson P

    2017-01-01

    Developing a comprehensive and reliable taxonomy for the Colletotrichum gloeosporioides species complex will require adopting data standards on the basis of an understanding of how methodological choices impact morphological evaluations and phylogenetic inference. We explored the impact of methodological choices in a morphological and molecular evaluation of Colletotrichum species associated with banana in Brazil. The choice of alignment filtering algorithm has a significant impact on topological inference and the retention of phylogenetically informative sites. Similarly, the choice of phylogenetic marker affects the delimitation of species boundaries, particularly if low phylogenetic signal is confounded with strong discordance, and inference of the species tree from multiple-gene trees. According to both phylogenetic informativeness profiling and Bayesian concordance analyses, the most informative loci are DNA lyase (APN2), intergenic spacer (IGS) between DNA lyase and the mating-type locus MAT1-2-1 (APN2/MAT-IGS), calmodulin (CAL), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), glutamine synthetase (GS), β-tubulin (TUB2), and a new marker, the intergenic spacer between GAPDH and a hypothetical protein (GAP2-IGS). Cornmeal agar minimizes the variance in conidial dimensions compared with potato dextrose agar and synthetic nutrient-poor agar, such that species are more readily distinguishable based on phenotypic differences. We apply these insights to investigate the diversity of Colletotrichum species associated with banana anthracnose in Brazil and report C. musae, C. tropicale, C. theobromicola, and C. siamense in association with banana anthracnose. One lineage did not cluster with any previously described species and is described here as C. chrysophilum.

  5. Alignment-free genome tree inference by learning group-specific distance metrics.

    PubMed

    Patil, Kaustubh R; McHardy, Alice C

    2013-01-01

    Understanding the evolutionary relationships between organisms is vital for their in-depth study. Gene-based methods are often used to infer such relationships, but they are not without drawbacks. One can now attempt to use genome-scale information, because of the ever increasing number of genomes available. This opportunity also presents a challenge in terms of computational efficiency. Two fundamentally different methods are often employed for sequence comparisons, namely alignment-based and alignment-free methods. Alignment-free methods rely on the genome signature concept and provide a computationally efficient way that is also applicable to nonhomologous sequences. The genome signature contains evolutionary signal as it is more similar for closely related organisms than for distantly related ones. We used genome-scale sequence information to infer taxonomic distances between organisms without additional information such as gene annotations. We propose a method to improve genome tree inference by learning specific distance metrics over the genome signature for groups of organisms with similar phylogenetic, genomic, or ecological properties. Specifically, our method learns a Mahalanobis metric for a set of genomes and a reference taxonomy to guide the learning process. By applying this method to more than a thousand prokaryotic genomes, we showed that, indeed, better distance metrics could be learned for most of the 18 groups of organisms tested here. Once a group-specific metric is available, it can be used to estimate the taxonomic distances for other sequenced organisms from the group. This study also presents a large scale comparison between 10 methods: 9 alignment-free and 1 alignment-based.
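
    To make the two ingredients named in this abstract concrete, the sketch below computes a k-mer frequency vector as a stand-in for a genome signature and evaluates a Mahalanobis distance between two such vectors. It rests on assumptions not taken from the record: the helper names `kmer_signature` and `mahalanobis`, the choice k = 2, and the matrix M are hypothetical, and the metric-learning step that would fit M to a reference taxonomy is not shown.

```python
import math
from itertools import product


def kmer_signature(seq, k=2):
    """Relative frequency of every DNA k-mer, in a fixed alphabetical order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = {km: 0 for km in kmers}
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip words containing ambiguous bases such as N
            counts[word] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on short input
    return [counts[km] / total for km in kmers]


def mahalanobis(u, v, M):
    """sqrt((u - v)^T M (u - v)) for a positive semi-definite matrix M."""
    d = [a - b for a, b in zip(u, v)]
    Md = [sum(row[j] * d[j] for j in range(len(d))) for row in M]
    return math.sqrt(max(0.0, sum(di * mi for di, mi in zip(d, Md))))
```

    With M set to the identity matrix the function reduces to the ordinary Euclidean distance; a learned, group-specific M would instead reweight and mix the k-mer dimensions.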

  6. Expertise and category-based induction.

    PubMed

    Proffitt, J B; Coley, J D; Medin, D L

    2000-07-01

    The authors examined inductive reasoning among experts in a domain. Three types of tree experts (landscapers, taxonomists, and parks maintenance personnel) completed 3 reasoning tasks. In Experiment 1, participants inferred which of 2 novel diseases would affect "more other kinds of trees" and provided justifications for their choices. In Experiment 2, the authors used modified instructions and asked which disease would be more likely to affect "all trees." In Experiment 3, the conclusion category was eliminated altogether, and participants were asked to generate a list of other affected trees. Among these populations, typicality and diversity effects were weak to nonexistent. Instead, experts' reasoning was influenced by "local" coverage (extension of the property to members of the same folk family) and causal-ecological factors. The authors concluded that domain knowledge leads to the use of a variety of reasoning strategies not captured by current models of category-based induction.

  7. Predicting membrane protein types using various decision tree classifiers based on various modes of general PseAAC for imbalanced datasets.

    PubMed

    Sankari, E Siva; Manimegalai, D

    2017-12-21

    Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types. Because of the large number of uncharacterized protein sequences in databases, traditional methods are very time consuming, expensive and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced datasets and large datasets are often handled well by decision tree classifiers. Because the datasets used here are imbalanced, the performance of various decision tree classifiers such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree, REP (Reduced Error Pruning) tree, and of ensemble methods such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest, is analysed. Among the various decision tree classifiers, Random forest performs well in less time with good accuracy of 96.35%. Another finding is that the RUS boost decision tree classifier is able to classify one or two samples in the class with very few samples, while the other classifiers such as DT, Adaboost, Rotation forest and Random forest are not sensitive to the classes with fewer samples. The performance of decision tree classifiers is also compared with SVM (Support Vector Machine) and Naive Bayes classifiers.
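
    The random under-sampling idea behind the RUS boost classifier mentioned above can be sketched in a few lines. This shows only the resampling step, under assumptions of my own: the function name `random_undersample` is hypothetical, every class is cut down to the size of the rarest class, and the boosting loop that RUS boost wraps around this step is omitted.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so that every class keeps as many items
    as the rarest class (illustrative resampling step only)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    n_min = min(len(items) for items in by_class.values())
    out_x, out_y = [], []
    for y, items in by_class.items():
        # Sample without replacement within each class.
        for x in rng.sample(items, n_min):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y
```

    A classifier trained on the balanced output sees each class equally often, which is one way the rarest classes can avoid being ignored.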

  8. The Estimation of Tree Posterior Probabilities Using Conditional Clade Probability Distributions

    PubMed Central

    Larget, Bret

    2013-01-01

    In this article I introduce the idea of conditional independence of separated subtrees as a principle by which to estimate the posterior probability of trees using conditional clade probability distributions rather than simple sample relative frequencies. I describe an algorithm for these calculations and software which implements these ideas. I show that these alternative calculations are very similar to simple sample relative frequencies for high probability trees but are substantially more accurate for relatively low probability trees. The method allows the posterior probability of unsampled trees to be calculated when these trees contain only clades that are in other sampled trees. Furthermore, the method can be used to estimate the total probability of the set of sampled trees which provides a measure of the thoroughness of a posterior sample. [Bayesian phylogenetics; conditional clade distributions; improved accuracy; posterior probabilities of trees.] PMID:23479066
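
    The calculation described in this abstract can be sketched as follows: each tree's probability is estimated as a product, over its internal nodes, of the sample frequency of that node's split given its clade. This is a minimal illustration under stated assumptions, not the article's software: trees are rooted and binary, written as nested pairs of leaf names, and the function names `leaves`, `splits`, and `ccd_probability` are hypothetical.

```python
from collections import Counter


def leaves(tree):
    """Leaf set of a node; a tree is a leaf name or a (left, right) pair."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leaves(left) | leaves(right)


def splits(tree):
    """Yield (clade, split) for each internal node, where split is the
    unordered pair of child clades."""
    if isinstance(tree, str):
        return
    left, right = tree
    a, b = leaves(left), leaves(right)
    yield a | b, frozenset([a, b])
    yield from splits(left)
    yield from splits(right)


def ccd_probability(tree, sample):
    """Product over internal nodes of count(split) / count(clade),
    with counts taken from the sampled trees."""
    clade_counts = Counter()
    split_counts = Counter()
    for t in sample:
        for clade, split in splits(t):
            clade_counts[clade] += 1
            split_counts[(clade, split)] += 1
    prob = 1.0
    for clade, split in splits(tree):
        if clade_counts[clade] == 0:
            return 0.0  # the tree contains a clade never seen in the sample
        prob *= split_counts[(clade, split)] / clade_counts[clade]
    return prob


# Three sampled trees on five taxa (made-up example data).
sample = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "C"), "B"), ("D", "E")),
    (((("A", "B"), "C"), "D"), "E"),
]
# This tree was never sampled, but all of its clades appear in the sample.
unsampled = (((("A", "C"), "B"), "D"), "E")
```

    Here `ccd_probability(unsampled, sample)` is 1/3 * 1 * 1/3 * 1 = 1/9, whereas the simple relative frequency of that tree in the sample is zero; the first sampled tree gets 2/3 * 2/3 = 4/9 instead of its relative frequency of 1/3.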

  9. MetaPIGA v2.0: maximum likelihood large phylogeny estimation using the metapopulation genetic algorithm and other stochastic heuristics.

    PubMed

    Helaers, Raphaël; Milinkovitch, Michel C

    2010-07-15

    The development, in the last decade, of stochastic heuristics implemented in robust software applications has made large phylogeny inference a key step in most comparative studies involving molecular sequences. Still, the choice of phylogeny inference software is often dictated by a combination of parameters not related to the raw performance of the implemented algorithm(s) but rather by practical issues such as ergonomics and/or the availability of specific functionalities. Here, we present MetaPIGA v2.0, a robust implementation of several stochastic heuristics for large phylogeny inference (under maximum likelihood), including a Simulated Annealing algorithm, a classical Genetic Algorithm, and the Metapopulation Genetic Algorithm (metaGA) together with complex substitution models, discrete Gamma rate heterogeneity, and the possibility to partition data. MetaPIGA v2.0 also implements the Likelihood Ratio Test, the Akaike Information Criterion, and the Bayesian Information Criterion for automated selection of substitution models that best fit the data. Heuristics and substitution models are highly customizable through manual batch files and command line processing. However, MetaPIGA v2.0 also offers an extensive graphical user interface for parameter setting, generating and running batch files, following run progress, and manipulating result trees. MetaPIGA v2.0 uses standard formats for data sets and trees, is platform independent, runs on 32- and 64-bit systems, and takes advantage of multiprocessor and multicore computers. The metaGA resolves the major problem inherent to classical Genetic Algorithms by maintaining high inter-population variation even under strong intra-population selection. Implementation of the metaGA together with additional stochastic heuristics in a single software package will allow rigorous optimization of each heuristic as well as a meaningful comparison of performances among these algorithms. 
MetaPIGA v2.0 gives access both to high customization for the phylogeneticist and to an ergonomic interface and functionalities assisting the non-specialist for sound inference of large phylogenetic trees using nucleotide sequences. MetaPIGA v2.0 and its extensive user-manual are freely available to academics at http://www.metapiga.org.
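
    The information criteria mentioned in this abstract follow the standard formulas AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of free parameters, L the maximized likelihood, and n the number of sites. The sketch below applies them to choose a substitution model. It does not reproduce MetaPIGA's own code or interface: the function names are hypothetical, the log-likelihood values are made up for illustration, and k counts only free substitution-model parameters on the assumption that branch lengths are shared by all candidates.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: k ln(n) - 2 ln L (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


def best_model(fits, n_sites, criterion="aic"):
    """fits maps a model name to (log_likelihood, n_params)."""
    def score(item):
        _, (lnl, k) = item
        if criterion == "aic":
            return aic(lnl, k)
        return bic(lnl, k, n_sites)
    return min(fits.items(), key=score)[0]


# Made-up log-likelihoods, for illustration only.
fits = {
    "JC69": (-5230.0, 0),   # no free substitution parameters
    "HKY85": (-5105.0, 4),  # kappa + 3 free base frequencies
    "GTR": (-5100.0, 8),    # 5 free rates + 3 free base frequencies
}
```

    With these invented numbers and 1000 sites, AIC prefers GTR (10216 against 10218 for HKY85) while BIC, which penalizes parameters more heavily, prefers HKY85. This shows why the two criteria can select different models for the same data.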

  10. MetaPIGA v2.0: maximum likelihood large phylogeny estimation using the metapopulation genetic algorithm and other stochastic heuristics

    PubMed Central

    2010-01-01

    Background: The development, in the last decade, of stochastic heuristics implemented in robust software applications has made large phylogeny inference a key step in most comparative studies involving molecular sequences. Still, the choice of phylogeny inference software is often dictated by a combination of parameters not related to the raw performance of the implemented algorithm(s) but rather by practical issues such as ergonomics and/or the availability of specific functionalities. Results: Here, we present MetaPIGA v2.0, a robust implementation of several stochastic heuristics for large phylogeny inference (under maximum likelihood), including a Simulated Annealing algorithm, a classical Genetic Algorithm, and the Metapopulation Genetic Algorithm (metaGA) together with complex substitution models, discrete Gamma rate heterogeneity, and the possibility to partition data. MetaPIGA v2.0 also implements the Likelihood Ratio Test, the Akaike Information Criterion, and the Bayesian Information Criterion for automated selection of substitution models that best fit the data. Heuristics and substitution models are highly customizable through manual batch files and command line processing. However, MetaPIGA v2.0 also offers an extensive graphical user interface for parameter setting, generating and running batch files, following run progress, and manipulating result trees. MetaPIGA v2.0 uses standard formats for data sets and trees, is platform independent, runs on 32- and 64-bit systems, and takes advantage of multiprocessor and multicore computers. Conclusions: The metaGA resolves the major problem inherent to classical Genetic Algorithms by maintaining high inter-population variation even under strong intra-population selection. Implementation of the metaGA together with additional stochastic heuristics in a single software package will allow rigorous optimization of each heuristic as well as a meaningful comparison of performances among these algorithms. 
MetaPIGA v2.0 gives access both to high customization for the phylogeneticist and to an ergonomic interface and functionalities assisting the non-specialist for sound inference of large phylogenetic trees using nucleotide sequences. MetaPIGA v2.0 and its extensive user-manual are freely available to academics at http://www.metapiga.org. PMID:20633263

  11. A method for inferring the rate of evolution of homologous characters that can potentially improve phylogenetic inference, resolve deep divergence and correct systematic biases.

    PubMed

    Cummins, Carla A; McInerney, James O

    2011-12-01

    Current phylogenetic methods attempt to account for evolutionary rate variation across characters in a matrix. This is generally achieved by the use of sophisticated evolutionary models, combined with dense sampling of large numbers of characters. However, systematic biases and superimposed substitutions make this task very difficult. Model adequacy can sometimes be achieved at the cost of adding large numbers of free parameters, with each parameter being optimized according to some criterion, resulting in increased computation times and large variances in the model estimates. In this study, we develop a simple approach that estimates the relative evolutionary rate of each homologous character. The method that we describe uses the similarity between characters as a proxy for evolutionary rate. In this article, we work on the premise that if the character-state distribution of a homologous character is similar to many other characters, then this character is likely to be relatively slowly evolving. If the character-state distribution of a homologous character is not similar to many or any of the rest of the characters in a data set, then it is likely to be the result of rapid evolution. We show that in some test cases, at least, the premise can hold and the inferences are robust. Importantly, the method does not use a "starting tree" to make the inference and therefore is tree independent. We demonstrate that this approach can work as well as a maximum likelihood (ML) approach, though the ML method needs to have a known phylogeny, or at least a very good estimate of that phylogeny. We then demonstrate some uses for this method of analysis, including the improvement in phylogeny reconstruction for both deep-level and recent relationships and overcoming systematic biases such as base composition bias. Furthermore, we compare this approach to two well-established methods for reweighting or removing characters. 
These other methods are tree-based and we show that they can be systematically biased. We feel this method can be useful for phylogeny reconstruction, understanding evolutionary rate variation, and for understanding selection variation on different characters.
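
    The premise described in this abstract, that a character whose state distribution resembles many other characters is probably slowly evolving, can be illustrated with a small sketch. The abstract does not specify the similarity measure, so the Rand index used below is a stand-in chosen for illustration, and the names `rand_index`, `similarity_scores` and the toy `alignment` are hypothetical.

```python
from itertools import combinations


def rand_index(col_a, col_b):
    """Fraction of taxon pairs that both columns treat alike.

    A pair counts as an agreement when both columns put the two taxa in
    the same state, or both put them in different states.
    """
    pairs = list(combinations(range(len(col_a)), 2))
    agree = sum((col_a[i] == col_a[j]) == (col_b[i] == col_b[j]) for i, j in pairs)
    return agree / len(pairs)


def similarity_scores(alignment):
    """Mean similarity of each column to every other column.

    Under the stated premise, a high score suggests a relatively slowly
    evolving character and a low score a rapidly evolving one.
    No tree is used anywhere in the calculation.
    """
    columns = list(zip(*alignment))
    scores = []
    for k, col in enumerate(columns):
        others = [rand_index(col, c) for m, c in enumerate(columns) if m != k]
        scores.append(sum(others) / len(others))
    return scores


# Toy alignment (assumption): four taxa, four characters.
# Columns 0-2 split the taxa the same way; column 3 differs in every taxon.
alignment = ["AAGA", "AAGC", "CCTG", "CCTT"]
scores = similarity_scores(alignment)
```

    In this toy case the last column scores lowest, which under the premise would mark it as the fastest-evolving character and a candidate for down-weighting.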

  12. The evolutionary history of holometabolous insects inferred from transcriptome-based phylogeny and comprehensive morphological data.

    PubMed

    Peters, Ralph S; Meusemann, Karen; Petersen, Malte; Mayer, Christoph; Wilbrandt, Jeanne; Ziesmann, Tanja; Donath, Alexander; Kjer, Karl M; Aspöck, Ulrike; Aspöck, Horst; Aberer, Andre; Stamatakis, Alexandros; Friedrich, Frank; Hünefeld, Frank; Niehuis, Oliver; Beutel, Rolf G; Misof, Bernhard

    2014-03-20

    Despite considerable progress in systematics, a comprehensive scenario of the evolution of phenotypic characters in the mega-diverse Holometabola based on a solid phylogenetic hypothesis was still missing. We addressed this issue by de novo sequencing transcriptome libraries of representatives of all orders of holometabolan insects (13 species in total) and by using a previously published extensive morphological dataset. We tested competing phylogenetic hypotheses by analyzing various specifically designed sets of amino acid sequence data, using maximum likelihood (ML) based tree inference and Four-cluster Likelihood Mapping (FcLM). By maximum parsimony-based mapping of the morphological data on the phylogenetic relationships we traced evolutionary transformations at the phenotypic level and reconstructed the groundplan of Holometabola and of selected subgroups. In our analysis of the amino acid sequence data of 1,343 single-copy orthologous genes, Hymenoptera are placed as sister group to all remaining holometabolan orders, i.e., to a clade Aparaglossata, comprising two monophyletic subunits Mecopterida (Amphiesmenoptera + Antliophora) and Neuropteroidea (Neuropterida + Coleopterida). The monophyly of Coleopterida (Coleoptera and Strepsiptera) remains ambiguous in the analyses of the transcriptome data, but appears likely based on the morphological data. Highly supported relationships within Neuropterida and Antliophora are Raphidioptera + (Neuroptera + monophyletic Megaloptera), and Diptera + (Siphonaptera + Mecoptera). ML tree inference and FcLM yielded largely congruent results. However, FcLM, which was applied here for the first time to large phylogenomic supermatrices, displayed additional signal in the datasets that was not identified in the ML trees. Our phylogenetic results imply that an orthognathous larva belongs to the groundplan of Holometabola, with compound eyes and well-developed thoracic legs, externally feeding on plants or fungi. 
Ancestral larvae of Aparaglossata were prognathous, equipped with single larval eyes (stemmata), and possibly agile and predacious. Ancestral holometabolan adults likely resembled in their morphology the groundplan of adult neopteran insects. Within Aparaglossata, the adult's flight apparatus and ovipositor underwent strong modifications. We show that the combination of well-resolved phylogenies obtained by phylogenomic analyses and well-documented extensive morphological datasets is an appropriate basis for reconstructing complex morphological transformations and for the inference of evolutionary histories.

  13. Diversity and evolutionary origins of fungi associated with seeds of a neotropical pioneer tree: a case study for analysing fungal environmental samples.

    PubMed

    U'ren, Jana M; Dalling, James W; Gallery, Rachel E; Maddison, David R; Davis, E Christine; Gibson, Cara M; Arnold, A Elizabeth

    2009-04-01

    Fungi associated with seeds of tropical trees pervasively affect seed survival and germination, and thus are an important, but understudied, component of forest ecology. Here, we examine the diversity and evolutionary origins of fungi isolated from seeds of an important pioneer tree (Cecropia insignis, Cecropiaceae) following burial in soil for five months in a tropical moist forest in Panama. Our approach, which relied on molecular sequence data because most isolates did not sporulate in culture, provides an opportunity to evaluate several methods currently used to analyse environmental samples of fungi. First, intra- and interspecific divergence were estimated for the nu-rITS and 5.8S gene for four genera of Ascomycota that are commonly recovered from seeds. Using these values we estimated species boundaries for 527 isolates, showing that seed-associated fungi are highly diverse, horizontally transmitted, and genotypically congruent with some foliar endophytes from the same site. We then examined methods for inferring the taxonomic placement and phylogenetic relationships of these fungi, evaluating the effects of manual versus automated alignment, model selection, and inference methods, as well as the quality of BLAST-based identification using GenBank. We found that common methods such as neighbor-joining and Bayesian inference differ in their sensitivity to alignment methods; analyses of particular fungal genera differ in their sensitivity to alignments; and numerous and sometimes intricate disparities exist between BLAST-based versus phylogeny-based identification methods. Lastly, we used our most robust methods to infer phylogenetic relationships of seed-associated fungi in four focal genera, and reconstructed ancestral states to generate preliminary hypotheses regarding the evolutionary origins of this guild. 
Our results illustrate the dynamic evolutionary relationships among endophytic fungi, pathogens, and seed-associated fungi, and the apparent evolutionary distinctiveness of saprotrophs. Our study also elucidates the diversity, taxonomy, and ecology of an important group of plant-associated fungi and highlights some of the advantages and challenges inherent in the use of ITS data for environmental sampling of fungi.

  14. Epidemic Reconstruction in a Phylogenetics Framework: Transmission Trees as Partitions of the Node Set

    PubMed Central

    Hall, Matthew; Woolhouse, Mark; Rambaut, Andrew

    2015-01-01

    The use of genetic data to reconstruct the transmission tree of infectious disease epidemics and outbreaks has been the subject of an increasing number of studies, but previous approaches have usually either made assumptions that are not fully compatible with phylogenetic inference, or, where they have based inference on a phylogeny, have employed a procedure that requires this tree to be fixed. At the same time, the coalescent-based models of the pathogen population that are employed in the methods usually used for time-resolved phylogeny reconstruction are a considerable simplification of the epidemic process, as they assume that pathogen lineages mix freely. Here, we contribute a new method that is simultaneously a phylogeny reconstruction method for isolates taken from an epidemic, and a procedure for transmission tree reconstruction. We observe that, if one or more samples are taken from each host in an epidemic or outbreak and these are used to build a phylogeny, a transmission tree is equivalent to a partition of the set of nodes of this phylogeny, such that each partition element is a set of nodes that is connected in the full tree and contains all the tips corresponding to samples taken from one and only one host. We then implement a Markov chain Monte Carlo (MCMC) procedure for simultaneous sampling from the spaces of both trees, utilising a newly-designed set of phylogenetic tree proposals that also respect node partitions. We calculate the posterior probability of these partitioned trees based on a model that acknowledges the population structure of an epidemic by employing an individual-based disease transmission model and a coalescent process taking place within each host. We demonstrate our method, first using simulated data, and then with sequences taken from the H7N7 avian influenza outbreak that occurred in the Netherlands in 2003. 
We show that it is superior to established coalescent methods for reconstructing the topology and node heights of the phylogeny and performs well for transmission tree reconstruction when the phylogeny is well-resolved by the genetic data, but caution that this will often not be the case in practice and that existing genetic and epidemiological data should be used to configure such analyses whenever possible. This method is available for use by the research community as part of BEAST, one of the most widely-used packages for reconstruction of dated phylogenies. PMID:26717515
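
    The partition condition stated in this abstract (each element is connected in the full tree and holds all the tips of exactly one host) can be checked mechanically. The sketch below is only an illustration of that condition, not the BEAST implementation; the data layout (`parent`, `node_host`, `tip_host`) and the function name `is_valid_partition` are assumptions.

```python
def is_valid_partition(parent, node_host, tip_host):
    """Check whether a host labelling of phylogeny nodes is a valid partition.

    parent    -- maps each non-root node to its parent node
    node_host -- maps every node (tips and internal) to a host label
    tip_host  -- maps each tip to the host it was sampled from
    """
    # Every tip must lie in the element of the host it was sampled from.
    if any(node_host[tip] != host for tip, host in tip_host.items()):
        return False
    # Every element must contain tips (each host is sampled at least once).
    if set(node_host.values()) != set(tip_host.values()):
        return False
    # In a rooted tree, a node set is connected exactly when one member
    # alone has its parent outside the set (or is the root).
    for host in set(node_host.values()):
        members = {n for n, h in node_host.items() if h == host}
        tops = [n for n in members if parent.get(n) not in members]
        if len(tops) != 1:
            return False
    return True


# Toy phylogeny (assumption): root R has children X and c; X has children a, b.
parent = {"X": "R", "c": "R", "a": "X", "b": "X"}
tip_host = {"a": "H1", "b": "H1", "c": "H2"}

good = {"a": "H1", "b": "H1", "X": "H1", "c": "H2", "R": "H2"}
bad = {"a": "H1", "b": "H1", "X": "H2", "c": "H2", "R": "H1"}

good_ok = is_valid_partition(parent, good, tip_host)  # H1 and H2 both connected
bad_ok = is_valid_partition(parent, bad, tip_host)    # H1 = {a, b, R} is disconnected
```

    A sampler over partitioned trees would need every proposal to preserve this property; the check here only validates a single labelling.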

  15. The Emergence of Organizing Structure in Conceptual Representation

    ERIC Educational Resources Information Center

    Lake, Brenden M.; Lawrence, Neil D.; Tenenbaum, Joshua B.

    2018-01-01

    Both scientists and children make important structural discoveries, yet their computational underpinnings are not well understood. Structure discovery has previously been formalized as probabilistic inference about the right structural form--where form could be a tree, ring, chain, grid, etc. (Kemp & Tenenbaum, 2008). Although this approach…

  16. A taxonomic and phylogenetic re-appraisal of the genus Curvularia

    USDA-ARS?s Scientific Manuscript database

    Species of Curvularia are important plant and human pathogens worldwide. In this study, the genus Curvularia is re-assessed based on molecular phylogenetic analysis and morphological observations of available isolates and specimens. A multi-gene phylogenetic tree inferred from ITS, TEF and GPDH gene...

  17. Species limits, phylogeography and reproductive mode in the Metarhizium anisopliae complex

    USDA-ARS?s Scientific Manuscript database

    An essential first step toward understanding the ecology and life histories of Metarhizium anisopliae-group species as entomopathogens, endophytes and soil-adapted fungi is the ability to accurately define species limits and confidently infer a species tree. Here we present a multilocus phylogeny of...

  18. Drought tolerance and growth in populations of a wide-ranging tree species indicate climate change risks for the boreal north.

    PubMed

    Montwé, David; Isaac-Renton, Miriam; Hamann, Andreas; Spiecker, Heinrich

    2016-02-01

    Choosing drought-tolerant planting stock in reforestation programs may help adapt forests to climate change. To inform such reforestation strategies, we test lodgepole pine (Pinus contorta Doug. ex Loud. var latifolia Englm.) population response to drought and infer potential benefits of a northward transfer of seeds from drier, southern environments. The objective is addressed by combining dendroecological growth analysis with long-term genetic field trials. Over 500 trees originating from 23 populations across western North America were destructively sampled in three experimental sites in southern British Columbia, representing a climate warming scenario. Growth after 32 years from provenances transferred southward or northward over long distances was significantly lower than growth of local populations. All populations were affected by a severe natural drought event in 2002. The provenances from the most southern locations showed the highest drought tolerance but low productivity. Local provenances were productive and drought tolerant. Provenances from the boreal north showed lower productivity and less drought tolerance on southern test sites than all other sources, implying that maladaptation to drought may prevent boreal populations from taking full advantage of more favorable growing conditions under projected climate change. © 2015 John Wiley & Sons Ltd.

  19. Culture observation and molecular phylogenetic analysis on the blooming green alga Chaetomorpha valida (Cladophorales, Chlorophyta) from China

    NASA Astrophysics Data System (ADS)

    Deng, Yunyan; Tang, Xiaorong; Zhan, Zifeng; Teng, Linhong; Ding, Lanping; Huang, Bingxin

    2013-05-01

    The marine green alga Chaetomorpha valida fouls aquaculture ponds along the coastal cities of Dalian and Rongcheng, China. Unialgal cultures were observed under a microscope to determine the developmental morphological characters of C. valida. The results reveal that gametophytic filaments often produce lateral branches under laboratory culture conditions, suggesting an atypical heteromorphic life cycle of C. valida between unbranched sporophytes and branched gametophytes, which differs from the typical isomorphic alternation of Chaetomorpha species. The shape of the basal attachment cell, an important taxonomic character within the genus, was found to vary with environmental conditions. The 18S rDNA and 28S rDNA regions were used to explore the phylogenetic affinity of the taxa. Trees inferred from 18S rDNA sequences revealed a close relationship between C. valida and Chaetomorpha moniligera. These results add to knowledge of the general biology and morphological plasticity of C. valida and provide a basis for future identification of green-tide-forming algae.

  20. Genetic structure and demographic history of the endangered tree species Dysoxylum malabaricum (Meliaceae) in Western Ghats, India: implications for conservation in a biodiversity hotspot.

    PubMed

    Bodare, Sofia; Tsuda, Yoshiaki; Ravikanth, Gudasalamani; Uma Shaanker, Ramanan; Lascoux, Martin

    2013-09-01

    The impact of fragmentation by human activities on genetic diversity of forest trees is an important concern in forest conservation, especially in tropical forests. Dysoxylum malabaricum (white cedar) is an economically important tree species, endemic to the Western Ghats, India, one of the world's eight most important biodiversity hotspots. As D. malabaricum is under pressure of disturbance and fragmentation together with overharvesting, conservation efforts are required in this species. In this study, range-wide genetic structure of twelve D. malabaricum populations was evaluated to assess the impact of human activities on genetic diversity and infer the species' evolutionary history, using both nuclear and chloroplast (cp) DNA simple sequence repeats (SSR). As genetic diversity and population structure did not differ among seedling, juvenile and adult age classes, reproductive success among the old-growth trees and long distance seed dispersal by hornbills were suggested to contribute to maintain genetic diversity. The fixation index (FIS) was significantly correlated with latitude, with a higher level of inbreeding in the northern populations, possibly reflecting a more severe ecosystem disturbance in those populations. Both nuclear and cpSSRs revealed northern and southern genetic groups with some discordance of their distributions; however, they did not correlate with any of the two geographic gaps known as genetic barriers to animals. Approximate Bayesian computation-based inference from nuclear SSRs suggested that population divergence occurred before the last glacial maximum. Finally we discussed the implications of these results, in particular the presence of a clear pattern of historical genetic subdivision, on conservation policies.

  1. Multi-locus phylogenetics, lineage sorting, and reticulation in Pinus subsection Australes.

    PubMed

    Gernandt, David S; Aguirre Dugua, Xitlali; Vázquez-Lobo, Alejandra; Willyard, Ann; Moreno Letelier, Alejandra; Pérez de la Rosa, Jorge A; Piñero, Daniel; Liston, Aaron

    2018-04-23

    Both incomplete lineage sorting and reticulation have been proposed as causes of phylogenetic incongruence. Disentangling these factors may be most difficult in long-lived, wind-pollinated plants with large population sizes and weak reproductive barriers. We used solution hybridization for targeted enrichment and massive parallel sequencing to characterize low-copy-number nuclear genes and high-copy-number plastomes (Hyb-Seq) in 74 individuals of Pinus subsection Australes, a group of ~30 New World pine species of exceptional ecological and economic importance. We inferred relationships using methods that account for both incomplete lineage sorting and reticulation. Concatenation- and coalescent-based trees inferred from nuclear genes mainly agreed with one another, but they contradicted the plastid DNA tree in recovering the Attenuatae (the California closed-cone pines) and Oocarpae (the egg-cone pines of Mexico and Central America) as monophyletic and the Australes sensu stricto (the southern yellow pines) as paraphyletic to the Oocarpae. The plastid tree featured some relationships that were discordant with morphological and geographic evidence and species limits. Incorporating gene flow into the coalescent analyses better fit the data, but evidence supporting the hypothesis that hybridization explains the non-monophyly of the Attenuatae in the plastid tree was equivocal. Our analyses document cytonuclear discordance in Pinus subsection Australes. We attribute this discordance to ancient and recent introgression and present a phylogenetic hypothesis in which mostly hierarchical relationships are overlain by gene flow. © 2018 The Authors. American Journal of Botany is published by Wiley Periodicals, Inc. on behalf of the Botanical Society of America.

  2. Bears in a Forest of Gene Trees: Phylogenetic Inference Is Complicated by Incomplete Lineage Sorting and Gene Flow

    PubMed Central

    Kutschera, Verena E.; Bidon, Tobias; Hailer, Frank; Rodi, Julia L.; Fain, Steven R.; Janke, Axel

    2014-01-01

    Ursine bears are a mammalian subfamily that comprises six morphologically and ecologically distinct extant species. Previous phylogenetic analyses of concatenated nuclear genes could not resolve all relationships among bears, and appeared to conflict with the mitochondrial phylogeny. Evolutionary processes such as incomplete lineage sorting and introgression can cause gene tree discordance and complicate phylogenetic inferences, but are not accounted for in phylogenetic analyses of concatenated data. We generated a high-resolution data set of autosomal introns from several individuals per species and of Y-chromosomal markers. Incorporating intraspecific variability in coalescence-based phylogenetic and gene flow estimation approaches, we traced the genealogical history of individual alleles. Considerable heterogeneity among nuclear loci and discordance between nuclear and mitochondrial phylogenies were found. A species tree with divergence time estimates indicated that ursine bears diversified within less than 2 My. Consistent with a complex branching order within a clade of Asian bear species, we identified unidirectional gene flow from Asian black into sloth bears. Moreover, gene flow detected from brown into American black bears can explain the conflicting placement of the American black bear in mitochondrial and nuclear phylogenies. These results highlight that both incomplete lineage sorting and introgression are prominent evolutionary forces even on time scales up to several million years. Complex evolutionary patterns are not adequately captured by strictly bifurcating models, and can only be fully understood when analyzing multiple independently inherited loci in a coalescence framework. Phylogenetic incongruence among gene trees hence needs to be recognized as a biologically meaningful signal. PMID:24903145

  3. Genetic structure and demographic history of the endangered tree species Dysoxylum malabaricum (Meliaceae) in Western Ghats, India: implications for conservation in a biodiversity hotspot

    PubMed Central

    Bodare, Sofia; Tsuda, Yoshiaki; Ravikanth, Gudasalamani; Uma Shaanker, Ramanan; Lascoux, Martin

    2013-01-01

    The impact of fragmentation by human activities on genetic diversity of forest trees is an important concern in forest conservation, especially in tropical forests. Dysoxylum malabaricum (white cedar) is an economically important tree species, endemic to the Western Ghats, India, one of the world's eight most important biodiversity hotspots. As D. malabaricum is under pressure of disturbance and fragmentation together with overharvesting, conservation efforts are required in this species. In this study, range-wide genetic structure of twelve D. malabaricum populations was evaluated to assess the impact of human activities on genetic diversity and infer the species’ evolutionary history, using both nuclear and chloroplast (cp) DNA simple sequence repeats (SSR). As genetic diversity and population structure did not differ among seedling, juvenile and adult age classes, reproductive success among the old-growth trees and long distance seed dispersal by hornbills were suggested to contribute to maintain genetic diversity. The fixation index (FIS) was significantly correlated with latitude, with a higher level of inbreeding in the northern populations, possibly reflecting a more severe ecosystem disturbance in those populations. Both nuclear and cpSSRs revealed northern and southern genetic groups with some discordance of their distributions; however, they did not correlate with any of the two geographic gaps known as genetic barriers to animals. Approximate Bayesian computation-based inference from nuclear SSRs suggested that population divergence occurred before the last glacial maximum. Finally we discussed the implications of these results, in particular the presence of a clear pattern of historical genetic subdivision, on conservation policies. PMID:24223264

  4. [Spectrum Variance Analysis of Tree Leaves Under the Condition of Different Leaf water Content].

    PubMed

    Wu, Jian; Chen, Tai-sheng; Pan, Li-xin

    2015-07-01

    Leaf water content is an important factor affecting the spectral characteristics of trees. Understanding how the leaf spectrum of a single tree species changes with leaf water content, and how the spectra of different species differ at the same leaf water content, is both key to identifying vegetation from hyperspectral remote sensing data and a theoretical basis for research on how vegetation spectra vary with leaf water content. A spectrometer was used to measure leaves of six tree species, and reflectance and first-order derivative spectra were obtained at different leaf water contents. The spectral characteristics of each species' leaves at different leaf water contents were then analyzed, and the spectral differences among species at the same leaf water content were compared, to explore candidate bands for identifying leaf water content by hyperspectral remote sensing. The results show that the spectrum of each species changed substantially with leaf water content, but the pattern of change differed among species. At the same leaf water content, the leaf spectra of different species differed markedly in some wavelength ranges, which offers some potential for high-precision identification of tree species.

  5. Tree analysis modeling of the associations between PHQ-9 depressive symptoms and doctor diagnosis of depression in primary care.

    PubMed

    Chin, Weng-Yee; Wan, Eric Yuk Fai; Dowrick, Christopher; Arroll, Bruce; Lam, Cindy Lo Kuen

    2018-04-26

    The aim of this study was to explore the relationship between patient self-reported Patient Health Questionnaire-9 (PHQ-9) symptoms and doctor diagnosis of depression using a tree analysis approach. This was a secondary analysis on a dataset obtained from 10 179 adult primary care patients and 59 primary care physicians (PCPs) across Hong Kong. Patients completed a waiting room survey collecting data on socio-demographics and the PHQ-9. Blinded doctors documented whether they thought the patient had depression. Data were analyzed using multiple logistic regression and conditional inference decision tree modeling. PCPs diagnosed 594 patients with depression. Logistic regression identified gender, age, employment status, past history of depression, family history of mental illness and recent doctor visit as factors associated with a depression diagnosis. Tree analyses revealed different pathways of association between PHQ-9 symptoms and depression diagnosis for patients with and without past depression. The PHQ-9 symptom model revealed low mood, sense of worthlessness, fatigue, sleep disturbance and functional impairment as early classifiers. The PHQ-9 total score model revealed that cut-off scores of >12 and >15 were most frequently associated with depression diagnoses in patients with and without past depression. A past history of depression is the most significant factor associated with the diagnosis of depression. PCPs appear to utilize a hypothetico-deductive problem-solving approach incorporating pre-test probability, with different associated factors for patients with and without past depression. Diagnostic thresholds may be too low for patients with past depression and too high for those without, potentially leading to over- and underdiagnosis of depression.

  6. Phylogenomic Resolution of the Phylogeny of Laurasiatherian Mammals: Exploring Phylogenetic Signals within Coding and Noncoding Sequences.

    PubMed

    Chen, Meng-Yun; Liang, Dan; Zhang, Peng

    2017-08-01

    The interordinal relationships of Laurasiatherian mammals are currently one of the most controversial questions in mammalian phylogenetics. Previous studies mainly relied on coding sequences (CDS) and seldom used noncoding sequences. Here, by data mining public genome data, we compiled an intron data set of 3,638 genes (all introns from a protein-coding gene are considered as a gene) (19,055,073 bp) and a CDS data set of 10,259 genes (20,994,285 bp), covering all major lineages of Laurasiatheria (except Pholidota). We found that the intron data contained stronger and more congruent phylogenetic signals than the CDS data. In agreement with this observation, concatenation and species-tree analyses of the intron data set yielded well-resolved and identical phylogenies, whereas the CDS data set produced weakly supported and incongruent results. Further analyses showed that the phylogeny inferred from the intron data is highly robust to data subsampling and change in outgroup, but the CDS data produced unstable results under the same conditions. Interestingly, gene tree statistical results showed that the most frequently observed gene tree topologies for the CDS and intron data are identical, suggesting that the major phylogenetic signal within the CDS data is actually congruent with that within the intron data. Our final result of Laurasiatheria phylogeny is (Eulipotyphla,((Chiroptera, Perissodactyla),(Carnivora, Cetartiodactyla))), favoring a close relationship between Chiroptera and Perissodactyla. Our study 1) provides a well-supported phylogenetic framework for Laurasiatheria, representing a step towards ending the long-standing "hard" polytomy and 2) argues that intron within genome data is a promising data resource for resolving rapid radiation events across the tree of life. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  7. Growing a Forest for the Trees.

    ERIC Educational Resources Information Center

    Growing Ideas, 2001

    2001-01-01

    Describes a tree studies program in a fourth-grade classroom. Students collected local tree seeds and seeds from supermarket fruits, researched growing conditions, and grew seeds under various conditions. Students kept journals on local trees, observing seed dispersal mechanisms and examining rings on trunk slices. Inquiry-based tree studies…

  8. Trends and uncertainties in budburst projections of Norway spruce in Northern Europe.

    PubMed

    Olsson, Cecilia; Olin, Stefan; Lindström, Johan; Jönsson, Anna Maria

    2017-12-01

    Budburst is regulated by temperature conditions, and a warming climate is associated with earlier budburst. A range of phenology models has been developed to assess climate change effects, and they tend to produce different results. This is mainly caused by different model representations of tree physiology processes, selection of observational data for model parameterization, and selection of climate model data to generate future projections. In this study, we applied (i) Bayesian inference to estimate model parameter values to address uncertainties associated with selection of observational data, (ii) selection of climate model data representative of a larger dataset, and (iii) ensembles modeling over multiple initial conditions, model classes, model parameterizations, and boundary conditions to generate future projections and uncertainty estimates. The ensemble projection indicated that the budburst of Norway spruce in northern Europe will on average take place 10.2 ± 3.7 days earlier in 2051-2080 than in 1971-2000, given climate conditions corresponding to RCP 8.5. Three provenances were assessed separately (one early and two late), and the projections indicated that the relationship among provenance will remain also in a warmer climate. Structurally complex models were more likely to fail predicting budburst for some combinations of site and year than simple models. However, they contributed to the overall picture of current understanding of climate impacts on tree phenology by capturing additional aspects of temperature response, for example, chilling. Model parameterizations based on single sites were more likely to result in model failure than parameterizations based on multiple sites, highlighting that the model parameterization is sensitive to initial conditions and may not perform well under other climate conditions, whether the change is due to a shift in space or over time. 
By addressing a range of uncertainties, this study showed that ensemble modeling provides a more robust impact assessment than would a single phenology model run.

  9. A tree island approach to inferring phylogeny in the ant subfamily Formicinae, with especial reference to the evolution of weaving.

    PubMed

    Johnson, Rebecca N; Agapow, Paul-Michael; Crozier, Ross H

    2003-11-01

    The ant subfamily Formicinae is a large assemblage (2458 species; J. Nat. Hist. 29 (1995) 1037), including species that weave leaf nests together with larval silk and in which the metapleural gland, the ancestrally defining ant character, has been secondarily lost. We used sequences from two mitochondrial genes (cytochrome b and cytochrome oxidase 2) from 18 formicine and 4 outgroup taxa to derive a robust phylogeny, employing a search for tree islands using 10,000 randomly constructed trees as starting points and deriving a maximum likelihood consensus tree from the ML tree and those not significantly different from it. Non-parametric bootstrapping showed that the ML consensus tree fit the data significantly better than three scenarios based on morphology, with that of Bolton (Identification Guide to the Ant Genera of the World, Harvard University Press, Cambridge, MA) being the best among these alternative trees. Trait mapping showed that weaving had arisen at least four times and possibly been lost once. A maximum likelihood analysis showed that loss of the metapleural gland is significantly associated with the weaver life-pattern. The graph of the frequencies with which trees were discovered versus their likelihood indicates that trees with high likelihoods have much larger basins of attraction than those with lower likelihoods. While this result indicates that single searches are more likely to find high- than low-likelihood tree islands, it also indicates that searching only for the single best tree may lose important information.

  10. Inferring Action Structure and Causal Relationships in Continuous Sequences of Human Action

    DTIC Science & Technology

    2014-01-01

    language processing literature (e.g., Brent, 1999; Venkataraman, 2001), and which were also used by Goldwater et al. (2009). Precision (P) is the...trees in oriented linear graphs. Simon Stevin: Wis-en Natuurkundig Tijdschrift, 28, 203. Venkataraman, A. (2001). A statistical model for word discovery

  11. The mechanisms of temporal inference

    NASA Technical Reports Server (NTRS)

    Fox, B. R.; Green, S. R.

    1987-01-01

    The properties of a temporal language are determined by its constituent elements: the temporal objects which it can represent, the attributes of those objects, the relationships between them, the axioms which define the default relationships, and the rules which define the statements that can be formulated. The methods of inference which can be applied to a temporal language are derived in part from a small number of axioms which define the meaning of equality and order and how those relationships can be propagated. More complex inferences involve detailed analysis of the stated relationships. Perhaps the most challenging area of temporal inference is reasoning over disjunctive temporal constraints. Simple forms of disjunction do not sufficiently increase the expressive power of a language while unrestricted use of disjunction makes the analysis NP-hard. In many cases a set of disjunctive constraints can be converted to disjunctive normal form and familiar methods of inference can be applied to the conjunctive sub-expressions. This process itself is NP-hard but it is made more tractable by careful expansion of a tree-structured search space.
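
    The conversion to disjunctive normal form described above can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes constraints are simple strict "a before b" pairs, the names `is_consistent` and `consistent_choices` are hypothetical, and the expansion is exhaustive rather than the carefully pruned tree search the abstract refers to.

```python
from itertools import product


def is_consistent(before_pairs):
    """Return True if the strict 'a before b' constraints contain no cycle."""
    graph = {}
    for a, b in before_pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {}  # node -> 1 (on the current DFS path) or 2 (fully explored)

    def has_cycle(node):
        state[node] = 1
        for nxt in graph[node]:
            if state.get(nxt) == 1:  # back edge: the ordering loops on itself
                return True
            if nxt not in state and has_cycle(nxt):
                return True
        state[node] = 2
        return False

    return not any(n not in state and has_cycle(n) for n in graph)


def consistent_choices(fixed, disjunctions):
    """Expand the disjunctions into conjunctive cases (DNF) and keep the
    cases whose combined constraints are consistent."""
    results = []
    for choice in product(*disjunctions):  # one disjunct picked per disjunction
        if is_consistent(list(fixed) + list(choice)):
            results.append(choice)
    return results


# Hypothetical example: A before B before C, plus two disjunctive constraints.
fixed = [("A", "B"), ("B", "C")]
disjunctions = [
    [("C", "A"), ("C", "D")],  # C before A  OR  C before D
    [("D", "A"), ("B", "D")],  # D before A  OR  B before D
]
solutions = consistent_choices(fixed, disjunctions)
```

    With n binary disjunctions the loop visits 2^n conjunctive cases, which is why the expansion itself is expensive; a tree-structured search would instead check consistency after each individual choice and abandon a branch as soon as a cycle appears.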

  12. The Past Sure is Tense: On Interpreting Phylogenetic Divergence Time Estimates.

    PubMed

    Brown, Joseph W; Smith, Stephen A

    2018-03-01

    Divergence time estimation, the calibration of a phylogeny to geological time, is an integral first step in modeling the tempo of biological evolution (traits and lineages). However, despite increasingly sophisticated methods to infer divergence times from molecular genetic sequences, the estimated ages of many nodes across the tree of life contrast significantly and consistently with timeframes conveyed by the fossil record. This is perhaps best exemplified by crown angiosperms, where molecular clock (Triassic) estimates predate the oldest (Early Cretaceous) undisputed angiosperm fossils by tens of millions of years or more. While the incompleteness of the fossil record is a common concern, issues of data limitation and model inadequacy are viable (if underexplored) alternative explanations. In this vein, Beaulieu et al. (2015) convincingly demonstrated how methods of divergence time inference can be misled by both (i) extreme state-dependent molecular substitution rate heterogeneity and (ii) biased sampling of representative major lineages. These results demonstrate the impact of (potentially common) model violations. Here, we suggest another potential challenge: that the configuration of the statistical inference problem (i.e., the parameters, their relationships, and associated priors) alone may preclude the reconstruction of the paleontological timeframe for the crown age of angiosperms. We demonstrate, through sampling from the joint prior (formed by combining the tree (diversification) prior with the calibration densities specified for fossil-calibrated nodes), that with no data present at all, an Early Cretaceous age for crown angiosperms is rejected (i.e., has essentially zero probability). More worrisome, however, is that for the 24 nodes calibrated by fossils, almost all have indistinguishable marginal prior and posterior age distributions when employing routine lognormal fossil calibration priors. These results indicate that there is inadequate information in the data to over-rule the joint prior. Given that these calibrated nodes are strategically placed in disparate regions of the tree, they act to anchor the tree scaffold, and so the posterior inference for the tree as a whole is largely determined by the pseudodata present in the (often arbitrary) calibration densities. We recommend, as for any Bayesian analysis, that marginal prior and posterior distributions be carefully compared to determine whether signal is coming from the data or prior belief, especially for parameters of direct interest. This recommendation is not novel. However, given how rarely such checks are carried out in evolutionary biology, it bears repeating. Our results demonstrate the fundamental importance of prior/posterior comparisons in any Bayesian analysis, and we hope that they further encourage both researchers and journals to consistently adopt this crucial step as standard practice. Finally, we note that the results presented here do not refute the biological modeling concerns identified by Beaulieu et al. (2015). Both sets of issues remain apposite to the goals of accurate divergence time estimation, and only by considering them in tandem can we move forward more confidently.
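
    One simple way to carry out the recommended prior/posterior comparison is to measure how much two sets of samples overlap. The sketch below is an illustration only: the histogram-overlap statistic, the function name `histogram_overlap`, and the simulated lognormal samples are all assumptions made here, not the procedure or data used in the study.

```python
import random


def histogram_overlap(a, b, bins=20):
    """Overlap coefficient of two samples on a shared histogram grid.

    Returns a value in [0, 1]: close to 1 when the two samples look alike
    (the data added little to the prior), close to 0 when they differ.
    """
    lo = min(min(a), min(b))
    hi = max(max(a), max(b))
    width = (hi - lo) / bins or 1.0  # guard against identical constant samples

    def hist(xs):
        counts = [0] * bins
        for x in xs:
            i = min(int((x - lo) / width), bins - 1)  # clamp the maximum value
            counts[i] += 1
        return [c / len(xs) for c in counts]

    return sum(min(p, q) for p, q in zip(hist(a), hist(b)))


# Simulated node ages (arbitrary units), standing in for MCMC samples.
rng = random.Random(1)
prior = [rng.lognormvariate(0.0, 0.5) for _ in range(2000)]
posterior_uninformed = [rng.lognormvariate(0.0, 0.5) for _ in range(2000)]
posterior_informed = [x + 100.0 for x in prior]  # clearly moved by the "data"

overlap_uninformed = histogram_overlap(prior, posterior_uninformed)
overlap_informed = histogram_overlap(prior, posterior_informed)
```

    A high overlap for a calibrated node would be the pattern the abstract warns about, where the posterior is effectively the prior. In practice the samples would come from running the same analysis with and without sequence data.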

  13. Explaining the dependence of climatic response of tree radial growth on permafrost

    NASA Astrophysics Data System (ADS)

    Bryukhanova, Marina; Benkova, Anna; von Arx, Georg; Fonti, Patrick; Simanko, Valentina; Kirdyanov, Alexander; Shashkin, Alexander

    2015-04-01

    In northern regions of Siberia, long-term observations of soil variability, phenology, and growing-season duration, which can be used to infer the influence of the environment on tree growth and productivity, are rare. The best way to understand tree growth and tree responses to environmental change is to use mechanistic models, which make it possible to combine already available experimental and field data with other parameters based on biological principles of tree growth. The goal of our study is to estimate which tree species (deciduous, deciduous conifer, or evergreen conifer) is more plastic under possible climate change in the permafrost zone. The study site is located in the northern part of central Siberia, Russia (64°N, 100°E). The study plot was selected within a post-fire succession and is representative of a 100-year-old, even-aged mixed forest of Larix gmelinii (Rupr.) Rupr. and Betula pubescens Ehrh. with a few individuals of spruce (Picea obovata Ledeb.). To understand the physiological response of larch, birch and spruce trees to climatic changes, the ecological-physiological process-based model of tree photosynthesis (Benkova and Shashkin 2003) was applied. Multiparametric tree-ring chronologies were analyzed and correlated with climatic parameters over the last 77 years. This work is supported by the Ministry of Education and Science of the Russian Federation (Grant from the President of RF for Young Scientists MK-1589.2014.4).

  14. jsPhyloSVG: a javascript library for visualizing interactive and vector-based phylogenetic trees on the web.

    PubMed

    Smits, Samuel A; Ouverney, Cleber C

    2010-08-18

    Many software packages have been developed to address the need for generating phylogenetic trees intended for print. With an increased use of the web to disseminate scientific literature, there is a need for phylogenetic trees to be viewable across many types of devices and feature some of the interactive elements that are integral to the browsing experience. We propose a novel approach for publishing interactive phylogenetic trees. We present a javascript library, jsPhyloSVG, which facilitates constructing interactive phylogenetic trees from raw Newick or phyloXML formats directly within the browser in Scalable Vector Graphics (SVG) format. It is designed to work across all major browsers and renders an alternative format for those browsers that do not support SVG. The library provides tools for building rectangular and circular phylograms with integrated charting. Interactive features may be integrated and made to respond to events such as clicks on any element of the tree, including labels. jsPhyloSVG is an open-source solution for rendering dynamic phylogenetic trees. It is capable of generating complex and interactive phylogenetic trees across all major browsers without the need for plugins. It is novel in supporting the ability to interpret the tree inference formats directly, exposing the underlying markup to data-mining services. The library source code, extensive documentation and live examples are freely accessible at www.jsphylosvg.com.

  15. Modeling individual trees in an urban environment using dense discrete return LIDAR

    NASA Astrophysics Data System (ADS)

    Bandyopadhyay, Madhurima; van Aardt, Jan A. N.; van Leeuwen, Martin

    2015-05-01

    The urban forest is becoming increasingly important in the contexts of urban green space, carbon sequestration and offsets, and socio-economic impacts. This has led to a recent increase in attention being paid to urban environmental management. Tree biomass, specifically, is a vital indicator of carbon storage and has a direct impact on urban forest health and carbon sequestration. As an alternative to expensive and time-consuming field surveys, remote sensing has been used extensively in measuring dynamics of vegetation and estimating biomass. Light detection and ranging (LiDAR) has proven especially useful to characterize the three dimensional (3D) structure of forests. In urban contexts however, information is frequently required at the individual tree level, necessitating the proper delineation of tree crowns. Yet, crown delineation is challenging for urban trees where a wide range of stress factors and cultural influences affect growth. In this paper high resolution LiDAR data were used to infer biomass based on individual tree attributes. A multi-tiered delineation algorithm was designed to extract individual tree-crowns. At first, dominant tree segments were obtained by applying watershed segmentation on the crown height model (CHM). Next, prominent tree top positions within each segment were identified via a regional maximum transformation and the crown boundary was estimated for each of the tree tops. Finally, undetected trees were identified using a best-fitting circle approach. After tree delineation, individual tree attributes were used to estimate tree biomass and the results were validated with associated field mensuration data. Results indicate that the overall tree detection accuracy is nearly 80%, and the estimated biomass model has an adjusted-R2 of 0.5.
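
    The tree-top identification step can be illustrated with a very small sketch. It is a simplification, not the paper's algorithm: instead of a regional maximum transformation within watershed segments, it flags any crown height model (CHM) cell that is strictly higher than its eight neighbours. The function name `tree_tops`, the `min_height` threshold, and the toy CHM values are assumptions for illustration.

```python
def tree_tops(chm, min_height=2.0):
    """Return (row, col) cells that are strictly higher than all 8 neighbours.

    chm        -- crown height model as a list of equal-length rows (metres)
    min_height -- cells below this height are ignored as ground or shrubs
    """
    rows, cols = len(chm), len(chm[0])
    tops = []
    for r in range(rows):
        for c in range(cols):
            h = chm[r][c]
            if h < min_height:
                continue
            neighbours = [
                chm[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(h > n for n in neighbours):
                tops.append((r, c))
    return tops


# Toy 5 x 5 CHM with two crowns peaking at 8.0 m and 9.5 m.
chm = [
    [0.0, 1.0, 0.5, 0.0, 0.0],
    [1.0, 8.0, 3.0, 1.0, 0.5],
    [0.5, 3.0, 2.0, 4.0, 1.0],
    [0.0, 1.0, 4.0, 9.5, 3.0],
    [0.0, 0.5, 1.0, 3.0, 1.5],
]
tops = tree_tops(chm)
```

    On real LiDAR rasters a plain local-maximum test tends to over-detect tops on rough crowns, which is one reason a multi-tiered approach with prior segmentation, as described in the abstract, is used instead.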

  16. Study of traffic-related pollutant removal from street canyon with trees: dispersion and deposition perspective.

    PubMed

    Morakinyo, Tobi Eniolu; Lam, Yun Fat

    2016-11-01

    Numerical experiments involving street canyons of varying aspect ratio with traffic-induced pollutants (PM2.5) and implanted trees of varying aspect ratio, leaf area index, leaf area density distribution, trunk height, tree-covered area, and tree planting pattern under different wind conditions were conducted using a computational fluid dynamics (CFD) model, ENVI-met. Various aspects of dispersion and deposition were investigated, which include the influence of various tree configurations and wind conditions on dispersion within the street canyon, pollutant mass at the free stream layer and street canyon, and comparison between mass removal by surface (leaf) deposition and mass enhancement due to the presence of trees. Results revealed that the concentration level was enhanced, especially at pedestrian level, in street canyons with trees relative to their tree-free counterparts. Additionally, we found that the magnitude of the concentration increase (at pedestrian level) and decrease (above pedestrian level) depended on tree configuration and wind condition. Furthermore, we found that only ∼0.1-3% of PM2.5 was dispersed to the free stream layer while a larger percentage (∼97%) remained in the canyon, regardless of its aspect ratio, the prevailing wind condition, and whether the canyon was tree-free or planted with trees (of various configurations). Lastly, results indicate that pollutant removal due to deposition on leaf surfaces is potentially sufficient to counterbalance the enhancement of PM2.5 by such trees under some tree planting scenarios and wind conditions.

  17. Urban Tree Species Show the Same Hydraulic Response to Vapor Pressure Deficit across Varying Tree Size and Environmental Conditions

    PubMed Central

    Chen, Lixin; Zhang, Zhiqiang; Ewers, Brent E.

    2012-01-01

    Background: The functional convergence of tree transpiration has rarely been tested for tree species growing under urban conditions even though it is of significance to elucidate the relationship between functional convergence and species differences of urban trees for establishing sustainable urban forests in the context of forest water relations. Methodology/Principal Findings: We measured sap flux of four urban tree species including Cedrus deodara, Zelkova schneideriana, Euonymus bungeanus and Metasequoia glyptostroboides in an urban park by using thermal dissipation probes (TDP). The concurrent microclimate conditions and soil moisture content were also measured. Our objectives were to examine 1) the influence of tree species and size on transpiration, and 2) the hydraulic control of urban trees under different environmental conditions over the transpiration in response to VPD as represented by canopy conductance. The results showed that the functional convergence between tree diameter at breast height (DBH) and tree canopy transpiration amount (E_c) was not reliable to predict stand transpiration and there were species differences within the same DBH class. Species differed in transpiration patterns in response to seasonal weather progression and soil water stress as a result of varied sensitivity to water availability. Species differences were also found in their potential maximum transpiration rate and reaction to light. However, the same theoretical hydraulic relationship between G_c at VPD = 1 kPa (G_cref) and the G_c sensitivity to VPD (−dG_c/dlnVPD) was found across the studied species as well as under contrasting soil water and R_s conditions in the urban area. Conclusions/Significance: We concluded that urban trees show the same hydraulic regulation over response to VPD across varying tree size and environmental conditions and thus tree transpiration could be predicted with appropriate assessment of G_cref. PMID:23118904

  18. Urban tree species show the same hydraulic response to vapor pressure deficit across varying tree size and environmental conditions.

    PubMed

    Chen, Lixin; Zhang, Zhiqiang; Ewers, Brent E

    2012-01-01

    The functional convergence of tree transpiration has rarely been tested for tree species growing under urban conditions even though it is of significance to elucidate the relationship between functional convergence and species differences of urban trees for establishing sustainable urban forests in the context of forest water relations. We measured sap flux of four urban tree species including Cedrus deodara, Zelkova schneideriana, Euonymus bungeanus and Metasequoia glyptostroboides in an urban park by using thermal dissipation probes (TDP). The concurrent microclimate conditions and soil moisture content were also measured. Our objectives were to examine 1) the influence of tree species and size on transpiration, and 2) the hydraulic control of urban trees under different environmental conditions over the transpiration in response to VPD as represented by canopy conductance. The results showed that the functional convergence between tree diameter at breast height (DBH) and tree canopy transpiration amount (E_c) was not reliable to predict stand transpiration and there were species differences within the same DBH class. Species differed in transpiration patterns in response to seasonal weather progression and soil water stress as a result of varied sensitivity to water availability. Species differences were also found in their potential maximum transpiration rate and reaction to light. However, the same theoretical hydraulic relationship between G_c at VPD = 1 kPa (G_cref) and the G_c sensitivity to VPD (-dG_c/dlnVPD) was found across the studied species as well as under contrasting soil water and R_s conditions in the urban area. We concluded that urban trees show the same hydraulic regulation over response to VPD across varying tree size and environmental conditions and thus tree transpiration could be predicted with appropriate assessment of G_cref.
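
    The two quantities named in the abstract, the reference conductance at VPD = 1 kPa and the sensitivity -dG_c/dlnVPD, can be estimated together by regressing canopy conductance on ln(VPD). The sketch below assumes a log-linear response, G_c = G_cref - m ln(VPD), which is a modelling assumption made here for illustration; the function name `fit_gc_response` and the synthetic data (arbitrary conductance units) are hypothetical and not taken from the study.

```python
import math


def fit_gc_response(vpd_kpa, gc):
    """Least-squares fit of gc = g_cref - m * ln(vpd).

    Returns (g_cref, m): g_cref is the fitted conductance at VPD = 1 kPa
    (where ln(VPD) = 0) and m is the sensitivity -dG_c/dlnVPD.
    """
    x = [math.log(v) for v in vpd_kpa]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(gc) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, gc))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, -slope


# Synthetic, noise-free data generated with g_cref = 100 and m = 60.
vpd = [0.5, 1.0, 1.5, 2.0, 3.0]
gc = [100.0 - 60.0 * math.log(v) for v in vpd]
g_cref, m = fit_gc_response(vpd, gc)
```

    Fitting each species or soil-moisture class separately and then comparing the ratio m / g_cref across groups is one way to check whether they share the same hydraulic relationship.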

  19. Evidence for a Drought-driven (pre-industrial) Regime Shift in an Australian Shallow Lake

    NASA Astrophysics Data System (ADS)

    Mills, K.; Gell, P.; Doan, P.; Kershaw, P.; McKenzie, M.; Lewis, T.; Tyler, J. J.

    2015-12-01

    We present a 750-year record of ecosystem response to long-term drought history from Lake Colac, Victoria. Using multiple lines of evidence, we test the sensitivity and resilience of Lake Colac to independently reconstructed drought history. The sedimentary archive shows that Lake Colac appears to be sensitive to periods of drought. Following drought conditions c. CE 1390, the lake ecosystem shows signs of recovery. A succession of droughts in the early 1500s initiates a change in the diatom flora, with freshwater species declining and being replaced by saline-tolerant species, though there is little interpretable change in aquatic palynomorphs. An inferred drought around CE 1720 appears to precede a major switch in the lake's ecosystem. The lake became increasingly turbid and saline and there is a distinct switch from a macrophyte-dominated system to an algal-dominated system. The arrival of Europeans in Victoria (CE 1840) appears to have little effect on the lake's ecosystem, but the terrestrial vegetation indicates regionally established changes including declines in native trees, especially Casuarina, and arrival and expansion of exotic shade or plantation trees Pinus and Cupressus as well as native and introduced weeds. As European impact in the catchment increases, nutrients appear to play a role in the modification of the lake's ecosystem. A long-term drying trend from c. CE 1975 is evident, culminating in the Millennium Drought, which suggests unprecedented conditions in the ecological history of the Lake.

  20. Comparison of plastid 16S rRNA (rrn16) genes from Helicosporidium spp.: evidence supporting the reclassification of Helicosporidia as green algae (Chlorophyta).

    PubMed

    Tartar, Aurélien; Boucias, Drion G; Becnel, James J; Adams, Byron J

    2003-11-01

    The Helicosporidia are invertebrate pathogens that have recently been identified as non-photosynthetic green algae (Chlorophyta). In order to confirm the algal nature of the genus Helicosporidium, the presence of a retained chloroplast genome in Helicosporidia cells was investigated. Fragments homologous to plastid 16S rRNA (rrn16) genes were amplified successfully from cellular DNA extracted from two different Helicosporidium isolates. The fragment sequences are 1269 and 1266 bp long, are very AT-rich (60.7 %) and are similar to homologous genes sequenced from non-photosynthetic green algae. Maximum-parsimony, maximum-likelihood and neighbour-joining methods were used to infer phylogenetic trees from an rrn16 sequence alignment. All trees depicted the Helicosporidia as sister taxa to the non-photosynthetic, pathogenic alga Prototheca zopfii. Moreover, the trees identified Helicosporidium spp. as members of a clade that included the heterotrophic species Prototheca spp. and the mesotrophic species Chlorella protothecoides. The clade is always strongly supported by bootstrap values, suggesting that all these organisms share a most recent common ancestor. Phylogenetic analyses inferred from plastid 16S rRNA genes confirmed that the Helicosporidia are non-photosynthetic green algae, close relatives of the genus Prototheca (Chlorophyta, Trebouxiophyceae). Such phylogenetic affinities suggest that Helicosporidium spp. are likely to possess Prototheca-like organelles and organelle genomes.

  1. Estimating phylogenetic relationships despite discordant gene trees across loci: the species tree of a diverse species group of feather mites (Acari: Proctophyllodidae).

    PubMed

    Knowles, Lacey L; Klimov, Pavel B

    2011-11-01

    With the increased availability of multilocus sequence data, the lack of concordance of gene trees estimated for independent loci has focused attention on both the biological processes producing the discord and the methodologies used to estimate phylogenetic relationships. What has emerged is a suite of new analytical tools for phylogenetic inference: species tree approaches. In contrast to traditional phylogenetic methods that are stymied by the idiosyncrasies of gene trees, approaches for estimating species trees explicitly take into account the cause of discord among loci and, in the process, provide a direct estimate of phylogenetic history (i.e. the history of species divergence, not divergence of specific loci). We illustrate the utility of species tree estimates with an analysis of a diverse group of feather mites, the pinnatus species group (genus Proctophyllodes). Discord among four sequenced nuclear loci is consistent with theoretical expectations, given the short time separating speciation events (as evident by short internodes relative to terminal branch lengths in the trees). Nevertheless, many of the relationships are well resolved in a Bayesian estimate of the species tree; the analysis also highlights ambiguous aspects of the phylogeny that require additional loci. The broad utility of species tree approaches is discussed, and specifically, their application to groups with high speciation rates, a history of diversification with particular prevalence in host/parasite systems where species interactions can drive rapid diversification.

  2. Extensive gene tree discordance and hemiplasy shaped the genomes of North American columnar cacti.

    PubMed

    Copetti, Dario; Búrquez, Alberto; Bustamante, Enriquena; Charboneau, Joseph L M; Childs, Kevin L; Eguiarte, Luis E; Lee, Seunghee; Liu, Tiffany L; McMahon, Michelle M; Whiteman, Noah K; Wing, Rod A; Wojciechowski, Martin F; Sanderson, Michael J

    2017-11-07

    Few clades of plants have proven as difficult to classify as cacti. One explanation may be an unusually high level of convergent and parallel evolution (homoplasy). To evaluate support for this phylogenetic hypothesis at the molecular level, we sequenced the genomes of four cacti in the especially problematic tribe Pachycereeae, which contains most of the large columnar cacti of Mexico and adjacent areas, including the iconic saguaro cactus (Carnegiea gigantea) of the Sonoran Desert. We assembled a high-coverage draft genome for saguaro and lower coverage genomes for three other genera of tribe Pachycereeae (Pachycereus, Lophocereus, and Stenocereus) and a more distant outgroup cactus, Pereskia. We used these to construct 4,436 orthologous gene alignments. Species tree inference consistently returned the same phylogeny, but gene tree discordance was high: 37% of gene trees having at least 90% bootstrap support conflicted with the species tree. Evidently, discordance is a product of long generation times and moderately large effective population sizes, leading to extensive incomplete lineage sorting (ILS). In the best supported gene trees, 58% of apparent homoplasy at amino sites in the species tree is due to gene tree-species tree discordance rather than parallel substitutions in the gene trees themselves, a phenomenon termed "hemiplasy." The high rate of genomic hemiplasy may contribute to apparent parallelisms in phenotypic traits, which could confound understanding of species relationships and character evolution in cacti. Published under the PNAS license.

  3. Extensive gene tree discordance and hemiplasy shaped the genomes of North American columnar cacti

    PubMed Central

    Búrquez, Alberto; Bustamante, Enriquena; Charboneau, Joseph L. M.; Childs, Kevin L.; Eguiarte, Luis E.; Lee, Seunghee; Liu, Tiffany L.; McMahon, Michelle M.; Whiteman, Noah K.; Wing, Rod A.; Wojciechowski, Martin F.; Sanderson, Michael J.

    2017-01-01

    Few clades of plants have proven as difficult to classify as cacti. One explanation may be an unusually high level of convergent and parallel evolution (homoplasy). To evaluate support for this phylogenetic hypothesis at the molecular level, we sequenced the genomes of four cacti in the especially problematic tribe Pachycereeae, which contains most of the large columnar cacti of Mexico and adjacent areas, including the iconic saguaro cactus (Carnegiea gigantea) of the Sonoran Desert. We assembled a high-coverage draft genome for saguaro and lower coverage genomes for three other genera of tribe Pachycereeae (Pachycereus, Lophocereus, and Stenocereus) and a more distant outgroup cactus, Pereskia. We used these to construct 4,436 orthologous gene alignments. Species tree inference consistently returned the same phylogeny, but gene tree discordance was high: 37% of gene trees having at least 90% bootstrap support conflicted with the species tree. Evidently, discordance is a product of long generation times and moderately large effective population sizes, leading to extensive incomplete lineage sorting (ILS). In the best supported gene trees, 58% of apparent homoplasy at amino sites in the species tree is due to gene tree-species tree discordance rather than parallel substitutions in the gene trees themselves, a phenomenon termed “hemiplasy.” The high rate of genomic hemiplasy may contribute to apparent parallelisms in phenotypic traits, which could confound understanding of species relationships and character evolution in cacti. PMID:29078296

  4. Majority rule has transition ratio 4 on Yule trees under a 2-state symmetric model.

    PubMed

    Mossel, Elchanan; Steel, Mike

    2014-11-07

    Inferring the ancestral state at the root of a phylogenetic tree from states observed at the leaves is a problem arising in evolutionary biology. The simplest technique - majority rule - estimates the root state by the most frequently occurring state at the leaves. Alternative methods - such as maximum parsimony - explicitly take the tree structure into account. Since either method can outperform the other on particular trees, it is useful to consider the accuracy of the methods on trees generated under some evolutionary null model, such as a Yule pure-birth model. In this short note, we answer a recently posed question concerning the performance of majority rule on Yule trees under a symmetric 2-state Markovian substitution model of character state change. We show that majority rule is accurate precisely when the ratio of the birth (speciation) rate of the Yule process to the substitution rate exceeds the value 4. By contrast, maximum parsimony has been shown to be accurate only when this ratio is at least 6. Our proof relies on a second moment calculation, coupling, and a novel application of a reflection principle. Copyright © 2014 Elsevier Ltd. All rights reserved.
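
    The contrast between the two estimators can be made concrete on a toy tree. The sketch below is only an illustration of the two rules, not the paper's analysis: it uses a hand-written rooted binary tree given as nested tuples with leaf states 0 or 1, applies Fitch's bottom-up pass as the parsimony method, and the function names are hypothetical.

```python
from collections import Counter


def leaves(tree):
    """Flatten a nested-tuple tree into the list of its leaf states."""
    if isinstance(tree, tuple):
        out = []
        for child in tree:
            out.extend(leaves(child))
        return out
    return [tree]


def majority_rule(leaf_states):
    """Estimate the root state as the most frequent leaf state (None on a tie)."""
    (top, n_top), *rest = Counter(leaf_states).most_common()
    if rest and rest[0][1] == n_top:
        return None
    return top


def fitch_root_set(tree):
    """Fitch's bottom-up pass: the set of most-parsimonious root states."""
    if not isinstance(tree, tuple):
        return {tree}
    left, right = (fitch_root_set(child) for child in tree)
    common = left & right
    return common if common else left | right


# Two leaves in state 0 on one side of the root, five in state 1 on the other.
tree = ((0, 0), ((1, 1), (1, (1, 1))))
majority_estimate = majority_rule(leaves(tree))
parsimony_states = fitch_root_set(tree)
```

    Here majority rule returns state 1 because five of the seven leaves carry it, while parsimony treats both root states as equally good, since each requires a single change along one of the two root edges. This is the sense in which parsimony uses the tree structure and majority rule does not.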

  5. DLRS: gene tree evolution in light of a species tree.

    PubMed

    Sjöstrand, Joel; Sennblad, Bengt; Arvestad, Lars; Lagergren, Jens

    2012-11-15

    PrIME-DLRS (or colloquially: 'Delirious') is a phylogenetic software tool to simultaneously infer and reconcile a gene tree given a species tree. It accounts for duplication and loss events and a relaxed molecular clock, and is intended for the study of homologous gene families, for example in a comparative genomics setting involving multiple species. PrIME-DLRS uses a Bayesian MCMC framework, where the input is a known species tree with divergence times and a multiple sequence alignment, and the output is a posterior distribution over gene trees and model parameters. PrIME-DLRS is available for Java SE 6+ under the New BSD License, and JAR files and source code can be downloaded from http://code.google.com/p/jprime/. There is also a slightly older C++ version available as a binary package for Ubuntu, with download instructions at http://prime.sbc.su.se. The C++ source code is available upon request. Contact: joel.sjostrand@scilifelab.se or jens.lagergren@scilifelab.se. PrIME-DLRS is based on a sound probabilistic model (Åkerborg et al., 2009) and has been thoroughly validated on synthetic and biological datasets (Supplementary Material online).

  6. Multiple species of wild tree peonies gave rise to the ‘king of flowers’, Paeonia suffruticosa Andrews

    PubMed Central

    Zhou, Shi-Liang; Zou, Xin-Hui; Zhou, Zhi-Qin; Liu, Jing; Xu, Chao; Yu, Jing; Wang, Qiang; Zhang, Da-Ming; Wang, Xiao-Quan; Ge, Song; Sang, Tao; Pan, Kai-Yu; Hong, De-Yuan

    2014-01-01

    The origin of cultivated tree peonies, known as the ‘king of flowers’ in China for more than 1000 years, has attracted considerable interest, but has remained unsolved. Here, we conducted phylogenetic analyses of explicitly sampled traditional cultivars of tree peonies and all wild species from the shrubby section Moutan of the genus Paeonia based on sequences of 14 fast-evolved chloroplast regions and 25 presumably single-copy nuclear markers identified from RNA-seq data. The phylogeny of the wild species inferred from the nuclear markers was fully resolved and largely congruent with morphology and classification. The incongruence between the nuclear and chloroplast trees suggested that there had been gene flow between the wild species. The comparison of nuclear and chloroplast phylogenies including cultivars showed that the cultivated tree peonies originated from homoploid hybridization among five wild species. Since the origin, thousands of cultivated varieties have spread worldwide, whereas four parental species are currently endangered or on the verge of extinction. The documentation of extensive homoploid hybridization involved in tree peony domestication provides new insights into the mechanisms underlying the origins of garden ornamentals and the way of preserving natural genetic resources through domestication. PMID:25377453

  7. Knowledge, expectations, and inductive reasoning within conceptual hierarchies.

    PubMed

    Coley, John D; Hayes, Brett; Lawson, Christopher; Moloney, Michelle

    2004-01-01

    Previous research (e.g. Cognition 64 (1997) 73) suggests that the privileged level for inductive inference in a folk biological conceptual hierarchy does not correspond to the "basic" level (i.e. the level at which concepts are both informative and distinct). To further explore inductive inference within conceptual hierarchies, we examine relations between knowledge of concepts at different hierarchical levels, expectations about conceptual coherence, and inductive inference. In Experiments 1 and 2, 5- and 8-year-olds and adults listed features of living kind (Experiments 1 and 2) and artifact (Experiment 2) concepts at different hierarchical levels (e.g. plant, tree, oak, desert oak), and also rated the strength of generalizations to the same concepts. For living kinds, the level that showed a relative advantage on these two tasks differed; the greatest increase in features listed tended to occur at the life-form level (e.g. tree), whereas the greatest increase in inductive strength tended to occur at the folk-generic level (e.g. oak). Knowledge and induction also showed different developmental trajectories. For artifact concepts, the levels at which the greatest gains in knowledge and induction occurred were more varied, and corresponded more closely across tasks. In Experiment 3, adults reported beliefs about within-category similarity for concepts at different levels of animal, plant and artifact hierarchies, and rated inductive strength as before. For living kind concepts, expectations about category coherence predicted patterns of inductions; knowledge did not. For artifact concepts, both knowledge and expectations predicted patterns of induction. Results suggest that beliefs about conceptual coherence play an important role in guiding inductive inference, that this role may be largely independent of specific knowledge of concepts, and that such beliefs are especially important in reasoning about living kinds.

  8. Emerging Concepts of Data Integration in Pathogen Phylodynamics.

    PubMed

    Baele, Guy; Suchard, Marc A; Rambaut, Andrew; Lemey, Philippe

    2017-01-01

    Phylodynamics has become an increasingly popular statistical framework to extract evolutionary and epidemiological information from pathogen genomes. By harnessing such information, epidemiologists aim to shed light on the spatio-temporal patterns of spread and to test hypotheses about the underlying interaction of evolutionary and ecological dynamics in pathogen populations. Although the field has witnessed a rich development of statistical inference tools with increasing levels of sophistication, these tools initially focused on sequences as their sole primary data source. Integrating various sources of information, however, promises to deliver more precise insights in infectious diseases and to increase opportunities for statistical hypothesis testing. Here, we review how the emerging concept of data integration is stimulating new advances in Bayesian evolutionary inference methodology which formalize a marriage of statistical thinking and evolutionary biology. These approaches include connecting sequence to trait evolution, such as for host, phenotypic and geographic sampling information, but also the incorporation of covariates of evolutionary and epidemic processes in the reconstruction procedures. We highlight how a full Bayesian approach to covariate modeling and testing can generate further insights into sequence evolution, trait evolution, and population dynamics in pathogen populations. Specific examples demonstrate how such approaches can be used to test the impact of host on rabies and HIV evolutionary rates, to identify the drivers of influenza dispersal as well as the determinants of rabies cross-species transmissions, and to quantify the evolutionary dynamics of influenza antigenicity. Finally, we briefly discuss how data integration is now also permeating through the inference of transmission dynamics, leading to novel insights into tree-generative processes and detailed reconstructions of transmission trees. 
[Bayesian inference; birth–death models; coalescent models; continuous trait evolution; covariates; data integration; discrete trait evolution; pathogen phylodynamics.]

  9. Emerging Concepts of Data Integration in Pathogen Phylodynamics

    PubMed Central

    Baele, Guy; Suchard, Marc A.; Rambaut, Andrew; Lemey, Philippe

    2017-01-01

    Phylodynamics has become an increasingly popular statistical framework to extract evolutionary and epidemiological information from pathogen genomes. By harnessing such information, epidemiologists aim to shed light on the spatio-temporal patterns of spread and to test hypotheses about the underlying interaction of evolutionary and ecological dynamics in pathogen populations. Although the field has witnessed a rich development of statistical inference tools with increasing levels of sophistication, these tools initially focused on sequences as their sole primary data source. Integrating various sources of information, however, promises to deliver more precise insights in infectious diseases and to increase opportunities for statistical hypothesis testing. Here, we review how the emerging concept of data integration is stimulating new advances in Bayesian evolutionary inference methodology which formalize a marriage of statistical thinking and evolutionary biology. These approaches include connecting sequence to trait evolution, such as for host, phenotypic and geographic sampling information, but also the incorporation of covariates of evolutionary and epidemic processes in the reconstruction procedures. We highlight how a full Bayesian approach to covariate modeling and testing can generate further insights into sequence evolution, trait evolution, and population dynamics in pathogen populations. Specific examples demonstrate how such approaches can be used to test the impact of host on rabies and HIV evolutionary rates, to identify the drivers of influenza dispersal as well as the determinants of rabies cross-species transmissions, and to quantify the evolutionary dynamics of influenza antigenicity. Finally, we briefly discuss how data integration is now also permeating through the inference of transmission dynamics, leading to novel insights into tree-generative processes and detailed reconstructions of transmission trees. 
[Bayesian inference; birth–death models; coalescent models; continuous trait evolution; covariates; data integration; discrete trait evolution; pathogen phylodynamics.] PMID:28173504

  10. Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width

    PubMed Central

    De Sa, Christopher; Zhang, Ce; Olukotun, Kunle; Ré, Christopher

    2016-01-01

    Gibbs sampling on factor graphs is a widely used inference technique, which often produces good empirical results. Theoretical guarantees for its performance are weak: even for tree structured graphs, the mixing time of Gibbs may be exponential in the number of variables. To help understand the behavior of Gibbs sampling, we introduce a new (hyper)graph property, called hierarchy width. We show that under suitable conditions on the weights, bounded hierarchy width ensures polynomial mixing time. Our study of hierarchy width is in part motivated by a class of factor graph templates, hierarchical templates, which have bounded hierarchy width—regardless of the data used to instantiate them. We demonstrate a rich application from natural language processing in which Gibbs sampling provably mixes rapidly and achieves accuracy that exceeds human volunteers. PMID:27279724

  11. Efficient Exploration of the Space of Reconciled Gene Trees

    PubMed Central

    Szöllősi, Gergely J.; Rosikiewicz, Wojciech; Boussau, Bastien; Tannier, Eric; Daubin, Vincent

    2013-01-01

    Gene trees record the combination of gene-level events, such as duplication, transfer and loss (DTL), and species-level events, such as speciation and extinction. Gene tree–species tree reconciliation methods model these processes by drawing gene trees into the species tree using a series of gene and species-level events. The reconstruction of gene trees based on sequence alone almost always involves choosing between statistically equivalent or weakly distinguishable relationships that could be much better resolved based on a putative species tree. To exploit this potential for accurate reconstruction of gene trees, the space of reconciled gene trees must be explored according to a joint model of sequence evolution and gene tree–species tree reconciliation. Here we present amalgamated likelihood estimation (ALE), a probabilistic approach to exhaustively explore all reconciled gene trees that can be amalgamated as a combination of clades observed in a sample of gene trees. We implement the ALE approach in the context of a reconciliation model (Szöllősi et al. 2013), which allows for the DTL of genes. We use ALE to efficiently approximate the sum of the joint likelihood over amalgamations and to find the reconciled gene tree that maximizes the joint likelihood among all such trees. We demonstrate using simulations that gene trees reconstructed using the joint likelihood are substantially more accurate than those reconstructed using sequence alone. Using realistic gene tree topologies, branch lengths, and alignment sizes, we demonstrate that ALE produces more accurate gene trees even if the model of sequence evolution is greatly simplified. Finally, examining 1099 gene families from 36 cyanobacterial genomes we find that joint likelihood-based inference results in a striking reduction in apparent phylogenetic discord, with 24%, 59%, and 46% reductions in the mean numbers of duplications, transfers, and losses per gene family, respectively.
The open source implementation of ALE is available from https://github.com/ssolo/ALE.git. [amalgamation; gene tree reconciliation; gene tree reconstruction; lateral gene transfer; phylogeny.] PMID:23925510

  12. Distributed smoothed tree kernel for protein-protein interaction extraction from the biomedical literature

    PubMed Central

    Murugesan, Gurusamy; Abdulkadhar, Sabenabanu; Natarajan, Jeyakumar

    2017-01-01

    Automatic extraction of protein-protein interaction (PPI) pairs from biomedical literature is a widely examined task in biological information extraction. Currently, many kernel-based approaches, such as linear kernels, tree kernels, graph kernels, and combinations of multiple kernels, have achieved promising results on the PPI task. However, most of these kernel methods fail to capture the semantic relation information between two entities. In this paper, we present a special type of tree kernel for PPI extraction, known as the Distributed Smoothed Tree Kernel (DSTK), which exploits both syntactic (structural) information and semantic vector information. DSTK comprises distributed trees carrying syntactic information, along with distributional semantic vectors representing the semantic information of sentences or phrases. To generate a robust machine learning model, a feature-based kernel and DSTK were combined using an ensemble support vector machine (SVM). Five different corpora (AIMed, BioInfer, HPRD50, IEPA, and LLL) were used for evaluating the performance of our system. Experimental results show that our system achieves a better F-score on the five corpora than other state-of-the-art systems. PMID:29099838

  13. Distributed smoothed tree kernel for protein-protein interaction extraction from the biomedical literature.

    PubMed

    Murugesan, Gurusamy; Abdulkadhar, Sabenabanu; Natarajan, Jeyakumar

    2017-01-01

    Automatic extraction of protein-protein interaction (PPI) pairs from biomedical literature is a widely examined task in biological information extraction. Currently, many kernel-based approaches, such as linear kernels, tree kernels, graph kernels, and combinations of multiple kernels, have achieved promising results on the PPI task. However, most of these kernel methods fail to capture the semantic relation information between two entities. In this paper, we present a special type of tree kernel for PPI extraction, known as the Distributed Smoothed Tree Kernel (DSTK), which exploits both syntactic (structural) information and semantic vector information. DSTK comprises distributed trees carrying syntactic information, along with distributional semantic vectors representing the semantic information of sentences or phrases. To generate a robust machine learning model, a feature-based kernel and DSTK were combined using an ensemble support vector machine (SVM). Five different corpora (AIMed, BioInfer, HPRD50, IEPA, and LLL) were used for evaluating the performance of our system. Experimental results show that our system achieves a better F-score on the five corpora than other state-of-the-art systems.

  14. 7 CFR 319.77-4 - Conditions for the importation of regulated articles.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... Host Material from Canada § 319.77-4 Conditions for the importation of regulated articles. (a) Trees and shrubs. 1 (1) Trees without roots (e.g., Christmas trees), trees with roots, and shrubs with roots... restriction under this subpart if they: 1 Trees and shrubs from Canada may be subject to additional...

  15. 7 CFR 319.77-4 - Conditions for the importation of regulated articles.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... Host Material from Canada § 319.77-4 Conditions for the importation of regulated articles. (a) Trees and shrubs. 1 (1) Trees without roots (e.g., Christmas trees), trees with roots, and shrubs with roots... restriction under this subpart if they: 1 Trees and shrubs from Canada may be subject to additional...

  16. 7 CFR 319.77-4 - Conditions for the importation of regulated articles.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... Host Material from Canada § 319.77-4 Conditions for the importation of regulated articles. (a) Trees and shrubs. 1 (1) Trees without roots (e.g., Christmas trees), trees with roots, and shrubs with roots... restriction under this subpart if they: 1 Trees and shrubs from Canada may be subject to additional...

  17. L.U.St: a tool for approximated maximum likelihood supertree reconstruction.

    PubMed

    Akanni, Wasiu A; Creevey, Christopher J; Wilkinson, Mark; Pisani, Davide

    2014-06-12

    Supertrees combine disparate, partially overlapping trees to generate a synthesis that provides a high level perspective that cannot be attained from the inspection of individual phylogenies. Supertrees can be seen as meta-analytical tools that can be used to make inferences based on results of previous scientific studies. Their meta-analytical application has increased in popularity since it was realised that the power of statistical tests for the study of evolutionary trends critically depends on the use of taxon-dense phylogenies. Further to that, supertrees have found applications in phylogenomics where they are used to combine gene trees and recover species phylogenies based on genome-scale data sets. Here, we present the L.U.St package, a Python tool for approximate maximum likelihood supertree inference and illustrate its application using a genomic data set for the placental mammals. L.U.St allows the calculation of the approximate likelihood of a supertree, given a set of input trees, performs heuristic searches to look for the supertree of highest likelihood, and performs statistical tests of two or more supertrees. To this end, L.U.St implements a winning sites test allowing ranking of a collection of a-priori selected hypotheses, given as a collection of input supertree topologies. It also outputs a file of input-tree-wise likelihood scores that can be used as input to CONSEL for calculation of standard tests of two trees (e.g. Kishino-Hasegawa, Shimodaira-Hasegawa and Approximately Unbiased tests). This is the first fully parametric implementation of a supertree method, it has clearly understood properties, and provides several advantages over currently available supertree approaches. It is easy to implement and works on any platform that has Python installed. bitBucket page - https://afro-juju@bitbucket.org/afro-juju/l.u.st.git. Davide.Pisani@bristol.ac.uk.

  18. A Nonstationary Markov Model Detects Directional Evolution in Hymenopteran Morphology.

    PubMed

    Klopfstein, Seraina; Vilhelmsen, Lars; Ronquist, Fredrik

    2015-11-01

    Directional evolution has played an important role in shaping the morphological, ecological, and molecular diversity of life. However, standard substitution models assume stationarity of the evolutionary process over the time scale examined, thus impeding the study of directionality. Here we explore a simple, nonstationary model of evolution for discrete data, which assumes that the state frequencies at the root differ from the equilibrium frequencies of the homogeneous evolutionary process along the rest of the tree (i.e., the process is nonstationary, nonreversible, but homogeneous). Within this framework, we develop a Bayesian approach for testing directional versus stationary evolution using a reversible-jump algorithm. Simulations show that when only data from extant taxa are available, the success in inferring directionality is strongly dependent on the evolutionary rate, the shape of the tree, the relative branch lengths, and the number of taxa. Given suitable evolutionary rates (0.1-0.5 expected substitutions between root and tips), accounting for directionality improves tree inference and often allows correct rooting of the tree without the use of an outgroup. As an empirical test, we apply our method to study directional evolution in hymenopteran morphology. We focus on three character systems: wing veins, muscles, and sclerites. We find strong support for a trend toward loss of wing veins and muscles, while stationarity cannot be ruled out for sclerites. Adding fossil and time information in a total-evidence dating approach, we show that accounting for directionality results in more precise estimates not only of the ancestral state at the root of the tree, but also of the divergence times. Our model relaxes the assumption of stationarity and reversibility by adding a minimum of additional parameters, and is thus well suited to studying the nature of the evolutionary process in data sets of limited size, such as morphology and ecology. 
© The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.

  19. Coseismic fault slip associated with the 1992 M_w 6.1 Joshua Tree, California, earthquake: Implications for the Joshua Tree-Landers earthquake sequence

    NASA Technical Reports Server (NTRS)

    Bennett, Richard A.; Reilinger, Robert E.; Rodi, William; Li, Yingping; Toksoz, M. Nafi; Hudnut, Ken

    1995-01-01

    Coseismic surface deformation associated with the M_w 6.1, April 23, 1992, Joshua Tree earthquake is well represented by estimates of geodetic monument displacements at 20 locations independently derived from Global Positioning System and trilateration measurements. The rms signal to noise ratio for these inferred displacements is 1.8 with near-fault displacement estimates exceeding 40 mm. In order to determine the long-wavelength distribution of slip over the plane of rupture, a Tikhonov regularization operator is applied to these estimates which minimizes stress variability subject to purely right-lateral slip and zero surface slip constraints. The resulting slip distribution yields a geodetic moment estimate of 1.7 × 10^18 N m with corresponding maximum slip around 0.8 m and compares well with independent and complementary information including seismic moment and source time function estimates and main shock and aftershock locations. From empirical Green's functions analyses, a rupture duration of 5 s is obtained which implies a rupture radius of 6-8 km. Most of the inferred slip lies to the north of the hypocenter, consistent with northward rupture propagation. Stress drop estimates are in the range of 2-4 MPa. In addition, predicted Coulomb stress increases correlate remarkably well with the distribution of aftershock hypocenters; most of the aftershocks occur in areas for which the mainshock rupture produced stress increases larger than about 0.1 MPa. In contrast, predicted stress changes are near zero at the hypocenter of the M_w 7.3, June 28, 1992, Landers earthquake which nucleated about 20 km beyond the northernmost edge of the Joshua Tree rupture. Based on aftershock migrations and the predicted static stress field, we speculate that redistribution of Joshua Tree-induced stress perturbations played a role in the spatio-temporal development of the earthquake sequence culminating in the Landers event.

  20. The estimation of tree posterior probabilities using conditional clade probability distributions.

    PubMed

    Larget, Bret

    2013-07-01

    In this article I introduce the idea of conditional independence of separated subtrees as a principle by which to estimate the posterior probability of trees using conditional clade probability distributions rather than simple sample relative frequencies. I describe an algorithm for these calculations and software which implements these ideas. I show that these alternative calculations are very similar to simple sample relative frequencies for high probability trees but are substantially more accurate for relatively low probability trees. The method allows the posterior probability of unsampled trees to be calculated when these trees contain only clades that are in other sampled trees. Furthermore, the method can be used to estimate the total probability of the set of sampled trees which provides a measure of the thoroughness of a posterior sample.
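
    A minimal sketch of the conditional clade idea described above, assuming rooted binary trees written as nested Python tuples of taxon names. It is not the author's software; the function names (`leaves`, `splits`, `fit_ccd`, `ccd_probability`) and the six-taxon sample are hypothetical. A tree's probability is estimated as the product, over its internal nodes, of the sampled frequency of that node's split given its clade.

```python
from collections import Counter


def leaves(tree):
    """Return the set of taxon names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leaves(left) | leaves(right)


def splits(tree):
    """Yield (clade, split) for every internal node of a rooted binary tree."""
    if isinstance(tree, str):
        return
    left, right = tree
    yield leaves(tree), frozenset([leaves(left), leaves(right)])
    yield from splits(left)
    yield from splits(right)


def fit_ccd(sample):
    """Count how often each clade, and each split of a clade, occurs in the sample."""
    clade_counts, split_counts = Counter(), Counter()
    for tree in sample:
        for clade, split in splits(tree):
            clade_counts[clade] += 1
            split_counts[(clade, split)] += 1
    return clade_counts, split_counts


def ccd_probability(tree, clade_counts, split_counts):
    """Product of conditional split frequencies; 0.0 if any clade was never sampled."""
    prob = 1.0
    for clade, split in splits(tree):
        if clade_counts[clade] == 0:
            return 0.0
        prob *= split_counts[(clade, split)] / clade_counts[clade]
    return prob


# Hypothetical posterior sample of two trees on six taxa (illustrative only).
t1 = ((("A", "B"), "C"), (("D", "E"), "F"))
t2 = (("A", ("B", "C")), ("D", ("E", "F")))
# Never sampled, but built only from clades that occur in t1 or t2.
t3 = ((("A", "B"), "C"), ("D", ("E", "F")))

clade_counts, split_counts = fit_ccd([t1, t2])
for name, tree in [("t1", t1), ("t2", t2), ("t3", t3)]:
    print(name, ccd_probability(tree, clade_counts, split_counts))
```

    In this toy sample the third tree never appears, so its simple relative frequency is zero, yet it receives a nonzero estimate because each of its clades occurs in some sampled tree.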

  1. Exploring Tree Age & Diameter to Illustrate Sample Design & Inference in Observational Ecology

    ERIC Educational Resources Information Center

    Casady, Grant M.

    2015-01-01

    Undergraduate biology labs often explore the techniques of data collection but neglect the statistical framework necessary to express findings. Students can be confused about how to use their statistical knowledge to address specific biological questions. Growth in the area of observational ecology requires that students gain experience in…

  2. Andean snowpack since AD 1150 inferred from rainfall, tree-ring and documentary records

    USDA-ARS?s Scientific Manuscript database

    The Andean snowpack is the main source of freshwater and arguably the single most important natural resource for the populated, semi-arid regions of central Chile and central-western Argentina. However, apart from recent analyses of instrumental snowpack data, very little is known about the long ter...

  3. LATERAL ROOT DISTRIBUTION OF TREES IN AN OLD-GROWTH DOUGLAS-FIR FOREST INFERRED FROM UPTAKE OF TRACER 15N

    EPA Science Inventory

    Belowground competition for nutrients and water is considered a key factor affecting spatial organization and productivity of individual stems within forest stands, yet there are almost no data describing the lateral extent and overlap of competing root systems. We quantified th...

  4. Finding a common path: predicting gene function using inferred evolutionary trees.

    PubMed

    Reynolds, Kimberly A

    2014-07-14

    Reporting in Cell, Li and colleagues (2014) describe an innovative method to functionally classify genes using evolutionary information. This approach demonstrates broad utility for eukaryotic gene annotation and suggests an intriguing new decomposition of pathways and complexes into evolutionarily conserved modules. Copyright © 2014 Elsevier Inc. All rights reserved.

  5. Soils

    Treesearch

    John R. Jones; Norbert V. DeByle

    1985-01-01

    Edaphic and climatic characteristics of a site quite well define the quality of that site for plant growth. The importance of soil characteristics to the growth and well-being of aspen in the West is apparent from observations by many authors, from inferences resulting from work with other trees and agricultural crops, and from detailed study of aspen soils and site...

  6. A likelihood-based time series modeling approach for application in dendrochronology to examine the growth-climate relations and forest disturbance history

    EPA Science Inventory

    A time series intervention analysis (TSIA) of dendrochronological data to infer the tree growth-climate-disturbance relations and forest disturbance history is described. Maximum likelihood is used to estimate the parameters of a structural time series model with components for ...

  7. Tree Morphologic Plasticity Explains Deviation from Metabolic Scaling Theory in Semi-Arid Conifer Forests, Southwestern USA

    PubMed Central

    Swetnam, Tyson L.; O’Connor, Christopher D.; Lynch, Ann M.

    2016-01-01

    A significant concern about Metabolic Scaling Theory (MST) in real forests relates to consistent differences between the values of power law scaling exponents of tree primary size measures used to estimate mass and those predicted by MST. Here we consider why observed scaling exponents for diameter and height relationships deviate from MST predictions across three semi-arid conifer forests in relation to: (1) tree condition and physical form, (2) the level of inter-tree competition (e.g. open vs closed stand structure), (3) increasing tree age, and (4) differences in site productivity. Scaling exponent values derived from non-linear least-squares regression for trees in excellent condition (n = 381) were above the MST prediction at the 95% confidence level, while the exponent for trees in good condition was no different from the MST prediction (n = 926). Trees that were in fair or poor condition, characterized as diseased, leaning, or sparsely crowned, had exponent values below MST predictions (n = 2,058), as did recently dead standing trees (n = 375). The exponent value of the mean-tree model that disregarded tree condition (n = 3,740) was consistent with other studies that reject MST scaling. Ostensibly, as stand density and competition increased, trees exhibited greater morphological plasticity whereby the majority had characteristically fair or poor growth forms. Fitting by least-squares regression biases the mean-tree model scaling exponent toward values that are below MST idealized predictions. For 368 trees from Arizona with known establishment dates, increasing age had no significant impact on expected scaling. We further suggest height to diameter ratios below MST relate to vertical truncation caused by limitation in plant water availability. Even with environmentally imposed height limitation, proportionality between height and diameter scaling exponents was consistent with the predictions of MST. PMID:27391084
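
    As a rough illustration of what estimating a scaling exponent involves, the sketch below fits h = a * d**b to synthetic diameter and height values. It swaps in ordinary least squares on log-log axes for the non-linear least-squares regression named in the abstract, and the data, the exponent 2/3, and the function name `fit_power_law` are assumptions chosen purely for illustration, not values from the study.

```python
import math


def fit_power_law(x, y):
    """Fit y = a * x**b by ordinary least squares on log-transformed values.

    Note: this is log-log linear regression, not the non-linear
    least-squares fit used in the study summarized above.
    """
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx                      # slope on log-log axes = scaling exponent
    a = math.exp(mean_y - b * mean_x)  # intercept back-transformed to a prefactor
    return a, b


# Synthetic, noise-free data generated from h = 2 * d**(2/3) (illustrative only).
diameters = [5.0, 10.0, 20.0, 40.0, 80.0]
heights = [2.0 * d ** (2.0 / 3.0) for d in diameters]

a, b = fit_power_law(diameters, heights)
print("prefactor:", a, "exponent:", b)
```

    With real measurements, the fitted exponent would be compared against the theoretical prediction using a confidence interval, as the abstract does at the 95% level.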

  8. Tree Morphologic Plasticity Explains Deviation from Metabolic Scaling Theory in Semi-Arid Conifer Forests, Southwestern USA.

    PubMed

    Swetnam, Tyson L; O'Connor, Christopher D; Lynch, Ann M

    2016-01-01

    A significant concern about Metabolic Scaling Theory (MST) in real forests relates to consistent differences between the values of power law scaling exponents of tree primary size measures used to estimate mass and those predicted by MST. Here we consider why observed scaling exponents for diameter and height relationships deviate from MST predictions across three semi-arid conifer forests in relation to: (1) tree condition and physical form, (2) the level of inter-tree competition (e.g. open vs closed stand structure), (3) increasing tree age, and (4) differences in site productivity. Scaling exponent values derived from non-linear least-squares regression for trees in excellent condition (n = 381) were above the MST prediction at the 95% confidence level, while the exponent for trees in good condition was no different from the MST prediction (n = 926). Trees that were in fair or poor condition, characterized as diseased, leaning, or sparsely crowned, had exponent values below MST predictions (n = 2,058), as did recently dead standing trees (n = 375). The exponent value of the mean-tree model that disregarded tree condition (n = 3,740) was consistent with other studies that reject MST scaling. Ostensibly, as stand density and competition increased, trees exhibited greater morphological plasticity whereby the majority had characteristically fair or poor growth forms. Fitting by least-squares regression biases the mean-tree model scaling exponent toward values that are below MST idealized predictions. For 368 trees from Arizona with known establishment dates, increasing age had no significant impact on expected scaling. We further suggest height to diameter ratios below MST relate to vertical truncation caused by limitation in plant water availability. Even with environmentally imposed height limitation, proportionality between height and diameter scaling exponents was consistent with the predictions of MST.

  9. Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data.

    PubMed Central

    Drummond, Alexei J; Nicholls, Geoff K; Rodrigo, Allen G; Solomon, Wiremu

    2002-01-01

    Molecular sequences obtained at different sampling times from populations of rapidly evolving pathogens and from ancient subfossil and fossil sources are increasingly available with modern sequencing technology. Here, we present a Bayesian statistical inference approach to the joint estimation of mutation rate and population size that incorporates the uncertainty in the genealogy of such temporally spaced sequences by using Markov chain Monte Carlo (MCMC) integration. The Kingman coalescent model is used to describe the time structure of the ancestral tree. We recover information about the unknown true ancestral coalescent tree, population size, and the overall mutation rate from temporally spaced data, that is, from nucleotide sequences gathered at different times, from different individuals, in an evolving haploid population. We briefly discuss the methodological implications and show what can be inferred, in various practically relevant states of prior knowledge. We develop extensions for exponentially growing population size and joint estimation of substitution model parameters. We illustrate some of the important features of this approach on a genealogy of HIV-1 envelope (env) partial sequences. PMID:12136032
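
    The Kingman coalescent mentioned above can be sketched in a few lines. The example assumes the simplest textbook setting, in which all sequences are sampled at the same time from a constant-size population and time is measured in coalescent units; it does not cover the temporally spaced sampling or the MCMC machinery of the paper, and the function names are hypothetical. While k lineages remain, the waiting time to the next coalescence is exponential with rate k(k-1)/2.

```python
import random


def expected_tmrca(n):
    """Expected tree height for n lineages: sum of 1 / C(k, 2) for k = 2..n."""
    return sum(2.0 / (k * (k - 1)) for k in range(2, n + 1))


def simulate_tmrca(n, rng):
    """Draw one tree height by summing exponential waiting times.

    With k lineages present, the next coalescence arrives at rate C(k, 2).
    """
    height = 0.0
    for k in range(n, 1, -1):
        height += rng.expovariate(k * (k - 1) / 2.0)
    return height


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 5
draws = [simulate_tmrca(n, rng) for _ in range(20000)]
mean_height = sum(draws) / len(draws)
print("simulated mean height:", mean_height)
print("analytical expectation:", expected_tmrca(n))
```

    The simulated mean should sit close to the analytical value 2(1 - 1/n); handling sequences sampled at different times would require adding lineages to the process as each sampling time is reached.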

  10. Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data.

    PubMed

    Drummond, Alexei J; Nicholls, Geoff K; Rodrigo, Allen G; Solomon, Wiremu

    2002-07-01

    Molecular sequences obtained at different sampling times from populations of rapidly evolving pathogens and from ancient subfossil and fossil sources are increasingly available with modern sequencing technology. Here, we present a Bayesian statistical inference approach to the joint estimation of mutation rate and population size that incorporates the uncertainty in the genealogy of such temporally spaced sequences by using Markov chain Monte Carlo (MCMC) integration. The Kingman coalescent model is used to describe the time structure of the ancestral tree. We recover information about the unknown true ancestral coalescent tree, population size, and the overall mutation rate from temporally spaced data, that is, from nucleotide sequences gathered at different times, from different individuals, in an evolving haploid population. We briefly discuss the methodological implications and show what can be inferred, in various practically relevant states of prior knowledge. We develop extensions for exponentially growing population size and joint estimation of substitution model parameters. We illustrate some of the important features of this approach on a genealogy of HIV-1 envelope (env) partial sequences.

  11. Genes with minimal phylogenetic information are problematic for coalescent analyses when gene tree estimation is biased.

    PubMed

    Xi, Zhenxiang; Liu, Liang; Davis, Charles C

    2015-11-01

    The development and application of coalescent methods are undergoing rapid changes. One little explored area that bears on the application of gene-tree-based coalescent methods to species tree estimation is gene informativeness. Here, we investigate the accuracy of these coalescent methods when genes have minimal phylogenetic information, including the implementation of the multilocus bootstrap approach. Using simulated DNA sequences, we demonstrate that genes with minimal phylogenetic information can produce unreliable gene trees (i.e., high error in gene tree estimation), which may in turn reduce the accuracy of species tree estimation using gene-tree-based coalescent methods. We demonstrate that this problem can be alleviated by sampling more genes, as is commonly done in large-scale phylogenomic analyses. This applies even when these genes are minimally informative. If gene tree estimation is biased, however, gene-tree-based coalescent analyses will produce inconsistent results, which cannot be remedied by increasing the number of genes. In this case, it is not the gene-tree-based coalescent methods that are flawed, but rather the input data (i.e., estimated gene trees). Along these lines, the commonly used program PhyML has a tendency to infer one particular bifurcating topology even though it is best represented as a polytomy. We additionally corroborate these findings by analyzing the 183-locus mammal data set assembled by McCormack et al. (2012) using ultra-conserved elements (UCEs) and flanking DNA. Lastly, we demonstrate that when employing the multilocus bootstrap approach on this 183-locus data set, there is no strong conflict between species trees estimated from concatenation and gene-tree-based coalescent analyses, as has been previously suggested by Gatesy and Springer (2014). Copyright © 2015 Elsevier Inc. All rights reserved.

  12. STBase: one million species trees for comparative biology.

    PubMed

    McMahon, Michelle M; Deepak, Akshay; Fernández-Baca, David; Boss, Darren; Sanderson, Michael J

    2015-01-01

    Comprehensively sampled phylogenetic trees provide the most compelling foundations for strong inferences in comparative evolutionary biology. Mismatches are common, however, between the taxa for which comparative data are available and the taxa sampled by published phylogenetic analyses. Moreover, many published phylogenies are gene trees, which cannot always be adapted immediately for species level comparisons because of discordance, gene duplication, and other confounding biological processes. A new database, STBase, lets comparative biologists quickly retrieve species level phylogenetic hypotheses in response to a query list of species names. The database consists of 1 million single- and multi-locus data sets, each with a confidence set of 1000 putative species trees, computed from GenBank sequence data for 413,000 eukaryotic taxa. Two bodies of theoretical work are leveraged to aid in the assembly of multi-locus concatenated data sets for species tree construction. First, multiply labeled gene trees are pruned to conflict-free singly-labeled species-level trees that can be combined between loci. Second, impacts of missing data in multi-locus data sets are ameliorated by assembling only decisive data sets. Data sets overlapping with the user's query are ranked using a scheme that depends on user-provided weights for tree quality and for taxonomic overlap of the tree with the query. Retrieval times are independent of the size of the database, typically a few seconds. Tree quality is assessed by a real-time evaluation of bootstrap support on just the overlapping subtree. Associated sequence alignments, tree files and metadata can be downloaded for subsequent analysis. STBase provides a tool for comparative biologists interested in exploiting the most relevant sequence data available for the taxa of interest. 
    It may also serve as a prototype for future species tree oriented databases and as a resource for assembly of larger species phylogenies from precomputed trees.

  13. A Bayesian Supertree Model for Genome-Wide Species Tree Reconstruction

    PubMed Central

    De Oliveira Martins, Leonardo; Mallo, Diego; Posada, David

    2016-01-01

    Current phylogenomic data sets highlight the need for species tree methods able to deal with several sources of gene tree/species tree incongruence. At the same time, we need to make most use of all available data. Most species tree methods deal with single processes of phylogenetic discordance, namely, gene duplication and loss, incomplete lineage sorting (ILS) or horizontal gene transfer. In this manuscript, we address the problem of species tree inference from multilocus, genome-wide data sets regardless of the presence of gene duplication and loss and ILS therefore without the need to identify orthologs or to use a single individual per species. We do this by extending the idea of Maximum Likelihood (ML) supertrees to a hierarchical Bayesian model where several sources of gene tree/species tree disagreement can be accounted for in a modular manner. We implemented this model in a computer program called guenomu whose inputs are posterior distributions of unrooted gene tree topologies for multiple gene families, and whose output is the posterior distribution of rooted species tree topologies. We conducted extensive simulations to evaluate the performance of our approach in comparison with other species tree approaches able to deal with more than one leaf from the same species. Our method ranked best under simulated data sets, in spite of ignoring branch lengths, and performed well on empirical data, as well as being fast enough to analyze relatively large data sets. Our Bayesian supertree method was also very successful in obtaining better estimates of gene trees, by reducing the uncertainty in their distributions. In addition, our results show that under complex simulation scenarios, gene tree parsimony is also a competitive approach once we consider its speed, in contrast to more sophisticated models. PMID:25281847

  14. Thresholds for boreal biome transitions.

    PubMed

    Scheffer, Marten; Hirota, Marina; Holmgren, Milena; Van Nes, Egbert H; Chapin, F Stuart

    2012-12-26

    Although the boreal region is warming twice as fast as the global average, the way in which the vast boreal forests and tundras may respond is poorly understood. Using satellite data, we reveal marked alternative modes in the frequency distributions of boreal tree cover. At the northern end and at the dry continental southern extremes, treeless tundra and steppe, respectively, are the only possible states. However, over a broad intermediate temperature range, these treeless states coexist with boreal forest (∼75% tree cover) and with two more open woodland states (∼20% and ∼45% tree cover). Intermediate tree covers (e.g., ∼10%, ∼30%, and ∼60% tree cover) between these distinct states are relatively rare, suggesting that they may represent unstable states where the system dwells only transiently. Mechanisms for such instabilities remain to be unraveled, but our results have important implications for the anticipated response of these ecosystems to climatic change. The data reveal that boreal forest shows no gradual decline in tree cover toward its limits. Instead, our analysis suggests that it becomes less resilient in the sense that it may more easily shift into a sparse woodland or treeless state. Similarly, the relative scarcity of the intermediate ∼10% tree cover suggests that tundra may shift relatively abruptly to a more abundant tree cover. If our inferences are correct, climate change may invoke massive nonlinear shifts in boreal biomes.

  15. iGLASS: An Improvement to the GLASS Method for Estimating Species Trees from Gene Trees

    PubMed Central

    Rosenberg, Noah A.

    2012-01-01

    Abstract Several methods have been designed to infer species trees from gene trees while taking into account gene tree/species tree discordance. Although some of these methods provide consistent species tree topology estimates under a standard model, most either do not estimate branch lengths or are computationally slow. An exception, the GLASS method of Mossel and Roch, is consistent for the species tree topology, estimates branch lengths, and is computationally fast. However, GLASS systematically overestimates divergence times, leading to biased estimates of species tree branch lengths. By assuming a multispecies coalescent model in which multiple lineages are sampled from each of two taxa at L independent loci, we derive the distribution of the waiting time until the first interspecific coalescence occurs between the two taxa, considering all loci and measuring from the divergence time. We then use the mean of this distribution to derive a correction to the GLASS estimator of pairwise divergence times. We show that our improved estimator, which we call iGLASS, consistently estimates the divergence time between a pair of taxa as the number of loci approaches infinity, and that it is an unbiased estimator of divergence times when one lineage is sampled per taxon. We also show that many commonly used clustering methods can be combined with the iGLASS estimator of pairwise divergence times to produce a consistent estimator of the species tree topology. Through simulations, we show that iGLASS can greatly reduce the bias and mean squared error in obtaining estimates of divergence times in a species tree. PMID:22216756
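
    The bias described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the iGLASS estimator itself: it covers the one-lineage-per-taxon case, measures time in coalescent units so that the wait to coalescence after divergence is Exponential(1), and treats the GLASS-style estimate as the minimum interspecific coalescence time across loci. All function names are hypothetical.

```python
import random

def simulate_min_coalescence(tau, n_loci, rng):
    """Minimum interspecific coalescence time across loci for one data set.

    Assumption: one lineage per taxon, so at each locus the two lineages
    coalesce at the divergence time tau plus an Exponential(1) wait.
    """
    return min(tau + rng.expovariate(1.0) for _ in range(n_loci))

def glass_estimate(min_time):
    # GLASS-style estimate: the minimum coalescence time taken at face value.
    return min_time

def corrected_estimate(min_time, n_loci):
    # The minimum of n_loci Exponential(1) waits has mean 1 / n_loci,
    # so subtracting that mean removes the bias in this simplified setting.
    return min_time - 1.0 / n_loci

rng = random.Random(0)
tau, n_loci, n_reps = 2.0, 10, 20000
mins = [simulate_min_coalescence(tau, n_loci, rng) for _ in range(n_reps)]
mean_glass = sum(glass_estimate(m) for m in mins) / n_reps
mean_corrected = sum(corrected_estimate(m, n_loci) for m in mins) / n_reps
```

    In this setting the uncorrected mean sits about 1/L above the true divergence time, and the corrected mean sits close to it. The correction for several lineages per taxon derived in the article is more involved and is not reproduced here.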

  16. Novel information theory-based measures for quantifying incongruence among phylogenetic trees.

    PubMed

    Salichos, Leonidas; Stamatakis, Alexandros; Rokas, Antonis

    2014-05-01

    Phylogenies inferred from different data matrices often conflict with each other necessitating the development of measures that quantify this incongruence. Here, we introduce novel measures that use information theory to quantify the degree of conflict or incongruence among all nontrivial bipartitions present in a set of trees. The first measure, internode certainty (IC), calculates the degree of certainty for a given internode by considering the frequency of the bipartition defined by the internode (internal branch) in a given set of trees jointly with that of the most prevalent conflicting bipartition in the same tree set. The second measure, IC All (ICA), calculates the degree of certainty for a given internode by considering the frequency of the bipartition defined by the internode in a given set of trees in conjunction with that of all conflicting bipartitions in the same underlying tree set. Finally, the tree certainty (TC) and TC All (TCA) measures are the sum of IC and ICA values across all internodes of a phylogeny, respectively. IC, ICA, TC, and TCA can be calculated from different types of data that contain nontrivial bipartitions, including from bootstrap replicate trees to gene trees or individual characters. Given a set of phylogenetic trees, the IC and ICA values of a given internode reflect its specific degree of incongruence, and the TC and TCA values describe the global degree of incongruence between trees in the set. All four measures are implemented and freely available in version 8.0.0 and subsequent versions of the widely used program RAxML.
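
    A minimal sketch of how such measures could be computed is given below. It assumes the entropy-based form IC = 1 + sum(p_i * log_n(p_i)), where the p_i are the relative frequencies of the n bipartitions being compared (n = 2 for IC). The function names are hypothetical, and the RAxML implementation may differ in details such as sign conventions and frequency thresholds.

```python
import math

def internode_certainty_all(freqs):
    """ICA-style score: 1 minus the normalized Shannon entropy of the
    frequencies of a bipartition and its conflicting bipartitions."""
    positive = [f for f in freqs if f > 0]
    n = len(positive)
    if n < 2:
        return 1.0  # no observed conflict: full certainty
    total = sum(positive)
    return 1.0 + sum((f / total) * math.log(f / total, n) for f in positive)

def internode_certainty(freq_bipartition, freq_conflict):
    """IC-style score: the bipartition against its most prevalent conflict."""
    return internode_certainty_all([freq_bipartition, freq_conflict])

def tree_certainty(internode_scores):
    """TC-style score: the sum of per-internode scores across a tree."""
    return sum(internode_scores)

ic_no_conflict = internode_certainty(100, 0)
ic_equal_conflict = internode_certainty(50, 50)
ic_moderate = internode_certainty(75, 25)
tc_example = tree_certainty([ic_no_conflict, ic_equal_conflict, ic_moderate])
```

    Under this form a score near 1 means the bipartition is essentially unchallenged, and a score near 0 means it and its conflicting bipartition(s) are about equally frequent.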

  17. Multispecies coalescent analysis of the early diversification of neotropical primates: phylogenetic inference under strong gene trees/species tree conflict.

    PubMed

    Schrago, Carlos G; Menezes, Albert N; Furtado, Carolina; Bonvicino, Cibele R; Seuanez, Hector N

    2014-11-05

    Neotropical primates (NP) are presently distributed in the New World from Mexico to northern Argentina, comprising three large families, Cebidae, Atelidae, and Pitheciidae, as a result of their diversification following their separation from Old World anthropoids near the Eocene/Oligocene boundary, some 40 Ma. The evolution of NP has been intensively investigated in the last decade by studies focusing on their phylogeny and timescale. However, despite major efforts, the phylogenetic relationship between these three major clades and the age of their last common ancestor are still controversial because these inferences were based on limited numbers of loci and dating analyses that did not consider the evolutionary variation associated with the distribution of gene trees within the proposed phylogenies. We show, by multispecies coalescent analyses of selected genome segments spanning 92,496,904 bp, that the early diversification of extant NP was marked by a 2-fold increase of their effective population size and that Atelids and Cebids are more closely related to each other than either is to Pitheciids. The molecular phylogeny of NP has been difficult to solve because of population-level phenomena at the early evolution of the lineage. The association of evolutionary variation with the distribution of gene trees within proposed phylogenies is crucial for distinguishing the mean genetic divergence between species (the mean coalescent time between loci) from speciation time. This approach, based on extensive genomic data provided by new generation DNA sequencing, provides more accurate reconstructions of phylogenies and timescales for all organisms. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  18. Functional phylogenomics analysis of bacteria and archaea using consistent genome annotation with UniFam

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chai, Juanjuan; Kora, Guruprasad; Ahn, Tae-Hyuk

    2014-10-09

    To supply some background, phylogenetic studies have provided detailed knowledge on the evolutionary mechanisms of genes and species in Bacteria and Archaea. However, the evolution of cellular functions, represented by metabolic pathways and biological processes, has not been systematically characterized. Many clades in the prokaryotic tree of life have now been covered by sequenced genomes in GenBank. This enables a large-scale functional phylogenomics study of many computationally inferred cellular functions across all sequenced prokaryotes. Our results show a total of 14,727 GenBank prokaryotic genomes were re-annotated using a new protein family database, UniFam, to obtain consistent functional annotations for accurate comparison. The functional profile of a genome was represented by the biological process Gene Ontology (GO) terms in its annotation. The GO term enrichment analysis differentiated the functional profiles between selected archaeal taxa. 706 prokaryotic metabolic pathways were inferred from these genomes using Pathway Tools and MetaCyc. The consistency between the distribution of metabolic pathways in the genomes and the phylogenetic tree of the genomes was measured using parsimony scores and retention indices. The ancestral functional profiles at the internal nodes of the phylogenetic tree were reconstructed to track the gains and losses of metabolic pathways in evolutionary history. In conclusion, our functional phylogenomics analysis shows divergent functional profiles of taxa and clades. Such function-phylogeny correlation stems from a set of clade-specific cellular functions with low parsimony scores. On the other hand, many cellular functions are sparsely dispersed across many clades with high parsimony scores. These different types of cellular functions have distinct evolutionary patterns reconstructed from the prokaryotic tree.
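
    The parsimony score and retention index mentioned above can be sketched for a single presence/absence character. This is an illustration under assumptions, not the study's pipeline: the tree is a hypothetical rooted binary tree written as nested tuples, steps are counted with Fitch's algorithm, and the retention index uses Farris's form RI = (g - s) / (g - m), with m = 1 and g equal to the count of the rarer state for a binary character.

```python
def fitch(tree, states):
    """Return (possible root states, minimum number of changes) for a
    rooted binary tree given as nested tuples of tip names."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_steps + right_steps
    # No shared state: one change is needed on this pair of branches.
    return left_set | right_set, left_steps + right_steps + 1

def retention_index(tree, states):
    """Farris retention index for one binary character, or None when
    the character is uninformative (g equals m)."""
    values = list(states.values())
    ones = sum(values)
    zeros = len(values) - ones
    if ones == 0 or zeros == 0:
        return None  # constant character
    g = min(ones, zeros)  # most steps the character could need on any tree
    m = 1                 # fewest steps for a two-state character
    if g == m:
        return None
    s = fitch(tree, states)[1]
    return (g - s) / (g - m)

tree = ((("A", "B"), ("C", "D")), ("E", "F"))
clade_specific = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 0, "F": 0}
scattered = {"A": 1, "B": 0, "C": 1, "D": 0, "E": 0, "F": 0}

steps_clade = fitch(tree, clade_specific)[1]
steps_scattered = fitch(tree, scattered)[1]
ri_clade = retention_index(tree, clade_specific)
ri_scattered = retention_index(tree, scattered)
```

    Here the clade-specific character needs one change and is fully retained by the tree, while the scattered character needs two changes and is not retained at all, mirroring the low-score versus high-score contrast drawn in the abstract.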

  19. Bears in a forest of gene trees: phylogenetic inference is complicated by incomplete lineage sorting and gene flow.

    PubMed

    Kutschera, Verena E; Bidon, Tobias; Hailer, Frank; Rodi, Julia L; Fain, Steven R; Janke, Axel

    2014-08-01

    Ursine bears are a mammalian subfamily that comprises six morphologically and ecologically distinct extant species. Previous phylogenetic analyses of concatenated nuclear genes could not resolve all relationships among bears, and appeared to conflict with the mitochondrial phylogeny. Evolutionary processes such as incomplete lineage sorting and introgression can cause gene tree discordance and complicate phylogenetic inferences, but are not accounted for in phylogenetic analyses of concatenated data. We generated a high-resolution data set of autosomal introns from several individuals per species and of Y-chromosomal markers. Incorporating intraspecific variability in coalescence-based phylogenetic and gene flow estimation approaches, we traced the genealogical history of individual alleles. Considerable heterogeneity among nuclear loci and discordance between nuclear and mitochondrial phylogenies were found. A species tree with divergence time estimates indicated that ursine bears diversified within less than 2 My. Consistent with a complex branching order within a clade of Asian bear species, we identified unidirectional gene flow from Asian black into sloth bears. Moreover, gene flow detected from brown into American black bears can explain the conflicting placement of the American black bear in mitochondrial and nuclear phylogenies. These results highlight that both incomplete lineage sorting and introgression are prominent evolutionary forces even on time scales up to several million years. Complex evolutionary patterns are not adequately captured by strictly bifurcating models, and can only be fully understood when analyzing multiple independently inherited loci in a coalescence framework. Phylogenetic incongruence among gene trees hence needs to be recognized as a biologically meaningful signal. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  20. Towards resolving Lamiales relationships: insights from rapidly evolving chloroplast sequences

    PubMed Central

    2010-01-01

    Background In the large angiosperm order Lamiales, a diverse array of highly specialized life strategies such as carnivory, parasitism, epiphytism, and desiccation tolerance occur, and some lineages possess drastically accelerated DNA substitutional rates or miniaturized genomes. However, understanding the evolution of these phenomena in the order, and clarifying borders of and relationships among lamialean families, has been hindered by largely unresolved trees in the past. Results Our analysis of the rapidly evolving trnK/matK, trnL-F and rps16 chloroplast regions enabled us to infer more precise phylogenetic hypotheses for the Lamiales. Relationships among the nine first-branching families in the Lamiales tree are now resolved with very strong support. Subsequent to Plocospermataceae, a clade consisting of Carlemanniaceae plus Oleaceae branches, followed by Tetrachondraceae and a newly inferred clade composed of Gesneriaceae plus Calceolariaceae, which is also supported by morphological characters. Plantaginaceae (incl. Gratioleae) and Scrophulariaceae are well separated in the backbone grade; Lamiaceae and Verbenaceae appear in distant clades, while the recently described Linderniaceae are confirmed to be monophyletic and in an isolated position. Conclusions Confidence about deep nodes of the Lamiales tree is an important step towards understanding the evolutionary diversification of a major clade of flowering plants. The degree of resolution obtained here now provides a first opportunity to discuss the evolution of morphological and biochemical traits in Lamiales. The multiple independent evolution of the carnivorous syndrome, once in Lentibulariaceae and a second time in Byblidaceae, is strongly supported by all analyses and topological tests. The evolution of selected morphological characters such as flower symmetry is discussed. 
    The addition of further sequence data from introns and spacers holds promise to eventually obtain a fully resolved plastid tree of Lamiales. PMID:21073690

  1. Long-distance seed and pollen dispersal inferred from spatial genetic structure in the very low-density rainforest tree, Baillonella toxisperma Pierre, in Central Africa.

    PubMed

    Ndiade-Bourobou, D; Hardy, O J; Favreau, B; Moussavou, H; Nzengue, E; Mignot, A; Bouvet, J-M

    2010-11-01

    We analysed the spatial distribution of genetic diversity to infer gene flow for Baillonella toxisperma Pierre (Moabi), a threatened, entomophilous and animal-dispersed Central African tree, with typically low density (5-7 adult trees/km(2)). Fifteen nuclear and three universal chloroplast microsatellite markers were used to type 247 individuals localized in three contiguous areas with differing past logging intensity. These three areas were within a natural forest block of approximately 2886 km(2) in Gabon. Expected heterozygosity and chloroplast diversity were He(nuc) = 0.570 and H(cp) = 0.761, respectively. F(IS) was only significant in one area (F(IS) = 0.076, P < 0.01) and could be attributed to selfing. For nuclear loci, Bayesian clustering did not detect discrete gene pools within and between the three areas and global differentiation (F(STnuc) = 0.007, P > 0.05) was not significant, suggesting that they are one population. At the level of the whole forest, both nuclear and chloroplast markers revealed a weak correlation between genetic relatedness and spatial distance between individuals: Sp(nuc) = 0.003 and Sp(cp) = 0.015, respectively. The extent of gene flow (σ) was partitioned into global gene flow (σ(g)) from 6.6 to 9.9 km, seed dispersal (σ(s)) from 4.0 to 6.3 km and pollen dispersal (σ(p)) from 9.8 to 10.8 km. These uncommonly high dispersal distances indicate that low-density canopy trees in African rainforests could be connected by extensive gene flow, although, given the current threats facing many seed disperser species in Central Africa, this may no longer be the case. © 2010 Blackwell Publishing Ltd.

  2. Perfluoropolyalkylether Oil Degradation: Inference of FeF3 Formation on Steel Surfaces under Boundary Conditions

    DTIC Science & Technology

    1985-08-01

    Report SD-TR-85-37: Perfluoropolyalkylether Oil Degradation: Inference of FeF3 Formation on Steel Surfaces under Boundary Conditions. DAVID... Key words: boundary conditions; oil degradation; perfluoropolyalkylether; FeF3; wear test; lubrication; 440C.

  3. How low can you go? Assessing minimum concentrations of NSC in carbon limited tree saplings

    NASA Astrophysics Data System (ADS)

    Hoch, Guenter; Hartmann, Henrik; Schwendener, Andrea

    2016-04-01

    Tissue concentrations of non-structural carbohydrates (NSC) are frequently used to determine the carbon balance of plants. In recent years, an increasing number of studies have inferred carbon starvation in trees under environmental stress like drought from low tissue NSC concentrations. However, such inferences are limited by the fact that minimum concentrations of NSC required for survival are not known. So far, it was hypothesized that even under lethal carbon starvation, starch and low molecular sugar concentrations cannot be completely depleted and that minimum NSC concentrations at death vary across tissues and species. Here we present results of an experiment that aimed to determine minimum NSC concentrations in different tissues of saplings of two broad-leaved tree species (Acer pseudoplatanus and Quercus petraea) exposed to lethal carbon starvation via continuous darkening. In addition, we investigated recovery rates of NSC concentrations in saplings that had been darkened for different periods of time and were then re-exposed to light. Both species survived continuous darkening for about 12 weeks (confirmed by testing the ability to re-sprout after darkness). In all investigated tissues, starch concentrations declined close to zero within three to six weeks of darkness. Low molecular sugars also decreased strongly within the first weeks of darkness, but seemed to stabilize at low concentrations of 0.5 to 2 % dry matter (depending on tissue and species) almost until death. NSC concentrations recovered surprisingly fast in saplings that were re-exposed to light. After 3 weeks of continuous darkness, tissue NSC concentrations recovered within 6 weeks to levels of unshaded control saplings in all tissues and in both species. To our knowledge, this study represents the first experimental attempt to quantify minimum tissue NSC concentrations at lethal carbon starvation. Most importantly, our results suggest that carbon-starved tree saplings are able to survive several weeks without starch reserves and with extremely low sugar concentrations in all organs. Although it remains to be tested whether our findings are also valid for mature trees, these results show that NSC pools in trees are very sensitive to carbon limitation and that lethal carbon starvation is preceded by a significant (almost complete) depletion of starch and sugars in all tree organs.

  4. More than one kind of inference: re-examining what's learned in feature inference and classification.

    PubMed

    Sweller, Naomi; Hayes, Brett K

    2010-08-01

    Three studies examined how task demands that impact on attention to typical or atypical category features shape the category representations formed through classification learning and inference learning. During training categories were learned via exemplar classification or by inferring missing exemplar features. In the latter condition inferences were made about missing typical features alone (typical feature inference) or about both missing typical and atypical features (mixed feature inference). Classification and mixed feature inference led to the incorporation of typical and atypical features into category representations, with both kinds of features influencing inferences about familiar (Experiments 1 and 2) and novel (Experiment 3) test items. Those in the typical inference condition focused primarily on typical features. Together with formal modelling, these results challenge previous accounts that have characterized inference learning as producing a focus on typical category features. The results show that two different kinds of inference learning are possible and that these are subserved by different kinds of category representations.

  5. Multilocus Phylogeography and Species Delimitation in the Cumberland Plateau Salamander, Plethodon kentucki: Incongruence among Data Sets and Methods

    PubMed Central

    Kuchta, Shawn R.; Brown, Ashley D.; Converse, Paul E.; Highton, Richard

    2016-01-01

    Species are a fundamental unit of biodiversity, yet can be challenging to delimit objectively. This is particularly true of species complexes characterized by high levels of population genetic structure, hybridization between genetic groups, isolation by distance, and limited phenotypic variation. Previous work on the Cumberland Plateau Salamander, Plethodon kentucki, suggested that it might constitute a species complex despite occupying a relatively small geographic range. To examine this hypothesis, we sampled 135 individuals from 43 populations, and used four mitochondrial loci and five nuclear loci (5693 base pairs) to quantify phylogeographic structure and probe for cryptic species diversity. Rates of evolution for each locus were inferred using the multidistribute package, and time calibrated gene trees and species trees were inferred using BEAST 2 and *BEAST 2, respectively. Because the parameter space relevant for species delimitation is large and complex, and all methods make simplifying assumptions that may lead them to fail, we conducted an array of analyses. Our assumption was that strongly supported species would be congruent across methods. Putative species were first delimited using a Bayesian implementation of the GMYC model (bGMYC), Geneland, and Brownie. We then validated these species using the genealogical sorting index and BPP. We found substantial phylogeographic diversity using mtDNA, including four divergent clades and an inferred common ancestor at 14.9 myr (95% HPD: 10.8–19.7 myr). By contrast, this diversity was not corroborated by nuclear sequence data, which exhibited low levels of variation and weak phylogeographic structure. Species trees estimated a far younger root than did the mtDNA data, closer to 1.0 myr old. Mutually exclusive putative species were identified by the different approaches. 
    Possible causes of data set discordance, and the problem of species delimitation in complexes with high levels of population structure and introgressive hybridization, are discussed. PMID:26974148

  6. Computerized assessment of communication for cognitive stimulation for people with cognitive decline using spectral-distortion measures and phylogenetic inference.

    PubMed

    Pham, Tuan D; Oyama-Higa, Mayumi; Truong, Cong-Thang; Okamoto, Kazushi; Futaba, Terufumi; Kanemoto, Shigeru; Sugiyama, Masahide; Lampe, Lisa

    2015-01-01

    Therapeutic communication and interpersonal relationships in care homes can help people to improve their mental wellbeing. Assessment of the efficacy of these dynamic and complex processes is necessary for psychosocial planning and management. This paper presents a pilot application of photoplethysmography in synchronized physiological measurements of communications between the care-giver and people with dementia. Signal-based evaluations of the therapy can be carried out using the measures of spectral distortion and the inference of phylogenetic trees. The proposed computational models can assist in, and improve the cost-effectiveness of, caring for and monitoring people with cognitive decline.

  7. Tropical geometry of statistical models.

    PubMed

    Pachter, Lior; Sturmfels, Bernd

    2004-11-16

    This article presents a unified mathematical framework for inference in graphical models, building on the observation that graphical models are algebraic varieties. From this geometric viewpoint, observations generated from a model are coordinates of a point in the variety, and the sum-product algorithm is an efficient tool for evaluating specific coordinates. Here, we address the question of how the solutions to various inference problems depend on the model parameters. The proposed answer is expressed in terms of tropical algebraic geometry. The Newton polytope of a statistical model plays a key role. Our results are applied to the hidden Markov model and the general Markov model on a binary tree.
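
    The link between the sum-product algorithm and its tropical counterpart can be sketched on a small hidden Markov model. The code below is an illustration under assumptions, not code from the article: it runs one forward recursion twice, first with ordinary (+, x) arithmetic on probabilities, then with (min, +) arithmetic on negative log probabilities, which is a standard reading of tropicalization. The model parameters and function names are made up for the example.

```python
import itertools
import math

def forward(init, trans, emit, obs, add, mul):
    """Forward recursion with the semiring operations passed in."""
    n = len(init)
    alpha = [mul(init[s], emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        step = []
        for t in range(n):
            terms = [mul(mul(alpha[s], trans[s][t]), emit[t][o]) for s in range(n)]
            acc = terms[0]
            for term in terms[1:]:
                acc = add(acc, term)
            step.append(acc)
        alpha = step
    total = alpha[0]
    for a in alpha[1:]:
        total = add(total, a)
    return total

def neg_log(table):
    """Map probabilities to negative logs, elementwise over nested lists."""
    if isinstance(table, list):
        return [neg_log(x) for x in table]
    return -math.log(table)

def path_probabilities(init, trans, emit, obs):
    """Brute force: joint probability of obs with every hidden path."""
    probs = []
    for path in itertools.product(range(len(init)), repeat=len(obs)):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for prev, cur, o in zip(path, path[1:], obs[1:]):
            p *= trans[prev][cur] * emit[cur][o]
        probs.append(p)
    return probs

# Hypothetical two-state model and a short observation sequence.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 1]

likelihood = forward(init, trans, emit, obs, lambda a, b: a + b, lambda a, b: a * b)
tropical = forward(neg_log(init), neg_log(trans), neg_log(emit), obs, min, lambda a, b: a + b)
best_path_probability = math.exp(-tropical)
```

    The first call sums over all hidden paths and returns the likelihood of the observations; the second returns the negative log probability of the single most probable path, so the same recursion answers a marginalization problem and a maximization problem depending only on the arithmetic used.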

  8. Computerized Assessment of Communication for Cognitive Stimulation for People with Cognitive Decline Using Spectral-Distortion Measures and Phylogenetic Inference

    PubMed Central

    Pham, Tuan D.; Oyama-Higa, Mayumi; Truong, Cong-Thang; Okamoto, Kazushi; Futaba, Terufumi; Kanemoto, Shigeru; Sugiyama, Masahide; Lampe, Lisa

    2015-01-01

    Therapeutic communication and interpersonal relationships in care homes can help people to improve their mental wellbeing. Assessment of the efficacy of these dynamic and complex processes is necessary for psychosocial planning and management. This paper presents a pilot application of photoplethysmography in synchronized physiological measurements of communications between the care-giver and people with dementia. Signal-based evaluations of the therapy can be carried out using the measures of spectral distortion and the inference of phylogenetic trees. The proposed computational models can assist in, and improve the cost-effectiveness of, caring for and monitoring people with cognitive decline. PMID:25803586

  9. Simultaneously estimating evolutionary history and repeated traits phylogenetic signal: applications to viral and host phenotypic evolution

    PubMed Central

    Vrancken, Bram; Lemey, Philippe; Rambaut, Andrew; Bedford, Trevor; Longdon, Ben; Günthard, Huldrych F.; Suchard, Marc A.

    2014-01-01

    Phylogenetic signal quantifies the degree to which resemblance in continuously-valued traits reflects phylogenetic relatedness. Measures of phylogenetic signal are widely used in ecological and evolutionary research, and are recently gaining traction in viral evolutionary studies. Standard estimators of phylogenetic signal frequently condition on data summary statistics of the repeated trait observations and fixed phylogenetic trees, resulting in information loss and potential bias. To incorporate the observation process and phylogenetic uncertainty in a model-based approach, we develop a novel Bayesian inference method to simultaneously estimate the evolutionary history and phylogenetic signal from molecular sequence data and repeated multivariate traits. Our approach builds upon a phylogenetic diffusion framework that models continuous trait evolution as a Brownian motion process and incorporates Pagel’s λ transformation parameter to estimate dependence among traits. We provide a computationally efficient inference implementation in the BEAST software package. We evaluate the synthetic performance of the Bayesian estimator of phylogenetic signal against standard estimators, and demonstrate the use of our coherent framework to address several virus-host evolutionary questions, including virulence heritability for HIV, antigenic evolution in influenza and HIV, and Drosophila sensitivity to sigma virus infection. Finally, we discuss model extensions that will make useful contributions to our flexible framework for simultaneously studying sequence and trait evolution. PMID:25780554
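
    For readers unfamiliar with Pagel's λ, the transformation can be sketched as follows. This is a minimal illustration under assumptions, not the BEAST implementation: under Brownian motion the covariance between two tips is taken as their shared branch length, and λ is applied in its usual form as a multiplier on the off-diagonal entries of that tip covariance matrix. The three-taxon tree and the function name are hypothetical.

```python
def pagel_lambda_transform(cov, lam):
    """Scale the off-diagonal entries of a tip covariance matrix by lam,
    leaving the diagonal (each tip's total path length) unchanged."""
    n = len(cov)
    return [
        [cov[i][j] if i == j else lam * cov[i][j] for j in range(n)]
        for i in range(n)
    ]

# Hypothetical tree ((A:1,B:1):1,C:2): A and B share one unit of history,
# and every tip lies two units from the root.
brownian_cov = [
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 2.0],
]

full_signal = pagel_lambda_transform(brownian_cov, 1.0)  # tree unchanged
half_signal = pagel_lambda_transform(brownian_cov, 0.5)  # shared history halved
no_signal = pagel_lambda_transform(brownian_cov, 0.0)    # tips independent
```

    With λ = 1 the traits covary exactly as the tree predicts, and with λ = 0 the tips behave as independent observations, which is why an estimate of λ is read as a measure of phylogenetic signal.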

  10. Mortality rates associated with crown health for eastern forest tree species

    Treesearch

    Randall S. Morin; KaDonna C. Randolph; Jim Steinman

    2015-01-01

    The condition of tree crowns is an important indicator of tree and forest health. Crown conditions have been evaluated during inventories of the US Forest Service Forest Inventory and Analysis (FIA) program since 1999. In this study, remeasured data from 55,013 trees on 2616 FIA plots in the eastern USA were used to assess the probability of survival among various tree...

  11. Phylogeny and Divergence Times of Lemurs Inferred with Recent and Ancient Fossils in the Tree.

    PubMed

    Herrera, James P; Dávalos, Liliana M

    2016-09-01

    Paleontological and neontological systematics seek to answer evolutionary questions with different data sets. Phylogenies inferred for combined extant and extinct taxa provide novel insights into the evolutionary history of life. Primates have an extensive, diverse fossil record and molecular data for living and extinct taxa are rapidly becoming available. We used two models to infer the phylogeny and divergence times for living and fossil primates, the tip-dating (TD) and fossilized birth-death process (FBD). We collected new morphological data, especially on the living and extinct endemic lemurs of Madagascar. We combined the morphological data with published DNA sequences to infer near-complete (88% of lemurs) time-calibrated phylogenies. The results suggest that primates originated around the Cretaceous-Tertiary boundary, slightly earlier than indicated by the fossil record and later than previously inferred from molecular data alone. We infer novel relationships among extinct lemurs, and strong support for relationships that were previously unresolved. Dates inferred with TD were significantly older than those inferred with FBD, most likely related to an assumption of a uniform branching process in the TD compared with a birth-death process assumed in the FBD. This is the first study to combine morphological and DNA sequence data from extinct and extant primates to infer evolutionary relationships and divergence times, and our results shed new light on the tempo of lemur evolution and the efficacy of combined phylogenetic analyses. © The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  12. Chapter 7 - Crown condition

    Treesearch

    KaDonna C. Randolph

    2018-01-01

    Tree crown conditions are visually assessed by the U.S. Department of Agriculture, Forest Service, Forest Inventory and Analysis (FIA) Program as an indicator of forest health. These assessments are useful because individual tree photosynthetic capacity is dependent upon the size and condition of the crown. In general, trees with full, vigorous crowns are associated...

  13. Tree structure and cavity microclimate: implications for bats and birds.

    PubMed

    Clement, Matthew J; Castleberry, Steven B

    2013-05-01

    It is widely assumed that tree cavity structure and microclimate affect cavity selection and use in cavity-dwelling bats and birds. Despite the interest in tree structure and microclimate, the relationship between the two has rarely been quantified. Currently available data often come from artificial structures that may not accurately represent conditions in natural cavities. We collected data on tree cavity structure and microclimate from 45 trees in five cypress-gum swamps in the Coastal Plain of Georgia in the United States in 2008. We used hierarchical linear models to predict cavity microclimate from tree structure and ambient temperature and humidity, and used Akaike's information criterion to select the most parsimonious models. We found large differences in microclimate among trees, but tree structure variables explained <28% of the variation, while ambient conditions explained >80% of variation common to all trees. We argue that the determinants of microclimate are complex and multidimensional, and therefore cavity microclimate cannot be deduced easily from simple tree structures. Furthermore, we found that daily fluctuations in ambient conditions strongly affect microclimate, indicating that greater weather fluctuations will cause greater differences among tree cavities.
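
    Model selection by Akaike's information criterion can be illustrated with a minimal sketch. The model names, log-likelihoods and parameter counts below are hypothetical placeholders, not values from the study; only the formula AIC = 2k - 2 ln L is standard.

```python
def aic(log_likelihood, n_params):
    """Akaike's information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical candidates: name -> (maximized log-likelihood, number of parameters).
candidates = {
    "structure_only": (-120.0, 4),
    "ambient_only": (-95.0, 3),
    "structure_plus_ambient": (-94.5, 6),
}

# The preferred model is the one with the smallest AIC.
best = min(candidates, key=lambda name: aic(*candidates[name]))
```

    In this made-up example the larger model fits slightly better but pays a penalty for its extra parameters, so the smaller model is retained.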

  14. Feature-to-Feature Inference Under Conditions of Cue Restriction and Dimensional Correlation.

    PubMed

    Lancaster, Matthew E; Homa, Donald

    2017-01-01

    The present study explored feature-to-feature and label-to-feature inference in a category task for different category structures. In the correlated condition, each of the 4 dimensions comprising the category was positively correlated to each other and to the category label. In the uncorrelated condition, no correlation existed between the 4 dimensions comprising the category, although the dimension to category label correlation matched that of the correlated condition. After learning, participants made inference judgments of a missing feature, given 1, 2, or 3 feature cues; on half the trials, the category label was also included as a cue. The results showed superior inference of features following training on the correlated structure, with accurate inference when only a single feature was presented. In contrast, a single-feature cue resulted in chance levels of inference for the uncorrelated structure. Feature inference systematically improved with number of cues after training on the correlated structure. Surprisingly, a similar outcome was obtained for the uncorrelated structure, an outcome that must have reflected mediation via the category label. A descriptive model is briefly introduced to explain the results, with a suggestion that this paradigm might be profitably extended to hierarchical structures where the levels of feature-to-feature inference might vary with the depth of the hierarchy.

  15. Reconstructing Unrooted Phylogenetic Trees from Symbolic Ternary Metrics.

    PubMed

    Grünewald, Stefan; Long, Yangjing; Wu, Yaokun

    2018-03-09

    Böcker and Dress (Adv Math 138:105-125, 1998) presented a 1-to-1 correspondence between symbolically dated rooted trees and symbolic ultrametrics. We consider the corresponding problem for unrooted trees. More precisely, given a tree T with leaf set X and a proper vertex coloring of its interior vertices, we can map every triple of three different leaves to the color of its median vertex. We characterize all ternary maps that can be obtained in this way in terms of 4- and 5-point conditions, and we show that the corresponding tree and its coloring can be reconstructed from a ternary map that satisfies those conditions. Further, we give an additional condition that characterizes whether the tree is binary, and we describe an algorithm that reconstructs general trees in a bottom-up fashion.
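
    The map from leaf triples to median-vertex colors described above can be sketched as follows. This is an illustration of the definition only, not the authors' reconstruction algorithm; the tree, the vertex names and the colors are hypothetical.

```python
from collections import deque


def path(adj, src, dst):
    """Return the unique path from src to dst in a tree given as an adjacency dict."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        if v == dst:
            break
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                queue.append(w)
    out = []
    v = dst
    while v is not None:
        out.append(v)
        v = parent[v]
    return out[::-1]


def median(adj, a, b, c):
    """The median of three vertices is the single vertex on all three pairwise paths."""
    common = set(path(adj, a, b)) & set(path(adj, a, c)) & set(path(adj, b, c))
    (m,) = common  # in a tree this intersection has exactly one element
    return m


def ternary_map(adj, color, a, b, c):
    """Map a triple of distinct leaves to the color of its median vertex."""
    return color[median(adj, a, b, c)]


# Hypothetical tree: interior vertices u and v, leaves a, b, c, d.
edges = [("u", "a"), ("u", "b"), ("u", "v"), ("v", "c"), ("v", "d")]
adj = {}
for x, y in edges:
    adj.setdefault(x, set()).add(y)
    adj.setdefault(y, set()).add(x)
color = {"u": "red", "v": "blue"}  # proper coloring: adjacent interior vertices differ
```

    For the triple (a, b, c) the three paths meet at u, and for (a, c, d) they meet at v, so the two triples receive different colors.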

  16. CADDIS Volume 4. Data Analysis: Predicting Environmental Conditions from Biological Observations (PECBO Appendix)

    EPA Pesticide Factsheets

    Overview of PECBO Module, using scripts to infer environmental conditions from biological observations, statistically estimating species-environment relationships, methods for inferring environmental conditions, statistical scripts in module.

  17. Evaluating the predictability of distance race performance in NCAA cross country and track and field from high school race times in the United States.

    PubMed

    Brusa, Jamie L

    2017-12-30

    Successful recruiting for collegiate track & field athletes has become a more competitive and essential component of coaching. This study aims to determine the relationship between race performances of distance runners at the United States high school and National Collegiate Athletic Association (NCAA) levels. Conditional inference classification tree models were built and analysed to predict the probability that runners would qualify for the NCAA Division I National Cross Country Meet and/or the East or West NCAA Division I Outdoor Track & Field Preliminary Round based on their high school race times in the 800 m, 1600 m, and 3200 m. Prediction accuracies of the classification trees ranged from 60.0 to 76.6 percent. The models produced the most reliable estimates for predicting qualifiers in cross country, the 1500 m, and the 800 m for females and cross country, the 5000 m, and the 800 m for males. NCAA track & field coaches can use the results from this study as a guideline for recruiting decisions. Additionally, future studies can apply the methodological foundations of this research to predicting race performances set at different metrics, such as national meets in other countries or Olympic qualifications, from previous race data.
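
    The variable-selection step that gives conditional inference trees their name can be sketched roughly as follows: test each predictor for association with the response by permutation, then split on the predictor with the smallest p-value only if it survives a multiplicity adjustment. This is a simplified sketch under stated assumptions, not the study's code or the implementation in any R package; the function names, the test statistic, the Bonferroni adjustment and the data are all choices made for this illustration.

```python
import random


def perm_pvalue(x, y, n_perm, rng):
    """Monte Carlo permutation p-value for association between x and y."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n

    def stat(ys):
        # Absolute centered cross-product (a simple linear statistic).
        return abs(sum((xi - mx) * (yi - my) for xi, yi in zip(x, ys)))

    observed = stat(y)
    ys = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(ys)
        if stat(ys) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def select_split_variable(X, y, alpha=0.05, n_perm=2000, seed=0):
    """Return (variable to split on, or None to stop; dict of raw p-values)."""
    rng = random.Random(seed)
    pvals = {name: perm_pvalue(col, y, n_perm, rng) for name, col in X.items()}
    best = min(pvals, key=pvals.get)
    adjusted = min(1.0, pvals[best] * len(pvals))  # Bonferroni adjustment
    return (best if adjusted <= alpha else None), pvals


# Hypothetical data: 1600 m times in seconds and an unrelated column.
X = {
    "time_1600": [250, 255, 258, 262, 265, 270, 275, 280, 285, 290, 295, 300],
    "noise": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8],
}
qualified = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
chosen, pvals = select_split_variable(X, qualified)
```

    A full tree would then search for the best cut point on the chosen variable and recurse on each side, stopping when no predictor passes the test.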

  18. Application of stochastic models in identification and apportionment of heavy metal pollution sources in the surface soils of a large-scale region.

    PubMed

    Hu, Yuanan; Cheng, Hefa

    2013-04-16

    As heavy metals occur naturally in soils at measurable concentrations and their natural background contents have significant spatial variations, identification and apportionment of heavy metal pollution sources across large-scale regions is a challenging task. Stochastic models, including the recently developed conditional inference tree (CIT) and the finite mixture distribution model (FMDM), were applied to identify the sources of heavy metals found in the surface soils of the Pearl River Delta, China, and to apportion the contributions from natural background and human activities. Regression trees were successfully developed for the concentrations of Cd, Cu, Zn, Pb, Cr, Ni, As, and Hg in 227 soil samples from a region of over 7.2 × 10^4 km^2 based on seven specific predictors relevant to the source and behavior of heavy metals: land use, soil type, soil organic carbon content, population density, gross domestic product per capita, and the lengths and classes of the roads surrounding the sampling sites. The CIT and FMDM results consistently indicate that Cd, Zn, Cu, Pb, and Cr in the surface soils of the PRD were contributed largely by anthropogenic sources, whereas As, Ni, and Hg in the surface soils mostly originated from the soil parent materials.

  19. Oil composition and genetic biodiversity of ancient and new olive (Olea europea L.) varieties and accessions of southern Italy.

    PubMed

    Cicatelli, Angela; Fortunati, Tancredi; De Feis, Italia; Castiglione, Stefano

    2013-09-01

    The present study is focused on determining the olive oil fatty acid composition of ancient and recent varieties of the Campania region (Italy), but also on molecularly characterizing the most common cultivated varieties in the same region, together with olive trees of the garden of the University Campus of Salerno and of three olive groves of south Italy. Fatty acid methyl esters in the extra virgin oil derived from olive fruits were determined, during three consecutive harvests, by gas chromatography. The statistical analysis on fatty acid composition was performed with the ffmanova package. The genetic biodiversity of the olive collection was estimated by using eight highly polymorphic microsatellite loci and calculating the most commonly used indexes. The "Dice index" was employed to estimate the similarity level of the analysed olive samples, while the Structure software was used to infer their genetic structure. The fatty acid content of extra virgin olive oils, produced from the two olive groves in Campania, suggests that the composition is mainly determined by genotype and not by cultural practices or climatic conditions. Furthermore, the analysis conducted on the molecular data revealed the presence of 100 distinct genotypes and seven homonymies out of the 136 analysed trees. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  20. Conditional random fields for pattern recognition applied to structured data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Burr, Tom; Skurikhin, Alexei

    In order to predict labels from an output domain, Y, pattern recognition is used to gather measurements from an input domain, X. Image analysis is one setting where one might want to infer whether a pixel patch contains an object that is “manmade” (such as a building) or “natural” (such as a tree). Suppose the label for a pixel patch is “manmade”; if the label for a nearby pixel patch is then more likely to be “manmade”, there is structure in the output domain that can be exploited to improve pattern recognition performance. Modeling P(X) is difficult because features between parts of the model are often correlated. Thus, conditional random fields (CRFs) model structured data using the conditional distribution P(Y|X = x), without specifying a model for P(X), and are well suited for applications with dependent features. Our paper has two parts. First, we overview CRFs and their application to pattern recognition in structured problems. Our primary examples are image analysis applications in which there is dependence among samples (pixel patches) in the output domain. Second, we identify research topics and present numerical examples.

  1. Conditional random fields for pattern recognition applied to structured data

    DOE PAGES

    Burr, Tom; Skurikhin, Alexei

    2015-07-14

    In order to predict labels from an output domain, Y, pattern recognition is used to gather measurements from an input domain, X. Image analysis is one setting where one might want to infer whether a pixel patch contains an object that is “manmade” (such as a building) or “natural” (such as a tree). Suppose the label for a pixel patch is “manmade”; if the label for a nearby pixel patch is then more likely to be “manmade”, there is structure in the output domain that can be exploited to improve pattern recognition performance. Modeling P(X) is difficult because features between parts of the model are often correlated. Thus, conditional random fields (CRFs) model structured data using the conditional distribution P(Y|X = x), without specifying a model for P(X), and are well suited for applications with dependent features. Our paper has two parts. First, we overview CRFs and their application to pattern recognition in structured problems. Our primary examples are image analysis applications in which there is dependence among samples (pixel patches) in the output domain. Second, we identify research topics and present numerical examples.

  2. Conflicting Evolutionary Histories of the Mitochondrial and Nuclear Genomes in New World Myotis Bats.

    PubMed

    Platt, Roy N; Faircloth, Brant C; Sullivan, Kevin A M; Kieran, Troy J; Glenn, Travis C; Vandewege, Michael W; Lee, Thomas E; Baker, Robert J; Stevens, Richard D; Ray, David A

    2018-03-01

    The rapid diversification of Myotis bats into more than 100 species is one of the most extensive mammalian radiations available for study. Efforts to understand relationships within Myotis have primarily utilized mitochondrial markers and trees inferred from nuclear markers lacked resolution. Our current understanding of relationships within Myotis is therefore biased towards a set of phylogenetic markers that may not reflect the history of the nuclear genome. To resolve this, we sequenced the full mitochondrial genomes of 37 representative Myotis, primarily from the New World, in conjunction with targeted sequencing of 3648 ultraconserved elements (UCEs). We inferred the phylogeny and explored the effects of concatenation and summary phylogenetic methods, as well as combinations of markers based on informativeness or levels of missing data, on our results. Of the 294 phylogenies generated from the nuclear UCE data, all are significantly different from phylogenies inferred using mitochondrial genomes. Even within the nuclear data, quartet frequencies indicate that around half of all UCE loci conflict with the estimated species tree. Several factors can drive such conflict, including incomplete lineage sorting, introgressive hybridization, or even phylogenetic error. Despite the degree of discordance between nuclear UCE loci and the mitochondrial genome and among UCE loci themselves, the most common nuclear topology is recovered in one quarter of all analyses with strong nodal support. Based on these results, we re-examine the evolutionary history of Myotis to better understand the phenomena driving their unique nuclear, mitochondrial, and biogeographic histories.

  3. Bacterial whole genome-based phylogeny: construction of a new benchmarking dataset and assessment of some existing methods.

    PubMed

    Ahrenfeldt, Johanne; Skaarup, Carina; Hasman, Henrik; Pedersen, Anders Gorm; Aarestrup, Frank Møller; Lund, Ole

    2017-01-05

    Whole genome sequencing (WGS) is increasingly used in diagnostics and surveillance of infectious diseases. A major application for WGS is to use the data for identifying outbreak clusters, and there is therefore a need for methods that can accurately and efficiently infer phylogenies from sequencing reads. In the present study we describe a new dataset that we have created for the purpose of benchmarking such WGS-based methods for epidemiological data, and also present an analysis where we use the data to compare the performance of some current methods. Our aim was to create a benchmark data set that mimics sequencing data of the sort that might be collected during an outbreak of an infectious disease. This was achieved by letting an E. coli hypermutator strain grow in the lab for 8 consecutive days, each day splitting the culture in two while also collecting samples for sequencing. The result is a data set consisting of 101 whole genome sequences with known phylogenetic relationship. Among the sequenced samples 51 correspond to internal nodes in the phylogeny because they are ancestral, while the remaining 50 correspond to leaves. We also used the newly created data set to compare three different online available methods that infer phylogenies from whole-genome sequencing reads: NDtree, CSI Phylogeny and REALPHY. One complication when comparing the output of these methods with the known phylogeny is that phylogenetic methods typically build trees where all observed sequences are placed as leaves, even though some of them are in fact ancestral. We therefore devised a method for post processing the inferred trees by collapsing short branches (thus relocating some leaves to internal nodes), and also present two new measures of tree similarity that take into account the identity of both internal and leaf nodes. Based on this analysis we find that, among the investigated methods, CSI Phylogeny had the best performance, correctly identifying 73% of all branches in the tree and 71% of all clades. We have made all data from this experiment (raw sequencing reads, consensus whole-genome sequences, as well as descriptions of the known phylogeny in a variety of formats) publicly available, with the hope that other groups may find this data useful for benchmarking and exploring the performance of epidemiological methods. All data is freely available at: https://cge.cbs.dtu.dk/services/evolution_data.php .
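
    The post-processing idea mentioned in this abstract, collapsing short branches so that some sampled sequences move from leaves to internal nodes, can be sketched as below. This is only an illustration of the general idea, not the authors' procedure: the Node class, the threshold and the example tree are hypothetical, and the short branch length is simply dropped instead of being added to the descendants' branches.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: list = field(default_factory=list)    # sample names placed at this node
    length: float = 0.0                           # branch length to the parent
    children: list = field(default_factory=list)


def collapse_short_branches(node, min_length):
    """Merge every child whose branch is shorter than min_length into its parent."""
    kept = []
    for child in node.children:
        collapse_short_branches(child, min_length)  # work bottom-up
        if child.length < min_length:
            node.labels.extend(child.labels)        # samples move to the parent
            kept.extend(child.children)             # grandchildren are re-attached
        else:
            kept.append(child)
    node.children = kept
    return node


# Hypothetical inferred tree: S1 and S2 sit on zero-length branches.
s3 = Node(labels=["S3"], length=4.0)
inner = Node(length=3.0, children=[Node(labels=["S2"], length=0.0), s3])
root = Node(children=[Node(labels=["S1"], length=0.0), inner])
collapse_short_branches(root, min_length=0.5)
```

    After the call, S1 labels the root and S2 labels the internal node, while S3 remains a leaf.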

  4. How many bootstrap replicates are necessary?

    PubMed

    Pattengale, Nicholas D; Alipour, Masoud; Bininda-Emonds, Olaf R P; Moret, Bernard M E; Stamatakis, Alexandros

    2010-03-01

    Phylogenetic bootstrapping (BS) is a standard technique for inferring confidence values on phylogenetic trees that is based on reconstructing many trees from minor variations of the input data, trees called replicates. BS is used with all phylogenetic reconstruction approaches, but we focus here on one of the most popular, maximum likelihood (ML). Because ML inference is so computationally demanding, it has proved too expensive to date to assess the impact of the number of replicates used in BS on the relative accuracy of the support values. For the same reason, a rather small number (typically 100) of BS replicates are computed in real-world studies. Stamatakis et al. recently introduced a BS algorithm that is 1 to 2 orders of magnitude faster than previous techniques, while yielding qualitatively comparable support values, making an experimental study possible. In this article, we propose stopping criteria--that is, thresholds computed at runtime to determine when enough replicates have been generated--and we report on the first large-scale experimental study to assess the effect of the number of replicates on the quality of support values, including the performance of our proposed criteria. We run our tests on 17 diverse real-world DNA--single-gene as well as multi-gene--datasets, which include 125-2,554 taxa. We find that our stopping criteria typically stop computations after 100-500 replicates (although the most conservative criterion may continue for several thousand replicates) while producing support values that correlate at better than 99.5% with the reference values on the best ML trees. Significantly, we also find that the stopping criteria can recommend very different numbers of replicates for different datasets of comparable sizes. Our results are thus twofold: (i) they give the first experimental assessment of the effect of the number of BS replicates on the quality of support values returned through BS, and (ii) they validate our proposals for stopping criteria. Practitioners will no longer have to enter a guess nor worry about the quality of support values; moreover, with most counts of replicates in the 100-500 range, robust BS under ML inference becomes computationally practical for most datasets. The complete test suite is available at http://lcbb.epfl.ch/BS.tar.bz2, and BS with our stopping criteria is included in the latest release of RAxML v7.2.5, available at http://wwwkramer.in.tum.de/exelixis/software.html.

  5. Impact Of Selfing On The Inference Of Demographic History From Whole Genomes In Theobroma cacao L.

    USDA-ARS's Scientific Manuscript database

    Theobroma cacao L (cacao: Malvaceae) is a small tree found naturally in the Amazonian rain forest. An interesting feature of cacao is that it persists in populations of naturally outcrossing and inbreeding plants, as it is a species with a complex system of self-incompatibility, where a fraction of...

  6. Inferring ancestral distribution area and survival vegetation of Caragana (Fabaceae) in Tertiary

    Treesearch

    Mingli Zhang; Juanjuan Xue; Qiang Zhang; Stewart C. Sanderson

    2015-01-01

    Caragana, a leguminous genus mainly restricted to temperate Central and East Asia, occurs in arid, semiarid, and humid belts, and has forest, grassland, and desert ecotypes. Based on the previous molecular phylogenetic tree and dating, biogeographical analyses of extant species area and ecotype were conducted by means of four ancestral optimization approaches: S-DIVA,...

  7. Canopy structure on forest lands in western Oregon: differences among forest types and stand ages

    Treesearch

    Anne C.S. McIntosh; Andrew N. Gray; Steven L. Garman

    2009-01-01

    Canopy structure is an important attribute affecting economic and ecological values of forests in the Pacific Northwest. However, canopy cover and vertical layering are rarely measured directly; they are usually inferred from other forest measurements. In this study, we quantified and compared vertical and horizontal patterns of tree canopy structure and understory...

  8. Mixed-severity fire regimes in dry forests of southern interior British Columbia, Canada

    Treesearch

    Emily K. Heyerdahl; Ken Lertzman; Carmen M. Wong

    2012-01-01

    Historical fire severity is poorly characterized for dry forests in the interior west of North America. We inferred a multicentury history of fire severity from tree rings in Douglas-fir (Pseudotsuga menziesii var. glauca (Beissn.) Franco) - ponderosa pine (Pinus ponderosa Douglas ex P. Lawson & C. Lawson) forests in the southern interior of British Columbia,...

  9. Ratio equations for loblolly pine trees

    Treesearch

    Dehai Zhao; Michael Kane; Daniel Markewitz; Robert Teskey

    2015-01-01

    The conversion factors (CFs) or expansion factors (EFs) are often used to convert volume to green or dry weight, or from one component biomass to estimate total biomass or other component biomass. These factors might be inferred from the previously developed biomass and volume equations with or without destructive sampling data. However, how the factors are related to...

  10. Woodland-to-forest transition during prolonged drought in Minnesota after ca. AD 1300.

    PubMed

    Shuman, Bryan; Henderson, Anna K; Plank, Colin; Stefanova, Ivanka; Ziegler, Susy S

    2009-10-01

    Interactions among multiple causes of ecological perturbation, such as climate change and disturbance, can produce "ecological surprises." Here, we examine whether climate-fire-vegetation interactions can produce ecological changes that differ in direction from those expected from the effects of climate change alone. To do so, we focus on the "Big Woods" of central Minnesota, USA, which was shaped both by climate and fire. The deciduous Big Woods forest replaced regional woodlands and savannas after the severity of regional fire regimes declined at ca. AD 1300. A trend toward wet conditions has long been assumed to explain the forest expansion, but we show that water levels at two lakes within the region (Wolsfeld Lake and Bufflehead Pond) were low when open woodlands were transformed into the Big Woods. Water levels were high instead at ca. 2240-795 BC when regional fire regimes were most severe. Based on the correlation between water levels and fire-regime severity, we infer that prolonged or repeated droughts after ca. AD 1265 reduced the biomass and connectivity of fine fuels (grasses) within the woodlands. As a result, regional fire severity declined and allowed tree populations to expand. Tree-ring data from the region show a peak in the recruitment of key Big Woods tree species during the AD 1930s drought and suggest that low regional moisture balance need not have been a limiting factor for forest expansion. The regional history, thus, demonstrates the types of counterintuitive ecosystem changes that may arise as climate changes in the future.

  11. Application of Unmanned Aircraft Systems (UAS) for phenotypic mapping of white spruce genotypes along environmental gradients

    NASA Astrophysics Data System (ADS)

    D'Odorico, P.; Wong, C. Y.; Besik, A.; Earon, E.; Isabel, N.; Ensminger, I.

    2017-12-01

    Rapid climate change is expected to cause a mismatch between locally adapted tree populations and the optimal climatic conditions to which they have adapted. Plant breeding and reforestation programs will increasingly need to rely on high-throughput precision phenotyping tools for the selection of genotypes with increased drought and stress tolerance. In this work, we present the possibilities offered by Unmanned Aircraft Systems (UAS) carrying optical sensors to monitor and assess differences in performance among white spruce genotypes. While high-throughput precision phenotyping using UAS has gained traction in agronomic crop research during the last few years, to our knowledge it is still in its infancy in forestry applications. UAS surveys were performed at different times during the growing season over large white spruce common garden experiments established by the Canadian Forest Service at four different sites, each characterized by 2000 clonally replicated genotypes. Sites are distributed over a latitudinal gradient, in Ontario and Quebec, Canada. The UAS payload consisted of a custom-band multispectral sensor acquiring radiation at wavelengths at which the reflectance spectrum of vegetation is known to capture physiological change under disturbance and stress. Ground-based tree-top spectral reflectances and leaf-level functional traits were also acquired for validation purposes in parallel with UAS surveys. We will discuss the potential and the challenges of using optical sensors on UAS to infer genotypic variation in tree response to stress events and show how spectral data can function as the link between large-scale phenotype and genotype data.

  12. Friction Laws Derived From the Acoustic Emissions of a Laboratory Fault by Machine Learning

    NASA Astrophysics Data System (ADS)

    Rouet-Leduc, B.; Hulbert, C.; Ren, C. X.; Bolton, D. C.; Marone, C.; Johnson, P. A.

    2017-12-01

    Fault friction controls nearly all aspects of fault rupture, yet it is only possible to measure in the laboratory. Here we describe laboratory experiments where acoustic emissions are recorded from the fault. We find that by applying a machine learning approach known as "extreme gradient boosting trees" to the continuous acoustical signal, the fault friction can be directly inferred, showing that instantaneous characteristics of the acoustic signal are a fingerprint of the frictional state. This machine learning-based inference leads to a simple law that links the acoustic signal to the friction state, and holds for every stress cycle the laboratory fault goes through. The approach does not use any other measured parameter than instantaneous statistics of the acoustic signal. This finding may have importance for inferring frictional characteristics from seismic waves in Earth where fault friction cannot be measured.

  13. Tree crown conditions in Missouri, 2000-2003

    Treesearch

    KaDonna C. Randolph; W. Keith Moser

    2009-01-01

    The Forest Service, U.S. Department of Agriculture, Forest Inventory and Analysis (FIA) Program uses visual assessments of tree crown condition to monitor changes and trends in forest health. This report describes three FIA tree crown condition indicators (crown dieback, crown density, and foliage transparency) and sapling crown vigor measured in Missouri between 2000...

  14. Endosymbiotic gene transfer from prokaryotic pangenomes: Inherited chimerism in eukaryotes.

    PubMed

    Ku, Chuan; Nelson-Sathi, Shijulal; Roettger, Mayo; Garg, Sriram; Hazkani-Covo, Einat; Martin, William F

    2015-08-18

    Endosymbiotic theory in eukaryotic-cell evolution rests upon a foundation of three cornerstone partners--the plastid (a cyanobacterium), the mitochondrion (a proteobacterium), and its host (an archaeon)--and carries a corollary that, over time, the majority of genes once present in the organelle genomes were relinquished to the chromosomes of the host (endosymbiotic gene transfer). However, notwithstanding eukaryote-specific gene inventions, single-gene phylogenies have never traced eukaryotic genes to three single prokaryotic sources, an issue that hinges crucially upon factors influencing phylogenetic inference. In the age of genomes, single-gene trees, once used to test the predictions of endosymbiotic theory, now spawn new theories that stand to eventually replace endosymbiotic theory with descriptive, gene tree-based variants featuring supernumerary symbionts: prokaryotic partners distinct from the cornerstone trio and whose existence is inferred solely from single-gene trees. We reason that the endosymbiotic ancestors of mitochondria and chloroplasts brought into the eukaryotic--and plant and algal--lineage a genome-sized sample of genes from the proteobacterial and cyanobacterial pangenomes of their respective day and that, even if molecular phylogeny were artifact-free, sampling prokaryotic pangenomes through endosymbiotic gene transfer would lead to inherited chimerism. Recombination in prokaryotes (transduction, conjugation, transformation) differs from recombination in eukaryotes (sex). Prokaryotic recombination leads to pangenomes, and eukaryotic recombination leads to vertical inheritance. Viewed from the perspective of endosymbiotic theory, the critical transition at the eukaryote origin that allowed escape from Muller's ratchet--the origin of eukaryotic recombination, or sex--might have required surprisingly little evolutionary innovation.

  15. Water retained in tall Cryptomeria japonica leaves as studied by infrared micro-spectroscopy.

    PubMed

    Azuma, Wakana; Nakashima, Satoru; Yamakita, Eri; Ishii, H Roaki; Kuroda, Keiko

    2017-10-01

    Recent studies in the tallest tree species suggest that physiological and anatomical traits of tree-top leaves are adapted to water-limited conditions. In order to examine the water retention mechanism of leaves in a tall tree, infrared (IR) micro-spectroscopy was conducted on mature leaf cross-sections of tall Cryptomeria japonica D. Don from four different heights (51, 43, 31 and 19 m). We measured IR transmission spectra and mainly analyzed OH (3700-3000 cm⁻¹) and C-O (1190-845 cm⁻¹) absorption bands, indicating water molecules and sugar groups, respectively. The changes in IR spectra of leaf sections from different heights were compared with bulk-leaf hydraulics. Both average OH band area of the leaf sections and leaf water content were larger in the upper crown, while osmotic potential at saturation did not vary with height, suggesting higher dissolved sugar contents of upper-crown leaves. As the cell wall is the main cellular structure of leaves, we inferred that the larger average C-O band area of upper-crown leaves reflected a higher content of structural polysaccharides such as cellulose, hemicellulose and pectin. Infrared micro-spectroscopic imaging showed that the OH and C-O band areas are large in the vascular bundle, transfusion tissue and epidermis. Infrared spectra of individual tissues showed that much more water is retained in vascular bundle and transfusion tissue than in mesophyll. These results demonstrate that IR micro-spectroscopy is a powerful tool for visualizing detailed, quantitative information on the spatial distribution of chemical substances within plant tissues, which cannot be done using conventional methods like histochemical staining. The OH band could be well reproduced by four Gaussian OH components around 3530 (free water: long H bond), 3410 (pectin-like OH species), 3310 (cellulose-like OH species) and 3210 (bound water: short H bond) cm⁻¹, and all of these OH components were higher in the upper crown while their relative proportions did not vary with height. Based on the spectral analyses, we inferred that polysaccharides play a key role in biomolecular retention of water in leaves of tall C. japonica. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  16. Predictive Factors of Surgical Outcome in Frontal Lobe Epilepsy Explored with Stereoelectroencephalography.

    PubMed

    Bonini, Francesca; McGonigal, Aileen; Scavarda, Didier; Carron, Romain; Régis, Jean; Dufour, Henry; Péragut, Jean-Claude; Laguitton, Virginie; Villeneuve, Nathalie; Chauvel, Patrick; Giusiano, Bernard; Trébuchon, Agnès; Bartolomei, Fabrice

    2017-06-29

    Resective surgery is an established treatment for pharmacoresistant frontal lobe epilepsy (FLE), but seizure outcome and prognostic indicators are poorly characterized and vary between studies. The objective was to study long-term seizure outcome and identify prognostic factors. We retrospectively analyzed 42 FLE patients who had undergone surgical resection, mostly preceded by invasive recordings with stereoelectroencephalography (SEEG). Postsurgical outcome up to 10-yr follow-up and prognostic indicators were analyzed using Kaplan-Meier analysis and multivariate and conditional inference procedures. At the time of last follow-up, 57.1% of patients were seizure-free. The estimated chance of seizure freedom was 67% (95% confidence interval [CI]: 54-83) at 6 mo, 59% (95% CI: 46-76) at 1 yr, 53% (95% CI: 40-71) at 2 yr, and 46% (95% CI: 32-66) at 5 yr. Most relapses (83%) occurred within the first 12 mo. Multivariate analysis showed that completeness of resection of the epileptogenic zone (EZ) as defined by SEEG was the main predictor of seizure outcome. According to conditional inference trees, in patients with complete resection of the EZ, focal cortical dysplasia as etiology and focal EZ were positive prognostic indicators. No difference in outcome was found between patients with positive and negative magnetic resonance imaging. Surgical resection in drug-resistant FLE can be a successful therapeutic approach, even in the absence of neuroradiologically visible lesions. SEEG may be highly useful in both nonlesional and lesional FLE cases, because complete resection of the EZ as defined by SEEG is associated with better prognosis. Copyright © 2017 by the Congress of Neurological Surgeons
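
    The seizure-freedom percentages quoted above come from Kaplan-Meier analysis. As a minimal sketch of how a product-limit estimate of that kind is computed (not the authors' code; the function name kaplan_meier and the follow-up data below are hypothetical, and confidence intervals are omitted), "event" here would mean seizure relapse and "survival" the estimated probability of remaining seizure-free:

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival curve.

    times  -- follow-up time for each patient
    events -- True if the event (e.g. relapse) was observed at that time,
              False if the patient was censored (still event-free at last contact)
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)      # patients still under observation and event-free
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0         # events occurring exactly at time t
        removed = 0          # events plus censorings at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                observed += 1
            removed += 1
            i += 1
        if observed > 0:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up times (months) and relapse flags for five patients.
curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
for t, s in curve:
    print(t, round(s, 3))
```

    Censored patients leave the risk set without lowering the curve, which is why the estimate can differ from the raw proportion of patients who were seizure-free at last follow-up.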

  17. Classification tree for the assessment of sedentary lifestyle among hypertensive.

    PubMed

    Castelo Guedes Martins, Larissa; Venícios de Oliveira Lopes, Marcos; Gomes Guedes, Nirla; Paixão de Menezes, Angélica; de Oliveira Farias, Odaleia; Alves Dos Santos, Naftale

    2016-04-01

    The objective was to develop a classification tree of clinical indicators for the correct prediction of the nursing diagnosis "Sedentary lifestyle" (SL) in people with high blood pressure (HTN). A cross-sectional study was conducted in an outpatient care center specializing in high blood pressure and diabetes mellitus located in northeastern Brazil. The sample consisted of 285 people between 19 and 59 years old diagnosed with high blood pressure, who underwent an interview and physical examination to obtain socio-demographic information, related factors, and the signs and symptoms that made up the defining characteristics for the diagnosis under study. The tree was generated using the CHAID (Chi-square Automatic Interaction Detection) algorithm. The construction of the decision tree made it possible to establish the interactions between clinical indicators, facilitating a probabilistic analysis of multiple situations and allowing quantification of the probability that an individual presents a sedentary lifestyle. The tree included the clinical indicator "Choose daily routine without exercise" as the first node. People with this indicator showed a probability of 0.88 of presenting SL. The second node was composed of the indicator "Does not perform physical activity during leisure", with a 0.99 probability of presenting SL when both indicators were present. The predictive capacity of the tree was established at 69.5%. Decision trees help nurses who care for people with HTN in decision-making by assessing the characteristics that increase the probability of the SL nursing diagnosis, optimizing the time for diagnostic inference.
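
    CHAID chooses its splits on the basis of chi-square tests of association between each candidate predictor and the outcome. The sketch below computes only the Pearson chi-square statistic for a contingency table, which is the core quantity behind such a test; it is an illustration under assumptions, not the study's procedure. The function name chi_square_statistic and the counts are hypothetical, and the category merging and p-value adjustment that CHAID also performs are left out.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table.

    table -- list of rows, each a list of observed counts, e.g.
             rows = indicator present / absent,
             columns = diagnosis present / absent.
    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: indicator (rows) versus diagnosis (columns).
print(chi_square_statistic([[20, 0], [0, 20]]))   # strong association
print(chi_square_statistic([[10, 10], [10, 10]])) # no association
```

    A larger statistic for a given table size means a stronger association, so a predictor with a large value is a more attractive candidate for the next split.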

  18. Estimating the Effective Sample Size of Tree Topologies from Bayesian Phylogenetic Analyses

    PubMed Central

    Lanfear, Robert; Hua, Xia; Warren, Dan L.

    2016-01-01

    Bayesian phylogenetic analyses estimate posterior distributions of phylogenetic tree topologies and other parameters using Markov chain Monte Carlo (MCMC) methods. Before making inferences from these distributions, it is important to assess their adequacy. To this end, the effective sample size (ESS) estimates how many truly independent samples of a given parameter the output of the MCMC represents. The ESS of a parameter is frequently much lower than the number of samples taken from the MCMC because sequential samples from the chain can be non-independent due to autocorrelation. Typically, phylogeneticists use a rule of thumb that the ESS of all parameters should be greater than 200. However, we have no method to calculate an ESS of tree topology samples, despite the fact that the tree topology is often the parameter of primary interest and is almost always central to the estimation of other parameters. That is, we lack a method to determine whether we have adequately sampled one of the most important parameters in our analyses. In this study, we address this problem by developing methods to estimate the ESS for tree topologies. We combine these methods with two new diagnostic plots for assessing posterior samples of tree topologies, and compare their performance on simulated and empirical data sets. Combined, the methods we present provide new ways to assess the mixing and convergence of phylogenetic tree topologies in Bayesian MCMC analyses. PMID:27435794
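
    For an ordinary numeric parameter, the ESS described above is commonly estimated by dividing the number of MCMC samples by an autocorrelation time. The sketch below shows that scalar calculation only; it is not the authors' method for tree topologies. The function names are hypothetical, and truncating the sum at the first non-positive autocorrelation is one simple convention assumed here for illustration.

```python
def autocorrelation(samples, lag):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    if var == 0.0:
        return 0.0  # a constant chain carries no autocorrelation signal
    cov = sum(
        (samples[i] - mean) * (samples[i + lag] - mean) for i in range(n - lag)
    ) / n
    return cov / var


def effective_sample_size(samples):
    """ESS = n / (1 + 2 * sum of positive-lag autocorrelations).

    The sum stops at the first lag whose autocorrelation is not positive
    (an assumed truncation rule; other rules are in use).
    """
    n = len(samples)
    tau = 1.0
    for lag in range(1, n):
        rho = autocorrelation(samples, lag)
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return n / tau


# A chain that sits in one state for a long time and then switches is highly
# autocorrelated, so its ESS is far below its length of 100.
sticky_chain = [0.0] * 50 + [1.0] * 50
print(effective_sample_size(sticky_chain))
```

    The same idea explains the rule of thumb in the abstract: a chain of many thousands of samples can still fall short of an ESS of 200 if successive samples are strongly correlated.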

  19. Regional tree growth and inferred summer climate in the Winnipeg River basin, Canada, since AD 1783

    NASA Astrophysics Data System (ADS)

    St. George, Scott; Meko, David M.; Evans, Michael N.

    2008-09-01

    A network of 54 ring-width chronologies is used to estimate changes in summer climate within the Winnipeg River basin, Canada, since AD 1783. The basin drains parts of northwestern Ontario, northern Minnesota and southeastern Manitoba, and is a key area for hydroelectric power production. Most chronologies were developed from Pinus resinosa and P. strobus, with a limited number of Thuja occidentalis, Picea glauca and Pinus banksiana. The dominant pattern of regional tree growth can be recovered using only the nine longest chronologies, and is not affected by the method used to remove variability related to age or stand dynamics from individual trees. Tree growth is significantly, but weakly, correlated with both temperature (negatively) and precipitation (positively) during summer. Simulated ring-width chronologies produced by a process model of tree-ring growth exhibit similar relationships with summer climate. High and low growth across the region is associated with cool/wet and warm/dry summers, respectively; this relationship is supported by comparisons with archival records from early 19th century fur-trading posts. The tree-ring record indicates that summer droughts were more persistent in the 19th and late 18th century, but there is no evidence that drought was more extreme prior to the onset of direct monitoring.

  20. More on the Best Evolutionary Rate for Phylogenetic Analysis

    PubMed Central

    Massingham, Tim; Goldman, Nick

    2017-01-01

    The accumulation of genome-scale molecular data sets for nonmodel taxa brings us ever closer to resolving the tree of life of all living organisms. However, despite the depth of data available, a number of studies that each used thousands of genes have reported conflicting results. The focus of phylogenomic projects must thus shift to more careful experimental design. Even though we still have a limited understanding of what are the best predictors of the phylogenetic informativeness of a gene, there is wide agreement that one key factor is its evolutionary rate; but there is no consensus as to whether the rates derived as optimal in various analytical, empirical, and simulation approaches have any general applicability. We here use simulations to infer optimal rates in a set of realistic phylogenetic scenarios with varying tree sizes, numbers of terminals, and tree shapes. Furthermore, we study the relationship between the optimal rate and rate variation among sites and among lineages. Finally, we examine how well the predictions made by a range of experimental design methods correlate with the observed performance in our simulations. We find that the optimal level of divergence is surprisingly robust to differences in taxon sampling and even to among-site and among-lineage rate variation as often encountered in empirical data sets. This finding encourages the use of methods that rely on a single optimal rate to predict a gene’s utility. Focusing on correct recovery either of the most basal node in the phylogeny or of the entire topology, the optimal rate is about 0.45 substitutions from root to tip in average Yule trees and about 0.2 in difficult trees with short basal and long-apical branches, but all rates leading to divergence levels between about 0.1 and 0.5 perform reasonably well. Testing the performance of six methods that can be used to predict a gene’s utility against our simulation results, we find that the probability of resolution, signal-noise analysis, and Fisher information are good predictors of phylogenetic informativeness, but they require specification of at least part of a model tree. Likelihood quartet mapping also shows very good performance but only requires sequence alignments and is thus applicable without making assumptions about the phylogeny. Despite them being the most commonly used methods for experimental design, geometric quartet mapping and the integration of phylogenetic informativeness curves perform rather poorly in our comparison. Instead of derived predictors of phylogenetic informativeness, we suggest that the number of sites in a gene that evolve at near-optimal rates (as inferred here) could be used directly to prioritize genes for phylogenetic inference. In combination with measures of model fit, especially with respect to compositional biases and among-site and among-lineage rate variation, such an approach has the potential to greatly improve marker choice and should be tested on empirical data. PMID:28595363
