Liu, Kevin; Warnow, Tandy J; Holder, Mark T; Nelesen, Serita M; Yu, Jiaye; Stamatakis, Alexandros P; Linder, C Randal
2012-01-01
Highly accurate estimation of phylogenetic trees for large data sets is difficult, in part because multiple sequence alignments must be accurate for phylogeny estimation methods to be accurate. Coestimation of alignments and trees has been attempted but currently only SATé estimates reasonably accurate trees and alignments for large data sets in practical time frames (Liu K., Raghavan S., Nelesen S., Linder C.R., Warnow T. 2009b. Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science. 324:1561-1564). Here, we present a modification to the original SATé algorithm that improves upon SATé (which we now call SATé-I) in terms of speed and of phylogenetic and alignment accuracy. SATé-II uses a different divide-and-conquer strategy than SATé-I and so produces smaller more closely related subsets than SATé-I; as a result, SATé-II produces more accurate alignments and trees, can analyze larger data sets, and runs more efficiently than SATé-I. Generally, SATé is a metamethod that takes an existing multiple sequence alignment method as an input parameter and boosts the quality of that alignment method. SATé-II-boosted alignment methods are significantly more accurate than their unboosted versions, and trees based upon these improved alignments are more accurate than trees based upon the original alignments. Because SATé-I used maximum likelihood (ML) methods that treat gaps as missing data to estimate trees and because we found a correlation between the quality of tree/alignment pairs and ML scores, we explored the degree to which SATé's performance depends on using ML with gaps treated as missing data to determine the best tree/alignment pair. We present two lines of evidence that using ML with gaps treated as missing data to optimize the alignment and tree produces very poor results. 
First, we show that the optimization problem where a set of unaligned DNA sequences is given and the output is the tree and alignment of those sequences that maximize likelihood under the Jukes-Cantor model is uninformative in the worst possible sense. For all inputs, all trees optimize the likelihood score. Second, we show that a greedy heuristic that uses GTR+Gamma ML to optimize the alignment and the tree can produce very poor alignments and trees. Therefore, the excellent performance of SATé-II and SATé-I is not because ML is used as an optimization criterion for choosing the best tree/alignment pair but rather due to the particular divide-and-conquer realignment techniques employed.
Zwickl, Derrick J; Stein, Joshua C; Wing, Rod A; Ware, Doreen; Sanderson, Michael J
2014-09-01
We describe new methods for characterizing gene tree discordance in phylogenomic data sets, which screen for deviations from neutral expectations, summarize variation in statistical support among gene trees, and allow comparison of the patterns of discordance induced by various analysis choices. Using an exceptionally complete set of genome sequences for the short arm of chromosome 3 in Oryza (rice) species, we applied these methods to identify the causes and consequences of differing patterns of discordance in the sets of gene trees inferred using a panel of 20 distinct analysis pipelines. We found that discordance patterns were strongly affected by aspects of data selection, alignment, and alignment masking. Unusual patterns of discordance evident when using certain pipelines were reduced or eliminated by using alternative pipelines, suggesting that they were the product of methodological biases rather than evolutionary processes. In some cases, once such biases were eliminated, evolutionary processes such as introgression could be implicated. Additionally, patterns of gene tree discordance had significant downstream impacts on species tree inference. For example, inference from supermatrices was positively misleading when pipelines that led to biased gene trees were used. Several results may generalize to other data sets: we found that gene tree and species tree inference gave more reasonable results when intron sequence was included during sequence alignment and tree inference, the alignment software PRANK was used, and detectable "block-shift" alignment artifacts were removed. We discuss our findings in the context of well-established relationships in Oryza and continuing controversies regarding the domestication history of O. sativa. © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Di Pietro, C; Di Pietro, V; Emmanuele, G; Ferro, A; Maugeri, T; Modica, E; Pigola, G; Pulvirenti, A; Purrello, M; Ragusa, M; Scalia, M; Shasha, D; Travali, S; Zimmitti, V
2003-01-01
In this paper we present a new Multiple Sequence Alignment (MSA) algorithm called AntiClusAl. The method makes use of the commonly used idea of aligning homologous sequences belonging to classes generated by some clustering algorithm, and then continuing the alignment process in a bottom-up way along a suitable tree structure. The final result is then read at the root of the tree. Multiple sequence alignment in each cluster makes use of progressive alignment with the 1-median (center) of the cluster. The 1-median of a set S of sequences is the element of S which minimizes the average distance from any other sequence in S. Its exact computation requires quadratic time. The basic idea of our proposed algorithm is to make use of a simple and natural algorithmic technique based on randomized tournaments, which has been successfully applied to large search problems in general metric spaces. In particular, a clustering algorithm called Antipole tree and an approximate linear-time 1-median computation are used. Compared with Clustal W, a widely used MSA tool, our algorithm shows better running times with fully comparable alignment quality. A successful biological application showing high amino acid conservation during the evolution of Xenopus laevis SOD2 is also cited.
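The 1-median definition and the randomized-tournament idea described above can be illustrated with a minimal Python sketch. This is not the AntiClusAl or Antipole implementation: the function names, the Hamming distance, the group size of 3, and the toy sequences are all assumptions chosen for illustration.

```python
import random


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def exact_one_median(items, dist):
    """Element with the smallest total (hence average) distance to the rest.

    Uses a quadratic number of distance evaluations.
    """
    return min(items, key=lambda s: sum(dist(s, t) for t in items))


def tournament_one_median(items, dist, group_size=3, seed=0):
    """Approximate 1-median via rounds of small randomized tournaments.

    Hypothetical simplification: each round shuffles the pool, splits it
    into groups of `group_size`, and keeps only each group's exact 1-median.
    """
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    rng = random.Random(seed)
    pool = list(items)
    while len(pool) > group_size:
        rng.shuffle(pool)
        pool = [
            exact_one_median(pool[i:i + group_size], dist)
            for i in range(0, len(pool), group_size)
        ]
    return exact_one_median(pool, dist)


seqs = ["ACGT", "ACGA", "ACGT", "TCGT", "GGGG"]
center = exact_one_median(seqs, hamming)
approx = tournament_one_median(seqs, hamming)
print(center, approx)
```

For a fixed group size, the pool shrinks geometrically from round to round, so the tournament needs roughly a linear number of distance evaluations, at the cost of returning only an approximation of the true center.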
AGILE: Autonomous Global Integrated Language Exploitation
2009-12-01
combination, including METEOR-based alignment (with stemming and WordNet synonym matching) and GIZA ++ based alignment. So far, we have not seen any...parse trees and a detailed analysis of how function words operate in translation. This program lets us fix alignment errors that systems like GIZA ...correlates better with Pyramid than with Responsiveness scoring (i.e., it is a more precise, careful, measure) • BE generally outperforms ROUGE
Sequence comparison alignment-free approach based on suffix tree and L-words frequency.
Soares, Inês; Goios, Ana; Amorim, António
2012-01-01
The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts by computing a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words with a preset length L (L-words) in each sequence is rapidly calculated. Based on the L-word frequency profile of each sequence, a pairwise standard Euclidean distance is then computed, producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures with the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python and is freely available on the web.
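The L-word profile and Euclidean distance matrix described above can be sketched in a few lines of Python. This sketch counts words with a plain sliding window rather than the generalized suffix tree used by the authors, and it normalizes counts to relative frequencies; both choices, along with the function names and toy sequences, are assumptions for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt


def lword_profile(seq, L, alphabet="ACGT"):
    """Relative frequency of every possible word of length L in seq."""
    counts = Counter(seq[i:i + L] for i in range(len(seq) - L + 1))
    total = max(sum(counts.values()), 1)  # avoid division by zero
    words = ("".join(w) for w in product(alphabet, repeat=L))
    return [counts[w] / total for w in words]


def euclidean(p, q):
    """Standard Euclidean distance between two equal-length profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance_matrix(seqs, L):
    """Symmetric matrix of pairwise distances between L-word profiles."""
    profiles = [lword_profile(s, L) for s in seqs]
    return [[euclidean(p, q) for q in profiles] for p in profiles]


seqs = ["ACGTACGT", "ACGTACGA", "TTTTTTTT"]
D = distance_matrix(seqs, L=2)
for row in D:
    print(["%.3f" % d for d in row])
```

The resulting matrix could then be passed to any neighbor joining or multidimensional scaling routine.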
Simple chained guide trees give high-quality protein multiple sequence alignments
Boyce, Kieran; Sievers, Fabian; Higgins, Desmond G.
2014-01-01
Guide trees are used to decide the order of sequence alignment in the progressive multiple sequence alignment heuristic. These guide trees are often the limiting factor in making large alignments, and considerable effort has been expended over the years in making these quickly or accurately. In this article we show that, at least for protein families with large numbers of sequences that can be benchmarked with known structures, simple chained guide trees give the most accurate alignments. These also happen to be the fastest and simplest guide trees to construct, computationally. Such guide trees have a striking effect on the accuracy of alignments produced by some of the most widely used alignment packages. There is a marked increase in accuracy and a marked decrease in computational time, once the number of sequences goes much above a few hundred. This is true, even if the order of sequences in the guide tree is random. PMID:25002495
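A chained guide tree is simply a fully unbalanced (caterpillar) tree in which sequences are added one at a time. The following Python sketch builds one as a Newick string, optionally from a random ordering; the function name and the sequence labels are hypothetical, and how a given alignment package consumes such a tree is not covered here.

```python
import random


def chained_guide_tree(names):
    """Newick string for a fully unbalanced (chained) tree over names."""
    if not names:
        raise ValueError("need at least one sequence name")
    tree = names[0]
    for name in names[1:]:
        # Each new sequence joins the growing chain at the root.
        tree = f"({tree},{name})"
    return tree + ";"


names = ["s1", "s2", "s3", "s4"]
ordered = chained_guide_tree(names)

# Random ordering, as in the "order of sequences is random" case.
shuffled = names[:]
random.Random(0).shuffle(shuffled)
randomized = chained_guide_tree(shuffled)

print(ordered)
print(randomized)
```

Construction is linear in the number of sequences because no pairwise distances are computed.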
BuddySuite: Command-Line Toolkits for Manipulating Sequences, Alignments, and Phylogenetic Trees.
Bond, Stephen R; Keat, Karl E; Barreira, Sofia N; Baxevanis, Andreas D
2017-06-01
The ability to manipulate sequence, alignment, and phylogenetic tree files has become an increasingly important skill in the life sciences, whether to generate summary information or to prepare data for further downstream analysis. The command line can be an extremely powerful environment for interacting with these resources, but only if the user has the appropriate general-purpose tools on hand. BuddySuite is a collection of four independent yet interrelated command-line toolkits that facilitate each step in the workflow of sequence discovery, curation, alignment, and phylogenetic reconstruction. Most common sequence, alignment, and tree file formats are automatically detected and parsed, and over 100 tools have been implemented for manipulating these data. The project has been engineered to easily accommodate the addition of new tools, is written in the popular programming language Python, and is hosted on the Python Package Index and GitHub to maximize accessibility. Documentation for each BuddySuite tool, including usage examples, is available at http://tiny.cc/buddysuite_wiki. All software is open source and freely available through http://research.nhgri.nih.gov/software/BuddySuite. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution 2017. This work is written by US Government employees and is in the public domain in the US.
Phylogeny Reconstruction with Alignment-Free Method That Corrects for Horizontal Gene Transfer.
Bromberg, Raquel; Grishin, Nick V; Otwinowski, Zbyszek
2016-06-01
Advances in sequencing have generated a large number of complete genomes. Traditionally, phylogenetic analysis relies on alignments of orthologs, but defining orthologs and separating them from paralogs is a complex task that may not always be suited to the large datasets of the future. An alternative to traditional, alignment-based approaches are whole-genome, alignment-free methods. These methods are scalable and require minimal manual intervention. We developed SlopeTree, a new alignment-free method that estimates evolutionary distances by measuring the decay of exact substring matches as a function of match length. SlopeTree corrects for horizontal gene transfer, for composition variation and low complexity sequences, and for branch-length nonlinearity caused by multiple mutations at the same site. We tested SlopeTree on 495 bacteria, 73 archaea, and 72 strains of Escherichia coli and Shigella. We compared our trees to the NCBI taxonomy, to trees based on concatenated alignments, and to trees produced by other alignment-free methods. The results were consistent with current knowledge about prokaryotic evolution. We assessed differences in tree topology over different methods and settings and found that the majority of bacteria and archaea have a core set of proteins that evolves by descent. In trees built from complete genomes rather than sets of core genes, we observed some grouping by phenotype rather than phylogeny, for instance with a cluster of sulfur-reducing thermophilic bacteria coming together irrespective of their phyla. The source-code for SlopeTree is available at: http://prodata.swmed.edu/download/pub/slopetree_v1/slopetree.tar.gz.
Phylogeny Reconstruction with Alignment-Free Method That Corrects for Horizontal Gene Transfer
Grishin, Nick V.; Otwinowski, Zbyszek
2016-01-01
Advances in sequencing have generated a large number of complete genomes. Traditionally, phylogenetic analysis relies on alignments of orthologs, but defining orthologs and separating them from paralogs is a complex task that may not always be suited to the large datasets of the future. An alternative to traditional, alignment-based approaches are whole-genome, alignment-free methods. These methods are scalable and require minimal manual intervention. We developed SlopeTree, a new alignment-free method that estimates evolutionary distances by measuring the decay of exact substring matches as a function of match length. SlopeTree corrects for horizontal gene transfer, for composition variation and low complexity sequences, and for branch-length nonlinearity caused by multiple mutations at the same site. We tested SlopeTree on 495 bacteria, 73 archaea, and 72 strains of Escherichia coli and Shigella. We compared our trees to the NCBI taxonomy, to trees based on concatenated alignments, and to trees produced by other alignment-free methods. The results were consistent with current knowledge about prokaryotic evolution. We assessed differences in tree topology over different methods and settings and found that the majority of bacteria and archaea have a core set of proteins that evolves by descent. In trees built from complete genomes rather than sets of core genes, we observed some grouping by phenotype rather than phylogeny, for instance with a cluster of sulfur-reducing thermophilic bacteria coming together irrespective of their phyla. The source-code for SlopeTree is available at: http://prodata.swmed.edu/download/pub/slopetree_v1/slopetree.tar.gz. PMID:27336403
Hagopian, Raffi; Davidson, John R; Datta, Ruchira S; Samad, Bushra; Jarvis, Glen R; Sjölander, Kimmen
2010-07-01
We present the jump-start simultaneous alignment and tree construction using hidden Markov models (SATCHMO-JS) web server for simultaneous estimation of protein multiple sequence alignments (MSAs) and phylogenetic trees. The server takes as input a set of sequences in FASTA format, and outputs a phylogenetic tree and MSA; these can be viewed online or downloaded from the website. SATCHMO-JS is an extension of the SATCHMO algorithm, and employs a divide-and-conquer strategy to jump-start SATCHMO at a higher point in the phylogenetic tree, reducing the computational complexity of the progressive all-versus-all HMM-HMM scoring and alignment. Results on a benchmark dataset of 983 structurally aligned pairs from the PREFAB benchmark dataset show that SATCHMO-JS provides a statistically significant improvement in alignment accuracy over MUSCLE, Multiple Alignment using Fast Fourier Transform (MAFFT), ClustalW and the original SATCHMO algorithm. The SATCHMO-JS webserver is available at http://phylogenomics.berkeley.edu/satchmo-js. The datasets used in these experiments are available for download at http://phylogenomics.berkeley.edu/satchmo-js/supplementary/.
PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences.
Mirarab, Siavash; Nguyen, Nam; Guo, Sheng; Wang, Li-San; Kim, Junhyong; Warnow, Tandy
2015-05-01
We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate--slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory.
Xu, Duo; Jaber, Yousef; Pavlidis, Pavlos; Gokcumen, Omer
2017-09-26
Constructing alignments and phylogenies for a given locus from large genome sequencing studies with relevant outgroups allows novel evolutionary and anthropological insights. However, no user-friendly tool has been developed to integrate thousands of recently available and anthropologically relevant genome sequences to construct complete sequence alignments and phylogenies. Here, we provide VCFtoTree, a user-friendly tool with a graphical user interface that directly accesses online databases to download, parse and analyze genome variation data for regions of interest. Our pipeline combines popular sequence datasets and tree building algorithms with custom data parsing to generate accurate alignments and phylogenies using all the individuals from the 1000 Genomes Project, Neanderthal and Denisovan genomes, as well as reference genomes of Chimpanzee and Rhesus Macaque. It can also be applied to other phased human genomes, as well as genomes from other species. The output of our pipeline includes an alignment in FASTA format and a tree file in Newick format. VCFtoTree fulfills the increasing demand for constructing alignments and phylogenies for a given locus from thousands of available genomes. Our software provides a user-friendly interface for a wider audience without prerequisite knowledge in programming. VCFtoTree can be accessed from https://github.com/duoduoo/VCFtoTree_3.0.0 .
Apparatus and method for harvesting woody plantations
Eggen, David L.
1988-11-15
A tree harvester for harvesting felled trees includes a wheel mounted wood chipper which moves toward the butt ends of the tree stems to be processed. The harvester includes a plurality of rotating alignment discs in front of the chipper. These discs align the tree stems to be processed with the mouth of the chipper. A chipper infeed cylinder is rotatably mounted between the discs and the front end of the chipper, and lifts the tree stem butts up from the ground into alignment with the chipper inlet port. The chips discharge from the chipper and go into a chip hopper which moves with the tree harvester.
Apparatus and method for harvesting woody plantations
Eggen, D.L.
1988-11-15
A tree harvester for harvesting felled trees includes a wheel mounted wood chipper which moves toward the butt ends of the tree stems to be processed. The harvester includes a plurality of rotating alignment discs in front of the chipper. These discs align the tree stems to be processed with the mouth of the chipper. A chipper infeed cylinder is rotatably mounted between the discs and the front end of the chipper, and lifts the tree stem butts up from the ground into alignment with the chipper inlet port. The chips discharge from the chipper and go into a chip hopper which moves with the tree harvester. 8 figs.
PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences
Mirarab, Siavash; Nguyen, Nam; Guo, Sheng; Wang, Li-San; Kim, Junhyong
2015-01-01
We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate--slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory. PMID:25549288
Kück, Patrick; Meusemann, Karen; Dambach, Johannes; Thormann, Birthe; von Reumont, Björn M; Wägele, Johann W; Misof, Bernhard
2010-03-31
Alignment masking, the technique of excluding alignment blocks prior to tree reconstruction, has been successful in improving the signal-to-noise ratio in sequence alignments. However, the lack of formally well-defined methods to identify randomness in sequence alignments has prevented a routine application of alignment masking. In this study, we compared the effects on tree reconstructions of the most commonly used profiling method (GBLOCKS), which uses a predefined set of rules in combination with alignment masking, with a new profiling approach (ALISCORE) based on Monte Carlo resampling within a sliding window, using different data sets and alignment methods. While the GBLOCKS approach excludes variable sections above a certain threshold, the choice of which is left arbitrary, the ALISCORE algorithm is free of a priori rating of parameter space and therefore more objective. ALISCORE was successfully extended to amino acids using a proportional model and empirical substitution matrices to score randomness in multiple sequence alignments. A complex bootstrap resampling leads to an even distribution of scores of randomly similar sequences to assess randomness of the observed sequence similarity. Testing performance on real data, both masking methods, GBLOCKS and ALISCORE, helped to improve tree resolution. The sliding window approach was less sensitive to different alignments of identical data sets and performed equally well on all data sets. Concurrently, ALISCORE is capable of dealing with different substitution patterns and heterogeneous base composition. ALISCORE and the most relaxed GBLOCKS gap parameter setting performed best on all data sets. Correspondingly, Neighbor-Net analyses showed the greatest decrease in conflict. Alignment masking improves signal-to-noise ratio in multiple sequence alignments prior to phylogenetic reconstruction.
Given the robust performance of alignment profiling, alignment masking should routinely be used to improve tree reconstructions. Parametric methods of alignment profiling can be easily extended to more complex likelihood based models of sequence evolution which opens the possibility of further improvements.
Swain, Timothy D
2018-01-01
The recent rapid proliferation of novel taxon identification in the Zoanthidea has been accompanied by a parallel propagation of gene trees as a tool of species discovery, but not a corresponding increase in our understanding of phylogeny. This disparity is caused by the trade-off between the capabilities of automated DNA sequence alignment and data content of genes applied to phylogenetic inference in this group. Conserved genes or segments are easily aligned across the order, but produce poorly resolved trees; hypervariable genes or segments contain the evolutionary signal necessary for resolution and robust support, but sequence alignment is daunting. Staggered alignments are a form of phylogeny-informed sequence alignment composed of a mosaic of local and universal regions that allow phylogenetic inference to be applied to all nucleotides from both hypervariable and conserved gene segments. Comparisons between species tree phylogenies inferred from all data (staggered alignment) and hypervariable-excluded data (standard alignment) demonstrate improved confidence and greater topological agreement with other sources of data for the complete-data tree. This novel phylogeny is the most comprehensive to date (in terms of taxa and data) and can serve as an expandable tool for evolutionary hypothesis testing in the Zoanthidea. Spanish language abstract available in Text S1. Translation by L. O. Swain, DePaul University, Chicago, Illinois, 60604, USA. Copyright © 2017 Elsevier Inc. All rights reserved.
Analyzing and synthesizing phylogenies using tree alignment graphs.
Smith, Stephen A; Brown, Joseph W; Hinchliff, Cody E
2013-01-01
Phylogenetic trees are used to analyze and visualize evolution. However, trees can be imperfect datatypes when summarizing multiple trees. This is especially problematic when accommodating for biological phenomena such as horizontal gene transfer, incomplete lineage sorting, and hybridization, as well as topological conflict between datasets. Additionally, researchers may want to combine information from sets of trees that have partially overlapping taxon sets. To address the problem of analyzing sets of trees with conflicting relationships and partially overlapping taxon sets, we introduce methods for aligning, synthesizing and analyzing rooted phylogenetic trees within a graph, called a tree alignment graph (TAG). The TAG can be queried and analyzed to explore uncertainty and conflict. It can also be synthesized to construct trees, presenting an alternative to supertrees approaches. We demonstrate these methods with two empirical datasets. In order to explore uncertainty, we constructed a TAG of the bootstrap trees from the Angiosperm Tree of Life project. Analysis of the resulting graph demonstrates that areas of the dataset that are unresolved in majority-rule consensus tree analyses can be understood in more detail within the context of a graph structure, using measures incorporating node degree and adjacency support. As an exercise in synthesis (i.e., summarization of a TAG constructed from the alignment trees), we also construct a TAG consisting of the taxonomy and source trees from a recent comprehensive bird study. We synthesized this graph into a tree that can be reconstructed in a repeatable fashion and where the underlying source information can be updated. The methods presented here are tractable for large scale analyses and serve as a basis for an alternative to consensus tree and supertree methods. Furthermore, the exploration of these graphs can expose structures and patterns within the dataset that are otherwise difficult to observe.
Analyzing and Synthesizing Phylogenies Using Tree Alignment Graphs
Smith, Stephen A.; Brown, Joseph W.; Hinchliff, Cody E.
2013-01-01
Phylogenetic trees are used to analyze and visualize evolution. However, trees can be imperfect datatypes when summarizing multiple trees. This is especially problematic when accommodating for biological phenomena such as horizontal gene transfer, incomplete lineage sorting, and hybridization, as well as topological conflict between datasets. Additionally, researchers may want to combine information from sets of trees that have partially overlapping taxon sets. To address the problem of analyzing sets of trees with conflicting relationships and partially overlapping taxon sets, we introduce methods for aligning, synthesizing and analyzing rooted phylogenetic trees within a graph, called a tree alignment graph (TAG). The TAG can be queried and analyzed to explore uncertainty and conflict. It can also be synthesized to construct trees, presenting an alternative to supertrees approaches. We demonstrate these methods with two empirical datasets. In order to explore uncertainty, we constructed a TAG of the bootstrap trees from the Angiosperm Tree of Life project. Analysis of the resulting graph demonstrates that areas of the dataset that are unresolved in majority-rule consensus tree analyses can be understood in more detail within the context of a graph structure, using measures incorporating node degree and adjacency support. As an exercise in synthesis (i.e., summarization of a TAG constructed from the alignment trees), we also construct a TAG consisting of the taxonomy and source trees from a recent comprehensive bird study. We synthesized this graph into a tree that can be reconstructed in a repeatable fashion and where the underlying source information can be updated. The methods presented here are tractable for large scale analyses and serve as a basis for an alternative to consensus tree and supertree methods. Furthermore, the exploration of these graphs can expose structures and patterns within the dataset that are otherwise difficult to observe. PMID:24086118
Molecular Phylogenetics: Concepts for a Newcomer.
Ajawatanawong, Pravech
Molecular phylogenetics is the study of evolutionary relationships among organisms using molecular sequence data. The aim of this review is to introduce the important terminology and general concepts of tree reconstruction to biologists who lack a strong background in the field of molecular evolution. Some modern phylogenetic programs are easy to use because of their user-friendly interfaces, but understanding the phylogenetic algorithms and substitution models, which are based on advanced statistics, is still important for the analysis and interpretation without a guide. Briefly, there are five general steps in carrying out a phylogenetic analysis: (1) sequence data preparation, (2) sequence alignment, (3) choosing a phylogenetic reconstruction method, (4) identification of the best tree, and (5) evaluating the tree. Concepts in this review enable biologists to grasp the basic ideas behind phylogenetic analysis and also help provide a sound basis for discussions with expert phylogeneticists.
2010-01-01
Background: Likelihood-based phylogenetic inference is generally considered to be the most reliable classification method for unknown sequences. However, traditional likelihood-based phylogenetic methods cannot be applied to large volumes of short reads from next-generation sequencing due to computational complexity issues and lack of phylogenetic signal. "Phylogenetic placement," where a reference tree is fixed and the unknown query sequences are placed onto the tree via a reference alignment, is a way to bring the inferential power offered by likelihood-based approaches to large data sets. Results: This paper introduces pplacer, a software package for phylogenetic placement and subsequent visualization. The algorithm can place twenty thousand short reads on a reference tree of one thousand taxa per hour per processor, has essentially linear time and memory complexity in the number of reference taxa, and is easy to run in parallel. Pplacer features calculation of the posterior probability of a placement on an edge, which is a statistically rigorous way of quantifying uncertainty on an edge-by-edge basis. It also can inform the user of the positional uncertainty for query sequences by calculating expected distance between placement locations, which is crucial in the estimation of uncertainty with a well-sampled reference tree. The software provides visualizations using branch thickness and color to represent number of placements and their uncertainty. A simulation study using reads generated from 631 COG alignments shows a high level of accuracy for phylogenetic placement over a wide range of alignment diversity, and the power of edge uncertainty estimates to measure placement confidence. Conclusions: Pplacer enables efficient phylogenetic placement and subsequent visualization, making likelihood-based phylogenetics methodology practical for large collections of reads; it is freely available as source code, binaries, and a web service. PMID:21034504
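The notion of an expected distance between placement locations can be illustrated with a small Python sketch. It is a deliberate simplification and not pplacer's code: per-edge weights are simply normalized to stand in for placement probabilities, and the pairwise distances between candidate locations are supplied by hand. All names and numbers below are hypothetical.

```python
def normalize(weights):
    """Scale non-negative per-edge weights so that they sum to one."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def expected_placement_distance(probs, dist):
    """Sum of p_i * p_j * d(i, j) over all pairs of candidate locations."""
    n = len(probs)
    return sum(
        probs[i] * probs[j] * dist[i][j]
        for i in range(n)
        for j in range(n)
    )


# Hypothetical weights for one query on three candidate edges.
probs = normalize([6.0, 3.0, 1.0])

# Hypothetical tree distances between the three candidate locations.
dist = [
    [0.0, 0.1, 0.5],
    [0.1, 0.0, 0.4],
    [0.5, 0.4, 0.0],
]

spread = expected_placement_distance(probs, dist)
print(round(spread, 4))
```

A small value indicates that the plausible placements sit close together on the reference tree even when no single edge dominates, which is why such a measure complements per-edge probabilities.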
Darwin v. 2.0: an interpreted computer language for the biosciences.
Gonnet, G H; Hallett, M T; Korostensky, C; Bernardin, L
2000-02-01
We announce the availability of the second release of Darwin v. 2.0, an interpreted computer language especially tailored to researchers in the biosciences. The system is a general tool applicable to a wide range of problems. This second release improves Darwin version 1.6 in several ways: it now contains (1) a larger set of libraries touching most of the classical problems from computational biology (pairwise alignment, all versus all alignments, tree construction, multiple sequence alignment), (2) an expanded set of general purpose algorithms (search algorithms for discrete problems, matrix decomposition routines, complex/long integer arithmetic operations), (3) an improved language with a cleaner syntax, (4) better on-line help, and (5) a number of fixes to user-reported bugs. Darwin is made available for most operating systems free of charge from the Computational Biochemistry Research Group (CBRG), reachable at http://chrg.inf.ethz.ch. darwin@inf.ethz.ch
Phylogenetic inference under varying proportions of indel-induced alignment gaps
Dwivedi, Bhakti; Gadagkar, Sudhindra R
2009-01-01
Background The effect of alignment gaps on phylogenetic accuracy has been the subject of numerous studies. In this study, we investigated the relationship between the total number of gapped sites and phylogenetic accuracy, when the gaps were introduced (by means of computer simulation) to reflect indel (insertion/deletion) events during the evolution of DNA sequences. The resulting (true) alignments were subjected to commonly used gap treatment and phylogenetic inference methods. Results (1) In general, there was a strong – almost deterministic – relationship between the amount of gap in the data and the level of phylogenetic accuracy when the alignments were very "gappy", (2) gaps resulting from deletions (as opposed to insertions) contributed more to the inaccuracy of phylogenetic inference, (3) the probabilistic methods (Bayesian, PhyML & "MLε," a method implemented in DNAML in PHYLIP) performed better at most levels of gap percentage when compared to parsimony (MP) and distance (NJ) methods, with Bayesian analysis being clearly the best, (4) methods that treat gapped sites as missing data yielded less accurate trees when compared to those that attribute phylogenetic signal to the gapped sites (by coding them as binary character data – presence/absence, or as in the MLε method), and (5) in general, the accuracy of phylogenetic inference depended upon the amount of available data when the gaps resulted from mainly deletion events, and the amount of missing data when insertion events were equally likely to have caused the alignment gaps.
Conclusion When gaps in an alignment are a consequence of indel events in the evolution of the sequences, the accuracy of phylogenetic analysis is likely to improve if: (1) alignment gaps are categorized as arising from insertion events or deletion events and then treated separately in the analysis, (2) the evolutionary signal provided by indels is harnessed in the phylogenetic analysis, and (3) methods that utilize the phylogenetic signal in indels are developed for distance methods too. When the true homology is known and the amount of gaps is 20 percent of the alignment length or less, the methods used in this study are likely to yield trees with 90–100 percent accuracy. PMID:19698168
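To make the idea of coding gapped sites as binary presence/absence characters concrete, here is a minimal sketch. It uses a simple per-column scheme (one binary character for every column that contains both gaps and residues); the coding scheme actually used in the study may differ, and the function name and toy alignment are hypothetical.

```python
def code_gaps_as_binary(alignment):
    """Return one presence/absence character string per sequence.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    Only columns holding at least one gap and at least one residue are
    coded: '1' means a residue is present, '0' means a gap.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    coded = {name: [] for name in names}
    for col in range(length):
        column = [alignment[name][col] for name in names]
        n_gaps = column.count("-")
        if 0 < n_gaps < len(column):
            for name, ch in zip(names, column):
                coded[name].append("0" if ch == "-" else "1")
    return {name: "".join(chars) for name, chars in coded.items()}


# Toy alignment with gaps in the second and third columns.
toy = {"A": "AC-GT", "B": "ACTGT", "C": "A--GT"}
binary = code_gaps_as_binary(toy)
```

The resulting binary matrix could be appended to the nucleotide data as a separate partition, so that the indel signal is used rather than treated as missing data.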
Carbone, Ignazio; White, James B; Miadlikowska, Jolanta; Arnold, A Elizabeth; Miller, Mark A; Kauff, Frank; U'Ren, Jana M; May, Georgiana; Lutzoni, François
2017-04-15
High-quality phylogenetic placement of sequence data has the potential to greatly accelerate studies of the diversity, systematics, ecology and functional biology of diverse groups. We developed the Tree-Based Alignment Selector (T-BAS) toolkit to allow evolutionary placement and visualization of diverse DNA sequences representing unknown taxa within a robust phylogenetic context, and to permit the downloading of highly curated, single- and multi-locus alignments for specific clades. In its initial form, T-BAS v1.0 uses a core phylogeny of 979 taxa (including 23 outgroup taxa, as well as 61 orders, 175 families and 496 genera) representing all 13 classes of the largest subphylum of Fungi, Pezizomycotina (Ascomycota), based on sequence alignments for six loci (nr5.8S, nrLSU, nrSSU, mtSSU, RPB1, RPB2). T-BAS v1.0 has three main uses: (i) Users may download alignments and voucher tables for members of the Pezizomycotina directly from the reference tree, facilitating systematics studies of focal clades. (ii) Users may upload sequence files with reads representing unknown taxa and place these on the phylogeny using either BLAST or phylogeny-based approaches, and then use the displayed tree to select reference taxa to include when downloading alignments. The placement of unknowns can be performed for large numbers of Sanger sequences obtained from fungal cultures and for alignable, short reads of environmental amplicons. (iii) User-customizable metadata can be visualized on the tree. T-BAS Version 1.0 is available online at http://tbas.hpc.ncsu.edu . Registration is required to access the CIPRES Science Gateway and NSF XSEDE's large computational resources. icarbon@ncsu.edu. Supplementary data are available at Bioinformatics online.
A method of alignment masking for refining the phylogenetic signal of multiple sequence alignments.
Rajan, Vaibhav
2013-03-01
Inaccurate inference of positional homologies in multiple sequence alignments and systematic errors introduced by alignment heuristics obfuscate phylogenetic inference. Alignment masking, the elimination of phylogenetically uninformative or misleading sites from an alignment before phylogenetic analysis, is a common practice in phylogenetic analysis. Although masking is often done manually, automated methods are necessary to handle the much larger data sets being prepared today. In this study, we introduce the concept of subsplits and demonstrate their use in extracting phylogenetic signal from alignments. We design a clustering approach for alignment masking where each cluster contains similar columns, with similarity defined on the basis of compatible subsplits; our approach then identifies noisy clusters and eliminates them. Trees inferred from the columns in the retained clusters are found to be topologically closer to the reference trees. We test our method on numerous standard benchmarks (both synthetic and biological data sets) and compare its performance with other methods of alignment masking. We find that our method can eliminate sites more accurately than other methods, particularly on divergent data, and can improve the topologies of the inferred trees in likelihood-based analyses. Software available upon request from the author.
Visualizing phylogenetic tree landscapes.
Wilgenbusch, James C; Huang, Wen; Gallivan, Kyle A
2017-02-02
Genomic-scale sequence alignments are increasingly used to infer phylogenies in order to better understand the processes and patterns of evolution. Different partitions within these new alignments (e.g., genes, codon positions, and structural features) often favor hundreds if not thousands of competing phylogenies. Summarizing and comparing phylogenies obtained from multi-source data sets using current consensus tree methods discards valuable information and can disguise potential methodological problems. Discovery of efficient and accurate dimensionality reduction methods that display the relationship among these competing phylogenies at once in 2 or 3 dimensions will help practitioners diagnose the limits of current evolutionary models and potential problems with phylogenetic reconstruction methods when analyzing large multi-source data sets. We introduce several dimensionality reduction methods to visualize in 2 and 3 dimensions the relationship among competing phylogenies obtained from gene partitions found in three mid- to large-size mitochondrial genome alignments. We test the performance of these dimensionality reduction methods by applying several goodness-of-fit measures. The intrinsic dimensionality of each data set is also estimated to determine whether projections in 2 and 3 dimensions can be expected to reveal meaningful relationships among trees from different data partitions. Several new approaches to aid in the comparison of different phylogenetic landscapes are presented. Curvilinear Components Analysis (CCA) and a stochastic gradient descent (SGD) optimization method give the best representation of the original tree-to-tree distance matrix for each of the three mitochondrial genome alignments and greatly outperform the method currently used to visualize tree landscapes. The CCA + SGD method converged at least as fast as previously applied methods for visualizing tree landscapes.
We demonstrate for all three mtDNA alignments that 3D projections significantly increase the fit between the tree-to-tree distances and can facilitate the interpretation of the relationship among phylogenetic trees. We demonstrate that the choice of dimensionality reduction method can significantly influence the spatial relationship among a large set of competing phylogenetic trees. We highlight the importance of selecting a dimensionality reduction method to visualize large multi-locus phylogenetic landscapes and demonstrate that 3D projections of mitochondrial tree landscapes better capture the relationship among the trees being compared.
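One way to quantify how well a low-dimensional projection preserves tree-to-tree distances is a normalized stress value. The sketch below is only an assumption about what such a goodness-of-fit measure might look like (the abstract does not say which measures were used); it normalizes by the original distances, and the function name and toy data are hypothetical.

```python
import math


def normalized_stress(tree_dist, coords):
    """Root of squared distance error, normalized by the original distances.

    tree_dist: symmetric matrix of tree-to-tree distances.
    coords:    one coordinate tuple per tree in the projection.
    Returns 0.0 when the projection reproduces every distance exactly.
    """
    n = len(coords)
    num = den = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            projected = math.dist(coords[i], coords[j])
            num += (tree_dist[i][j] - projected) ** 2
            den += tree_dist[i][j] ** 2
    return math.sqrt(num / den)


# Three hypothetical trees whose pairwise distances form a 3-4-5 triangle.
tree_dist = [[0, 3, 4],
             [3, 0, 5],
             [4, 5, 0]]
stress_2d = normalized_stress(tree_dist, [(0, 0), (3, 0), (0, 4)])
stress_1d = normalized_stress(tree_dist, [(0,), (3,), (4,)])
```

In this toy case the 2D layout fits the distances exactly while the 1D layout cannot, which mirrors the abstract's point that adding a dimension can improve the fit.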
Yu, Xiaoyu; Reva, Oleg N
2018-01-01
Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, the unsuitability of standard phylogenetic tools for processing genome-wide data, and the lack of reliable substitution models which correlate with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent to the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, e.g., GyrA. PMID:29511354
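The alignment-free comparison described above rests on tetranucleotide frequency vectors. The following is a minimal sketch that counts overlapping 4-mers and compares two sequences by Euclidean distance; the published OUP method may normalize or weight the frequencies differently, and the function names here are hypothetical.

```python
from collections import Counter
from itertools import product
import math

# All 256 tetranucleotides in a fixed (lexicographic) order.
WORDS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq):
    """Relative frequencies of overlapping 4-mers over the ACGT alphabet.

    Windows containing other symbols (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[w] for w in WORDS)
    if total == 0:
        return [0.0] * len(WORDS)
    return [counts[w] / total for w in WORDS]


def euclidean_distance(u, v):
    """Plain Euclidean distance between two frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


freqs = tetranucleotide_frequencies("ACGTACGT")
same = euclidean_distance(freqs, tetranucleotide_frequencies("acgtacgt"))
```

A pairwise matrix of such distances between genomes could then be passed to any distance-based tree-building method.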
Simultaneous phylogeny reconstruction and multiple sequence alignment
Yue, Feng; Shi, Jian; Tang, Jijun
2009-01-01
Background A phylogeny is the evolutionary history of a group of organisms. To date, sequence data is still the most used data type for phylogenetic reconstruction. Before any sequences can be used for phylogeny reconstruction, they must be aligned, and the quality of the multiple sequence alignment has been shown to affect the quality of the inferred phylogeny. At the same time, all the current multiple sequence alignment programs use a guide tree to produce the alignment, and experiments have shown that good guide trees can significantly improve multiple alignment quality. Results We devise a new algorithm to simultaneously align multiple sequences and search for the phylogenetic tree that leads to the best alignment. We also implemented the algorithm as a C program package, which can handle both DNA and protein data and can use a simple cost model as well as complex substitution matrices, such as PAM250 or BLOSUM62. The performance of the new method is compared with that of other popular multiple sequence alignment tools, including widely used programs such as ClustalW and T-Coffee. Experimental results suggest that this method has good performance in terms of both phylogeny accuracy and alignment quality. Conclusion We present an algorithm to align multiple sequences and reconstruct the phylogenies that minimize the alignment score, which is based on an efficient algorithm to solve the median problems for three sequences. Our extensive experiments suggest that this method is very promising and can produce high quality phylogenies and alignments. PMID:19208110
High scale flavor alignment in two-Higgs doublet models and its phenomenology
Gori, Stefania; Haber, Howard E.; Santos, Edward
2017-06-21
The most general two-Higgs doublet model (2HDM) includes potentially large sources of flavor changing neutral currents (FCNCs) that must be suppressed in order to achieve a phenomenologically viable model. The flavor alignment ansatz postulates that all Yukawa coupling matrices are diagonal when expressed in the basis of mass-eigenstate fermion fields, in which case tree-level Higgs-mediated FCNCs are eliminated. In this work, we explore models with the flavor alignment condition imposed at a very high energy scale, which results in the generation of Higgs-mediated FCNCs via renormalization group running from the high energy scale to the electroweak scale. Using the current experimental bounds on flavor changing observables, constraints are derived on the aligned 2HDM parameter space. In the favored parameter region, we analyze the implications for Higgs boson phenomenology.
Kocot, Kevin M; Citarella, Mathew R; Moroz, Leonid L; Halanych, Kenneth M
2013-01-01
Molecular phylogenetics relies on accurate identification of orthologous sequences among the taxa of interest. Most orthology inference programs available for use in phylogenomics rely on small sets of pre-defined orthologs from model organisms or phenetic approaches such as all-versus-all sequence comparisons followed by Markov graph-based clustering. Such approaches have high sensitivity but may erroneously include paralogous sequences. We developed PhyloTreePruner, a software utility that uses a phylogenetic approach to refine orthology inferences made using phenetic methods. PhyloTreePruner checks single-gene trees for evidence of paralogy and generates a new alignment for each group containing only sequences inferred to be orthologs. Importantly, PhyloTreePruner takes into account support values on the tree and avoids unnecessarily deleting sequences in cases where a weakly supported tree topology incorrectly indicates paralogy. A test of PhyloTreePruner on a dataset generated from 11 completely sequenced arthropod genomes identified 2,027 orthologous groups sampled for all taxa. Phylogenetic analysis of the concatenated supermatrix yielded a generally well-supported topology that was consistent with the current understanding of arthropod phylogeny. PhyloTreePruner is freely available from http://sourceforge.net/projects/phylotreepruner/.
Walking tree heuristics for biological string alignment, gene location, and phylogenies
NASA Astrophysics Data System (ADS)
Cull, P.; Holloway, J. L.; Cavener, J. D.
1999-03-01
Basic biological information is stored in strings of nucleic acids (DNA, RNA) or amino acids (proteins). Teasing out the meaning of these strings is a central problem of modern biology. Matching and aligning strings brings out their shared characteristics. Although string matching is well-understood in the edit-distance model, biological strings with transpositions and inversions violate this model's assumptions. We propose a family of heuristics called walking trees to align biologically reasonable strings. Both edit-distance and walking tree methods can locate specific genes within a large string when the genes' sequences are given. When we attempt to match whole strings, the walking tree matches most genes, while the edit-distance method fails. We also give examples in which the walking tree matches substrings even if they have been moved or inverted. The edit-distance method was not designed to handle these problems. We include an example in which the walking tree "discovered" a gene. Calculating scores for whole genome matches gives a method for approximating evolutionary distance. We show two evolutionary trees for the picornaviruses which were computed by the walking tree heuristic. Both of these trees show great similarity to previously constructed trees. The point of this demonstration is that WHOLE genomes can be matched and distances calculated. The first tree was created on a Sequent parallel computer and demonstrates that the walking tree heuristic can be efficiently parallelized. The second tree was created using a network of workstations and demonstrates that there is sufficient parallelism in the phylogenetic tree calculation that the sequential walking tree can be used effectively on a network.
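For reference, the edit-distance model that the abstract contrasts with walking trees can be sketched as the standard dynamic program below, assuming unit costs for insertion, deletion, and substitution. It has no operation for moving or inverting a block, which is the limitation the abstract points to. The walking tree heuristic itself is not shown, and the function name is hypothetical.

```python
def edit_distance(a, b):
    """Unit-cost edit (Levenshtein) distance between strings a and b.

    Keeps only two rows of the dynamic-programming table at a time.
    """
    prev = list(range(len(b) + 1))  # distance from empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


d = edit_distance("kitten", "sitting")
```

Because every rearranged character must be paid for with point edits, two genomes that differ only by a transposed or inverted gene can still receive a large edit distance.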
Block 3. Central view of Block 3 observed from the ...
Block 3. Central view of Block 3 observed from the west to the east. This photograph reveals the alignment of trees within the central path of the park. In addition, this photograph exposes broken bricks aligning tree beds - Skyline Park, 1500-1800 Arapaho Street, Denver, Denver County, CO
How Effective Are DNA Barcodes in the Identification of African Rainforest Trees?
Parmentier, Ingrid; Duminil, Jérôme; Kuzmina, Maria; Philippe, Morgane; Thomas, Duncan W.; Kenfack, David; Chuyong, George B.; Cruaud, Corinne; Hardy, Olivier J.
2013-01-01
Background DNA barcoding of rain forest trees could potentially help biologists identify species and discover new ones. However, DNA barcodes cannot always distinguish between closely related species, and the size and completeness of barcode databases are key parameters for their successful application. We test the ability of rbcL, matK and trnH-psbA plastid DNA markers to identify rain forest trees at two sites in Atlantic central Africa under the assumption that a database is exhaustive in terms of species content, but not necessarily in terms of haplotype diversity within species. Methodology/Principal Findings We assess the accuracy of identification to species or genus using a genetic distance matrix between samples either based on a global multiple sequence alignment (GD) or on a basic local alignment search tool (BLAST). Where a local database is available (within a 50 ha plot), barcoding was generally reliable for genus identification (95–100% success), but less for species identification (71–88%). Using a single marker, best results for species identification were obtained with trnH-psbA. There was a significant decrease of barcoding success in species-rich clades. When the local database was used to identify the genus of trees from another region and did include all genera from the query individuals but not all species, genus identification success decreased to 84–90%. The GD method performed best but a global multiple sequence alignment is not applicable on trnH-psbA. Conclusions/Significance Barcoding is a useful tool to assign unidentified African rain forest trees to a genus, but identification to a species is less reliable, especially in species-rich clades, even using an exhaustive local database. Combining two markers improves the accuracy of species identification but it would only marginally improve genus identification. 
Finally, we highlight some limitations of the BLAST algorithm as currently implemented and suggest possible improvements for barcoding applications. PMID:23565134
Tóth, Annamária; Hausknecht, Anton; Krisai-Greilhuber, Irmgard; Papp, Tamás; Vágvölgyi, Csaba; Nagy, László G.
2013-01-01
Reconciling traditional classifications, morphology, and the phylogenetic relationships of brown-spored agaric mushrooms has proven difficult in many groups, due to extensive convergence in morphological features. Here, we address the monophyly of the Bolbitiaceae, a family with over 700 described species, and examine the higher-level relationships within the family using a newly constructed multilocus dataset (ITS, nrLSU rDNA and EF1-alpha). We tested whether the fast-evolving Internal Transcribed Spacer (ITS) sequences can be accurately aligned across the family by comparing the outcome of two iterative alignment refining approaches (one automated and one manual) and various indel-treatment strategies. We used PRANK to align sequences in both cases. Our results suggest that – although PRANK successfully evades overmatching of gapped sites, previously referred to as alignment overmatching – it infers an unrealistically high number of indel events with natively generated guide-trees. This 'alignment undermatching' could be avoided by using more rigorous (e.g. ML) guide trees. The trees inferred in this study support the monophyly of the core Bolbitiaceae, with the exclusion of Panaeolus, Agrocybe, and some of the genera formerly placed in the family. Bolbitius and Conocybe were found to be monophyletic; however, Pholiotina and Galerella require redefinition. The phylogeny revealed that stipe coverage type is a poor predictor of phylogenetic relationships, indicating the need for a revision of the intrageneric relationships within Conocybe. PMID:23418526
Padial, José M; Grant, Taran; Frost, Darrel R
2014-06-26
Brachycephaloidea is a monophyletic group of frogs with more than 1000 species distributed throughout the New World tropics, subtropics, and Andean regions. Recently, the group has been the target of multiple molecular phylogenetic analyses, resulting in extensive changes in its taxonomy. Here, we test previous hypotheses of phylogenetic relationships for the group by combining available molecular evidence (sequences of 22 genes representing 431 ingroup and 25 outgroup terminals) and performing a tree-alignment analysis under the parsimony optimality criterion using the program POY. To elucidate the effects of alignment and optimality criterion on phylogenetic inferences, we also used the program MAFFT to obtain a similarity-alignment for analysis under both parsimony and maximum likelihood using the programs TNT and GARLI, respectively. Although all three analytical approaches agreed on numerous points, there was also extensive disagreement. Tree-alignment under parsimony supported the monophyly of the ingroup and the sister group relationship of the monophyletic marsupial frogs (Hemiphractidae), while maximum likelihood and parsimony analyses of the MAFFT similarity-alignment did not. All three methods differed with respect to the position of Ceuthomantis smaragdinus (Ceuthomantidae), with tree-alignment using parsimony recovering this species as the sister of Pristimantis + Yunganastes. All analyses rejected the monophyly of Strabomantidae and Strabomantinae as originally defined, and the tree-alignment analysis under parsimony further rejected the recently redefined Craugastoridae and Pristimantinae. Despite the greater emphasis in the systematics literature placed on the choice of optimality criterion for evaluating trees than on the choice of method for aligning DNA sequences, we found that the topological differences attributable to the alignment method were as great as those caused by the optimality criterion. 
Further, the optimal tree-alignment indicates that insertions and deletions occurred in twice as many aligned positions as implied by the optimal similarity-alignment, confirming previous findings that sequence turnover through insertion and deletion events plays a greater role in molecular evolution than indicated by similarity-alignments. Our results also provide a clear empirical demonstration of the different effects of wildcard taxa produced by missing data in parsimony and maximum likelihood analyses. Specifically, maximum likelihood analyses consistently (81% bootstrap frequency) provided spurious resolution despite a lack of evidence, whereas parsimony correctly depicted the ambiguity due to missing data by collapsing unsupported nodes. We provide a new taxonomy for the group that retains previously recognized Linnaean taxa except for Ceuthomantidae, Strabomantidae, and Strabomantinae. A phenotypically diagnosable superfamily is recognized formally as Brachycephaloidea, with the informal, unranked name terrarana retained as the standard common name for these frogs. We recognize three families within Brachycephaloidea that are currently diagnosable solely on molecular grounds (Brachycephalidae, Craugastoridae, and Eleutherodactylidae), as well as five subfamilies (Craugastorinae, Eleutherodactylinae, Holoadeninae, Phyzelaphryninae, and Pristimantinae) corresponding in large part to previous families and subfamilies. Our analyses upheld the monophyly of all tested genera, but we found numerous subgeneric taxa to be non-monophyletic and modified the taxonomy accordingly.
Reconstructing evolutionary trees in parallel for massive sequences.
Zou, Quan; Wan, Shixiang; Zeng, Xiangxiang; Ma, Zhanshan Sam
2017-12-14
Building evolutionary trees for massive sets of unaligned DNA sequences is both challenging and crucial, and reconstructing an evolutionary tree for ultra-large sequences is hard. Massive multiple sequence alignment is also challenging and consumes considerable time and space. Hadoop and Spark, which were developed recently, offer new promise for classical computational biology problems. In this paper, we attempt to solve multiple sequence alignment and evolutionary reconstruction in parallel. HPTree, which is developed in this paper, can process big DNA sequence files quickly. It works well on files larger than 1 GB and achieves better performance than other evolutionary reconstruction tools. Users can use HPTree to reconstruct evolutionary trees on computer clusters or cloud platforms (e.g., Amazon Cloud). HPTree could help in population evolution research and metagenomics analysis. We employ the Hadoop and Spark platforms and design an evolutionary tree reconstruction software tool for massive unaligned DNA sequences. Clustering and multiple sequence alignment are done in parallel. The neighbour-joining method was employed for evolutionary tree building. We have released our software together with its source code at http://lab.malab.cn/soft/HPtree/ .
A new version of the RDP (Ribosomal Database Project)
NASA Technical Reports Server (NTRS)
Maidak, B. L.; Cole, J. R.; Parker, C. T. Jr; Garrity, G. M.; Larsen, N.; Li, B.; Lilburn, T. G.; McCaughey, M. J.; Olsen, G. J.; Overbeek, R.;
1999-01-01
The Ribosomal Database Project (RDP-II), previously described by Maidak et al. [Nucleic Acids Res. (1997), 25, 109-111], is now hosted by the Center for Microbial Ecology at Michigan State University. RDP-II is a curated database that offers ribosomal RNA (rRNA) nucleotide sequence data in aligned and unaligned forms, analysis services, and associated computer programs. During the past two years, data alignments have been updated and now include >9700 small subunit rRNA sequences. The recent development of an ObjectStore database will provide more rapid updating of data, better data accuracy and increased user access. RDP-II includes phylogenetically ordered alignments of rRNA sequences, derived phylogenetic trees, rRNA secondary structure diagrams, and various software programs for handling, analyzing and displaying alignments and trees. The data are available via anonymous ftp (ftp.cme.msu.edu) and WWW (http://www.cme.msu.edu/RDP). The WWW server provides ribosomal probe checking, approximate phylogenetic placement of user-submitted sequences, screening for possible chimeric rRNA sequences, automated alignment, and a suggested placement of an unknown sequence on an existing phylogenetic tree. Additional utilities also exist at RDP-II, including distance matrix, T-RFLP, and a Java-based viewer of the phylogenetic trees that can be used to create subtrees.
6. Aerial view of turnpike alignment running from lower left ...
6. Aerial view of turnpike alignment running from lower left diagonally up to right along row of trees. Migel Estate and Farm buildings (HABS No. NY-6356) located at lower right of photograph. W.K. Smith house (HABS No. NY-6356-A) located within clump of trees at lower center, with poultry houses (HABS No. NY-6356-F and G) visible left of the clump of trees. View looking south. - Orange Turnpike, Parallel to new Orange Turnpike, Monroe, Orange County, NY
Roettger, Mayo; Martin, William; Dagan, Tal
2009-09-01
Among the methods currently used in phylogenomic practice to detect the presence of lateral gene transfer (LGT), one of the most frequently employed is the comparison of gene tree topologies for different genes. In cases where the phylogenies for different genes are incompatible, or discordant, for well-supported branches there are three simple interpretations for the result: 1) gene duplications (paralogy) followed by many independent gene losses have occurred, 2) LGT has occurred, or 3) the phylogeny is well supported but for reasons unknown is nonetheless incorrect. Here, we focus on the third possibility by examining the properties of 22,437 published multiple sequence alignments, the Bayesian maximum likelihood trees for which either do or do not suggest the occurrence of LGT by the criterion of discordant branches. The alignments that produce discordant phylogenies differ significantly in several salient alignment properties from those that do not. Using a support vector machine, we were able to predict the inference of discordant tree topologies with up to 80% accuracy from alignment properties alone.
DendroBLAST: approximate phylogenetic trees in the absence of multiple sequence alignments.
Kelly, Steven; Maini, Philip K
2013-01-01
The rapidly growing availability of genome information has created considerable demand for both fast and accurate phylogenetic inference algorithms. We present a novel method called DendroBLAST for reconstructing phylogenetic dendrograms/trees from protein sequences using BLAST. This method differs from other methods by incorporating a simple model of sequence evolution to test the effect of introducing sequence changes on the reliability of the bipartitions in the inferred tree. Using realistic simulated sequence data we demonstrate that this method produces phylogenetic trees that are more accurate than other commonly-used distance based methods though not as accurate as maximum likelihood methods from good quality multiple sequence alignments. In addition to tests on simulated data, we use DendroBLAST to generate input trees for a supertree reconstruction of the phylogeny of the Archaea. This independent analysis produces an approximate phylogeny of the Archaea that has both high precision and recall when compared to previously published analysis of the same dataset using conventional methods. Taken together these results demonstrate that approximate phylogenetic trees can be produced in the absence of multiple sequence alignments, and we propose that these trees will provide a platform for improving and informing downstream bioinformatic analysis. A web implementation of the DendroBLAST method is freely available for use at http://www.dendroblast.com/.
Wan, Shixiang; Zou, Quan
2017-01-01
Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The extreme increase in next-generation sequencing data has resulted in a shortage of efficient approaches for aligning ultra-large sets of biological sequences of different types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files more than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implement a highly cost-efficient and time-efficient tool, HAlign-II, to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. Experiments on large-scale DNA and protein data sets, with files of more than 1 GB, showed that HAlign-II saves time and space and outperforms current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resources. HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II, with open-source code and datasets, is available at http://lab.malab.cn/soft/halign.
Chronomics of tree rings for chronoastrobiology and beyond.
Nintcheu-Fata, Sylvain; Katinas, George; Halberg, Franz; Cornélissen, Germaine; Tolstykh, Victor; Michael, Henry N; Otsuka, Kuniaki; Schwartzkopff, Othild; Bakken, Earl
2003-10-01
Gliding spectral windows illustrate the changes as a function of time in the relative prominence of signals in a given frequency range, viewed in 3D or as surface charts. As an example, the method is applied to a 2,189-year series of averages of ring measurements on 11 sequoia trees published by Douglass. Analyses of the original data and after filtering reveal, among others, components with periods of about 10.5 and 21 years similar to the Schwabe and Hale solar activity cycles. An alignment of gliding spectra with a global spectrum serves to define, by minima, the ranges of variability around the anticipated Schwabe and Hale cycles. This procedure may have more general applicability when dealing with ranges of only transiently synchronized, wobbly, and perhaps sometimes free-running periodicities. Solar activity is known to affect climate and changes in climate are reflected to some extent in tree growth. The spectral structure in tree rings could serve not only to check any relations of climate with sunspots, auroras and more modern measures of solar activity, but also to check any purely mathematical extrapolations from the much shorter available actual data on solar activity. With such extrapolated series and the data analyzed herein, the task remains to align physical and physiological variables to further study the influence of natural environmental factors near and far on biota, including international battles, which cover an even longer span of 2,556 years.
Bastien, Olivier; Ortet, Philippe; Roy, Sylvaine; Maréchal, Eric
2005-03-10
Popular methods to reconstruct molecular phylogenies are based on multiple sequence alignments, in which addition or removal of data may change the resulting tree topology. We have sought a representation of homologous proteins that would conserve the information of pair-wise sequence alignments, respect probabilistic properties of Z-scores (Monte Carlo methods applied to pair-wise comparisons) and be the basis for a novel method of consistent and stable phylogenetic reconstruction. We have built up a spatial representation of protein sequences using concepts from particle physics (configuration space) and respecting a frame of constraints deduced from pair-wise alignment score properties in information theory. The obtained configuration space of homologous proteins (CSHP) allows the representation of real and shuffled sequences, and thereupon an expression of the TULIP theorem for Z-score probabilities. Based on the CSHP, we propose a phylogeny reconstruction using Z-scores. Deduced trees, called TULIP trees, are consistent with multiple-alignment based trees. Furthermore, the TULIP tree reconstruction method provides a solution for some previously reported incongruent results, such as the apicomplexan enolase phylogeny. The CSHP is a unified model that conserves mutual information between proteins in the way physical models conserve energy. Applications include the reconstruction of evolutionary consistent and robust trees, the topology of which is based on a spatial representation that is not reordered after addition or removal of sequences. The CSHP and its assigned phylogenetic topology, provide a powerful and easily updated representation for massive pair-wise genome comparisons based on Z-score computations.
Evolutionary distances in the twilight zone--a rational kernel approach.
Schwarz, Roland F; Fletcher, William; Förster, Frank; Merget, Benjamin; Wolf, Matthias; Schultz, Jörg; Markowetz, Florian
2010-12-31
Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets.
Phylogenomic analyses data of the avian phylogenomics project.
Jarvis, Erich D; Mirarab, Siavash; Aberer, Andre J; Li, Bo; Houde, Peter; Li, Cai; Ho, Simon Y W; Faircloth, Brant C; Nabholz, Benoit; Howard, Jason T; Suh, Alexander; Weber, Claudia C; da Fonseca, Rute R; Alfaro-Núñez, Alonzo; Narula, Nitish; Liu, Liang; Burt, Dave; Ellegren, Hans; Edwards, Scott V; Stamatakis, Alexandros; Mindell, David P; Cracraft, Joel; Braun, Edward L; Warnow, Tandy; Jun, Wang; Gilbert, M Thomas Pius; Zhang, Guojie
2015-01-01
Determining the evolutionary relationships among the major lineages of extant birds has been one of the biggest challenges in systematic biology. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders. We used these genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomic analyses. Here we present the datasets associated with the phylogenomic analyses, which include sequence alignment files consisting of nucleotides, amino acids, indels, and transposable elements, as well as tree files containing gene trees and species trees. Inferring an accurate phylogeny required generating: 1) A well annotated data set across species based on genome synteny; 2) Alignments with unaligned or incorrectly overaligned sequences filtered out; and 3) Diverse data sets, including genes and their inferred trees, indels, and transposable elements. Our total evidence nucleotide tree (TENT) data set (consisting of exons, introns, and UCEs) gave what we consider our most reliable species tree when using the concatenation-based ExaML algorithm or when using statistical binning with the coalescence-based MP-EST algorithm (which we refer to as MP-EST*). Other data sets, such as the coding sequence of some exons, revealed other properties of genome evolution, namely convergence. The Avian Phylogenomics Project is the largest vertebrate phylogenomics project to date that we are aware of. The sequence, alignment, and tree data are expected to accelerate analyses in phylogenomics and other related areas.
The Inference of Gene Trees with Species Trees
Szöllősi, Gergely J.; Tannier, Eric; Daubin, Vincent; Boussau, Bastien
2015-01-01
This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree–species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree–species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution. PMID:25070970
A scalable approach for tree segmentation within small-footprint airborne LiDAR data
NASA Astrophysics Data System (ADS)
Hamraz, Hamid; Contreras, Marco A.; Zhang, Jun
2017-05-01
This paper presents a distributed approach that scales up to segment tree crowns within a LiDAR point cloud representing an arbitrarily large forested area. The approach uses a single-processor tree segmentation algorithm as a building block in order to process the data delivered in the shape of tiles in parallel. The distributed processing is performed in a master-slave manner, in which the master maintains the global map of the tiles and coordinates the slaves that segment tree crowns within and across the boundaries of the tiles. A minimal bias was introduced to the number of detected trees because of trees lying across the tile boundaries, which was quantified and adjusted for. Theoretical and experimental analyses of the runtime of the approach revealed a near linear speedup. The estimated number of trees categorized by crown class and the associated error margins as well as the height distribution of the detected trees aligned well with field estimations, verifying that the distributed approach works correctly. The approach enables providing information of individual tree locations and point cloud segments for a forest-level area in a timely manner, which can be used to create detailed remotely sensed forest inventories. Although the approach was presented for tree segmentation within LiDAR point clouds, the idea can also be generalized to scale up processing other big spatial datasets.
2017-01-01
The diversity of microbiota is best explored by understanding the phylogenetic structure of the microbial communities. Traditionally, sequence alignment has been used for phylogenetic inference. However, alignment-based approaches come with significant challenges and limitations when massive amounts of data are analyzed. In the recent decade, alignment-free approaches have enabled genome-scale phylogenetic inference. Here we evaluate three alignment-free methods: ACS, CVTree, and Kr for phylogenetic inference with 16S rRNA gene data. We use a taxonomic gold standard to compare the accuracy of alignment-free phylogenetic inference with that of common microbiome-wide phylogenetic inference pipelines based on PyNAST and MUSCLE alignments with FastTree and RAxML. We re-simulate fecal communities from Human Microbiome Project data to evaluate the performance of the methods on datasets with properties of real data. Our comparisons show that alignment-free methods are not inferior to alignment-based methods in giving accurate and robust phylogenetic trees. Moreover, consensus ensembles of alignment-free phylogenies are superior to those built from alignment-based methods in their ability to highlight community differences in low power settings. In addition, the overall running times of alignment-based and alignment-free phylogenetic inference are comparable. Taken together our empirical results suggest that alignment-free methods provide a viable approach for microbiome-wide phylogenetic inference. PMID:29136663
Arulmozhi, R; Subramani, T; Sukumar, S
2015-01-01
Formation of new roads generally has an adverse impact on the environment, and in the case of hill roads, the impact is diverse and effective measures are required to mitigate it. The common problems in hill road formation are tree cutting, destruction of canopies, change in land use pattern, soil erosion, slope instability, induced landslides, invasion of foreign species, and so on. Removal of trees and vegetation causes rapid soil erosion, landslides, and invasion of foreign species, posing danger to the survival of weak native species. Dumping of surplus earth materials on the valley side poses a significant threat to the environment as it would cause induced landslides. Using the cut earth for filling in road formation and dumping the surplus cut earth in safe locations will reduce environmental degradation considerably. Conventionally, hill road alignments are finalized using traditional survey methods with ghat tracer, compass, and leveling surveys, which require an enormous amount of complicated field and office work. Any revision to reduce the quantum of earthwork is difficult in this method due to its complex nature. In the present study at Palamalai Hills, South India, an alignment for a length of 7.95 km was prepared by traditional methods using ghat tracer and total station instruments for survey works. The earthwork quantities were ascertained from the longitudinal profile of the alignment. A GPS survey was also conducted along the alignment to examine its utility in alignment modification. To modify the stretches where the earthwork cutting and filling are above normal and unbalanced and result in surplus earth, repeated GPS surveys were conducted along different paths to optimize the earthwork. The earthwork quantities of the original alignment were analyzed, and their correlation with environmental effect and the usefulness of the GPS survey in this task are presented in this paper.
4. VIEW OF EMPIRE, STONE CABIN AND TIP TOP MINES. ...
4. VIEW OF EMPIRE, STONE CABIN AND TIP TOP MINES. EMPIRE TAILING PILE IS VISIBLE IN LOWER CENTER (SLOPE WITH ORE CHUTE IS HIDDEN BY TREES ABOVE TAILINGS), TIP TOP IS VISIBLE IN RIGHT THIRD AND SLIGHTLY UPHILL IN ELEVATION FROM UPPER EMPIRE TAILINGS,(TO LOCATE, FIND THE V-SHAPED SPOT OF SNOW JUST BELOW THE RIDGE LINE ON FAR RIGHT OF IMAGE. TIP TOP BUILDING IS VISIBLE IN THE LIGHT AREA BELOW AND SLIGHTLY LEFT OF V-SHAPED SNOW SPOT), AND STONE CABIN II IS ALSO VISIBLE, (TO LOCATE, USE A STRAIGHT EDGE AND ALIGN WITH EMPIRE TAILINGS. THIS WILL DIRECT ONE THROUGH THE EDGE OF STONE CABIN II, WHICH IS THE DARK SPOT JUST BELOW THE POINT WHERE THE RIDGE LINE TREES STOP). STONE CABIN I IS LOCATED IN GENERAL VICINITY OF THE LONE TREE ON FAR LEFT RIDGE LINE. ... - Florida Mountain Mining Sites, Silver City, Owyhee County, ID
Hal: an automated pipeline for phylogenetic analyses of genomic data.
Robbertse, Barbara; Yoder, Ryan J; Boyd, Alex; Reeves, John; Spatafora, Joseph W
2011-02-07
The rapid increase in genomic and genome-scale data is resulting in unprecedented levels of discrete sequence data available for phylogenetic analyses. Major analytical impasses exist, however, prior to analyzing these data with existing phylogenetic software. Obstacles include the management of large data sets without standardized naming conventions, identification and filtering of orthologous clusters of proteins or genes, and the assembly of alignments of orthologous sequence data into individual and concatenated super alignments. Here we report the production of an automated pipeline, Hal, that produces multiple alignments and trees from genomic data. These alignments can be produced by a choice of four alignment programs and analyzed by a variety of phylogenetic programs. In short, the Hal pipeline connects the programs BLASTP, MCL, user-specified alignment programs, GBlocks, ProtTest and user-specified phylogenetic programs to produce species trees. The script is available at SourceForge (http://sourceforge.net/projects/bio-hal/). The results from an example analysis of Kingdom Fungi are briefly discussed.
Yamada, Kazunori D.; Tomii, Kentaro; Katoh, Kazutaka
2016-01-01
Motivation: Large multiple sequence alignments (MSAs), consisting of thousands of sequences, are becoming more and more common, due to advances in sequencing technologies. The MAFFT MSA program has several options for building large MSAs, but their performances have not been sufficiently assessed yet, because realistic benchmarking of large MSAs has been difficult. Recently, such assessments have been made possible through the HomFam and ContTest benchmark protein datasets. Along with the development of these datasets, an interesting theory was proposed: chained guide trees increase the accuracy of MSAs of structurally conserved regions. This theory challenges the basis of progressive alignment methods and needs to be examined by being compared with other known methods including computationally intensive ones. Results: We used HomFam, ContTest and OXFam (an extended version of OXBench) to evaluate several methods enabled in MAFFT: (1) a progressive method with approximate guide trees, (2) a progressive method with chained guide trees, (3) a combination of an iterative refinement method and a progressive method and (4) a less approximate progressive method that uses a rigorous guide tree and consistency score. Other programs, Clustal Omega and UPP, available for large MSAs, were also included into the comparison. The effect of method 2 (chained guide trees) was positive in ContTest but negative in HomFam and OXFam. Methods 3 and 4 increased the benchmark scores more consistently than method 2 for the three datasets, suggesting that they are safer to use. Availability and Implementation: http://mafft.cbrc.jp/alignment/software/ Contact: katoh@ifrec.osaka-u.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27378296
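For readers unfamiliar with the term, a chained guide tree is commonly understood as a maximally unbalanced (caterpillar) tree in which sequences are added to the growing alignment one at a time. The sketch below only illustrates that shape as a Newick string; the function name `chained_guide_tree` and the sequence labels are hypothetical, and this is not MAFFT's implementation.

```python
def chained_guide_tree(names):
    """Build a Newick string for a caterpillar tree over `names`, in the given order."""
    if not names:
        raise ValueError("at least one sequence name is required")
    tree = names[0]
    # Each new sequence is joined to the cluster containing all earlier sequences.
    for name in names[1:]:
        tree = f"({tree},{name})"
    return tree + ";"


print(chained_guide_tree(["s1", "s2", "s3", "s4"]))
```

The order of `names` determines the order in which a progressive aligner would add sequences along such a tree.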
ETE: a python Environment for Tree Exploration.
Huerta-Cepas, Jaime; Dopazo, Joaquín; Gabaldón, Toni
2010-01-13
Many bioinformatics analyses, ranging from gene clustering to phylogenetics, produce hierarchical trees as their main result. These are used to represent the relationships among different biological entities, thus facilitating their analysis and interpretation. A number of standalone programs are available that focus on tree visualization or that perform specific analyses on them. However, such applications are rarely suitable for large-scale surveys, in which a higher level of automation is required. Currently, many genome-wide analyses rely on tree-like data representation and hence there is a growing need for scalable tools to handle tree structures at large scale. Here we present the Environment for Tree Exploration (ETE), a python programming toolkit that assists in the automated manipulation, analysis and visualization of hierarchical trees. ETE libraries provide a broad set of tree handling options as well as specific methods to analyze phylogenetic and clustering trees. Among other features, ETE allows for the independent analysis of tree partitions, has support for the extended newick format, provides an integrated node annotation system and permits to link trees to external data such as multiple sequence alignments or numerical arrays. In addition, ETE implements a number of built-in analytical tools, including phylogeny-based orthology prediction and cluster validation techniques. Finally, ETE's programmable tree drawing engine can be used to automate the graphical rendering of trees with customized node-specific visualizations. ETE provides a complete set of methods to manipulate tree data structures that extends current functionality in other bioinformatic toolkits of a more general purpose. ETE is free software and can be downloaded from http://ete.cgenomics.org.
ETE: a python Environment for Tree Exploration
2010-01-01
Background Many bioinformatics analyses, ranging from gene clustering to phylogenetics, produce hierarchical trees as their main result. These are used to represent the relationships among different biological entities, thus facilitating their analysis and interpretation. A number of standalone programs are available that focus on tree visualization or that perform specific analyses on them. However, such applications are rarely suitable for large-scale surveys, in which a higher level of automation is required. Currently, many genome-wide analyses rely on tree-like data representation and hence there is a growing need for scalable tools to handle tree structures at large scale. Results Here we present the Environment for Tree Exploration (ETE), a python programming toolkit that assists in the automated manipulation, analysis and visualization of hierarchical trees. ETE libraries provide a broad set of tree handling options as well as specific methods to analyze phylogenetic and clustering trees. Among other features, ETE allows for the independent analysis of tree partitions, has support for the extended newick format, provides an integrated node annotation system and permits to link trees to external data such as multiple sequence alignments or numerical arrays. In addition, ETE implements a number of built-in analytical tools, including phylogeny-based orthology prediction and cluster validation techniques. Finally, ETE's programmable tree drawing engine can be used to automate the graphical rendering of trees with customized node-specific visualizations. Conclusions ETE provides a complete set of methods to manipulate tree data structures that extends current functionality in other bioinformatic toolkits of a more general purpose. ETE is free software and can be downloaded from http://ete.cgenomics.org. PMID:20070885
Blom, Mozes P K
2015-08-05
Recently developed molecular methods enable geneticists to target and sequence thousands of orthologous loci and infer evolutionary relationships across the tree of life. Large numbers of genetic markers benefit species tree inference but visual inspection of alignment quality, as traditionally conducted, is challenging with thousands of loci. Furthermore, due to the impracticality of repeated visual inspection with alternative filtering criteria, the potential consequences of using datasets with different degrees of missing data remain nominally explored in most empirical phylogenomic studies. In this short communication, I describe a flexible high-throughput pipeline designed to assess alignment quality and filter exonic sequence data for subsequent inference. The stringency criteria for alignment quality and missing data can be adapted based on the expected level of sequence divergence. Each alignment is automatically evaluated based on the stringency criteria specified, significantly reducing the number of alignments that require visual inspection. By developing a rapid method for alignment filtering and quality assessment, the consistency of phylogenetic estimation based on exonic sequence alignments can be further explored across distinct inference methods, while accounting for different degrees of missing data.
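A missing-data stringency criterion of the kind described above might look like the following minimal sketch. The set of characters treated as missing, the thresholds `max_missing` and `min_taxa`, and the function names are all assumptions made for illustration and are not taken from the pipeline itself.

```python
# Characters counted as missing data (an assumption for this sketch).
MISSING = set("-?Nn")


def missing_fraction(seq):
    """Fraction of positions in `seq` that are gaps or ambiguous characters."""
    if not seq:
        return 1.0
    return sum(c in MISSING for c in seq) / len(seq)


def passes_filter(alignment, max_missing=0.5, min_taxa=3):
    """Return (ok, kept_names) for a dict mapping taxon name to aligned sequence.

    A taxon is kept if its missing fraction is at most `max_missing`;
    the alignment passes if at least `min_taxa` taxa are kept.
    """
    kept = [name for name, seq in alignment.items()
            if missing_fraction(seq) <= max_missing]
    return len(kept) >= min_taxa, kept


example = {"a": "ACGT", "b": "AC--", "c": "A---", "d": "ACGN"}
print(passes_filter(example))
print(passes_filter(example, max_missing=0.2))
```

Tightening `max_missing` or raising `min_taxa` corresponds to a stricter filter, which would suit data sets with low expected sequence divergence.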
MaxAlign: maximizing usable data in an alignment.
Gouveia-Oliveira, Rodrigo; Sackett, Peter W; Pedersen, Anders G
2007-08-28
The presence of gaps in an alignment of nucleotide or protein sequences is often an inconvenience for bioinformatical studies. In phylogenetic and other analyses, for instance, gapped columns are often discarded entirely from the alignment. MaxAlign is a program that optimizes the alignment prior to such analyses. Specifically, it maximizes the number of nucleotide (or amino acid) symbols that are present in gap-free columns - the alignment area - by selecting the optimal subset of sequences to exclude from the alignment. MaxAlign can be used prior to phylogenetic and bioinformatical analyses as well as in other situations where this form of alignment improvement is useful. In this work we test MaxAlign's performance in these tasks and compare the accuracy of phylogenetic estimates including and excluding gapped columns from the analysis, with and without processing with MaxAlign. In this paper we also introduce a new simple measure of tree similarity, Normalized Symmetric Similarity (NSS) that we consider useful for comparing tree topologies. We demonstrate how MaxAlign is helpful in detecting misaligned or defective sequences without requiring manual inspection. We also show that it is not advisable to exclude gapped columns from phylogenetic analyses unless MaxAlign is used first. Finally, we find that the sequences removed by MaxAlign from an alignment tend to be those that would otherwise be associated with low phylogenetic accuracy, and that the presence of gaps in any given sequence does not seem to disturb the phylogenetic estimates of other sequences. The MaxAlign web-server is freely available online at http://www.cbs.dtu.dk/services/MaxAlign where supplementary information can also be found. The program is also freely available as a Perl stand-alone package.
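The alignment area defined above (the number of symbols that sit in gap-free columns) can be computed directly. The sketch below does so and then tries excluding each single sequence in turn; that exhaustive single-exclusion step is only a naive illustration of why removing a sequence can enlarge the area, not MaxAlign's actual search procedure, and the function names are hypothetical.

```python
def alignment_area(seqs, gap_chars="-"):
    """Number of symbols in gap-free columns: gap-free columns times sequences."""
    if not seqs:
        return 0
    gap_free = sum(
        1 for column in zip(*seqs)
        if not any(c in gap_chars for c in column)
    )
    return gap_free * len(seqs)


def best_single_exclusion(seqs):
    """Return (index, area) for the single sequence whose removal gives the largest area."""
    candidates = [
        (i, alignment_area(seqs[:i] + seqs[i + 1:]))
        for i in range(len(seqs))
    ]
    return max(candidates, key=lambda pair: pair[1])


aln = ["ACGTAC", "ACG-AC", "A----C"]
print(alignment_area(aln))         # 2 gap-free columns x 3 sequences = 6
print(best_single_exclusion(aln))  # dropping the gappy third sequence gives area 10
```

In this toy alignment, removing one gap-rich sequence raises the area from 6 to 10 even though fewer sequences remain.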
Optimal network alignment with graphlet degree vectors.
Milenković, Tijana; Ng, Weng Leong; Hayes, Wayne; Przulj, Natasa
2010-06-30
Important biological information is encoded in the topology of biological networks. Comparative analyses of biological networks are proving to be valuable, as they can lead to transfer of knowledge between species and give deeper insights into biological function, disease, and evolution. We introduce a new method that uses the Hungarian algorithm to produce optimal global alignment between two networks using any cost function. We design a cost function based solely on network topology and use it in our network alignment. Our method can be applied to any two networks, not just biological ones, since it is based only on network topology. We use our new method to align protein-protein interaction networks of two eukaryotic species and demonstrate that our alignment exposes large and topologically complex regions of network similarity. At the same time, our alignment is biologically valid, since many of the aligned protein pairs perform the same biological function. From the alignment, we predict function of yet unannotated proteins, many of which we validate in the literature. Also, we apply our method to find topological similarities between metabolic networks of different species and build phylogenetic trees based on our network alignment score. The phylogenetic trees obtained in this way bear a striking resemblance to the ones obtained by sequence alignments. Our method detects topologically similar regions in large networks that are statistically significant. It does this independent of protein sequence or any other information external to network topology.
4. Aerial view of turnpike path running through center of ...
4. Aerial view of turnpike path running through center of photograph along row of trees. South edge of original alignment visible at left at cluster of white trailers. North edge of original alignment visible at right at the W.K. Smith house (HABS No. NY-6356-A) at the top right corner. Migel mansion visible on ridgetop at right-center of photograph, surrounded by trees. View looking west. - Orange Turnpike, Parallel to new Orange Turnpike, Monroe, Orange County, NY
Evolutionary inference via the Poisson Indel Process.
Bouchard-Côté, Alexandre; Jordan, Michael I
2013-01-22
We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classic evolutionary process, the TKF91 model [Thorne JL, Kishino H, Felsenstein J (1991) J Mol Evol 33(2):114-124] is a continuous-time Markov chain model composed of insertion, deletion, and substitution events. Unfortunately, this model gives rise to an intractable computational problem: The computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa. In this work, we present a stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The Poisson Indel Process is closely related to the TKF91 model, differing only in its treatment of insertions, but it has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared with separate inference of phylogenies and alignments.
Evolutionary inference via the Poisson Indel Process
Bouchard-Côté, Alexandre; Jordan, Michael I.
2013-01-01
We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classic evolutionary process, the TKF91 model [Thorne JL, Kishino H, Felsenstein J (1991) J Mol Evol 33(2):114–124] is a continuous-time Markov chain model composed of insertion, deletion, and substitution events. Unfortunately, this model gives rise to an intractable computational problem: The computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa. In this work, we present a stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The Poisson Indel Process is closely related to the TKF91 model, differing only in its treatment of insertions, but it has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared with separate inference of phylogenies and alignments. PMID:23275296
NASA Astrophysics Data System (ADS)
Amiroch, S.; Pradana, M. S.; Irawan, M. I.; Mukhlash, I.
2017-09-01
Multiple alignment (MA) is a particularly important tool for studying viral genomes and determining the evolutionary process of a specific virus. Applying MA to the spread of the severe acute respiratory syndrome (SARS) epidemic is of interest because, a few years ago, this epidemic spread so quickly that it drew medical attention in many countries. Although much software is available for processing multiple sequences, the use of pairwise alignment to build the MA deserves careful consideration. Previous research used the Super Pairwise Alignment algorithm to align the sequences for MA, whereas this study uses the Needleman-Wunsch dynamic programming algorithm, simulated in Matlab. The analysis of the resulting MA yields the stable and unstable regions, which indicate the positions where mutations occur; the network topology produced by the phylogenetic tree of the SARS epidemic using a distance method; and the area network of the mutations.
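The pairwise dynamic programming step named in this abstract can be sketched as follows. This is a minimal Python illustration of Needleman-Wunsch global alignment, not the authors' Matlab simulation; the scoring values (match +1, mismatch -1, linear gap -1) and the function name are assumptions made for the example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Scoring values are illustrative.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(*needleman_wunsch("ACGT", "AGT"))  # 2 ACGT A-GT
```

Columns that stay identical across many such pairwise alignments would correspond to the stable regions mentioned above, while columns with substitutions or gaps mark candidate mutation positions.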
IVisTMSA: Interactive Visual Tools for Multiple Sequence Alignments.
Pervez, Muhammad Tariq; Babar, Masroor Ellahi; Nadeem, Asif; Aslam, Naeem; Naveed, Nasir; Ahmad, Sarfraz; Muhammad, Shah; Qadri, Salman; Shahid, Muhammad; Hussain, Tanveer; Javed, Maryam
2015-01-01
IVisTMSA is a software package of seven graphical tools for multiple sequence alignments. MSApad is an editing and analysis tool. It can load 409% more data than Jalview, STRAP, CINEMA, and Base-by-Base. MSA comparator allows the user to visualize consistent and inconsistent regions of reference and test alignments of more than 21-MB size in less than 12 seconds. MSA comparator is 5,200% efficient and more than 40% efficient as compared to the BAliBASE C program and FastSP, respectively. MSA reconstruction tool provides graphical user interfaces for four popular aligners and allows the user to load several sequence files at a time. FASTA generator converts seven formats of alignments of unlimited size into FASTA format in a few seconds. MSA ID calculator calculates the identity matrix of more than 11,000 sequences with a sequence length of 2,696 base pairs in less than 100 seconds. Tree and Distance Matrix calculation tools generate a phylogenetic tree and a distance matrix, respectively, using neighbor joining, % identity and the BLOSUM 62 matrix.
Ghost-tree: creating hybrid-gene phylogenetic trees for diversity analyses.
Fouquier, Jennifer; Rideout, Jai Ram; Bolyen, Evan; Chase, John; Shiffer, Arron; McDonald, Daniel; Knight, Rob; Caporaso, J Gregory; Kelley, Scott T
2016-02-24
Fungi play critical roles in many ecosystems, cause serious diseases in plants and animals, and pose significant threats to human health and structural integrity problems in built environments. While most fungal diversity remains unknown, the development of PCR primers for the internal transcribed spacer (ITS) combined with next-generation sequencing has substantially improved our ability to profile fungal microbial diversity. Although the high sequence variability in the ITS region facilitates more accurate species identification, it also makes multiple sequence alignment and phylogenetic analysis unreliable across evolutionarily distant fungi because the sequences are hard to align accurately. To address this issue, we created ghost-tree, a bioinformatics tool that integrates sequence data from two genetic markers into a single phylogenetic tree that can be used for diversity analyses. Our approach starts with a "foundation" phylogeny based on one genetic marker whose sequences can be aligned across organisms spanning divergent taxonomic groups (e.g., fungal families). Then, "extension" phylogenies are built for more closely related organisms (e.g., fungal species or strains) using a second more rapidly evolving genetic marker. These smaller phylogenies are then grafted onto the foundation tree by mapping taxonomic names such that each corresponding foundation-tree tip would branch into its new "extension tree" child. We applied ghost-tree to graft fungal extension phylogenies derived from ITS sequences onto a foundation phylogeny derived from fungal 18S sequences. Our analysis of simulated and real fungal ITS data sets found that phylogenetic distances between fungal communities computed using ghost-tree phylogenies explained significantly more variance than non-phylogenetic distances. 
The phylogenetic metrics also improved our ability to distinguish small differences (effect sizes) between microbial communities, though results were similar to non-phylogenetic methods for larger effect sizes. The Silva/UNITE-based ghost tree presented here can be easily integrated into existing fungal analysis pipelines to enhance the resolution of fungal community differences and improve understanding of these communities in built environments. The ghost-tree software package can also be used to develop phylogenetic trees for other marker gene sets that afford different taxonomic resolution, or for bridging genome trees with amplicon trees. ghost-tree is pip-installable. All source code, documentation, and test code are available under the BSD license at https://github.com/JTFouquier/ghost-tree .
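The grafting idea described in this abstract can be sketched in a few lines. The Python example below is a toy illustration only and does not use the ghost-tree package or its API: trees are represented as nested tuples, the taxon names are placeholder data, and the helper names `graft` and `to_newick` are hypothetical.

```python
def graft(tree, extensions):
    """Replace each foundation tip that has an extension tree with that subtree.

    A tree is either a tip name (str) or a tuple of child trees.
    Tips without an extension tree are kept unchanged.
    """
    if isinstance(tree, str):
        return extensions.get(tree, tree)
    return tuple(graft(child, extensions) for child in tree)


def to_newick(tree):
    """Serialise a nested-tuple tree as a Newick string (topology only)."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


# Foundation phylogeny from the slowly evolving marker (toy data).
foundation = (("Aspergillus", "Penicillium"), "Saccharomyces")

# Extension phylogenies from the rapidly evolving marker, keyed by the
# taxonomic name they share with a foundation tip (toy data).
extensions = {
    "Aspergillus": ("A_niger", ("A_flavus", "A_oryzae")),
    "Penicillium": ("P_rubens", "P_expansum"),
}

hybrid = graft(foundation, extensions)
newick = to_newick(hybrid) + ";"
print(newick)
```

Branch lengths are omitted here; a real hybrid tree would need them to be carried through the graft, since phylogenetic distance metrics depend on them.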
EGenBio: A Data Management System for Evolutionary Genomics and Biodiversity
Nahum, Laila A; Reynolds, Matthew T; Wang, Zhengyuan O; Faith, Jeremiah J; Jonna, Rahul; Jiang, Zhi J; Meyer, Thomas J; Pollock, David D
2006-01-01
Background: Evolutionary genomics requires management and filtering of large numbers of diverse genomic sequences for accurate analysis and inference on evolutionary processes of genomic and functional change. We developed Evolutionary Genomics and Biodiversity (EGenBio) to begin to address this. Description: EGenBio is a system for manipulation and filtering of large numbers of sequences, integrating curated sequence alignments and phylogenetic trees, managing evolutionary analyses, and visualizing their output. EGenBio is organized into three conceptual divisions, Evolution, Genomics, and Biodiversity. The Genomics division includes tools for selecting pre-aligned sequences from different genes and species, and for modifying and filtering these alignments for further analysis. Species searches are handled through queries that can be modified based on a tree-based navigation system and saved. The Biodiversity division contains tools for analyzing individual sequences or sequence alignments, whereas the Evolution division contains tools involving phylogenetic trees. Alignments are annotated with analytical results and modification history using our PRAED format. A miscellaneous Tools section and Help framework are also available. EGenBio was developed around our comparative genomic research and a prototype database of mtDNA genomes. It utilizes MySQL-relational databases and dynamic page generation, and calls numerous custom programs. Conclusion: EGenBio was designed to serve as a platform for tools and resources to ease combined analysis in evolution, genomics, and biodiversity. PMID:17118150
EURRECA: development of tools to improve the alignment of micronutrient recommendations.
Matthys, C; Bucchini, L; Busstra, M C; Cavelaars, A E J M; Eleftheriou, P; Garcia-Alvarez, A; Fairweather-Tait, S; Gurinović, M; van Ommen, B; Contor, L
2010-11-01
Approaches through which reference values for micronutrients are derived, as well as the reference values themselves, vary considerably across countries. Harmonisation is needed to improve nutrition policy and public health strategies. The EURRECA (EURopean micronutrient RECommendations Aligned, http://www.eurreca.org) Network of Excellence is developing generic tools for systematically establishing and updating micronutrient reference values or recommendations. Different types of instruments (including best practice guidelines, interlinked web pages, online databases and decision trees) have been identified. The first set of instruments is for training purposes and includes mainly interactive digital learning materials. The second set of instruments comprises collection and interlinkage of diverse information sources that have widely varying contents and purposes. In general, these sources are collections of existing information. The purpose of the majority of these information sources is to provide guidance on best practice for use in a wider scientific community or for users and stakeholders of reference values. The third set of instruments includes decision trees and frameworks. The purpose of these tools is to guide non-scientists in decision making based on scientific evidence. This platform of instruments will, in particular in Central and Eastern European countries, contribute to future capacity-building development in nutrition. The use of these tools by the scientific community, the European Food Safety Authority, bodies responsible for setting national nutrient requirements and others should ultimately help to align nutrient-based recommendations across Europe. Therefore, EURRECA can contribute towards nutrition policy development and public health strategies.
Robinson, Mark D; De Souza, David P; Keen, Woon Wai; Saunders, Eleanor C; McConville, Malcolm J; Speed, Terence P; Likić, Vladimir A
2007-10-29
Gas chromatography-mass spectrometry (GC-MS) is a robust platform for the profiling of certain classes of small molecules in biological samples. When multiple samples are profiled, including replicates of the same sample and/or different sample states, one needs to account for retention time drifts between experiments. This can be achieved either by the alignment of chromatographic profiles prior to peak detection, or by matching signal peaks after they have been extracted from chromatogram data matrices. Automated retention time correction is particularly important in non-targeted profiling studies. A new approach for matching signal peaks based on dynamic programming is presented. The proposed approach relies on both peak retention times and mass spectra. The alignment of more than two peak lists involves three steps: (1) all possible pairs of peak lists are aligned, and similarity of each pair of peak lists is estimated; (2) the guide tree is built based on the similarity between the peak lists; (3) peak lists are progressively aligned starting with the two most similar peak lists, following the guide tree until all peak lists are exhausted. When two or more experiments are performed on different sample states and each consisting of multiple replicates, peak lists within each set of replicate experiments are aligned first (within-state alignment), and subsequently the resulting alignments are aligned themselves (between-state alignment). When more than two sets of replicate experiments are present, the between-state alignment also employs the guide tree. We demonstrate the usefulness of this approach on GC-MS metabolic profiling experiments acquired on wild-type and mutant Leishmania mexicana parasites. We propose a progressive method to match signal peaks across multiple GC-MS experiments based on dynamic programming. A sensitive peak similarity function is proposed to balance peak retention time and peak mass spectra similarities. 
This approach can produce the optimal alignment between an arbitrary number of peak lists, and models explicitly within-state and between-state peak alignment. The accuracy of the proposed method was close to the accuracy of manually-curated peak matching, which required tens of man-hours for the analyzed data sets. The proposed approach may offer significant advantages for processing of high-throughput metabolomics data, especially when large numbers of experimental replicates and multiple sample states are analyzed.
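The first of the three steps listed in this abstract, pairwise alignment of two peak lists by dynamic programming, can be sketched as follows. The abstract does not give the authors' similarity function, so the one below (spectral cosine similarity damped by a Gaussian of the retention-time difference) is an assumption, as are the parameter names `rt_tol` and `min_sim` and the toy peak data. Guide-tree construction and progressive alignment are not shown.

```python
import math


def peak_similarity(p, q, rt_tol=2.0):
    """Hypothetical similarity in [0, 1] between peaks (retention_time, spectrum)."""
    (rt_p, spec_p), (rt_q, spec_q) = p, q
    dot = sum(x * y for x, y in zip(spec_p, spec_q))
    norm = math.sqrt(sum(x * x for x in spec_p)) * math.sqrt(sum(y * y for y in spec_q))
    cosine = dot / norm if norm else 0.0
    # Penalise large retention-time drift between the two peaks.
    return cosine * math.exp(-((rt_p - rt_q) ** 2) / (2 * rt_tol ** 2))


def align_peaks(list_a, list_b, min_sim=0.5):
    """Order-preserving matching of two peak lists by dynamic programming.

    A match contributes (similarity - min_sim), so pairs below min_sim are
    better left unmatched. Returns a list of (index_a, index_b) pairs.
    """
    n, m = len(list_a), len(list_b)
    gain = [[peak_similarity(a, b) - min_sim for b in list_b] for a in list_a]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + gain[i - 1][j - 1],  # match the two peaks
                score[i - 1][j],                           # leave peak i unmatched
                score[i][j - 1],                           # leave peak j unmatched
            )
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j]:
            i -= 1
        elif score[i][j] == score[i][j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
    pairs.reverse()
    return pairs


# Toy peak lists: (retention time in minutes, intensities at three m/z values).
sample_a = [(5.0, [10, 0, 5]), (7.5, [0, 8, 1]), (9.9, [3, 3, 3])]
sample_b = [(5.2, [9, 0, 6]), (10.1, [3, 3, 2.5])]
print(align_peaks(sample_a, sample_b))
```

The total score of each pairwise alignment could then serve as the similarity from which a guide tree is built, in line with steps (2) and (3) of the abstract.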
2016-01-01
Background: Metabarcoding is becoming a common tool used to assess and compare diversity of organisms in environmental samples. Identification of OTUs is one of the critical steps in the process and several taxonomy assignment methods were proposed to accomplish this task. This publication evaluates the quality of reference datasets, alongside with several alignment and phylogeny inference methods used in one of the taxonomy assignment methods, called tree-based approach. This approach assigns anonymous OTUs to taxonomic categories based on relative placements of OTUs and reference sequences on the cladogram and support that these placements receive. New information: In tree-based taxonomy assignment approach, reliable identification of anonymous OTUs is based on their placement in monophyletic and highly supported clades together with identified reference taxa. Therefore, it requires high quality reference dataset to be used. Resolution of phylogenetic trees is strongly affected by the presence of erroneous sequences as well as alignment and phylogeny inference methods used in the process. Two preparation steps are essential for the successful application of tree-based taxonomy assignment approach. Curated collections of genetic information do include erroneous sequences. These sequences have detrimental effect on the resolution of cladograms used in tree-based approach. They must be identified and excluded from the reference dataset beforehand. Various combinations of multiple sequence alignment and phylogeny inference methods provide cladograms with different topology and bootstrap support. These combinations of methods need to be tested in order to determine the one that gives highest resolution for the particular reference dataset. Completing the above mentioned preparation steps is expected to decrease the number of unassigned OTUs and thus improve the results of the tree-based taxonomy assignment approach. PMID:27932919
Holovachov, Oleksandr
2016-01-01
Metabarcoding is becoming a common tool used to assess and compare diversity of organisms in environmental samples. Identification of OTUs is one of the critical steps in the process and several taxonomy assignment methods were proposed to accomplish this task. This publication evaluates the quality of reference datasets, alongside with several alignment and phylogeny inference methods used in one of the taxonomy assignment methods, called tree-based approach. This approach assigns anonymous OTUs to taxonomic categories based on relative placements of OTUs and reference sequences on the cladogram and support that these placements receive. In tree-based taxonomy assignment approach, reliable identification of anonymous OTUs is based on their placement in monophyletic and highly supported clades together with identified reference taxa. Therefore, it requires high quality reference dataset to be used. Resolution of phylogenetic trees is strongly affected by the presence of erroneous sequences as well as alignment and phylogeny inference methods used in the process. Two preparation steps are essential for the successful application of tree-based taxonomy assignment approach. Curated collections of genetic information do include erroneous sequences. These sequences have detrimental effect on the resolution of cladograms used in tree-based approach. They must be identified and excluded from the reference dataset beforehand. Various combinations of multiple sequence alignment and phylogeny inference methods provide cladograms with different topology and bootstrap support. These combinations of methods need to be tested in order to determine the one that gives highest resolution for the particular reference dataset. Completing the above mentioned preparation steps is expected to decrease the number of unassigned OTUs and thus improve the results of the tree-based taxonomy assignment approach.
Dang, Cuong Cao; Lefort, Vincent; Le, Vinh Sy; Le, Quang Si; Gascuel, Olivier
2011-10-01
Amino acid replacement rate matrices are an essential basis of protein studies (e.g. in phylogenetics and alignment). A number of general purpose matrices have been proposed (e.g. JTT, WAG, LG) since the seminal work of Margaret Dayhoff and co-workers. However, it has been shown that matrices specific to certain protein groups (e.g. mitochondrial) or life domains (e.g. viruses) differ significantly from general average matrices, and thus perform better when applied to the data to which they are dedicated. This Web server implements the maximum-likelihood estimation procedure that was used to estimate LG, and provides a number of tools and facilities. Users upload a set of multiple protein alignments from their domain of interest and receive the resulting matrix by email, along with statistics and comparisons with other matrices. A non-parametric bootstrap is performed optionally to assess the variability of replacement rate estimates. Maximum-likelihood trees, inferred using the estimated rate matrix, are also computed optionally for each input alignment. Finely tuned procedures and up-to-date ML software (PhyML 3.0, XRATE) are combined to perform all these heavy calculations on our clusters. http://www.atgc-montpellier.fr/ReplacementMatrix/ olivier.gascuel@lirmm.fr Supplementary data are available at http://www.atgc-montpellier.fr/ReplacementMatrix/
Basal jawed vertebrate phylogeny inferred from multiple nuclear DNA-coded genes
Kikugawa, Kanae; Katoh, Kazutaka; Kuraku, Shigehiro; Sakurai, Hiroshi; Ishida, Osamu; Iwabe, Naoyuki; Miyata, Takashi
2004-01-01
Background Phylogenetic analyses of jawed vertebrates based on mitochondrial sequences often result in confusing inferences which are obviously inconsistent with generally accepted trees. In particular, in a hypothesis by Rasmussen and Arnason based on mitochondrial trees, cartilaginous fishes have a terminal position in a paraphyletic cluster of bony fishes. No previous analysis based on nuclear DNA-coded genes could significantly reject the mitochondrial trees of jawed vertebrates. Results We have cloned and sequenced seven nuclear DNA-coded genes from 13 vertebrate species. These sequences, together with sequences available from databases including 13 jawed vertebrates from eight major groups (cartilaginous fishes, bichir, chondrosteans, gar, bowfin, teleost fishes, lungfishes and tetrapods) and an outgroup (a cyclostome and a lancelet), have been subjected to phylogenetic analyses based on the maximum likelihood method. Conclusion Cartilaginous fishes have been inferred to be basal to other jawed vertebrates, which is consistent with the generally accepted view. The minimum log-likelihood difference between the maximum likelihood tree and trees not supporting the basal position of cartilaginous fishes is 18.3 ± 13.1. The hypothesis by Rasmussen and Arnason has been significantly rejected with the minimum log-likelihood difference of 123 ± 23.3. Our tree has also shown that living holosteans, comprising bowfin and gar, form a monophyletic group which is the sister group to teleost fishes. This is consistent with a formerly prevalent view of vertebrate classification, although inconsistent with both of the current morphology-based and mitochondrial sequence-based trees. Furthermore, the bichir has been shown to be the basal ray-finned fish. Tetrapods and lungfish have formed a monophyletic cluster in the tree inferred from the concatenated alignment, being consistent with the currently prevalent view. 
It also remains possible that tetrapods are more closely related to ray-finned fishes than to lungfishes. PMID:15070407
Collins, Kodi; Warnow, Tandy
2018-06-19
PASTA is a multiple sequence alignment method that uses divide-and-conquer plus iteration to enable base alignment methods to scale with high accuracy to large sequence datasets. By default, PASTA included MAFFT L-INS-i; our new extension of PASTA enables the use of MAFFT G-INS-i, MAFFT Homologs, CONTRAlign, and ProbCons. We analyzed the performance of each base method and PASTA using these base methods on 224 datasets from BAliBASE 4 with at least 50 sequences. We show that PASTA enables the most accurate base methods to scale to larger datasets at reduced computational effort, and generally improves alignment and tree accuracy on the largest BAliBASE datasets. PASTA is available at https://github.com/kodicollins/pasta and has also been integrated into the original PASTA repository at https://github.com/smirarab/pasta. Supplementary data are available at Bioinformatics online.
Chen, Jonathan S.; Reddy, Vamsee; Chen, Joshua H.; Shlykov, Maksim A.; Zheng, Wei Hao; Cho, Jaehoon; Yen, Ming Ren; Saier, Milton H.
2012-01-01
Transport proteins function in the translocation of ions, solutes and macromolecules across cellular and organellar membranes. These integral membrane proteins fall into >600 families as tabulated in the Transporter Classification Database (www.tcdb.org). Recent studies, some of which are reported here, define distant phylogenetic relationships between families with the creation of superfamilies. Several of these are analyzed using a novel set of programs designed to allow reliable prediction of phylogenetic trees when sequence divergence is too great to allow the use of multiple alignments. These new programs, called SuperfamilyTree1 and 2 (SFT1 and 2), allow display of protein and family relationships, respectively, based on thousands of comparative BLAST scores rather than multiple alignments. Superfamilies analyzed include: (1) Aerolysins, (2) RTX Toxins, (3) Defensins, (4) Ion Transporters, (5) Bile/Arsenite/Riboflavin Transporters, (6) Cation: Proton Antiporters, and (7) the Glucose/Fructose/Lactose superfamily within the prokaryotic phosphoenol pyruvate-dependent Phosphotransferase System. In addition to defining the phylogenetic relationships of the proteins and families within these seven superfamilies, evidence is provided showing that the SFT programs outperform programs that are based on multiple alignments whenever sequence divergence of superfamily members is extensive. The SFT programs should be applicable to virtually any superfamily of proteins or nucleic acids. PMID:22286036
Zhou, Xiaofan; Shen, Xing-Xing; Hittinger, Chris Todd
2018-01-01
The sizes of the data matrices assembled to resolve branches of the tree of life have increased dramatically, motivating the development of programs for fast, yet accurate, inference. For example, several different fast programs have been developed in the very popular maximum likelihood framework, including RAxML/ExaML, PhyML, IQ-TREE, and FastTree. Although these programs are widely used, a systematic evaluation and comparison of their performance using empirical genome-scale data matrices has so far been lacking. To address this question, we evaluated these four programs on 19 empirical phylogenomic data sets with hundreds to thousands of genes and up to 200 taxa with respect to likelihood maximization, tree topology, and computational speed. For single-gene tree inference, we found that the more exhaustive and slower strategies (ten searches per alignment) outperformed faster strategies (one tree search per alignment) using RAxML, PhyML, or IQ-TREE. Interestingly, single-gene trees inferred by the three programs yielded comparable coalescent-based species tree estimations. For concatenation-based species tree inference, IQ-TREE consistently achieved the best-observed likelihoods for all data sets, and RAxML/ExaML was a close second. In contrast, PhyML often failed to complete concatenation-based analyses, whereas FastTree was the fastest but generated lower likelihood values and more dissimilar tree topologies in both types of analyses. Finally, data matrix properties, such as the number of taxa and the strength of phylogenetic signal, sometimes substantially influenced the programs’ relative performance. Our results provide real-world gene and species tree phylogenetic inference benchmarks to inform the design and execution of large-scale phylogenomic data analyses. PMID:29177474
King, Brian R; Aburdene, Maurice; Thompson, Alex; Warres, Zach
2014-01-01
Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods have demonstrated early success for detection of coding regions in a gene. Recently, these methods are being used to establish DNA gene similarity. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transformation, which can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method that generates a genetic signature for any DNA sequence that is used to generate relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which represents a set of highly similar sequences. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment techniques for similarity and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.
Evolutionary profiles from the QR factorization of multiple sequence alignments
Sethi, Anurag; O'Donoghue, Patrick; Luthey-Schulten, Zaida
2005-01-01
We present an algorithm to generate complete evolutionary profiles that represent the topology of the molecular phylogenetic tree of the homologous group. The method, based on the multidimensional QR factorization of numerically encoded multiple sequence alignments, removes redundancy from the alignments and orders the protein sequences by increasing linear dependence, resulting in the identification of a minimal basis set of sequences that spans the evolutionary space of the homologous group of proteins. We observe a general trend that these smaller, more evolutionarily balanced profiles have comparable and, in many cases, better performance in database searches than conventional profiles containing hundreds of sequences, constructed in an iterative and computationally intensive procedure. For more diverse families or superfamilies, with sequence identity <30%, structural alignments, based purely on the geometry of the protein structures, provide better alignments than pure sequence-based methods. Merging the structure and sequence information allows the construction of accurate profiles for distantly related groups. These structure-based profiles outperformed other sequence-based methods for finding distant homologs and were used to identify a putative class II cysteinyl-tRNA synthetase (CysRS) in several archaea that eluded previous annotation studies. Phylogenetic analysis showed the putative class II CysRSs to be a monophyletic group and homology modeling revealed a constellation of active site residues similar to that in the known class I CysRS. PMID:15741270
Vertical decomposition with Genetic Algorithm for Multiple Sequence Alignment
2011-01-01
Background Many Bioinformatics studies begin with a multiple sequence alignment as the foundation for their research. This is because multiple sequence alignment can be a useful technique for studying molecular evolution and analyzing sequence structure relationships. Results In this paper, we have proposed a Vertical Decomposition with Genetic Algorithm (VDGA) for Multiple Sequence Alignment (MSA). In VDGA, we divide the sequences vertically into two or more subsequences, and then solve them individually using a guide tree approach. Finally, we combine all the subsequences to generate a new multiple sequence alignment. This technique is applied on the solutions of the initial generation and of each child generation within VDGA. We have used two mechanisms to generate an initial population in this research: the first mechanism is to generate guide trees with randomly selected sequences and the second is shuffling the sequences inside such trees. Two different genetic operators have been implemented with VDGA. To test the performance of our algorithm, we have compared it with existing well-known methods, namely PRRP, CLUSTALX, DIALIGN, HMMT, SB_PIMA, ML_PIMA, MULTALIGN, and PILEUP8, and also other methods, based on Genetic Algorithms (GA), such as SAGA, MSA-GA and RBT-GA, by solving a number of benchmark datasets from BAliBase 2.0. Conclusions The experimental results showed that the VDGA with three vertical divisions was the most successful variant for most of the test cases in comparison to other divisions considered with VDGA. The experimental results also confirmed that VDGA outperformed the other methods considered in this research. PMID:21867510
Kuraku, Shigehiro; Zmasek, Christian M; Nishimura, Osamu; Katoh, Kazutaka
2013-07-01
We report a new web server, aLeaves (http://aleaves.cdb.riken.jp/), for homologue collection from diverse animal genomes. In molecular comparative studies involving multiple species, orthology identification is the basis on which most subsequent biological analyses rely. It can be achieved most accurately by explicit phylogenetic inference. More and more species are subjected to large-scale sequencing, but the resultant resources are scattered across independent project-based and multi-species, but separate, web sites. This complicates data access and is becoming a serious barrier to the comprehensiveness of molecular phylogenetic analysis. aLeaves, launched to overcome this difficulty, collects sequences similar to an input query sequence from various data sources. The collected sequences can be passed on to the MAFFT sequence alignment server (http://mafft.cbrc.jp/alignment/server/), which has been significantly improved in interactivity. This update enables users to switch between (i) sequence selection using the Archaeopteryx tree viewer, (ii) multiple sequence alignment and (iii) tree inference. This can be performed as a loop until one reaches a sensible data set, which minimizes redundancy for better visibility and handling in phylogenetic inference while covering relevant taxa. The workflow achieved by the seamless link between aLeaves and MAFFT provides a convenient online platform to address various questions in zoology and evolutionary biology.
Kuraku, Shigehiro; Zmasek, Christian M.; Nishimura, Osamu; Katoh, Kazutaka
2013-01-01
We report a new web server, aLeaves (http://aleaves.cdb.riken.jp/), for homologue collection from diverse animal genomes. In molecular comparative studies involving multiple species, orthology identification is the basis on which most subsequent biological analyses rely. It can be achieved most accurately by explicit phylogenetic inference. More and more species are subjected to large-scale sequencing, but the resultant resources are scattered across independent project-based and multi-species, but separate, web sites. This complicates data access and is becoming a serious barrier to the comprehensiveness of molecular phylogenetic analysis. aLeaves, launched to overcome this difficulty, collects sequences similar to an input query sequence from various data sources. The collected sequences can be passed on to the MAFFT sequence alignment server (http://mafft.cbrc.jp/alignment/server/), which has been significantly improved in interactivity. This update enables users to switch between (i) sequence selection using the Archaeopteryx tree viewer, (ii) multiple sequence alignment and (iii) tree inference. This can be performed as a loop until one reaches a sensible data set, which minimizes redundancy for better visibility and handling in phylogenetic inference while covering relevant taxa. The workflow achieved by the seamless link between aLeaves and MAFFT provides a convenient online platform to address various questions in zoology and evolutionary biology. PMID:23677614
Bellerophon: A program to detect chimeric sequences in multiple sequence alignments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huber, Thomas; Faulkner, Geoffrey; Hugenholtz, Philip
2003-12-23
Bellerophon is a program for detecting chimeric sequences in multiple sequence datasets by an adaption of partial treeing analysis. Bellerophon was specifically developed to detect 16S rRNA gene chimeras in PCR-clone libraries of environmental samples but can be applied to other nucleotide sequence alignments.
Tree decomposition based fast search of RNA structures including pseudoknots in genomes.
Song, Yinglei; Liu, Chunmei; Malmberg, Russell; Pan, Fangfang; Cai, Liming
2005-01-01
Searching genomes for RNA secondary structure with computational methods has become an important approach to the annotation of non-coding RNAs. However, due to the lack of efficient algorithms for accurate RNA structure-sequence alignment, computer programs capable of searching genomes for RNA secondary structures quickly and effectively have not been available. In this paper, a novel RNA structure profiling model is introduced based on the notion of a conformational graph to specify the consensus structure of an RNA family. Tree decomposition yields a small tree width t for such conformational graphs (e.g., t = 2 for stem loops and only a slight increase for pseudoknots). Within this modelling framework, the optimal alignment of a sequence to the structure model corresponds to finding a maximum valued isomorphic subgraph and consequently can be accomplished through dynamic programming on the tree decomposition of the conformational graph in time O(k^t N^2), where k is a small parameter and N is the size of the profiled RNA structure. Experiments show that the application of the alignment algorithm to search in genomes yields the same search accuracy as methods based on a covariance model with a significant reduction in computation time. In particular, very accurate searches of tmRNAs in bacterial genomes and of telomerase RNAs in yeast genomes can be accomplished in days, as opposed to the months required by other methods. The tree decomposition based searching tool is free upon request and can be downloaded at our site http://w.uga.edu/RNA-informatics/software/index.php.
A new fast method for inferring multiple consensus trees using k-medoids.
Tahiri, Nadia; Willems, Matthieu; Makarenkov, Vladimir
2018-04-05
Gene trees carry important information about specific evolutionary patterns which characterize the evolution of the corresponding gene families. However, a reliable species consensus tree cannot be inferred from a multiple sequence alignment of a single gene family or from the concatenation of alignments corresponding to gene families having different evolutionary histories. These evolutionary histories can be quite different due to horizontal transfer events or to ancient gene duplications which cause the emergence of paralogs within a genome. Many methods have been proposed to infer a single consensus tree from a collection of gene trees. Still, the application of these tree merging methods can lead to the loss of specific evolutionary patterns which characterize some gene families or some groups of gene families. Thus, the problem of inferring multiple consensus trees from a given set of gene trees becomes relevant. We describe a new fast method for inferring multiple consensus trees from a given set of phylogenetic trees (i.e. additive trees or X-trees) defined on the same set of species (i.e. objects or taxa). The traditional consensus approach yields a single consensus tree. We use the popular k-medoids partitioning algorithm to divide a given set of trees into several clusters of trees. We propose novel versions of the well-known Silhouette and Caliński-Harabasz cluster validity indices that are adapted for tree clustering with k-medoids. The efficiency of the new method was assessed using both synthetic and real data, such as a well-known phylogenetic dataset consisting of 47 gene trees inferred for 14 archaeal organisms. The method described here allows inference of multiple consensus trees from a given set of gene trees. It can be used to identify groups of gene trees having similar intragroup and different intergroup evolutionary histories. 
The main advantage of our method is that it is much faster than the existing tree clustering approaches, while providing similar or better clustering results in most cases. This makes it particularly well suited for the analysis of large genomic and phylogenetic datasets.
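The clustering step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the representation of each tree as a set of clades, the symmetric-difference distance `rf_distance`, and the deterministic choice of the first k trees as starting medoids are all assumptions made for illustration.

```python
def rf_distance(a, b):
    """Symmetric-difference distance between two trees given as sets of clades."""
    return len(a ^ b)


def assign(items, medoids, dist):
    """Map each medoid index to the indices of the items closest to it."""
    clusters = {m: [] for m in medoids}
    for i, item in enumerate(items):
        nearest = min(medoids, key=lambda m: dist(item, items[m]))
        clusters[nearest].append(i)
    return clusters


def k_medoids(items, k, dist, max_iter=100):
    """Alternate between assigning items to medoids and re-picking each medoid.

    Assumption: the first k items are distinct and serve as starting medoids.
    """
    medoids = list(range(k))
    for _ in range(max_iter):
        clusters = assign(items, medoids, dist)
        updated = []
        for m, members in clusters.items():
            if not members:  # keep the old medoid if its cluster is empty
                updated.append(m)
                continue
            # The new medoid minimises the total distance to its cluster members.
            updated.append(min(
                members,
                key=lambda c: sum(dist(items[c], items[j]) for j in members),
            ))
        if set(updated) == set(medoids):
            break
        medoids = updated
    return medoids, assign(items, medoids, dist)
```

Each resulting cluster could then be summarised by its own consensus tree, which is the sense in which multiple consensus trees are inferred.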
Parks, Matthew B; Wickett, Norman J; Alverson, Andrew J
2018-01-01
Abstract Diatoms (Bacillariophyta) are a species-rich group of eukaryotic microbes diverse in morphology, ecology, and metabolism. Previous reconstructions of the diatom phylogeny based on one or a few genes have resulted in inconsistent resolution or low support for critical nodes. We applied phylogenetic paralog pruning techniques to a data set of 94 diatom genomes and transcriptomes to infer perennially difficult species relationships, using concatenation and summary-coalescent methods to reconstruct species trees from data sets spanning a wide range of thresholds for taxon and column occupancy in gene alignments. Conflicts between gene and species trees decreased with both increasing taxon occupancy and bootstrap cutoffs applied to gene trees. Concordance between gene and species trees was lowest for short internodes and increased logarithmically with increasing edge length, suggesting that incomplete lineage sorting disproportionately affects species tree inference at short internodes, which are a common feature of the diatom phylogeny. Although species tree topologies were largely consistent across many data treatments, concatenation methods appeared to outperform summary-coalescent methods for sparse alignments. Our results underscore that approaches to species-tree inference based on few loci are likely to be misled by unrepresentative sampling of gene histories, particularly in lineages that may have diversified rapidly. In addition, phylogenomic studies of diatoms, and potentially other hyperdiverse groups, should maximize the number of gene trees with high taxon occupancy, though there is clearly a limit to how many of these genes will be available. PMID:29040712
NASA Astrophysics Data System (ADS)
Catley, Kefyn M.; Phillips, Brenda C.; Novick, Laura R.
2013-12-01
The biological community is currently undertaking one of its greatest scientific endeavours, that of constructing the Tree of Life, a phylogeny intended to be an evidence-based, predictive road map of evolutionary relationships among Earth's biota. Unfortunately, we know very little about how such diagrams are understood, interpreted, or used as inferential tools by students, a set of skills collectively referred to as tree thinking. The present study provides the first in-depth look at US high school students' competence at tree thinking and reports how they engage cognitively with tree representations as a precursor to developing curricula that will provide an entry point into macroevolution. Sixty tenth graders completed a 12-question instrument that assessed five basic tree-thinking skills. We present data that show patterns of misunderstandings are largely congruent between tenth graders and undergraduates and identify competences that are pivotal to address during instruction. Two general principles that emerge from this study are: (a) Students need to be taught that cladograms are an authoritative source of evidence that should be weighted more than other superficial or ecological similarities; (b) students need to understand the vital importance of, and critical difference between, most recent common ancestry and common ancestry. Further, we show how the objectives of this study are closely aligned with US and International Standards and argue that scientifically-literate citizens need at least a basic understanding of the science behind the Tree of Life to understand and engage in twenty-first century societal issues such as human health, agriculture, and biotechnology.
Visual exploration of parameter influence on phylogenetic trees.
Hess, Martin; Bremm, Sebastian; Weissgraeber, Stephanie; Hamacher, Kay; Goesele, Michael; Wiemeyer, Josef; von Landesberger, Tatiana
2014-01-01
Evolutionary relationships between organisms are frequently derived as phylogenetic trees inferred from multiple sequence alignments (MSAs). The MSA parameter space is exponentially large, so tens of thousands of potential trees can emerge for each dataset. A proposed visual-analytics approach can reveal the parameters' impact on the trees. Given input trees created with different parameter settings, it hierarchically clusters the trees according to their structural similarity. The most important clusters of similar trees are shown together with their parameters. This view offers interactive parameter exploration and automatic identification of relevant parameters. Biologists applied this approach to real data of 16S ribosomal RNA and protein sequences of ion channels. It revealed which parameters affected the tree structures. This led to a more reliable selection of the best trees.
7. ALIGNMENT OF ABANDONED COULTERVILLE ROAD IN FORESTA AT FALLEN ...
7. ALIGNMENT OF ABANDONED COULTERVILLE ROAD IN FORESTA AT FALLEN TREE IN CENTER REAR. FOREGROUND MARKS TURN OF NEW ROAD FROM FORESTA TO HIGHWAY 120. LOOKING E. GIS: N-37 42 16.6 / W-119 44 00.3 - Coulterville Road, Between Foresta & All-Weather Highway, Yosemite Village, Mariposa County, CA
Towards improving searches for optimal phylogenies.
Ford, Eric; St John, Katherine; Wheeler, Ward C
2015-01-01
Finding the optimal evolutionary history for a set of taxa is a challenging computational problem, even when restricting possible solutions to be "tree-like" and focusing on the maximum-parsimony optimality criterion. This has led to much work on using heuristic tree searches to find approximate solutions. We present an approach for finding exact optimal solutions that employs and complements the current heuristic methods for finding optimal trees. Given a set of taxa and a set of aligned sequences of characters, there may be subsets of characters that are compatible, and for each such subset there is an associated (possibly partially resolved) phylogeny with edges corresponding to each character state change. These perfect phylogenies serve as anchor trees for our constrained search space. We show that, for sequences with compatible sites, the parsimony score of any tree T is at least the parsimony score of the anchor trees plus the number of inferred changes between T and the anchor trees. As the maximum-parsimony optimality score is additive, the sum of the lower bounds on compatible character partitions provides a lower bound on the complete alignment of characters. This yields a region in the space of trees within which the best tree is guaranteed to be found; limiting the search for the optimal tree to this region can significantly reduce the number of trees that must be examined in a search of the space of trees. We analyze this method empirically using four different biological data sets as well as surveying 400 data sets from the TreeBASE repository, demonstrating the effectiveness of our technique in reducing the number of steps in exact heuristic searches for trees under the maximum-parsimony optimality criterion.
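The additivity of the parsimony score across characters, on which the lower bound in this abstract relies, can be seen in a small sketch of Fitch's algorithm. This is only an illustration of how a parsimony score is computed, not the anchor-tree method itself; the nested-tuple tree encoding and the function names `fitch` and `parsimony_score` are assumptions.

```python
def fitch(tree, states):
    """Fitch's algorithm for one character on a rooted binary tree.

    `tree` is a leaf name (str) or a pair (left, right); `states` maps leaf
    names to character states. Returns (candidate_state_set, change_count).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_changes + right_changes
    # Disjoint child sets force one extra state change at this node.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Sum the per-site Fitch scores; the total is additive over sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )
```

Because the total is a plain sum over sites, a lower bound computed separately for each group of compatible characters can be added up to bound the score of the whole alignment.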
Mallatt, Jon; Craig, Catherine Waggoner; Yoder, Matthew J
2010-04-01
This study (1) uses nearly complete rRNA-gene sequences from across Metazoa (197 taxa) to reconstruct animal phylogeny; (2) presents a highly annotated, manual alignment of these sequences with special reference to rRNA features including paired sites (http://purl.oclc.org/NET/rRNA/Metazoan_alignment) and (3) tests, after eliminating as few disruptive, rogue sequences as possible, if a likelihood framework can recover the main metazoan clades. We found that systematic elimination of approximately 6% of the sequences, including the divergent or unstably placed sequences of cephalopods, arrowworm, symphylan and pauropod myriapods, and of myzostomid and nemertodermatid worms, led to a tree that supported Ecdysozoa, Lophotrochozoa, Protostomia, and Bilateria. Deuterostomia, however, was never recovered, because the rRNA of urochordates goes (nonsignificantly) near the base of the Bilateria. Counterintuitively, when we modeled the evolution of the paired sites, phylogenetic resolution was not increased over traditional tree-building models that assume all sites in rRNA evolve independently. The rRNA genes of non-bilaterians contain a higher % AT than do those of most bilaterians. The rRNA genes of Acoela and Myzostomida were found to be secondarily shortened, AT-enriched, and highly modified, throwing some doubt on the location of these worms at the base of Bilateria in the rRNA tree--especially myzostomids, which other evidence suggests are annelids instead. Other findings are marsupial-with-placental mammals, arrowworms in Ecdysozoa (well supported here but contradicted by morphology), and Placozoa as sister to Cnidaria. Finally, despite the difficulties, the rRNA-gene trees are in strong concordance with trees derived from multiple protein-coding genes in supporting the new animal phylogeny.
Recapitulating phylogenies using k-mers: from trees to networks.
Bernard, Guillaume; Ragan, Mark A; Chan, Cheong Xin
2016-01-01
Ernst Haeckel based his landmark Tree of Life on the supposed ontogenic recapitulation of phylogeny, i.e. that successive embryonic stages during the development of an organism re-trace the morphological forms of its ancestors over the course of evolution. Much of this idea has since been discredited. Today, phylogenies are often based on families of molecular sequences. The standard approach starts with a multiple sequence alignment, in which the sequences are arranged relative to each other in a way that maximises a measure of similarity position-by-position along their entire length. A tree (or sometimes a network) is then inferred. Rigorous multiple sequence alignment is computationally demanding, and evolutionary processes that shape the genomes of many microbes (bacteria, archaea and some morphologically simple eukaryotes) can add further complications. In particular, recombination, genome rearrangement and lateral genetic transfer undermine the assumptions that underlie multiple sequence alignment, and imply that a tree-like structure may be too simplistic. Here, using genome sequences of 143 bacterial and archaeal genomes, we construct a network of phylogenetic relatedness based on the number of shared k-mers (subsequences at fixed length k). Our findings suggest that the network captures not only key aspects of microbial genome evolution as inferred from a tree, but also features that are not treelike. The method is highly scalable, allowing for investigation of genome evolution across a large number of genomes. Instead of using specific regions or sequences from genome sequences, or indeed Haeckel's idea of ontogeny, we argue that genome phylogenies can be inferred using k-mers from whole-genome sequences. Representing these networks dynamically allows biological questions of interest to be formulated and addressed quickly and in a visually intuitive manner.
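The idea of relating genomes through shared k-mers can be sketched in a few lines. The Jaccard normalisation, the edge threshold, and the function names below are assumptions chosen for illustration; the abstract states only that relatedness is based on the number of shared k-mers.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_similarity(a, b, k):
    """Jaccard similarity of the k-mer sets of two sequences (an assumed measure)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def relatedness_network(genomes, k, threshold):
    """Return edges (name1, name2, similarity) for pairs at or above the threshold."""
    edges = []
    for (n1, s1), (n2, s2) in combinations(genomes.items(), 2):
        sim = shared_kmer_similarity(s1, s2, k)
        if sim >= threshold:
            edges.append((n1, n2, sim))
    return edges
```

Varying the threshold changes how dense the resulting network is, which is one way such a network could be explored interactively.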
Sequence similarities and evolutionary relationships of microbial, plant and animal alpha-amylases.
Janecek, S
1994-09-01
Amino acid sequence comparison of 37 alpha-amylases from microbial, plant and animal sources was performed to identify their mutual sequence similarities in addition to the five already described conserved regions. These sequence regions were examined from structure/function and evolutionary perspectives. An unrooted evolutionary tree of alpha-amylases was constructed on a subset of 55 residues from the alignment of sequence similarities along with conserved regions. The most important new information extracted from the tree was as follows: (a) the close evolutionary relationship of Alteromonas haloplanctis alpha-amylase (thermolabile enzyme from an antarctic psychrotroph) with the already known group of homologous alpha-amylases from streptomycetes, Thermomonospora curvata, insects and mammals, and (b) the remarkable 40.1% identity between starch-saccharifying Bacillus subtilis alpha-amylase and the enzyme from the ruminal bacterium Butyrivibrio fibrisolvens, an alpha-amylase with an unusually large polypeptide chain (943 residues in the mature enzyme). Due to a very high degree of similarity, the whole amino acid sequences of three groups of alpha-amylases, namely (a) fungi and yeasts, (b) plants, and (c) A. haloplanctis, streptomycetes, T. curvata, insects and mammals, were aligned independently and their unrooted distance trees were calculated using these alignments. Possible rooting of the trees was also discussed. Based on the knowledge of the location of the five disulfide bonds in the structure of pig pancreatic alpha-amylase, the possible disulfide bridges were established for each of these groups of homologous alpha-amylases.
Is multiple-sequence alignment required for accurate inference of phylogeny?
Höhl, Michael; Ragan, Mark A
2007-04-01
The process of inferring phylogenetic trees from molecular sequences almost always starts with a multiple alignment of these sequences but can also be based on methods that do not involve multiple sequence alignment. Very little is known about the accuracy with which such alignment-free methods recover the correct phylogeny or about the potential for increasing their accuracy. We conducted a large-scale comparison of ten alignment-free methods, among them one new approach that does not calculate distances and a faster variant of our pattern-based approach; all distance-based alignment-free methods are freely available from http://www.bioinformatics.org.au (as Python package decaf+py). We show that most methods exhibit a higher overall reconstruction accuracy in the presence of high among-site rate variation. Under all conditions that we considered, variants of the pattern-based approach were significantly better than the other alignment-free methods. The new pattern-based variant achieved a speed-up of an order of magnitude in the distance calculation step, accompanied by a small loss of tree reconstruction accuracy. A method of Bayesian inference from k-mers did not improve on classical alignment-free (and distance-based) methods but may still offer other advantages due to its Bayesian nature. We found the optimal word length k of word-based methods to be stable across various data sets, and we provide parameter ranges for two different alphabets. The influence of these alphabets was analyzed to reveal a trade-off in reconstruction accuracy between long and short branches. We have mapped the phylogenetic accuracy for many alignment-free methods, among them several recently introduced ones, and increased our understanding of their behavior in response to biologically important parameters. In all experiments, the pattern-based approach emerged as superior, at the expense of higher resource consumption. 
Nonetheless, no alignment-free method that we examined recovers the correct phylogeny as accurately as does an approach based on maximum-likelihood distance estimates of multiply aligned sequences.
Galpert, Deborah; Fernández, Alberto; Herrera, Francisco; Antunes, Agostinho; Molina-Ruiz, Reinaldo; Agüero-Chapin, Guillermin
2018-05-03
The development of new ortholog detection algorithms and the improvement of existing ones are of major importance in functional genomics. We have previously introduced a successful supervised pairwise ortholog classification approach implemented in a big data platform that considered several pairwise protein features and the low ortholog pair ratios found between two annotated proteomes (Galpert, D et al., BioMed Research International, 2015). The supervised models were built and tested using a Saccharomycete yeast benchmark dataset proposed by Salichos and Rokas (2011). Although several pairwise protein features were combined in a supervised big data approach, they were all, to some extent, alignment-based features, and the proposed algorithms were evaluated on a unique test set. Here, we aim to evaluate the impact of alignment-free features on the performance of supervised models implemented in the Spark big data platform for pairwise ortholog detection in several related yeast proteomes. The Spark Random Forest and Decision Trees with oversampling and undersampling techniques, built with only alignment-based similarity measures or combined with several alignment-free pairwise protein features, showed the highest classification performance for ortholog detection in three yeast proteome pairs. Although such supervised approaches outperformed traditional methods, there were no significant differences between the exclusive use of alignment-based similarity measures and their combination with alignment-free features, even within the twilight zone of the studied proteomes. Only when alignment-based and alignment-free features were combined in Spark Decision Trees with imbalance management could a higher success rate (98.71%) within the twilight zone be achieved for a yeast proteome pair that underwent a whole genome duplication. 
The feature selection study showed that alignment-based features were top-ranked for the best classifiers while the runners-up were alignment-free features related to amino acid composition. The incorporation of alignment-free features in supervised big data models did not significantly improve ortholog detection in yeast proteomes regarding the classification qualities achieved with just alignment-based similarity measures. However, the similarity of their classification performance to that of traditional ortholog detection methods encourages the evaluation of other alignment-free protein pair descriptors in future research.
Danjon, Frédéric; Khuder, Hayfa; Stokes, Alexia
2013-01-01
This study aims at assessing the influence of slope angle and multi-directional flexing and their interaction on the root architecture of Robinia pseudoacacia seedlings, with a particular focus on architectural model and trait plasticity. 36 trees were grown from seed in containers inclined at 0° (control) or 45° (slope) in a glasshouse. The shoots of half the plants were gently flexed for 5 minutes a day. After 6 months, root systems were excavated and digitized in 3D, and biomass measured. Over 100 root architectural traits were determined. Both slope and flexing significantly increased plant size. Non-flexed trees on 45° slopes developed shallow roots which were largely aligned perpendicular to the slope. Compared to the controls, flexed trees on 0° slopes possessed a shorter and thicker taproot held in place by regularly distributed long and thin lateral roots. Flexed trees on the 45° slope also developed a thick vertically aligned taproot, with more volume allocated to upslope surface lateral roots, due to the greater soil volume uphill. We show that there is an inherent root system architectural model, but that a certain number of traits are highly plastic. This plasticity will permit root architectural design to be modified depending on external mechanical signals perceived by young trees. PMID:24386227
Huang, Qi; Nie, Binbin; Ma, Chen; Wang, Jing; Zhang, Tianhao; Duan, Shaofeng; Wu, Shang; Liang, Shengxiang; Li, Panlong; Liu, Hua; Sun, Hua; Zhou, Jiangning; Xu, Lin; Shan, Baoci
2018-01-01
Tree shrews are proposed as an alternative animal model to nonhuman primates due to their close affinity to primates. Neuroimaging techniques are widely used to study brain functions and structures of humans and animals. However, tree shrews are rarely used in the neuroimaging field, partly due to the lack of available species-specific analysis methods. In this study, 10 PET/CT and 10 MRI images of tree shrew brain were used to construct PET and MRI templates; based on a histological atlas we reconstructed a three-dimensional digital atlas with 628 structures delineated; then the digital atlas and templates were aligned into a stereotaxic space. Finally, we integrated the digital atlas and templates into a toolbox for tree shrew brain spatial normalization, statistical analysis and results localization. We validated the feasibility of the toolbox using simulated data with lesions in the laterodorsal thalamic nucleus (LD). The lesion volumes of simulated PET and MRI images were (12.97±3.91) mm³ and (7.04±0.84) mm³. Statistical results at p<0.005 showed the lesion volumes of PET and MRI were 13.18 mm³ and 8.06 mm³ in LD. To our knowledge, we report the first PET template and digital atlas of tree shrew brain. Compared to the existing MRI templates, our MRI template was aligned into stereotaxic space. The toolbox is also the first software dedicated to tree shrew brain analysis. The templates and digital atlas of tree shrew brain, as well as the toolbox, facilitate the use of tree shrews in the neuroimaging field.
Fischer, Christiane; Daniel, Rolf; Wubet, Tesfaye
2012-01-01
The ribosomal DNA comprising the ITS1-5.8S-ITS2 regions is widely used as a fungal marker in molecular ecology and systematics but cannot be aligned with confidence across genetically distant taxa. In order to study the diversity of Agaricomycotina in forest soils, we designed primers targeting the more alignable 28S (LSU) gene, which should be more useful for phylogenetic analyses of the detected taxa. This paper compares the performance of the established ITS1F/4B primer pair, which targets basidiomycetes, to that of two new pairs. Key factors in the comparison were the diversity covered, off-target amplification, rarefaction at different Operational Taxonomic Unit (OTU) cutoff levels, sensitivity of the method used to process the alignment to missing data and insecure positional homology, and the congruence of monophyletic clades with OTU assignments and BLAST-derived OTU names. The ITS primer pair yielded no off-target amplification but also exhibited the least fidelity to the expected phylogenetic groups. The LSU primers give complementary pictures of diversity, but were more sensitive to modifications of the alignment such as the removal of difficult-to-align stretches. The LSU primers also yielded greater numbers of singletons and had a greater tendency to produce OTUs containing sequences from a wider variety of species as judged by BLAST similarity. We introduced some new parameters to describe alignment heterogeneity based on Shannon entropy and the extent and contents of the OTUs in a phylogenetic tree space. Our results suggest that ITS should not be used when calculating phylogenetic trees from genetically distant sequences obtained from environmental DNA extractions and that it is inadvisable to define OTUs on the basis of very heterogeneous alignments. PMID:22363808
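One simple way to quantify alignment heterogeneity with Shannon entropy is to score each alignment column and average the scores. The sketch below is an assumption about what such a parameter could look like, not the parameters defined in the paper; in particular, ignoring gap characters and averaging over columns are illustrative choices.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column, ignoring gaps ('-')."""
    counts = Counter(c for c in column if c != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # sum of p * log2(1/p) over the observed states
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def mean_alignment_entropy(seqs):
    """Average column entropy over an alignment given as equal-length strings."""
    entropies = [column_entropy(col) for col in zip(*seqs)]
    return sum(entropies) / len(entropies)
```

A fully conserved column scores 0 bits, and a column split evenly between two states scores 1 bit, so higher averages indicate a more heterogeneous alignment.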
An efficient compression scheme for bitmap indices
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Kesheng; Otoo, Ekow J.; Shoshani, Arie
2004-04-13
When using an out-of-core indexing method to answer a query, it is generally assumed that the I/O cost dominates the overall query response time. Because of this, most research on indexing methods concentrates on reducing the sizes of indices. For bitmap indices, compression has been used for this purpose. However, in most cases, operations on these compressed bitmaps, mostly bitwise logical operations such as AND, OR, and NOT, spend more time in CPU than in I/O. To speed up these operations, a number of specialized bitmap compression schemes have been developed, the best known of which is the byte-aligned bitmap code (BBC). They are usually faster in performing logical operations than the general purpose compression schemes, but the time spent in CPU still dominates the total query response time. To reduce the query response time, we designed a CPU-friendly scheme named the word-aligned hybrid (WAH) code. In this paper, we prove that the sizes of WAH compressed bitmap indices are about two words per row for a large range of attributes. This size is smaller than typical sizes of commonly used indices, such as a B-tree. Therefore, WAH compressed indices are appropriate not only for low cardinality attributes but also for high cardinality attributes. In the worst case, the time to operate on compressed bitmaps is proportional to the total size of the bitmaps involved. The total size of the bitmaps required to answer a query on one attribute is proportional to the number of hits. These indicate that WAH compressed bitmap indices are optimal. To verify their effectiveness, we generated bitmap indices for four different datasets and measured the response time of many range queries. Tests confirm that sizes of compressed bitmap indices are indeed smaller than B-tree indices, and query processing with WAH compressed indices is much faster than with BBC compressed indices, projection indices and B-tree indices. 
In addition, we also verified that the average query response time is proportional to the index size. This indicates that the compressed bitmap indices are efficient for very large datasets.
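The word-aligned idea behind WAH can be illustrated with a simplified encoder. The sketch assumes 32-bit words in which a literal word has its top bit clear and carries 31 bitmap bits, while a fill word has its top bit set, the next bit giving the fill value and the low 30 bits counting 31-bit groups. Zero-padding the final partial group and passing the bit count to the decoder are simplifications introduced here, not part of the published scheme.

```python
GROUP = 31                      # bitmap bits carried by one 32-bit word
MAX_FILL = (1 << 30) - 1        # largest group count a fill word can hold


def wah_encode(bits):
    """Encode a list of 0/1 values into 32-bit words (simplified WAH sketch)."""
    padded = bits + [0] * (-len(bits) % GROUP)   # assumption: zero-pad the tail
    words = []
    for i in range(0, len(padded), GROUP):
        value = int("".join(map(str, padded[i:i + GROUP])), 2)
        if value in (0, (1 << GROUP) - 1):
            fill_bit = 1 if value else 0
            prev = words[-1] if words else 0
            same_fill = (prev >> 31 == 1 and (prev >> 30) & 1 == fill_bit
                         and (prev & MAX_FILL) < MAX_FILL)
            if same_fill:
                words[-1] += 1                   # extend the running fill
            else:
                words.append((1 << 31) | (fill_bit << 30) | 1)
        else:
            words.append(value)                  # literal word, top bit is 0
    return words


def wah_decode(words, n_bits):
    """Expand the words back into a list of n_bits 0/1 values."""
    bits = []
    for w in words:
        if w >> 31:
            bits.extend([(w >> 30) & 1] * (GROUP * (w & MAX_FILL)))
        else:
            bits.extend(int(c) for c in format(w, "031b"))
    return bits[:n_bits]
```

Because every word is either a literal or a fill, logical operations can step through two encoded bitmaps one machine word at a time, which is the CPU-friendliness the abstract refers to.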
1. Aerial view of turnpike path running diagonally up from ...
1. Aerial view of turnpike path running diagonally up from lower left (present-day Orange Turnpike alignment) and continuing on towards upper right through tree clump in center of the bare spot on the landscape, and on through the trees. View looking south. - Orange Turnpike, Parallel to new Orange Turnpike, Monroe, Orange County, NY
Shen, Xing-Xing; Salichos, Leonidas; Rokas, Antonis
2016-09-02
Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, alongside with number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. 
These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal and could be useful in guiding the choice of phylogenetic markers. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
2010-01-01
Background The vast sequence divergence among different virus groups has presented a great challenge to alignment-based analysis of virus phylogeny. Due to the problems caused by the uncertainty in alignment, existing tools for phylogenetic analysis based on multiple alignment could not be directly applied to the whole-genome comparison and phylogenomic studies of viruses. There has been a growing interest in alignment-free methods for phylogenetic analysis using complete genome data. Among the alignment-free methods, a dynamical language (DL) method proposed by our group has successfully been applied to the phylogenetic analysis of bacteria and chloroplast genomes. Results In this paper, the DL method is used to analyze the whole-proteome phylogeny of 124 large dsDNA viruses and 30 parvoviruses, two data sets with a large difference in genome size. The trees from our analyses are in good agreement with the latest classification of large dsDNA viruses and parvoviruses by the International Committee on Taxonomy of Viruses (ICTV). Conclusions The present method provides a new way for recovering the phylogeny of large dsDNA viruses and parvoviruses, and also some insights on the affiliation of a number of unclassified viruses. In comparison, some alignment-free methods such as the CV Tree method can be used for recovering the phylogeny of large dsDNA viruses, but they are not suitable for resolving the phylogeny of parvoviruses with a much smaller genome size. PMID:20565983
Bittencourt, Natalia F N; Ocarino, Juliana M; Mendonça, Luciana D M; Hewett, Timothy E; Fonseca, Sergio T
2012-12-01
Cross-sectional. To investigate predictors of increased frontal plane knee projection angle (FPKPA) in athletes. The underlying mechanisms that lead to increased FPKPA are likely multifactorial and depend on how the musculoskeletal system adapts to the possible interactions between its distal and proximal segments. Bivariate and linear analyses traditionally employed to analyze the occurrence of increased FPKPA are not sufficiently robust to capture complex relationships among predictors. The investigation of nonlinear interactions among biomechanical factors is necessary to further our understanding of the interdependence of lower-limb segments and resultant dynamic knee alignment. The FPKPA was assessed in 101 athletes during a single-leg squat and in 72 athletes at the moment of landing from a jump. The investigated predictors were sex, hip abductor isometric torque, passive range of motion (ROM) of hip internal rotation (IR), and shank-forefoot alignment. Classification and regression trees were used to investigate nonlinear interactions among predictors and their influence on the occurrence of increased FPKPA. During single-leg squatting, the occurrence of high FPKPA was predicted by the interaction between hip abductor isometric torque and passive hip IR ROM. At the moment of landing, the shank-forefoot alignment, abductor isometric torque, and passive hip IR ROM were predictors of high FPKPA. In addition, the classification and regression trees established cutoff points that could be used in clinical practice to identify athletes who are at potential risk for excessive FPKPA. The models captured nonlinear interactions between hip abductor isometric torque, passive hip IR ROM, and shank-forefoot alignment.
Verdant: automated annotation, alignment and phylogenetic analysis of whole chloroplast genomes.
McKain, Michael R; Hartsock, Ryan H; Wohl, Molly M; Kellogg, Elizabeth A
2017-01-01
Chloroplast genomes are now produced in the hundreds for angiosperm phylogenetics projects, but current methods for annotation, alignment and tree estimation still require some manual intervention, reducing throughput and increasing analysis time for large chloroplast systematics projects. Verdant is a web-based software suite and database built to take advantage of a novel annotation program, annoBTD. Using annoBTD, Verdant provides accurate annotation of chloroplast genomes without manual intervention. Subsequent alignment and tree estimation can incorporate newly annotated and publicly available plastomes and can accommodate a large number of taxa. Verdant sharply reduces the time required for analysis of assembled chloroplast genomes and removes the need for pipelines and software on personal hardware. Verdant is available at: http://verdant.iplantcollaborative.org/plastidDB/ It is implemented in PHP, Perl, MySQL, Javascript, HTML and CSS with all major browsers supported. Contact: mrmckain@gmail.com. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Phyx: phylogenetic tools for unix.
Brown, Joseph W; Walker, Joseph F; Smith, Stephen A
2017-06-15
The ease with which phylogenomic data can be generated has drastically escalated the computational burden for even routine phylogenetic investigations. To address this, we present phyx: a collection of programs written in C++ to explore, manipulate, analyze and simulate phylogenetic objects (alignments, trees and MCMC logs). Modelled after Unix/GNU/Linux command line tools, individual programs perform a single task and operate on standard I/O streams that can be piped to quickly and easily form complex analytical pipelines. Because of the stream-centric paradigm, memory requirements are minimized (often only a single tree or sequence in memory at any instance), and hence phyx is capable of efficiently processing very large datasets. phyx runs on POSIX-compliant operating systems. Source code, installation instructions, documentation and example files are freely available under the GNU General Public License at https://github.com/FePhyFoFum/phyx. Contact: eebsmith@umich.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
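The stream-centric design described above can be illustrated with a minimal sketch. It is written in Python rather than phyx's C++, and the function names (`read_fasta`, `drop_short`, `write_fasta`) are hypothetical and are not part of phyx. Each stage is a generator, so only one sequence is held in memory at a time, and stages compose the way piped command-line tools do.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(stream: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs one at a time from a FASTA stream."""
    name, chunks = None, []
    for line in stream:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def drop_short(records, min_len):
    """Filter stage: pass through only sequences of at least min_len."""
    for name, seq in records:
        if len(seq) >= min_len:
            yield name, seq


def write_fasta(records, out):
    """Sink stage: write each record as soon as it arrives."""
    for name, seq in records:
        out.write(f">{name}\n{seq}\n")


# In a real pipeline the streams would be sys.stdin and sys.stdout;
# in-memory streams are used here so the sketch is self-contained.
demo_in = io.StringIO(">a\nACGT\nAC\n>b\nAC\n>c\nACGTACGT\n")
demo_out = io.StringIO()
write_fasta(drop_short(read_fasta(demo_in), min_len=4), demo_out)
```

Because every stage consumes and produces an iterator, further hypothetical stages (renaming, reverse-complementing, subsampling) could be chained in without changing the memory profile.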
Phyx: phylogenetic tools for unix
Brown, Joseph W.; Walker, Joseph F.; Smith, Stephen A.
2017-01-01
Abstract Summary: The ease with which phylogenomic data can be generated has drastically escalated the computational burden for even routine phylogenetic investigations. To address this, we present phyx: a collection of programs written in C++ to explore, manipulate, analyze and simulate phylogenetic objects (alignments, trees and MCMC logs). Modelled after Unix/GNU/Linux command line tools, individual programs perform a single task and operate on standard I/O streams that can be piped to quickly and easily form complex analytical pipelines. Because of the stream-centric paradigm, memory requirements are minimized (often only a single tree or sequence in memory at any instance), and hence phyx is capable of efficiently processing very large datasets. Availability and Implementation: phyx runs on POSIX-compliant operating systems. Source code, installation instructions, documentation and example files are freely available under the GNU General Public License at https://github.com/FePhyFoFum/phyx Contact: eebsmith@umich.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28174903
MICA: Multiple interval-based curve alignment
NASA Astrophysics Data System (ADS)
Mann, Martin; Kahle, Hans-Peter; Beck, Matthias; Bender, Bela Johannes; Spiecker, Heinrich; Backofen, Rolf
2018-01-01
MICA enables the automatic synchronization of discrete data curves. To this end, characteristic points of the curves' shapes are identified. These landmarks are used within a heuristic curve registration approach to align profile pairs by mapping similar characteristics onto each other. In combination with a progressive alignment scheme, this enables the computation of multiple curve alignments. Multiple curve alignments are needed to derive meaningful representative consensus data of measured time or data series. MICA has already been successfully applied to generate representative profiles of tree growth data based on intra-annual wood density profiles or cell formation data. The MICA package provides a command-line and graphical user interface. The R interface enables the direct embedding of multiple curve alignment computation into larger analysis pipelines. Source code, binaries and documentation are freely available at https://github.com/BackofenLab/MICA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Kesheng
2007-08-02
An index in a database system is a data structure that utilizes redundant information about the base data to speed up common searching and retrieval operations. Most commonly used indexes are variants of B-trees, such as B+-tree and B*-tree. FastBit implements a set of alternative indexes called compressed bitmap indexes. Compared with B-tree variants, these indexes provide very efficient searching and retrieval operations by sacrificing the efficiency of updating the indexes after the modification of an individual record. In addition to the well-known strengths of bitmap indexes, FastBit has a special strength stemming from the bitmap compression scheme used. The compression method is called the Word-Aligned Hybrid (WAH) code. It reduces the bitmap indexes to reasonable sizes and at the same time allows very efficient bitwise logical operations directly on the compressed bitmaps. Compared with the well-known compression methods such as LZ77 and Byte-aligned Bitmap code (BBC), WAH sacrifices some space efficiency for a significant improvement in operational efficiency. Since the bitwise logical operations are the most important operations needed to answer queries, using WAH compression has been shown to answer queries significantly faster than using other compression schemes. Theoretical analyses showed that WAH compressed bitmap indexes are optimal for one-dimensional range queries. Only the most efficient indexing schemes such as B+-tree and B*-tree have this optimality property. However, bitmap indexes are superior because they can efficiently answer multi-dimensional range queries by combining the answers to one-dimensional queries.
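To make the last point concrete, the following is a minimal sketch of an uncompressed bitmap index. It does not implement WAH compression or any part of the FastBit API; the `BitmapIndex` class, the `rows_of` helper, and the example columns are assumptions introduced for illustration. It shows how a multi-dimensional range query reduces to one-dimensional range queries combined with a bitwise AND.

```python
from collections import defaultdict


class BitmapIndex:
    """Uncompressed, equality-encoded bitmap index over one column.

    Each distinct value maps to a Python int used as a bit vector:
    bit i is set when row i holds that value.
    """

    def __init__(self, column):
        self.n_rows = len(column)
        self.bitmaps = defaultdict(int)
        for row, value in enumerate(column):
            self.bitmaps[value] |= 1 << row

    def range_query(self, low, high):
        """Return a bitmap of the rows with low <= value <= high."""
        result = 0
        for value, bitmap in self.bitmaps.items():
            if low <= value <= high:
                result |= bitmap  # OR together the per-value bitmaps
        return result


def rows_of(bitmap):
    """Decode a bitmap into a sorted list of row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical two-column table.
temperature = [10, 25, 30, 25, 40, 10]
pressure = [1, 2, 2, 3, 1, 2]

t_index = BitmapIndex(temperature)
p_index = BitmapIndex(pressure)

# Two-dimensional query: 20 <= temperature <= 35 AND 2 <= pressure <= 3.
hits = t_index.range_query(20, 35) & p_index.range_query(2, 3)
matching_rows = rows_of(hits)
```

A compressed scheme such as WAH would store each bitmap in run-length-encoded words and perform the same OR and AND operations without decompressing them first.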
Lartillot, Nicolas; Brinkmann, Henner; Philippe, Hervé
2007-01-01
Background Thanks to the large amount of signal contained in genome-wide sequence alignments, phylogenomic analyses are converging towards highly supported trees. However, high statistical support does not imply that the tree is accurate. Systematic errors, such as the Long Branch Attraction (LBA) artefact, can be misleading, in particular when the taxon sampling is poor, or the outgroup is distant. In an otherwise consistent probabilistic framework, systematic errors in genome-wide analyses can be traced back to model mis-specification problems, which suggests that better models of sequence evolution should be devised, that would be more robust to tree reconstruction artefacts, even under the most challenging conditions. Methods We focus on a well characterized LBA artefact analyzed in a previous phylogenomic study of the metazoan tree, in which two fast-evolving animal phyla, nematodes and platyhelminths, emerge either at the base of all other Bilateria, or within protostomes, depending on the outgroup. We use this artefactual result as a case study for comparing the robustness of two alternative models: a standard, site-homogeneous model, based on an empirical matrix of amino-acid replacement (WAG), and a site-heterogeneous mixture model (CAT). In parallel, we propose a posterior predictive test, allowing one to measure how well a model acknowledges sequence saturation. Results Adopting a Bayesian framework, we show that the LBA artefact observed under WAG disappears when the site-heterogeneous model CAT is used. Using cross-validation, we further demonstrate that CAT has a better statistical fit than WAG on this data set. Finally, using our statistical goodness-of-fit test, we show that CAT, but not WAG, correctly accounts for the overall level of saturation, and that this is due to a better estimation of site-specific amino-acid preferences. 
Conclusion The CAT model appears to be more robust than WAG against LBA artefacts, essentially because it correctly anticipates the high probability of convergences and reversions implied by the small effective size of the amino-acid alphabet at each site of the alignment. More generally, our results provide strong evidence that site-specificities in the substitution process need be accounted for in order to obtain more reliable phylogenetic trees. PMID:17288577
Wasabi: An Integrated Platform for Evolutionary Sequence Analysis and Data Visualization.
Veidenberg, Andres; Medlar, Alan; Löytynoja, Ari
2016-04-01
Wasabi is an open source, web-based environment for evolutionary sequence analysis. Wasabi visualizes sequence data together with a phylogenetic tree within a modern, user-friendly interface: The interface hides extraneous options, supports context sensitive menus, drag-and-drop editing, and displays additional information, such as ancestral sequences, associated with specific tree nodes. The Wasabi environment supports reproducibility by automatically storing intermediate analysis steps and includes built-in functions to share data between users and publish analysis results. For computational analysis, Wasabi supports PRANK and PAGAN for phylogeny-aware alignment and alignment extension, and it can be easily extended with other tools. Along with drag-and-drop import of local files, Wasabi can access remote data through URL and import sequence data, GeneTrees and EPO alignments directly from Ensembl. To demonstrate a typical workflow using Wasabi, we reproduce key findings from recent comparative genomics studies, including a reanalysis of the EGLN1 gene from the tiger genome study: These case studies can be browsed within Wasabi at http://wasabiapp.org:8000?id=usecases. Wasabi runs inside a web browser and does not require any installation. One can start using it at http://wasabiapp.org. All source code is licensed under the AGPLv3. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Alignment-free genome tree inference by learning group-specific distance metrics.
Patil, Kaustubh R; McHardy, Alice C
2013-01-01
Understanding the evolutionary relationships between organisms is vital for their in-depth study. Gene-based methods are often used to infer such relationships, which are not without drawbacks. One can now attempt to use genome-scale information, because of the ever increasing number of genomes available. This opportunity also presents a challenge in terms of computational efficiency. Two fundamentally different methods are often employed for sequence comparisons, namely alignment-based and alignment-free methods. Alignment-free methods rely on the genome signature concept and provide a computationally efficient way that is also applicable to nonhomologous sequences. The genome signature contains evolutionary signal as it is more similar for closely related organisms than for distantly related ones. We used genome-scale sequence information to infer taxonomic distances between organisms without additional information such as gene annotations. We propose a method to improve genome tree inference by learning specific distance metrics over the genome signature for groups of organisms with similar phylogenetic, genomic, or ecological properties. Specifically, our method learns a Mahalanobis metric for a set of genomes and a reference taxonomy to guide the learning process. By applying this method to more than a thousand prokaryotic genomes, we showed that, indeed, better distance metrics could be learned for most of the 18 groups of organisms tested here. Once a group-specific metric is available, it can be used to estimate the taxonomic distances for other sequenced organisms from the group. This study also presents a large scale comparison between 10 methods--9 alignment-free and 1 alignment-based.
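As a rough illustration of the distance being learned, a Mahalanobis-type metric between two signature vectors x and y has the form d(x, y) = sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M. The sketch below only evaluates such a distance for a given M; it does not show how the study learns M from a reference taxonomy. The dinucleotide signature, the function names, and the example sequences are assumptions made for illustration.

```python
import math
from itertools import product


def kmer_signature(seq, k=2):
    """Relative frequency of every DNA k-mer, listed in a fixed order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = {kmer: 0 for kmer in kmers}
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip words containing ambiguous bases
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[kmer] / total for kmer in kmers]


def mahalanobis(x, y, M):
    """sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    M_diff = [sum(M[i][j] * diff[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(d * m for d, m in zip(diff, M_diff)))


# Hypothetical toy sequences standing in for genomes.
x = kmer_signature("ACGTACGTACGT")
y = kmer_signature("AAAATTTTAAAA")

# With M equal to the identity the metric reduces to Euclidean distance;
# a learned M would instead re-weight and correlate signature dimensions.
identity = [[1.0 if i == j else 0.0 for j in range(16)] for i in range(16)]
d_identity = mahalanobis(x, y, identity)
```

A group-specific metric in the sense of the abstract would correspond to a different M per group of organisms, applied to the same signatures.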
Treangen, Todd J; Ondov, Brian D; Koren, Sergey; Phillippy, Adam M
2014-01-01
Whole-genome sequences are now available for many microbial species and clades; however, existing whole-genome alignment methods are limited in their ability to perform sequence comparisons of multiple sequences simultaneously. Here we present the Harvest suite of core-genome alignment and visualization tools for the rapid and simultaneous analysis of thousands of intraspecific microbial strains. Harvest includes Parsnp, a fast core-genome multi-aligner, and Gingr, a dynamic visual platform. Together they provide interactive core-genome alignments, variant calls, recombination detection, and phylogenetic trees. Using simulated and real data we demonstrate that our approach exhibits unrivaled speed while maintaining the accuracy of existing methods. The Harvest suite is open-source and freely available from: http://github.com/marbl/harvest.
Bellerophon: a program to detect chimeric sequences in multiple sequence alignments.
Huber, Thomas; Faulkner, Geoffrey; Hugenholtz, Philip
2004-09-22
Bellerophon is a program for detecting chimeric sequences in multiple sequence datasets by an adaption of partial treeing analysis. Bellerophon was specifically developed to detect 16S rRNA gene chimeras in PCR-clone libraries of environmental samples but can be applied to other nucleotide sequence alignments. Bellerophon is available as an interactive web server at http://foo.maths.uq.edu.au/~huber/bellerophon.pl
MISTICA: Minimum Spanning Tree-based Coarse Image Alignment for Microscopy Image Sequences
Ray, Nilanjan; McArdle, Sara; Ley, Klaus; Acton, Scott T.
2016-01-01
Registration of an in vivo microscopy image sequence is necessary in many significant studies, including studies of atherosclerosis in large arteries and the heart. Significant cardiac and respiratory motion of the living subject, occasional spells of focal plane changes, drift in the field of view, and long image sequences are the principal roadblocks. The first step in such a registration process is the removal of translational and rotational motion. Next, a deformable registration can be performed. The focus of our study here is to remove the translation and/or rigid body motion that we refer to here as coarse alignment. The existing techniques for coarse alignment are unable to accommodate long sequences often consisting of periods of poor quality images (as quantified by a suitable perceptual measure). Many existing methods require the user to select an anchor image to which other images are registered. We propose a novel method for coarse image sequence alignment based on minimum weighted spanning trees (MISTICA) that overcomes these difficulties. The principal idea behind MISTICA is to re-order the images in shorter sequences, to demote nonconforming or poor quality images in the registration process, and to mitigate the error propagation. The anchor image is selected automatically making MISTICA completely automated. MISTICA is computationally efficient. It has a single tuning parameter that determines graph width, which can also be eliminated by way of additional computation. MISTICA outperforms existing alignment methods when applied to microscopy image sequences of mouse arteries. PMID:26415193
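The abstract does not specify MISTICA's edge weights, its graph-width parameter, or how the anchor image is chosen, so the following is only a sketch of the underlying building block: a minimum spanning tree computed with Prim's algorithm over a hypothetical matrix of pairwise image dissimilarities. The matrix values and the function name are assumptions.

```python
def minimum_spanning_tree(weights):
    """Prim's algorithm on a dense symmetric weight matrix.

    Returns a list of (parent, child, weight) edges, starting from node 0.
    """
    n = len(weights)
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known edge into each node
    parent = [None] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] is not None:
            edges.append((parent[u], u, weights[parent[u]][u]))
        # Relax the edges leaving the newly added node.
        for v in range(n):
            if not in_tree[v] and weights[u][v] < best[v]:
                best[v] = weights[u][v]
                parent[v] = u
    return edges


# Hypothetical dissimilarities between four frames of an image sequence.
dissimilarity = [
    [0, 1, 4, 9],
    [1, 0, 2, 8],
    [4, 2, 0, 3],
    [9, 8, 3, 0],
]
tree_edges = minimum_spanning_tree(dissimilarity)
```

In a registration setting, each tree edge would name a pair of frames to register directly, so a poor-quality frame with large dissimilarity to all others ends up as a leaf rather than in the middle of a chain where its error would propagate.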
MISTICA: Minimum Spanning Tree-Based Coarse Image Alignment for Microscopy Image Sequences.
Ray, Nilanjan; McArdle, Sara; Ley, Klaus; Acton, Scott T
2016-11-01
Registration of an in vivo microscopy image sequence is necessary in many significant studies, including studies of atherosclerosis in large arteries and the heart. Significant cardiac and respiratory motion of the living subject, occasional spells of focal plane changes, drift in the field of view, and long image sequences are the principal roadblocks. The first step in such a registration process is the removal of translational and rotational motion. Next, a deformable registration can be performed. The focus of our study here is to remove the translation and/or rigid body motion that we refer to here as coarse alignment. The existing techniques for coarse alignment are unable to accommodate long sequences often consisting of periods of poor quality images (as quantified by a suitable perceptual measure). Many existing methods require the user to select an anchor image to which other images are registered. We propose a novel method for coarse image sequence alignment based on minimum weighted spanning trees (MISTICA) that overcomes these difficulties. The principal idea behind MISTICA is to reorder the images in shorter sequences, to demote nonconforming or poor quality images in the registration process, and to mitigate the error propagation. The anchor image is selected automatically, making MISTICA completely automated. MISTICA is computationally efficient. It has a single tuning parameter that determines graph width, which can also be eliminated by way of additional computation. MISTICA outperforms existing alignment methods when applied to microscopy image sequences of mouse arteries.
Alignment-free inference of hierarchical and reticulate phylogenomic relationships.
Bernard, Guillaume; Chan, Cheong Xin; Chan, Yao-Ban; Chua, Xin-Yi; Cong, Yingnan; Hogan, James M; Maetschke, Stefan R; Ragan, Mark A
2017-06-30
We are amidst an ongoing flood of sequence data arising from the application of high-throughput technologies, and a concomitant fundamental revision in our understanding of how genomes evolve individually and within the biosphere. Workflows for phylogenomic inference must accommodate data that are not only much larger than before, but often more error prone and perhaps misassembled, or not assembled in the first place. Moreover, genomes of microbes, viruses and plasmids evolve not only by tree-like descent with modification but also by incorporating stretches of exogenous DNA. Thus, next-generation phylogenomics must address computational scalability while rethinking the nature of orthogroups, the alignment of multiple sequences and the inference and comparison of trees. New phylogenomic workflows have begun to take shape based on so-called alignment-free (AF) approaches. Here, we review the conceptual foundations of AF phylogenetics for the hierarchical (vertical) and reticulate (lateral) components of genome evolution, focusing on methods based on k-mers. We reflect on what seems to be successful, and on where further development is needed. © The Author 2017. Published by Oxford University Press.
A generalized global alignment algorithm.
Huang, Xiaoqiu; Chao, Kun-Mao
2003-01-22
Homologous sequences are sometimes similar over some regions but different over other regions. Homologous sequences have a much lower global similarity if the different regions are much longer than the similar regions. We present a generalized global alignment algorithm for comparing sequences with intermittent similarities, an ordered list of similar regions separated by different regions. A generalized global alignment model is defined to handle sequences with intermittent similarities. A dynamic programming algorithm is designed to compute an optimal general alignment in time proportional to the product of sequence lengths and in space proportional to the sum of sequence lengths. The algorithm is implemented as a computer program named GAP3 (Global Alignment Program Version 3). The generalized global alignment model is validated by experimental results produced with GAP3 on both DNA and protein sequences. The GAP3 program extends the ability of standard global alignment programs to recognize homologous sequences of lower similarity. The GAP3 program is freely available for academic use at http://bioinformatics.iastate.edu/aat/align/align.html.
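The complexity claim (time proportional to the product of the sequence lengths) can be illustrated with a standard global alignment score computation. This sketch is the classic Needleman-Wunsch recurrence with a linear gap penalty, not GAP3's generalized model with difference regions, and the scoring parameters are assumptions. Keeping only two rows of the dynamic programming table gives linear space for the score; recovering the alignment itself in linear space needs an additional divide-and-conquer step that is not shown.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score in O(len(a) * len(b)) time.

    Only the previous and current rows are stored, so memory use is
    proportional to len(b) rather than to the full table.
    """
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


identical = global_alignment_score("ACGT", "ACGT")
one_deletion = global_alignment_score("ACGT", "AGT")
```

Under this plain model a long dissimilar region is charged mismatch or gap penalties position by position, which is the situation the generalized model in the abstract is designed to handle differently.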
GIGA: a simple, efficient algorithm for gene tree inference in the genomic age
2010-01-01
Background Phylogenetic relationships between genes are not only of theoretical interest: they enable us to learn about human genes through the experimental work on their relatives in numerous model organisms from bacteria to fruit flies and mice. Yet the most commonly used computational algorithms for reconstructing gene trees can be inaccurate for numerous reasons, both algorithmic and biological. Additional information beyond gene sequence data has been shown to improve the accuracy of reconstructions, though at great computational cost. Results We describe a simple, fast algorithm for inferring gene phylogenies, which makes use of information that was not available prior to the genomic age: namely, a reliable species tree spanning much of the tree of life, and knowledge of the complete complement of genes in a species' genome. The algorithm, called GIGA, constructs trees agglomeratively from a distance matrix representation of sequences, using simple rules to incorporate this genomic age information. GIGA makes use of a novel conceptualization of gene trees as being composed of orthologous subtrees (containing only speciation events), which are joined by other evolutionary events such as gene duplication or horizontal gene transfer. An important innovation in GIGA is that, at every step in the agglomeration process, the tree is interpreted/reinterpreted in terms of the evolutionary events that created it. Remarkably, GIGA performs well even when using a very simple distance metric (pairwise sequence differences) and no distance averaging over clades during the tree construction process. Conclusions GIGA is efficient, allowing phylogenetic reconstruction of very large gene families and determination of orthologs on a large scale. It is exceptionally robust to adding more gene sequences, opening up the possibility of creating stable identifiers for referring to not only extant genes, but also their common ancestors. 
We compared trees produced by GIGA to those in the TreeFam database, and they were very similar in general, with most differences likely due to poor alignment quality. However, some remaining differences are algorithmic, and can be explained by the fact that GIGA tends to put a larger emphasis on minimizing gene duplication and deletion events. PMID:20534164
GIGA: a simple, efficient algorithm for gene tree inference in the genomic age.
Thomas, Paul D
2010-06-09
Phylogenetic relationships between genes are not only of theoretical interest: they enable us to learn about human genes through the experimental work on their relatives in numerous model organisms from bacteria to fruit flies and mice. Yet the most commonly used computational algorithms for reconstructing gene trees can be inaccurate for numerous reasons, both algorithmic and biological. Additional information beyond gene sequence data has been shown to improve the accuracy of reconstructions, though at great computational cost. We describe a simple, fast algorithm for inferring gene phylogenies, which makes use of information that was not available prior to the genomic age: namely, a reliable species tree spanning much of the tree of life, and knowledge of the complete complement of genes in a species' genome. The algorithm, called GIGA, constructs trees agglomeratively from a distance matrix representation of sequences, using simple rules to incorporate this genomic age information. GIGA makes use of a novel conceptualization of gene trees as being composed of orthologous subtrees (containing only speciation events), which are joined by other evolutionary events such as gene duplication or horizontal gene transfer. An important innovation in GIGA is that, at every step in the agglomeration process, the tree is interpreted/reinterpreted in terms of the evolutionary events that created it. Remarkably, GIGA performs well even when using a very simple distance metric (pairwise sequence differences) and no distance averaging over clades during the tree construction process. GIGA is efficient, allowing phylogenetic reconstruction of very large gene families and determination of orthologs on a large scale. It is exceptionally robust to adding more gene sequences, opening up the possibility of creating stable identifiers for referring to not only extant genes, but also their common ancestors. 
We compared trees produced by GIGA to those in the TreeFam database, and they were very similar in general, with most differences likely due to poor alignment quality. However, some remaining differences are algorithmic, and can be explained by the fact that GIGA tends to put a larger emphasis on minimizing gene duplication and deletion events.
Simrank: Rapid and sensitive general-purpose k-mer search tool
2011-01-01
Background Terabyte-scale collections of string-encoded data are expected from consortia efforts such as the Human Microbiome Project http://nihroadmap.nih.gov/hmp. Intra- and inter-project data similarity searches are enabled by rapid k-mer matching strategies. Software applications for sequence database partitioning, guide tree estimation, molecular classification and alignment acceleration have benefited from embedded k-mer searches as sub-routines. However, a rapid, general-purpose, open-source, flexible, stand-alone k-mer tool has not been available. Results Here we present a stand-alone utility, Simrank, which allows users to rapidly identify the database strings most similar to query strings. Performance testing of Simrank and related tools against DNA, RNA, protein and human-language data found Simrank 10X to 928X faster depending on the dataset. Conclusions Simrank provides molecular ecologists with a high-throughput, open source choice for comparing large sequence sets to find similarity. PMID:21524302
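The general idea of k-mer matching can be sketched as follows. The abstract does not give Simrank's similarity formula or its interface, so the score used here (the fraction of the query's distinct k-mers found in each database string), the function names, and the toy database are all assumptions.

```python
def kmers(s, k):
    """Set of distinct overlapping substrings of length k."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def rank_by_shared_kmers(query, database, k=3):
    """Rank database strings by the fraction of query k-mers they contain.

    Returns (score, name) pairs, best first; ties are broken by name.
    """
    q = kmers(query, k)
    scored = []
    for name, text in database.items():
        shared = len(q & kmers(text, k))
        scored.append((shared / len(q) if q else 0.0, name))
    return sorted(scored, key=lambda item: (-item[0], item[1]))


# Hypothetical toy database of string-encoded records.
database = {
    "s1": "ACGTACGG",
    "s2": "TTTTACGT",
    "s3": "GGGGGG",
}
ranking = rank_by_shared_kmers("ACGTAC", database, k=3)
```

Because the comparison works on plain substrings, the same sketch applies unchanged to RNA, protein, or natural-language text; a production tool would precompute and index the database k-mers instead of rebuilding them per query.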
Yang, Cheng-Hong; Wu, Kuo-Chuan; Chuang, Li-Yeh; Chang, Hsueh-Wei
2018-01-01
DNA barcode sequences are accumulating in large data sets. A barcode is generally a sequence larger than 1000 base pairs and generates a computational burden. Although the DNA barcode was originally envisioned as a straightforward species tag, the identification usage of barcode sequences is rarely emphasized currently. Single-nucleotide polymorphism (SNP) association studies suggest that SNPs may be the ideal target of feature selection to discriminate between different species. We hypothesize that SNP-based barcodes may be more effective than the full length of DNA barcode sequences for species discrimination. To address this issue, we tested a ribulose diphosphate carboxylase (rbcL) SNP barcoding (RSB) strategy using a decision tree algorithm. After alignment and trimming, 31 SNPs were discovered in the rbcL sequences from 38 Brassicaceae plant species. In the decision tree construction, these SNPs were computed to set up the decision rule to assign the sequences into 2 groups level by level. After algorithm processing, 37 nodes and 31 loci were required for discriminating 38 species. Finally, the sequence tags consisting of 31 rbcL SNP barcodes were identified for discriminating 38 Brassicaceae species based on the decision tree-selected SNP pattern using the RSB method. Taken together, this study provides the rationale that the SNP aspect of the DNA barcode for the rbcL gene is a useful and effective sequence for tagging 38 Brassicaceae species.
Vu, Trung N; Valkenborg, Dirk; Smets, Koen; Verwaest, Kim A; Dommisse, Roger; Lemière, Filip; Verschoren, Alain; Goethals, Bart; Laukens, Kris
2011-10-20
Nuclear magnetic resonance spectroscopy (NMR) is a powerful technique to reveal and compare quantitative metabolic profiles of biological tissues. However, chemical and physical sample variations make the analysis of the data challenging, and typically require the application of a number of preprocessing steps prior to data interpretation. For example, noise reduction, normalization, baseline correction, peak picking, spectrum alignment and statistical analysis are indispensable components in any NMR analysis pipeline. We introduce a novel suite of informatics tools for the quantitative analysis of NMR metabolomic profile data. The core of the processing cascade is a novel peak alignment algorithm, called hierarchical Cluster-based Peak Alignment (CluPA). The algorithm aligns a target spectrum to the reference spectrum in a top-down fashion by building a hierarchical cluster tree from peak lists of reference and target spectra and then dividing the spectra into smaller segments based on the most distant clusters of the tree. To reduce the computational time to estimate the spectral misalignment, the method makes use of Fast Fourier Transformation (FFT) cross-correlation. Since the method returns a high-quality alignment, we can propose a simple methodology to study the variability of the NMR spectra. For each aligned NMR data point the ratio of the between-group and within-group sum of squares (BW-ratio) is calculated to quantify the difference in variability between and within predefined groups of NMR spectra. This differential analysis is related to the calculation of the F-statistic or a one-way ANOVA, but without distributional assumptions. Statistical inference based on the BW-ratio is achieved by bootstrapping the null distribution from the experimental data. The workflow performance was evaluated using a previously published dataset. 
Correlation maps, spectral and grey scale plots show clear improvements in comparison to other methods, and the down-to-earth quantitative analysis works well for the CluPA-aligned spectra. The whole workflow is embedded into a modular and statistically sound framework that is implemented as an R package called "speaq" ("spectrum alignment and quantitation"), which is freely available from http://code.google.com/p/speaq/.
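The BW-ratio described in this abstract can be illustrated with a minimal sketch. It assumes the usual ANOVA-style decomposition, with the between-group sum of squares weighted by group size; the function name `bw_ratio` and the toy intensities are hypothetical and are not taken from the speaq package.

```python
def bw_ratio(values, groups):
    """Between-group / within-group sum of squares for one aligned data point.

    values: intensities of one spectral data point across all spectra.
    groups: group label of each spectrum, in the same order as values.
    """
    overall_mean = sum(values) / len(values)

    # Collect the intensities belonging to each predefined group.
    by_group = {}
    for value, label in zip(values, groups):
        by_group.setdefault(label, []).append(value)

    between = 0.0
    within = 0.0
    for members in by_group.values():
        group_mean = sum(members) / len(members)
        # Assumption: between-group term is weighted by group size.
        between += len(members) * (group_mean - overall_mean) ** 2
        within += sum((v - group_mean) ** 2 for v in members)

    # No within-group spread at all: the ratio is unbounded.
    return between / within if within > 0 else float("inf")


# Toy example: two groups of three spectra at a single data point.
ratio = bw_ratio([1.0, 2.0, 3.0, 7.0, 8.0, 9.0], ["a", "a", "a", "b", "b", "b"])
```

In the workflow the abstract describes, such a ratio would be computed for every aligned data point and compared against a bootstrapped null distribution; that resampling step is not shown here.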
Protein classification based on text document classification techniques.
Cheng, Betty Yee Man; Carbonell, Jaime G; Klein-Seetharaman, Judith
2005-03-01
The need for accurate, automated protein classification methods continues to increase as advances in biotechnology uncover new proteins. G-protein coupled receptors (GPCRs) are a particularly difficult superfamily of proteins to classify due to extreme diversity among its members. Previous comparisons of BLAST, k-nearest neighbor (k-NN), hidden Markov model (HMM) and support vector machine (SVM) using alignment-based features have suggested that classifiers at the complexity of SVM are needed to attain high accuracy. Here, analogous to document classification, we applied Decision Tree and Naive Bayes classifiers with chi-square feature selection on counts of n-grams (i.e. short peptide sequences of length n) to this classification task. Using the GPCR dataset and evaluation protocol from the previous study, the Naive Bayes classifier attained accuracies of 93.0% and 92.4% in level I and level II subfamily classification respectively, while SVM has reported accuracies of 88.4% and 86.3%. This corresponds to a 39.7% and 44.5% reduction in residual error for level I and level II subfamily classification, respectively. The Decision Tree, while inferior to SVM, outperforms HMM in both level I and level II subfamily classification. For those GPCR families whose profiles are stored in the Protein FAMilies database of alignments and HMMs (PFAM), our method performs comparably to a search against those profiles. Finally, our method can be generalized to other protein families by applying it to the superfamily of nuclear receptors with 94.5, 97.8 and 93.6% accuracy in family, level I and level II subfamily classification respectively. Copyright 2005 Wiley-Liss, Inc.
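As a rough illustration of two quantities in this abstract, the sketch below counts peptide n-grams and recomputes the reported reduction in residual error from the stated accuracies. The function names and the toy sequence are hypothetical; the chi-square feature selection and the classifiers themselves are not reproduced.

```python
from collections import Counter


def ngram_counts(sequence, n):
    """Count overlapping n-grams (short peptides of length n) in a sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def residual_error_reduction(baseline_acc, new_acc):
    """Percent reduction in error (100 - accuracy) relative to the baseline."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return 100.0 * (baseline_err - new_err) / baseline_err


# Feature extraction on a toy sequence (not a real GPCR).
features = ngram_counts("ABAB", 2)

# Accuracies quoted in the abstract: SVM versus Naive Bayes.
level1 = residual_error_reduction(88.4, 93.0)  # level I subfamilies
level2 = residual_error_reduction(86.3, 92.4)  # level II subfamilies
```

Rounded to one decimal place, the two values agree with the 39.7% and 44.5% figures given in the abstract.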
MISFITS: evaluating the goodness of fit between a phylogenetic model and an alignment.
Nguyen, Minh Anh Thi; Klaere, Steffen; von Haeseler, Arndt
2011-01-01
As models of sequence evolution become more and more complicated, many criteria for model selection have been proposed, and tools are available to select the best model for an alignment under a particular criterion. However, in many instances the selected model fails to explain the data adequately as reflected by large deviations between observed pattern frequencies and the corresponding expectation. We present MISFITS, an approach to evaluate the goodness of fit (http://www.cibiv.at/software/misfits). MISFITS introduces a minimum number of "extra substitutions" on the inferred tree to provide a biologically motivated explanation why the alignment may deviate from expectation. These extra substitutions plus the evolutionary model then fully explain the alignment. We illustrate the method on several examples and then give a survey about the goodness of fit of the selected models to the alignments in the PANDIT database.
MUSCLE: multiple sequence alignment with high accuracy and high throughput.
Edgar, Robert C
2004-01-01
We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.
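The idea of estimating distances from kmer counts, without first aligning the sequences, can be sketched as follows. This is a generic shared-kmer distance written for illustration; the exact formula and any alphabet compression used by MUSCLE are not stated in the abstract, so the normalisation below is an assumption and `kmer_distance` is a hypothetical name.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping kmers of length k in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=3):
    """Alignment-free distance: 1 minus the fraction of shared kmers.

    Assumption: the shared count is normalised by the number of kmers
    in the shorter sequence.
    """
    possible = min(len(a), len(b)) - k + 1
    if possible <= 0:
        raise ValueError("both sequences must be at least k residues long")
    counts_a = kmer_counts(a, k)
    counts_b = kmer_counts(b, k)
    # A kmer occurring several times is shared as often as it occurs in both.
    shared = sum(min(n, counts_b[kmer]) for kmer, n in counts_a.items())
    return 1.0 - shared / possible


d_same = kmer_distance("ACDEFG", "ACDEFG")   # identical sequences
d_close = kmer_distance("ACDEFG", "ACDKFG")  # one substitution
```

A matrix of such distances is the kind of input a progressive aligner can use to build a guide tree quickly.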
Chen, Yikai; Wang, Kai; Xu, Chengcheng; Shi, Qin; He, Jie; Li, Peiqing; Shi, Ting
2018-05-19
To overcome the limitations of previous highway alignment safety evaluation methods, this article presents a highway alignment safety evaluation method based on fault tree analysis (FTA) and the characteristics of vehicle safety boundaries, within the framework of dynamic modeling of the driver-vehicle-road system. Approaches for categorizing the vehicle failure modes while driving on highways and the corresponding safety boundaries were comprehensively investigated based on vehicle system dynamics theory. Then, an overall crash probability model was formulated based on FTA considering the risks of 3 failure modes: losing steering capability, losing track-holding capability, and rear-end collision. The proposed method was implemented on a highway segment between Bengbu and Nanjing in China. A driver-vehicle-road multibody dynamics model was developed based on the 3D alignments of the Bengbu to Nanjing section of Ning-Luo expressway using Carsim, and the dynamics indices, such as sideslip angle and yaw rate, were obtained. Then, the average crash probability of each road section was calculated with a fixed-length method. Finally, the average crash probability was validated against the crash frequency per kilometer to demonstrate the accuracy of the proposed method. The results of the regression analysis and correlation analysis indicated good consistency between the results of the safety evaluation and the crash data and that it outperformed the safety evaluation methods used in previous studies. The proposed method has the potential to be used in practical engineering applications to identify crash-prone locations and alignment deficiencies on highways in the planning and design phases, as well as those in service.
Treetrimmer: a method for phylogenetic dataset size reduction.
Maruyama, Shinichiro; Eveleigh, Robert J M; Archibald, John M
2013-04-12
With rapid advances in genome sequencing and bioinformatics, it is now possible to generate phylogenetic trees containing thousands of operational taxonomic units (OTUs) from a wide range of organisms. However, use of rigorous tree-building methods on such large datasets is prohibitive and manual 'pruning' of sequence alignments is time consuming and raises concerns over reproducibility. There is a need for bioinformatic tools with which to objectively carry out such pruning procedures. Here we present 'TreeTrimmer', a bioinformatics procedure that removes unnecessary redundancy in large phylogenetic datasets, alleviating the size effect on more rigorous downstream analyses. The method identifies and removes user-defined 'redundant' sequences, e.g., orthologous sequences from closely related organisms and 'recently' evolved lineage-specific paralogs. Representative OTUs are retained for more rigorous re-analysis. TreeTrimmer reduces the OTU density of phylogenetic trees without sacrificing taxonomic diversity while retaining the original tree topology, thereby speeding up downstream computer-intensive analyses, e.g., Bayesian and maximum likelihood tree reconstructions, in a reproducible fashion.
Vinuesa, Pablo; Ochoa-Sánchez, Luz E; Contreras-Moreira, Bruno
2018-01-01
The massive accumulation of genome-sequences in public databases promoted the proliferation of genome-level phylogenetic analyses in many areas of biological research. However, due to diverse evolutionary and genetic processes, many loci have undesirable properties for phylogenetic reconstruction. These, if undetected, can result in erroneous or biased estimates, particularly when estimating species trees from concatenated datasets. To deal with these problems, we developed GET_PHYLOMARKERS, a pipeline designed to identify high-quality markers to estimate robust genome phylogenies from the orthologous clusters, or the pan-genome matrix (PGM), computed by GET_HOMOLOGUES. In the first context, a set of sequential filters are applied to exclude recombinant alignments and those producing anomalous or poorly resolved trees. Multiple sequence alignments and maximum likelihood (ML) phylogenies are computed in parallel on multi-core computers. A ML species tree is estimated from the concatenated set of top-ranking alignments at the DNA or protein levels, using either FastTree or IQ-TREE (IQT). The latter is used by default due to its superior performance revealed in an extensive benchmark analysis. In addition, parsimony and ML phylogenies can be estimated from the PGM. We demonstrate the practical utility of the software by analyzing 170 Stenotrophomonas genome sequences available in RefSeq and 10 new complete genomes of Mexican environmental S. maltophilia complex (Smc) isolates reported herein. A combination of core-genome and PGM analyses was used to revise the molecular systematics of the genus. An unsupervised learning approach that uses a goodness of clustering statistic identified 20 groups within the Smc at a core-genome average nucleotide identity (cgANIb) of 95.9% that are perfectly consistent with strongly supported clades on the core- and pan-genome trees. In addition, we identified 16 misclassified RefSeq genome sequences, 14 of them labeled as S. 
maltophilia, demonstrating the broad utility of the software for phylogenomics and geno-taxonomic studies. The code, a detailed manual and tutorials are freely available for Linux/UNIX servers under the GNU GPLv3 license at https://github.com/vinuesa/get_phylomarkers. A docker image bundling GET_PHYLOMARKERS with GET_HOMOLOGUES is available at https://hub.docker.com/r/csicunam/get_homologues/, which can be easily run on any platform.
Estimation of relative effectiveness of phylogenetic programs by machine learning.
Krivozubov, Mikhail; Goebels, Florian; Spirin, Sergei
2014-04-01
Reconstruction of the phylogeny of a protein family from a sequence alignment can produce results of varying quality. Our goal is to predict the quality of phylogeny reconstruction based on features that can be extracted from the input alignment. We used the Fitch-Margoliash (FM) method of phylogeny reconstruction and a random forest as the predictor. For training and testing the predictor, alignments of orthologous series (OS) were used, for which the result of phylogeny reconstruction can be evaluated by comparison with trees of the corresponding organisms. Our results show that the quality of phylogeny reconstruction can be predicted with more than 80% precision. We also tried to predict which phylogeny reconstruction method, FM or UPGMA, is better for a particular alignment. With the set of features used, among alignments for which the obtained predictor predicts a better performance of UPGMA, 56% really give a better result with UPGMA. Taking into account that UPGMA performs better for only 34% of the alignments in our testing set, this result shows that it is possible in principle to predict the better phylogeny reconstruction method based on features of a sequence alignment.
Carreno, R A; Barta, J R
1998-11-01
The small subunit ribosomal RNA (SSU rRNA) genes of hippoboscid (Ornithoica vicina Walker) and tabanid (Chrysops niger Macquart) Diptera were sequenced to determine their phylogenetic position within the order and to determine whether or not extensive hypervariable regions in this gene are widespread in the Diptera. A parsimony analysis of an alignment containing 8 dipteran sequences produced a single most parsimonious tree that placed O. vicina as sister group to Drosophila melanogaster Meigen. The tabanid Chrysops niger was sister group to the asilomorphan taxa, and the sister group to the Brachycera was a Tipula sp. although this relationship was not supported by bootstrap analysis. The hippoboscid and tabanid sequences contain extensive hypervariable regions in the V2, V4, V6, and V7 regions as do other Diptera. When these regions of the alignment were excluded from the phylogenetic analysis, a single most parsimonious tree was found. This tree had an identical overall topology to the tree obtained from the total data set. The hypervariable regions in parts of the dipteran SSU rRNA genes were more extensive in the nematocerous dipteran sequences used in this study than in the other dipteran representatives; these hypervariable regions may be of more utility in inferring relationship among species and subspecies than at the suprageneric level.
Self-aligning and compressed autosophy video databases
NASA Astrophysics Data System (ADS)
Holtz, Klaus E.
1993-04-01
Autosophy, an emerging new science, explains 'self-assembling structures,' such as crystals or living trees, in mathematical terms. This research provides a new mathematical theory of 'learning' and a new 'information theory' which permits the growing of self-assembling data network in a computer memory similar to the growing of 'data crystals' or 'data trees' without data processing or programming. Autosophy databases are educated very much like a human child to organize their own internal data storage. Input patterns, such as written questions or images, are converted to points in a mathematical omni dimensional hyperspace. The input patterns are then associated with output patterns, such as written answers or images. Omni dimensional information storage will result in enormous data compression because each pattern fragment is only stored once. Pattern recognition in the text or image files is greatly simplified by the peculiar omni dimensional storage method. Video databases will absorb input images from a TV camera and associate them with textual information. The 'black box' operations are totally self-aligning where the input data will determine their own hyperspace storage locations. Self-aligning autosophy databases may lead to a new generation of brain-like devices.
Phylogenetic study of Class Armophorea (Alveolata, Ciliophora) based on 18S-rDNA data.
da Silva Paiva, Thiago; do Nascimento Borges, Bárbara; da Silva-Neto, Inácio Domingos
2013-12-01
The 18S rDNA phylogeny of Class Armophorea, a group of anaerobic ciliates, is proposed based on an analysis of 44 sequences (out of 195) retrieved from the NCBI/GenBank database. Emphasis was placed on the use of two nucleotide alignment criteria that involved variation in the gap-opening and gap-extension parameters and the use of rRNA secondary structure to orientate multiple-alignment. A sensitivity analysis of 76 data sets was run to assess the effect of variations in indel parameters on tree topologies. Bayesian inference, maximum likelihood and maximum parsimony phylogenetic analyses were used to explore how different analytic frameworks influenced the resulting hypotheses. A sensitivity analysis revealed that the relationships among higher taxa of the Intramacronucleata were dependent upon how indels were determined during multiple-alignment of nucleotides. The phylogenetic analyses rejected the monophyly of the Armophorea most of the time and consistently indicated that the Metopidae and Nyctotheridae were related to the Litostomatea. There was no consensus on the placement of the Caenomorphidae, which could be a sister group of the Metopidae + Nyctorheridae, or could have diverged at the base of the Spirotrichea branch or the Intramacronucleata tree.
Yuri, Tamaki; Kimball, Rebecca T.; Harshman, John; Bowie, Rauri C. K.; Braun, Michael J.; Chojnowski, Jena L.; Han, Kin-Lan; Hackett, Shannon J.; Huddleston, Christopher J.; Moore, William S.; Reddy, Sushma; Sheldon, Frederick H.; Steadman, David W.; Witt, Christopher C.; Braun, Edward L.
2013-01-01
Insertion/deletion (indel) mutations, which are represented by gaps in multiple sequence alignments, have been used to examine phylogenetic hypotheses for some time. However, most analyses combine gap data with the nucleotide sequences in which they are embedded, probably because most phylogenetic datasets include few gap characters. Here, we report analyses of 12,030 gap characters from an alignment of avian nuclear genes using maximum parsimony (MP) and a simple maximum likelihood (ML) framework. Both trees were similar, and they exhibited almost all of the strongly supported relationships in the nucleotide tree, although neither gap tree supported many relationships that have proven difficult to recover in previous studies. Moreover, independent lines of evidence typically corroborated the nucleotide topology instead of the gap topology when they disagreed, although the number of conflicting nodes with high bootstrap support was limited. Filtering to remove short indels did not substantially reduce homoplasy or reduce conflict. Combined analyses of nucleotides and gaps resulted in the nucleotide topology, but with increased support, suggesting that gap data may prove most useful when analyzed in combination with nucleotide substitutions. PMID:24832669
Billings, Seth D.; Boctor, Emad M.; Taylor, Russell H.
2015-01-01
We present a probabilistic registration algorithm that robustly solves the problem of rigid-body alignment between two shapes with high accuracy, by aptly modeling measurement noise in each shape, whether isotropic or anisotropic. For point-cloud shapes, the probabilistic framework additionally enables modeling locally-linear surface regions in the vicinity of each point to further improve registration accuracy. The proposed Iterative Most-Likely Point (IMLP) algorithm is formed as a variant of the popular Iterative Closest Point (ICP) algorithm, which iterates between point-correspondence and point-registration steps. IMLP’s probabilistic framework is used to incorporate a generalized noise model into both the correspondence and the registration phases of the algorithm, hence its name as a most-likely point method rather than a closest-point method. To efficiently compute the most-likely correspondences, we devise a novel search strategy based on a principal direction (PD)-tree search. We also propose a new approach to solve the generalized total-least-squares (GTLS) sub-problem of the registration phase, wherein the point correspondences are registered under a generalized noise model. Our GTLS approach has improved accuracy, efficiency, and stability compared to prior methods presented for this problem and offers a straightforward implementation using standard least squares. We evaluate the performance of IMLP relative to a large number of prior algorithms including ICP, a robust variant on ICP, Generalized ICP (GICP), and Coherent Point Drift (CPD), as well as drawing close comparison with the prior anisotropic registration methods of GTLS-ICP and A-ICP. The performance of IMLP is shown to be superior with respect to these algorithms over a wide range of noise conditions, outliers, and misalignments using both mesh and point-cloud representations of various shapes. PMID:25748700
NASA Astrophysics Data System (ADS)
Morris, K. J.; Herrera, S.; Gubili, C.; Tyler, P. A.; Rogers, A.; Hauton, C.
2012-12-01
Despite being an abundant group of significant ecological importance, the phylogenetic relationships of the Octocorallia remain poorly understood and understudied. We used 1132 bp of two mitochondrial protein-coding genes, nad2 and mtMutS (previously referred to as msh1), to construct a phylogeny for 161 octocoral specimens from the Atlantic, including both Isididae and non-Isididae species. We found that four clades were supported using a concatenated alignment. Two of these (A and B) were in general agreement with the Holaxonia-Alcyoniina and Anthomastus-Corallium clades identified by previous work. The third and fourth clades represent a split of the Calcaxonia-Pennatulacea clade, resulting in a clade containing the Pennatulacea and a small number of Isididae specimens and a second clade containing the remaining Calcaxonia. When individual genes were considered, nad2 largely agreed with previous work, with mtMutS also producing a fourth clade corresponding to a split of Isididae species from the Calcaxonia-Pennatulacea clade. It is expected that these differences are a consequence of the inclusion of Isididae species that have undergone a gene inversion in the mtMutS gene, causing their separation in the mtMutS-only tree. The fourth clade in the concatenated tree is also suspected to be a result of this gene inversion, as there were very few Isididae species included in previous work and thus this separation would not have been clearly resolved. A larger phylogeny including both Isididae and non-Isididae species is required to further resolve these clades.
Frederickson, Megan E
2009-05-01
The evolutionary stability of mutualism is thought to depend on how well the fitness interests of partners are aligned. Because most ant-myrmecophyte mutualisms are persistent and horizontally transmitted, partners share an interest in growth but not in reproduction. Resources invested in reproduction are unavailable for growth, giving rise to a conflict of interest between partners. I investigated whether this explains why Allomerus octoarticulatus ants sterilize Cordia nodosa trees. Allomerus octoarticulatus nests in the hollow stem domatia of C. nodosa. Workers protect C. nodosa leaves against herbivores but destroy inflorescences. Using C. nodosa trees with Azteca ants, which do not sterilize their hosts, I cut inflorescences off trees to simulate sterilization by A. octoarticulatus. Sterilized C. nodosa grew faster than control trees, providing evidence for a trade-off between growth and reproduction. Allomerus octoarticulatus manipulates this trade-off to its advantage; sterilized trees produce more domatia and can house larger, more fecund colonies.
Ndhlovu, Andrew; Durand, Pierre M.; Hazelhurst, Scott
2015-01-01
The evolutionary rate at codon sites across protein-coding nucleotide sequences represents a valuable tier of information for aligning sequences, inferring homology and constructing phylogenetic profiles. However, a comprehensive resource for cataloguing the evolutionary rate at codon sites and their corresponding nucleotide and protein domain sequence alignments has not been developed. To address this gap in knowledge, EvoDB (an Evolutionary rates DataBase) was compiled. Nucleotide sequences and their corresponding protein domain data including the associated seed alignments from the PFAM-A (protein family) database were used to estimate evolutionary rate (ω = dN/dS) profiles at codon sites for each entry. EvoDB contains 98.83% of the gapped nucleotide sequence alignments and 97.1% of the evolutionary rate profiles for the corresponding information in PFAM-A. As the identification of codon sites under positive selection and their position in a sequence profile is usually the most sought after information for molecular evolutionary biologists, evolutionary rate profiles were determined under the M2a model using the CODEML algorithm in the PAML (Phylogenetic Analysis by Maximum Likelihood) suite of software. Validation of nucleotide sequences against amino acid data was implemented to ensure high data quality. EvoDB is a catalogue of the evolutionary rate profiles and provides the corresponding phylogenetic trees, PFAM-A alignments and annotated accession identifier data. In addition, the database can be explored and queried using known evolutionary rate profiles to identify domains under similar evolutionary constraints and pressures. EvoDB is a resource for evolutionary, phylogenetic studies and presents a tier of information untapped by current databases. Database URL: http://www.bioinf.wits.ac.za/software/fire/evodb PMID:26140928
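The ratio ω = dN/dS mentioned in this abstract is conventionally read as follows: ω > 1 suggests positive selection, ω near 1 neutral evolution, and ω < 1 purifying selection. The sketch below only illustrates that reading for a single pair of rates; it is not the CODEML M2a procedure, and the function name, tolerance and input values are hypothetical.

```python
def classify_omega(dn, ds, tol=1e-9):
    """Return (omega, label) for nonsynonymous and synonymous rates dN, dS."""
    if ds <= 0:
        raise ValueError("dS must be positive for omega to be defined")
    omega = dn / ds
    if omega > 1 + tol:
        label = "positive selection"
    elif omega < 1 - tol:
        label = "purifying selection"
    else:
        label = "neutral"
    return omega, label


# Illustrative values only, not taken from EvoDB.
fast_site = classify_omega(0.2, 0.1)
conserved_site = classify_omega(0.05, 0.5)
```

A per-site profile of such values along a domain is the kind of record the abstract describes the database as storing.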
GASP: Gapped Ancestral Sequence Prediction for proteins
Edwards, Richard J; Shields, Denis C
2004-01-01
Background: The prediction of ancestral protein sequences from multiple sequence alignments is useful for many bioinformatics analyses. Predicting ancestral sequences is not a simple procedure and relies on accurate alignments and phylogenies. Several algorithms exist based on Maximum Parsimony or Maximum Likelihood methods but many current implementations are unable to process residues with gaps, which may represent insertion/deletion (indel) events or sequence fragments. Results: Here we present a new algorithm, GASP (Gapped Ancestral Sequence Prediction), for predicting ancestral sequences from phylogenetic trees and the corresponding multiple sequence alignments. Alignments may be of any size and contain gaps. GASP first assigns the positions of gaps in the phylogeny before using a likelihood-based approach centred on amino acid substitution matrices to assign ancestral amino acids. Important outgroup information is used by first working down from the tips of the tree to the root, using descendant data only to assign probabilities, and then working back up from the root to the tips using descendant and outgroup data to make predictions. GASP was tested on a number of simulated datasets based on real phylogenies. Prediction accuracy for ungapped data was similar to three alternative algorithms tested, with GASP performing better in some cases and worse in others. Adding simple insertions and deletions to the simulated data did not have a detrimental effect on GASP accuracy. Conclusions: GASP (Gapped Ancestral Sequence Prediction) will predict ancestral sequences from multiple protein alignments of any size. Although not as accurate in all cases as some of the more sophisticated maximum likelihood approaches, it can process a wide range of input phylogenies and will predict ancestral sequences for gapped and ungapped residues alike. PMID:15350199
Alignment methods: strategies, challenges, benchmarking, and comparative overview.
Löytynoja, Ari
2012-01-01
Comparative evolutionary analyses of molecular sequences are solely based on the identities and differences detected between homologous characters. Errors in this homology statement, that is errors in the alignment of the sequences, are likely to lead to errors in the downstream analyses. Sequence alignment and phylogenetic inference are tightly connected and many popular alignment programs use the phylogeny to divide the alignment problem into smaller tasks. They then neglect the phylogenetic tree, however, and produce alignments that are not evolutionarily meaningful. The use of phylogeny-aware methods reduces the error but the resulting alignments, with evolutionarily correct representation of homology, can challenge the existing practices and methods for viewing and visualising the sequences. The inter-dependency of alignment and phylogeny can be resolved by joint estimation of the two; methods based on statistical models allow for inferring the alignment parameters from the data and correctly take into account the uncertainty of the solution but remain computationally challenging. Widely used alignment methods are based on heuristic algorithms and unlikely to find globally optimal solutions. The whole concept of one correct alignment for the sequences is questionable, however, as there typically exist vast numbers of alternative, roughly equally good alignments that should also be considered. This uncertainty is hidden by many popular alignment programs and is rarely correctly taken into account in the downstream analyses. The quest for finding and improving the alignment solution is complicated by the lack of suitable measures of alignment goodness. The difficulty of comparing alternative solutions also affects benchmarks of alignment methods and the results strongly depend on the measure used. As the effects of alignment error cannot be predicted, comparing the alignments' performance in downstream analyses is recommended.
Measuring fit of sequence data to phylogenetic model: gain of power using marginal tests.
Waddell, Peter J; Ota, Rissa; Penny, David
2009-10-01
Testing fit of data to model is fundamentally important to any science, but publications in the field of phylogenetics rarely do this. Such analyses discard fundamental aspects of science as prescribed by Karl Popper. Indeed, not without cause, Popper (Unended quest: an intellectual autobiography. Fontana, London, 1976) once argued that evolutionary biology was unscientific as its hypotheses were untestable. Here we trace developments in assessing fit from Penny et al. (Nature 297:197-200, 1982) to the present. We compare the general log-likelihood ratio statistic (the G or G² statistic) between the evolutionary tree model and the multinomial model with that of marginalized tests applied to an alignment (using placental mammal coding sequence data). It is seen that the most general test does not reject the fit of data to model (P ≈ 0.5), but the marginalized tests do. Tests on pairwise frequency (F) matrices strongly (P < 0.001) reject the most general phylogenetic (GTR) models commonly in use. It is also clear (P < 0.01) that the sequences are not stationary in their nucleotide composition. Deviations from stationarity and homogeneity seem to be unevenly distributed amongst taxa; not necessarily those expected from examining other regions of the genome. By marginalizing the 4^t patterns of the i.i.d. model to observed and expected parsimony counts, that is, from constant sites, to singletons, to parsimony informative characters of a minimum possible length, the likelihood ratio test regains power, and it too rejects the evolutionary model with P < 0.001. Given such behavior over relatively recent evolutionary time, readers in general should maintain a healthy skepticism of results, as the scale of the systematic errors in published trees may really be far larger than the analytical methods (e.g., bootstrap) report.
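The G statistic referred to in this abstract has the standard form G = 2 Σ O_i ln(O_i / E_i), where O_i and E_i are the observed and expected counts in category i (for example, site-pattern counts). The sketch below computes it for two hypothetical count vectors; how expected counts are derived from a fitted tree model, and how categories are marginalized, is outside its scope.

```python
import math


def g_statistic(observed, expected):
    """Log-likelihood ratio statistic G = 2 * sum(O * ln(O / E)).

    Categories with an observed count of zero contribute nothing to the sum.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )


# Hypothetical counts for two categories with equal expectations.
g_perfect = g_statistic([10, 10], [10, 10])  # observed equals expected
g_skewed = g_statistic([15, 5], [10, 10])    # observed departs from expected
```

Under the usual asymptotic argument G is compared with a chi-square distribution. Pooling many sparse categories into a few well-populated ones is one way to read the marginalization the abstract describes as regaining power.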
Iterative pass optimization of sequence data
NASA Technical Reports Server (NTRS)
Wheeler, Ward C.
2003-01-01
The problem of determining the minimum-cost hypothetical ancestral sequences for a given cladogram is known to be NP-complete. This "tree alignment" problem has motivated the considerable effort placed in multiple sequence alignment procedures. Wheeler in 1996 proposed a heuristic method, direct optimization, to calculate cladogram costs without the intervention of multiple sequence alignment. This method, though more efficient in time and more effective in cladogram length than many alignment-based procedures, greedily optimizes nodes based on descendant information only. In their proposal of an exact multiple alignment solution, Sankoff et al. in 1976 described a heuristic procedure, the iterative improvement method, to create alignments at internal nodes by solving a series of median problems. The combination of a three-sequence direct optimization with iterative improvement and a branch-length-based cladogram cost procedure provides an algorithm that frequently results in superior (i.e., lower) cladogram costs. This iterative pass optimization is both computation and memory intensive, but economies can be made to reduce this burden. An example in arthropod systematics is discussed. © 2003 The Willi Hennig Society. Published by Elsevier Science (USA). All rights reserved.
Aligning Biomolecular Networks Using Modular Graph Kernels
NASA Astrophysics Data System (ADS)
Towfic, Fadi; Greenlee, M. Heather West; Honavar, Vasant
Comparative analysis of biomolecular networks constructed using measurements from different conditions, tissues, and organisms offers a powerful approach to understanding the structure, function, dynamics, and evolution of complex biological systems. We explore a class of algorithms for aligning large biomolecular networks by breaking down such networks into subgraphs and computing the alignment of the networks based on the alignment of their subgraphs. The resulting subnetworks are compared using graph kernels as scoring functions. We provide implementations of the resulting algorithms as part of BiNA, an open source biomolecular network alignment toolkit. Our experiments using Drosophila melanogaster, Saccharomyces cerevisiae, Mus musculus and Homo sapiens protein-protein interaction networks extracted from the DIP repository of protein-protein interaction data demonstrate that the performance of the proposed algorithms (as measured by % GO term enrichment of subnetworks identified by the alignment) is competitive with some of the state-of-the-art algorithms for pair-wise alignment of large protein-protein interaction networks. Our results also show that the inter-species similarity scores computed based on graph kernels can be used to cluster the species into a species tree that is consistent with the known phylogenetic relationships among the species.
Structural analysis of polarizing indels: an emerging consensus on the root of the tree of life
2009-01-01
Background: The root of the tree of life has been a holy grail ever since Darwin first used the tree as a metaphor for evolution. New methods seek to narrow down the location of the root by excluding it from branches of the tree of life. This is done by finding traits that must be derived, and excluding the root from the taxa those traits cover. However, the two most comprehensive attempts at this strategy, performed by Cavalier-Smith and Lake et al., have excluded each other's rootings. Results: The indel polarizations of Lake et al. rely on high quality alignments between paralogs that diverged before the last universal common ancestor (LUCA). Therefore, sequence alignment artifacts may skew their conclusions. We have reviewed their data using protein structure information where available. Several of the conclusions are quite different when viewed in the light of structure, which is conserved over longer evolutionary time scales than sequence. We argue there is no polarization that excludes the root from all Gram-negatives, and that polarizations robustly exclude the root from the Archaea. Conclusion: We conclude that there is no contradiction between the polarization datasets. The combination of these datasets excludes the root from every possible position except near the Chloroflexi. Reviewers: This article was reviewed by Greg Fournier (nominated by J. Peter Gogarten), Purificación López-García, and Eugene Koonin. PMID:19706177
29 CFR 780.216 - Nursery activities generally and Christmas tree production.
Code of Federal Regulations, 2010 CFR
2010-07-01
... 29 Labor 3 2010-07-01 2010-07-01 false Nursery activities generally and Christmas tree production... Nursery activities generally and Christmas tree production. (a) The employees of a nursery who are engaged... fruit, nut, shade, vegetable, and ornamental plants or trees, and shrubs, vines, and flowers; (2...
CFD modelling of the aerodynamic effect of trees on urban air pollution dispersion.
Amorim, J H; Rodrigues, V; Tavares, R; Valente, J; Borrego, C
2013-09-01
The current work evaluates the impact of urban trees on the dispersion of carbon monoxide (CO) emitted by road traffic, due to the induced modification of the wind flow characteristics. With this purpose, the standard flow equations with a k-ε closure for turbulence were extended with the capability to account for the aerodynamic effect of trees on the wind field. Two CFD models were used for testing this numerical approach. Air quality simulations were conducted for two periods of 31 h in selected areas of Lisbon and Aveiro, in Portugal, for distinct relative wind directions: approximately 45° and nearly parallel to the main avenue, respectively. The statistical evaluation of modelling performance and uncertainty revealed a significant improvement of results with trees, as shown by the reduction of the NMSE from 0.14 to 0.10 in Lisbon, and from 0.14 to 0.04 in Aveiro, which is independent of the CFD model applied. The consideration of the plant canopy made it possible to fulfil the data quality objectives for ambient air quality modelling established by the Directive 2008/50/EC, with an important decrease of the maximum deviation between site measurements and CFD results. In the non-aligned wind situation an average 12% increase of the CO concentrations in the domain was observed as a response to the aerodynamic action of trees on the vertical exchange rates of polluted air with the above roof-level atmosphere, while for the aligned configuration an average 16% decrease was registered due to the enhanced ventilation of the street canyon. These results show that urban air quality can be optimised based on knowledge-based planning of green spaces. Copyright © 2013 Elsevier B.V. All rights reserved.
Dynamic programming algorithms for biological sequence comparison.
Pearson, W R; Miller, W
1992-01-01
Efficient dynamic programming algorithms are available for a broad class of protein and DNA sequence comparison problems. These algorithms require computer time proportional to the product of the lengths of the two sequences being compared [O(N²)] but require memory space proportional only to the sum of these lengths [O(N)]. Although the requirement for O(N²) time limits use of the algorithms to the largest computers when searching protein and DNA sequence databases, many other applications of these algorithms, such as calculation of distances for evolutionary trees and comparison of a new sequence to a library of sequence profiles, are well within the capabilities of desktop computers. In particular, the results of library searches with rapid searching programs, such as FASTA or BLAST, should be confirmed by performing a rigorous optimal alignment. Whereas rapid methods do not overlook significant sequence similarities, FASTA limits the number of gaps that can be inserted into an alignment, so that a rigorous alignment may extend the alignment substantially in some cases. BLAST does not allow gaps in the local regions that it reports; a calculation that allows gaps is very likely to extend the alignment substantially. Although a Monte Carlo evaluation of the statistical significance of a similarity score with a rigorous algorithm is much slower than the heuristic approach used by the RDF2 program, the dynamic programming approach should take less than 1 hr on a 386-based PC or desktop Unix workstation. For descriptive purposes, we have limited our discussion to methods for calculating similarity scores and distances that use gap penalties of the form g = rk. Nevertheless, programs for the more general case (g = q + rk) are readily available. Versions of these programs that run either on Unix workstations, IBM-PC class computers, or the Macintosh can be obtained from either of the authors.
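The time and memory behaviour described in this abstract can be sketched for the simplest case, a global similarity score with a gap penalty of the form g = rk. The function name and the match, mismatch and r values below are illustrative assumptions, not the authors' programs. The sketch keeps only two rows of the dynamic programming matrix, so it returns the score in linear memory; recovering the alignment itself in linear space needs a further divide-and-conquer step that is not shown.

```python
def similarity_score(a, b, match=1, mismatch=-1, r=2):
    """Global similarity score with gap penalty g = r * k.

    Time is proportional to len(a) * len(b); memory is two rows of
    length len(b) + 1. All scoring values are illustrative defaults.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [-r * j for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of b.
        curr = [-r * i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                          prev[j] - r,         # gap in b
                          curr[j - 1] - r)     # gap in a
        prev = curr
    return prev[len(b)]

print(similarity_score("ACGT", "AGT"))
```

Extending the sketch to the more general penalty g = q + rk would require tracking separate states for open gaps, which is why the linear form is the easier one to describe.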
NASA Astrophysics Data System (ADS)
Nakatani, Naoki; Chan, Garnet Kin-Lic
2013-04-01
We investigate tree tensor network states for quantum chemistry. Tree tensor network states represent one of the simplest generalizations of matrix product states and the density matrix renormalization group. While matrix product states encode a one-dimensional entanglement structure, tree tensor network states encode a tree entanglement structure, allowing for a more flexible description of general molecules. We describe an optimal tree tensor network state algorithm for quantum chemistry. We introduce the concept of half-renormalization which greatly improves the efficiency of the calculations. Using our efficient formulation we demonstrate the strengths and weaknesses of tree tensor network states versus matrix product states. We carry out benchmark calculations both on tree systems (hydrogen trees and π-conjugated dendrimers) as well as non-tree molecules (hydrogen chains, nitrogen dimer, and chromium dimer). In general, tree tensor network states require much fewer renormalized states to achieve the same accuracy as matrix product states. In non-tree molecules, whether this translates into a computational savings is system dependent, due to the higher prefactor and computational scaling associated with tree algorithms. In tree like molecules, tree network states are easily superior to matrix product states. As an illustration, our largest dendrimer calculation with tree tensor network states correlates 110 electrons in 110 active orbitals.
Hydrodynamic alignment and assembly of nanofibrils resulting in strong cellulose filaments
Håkansson, Karl M. O.; Fall, Andreas B.; Lundell, Fredrik; Yu, Shun; Krywka, Christina; Roth, Stephan V.; Santoro, Gonzalo; Kvick, Mathias; Prahl Wittberg, Lisa; Wågberg, Lars; Söderberg, L. Daniel
2014-01-01
Cellulose nanofibrils can be obtained from trees and have considerable potential as a building block for biobased materials. In order to achieve good properties of these materials, the nanostructure must be controlled. Here we present a process combining hydrodynamic alignment with a dispersion–gel transition that produces homogeneous and smooth filaments from a low-concentration dispersion of cellulose nanofibrils in water. The preferential fibril orientation along the filament direction can be controlled by the process parameters. The specific ultimate strength is considerably higher than previously reported filaments made of cellulose nanofibrils. The strength is even in line with the strongest cellulose pulp fibres extracted from wood with the same degree of fibril alignment. Successful nanoscale alignment before gelation demands a proper separation of the timescales involved. Somewhat surprisingly, the device must not be too small if this is to be achieved. PMID:24887005
Abaka, Gamze; Bıyıkoğlu, Türker; Erten, Cesim
2013-07-01
Given a pair of metabolic pathways, an alignment of the pathways corresponds to a mapping between similar substructures of the pair. Successful alignments may provide useful applications in phylogenetic tree reconstruction, drug design and overall may enhance our understanding of cellular metabolism. We consider the problem of providing one-to-many alignments of reactions in a pair of metabolic pathways. We first provide a constrained alignment framework applicable to the problem. We show that the constrained alignment problem even in a primitive setting is computationally intractable, which justifies efforts for designing efficient heuristics. We present our Constrained Alignment of Metabolic Pathways (CAMPways) algorithm designed for this purpose. Through extensive experiments involving a large pathway database, we demonstrate that when compared with a state-of-the-art alternative, the CAMPways algorithm provides better alignment results on metabolic networks as far as measures based on same-pathway inclusion and biochemical significance are concerned. The execution speed of our algorithm constitutes yet another important improvement over alternative algorithms. Open source codes, executable binary, useful scripts, all the experimental data and the results are freely available as part of the Supplementary Material at http://code.google.com/p/campways/. Supplementary data are available at Bioinformatics online.
A laid-back trip through the Hennigian Forests
2017-01-01
Background: This paper is a comment on the idea of matrix-free Cladistics. Demonstration of this idea’s efficiency is a major goal of the study. Within the proposed framework, the ordinary (phenetic) matrix is necessary only as “source” of Hennigian trees, not as a primary subject of the analysis. Switching from the matrix-based thinking to the matrix-free Cladistic approach clearly reveals that optimizations of the character-state changes are related not to the real processes, but to the form of the data representation. Methods: We focused our study on the binary data. We wrote the simple Ruby-based script FORESTER version 1.0 that helps represent a binary matrix as an array of the rooted trees (as a “Hennigian forest”). The binary representations of the genomic (DNA) data have been made by script 1001. The Average Consensus method as well as the standard Maximum Parsimony (MP) approach has been used to analyze the data. Principal findings: The binary matrix may be easily re-written as a set of rooted trees (maximal relationships). The latter might be analyzed by the Average Consensus method. Paradoxically, this method, if applied to the Hennigian forests, in principle can help to identify clades despite the absence of the direct evidence from the primary data. Our approach may handle the clock- or non clock-like matrices, as well as the hypothetical, molecular or morphological data. Discussion: Our proposal clearly differs from the numerous phenetic alignment-free techniques of the construction of the phylogenetic trees. Dealing with the relations, not with the actual “data” also distinguishes our approach from all optimization-based methods, if the optimization is defined as a way to reconstruct the sequences of the character-state changes on a tree, either the standard alignment-based techniques or the “direct” alignment-free procedure. 
We are not viewing our recent framework as an alternative to the three-taxon statement analysis (3TA), but there are two major differences between our recent proposal and the 3TA, as originally designed and implemented: (1) the 3TA deals with the three-taxon statements or minimal relationships. According to the logic of 3TA, the set of the minimal trees must be established as a binary matrix and used as an input for the parsimony program. In this paper, we operate directly with maximal relationships written just as trees, not as binary matrices, while also using the Average Consensus method instead of the MP analysis. The solely ‘reversal’-based groups can always be found by our method without the separate scoring of the putative reversals before analyses. PMID:28740753
Alignment-free protein interaction network comparison
Ali, Waqar; Rito, Tiago; Reinert, Gesine; Sun, Fengzhu; Deane, Charlotte M.
2014-01-01
Motivation: Biological network comparison software largely relies on the concept of alignment where close matches between the nodes of two or more networks are sought. These node matches are based on sequence similarity and/or interaction patterns. However, because of the incomplete and error-prone datasets currently available, such methods have had limited success. Moreover, the results of network alignment are in general not amenable for distance-based evolutionary analysis of sets of networks. In this article, we describe Netdis, a topology-based distance measure between networks, which offers the possibility of network phylogeny reconstruction. Results: We first demonstrate that Netdis is able to correctly separate different random graph model types independent of network size and density. The biological applicability of the method is then shown by its ability to build the correct phylogenetic tree of species based solely on the topology of current protein interaction networks. Our results provide new evidence that the topology of protein interaction networks contains information about evolutionary processes, despite the lack of conservation of individual interactions. As Netdis is applicable to all networks because of its speed and simplicity, we apply it to a large collection of biological and non-biological networks where it clusters diverse networks by type. Availability and implementation: The source code of the program is freely available at http://www.stats.ox.ac.uk/research/proteins/resources. Contact: w.ali@stats.ox.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25161230
TeachEnG: a Teaching Engine for Genomics.
Kim, Minji; Kim, Yeonsung; Qian, Lei; Song, Jun S
2017-10-15
Bioinformatics is a rapidly growing field that has emerged from the synergy of computer science, statistics and biology. Given the interdisciplinary nature of bioinformatics, many students from diverse fields struggle with grasping bioinformatic concepts only from classroom lectures. Interactive tools for helping students reinforce their learning would thus be desirable. Here, we present an interactive online educational tool called TeachEnG (acronym for Teaching Engine for Genomics) for reinforcing key concepts in sequence alignment and phylogenetic tree reconstruction. Our instructional games allow students to align sequences by hand, fill out the dynamic programming matrix in the Needleman-Wunsch global sequence alignment algorithm, and reconstruct phylogenetic trees via the maximum parsimony, Unweighted Pair Group Method with Arithmetic mean (UPGMA) and Neighbor-Joining algorithms. With an easily accessible interface and instant visual feedback, TeachEnG will help promote active learning in bioinformatics. TeachEnG is freely available at http://teacheng.illinois.edu. The source code is available from https://github.com/KnowEnG/TeachEnG under the Artistic License 2.0. It is written in JavaScript and compatible with Firefox, Safari, Chrome and Microsoft Edge. Contact: songj@illinois.edu. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
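For readers unfamiliar with UPGMA, one of the tree-building methods named in this abstract, the following is a minimal sketch of the textbook procedure: repeatedly merge the two closest clusters and average their distances to every other cluster, weighted by cluster size. It is written in Python for brevity and is not taken from the TeachEnG source, which is in JavaScript; the function name, the nested-tuple output and the example distances are assumptions made for illustration, and branch lengths are omitted.

```python
def upgma(names, dist):
    """Cluster leaves with UPGMA and return the topology as nested tuples.

    names: list of leaf labels; dist: symmetric matrix as a list of lists.
    """
    # Each cluster id maps to (subtree, number_of_leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller_id, larger_id).
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)               # closest pair of clusters
        (ta, na), (tb, nb) = clusters.pop(a), clusters.pop(b)
        merged = {}
        for c in clusters:
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            # Size-weighted average keeps every leaf pair equally weighted.
            merged[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)
        # Drop distances that involve the two clusters just merged.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(merged)
        clusters[next_id] = ((ta, tb), na + nb)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree

# Hypothetical distances between four taxa.
names = ["A", "B", "C", "D"]
dist = [[0, 2, 6, 10],
        [2, 0, 6, 10],
        [6, 6, 0, 10],
        [10, 10, 10, 0]]
print(upgma(names, dist))
```

Ties between equally close pairs are broken by dictionary order here, which is an arbitrary choice; different implementations may resolve ties differently.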
Tree Alignment Based on Needleman-Wunsch Algorithm for Sensor Selection in Smart Homes.
Chua, Sook-Ling; Foo, Lee Kien
2017-08-18
Activity recognition in smart homes aims to infer the particular activities of the inhabitant, in order to monitor those activities and identify any abnormalities, especially for those living alone. In order for a smart home to support its inhabitant, the recognition system needs to learn from observations acquired through sensors. One question that often arises is which sensors are useful and how many sensors are required to accurately recognise the inhabitant's activities. Many wrapper methods have been proposed and remain among the popular evaluators for sensor selection due to their superior accuracy. However, they are prohibitively slow during the evaluation process and may run the risk of overfitting due to the extent of the search. Motivated by this characteristic, this paper attempts to reduce the cost of the evaluation process and overfitting through tree alignment. The performance of our method is evaluated on two public datasets obtained in two distinct smart home environments.
Ensembl comparative genomics resources.
Herrero, Javier; Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J; Searle, Stephen M J; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul
2016-01-01
Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. © The Author(s) 2016. Published by Oxford University Press. PMID:26896847
Mirroring co-evolving trees in the light of their topologies.
Hajirasouliha, Iman; Schönhuth, Alexander; de Juan, David; Valencia, Alfonso; Sahinalp, S Cenk
2012-05-01
Determining the interaction partners among protein/domain families poses hard computational problems, in particular in the presence of paralogous proteins. Available approaches aim to identify interaction partners among protein/domain families through maximizing the similarity between trimmed versions of their phylogenetic trees. Since maximization of any natural similarity score is computationally difficult, many approaches employ heuristics to evaluate the distance matrices corresponding to the tree topologies in question. In this article, we devise an efficient deterministic algorithm which directly maximizes the similarity between two leaf labeled trees with edge lengths, obtaining a score-optimal alignment of the two trees in question. Our algorithm is significantly faster than those methods based on distance matrix comparison: 1 min on a single processor versus 730 h on a supercomputer. Furthermore, we outperform the current state-of-the-art exhaustive search approach in terms of precision, while incurring acceptable losses in recall. A C implementation of the method demonstrated in this article is available at http://compbio.cs.sfu.ca/mirrort.htm
Missing the Forest for the Trees
ERIC Educational Resources Information Center
Amaral, Olga Maia; Garrison, Leslie
2007-01-01
This case study examines the alignment between the Intended Curriculum, Implemented Curriculum and Achieved Curriculum of a fourth grade inquiry based unit, "Food Chains and Webs." Specifically addressed are how the curriculum was modified to meet state standards, how teachers were trained, and how assessment of curricular implementation was…
K2 and K2*: efficient alignment-free sequence similarity measurement based on Kendall statistics.
Lin, Jie; Adjeroh, Donald A; Jiang, Bing-Hua; Jiang, Yue
2018-05-15
Alignment-free sequence comparison methods can compute the pairwise similarity between a huge number of sequences much faster than sequence-alignment based methods. We propose a new non-parametric alignment-free sequence comparison method, called K2, based on the Kendall statistics. Compared with other state-of-the-art alignment-free comparison methods, K2 demonstrates competitive performance in generating the phylogenetic tree, in evaluating functionally related regulatory sequences, and in computing the edit distance (similarity/dissimilarity) between sequences. Furthermore, the K2 approach is much faster than the other methods. An improved method, K2*, is also proposed, which is able to determine the appropriate algorithmic parameter (length) automatically, without first considering different values. Comparative analysis with the state-of-the-art alignment-free sequence similarity methods demonstrates the superiority of the proposed approaches, especially with increasing sequence length or increasing dataset sizes. The K2 and K2* approaches are implemented in the R language as a package that is freely available for open access (http://community.wvu.edu/daadjeroh/projects/K2/K2_1.0.tar.gz). Contact: yueljiang@163.com. Supplementary data are available at Bioinformatics online.
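The abstract does not spell out how K2 is computed, so the sketch below is not the K2 algorithm. It only illustrates the general idea of an alignment-free, rank-based comparison: count the k-mers of each sequence and measure how similarly the two count vectors are ordered using Kendall's tau-a. The function names, the choice of k and the example sequences are all assumptions, and the quadratic pair loop is written for clarity rather than speed.

```python
from itertools import product

def kmer_counts(seq, k):
    """Count every DNA k-mer in seq, in a fixed lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:          # skip words with ambiguous bases
            counts[word] += 1
    return [counts[m] for m in kmers]

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / (n * (n - 1) / 2)."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)   # +1, -1, or 0 for ties
    return 2.0 * score / (n * (n - 1))

# Hypothetical sequences compared through their 2-mer profiles.
s1 = "ACGTACGTACGGTTAC"
s2 = "ACGTTCGTACGATTAC"
print(kendall_tau_a(kmer_counts(s1, 2), kmer_counts(s2, 2)))
```

A value near 1 means the two sequences rank their k-mers in almost the same order of abundance; no alignment is computed at any point.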
Hierarchical models for informing general biomass equations with felled tree data
Brian J. Clough; Matthew B. Russell; Christopher W. Woodall; Grant M. Domke; Philip J. Radtke
2015-01-01
We present a hierarchical framework that uses a large multispecies felled tree database to inform a set of general models for predicting tree foliage biomass, with accompanying uncertainty, within the FIA database. Results suggest significant prediction uncertainty for individual trees and reveal higher errors when predicting foliage biomass for larger trees and for...
5. Aerial view of turnpike path running through center of ...
5. Aerial view of turnpike path running through center of photograph along row of trees. 1917 realignment visible along left edge of photograph along edge of forest. Modernized alignment resumes at top right of photograph. View looking north. - Orange Turnpike, Parallel to new Orange Turnpike, Monroe, Orange County, NY
Student Expectations, University Goals: Looking for Alignment in General Education Science
ERIC Educational Resources Information Center
Ericson, Rebecca J.
2012-01-01
This action research dissertation explores the alignment of university goals, faculty practice, and student expectations for general education natural science courses as a first step to understanding how best to restructure the program to ensure that students are learning in alignment with university stated goals for this aspect of their…
A possible biochemical missing link among archaebacteria
NASA Technical Reports Server (NTRS)
Achenbach-Richter, Laurie; Woese, Carl R.; Stetter, Karl O.
1987-01-01
The characteristics of the newly discovered strain of archaebacteria, VC-16, the only archaebacterium known to reduce sulfate, suggest that VC-16 might represent a transitional form between an anaerobic thermophilic sulfur-based type of metabolism and methanogenesis. It is shown here, using a matrix of evolutionary distances derived from an alignment of various archaebacterial 16S rRNAs and the phylogenetic tree derived from these evolutionary distances, that the lineage represented by strain VC-16 arises from the archaebacterial tree precisely where such an interpretation would predict that it would, between the Methanococcus lineage and that of Thermococcus.
CVTree3 Web Server for Whole-genome-based and Alignment-free Prokaryotic Phylogeny and Taxonomy.
Zuo, Guanghong; Hao, Bailin
2015-10-01
A faithful phylogeny and an objective taxonomy for prokaryotes should agree with each other and ultimately follow the genome data. With the number of sequenced genomes reaching tens of thousands, both tree inference and detailed comparison with taxonomy are great challenges. We now provide one solution in the latest Release 3.0 of the alignment-free and whole-genome-based web server CVTree3. The server resides in a cluster of 64 cores and is equipped with an interactive, collapsible, and expandable tree display. It is capable of comparing the tree branching order with prokaryotic classification at all taxonomic ranks from domains down to species and strains. CVTree3 allows for inquiry by taxon names and trial on lineage modifications. In addition, it reports a summary of monophyletic and non-monophyletic taxa at all ranks as well as produces print-quality subtree figures. After giving an overview of retrospective verification of the CVTree approach, the power of the new server is described for the mega-classification of prokaryotes and determination of taxonomic placement of some newly-sequenced genomes. A few discrepancies between CVTree and 16S rRNA analyses are also summarized with regard to possible taxonomic revisions. CVTree3 is freely accessible to all users at http://tlife.fudan.edu.cn/cvtree3/ without login requirements. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd. All rights reserved. PMID:26563468
Federal Register 2010, 2011, 2012, 2013, 2014
2010-05-19
... DEPARTMENT OF THE INTERIOR National Park Service General Management Plan; Joshua Tree National... National Park Service is updating the General Management Plan (GMP) for Joshua Tree National Park... park management and resource analysis, other designations such as establishment of 594,502 acres by...
Simultaneous gene finding in multiple genomes.
König, Stefanie; Romoth, Lars W; Gerischer, Lizzy; Stanke, Mario
2016-11-15
As the tree of life is populated with sequenced genomes ever more densely, the new challenge is the accurate and consistent annotation of entire clades of genomes. We address this problem with a new approach to comparative gene finding that takes a multiple genome alignment of closely related species and simultaneously predicts the location and structure of protein-coding genes in all input genomes, thereby exploiting negative selection and sequence conservation. The model prefers potential gene structures in the different genomes that are in agreement with each other, or, if not, where the exon gains and losses are plausible given the species tree. We formulate the multi-species gene finding problem as a binary labeling problem on a graph. The resulting optimization problem is NP-hard, but can be efficiently approximated using a subgradient-based dual decomposition approach. The proposed method was tested on whole-genome alignments of 12 vertebrate and 12 Drosophila species. The accuracy was evaluated for human, mouse and Drosophila melanogaster and compared to competing methods. Results suggest that our method is well-suited for annotation of (a large number of) genomes of closely related species within a clade, in particular, when RNA-Seq data are available for many of the genomes. The transfer of existing annotations from one genome to another via the genome alignment is more accurate than previous approaches that are based on protein-spliced alignments, when the genomes are at close to medium distances. The method is implemented in C++ as part of Augustus and available open source at http://bioinf.uni-greifswald.de/augustus/. Contact: stefaniekoenig@ymail.com or mario.stanke@uni-greifswald.de. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
O'Donoghue, Patrick; Luthey-Schulten, Zaida
2005-02-25
We present a new algorithm, based on the multidimensional QR factorization, to remove redundancy from a multiple structural alignment by choosing representative protein structures that best preserve the phylogenetic tree topology of the homologous group. The classical QR factorization with pivoting, developed as a fast numerical solution to eigenvalue and linear least-squares problems of the form Ax=b, was designed to re-order the columns of A by increasing linear dependence. Removing the most linearly dependent columns from A leads to the formation of a minimal basis set that spans the phase space of the problem at hand well. By recasting the problem of redundancy in multiple structural alignments into this framework, in which the matrix A now describes the multiple alignment, we adapted the QR factorization to produce a minimal basis set of protein structures which best spans the evolutionary (phase) space. The non-redundant and representative profiles obtained from this procedure, termed evolutionary profiles, are shown in initial results to outperform well-tested profiles in homology detection searches over a large sequence database. A measure of structural similarity between homologous proteins, Q(H), is presented. By properly accounting for the effect and presence of gaps, a phylogenetic tree computed using this metric is shown to be congruent with the maximum-likelihood sequence-based phylogeny. The results indicate that evolutionary information is indeed recoverable from the comparative analysis of protein structure alone. Applications of the QR ordering and this structural similarity metric to analyze the evolution of structure among key, universally distributed proteins involved in translation, and to the selection of representatives from an ensemble of NMR structures are also discussed.
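The column re-ordering idea in this abstract can be illustrated with a small sketch. It is not the authors' multidimensional algorithm: it assumes an ordinary dense matrix stored as a list of columns and uses a greedy pivoted Gram-Schmidt procedure, which produces the same column ordering as QR with column pivoting in exact arithmetic. The function name `pivoted_column_order` and the toy matrix are hypothetical.

```python
import math

def pivoted_column_order(columns, tol=1e-12):
    """Return column indices ordered from least to most linearly dependent.

    Greedy pivoted Gram-Schmidt: at each step, pick the column with the
    largest component orthogonal to the columns already chosen, then
    remove that direction from every remaining column.
    """
    residual = [list(col) for col in columns]   # working copies
    remaining = list(range(len(columns)))
    order = []
    while remaining:
        # Pivot: the remaining column with the largest residual norm.
        norms = {j: math.sqrt(sum(x * x for x in residual[j])) for j in remaining}
        pivot = max(remaining, key=lambda j: norms[j])
        order.append(pivot)
        remaining.remove(pivot)
        if norms[pivot] <= tol:
            continue  # nothing left to project out
        q = [x / norms[pivot] for x in residual[pivot]]
        for j in remaining:
            dot = sum(a * b for a, b in zip(q, residual[j]))
            residual[j] = [a - dot * b for a, b in zip(residual[j], q)]
    return order

# Toy example (made-up numbers): column 2 is nearly a scaled copy of
# column 0, so it carries little new information and is ranked last.
cols = [
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.9, 0.0, 0.001],
]
order = pivoted_column_order(cols)
```

Dropping the trailing entries of `order` corresponds to removing the most redundant columns, which is the sense in which a reduced, representative set is retained.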
Transforming phylogenetic networks: Moving beyond tree space.
Huber, Katharina T; Moulton, Vincent; Wu, Taoyang
2016-09-07
Phylogenetic networks are a generalization of phylogenetic trees that are used to represent reticulate evolution. Unrooted phylogenetic networks form a special class of such networks, which naturally generalize unrooted phylogenetic trees. In this paper we define two operations on unrooted phylogenetic networks, one of which is a generalization of the well-known nearest-neighbor interchange (NNI) operation on phylogenetic trees. We show that any unrooted phylogenetic network can be transformed into any other such network using only these operations. This generalizes the well-known fact that any phylogenetic tree can be transformed into any other such tree using only NNI operations. It also allows us to define a generalization of tree space and to define some new metrics on unrooted phylogenetic networks. To prove our main results, we employ some fascinating new connections between phylogenetic networks and cubic graphs that we have recently discovered. Our results should be useful in developing new strategies to search for optimal phylogenetic networks, a topic that has recently generated some interest in the literature, as well as for providing new ways to compare networks. Copyright © 2016 Elsevier Ltd. All rights reserved.
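For readers unfamiliar with the classical NNI operation on trees that this paper generalizes, the following sketch may help. It covers only the tree case, not the network operations defined by the authors, and it assumes a hypothetical nested-tuple encoding in which the central internal edge separates subtrees (A, B) from subtrees (C, D).

```python
def nni_neighbors(tree):
    """Return the two NNI neighbours across the central edge of a tree.

    The tree is written as ((A, B), (C, D)): the central internal edge
    separates subtrees A and B from subtrees C and D.
    """
    (a, b), (c, d) = tree
    # Swap B with C, or B with D; A stays in place.
    return [((a, c), (b, d)), ((a, d), (c, b))]

# Illustrative four-leaf tree; the leaf names are placeholders.
quartet = (("human", "chimp"), ("mouse", "rat"))
neighbors = nni_neighbors(quartet)
```

For four leaves, the input and its two neighbours are exactly the three possible unrooted topologies, which is the smallest instance of the connectivity result mentioned in the abstract.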
Extensive gene tree discordance and hemiplasy shaped the genomes of North American columnar cacti.
Copetti, Dario; Búrquez, Alberto; Bustamante, Enriquena; Charboneau, Joseph L M; Childs, Kevin L; Eguiarte, Luis E; Lee, Seunghee; Liu, Tiffany L; McMahon, Michelle M; Whiteman, Noah K; Wing, Rod A; Wojciechowski, Martin F; Sanderson, Michael J
2017-11-07
Few clades of plants have proven as difficult to classify as cacti. One explanation may be an unusually high level of convergent and parallel evolution (homoplasy). To evaluate support for this phylogenetic hypothesis at the molecular level, we sequenced the genomes of four cacti in the especially problematic tribe Pachycereeae, which contains most of the large columnar cacti of Mexico and adjacent areas, including the iconic saguaro cactus (Carnegiea gigantea) of the Sonoran Desert. We assembled a high-coverage draft genome for saguaro and lower coverage genomes for three other genera of tribe Pachycereeae (Pachycereus, Lophocereus, and Stenocereus) and a more distant outgroup cactus, Pereskia. We used these to construct 4,436 orthologous gene alignments. Species tree inference consistently returned the same phylogeny, but gene tree discordance was high: 37% of gene trees having at least 90% bootstrap support conflicted with the species tree. Evidently, discordance is a product of long generation times and moderately large effective population sizes, leading to extensive incomplete lineage sorting (ILS). In the best supported gene trees, 58% of apparent homoplasy at amino sites in the species tree is due to gene tree-species tree discordance rather than parallel substitutions in the gene trees themselves, a phenomenon termed "hemiplasy." The high rate of genomic hemiplasy may contribute to apparent parallelisms in phenotypic traits, which could confound understanding of species relationships and character evolution in cacti. Published under the PNAS license.
Extensive gene tree discordance and hemiplasy shaped the genomes of North American columnar cacti
Copetti, Dario; Búrquez, Alberto; Bustamante, Enriquena; Charboneau, Joseph L. M.; Childs, Kevin L.; Eguiarte, Luis E.; Lee, Seunghee; Liu, Tiffany L.; McMahon, Michelle M.; Whiteman, Noah K.; Wing, Rod A.; Wojciechowski, Martin F.; Sanderson, Michael J.
2017-01-01
Few clades of plants have proven as difficult to classify as cacti. One explanation may be an unusually high level of convergent and parallel evolution (homoplasy). To evaluate support for this phylogenetic hypothesis at the molecular level, we sequenced the genomes of four cacti in the especially problematic tribe Pachycereeae, which contains most of the large columnar cacti of Mexico and adjacent areas, including the iconic saguaro cactus (Carnegiea gigantea) of the Sonoran Desert. We assembled a high-coverage draft genome for saguaro and lower coverage genomes for three other genera of tribe Pachycereeae (Pachycereus, Lophocereus, and Stenocereus) and a more distant outgroup cactus, Pereskia. We used these to construct 4,436 orthologous gene alignments. Species tree inference consistently returned the same phylogeny, but gene tree discordance was high: 37% of gene trees having at least 90% bootstrap support conflicted with the species tree. Evidently, discordance is a product of long generation times and moderately large effective population sizes, leading to extensive incomplete lineage sorting (ILS). In the best supported gene trees, 58% of apparent homoplasy at amino sites in the species tree is due to gene tree-species tree discordance rather than parallel substitutions in the gene trees themselves, a phenomenon termed “hemiplasy.” The high rate of genomic hemiplasy may contribute to apparent parallelisms in phenotypic traits, which could confound understanding of species relationships and character evolution in cacti. PMID:29078296
DLRS: gene tree evolution in light of a species tree.
Sjöstrand, Joel; Sennblad, Bengt; Arvestad, Lars; Lagergren, Jens
2012-11-15
PrIME-DLRS (or colloquially: 'Delirious') is a phylogenetic software tool to simultaneously infer and reconcile a gene tree given a species tree. It accounts for duplication and loss events, a relaxed molecular clock and is intended for the study of homologous gene families, for example in a comparative genomics setting involving multiple species. PrIME-DLRS uses a Bayesian MCMC framework, where the input is a known species tree with divergence times and a multiple sequence alignment, and the output is a posterior distribution over gene trees and model parameters. PrIME-DLRS is available for Java SE 6+ under the New BSD License, and JAR files and source code can be downloaded from http://code.google.com/p/jprime/. There is also a slightly older C++ version available as a binary package for Ubuntu, with download instructions at http://prime.sbc.su.se. The C++ source code is available upon request. Contact: joel.sjostrand@scilifelab.se or jens.lagergren@scilifelab.se. PrIME-DLRS is based on a sound probabilistic model (Åkerborg et al., 2009) and has been thoroughly validated on synthetic and biological datasets (Supplementary Material online).
Single-Copy Genes as Molecular Markers for Phylogenomic Studies in Seed Plants
De La Torre, Amanda R.; Sterck, Lieven; Cánovas, Francisco M.; Avila, Concepción; Merino, Irene; Cabezas, José Antonio; Cervera, María Teresa; Ingvarsson, Pär K.
2017-01-01
Phylogenetic relationships among seed plant taxa, especially within the gymnosperms, remain contested. In contrast to angiosperms, for which several genomic, transcriptomic and phylogenetic resources are available, there are few, if any, molecular markers that allow broad comparisons among gymnosperm species. With few gymnosperm genomes available, recently obtained transcriptomes in gymnosperms are a great addition to identifying single-copy gene families as molecular markers for phylogenomic analysis in seed plants. Taking advantage of an increasing number of available genomes and transcriptomes, we identified single-copy genes in a broad collection of seed plants and used these to infer phylogenetic relationships between major seed plant taxa. This study aims at extending the current phylogenetic toolkit for seed plants, assessing its ability to resolve seed plant phylogeny, and discussing potential factors affecting phylogenetic reconstruction. In total, we identified 3,072 single-copy genes in 31 gymnosperms and 2,156 single-copy genes in 34 angiosperms. All studied seed plants shared 1,469 single-copy genes, which are generally involved in functions like DNA metabolism, cell cycle, and photosynthesis. A selected set of 106 single-copy genes provided good resolution for the seed plant phylogeny except for gnetophytes. Although some of our analyses support a sister relationship between gnetophytes and other gymnosperms, phylogenetic trees from concatenated alignments without 3rd codon positions and amino acid alignments under the CAT + GTR model support gnetophytes as a sister group to Pinaceae. Our phylogenomic analyses demonstrate that, in general, single-copy genes can uncover both recent and deep divergences of seed plant phylogeny. PMID:28460034
Transformative Sustainability Learning: Cultivating a Tree-Planting Ethos in Western Kenya
ERIC Educational Resources Information Center
Bull, Marijoan
2013-01-01
Given the fundamental objective of ESD--perspective change--it is increasingly being aligned with the theoretical foundation of Mezirow's Transformative Learning. In 2008, Sipos et al. built upon this connection by proposing a matrix of learning objectives to assess ESD in formal settings. These objectives, grouped under the title of…
Yamaguchi, M; Miya, M; Okiyama, M; Nishida, M
2000-04-01
Larvae of the deep-sea lanternfish genus Hygophum (Myctophidae) exhibit a remarkable morphological diversity that is quite unexpected, considering their homogeneous adult morphology. In an attempt to elucidate the evolutionary patterns of such larval morphological diversity, nucleotide sequences of a portion of the mitochondrially encoded 16S ribosomal RNA gene were determined for seven Hygophum species and three outgroup taxa. Secondary structure-based alignment resulted in a character matrix consisting of 1172 bp of unambiguously aligned sequences, which were subjected to phylogenetic analyses using maximum-parsimony, maximum-likelihood, and neighbor-joining methods. The resultant tree topologies from the three methods were congruent, with most nodes, including that of the genus Hygophum, being strongly supported by various tree statistics. The most parsimonious reconstruction of the three previously recognized, distinct larval morphs onto the molecular phylogeny revealed that one of the morphs had originated as the common ancestor of the genus, the other two having diversified separately in two subsequent major clades. The patterns of such diversification are discussed in terms of the unusual larval eye morphology and geographic distribution. Copyright 2000 Academic Press.
[Identification of Tibetan medicine "Dida" of Gentianaceae using DNA barcoding].
Liu, Chuan; Zhang, Yu-Xin; Liu, Yue; Chen, Yi-Long; Fan, Gang; Xiang, Li; Xu, Jiang; Zhang, Yi
2016-02-01
The ITS2 barcode was used to identify the Tibetan medicine "Dida" and to secure its quality and safety in medication. A total of 151 experimental samples representing 13 species from the Tibetan Plateau were studied, covering the Gentianaceae genera Swertia, Halenia, Gentianopsis, Comastoma, and Lomatogonium. ITS2 sequences were amplified, and the purified PCR products were sequenced. Sequence assembly and consensus sequence generation were performed using CodonCode Aligner V3.7.1. The Kimura 2-Parameter (K2P) distances were calculated using MEGA 6.0. Neighbor-joining (NJ) phylogenetic trees were constructed. After alignment of all ITS2 sequences, there were 31 haplotypes over 231 bp, and the average G+C content was 61.40%. The NJ tree strongly supported the clustering of each species into its own clade, giving a high identification success rate, except that Swertia bifolia and Swertia wolfangiana could not be distinguished from each other based on sequence divergence. DNA barcoding could be used as a fast and accurate identification method to distinguish the Tibetan medicine "Dida" and ensure its safe use. Copyright© by the Chinese Pharmaceutical Association.
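As an illustration of the K2P distance mentioned in this abstract, the sketch below applies the standard Kimura (1980) formula, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences. This is a minimal stand-alone version, not the MEGA implementation; the function name and the skipping of gaps and ambiguous bases are assumptions made here.

```python
import math

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # assumption: skip gaps and ambiguous bases
        compared += 1
        if x == y:
            continue
        if (x, y) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared      # proportion of transitions
    q = transversions / compared    # proportion of transversions
    # d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q); math.log raises
    # ValueError if the sequences are too divergent for the model.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)

# Made-up 10-bp example with a single C/T transition at the last site.
d = k2p_distance("ACGTACGTAC", "ACGTACGTAT")
```

A matrix of such pairwise distances is the input a neighbor-joining method would take to build the tree.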
taxonomic diversity and pest vulnerability in street tree assemblages
Urban foresters routinely emphasise the importance of taxonomic diversity to reduce the vulnerability of tree assemblages to invasive pests, but it is unclear to what extent diversity reduces vulnerability to polyphagous (i.e. generalist) pests. Drawing on field data from seven communities in metropolitan Cincinnati, Ohio, USA, we tested the hypothesis that communities with higher diversity would exhibit lower vulnerability to the polyphagous Asian longhorned beetle, which currently threatens the region. Based on street tree compositions and the beetle's host preferences, Asian longhorned beetle threatened up to 35.6% of individual street trees and 47.5% of the total basal area across the study area, but we did not see clear connections between taxonomic diversity and beetle vulnerability among study communities. For example, the city of Fairfield was among the least diverse communities but had the lowest proportion of trees vulnerable to Asian longhorned beetle, whereas the city of Wyoming exhibited high diversity and high vulnerability. On the other hand, Forest Park aligned with our original hypothesis, as it was characterised by low diversity and high vulnerability. Our results demonstrate that relatively high taxonomic diversity in street tree assemblages does not necessarily lead to reduced vulnerability to a polyphagous pest. Considering the threats posed by polyphagous pests, selecting a set of relatively pest resistant trees known to perform well in ur
Elliott, Grant P
2012-07-01
Given the widespread and often dramatic influence of climate change on terrestrial ecosystems, it is increasingly common for abrupt threshold changes to occur, yet explicitly testing for climate and ecological regime shifts is lacking in climatically sensitive upper treeline ecotones. In this study, quantitative evidence based on empirical data is provided to support the key role of extrinsic, climate-induced thresholds in governing the spatial and temporal patterns of tree establishment in these high-elevation environments. Dendroecological techniques were used to reconstruct a 420-year history of regeneration dynamics within upper treeline ecotones along a latitudinal gradient (approximately 44-35 degrees N) in the Rocky Mountains. Correlation analysis was used to assess the possible influence of minimum and maximum temperature indices and cool-season (November-April) precipitation on regional age-structure data. Regime-shift analysis was used to detect thresholds in tree establishment during the entire period of record (1580-2000), temperature variables significantly correlated with establishment during the 20th century, and cool-season precipitation. Tree establishment was significantly correlated with minimum temperature during the spring (March-May) and cool season. Regime-shift analysis identified an abrupt increase in regional tree establishment in 1950 (1950-1954 age class). Coincident with this period was a shift toward reduced cool-season precipitation. The alignment of these climate conditions apparently triggered an abrupt increase in establishment that was unprecedented during the period of record. Two main findings emerge from this research that underscore the critical role of climate in governing regeneration dynamics within upper treeline ecotones. 
(1) Regional climate variability is capable of exceeding bioclimatic thresholds, thereby initiating synchronous and abrupt changes in the spatial and temporal patterns of tree establishment at broad regional scales. (2) The importance of climate parameters exceeding critical threshold values and triggering a regime shift in tree establishment appears to be contingent on the alignment of favorable temperature and moisture regimes. This research suggests that threshold changes in the climate system can fundamentally alter regeneration dynamics within upper treeline ecotones and, through the use of regime-shift analysis, reveals important climate-vegetation linkages.
Efficient Exploration of the Space of Reconciled Gene Trees
Szöllősi, Gergely J.; Rosikiewicz, Wojciech; Boussau, Bastien; Tannier, Eric; Daubin, Vincent
2013-01-01
Gene trees record the combination of gene-level events, such as duplication, transfer and loss (DTL), and species-level events, such as speciation and extinction. Gene tree–species tree reconciliation methods model these processes by drawing gene trees into the species tree using a series of gene and species-level events. The reconstruction of gene trees based on sequence alone almost always involves choosing between statistically equivalent or weakly distinguishable relationships that could be much better resolved based on a putative species tree. To exploit this potential for accurate reconstruction of gene trees, the space of reconciled gene trees must be explored according to a joint model of sequence evolution and gene tree–species tree reconciliation. Here we present amalgamated likelihood estimation (ALE), a probabilistic approach to exhaustively explore all reconciled gene trees that can be amalgamated as a combination of clades observed in a sample of gene trees. We implement the ALE approach in the context of a reconciliation model (Szöllősi et al. 2013), which allows for the DTL of genes. We use ALE to efficiently approximate the sum of the joint likelihood over amalgamations and to find the reconciled gene tree that maximizes the joint likelihood among all such trees. We demonstrate using simulations that gene trees reconstructed using the joint likelihood are substantially more accurate than those reconstructed using sequence alone. Using realistic gene tree topologies, branch lengths, and alignment sizes, we demonstrate that ALE produces more accurate gene trees even if the model of sequence evolution is greatly simplified. Finally, examining 1099 gene families from 36 cyanobacterial genomes we find that joint likelihood-based inference results in a striking reduction in apparent phylogenetic discord, with, respectively, 24%, 59%, and 46% reductions in the mean numbers of duplications, transfers, and losses per gene family. 
The open source implementation of ALE is available from https://github.com/ssolo/ALE.git. [amalgamation; gene tree reconciliation; gene tree reconstruction; lateral gene transfer; phylogeny.] PMID:23925510
The Calibration and Use of Capacitance Sensors to Monitor Stem Water Content in Trees.
Matheny, Ashley M; Garrity, Steven R; Bohrer, Gil
2017-12-27
Water transport and storage through the soil-plant-atmosphere continuum is critical to the terrestrial water cycle, and has become a major research focus area. Biomass capacitance plays an integral role in the avoidance of hydraulic impairment to transpiration. However, high temporal resolution measurements of dynamic changes in the hydraulic capacitance of large trees are rare. Here, we present procedures for the calibration and use of capacitance sensors, typically used to monitor soil water content, to measure the volumetric water content in trees in the field. Frequency domain reflectometry-style observations are sensitive to the density of the media being studied. Therefore, it is necessary to perform species-specific calibrations to convert from the sensor-reported values of dielectric permittivity to volumetric water content. Calibration is performed on a harvested branch or stem cut into segments that are dried or re-hydrated to produce a full range of water contents used to generate a best-fit regression with sensor observations. Sensors are inserted into calibration segments or installed in trees after pre-drilling holes to a tolerance fit using a fabricated template to ensure proper drill alignment. Special care is taken to ensure that sensor tines make good contact with the surrounding media, while allowing them to be inserted without excessive force. Volumetric water content dynamics observed via the presented methodology align with sap flow measurements recorded using thermal dissipation techniques and environmental forcing data. Biomass water content data can be used to observe the onset of water stress, drought response and recovery, and has the potential to be applied to the calibration and evaluation of new plant-level hydrodynamics models, as well as to the partitioning of remotely sensed moisture products into above- and belowground components.
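The calibration step described in this abstract, fitting a regression between sensor readings and measured water content, can be sketched as follows. The abstract does not state the form of the regression, so a straight line is assumed here purely for illustration; the data points and the names `fit_line` and `to_vwc` are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical calibration points: sensor-reported permittivity vs.
# measured volumetric water content (cm^3/cm^3) of the dried or
# re-hydrated calibration segments. Values are made up.
permittivity = [4.0, 8.0, 12.0, 16.0, 20.0]
measured_vwc = [0.10, 0.20, 0.30, 0.40, 0.50]

slope, intercept = fit_line(permittivity, measured_vwc)

def to_vwc(reading):
    """Convert a field sensor reading using the fitted calibration."""
    return slope * reading + intercept
```

Because the abstract calls for species-specific calibration, a fit like this would be repeated per species rather than shared across them.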
Genome-wide heterogeneity of nucleotide substitution model fit.
Arbiza, Leonardo; Patricio, Mateus; Dopazo, Hernán; Posada, David
2011-01-01
At a genomic scale, the patterns that have shaped molecular evolution are believed to be largely heterogeneous. Consequently, comparative analyses should use appropriate probabilistic substitution models that capture the main features under which different genomic regions have evolved. While efforts have concentrated in the development and understanding of model selection techniques, no descriptions of overall relative substitution model fit at the genome level have been reported. Here, we provide a characterization of best-fit substitution models across three genomic data sets including coding regions from mammals, vertebrates, and Drosophila (24,000 alignments). According to the Akaike Information Criterion (AIC), 82 of 88 models considered were selected as best-fit models at least in one occasion, although with very different frequencies. Most parameter estimates also varied broadly among genes. Patterns found for vertebrates and Drosophila were quite similar and often more complex than those found in mammals. Phylogenetic trees derived from models in the 95% confidence interval set showed much less variance and were significantly closer to the tree estimated under the best-fit model than trees derived from models outside this interval. Although alternative criteria selected simpler models than the AIC, they suggested similar patterns. All together our results show that at a genomic scale, different gene alignments for the same set of taxa are best explained by a large variety of different substitution models and that model choice has implications on different parameter estimates including the inferred phylogenetic trees. After taking into account the differences related to sample size, our results suggest a noticeable diversity in the underlying evolutionary process. All together, we conclude that the use of model selection techniques is important to obtain consistent phylogenetic estimates from real data at a genomic scale.
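The AIC-based selection referred to in this abstract can be illustrated with a short sketch using the standard definition AIC = 2k - 2 ln L, where k is the number of free parameters and L the maximized likelihood. The log-likelihood values below are invented for illustration, and the parameter counts cover substitution-model parameters only (an assumption; branch lengths are often counted as well).

```python
def aic(log_likelihood, num_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln L."""
    return 2 * num_params - 2 * log_likelihood

def best_model(fits):
    """Return the model name with the lowest AIC.

    `fits` maps model name -> (maximized log-likelihood, free parameters).
    """
    return min(fits, key=lambda name: aic(*fits[name]))

# Hypothetical fits for one alignment (made-up log-likelihoods).
fits = {
    "JC": (-5230.0, 0),
    "HKY": (-5105.0, 4),
    "GTR+G": (-5101.5, 9),
}
choice = best_model(fits)
```

In this toy case the richest model has the highest likelihood but is not selected, because its small likelihood gain does not offset the penalty for five extra parameters.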
High-throughput sequence alignment using Graphics Processing Units
Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh
2007-01-01
Background: The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results: This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion: MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU. PMID:18070356
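The core task in this abstract, matching many queries exactly against one indexed reference, can be sketched on a CPU in a few lines. Note the substitutions: this sketch uses a suffix array rather than the suffix tree MUMmerGPU uses, processes queries serially rather than on a GPU, and builds the index with a simple sort that would not scale to genomes. All names and sequences are hypothetical.

```python
def build_suffix_array(text):
    """Sorted start positions of all suffixes (simple, not linear-time)."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_all(text, suffix_array, query):
    """Return sorted start positions where query occurs exactly in text."""
    m = len(query)
    lo, hi = 0, len(suffix_array)
    # Binary search for the first suffix whose m-length prefix is >= query.
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < query:
            lo = mid + 1
        else:
            hi = mid
    # All matching suffixes are contiguous in the suffix array.
    hits = []
    while lo < len(suffix_array) and text[suffix_array[lo]:suffix_array[lo] + m] == query:
        hits.append(suffix_array[lo])
        lo += 1
    return sorted(hits)

reference = "GATTACAGATTACA"
sa = build_suffix_array(reference)
queries = ["GATT", "ACA", "CCC"]
results = {q: find_all(reference, sa, q) for q in queries}
```

Each query lookup is independent of the others once the index is built, which is the property that makes this kind of workload amenable to the parallel processing the abstract describes.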
Widespread Amazon forest tree mortality from a single cross-basin squall line event
NASA Astrophysics Data System (ADS)
Negrón-Juárez, Robinson I.; Chambers, Jeffrey Q.; Guimaraes, Giuliano; Zeng, Hongcheng; Raupp, Carlos F. M.; Marra, Daniel M.; Ribeiro, Gabriel H. P. M.; Saatchi, Sassan S.; Nelson, Bruce W.; Higuchi, Niro
2010-08-01
Climate change is expected to increase the intensity of extreme precipitation events in Amazonia that in turn might produce more forest blowdowns associated with convective storms. Yet quantitative tree mortality associated with convective storms has never been reported across Amazonia, representing an important additional source of carbon to the atmosphere. Here we demonstrate that a single squall line (aligned cluster of convective storm cells) propagating across Amazonia in January, 2005, caused widespread forest tree mortality and may have contributed to the elevated mortality observed that year. Forest plot data demonstrated that the same year represented the second highest mortality rate over a 15-year annual monitoring interval. Over the Manaus region, disturbed forest patches generated by the squall followed a power-law distribution (scaling exponent α = 1.48) and produced a mortality of 0.3-0.5 million trees, equivalent to 30% of the observed annual deforestation reported in 2005 over the same area. Basin-wide, potential tree mortality from this one event was estimated at 542 ± 121 million trees, equivalent to 23% of the mean annual biomass accumulation estimated for these forests. Our results highlight the vulnerability of Amazon trees to wind-driven mortality associated with convective storms. Storm intensity is expected to increase with a warming climate, which would result in additional tree mortality and carbon release to the atmosphere, with the potential to further warm the climate system.
Ma, Yazhen; Xu, Ting; Wan, Dongshi; Ma, Tao; Shi, Sheng; Liu, Jianquan; Hu, Quanjun
2015-03-17
Soil salinity is a significant factor that impairs plant growth and agricultural productivity, and numerous efforts are underway to enhance salt tolerance of economically important plants. Populus species are widely cultivated for diverse uses. In particular, they grow in habitats ranging from salty soils to mesophytic environments and are therefore used as a model genus for elucidating physiological and molecular mechanisms of stress tolerance in woody plants. The Salinity Tolerant Poplar Database (STPD) is an integrative database for salt-tolerant poplar genome biology. Currently, the STPD contains the Populus euphratica genome and its related genetic resources. P. euphratica, with its preference for salty habitats, has become a valuable genetic resource for the exploitation of tolerance characteristics in trees. This database contains curated data including genomic sequence, genes and gene functional information, non-coding RNA sequences, transposable elements, simple sequence repeats and single nucleotide polymorphism information of P. euphratica, gene expression data between P. euphratica and Populus tomentosa, and whole-genome alignments between Populus trichocarpa, P. euphratica and Salix suchowensis. The STPD provides useful searching and data mining tools, including the GBrowse genome browser, BLAST servers and a genome alignment viewer, which can be used to browse genome regions, identify similar sequences and visualize genome alignments. Datasets within the STPD can also be downloaded to perform local searches. A new Salinity Tolerant Poplar Database has been developed to assist studies of salt tolerance in trees and poplar genomics. The database will be continuously updated to incorporate new genome-wide data of related poplar species. This database will serve as infrastructure for research on the molecular function of genes, comparative genomics, and evolution in closely related species, as well as promote advances in molecular breeding within Populus. 
The STPD can be accessed at http://me.lzu.edu.cn/stpd/ .
STBase: one million species trees for comparative biology.
McMahon, Michelle M; Deepak, Akshay; Fernández-Baca, David; Boss, Darren; Sanderson, Michael J
2015-01-01
Comprehensively sampled phylogenetic trees provide the most compelling foundations for strong inferences in comparative evolutionary biology. Mismatches are common, however, between the taxa for which comparative data are available and the taxa sampled by published phylogenetic analyses. Moreover, many published phylogenies are gene trees, which cannot always be adapted immediately for species level comparisons because of discordance, gene duplication, and other confounding biological processes. A new database, STBase, lets comparative biologists quickly retrieve species level phylogenetic hypotheses in response to a query list of species names. The database consists of 1 million single- and multi-locus data sets, each with a confidence set of 1000 putative species trees, computed from GenBank sequence data for 413,000 eukaryotic taxa. Two bodies of theoretical work are leveraged to aid in the assembly of multi-locus concatenated data sets for species tree construction. First, multiply labeled gene trees are pruned to conflict-free singly-labeled species-level trees that can be combined between loci. Second, impacts of missing data in multi-locus data sets are ameliorated by assembling only decisive data sets. Data sets overlapping with the user's query are ranked using a scheme that depends on user-provided weights for tree quality and for taxonomic overlap of the tree with the query. Retrieval times are independent of the size of the database, typically a few seconds. Tree quality is assessed by a real-time evaluation of bootstrap support on just the overlapping subtree. Associated sequence alignments, tree files and metadata can be downloaded for subsequent analysis. STBase provides a tool for comparative biologists interested in exploiting the most relevant sequence data available for the taxa of interest. 
It may also serve as a prototype for future species tree oriented databases and as a resource for assembly of larger species phylogenies from precomputed trees.
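The ranking step described in this abstract can be illustrated with a minimal sketch. It assumes, for illustration only, that the score is a weighted linear combination of a tree-quality value and the fraction of query taxa covered; the abstract states only that the scheme depends on user-provided weights, so the `DataSet` class, the `rank` function and the scoring formula are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    """Hypothetical stand-in for one precomputed data set."""
    name: str
    taxa: frozenset
    quality: float  # assumed tree-quality value in [0, 1], e.g. mean bootstrap support


def rank(datasets, query, w_quality=0.5, w_overlap=0.5):
    """Rank data sets that overlap the query by a weighted score (assumed form)."""
    query = frozenset(query)

    def score(ds):
        # Fraction of the query taxa that this data set covers.
        overlap = len(ds.taxa & query) / len(query)
        return w_quality * ds.quality + w_overlap * overlap

    # Only data sets sharing at least one taxon with the query are candidates.
    hits = [ds for ds in datasets if ds.taxa & query]
    return sorted(hits, key=score, reverse=True)
```

Shifting all weight to `w_quality` favors well-supported trees even when they cover few query taxa, while shifting it to `w_overlap` favors coverage.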
McTavish, Emily Jane; Steel, Mike; Holder, Mark T
2015-12-01
Statistically consistent estimation of phylogenetic trees or gene trees is possible if pairwise sequence dissimilarities can be converted to a set of distances that are proportional to the true evolutionary distances. Susko et al. (2004) reported some strikingly broad results about the forms of inconsistency in tree estimation that can arise if corrected distances are not proportional to the true distances. They showed that if the corrected distance is a concave function of the true distance, then inconsistency due to long branch attraction will occur. If these functions are convex, then two "long branch repulsion" trees will be preferred over the true tree, though these two incorrect trees are expected to be tied as the preferred tree. Here we extend their results, and demonstrate the existence of a tree shape (which we refer to as a "twisted Farris-zone" tree) for which a single incorrect tree topology will be guaranteed to be preferred if the corrected distance function is convex. We also report that the standard practice of treating gaps in sequence alignments as missing data is sufficient to produce non-linear corrected distance functions if the substitution process is not independent of the insertion/deletion process. Taken together, these results imply inconsistent tree inference under mild conditions. For example, if some positions in a sequence are constrained to be free of substitutions and insertion/deletion events while the remaining sites evolve with independent substitutions and insertion/deletion events, then the distances obtained by treating gaps as missing data can support an incorrect tree topology even given an unlimited amount of data. Copyright © 2015 Elsevier Inc. All rights reserved.
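The long branch attraction effect of a concave distance function can be seen in a small numerical sketch. It is an illustration, not the authors' analysis: the four-taxon tree, the branch lengths and the saturating function 1 - exp(-t) are assumptions chosen for the example, and the topology is chosen with the standard four-point rule (smallest sum of within-pair distances).

```python
import math
from itertools import combinations


def four_taxon_distances(long, short, internal):
    """True path lengths on the tree ((A,B),(C,D)), with A and C on long branches."""
    branch = {"A": long, "B": short, "C": long, "D": short}
    same_side = {frozenset("AB"), frozenset("CD")}
    d = {}
    for x, y in combinations("ABCD", 2):
        pair = frozenset((x, y))
        # Pairs on opposite sides of the tree also cross the internal branch.
        d[pair] = branch[x] + branch[y] + (0.0 if pair in same_side else internal)
    return d


def preferred_split(d):
    """Four-point rule: the pairing with the smallest sum of within-pair distances."""
    sums = {
        "AB|CD": d[frozenset("AB")] + d[frozenset("CD")],
        "AC|BD": d[frozenset("AC")] + d[frozenset("BD")],
        "AD|BC": d[frozenset("AD")] + d[frozenset("BC")],
    }
    return min(sums, key=sums.get)


def saturating(t):
    """An assumed concave distortion of the true distance."""
    return 1.0 - math.exp(-t)


# Illustrative branch lengths (made up for this sketch).
true_d = four_taxon_distances(long=2.0, short=0.05, internal=0.05)
distorted_d = {pair: saturating(t) for pair, t in true_d.items()}
```

With the true distances the rule recovers the generating split, whereas the concave distortion compresses the long paths enough that the two long branches are grouped together.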
Negrisolo, Enrico; Kuhl, Heiner; Forcato, Claudio; Vitulo, Nicola; Reinhardt, Richard; Patarnello, Tomaso; Bargelloni, Luca
2010-12-01
Comparative genomics holds the promise to magnify the information obtained from individual genome sequencing projects, revealing common features conserved across genomes and identifying lineage-specific characteristics. To implement such a comparative approach, a robust phylogenetic framework is required to accurately reconstruct evolution at the genome level. Among vertebrate taxa, teleosts represent the second best characterized group, with high-quality draft genome sequences for five model species (Danio rerio, Gasterosteus aculeatus, Oryzias latipes, Takifugu rubripes, and Tetraodon nigroviridis), and several others are in the finishing lane. However, the relationships among the acanthomorph teleost model fishes remain an unresolved taxonomic issue. Here, a genomic region spanning over 1.2 million base pairs was sequenced in the teleost fish Dicentrarchus labrax. Together with genomic data available for the above fish models, the new sequence was used to identify unique orthologous genomic regions shared across all target taxa. Different strategies were applied to produce robust multiple gene and genomic alignments spanning from 11,802 to 186,474 amino acid/nucleotide positions. Ten data sets were analyzed according to Bayesian inference, maximum likelihood, maximum parsimony, and neighbor joining methods. Extensive analyses were performed to explore the influence of several factors (e.g., alignment methodology, substitution model, data set partitions, and long-branch attraction) on the tree topology. Although a general consensus was observed for a closer relationship between G. aculeatus (Gasterosteidae) and Di. labrax (Moronidae) with the atherinomorph O. latipes (Beloniformes) sister taxon of this clade, with the tetraodontiform group Ta. rubripes and Te. 
nigroviridis (Tetraodontiformes) representing a more distantly related taxon among acanthomorph model fish species, conflicting results were obtained between data sets and methods, especially with respect to the choice of alignment methodology applied to noncoding parts of the genomic region under study. This may limit the use of intergenic/noncoding sequences in phylogenomics until more robust alignment algorithms are developed.
Sela, Itamar; Ashkenazy, Haim; Katoh, Kazutaka; Pupko, Tal
2015-07-01
Inference of multiple sequence alignments (MSAs) is a critical part of phylogenetic and comparative genomics studies. However, from the same set of sequences different MSAs are often inferred, depending on the methodologies used and the assumed parameters. Much effort has recently been devoted to improving the ability to identify unreliable alignment regions. Detecting such unreliable regions was previously shown to be important for downstream analyses relying on MSAs, such as the detection of positive selection. Here we developed GUIDANCE2, a new integrative methodology that accounts for: (i) uncertainty in the process of indel formation, (ii) uncertainty in the assumed guide tree and (iii) co-optimal solutions in the pairwise alignments, used as building blocks in progressive alignment algorithms. We compared GUIDANCE2 with seven methodologies to detect unreliable MSA regions using extensive simulations and empirical benchmarks. We show that GUIDANCE2 outperforms all previously developed methodologies. Furthermore, GUIDANCE2 also provides a set of alternative MSAs which can be useful for downstream analyses. The novel algorithm is implemented as a web-server, available at: http://guidance.tau.ac.il. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
David W. MacFarlane
2015-01-01
Accurately assessing forest biomass potential is contingent upon having accurate tree biomass models to translate data from forest inventories. Building generality into these models is especially important when they are to be applied over large spatial domains, such as regional, national and international scales. Here, new, generalized whole-tree mass / volume...
NASA Astrophysics Data System (ADS)
Zarco-Tejada, P. J.; Hornero, A.; Hernández-Clemente, R.; Beck, P. S. A.
2018-03-01
The operational monitoring of forest decline requires the development of remote sensing methods that are sensitive to the spatiotemporal variations of pigment degradation and canopy defoliation. In this context, the red-edge spectral region (RESR) was proposed in the past due to its combined sensitivity to chlorophyll content and leaf area variation. In this study, the temporal dimension of the RESR was evaluated as a function of forest decline using a radiative transfer method with the PROSPECT and 3D FLIGHT models. These models were used to generate synthetic pine stands simulating decline and recovery processes over time and explore the temporal rate of change of the red-edge chlorophyll index (CI) as compared to the trajectories obtained for the structure-related Normalized Difference Vegetation Index (NDVI). The temporal trend method proposed here consisted of using synthetic spectra to calculate the theoretical boundaries of the subspace for healthy and declining pine trees in the temporal domain, defined by $\mathrm{CI}_{time=n}/\mathrm{CI}_{time=n+1}$ vs. $\mathrm{NDVI}_{time=n}/\mathrm{NDVI}_{time=n+1}$. Within these boundaries, trees undergoing decline and recovery processes showed different trajectories through this subspace. The method was then validated using three high-resolution airborne hyperspectral images acquired at 40 cm resolution and 260 spectral bands of 6.5 nm full-width half-maximum (FWHM) over a forest with widespread tree decline, along with field-based monitoring of chlorosis and defoliation (i.e., 'decline' status) in 663 trees between the years 2015 and 2016. The temporal rate of change of chlorophyll vs. structural indices, based on reflectance spectra extracted from the hyperspectral images, was different for trees undergoing decline, and aligned towards the decline baseline established using the radiative transfer models. By contrast, healthy trees over time aligned towards the theoretically obtained healthy baseline.
The applicability of this temporal trend method to the red-edge bands of the MultiSpectral Imager (MSI) instrument on board Sentinel-2a for operational forest status monitoring was also explored by comparing the temporal rate of change of the Sentinel-2-derived CI over areas with declining and healthy trees. Results demonstrated that the Sentinel-2a red-edge region was sensitive to the temporal dimension of forest condition, as the relationships obtained for pixels in healthy condition deviated from those of pixels undergoing decline.
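The temporal subspace described above pairs the ratio of CI at two dates with the ratio of NDVI at the same two dates. A minimal sketch of that computation follows. NDVI uses its standard definition; the red-edge CI is assumed here to take the common form NIR / red-edge - 1, which may differ from the band ratio used in the study, and the function names and dictionary keys are hypothetical.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index (standard definition)."""
    return (nir - red) / (nir + red)


def ci_red_edge(nir, red_edge):
    """Red-edge chlorophyll index, assumed form: NIR / red-edge - 1."""
    return nir / red_edge - 1.0


def temporal_ratios(obs_n, obs_n1):
    """Return (CI_n / CI_n+1, NDVI_n / NDVI_n+1) for two reflectance observations.

    Each observation is a dict with 'nir', 'red_edge' and 'red' reflectances.
    """
    ci_ratio = (ci_red_edge(obs_n["nir"], obs_n["red_edge"])
                / ci_red_edge(obs_n1["nir"], obs_n1["red_edge"]))
    ndvi_ratio = (ndvi(obs_n["nir"], obs_n["red"])
                  / ndvi(obs_n1["nir"], obs_n1["red"]))
    return ci_ratio, ndvi_ratio
```

Each tree or pixel then contributes one point in the (CI ratio, NDVI ratio) plane, which is the space in which the healthy and decline baselines are compared.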
NASA Astrophysics Data System (ADS)
Andriani, Tri; Irawan, Mohammad Isa
2017-08-01
Ebola Virus Disease (EVD) is a disease caused by a virus of the genus Ebolavirus (EBOV), family Filoviridae. Ebola virus is classified into five types, namely Zaire ebolavirus (ZEBOV), Sudan ebolavirus (SEBOV), Bundibugyo ebolavirus (BEBOV), Tai Forest ebolavirus, also known as Cote d'Ivoire ebolavirus (CIEBOV), and Reston ebolavirus (REBOV). Kinship among the types of Ebola virus can be identified using phylogenetic trees. In this study, the phylogenetic tree was constructed with the UPGMA method, using a multiple alignment produced by the progressive method. The resulting tree shows that Tai Forest ebolavirus is closely related to Bundibugyo ebolavirus, even though the areas where the two epidemics spread are far apart. The genetic distance between Bundibugyo ebolavirus and Tai Forest ebolavirus is 0.3725. The similarity between Tai Forest ebolavirus and Bundibugyo ebolavirus is therefore not influenced by the proximity of the areas where the epidemics spread.
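For readers unfamiliar with UPGMA, the clustering step can be sketched as follows. This is a generic illustration on a pairwise distance matrix, not the study's pipeline: the function name, the input layout and the Newick-style string labels are assumptions, and it returns only the topology (no branch lengths).

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA and return the topology as a nested string.

    names: list of taxon labels.
    dist:  dict mapping (label, label) tuples to pairwise distances.
    """
    size = {name: 1 for name in names}  # cluster label -> number of leaves
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(size) > 1:
        # Merge the two closest clusters (ties go to the first pair found).
        a, b = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        na, nb = size.pop(a), size.pop(b)
        for c in size:
            # Distance to the new cluster is the size-weighted average.
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        size[merged] = na + nb
    return next(iter(size))
```

The size-weighted average is what makes every original leaf contribute equally to later distances, which is the defining feature of UPGMA as opposed to single or complete linkage.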
Alignment as a Teacher Variable
ERIC Educational Resources Information Center
Porter, Andrew C.; Smithson, John; Blank, Rolf; Zeidner, Timothy
2007-01-01
With the exception of the procedures developed by Porter and colleagues (Porter, 2002), other methods of defining and measuring alignment are essentially limited to alignment between tests and standards. Porter's procedures have been generalized to investigating the alignment between content standards, tests, textbooks, and even classroom…
Lessel, Uta; Wellenzohn, Bernd; Fischer, J Robert; Rarey, Matthias
2012-02-27
A case study is presented illustrating the design of a focused CDK2 library. The scaffold of the library was detected by a feature trees search in a fragment space based on reactions from combinatorial chemistry. For the design the software LoFT (Library optimizer using Feature Trees) was used. The special feature called FTMatch was applied to restrict the parts of the queries where the reagents are permitted to match. This way a 3D scoring function could be simulated. Results were compared with alternative designs by GOLD docking and ROCS 3D alignments.
An information-based network approach for protein classification
Wan, Xiaogeng; Zhao, Xin; Yau, Stephen S. T.
2017-01-01
Protein classification is one of the critical problems in bioinformatics. Early studies used geometric distances and phylogenetic trees to classify proteins. These methods use binary trees to represent protein classification. In this paper, we propose a new protein classification method, whereby theories of information and networks are used to classify the multivariate relationships of proteins. In this study, the protein universe is modeled as an undirected network, where proteins are classified according to their connections. Our method is unsupervised, multivariate, and alignment-free. It can be applied to the classification of both protein sequences and structures. Nine examples are used to demonstrate the efficiency of our new method. PMID:28350835
ERIC Educational Resources Information Center
Smith, David Arthur
2010-01-01
Much recent work in natural language processing treats linguistic analysis as an inference problem over graphs. This development opens up useful connections between machine learning, graph theory, and linguistics. The first part of this dissertation formulates syntactic dependency parsing as a dynamic Markov random field with the novel…
Using Linguistic Knowledge in Statistical Machine Translation
2010-09-01
... on newswire test data. 3.4 Arabic to English MT results for Arabic morphological segmentation, measured on ... web test data. 3.5 Recombination Results. Percentage of sentences with mis-combined words ... scores for syntactic reordering of the Spoken Language Domain. 5.1 Normalized likelihood of the test set alignments without decision trees, and then ...
SGML and Related Standards: New Directions as the Second Decade Begins.
ERIC Educational Resources Information Center
Mason, James David
1997-01-01
Highlights the activities of WG8 (Working Group 8 of ISO, the International Organization for Standardization) in the alignment of standards for a common tree model and common query languages. Examines how Document Style Semantics and Specification Language (DSSSL) and HyTime make documents easier to work with and more powerful in their ability to…
Tree Rings: Timekeepers of the Past.
ERIC Educational Resources Information Center
Phipps, R. L.; McGowan, J.
One of a series of general interest publications on science issues, this booklet describes the uses of tree rings in historical and biological recordkeeping. Separate sections cover the following topics: dating of tree rings, dating with tree rings, tree ring formation, tree ring identification, sample collections, tree ring cross dating, tree…
Development and application of an algorithm to compute weighted multiple glycan alignments.
Hosoda, Masae; Akune, Yukie; Aoki-Kinoshita, Kiyoko F
2017-05-01
A glycan consists of monosaccharides linked by glycosidic bonds, has branches and forms complex molecular structures. Databases have been developed to store large amounts of glycan-binding experiments, including glycan arrays with glycan-binding proteins. However, there are few bioinformatics techniques to analyze large amounts of data for glycans because there are few tools that can handle the complexity of glycan structures. Thus, we have developed the MCAW (Multiple Carbohydrate Alignment with Weights) tool that can align multiple glycan structures, to aid in the understanding of their function as binding recognition molecules. We have described in detail the first algorithm to perform multiple glycan alignments by modeling glycans as trees. To test our tool, we prepared several data sets, and as a result, we found that the glycan motif could be successfully aligned without any prior knowledge applied to the tool, and the known recognition binding sites of glycans could be aligned at a high rate amongst all our datasets tested. We thus claim that our tool is able to find meaningful glycan recognition and binding patterns using data obtained by glycan-binding experiments. The development and availability of an effective multiple glycan alignment tool opens possibilities for many other glycoinformatics analyses, making this work a big step towards furthering glycomics analysis. Availability: http://www.rings.t.soka.ac.jp. Contact: kkiyoko@soka.ac.jp. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
Irigaray, F. Javier Sáenz-De-Cabezón; Moreno-Grijalba, Fernando; Marco, Vicente; Pérez-Moreno, Ignacio
2010-01-01
Azadirachtin, derived from the neem tree, Azadirachta indica A. Juss (Sapindales: Meliaceae), seems promising for use in integrated pest management programs to control a variety of pest species. A commercial formulation of azadirachtin, Align®, has been evaluated against different developmental stages of the European grape berry moth, Lobesia botrana Denis and Schiffermüller (Lepidoptera: Tortricidae). When administered orally, Align reduced the fecundity and fertility of adults treated with 1, 5, and 10 mg litre-1. At the highest doses, fecundity and fertility were zero, but longevity was not affected. An LC50 of 231.5 mg litre-1 was obtained when Align was sprayed on eggs less than 1 day old. Hatching of all egg classes was significantly reduced, and this reduction was more pronounced for eggs less than 24 h old. LC50 values of 2.1 mg litre-1 for first instars and 18.7 mg litre-1 for third instars were obtained when Align was present in the diet. Larvae reared on a diet containing different concentrations of Align did not molt into adults at the highest concentrations (0.3, 0.6, 1.2), and 50% molted at the lowest concentration (0.15). Phenotypic effects included inability to molt properly and deformities. The combination of acute toxicity and low, effective concentrations of Align observed in this study could lead to the inclusion of insecticides containing azadirachtin in integrated management programs against this pest. PMID:20578954
Irigaray, F Javier Sáenz-De-Cabezón; Moreno-Grijalba, Fernando; Marco, Vicente; Pérez-Moreno, Ignacio
2010-01-01
Azadirachtin, derived from the neem tree, Azadirachta indica A. Juss (Sapindales: Meliaceae), seems promising for use in integrated pest management programs to control a variety of pest species. A commercial formulation of azadirachtin, Align, has been evaluated against different developmental stages of the European grape berry moth, Lobesia botrana Denis and Schiffermüller (Lepidoptera: Tortricidae). When administered orally, Align reduced the fecundity and fertility of adults treated with 1, 5, and 10 mg litre(-1). At the highest doses, fecundity and fertility were zero, but longevity was not affected. An LC(50) of 231.5 mg litre(-1) was obtained when Align was sprayed on eggs less than 1 day old. Hatching of all egg classes was significantly reduced, and this reduction was more pronounced for eggs less than 24 h old. LC(50) values of 2.1 mg litre(-1) for first instars and 18.7 mg litre(-1) for third instars were obtained when Align was present in the diet. Larvae reared on a diet containing different concentrations of Align did not molt into adults at the highest concentrations (0.3, 0.6, 1.2), and 50% molted at the lowest concentration (0.15). Phenotypic effects included inability to molt properly and deformities. The combination of acute toxicity and low, effective concentrations of Align observed in this study could lead to the inclusion of insecticides containing azadirachtin in integrated management programs against this pest.
Asian longhorned beetle complicates the relationship ...
Urban foresters routinely emphasise the importance of taxonomic diversity to reduce the vulnerability of tree assemblages to invasive pests, but it is unclear to what extent diversity reduces vulnerability to polyphagous (i.e. generalist) pests. Drawing on field data from seven communities in metropolitan Cincinnati, Ohio, USA, we tested the hypothesis that communities with higher diversity would exhibit lower vulnerability to the polyphagous Asian longhorned beetle, which currently threatens the region. Based on street tree compositions and the beetle's host preferences, Asian longhorned beetle threatened up to 35.6% of individual street trees and 47.5% of the total basal area across the study area, but we did not see clear connections between taxonomic diversity and beetle vulnerability among study communities. For example, the city of Fairfield was among the least diverse communities but had the lowest proportion of trees vulnerable to Asian longhorned beetle, whereas the city of Wyoming exhibited high diversity and high vulnerability. On the other hand, Forest Park aligned with our original hypothesis, as it was characterised by low diversity and high vulnerability. Our results demonstrate that relatively high taxonomic diversity in street tree assemblages does not necessarily lead to reduced vulnerability to a polyphagous pest. Considering the threats posed by polyphagous pests, selecting a set of relatively pest resistant trees known to perform well in urb
Fast and accurate phylogeny reconstruction using filtered spaced-word matches
Sohrabi-Jahromi, Salma; Morgenstern, Burkhard
2017-01-01
Abstract Motivation: Word-based or ‘alignment-free’ algorithms are increasingly used for phylogeny reconstruction and genome comparison, since they are much faster than traditional approaches that are based on full sequence alignments. Existing alignment-free programs, however, are less accurate than alignment-based methods. Results: We propose Filtered Spaced Word Matches (FSWM), a fast alignment-free approach to estimate phylogenetic distances between large genomic sequences. For a pre-defined binary pattern of match and don’t-care positions, FSWM rapidly identifies spaced word-matches between input sequences, i.e. gap-free local alignments with matching nucleotides at the match positions and with mismatches allowed at the don’t-care positions. We then estimate the number of nucleotide substitutions per site by considering the nucleotides aligned at the don’t-care positions of the identified spaced-word matches. To reduce the noise from spurious random matches, we use a filtering procedure where we discard all spaced-word matches for which the overall similarity between the aligned segments is below a threshold. We show that our approach can accurately estimate substitution frequencies even for distantly related sequences that cannot be analyzed with existing alignment-free methods; phylogenetic trees constructed with FSWM distances are of high quality. A program run on a pair of eukaryotic genomes of a few hundred Mb each takes a few minutes. Availability and Implementation: The program source code for FSWM including a documentation, as well as the software that we used to generate artificial genome sequences are freely available at http://fswm.gobics.de/ Contact: chris.leimeister@stud.uni-goettingen.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28073754
Fast and accurate phylogeny reconstruction using filtered spaced-word matches.
Leimeister, Chris-André; Sohrabi-Jahromi, Salma; Morgenstern, Burkhard
2017-04-01
Word-based or 'alignment-free' algorithms are increasingly used for phylogeny reconstruction and genome comparison, since they are much faster than traditional approaches that are based on full sequence alignments. Existing alignment-free programs, however, are less accurate than alignment-based methods. We propose Filtered Spaced Word Matches (FSWM), a fast alignment-free approach to estimate phylogenetic distances between large genomic sequences. For a pre-defined binary pattern of match and don't-care positions, FSWM rapidly identifies spaced word-matches between input sequences, i.e. gap-free local alignments with matching nucleotides at the match positions and with mismatches allowed at the don't-care positions. We then estimate the number of nucleotide substitutions per site by considering the nucleotides aligned at the don't-care positions of the identified spaced-word matches. To reduce the noise from spurious random matches, we use a filtering procedure where we discard all spaced-word matches for which the overall similarity between the aligned segments is below a threshold. We show that our approach can accurately estimate substitution frequencies even for distantly related sequences that cannot be analyzed with existing alignment-free methods; phylogenetic trees constructed with FSWM distances are of high quality. A program run on a pair of eukaryotic genomes of a few hundred Mb each takes a few minutes. The program source code for FSWM, including documentation, as well as the software that we used to generate artificial genome sequences, are freely available at http://fswm.gobics.de/. Contact: chris.leimeister@stud.uni-goettingen.de. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
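The idea of spaced-word matches with filtering can be sketched in a few lines. This is a toy illustration and not the FSWM program: it compares all windows that share a spaced word, filters on plain identity over the window rather than a substitution-matrix score, and converts the mismatch fraction at don't-care positions to a distance with the Jukes-Cantor formula, which is an assumption made here. The function names, the default pattern and the threshold are hypothetical.

```python
import math
from collections import defaultdict


def spaced_words(seq, pattern):
    """Index window start positions by the characters at the match ('1') positions."""
    match_pos = [i for i, c in enumerate(pattern) if c == "1"]
    index = defaultdict(list)
    for start in range(len(seq) - len(pattern) + 1):
        key = "".join(seq[start + i] for i in match_pos)
        index[key].append(start)
    return index


def fswm_distance(seq_a, seq_b, pattern="1101", min_identity=0.6):
    """Estimate substitutions per site from don't-care positions of spaced-word matches."""
    dont_care = [i for i, c in enumerate(pattern) if c == "0"]
    index_b = spaced_words(seq_b, pattern)
    mismatches = total = 0
    for key, starts_a in spaced_words(seq_a, pattern).items():
        for i in starts_a:
            for j in index_b.get(key, []):
                diffs = sum(seq_a[i + k] != seq_b[j + k] for k in dont_care)
                # Filter: discard matches whose overall identity is too low.
                if 1 - diffs / len(pattern) < min_identity:
                    continue
                mismatches += diffs
                total += len(dont_care)
    if total == 0:
        return None  # no spaced-word match survived the filter
    p = mismatches / total
    if p >= 0.75:
        return math.inf  # Jukes-Cantor correction is undefined at saturation
    return -0.75 * math.log(1 - 4 * p / 3)
```

Real patterns are much longer than the four-character default used here, so that the filter has many don't-care positions to work with and random matches can be separated from homologous ones.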
2015-01-01
Abstract Trees contribute to enormous plant oil reserves because many trees contain 50%–80% of oil (triacylglycerols, TAGs) in the fruits and kernels. TAGs accumulate in subcellular structures called oil bodies/droplets, in which TAGs are covered by low-molecular-mass hydrophobic proteins called oleosins (OLEs). The OLEs/TAGs ratio determines the size and shape of intracellular oil bodies. There is a lack of comprehensive sequence analysis and structural information of OLEs among diverse trees. The objectives of this study were to identify OLEs from 22 tree species (e.g., tung tree, tea-oil tree, castor bean), perform genome-wide analysis of OLEs, classify OLEs, identify conserved sequence motifs and amino acid residues, and predict secondary and three-dimensional structures in tree OLEs and OLE subfamilies. Data mining identified 65 OLEs with perfect conservation of the “proline knot” motif (PX5SPX3P) from 19 trees. These OLEs contained >40% hydrophobic amino acid residues. They displayed similar properties and amino acid composition. Genome-wide phylogenetic analysis and multiple sequence alignment demonstrated that these proteins could be classified into five OLE subfamilies. There were distinct patterns of sequence conservation among the OLE subfamilies and within individual tree species. Computational modeling indicated that OLEs were composed of at least three α-helixes connected with short coils without any β-strand and that they exhibited distinct 3D structures and ligand binding sites. These analyses provide fundamental information in the similarity and specificity of diverse OLE isoforms within the same subfamily and among the different species, which should facilitate studying the structure-function relationship and identify critical amino acid residues in OLEs for metabolic engineering of tree TAGs. PMID:26258573
Cao, Heping
2015-09-01
Trees contribute to enormous plant oil reserves because many trees contain 50%-80% of oil (triacylglycerols, TAGs) in the fruits and kernels. TAGs accumulate in subcellular structures called oil bodies/droplets, in which TAGs are covered by low-molecular-mass hydrophobic proteins called oleosins (OLEs). The OLEs/TAGs ratio determines the size and shape of intracellular oil bodies. There is a lack of comprehensive sequence analysis and structural information of OLEs among diverse trees. The objectives of this study were to identify OLEs from 22 tree species (e.g., tung tree, tea-oil tree, castor bean), perform genome-wide analysis of OLEs, classify OLEs, identify conserved sequence motifs and amino acid residues, and predict secondary and three-dimensional structures in tree OLEs and OLE subfamilies. Data mining identified 65 OLEs with perfect conservation of the "proline knot" motif (PX5SPX3P) from 19 trees. These OLEs contained >40% hydrophobic amino acid residues. They displayed similar properties and amino acid composition. Genome-wide phylogenetic analysis and multiple sequence alignment demonstrated that these proteins could be classified into five OLE subfamilies. There were distinct patterns of sequence conservation among the OLE subfamilies and within individual tree species. Computational modeling indicated that OLEs were composed of at least three α-helixes connected with short coils without any β-strand and that they exhibited distinct 3D structures and ligand binding sites. These analyses provide fundamental information in the similarity and specificity of diverse OLE isoforms within the same subfamily and among the different species, which should facilitate studying the structure-function relationship and identify critical amino acid residues in OLEs for metabolic engineering of tree TAGs.
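The "proline knot" motif PX5SPX3P reads as: P, any five residues, S, P, any three residues, P. A minimal sketch of scanning a protein sequence for it with a regular expression is given below; the function name is hypothetical, and the study's own data-mining procedure is not described in the abstract.

```python
import re

# PX5SPX3P: P, five arbitrary residues, S, P, three arbitrary residues, P.
# The lookahead lets overlapping occurrences be reported as well.
PROLINE_KNOT = re.compile(r"(?=(P.{5}SP.{3}P))")


def find_proline_knots(protein):
    """Return (start index, matched segment) for every proline-knot occurrence."""
    return [(m.start(), m.group(1)) for m in PROLINE_KNOT.finditer(protein.upper())]
```

The motif spans 12 residues, so each reported segment has that length, and start indices are zero-based.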
36 CFR 223.4 - Exchange of trees or portions of trees.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 36 Parks, Forests, and Public Property 2 2011-07-01 2011-07-01 false Exchange of trees or portions of trees. 223.4 Section 223.4 Parks, Forests, and Public Property FOREST SERVICE, DEPARTMENT OF... PRODUCTS General Provisions § 223.4 Exchange of trees or portions of trees. Trees or portions of trees may...
36 CFR 223.4 - Exchange of trees or portions of trees.
Code of Federal Regulations, 2012 CFR
2012-07-01
... 36 Parks, Forests, and Public Property 2 2012-07-01 2012-07-01 false Exchange of trees or portions of trees. 223.4 Section 223.4 Parks, Forests, and Public Property FOREST SERVICE, DEPARTMENT OF... PRODUCTS General Provisions § 223.4 Exchange of trees or portions of trees. Trees or portions of trees may...
36 CFR 223.4 - Exchange of trees or portions of trees.
Code of Federal Regulations, 2014 CFR
2014-07-01
... 36 Parks, Forests, and Public Property 2 2014-07-01 2014-07-01 false Exchange of trees or portions of trees. 223.4 Section 223.4 Parks, Forests, and Public Property FOREST SERVICE, DEPARTMENT OF... PRODUCTS General Provisions § 223.4 Exchange of trees or portions of trees. Trees or portions of trees may...
36 CFR 223.4 - Exchange of trees or portions of trees.
Code of Federal Regulations, 2013 CFR
2013-07-01
... 36 Parks, Forests, and Public Property 2 2013-07-01 2013-07-01 false Exchange of trees or portions of trees. 223.4 Section 223.4 Parks, Forests, and Public Property FOREST SERVICE, DEPARTMENT OF... PRODUCTS General Provisions § 223.4 Exchange of trees or portions of trees. Trees or portions of trees may...
Polynomial-Time Algorithms for Building a Consensus MUL-Tree
Cui, Yun; Jansson, Jesper
2012-01-01
Abstract A multi-labeled phylogenetic tree, or MUL-tree, is a generalization of a phylogenetic tree that allows each leaf label to be used many times. MUL-trees have applications in biogeography, the study of host–parasite cospeciation, gene evolution studies, and computer science. Here, we consider the problem of inferring a consensus MUL-tree that summarizes a given set of conflicting MUL-trees, and present the first polynomial-time algorithms for solving it. In particular, we give a straightforward, fast algorithm for building a strict consensus MUL-tree for any input set of MUL-trees with identical leaf label multisets, as well as a polynomial-time algorithm for building a majority rule consensus MUL-tree for the special case where every leaf label occurs at most twice. We also show that, although it is NP-hard to find a majority rule consensus MUL-tree in general, the variant that we call the singular majority rule consensus MUL-tree can be constructed efficiently whenever it exists. PMID:22963134
Polynomial-time algorithms for building a consensus MUL-tree.
Cui, Yun; Jansson, Jesper; Sung, Wing-Kin
2012-09-01
A multi-labeled phylogenetic tree, or MUL-tree, is a generalization of a phylogenetic tree that allows each leaf label to be used many times. MUL-trees have applications in biogeography, the study of host-parasite cospeciation, gene evolution studies, and computer science. Here, we consider the problem of inferring a consensus MUL-tree that summarizes a given set of conflicting MUL-trees, and present the first polynomial-time algorithms for solving it. In particular, we give a straightforward, fast algorithm for building a strict consensus MUL-tree for any input set of MUL-trees with identical leaf label multisets, as well as a polynomial-time algorithm for building a majority rule consensus MUL-tree for the special case where every leaf label occurs at most twice. We also show that, although it is NP-hard to find a majority rule consensus MUL-tree in general, the variant that we call the singular majority rule consensus MUL-tree can be constructed efficiently whenever it exists.
New substitution models for rooting phylogenetic trees.
Williams, Tom A; Heaps, Sarah E; Cherlin, Svetlana; Nye, Tom M W; Boys, Richard J; Embley, T Martin
2015-09-26
The root of a phylogenetic tree is fundamental to its biological interpretation, but standard substitution models do not provide any information on its position. Here, we describe two recently developed models that relax the usual assumptions of stationarity and reversibility, thereby facilitating root inference without the need for an outgroup. We compare the performance of these models on a classic test case for phylogenetic methods, before considering two highly topical questions in evolutionary biology: the deep structure of the tree of life and the root of the archaeal radiation. We show that all three alignments contain meaningful rooting information that can be harnessed by these new models, thus complementing and extending previous work based on outgroup rooting. In particular, our analyses exclude the root of the tree of life from the eukaryotes or Archaea, placing it on the bacterial stem or within the Bacteria. They also exclude the root of the archaeal radiation from several major clades, consistent with analyses using other rooting methods. Overall, our results demonstrate the utility of non-reversible and non-stationary models for rooting phylogenetic trees, and identify areas where further progress can be made. © 2015 The Authors.
Zhang, Peng; Li, Houqiang; Wang, Honghui; Wong, Stephen T C; Zhou, Xiaobo
2011-01-01
Peak detection is one of the most important steps in mass spectrometry (MS) analysis. However, the detection result is greatly affected by severe spectrum variations. Unfortunately, most current peak detection methods are neither flexible enough to revise false detection results nor robust enough to resist spectrum variations. To improve flexibility, we introduce peak tree to represent the peak information in MS spectra. Each tree node is a peak judgment on a range of scales, and each tree decomposition, as a set of nodes, is a candidate peak detection result. To improve robustness, we combine peak detection and common peak alignment into a closed-loop framework, which finds the optimal decomposition via both peak intensity and common peak information. The common peak information is derived and loopily refined from the density clustering of the latest peak detection result. Finally, we present an improved ant colony optimization biomarker selection method to build a whole MS analysis system. Experiment shows that our peak detection method can better resist spectrum variations and provide higher sensitivity and lower false detection rates than conventional methods. The benefits from our peak-tree-based system for MS disease analysis are also proved on real SELDI data.
Determination of transverse elastic constants of wood using a cylindrically orthotropic model
John C. Hermanson
2003-01-01
The arrangement of anatomical elements in the cross section of a tree can be characterized, at least to a first approximation, with a cylindrical coordinate system. It seems reasonable that the physical properties of wood in the transverse plane, therefore, would exhibit behaviour that is associated with this anatomical alignment. Most of the transverse properties of...
Automatic Configuration of Programmable Logic Controller Emulators
2015-03-01
... Example tree generated using UPGMA [Edw13] ... Example sequence alignment for two... UPGMA Unweighted Pair Group Method with Arithmetic Mean URL uniform resource locator VM virtual machine XML Extensible Markup Language ... List of... appearance in the session, and then they are clustered again using Unweighted Pair Group Method with Arithmetic Mean (UPGMA) with a distance matrix based
Aligning ecology and markets in the forest carbon cycle
Matthew D. Hurteau; Bruce A. Hungate; George W. Koch; Malcolm P. North; Gordon R Smith
2013-01-01
A forest carbon (C) offset is a quantifiable unit of C that is commonly developed at the local or regional project scale and is designed to counterbalance anthropogenic C emissions by sequestering C in trees. In cap-and-trade programs, forest offsets have market value if the sequestered C is additional (more than would have occurred in the absence of the project) and...
Phylogenetic Analyses of Meloidogyne Small Subunit rDNA.
De Ley, Irma Tandingan; De Ley, Paul; Vierstraete, Andy; Karssen, Gerrit; Moens, Maurice; Vanfleteren, Jacques
2002-12-01
Phylogenies were inferred from nearly complete small subunit (SSU) 18S rDNA sequences of 12 species of Meloidogyne and 4 outgroup taxa (Globodera pallida, Nacobbus abberans, Subanguina radicicola, and Zygotylenchus guevarai). Alignments were generated manually from a secondary structure model, and computationally using ClustalX and Treealign. Trees were constructed using distance, parsimony, and likelihood algorithms in PAUP* 4.0b4a. Obtained tree topologies were stable across algorithms and alignments, supporting 3 clades: clade I = [M. incognita (M. javanica, M. arenaria)]; clade II = M. duytsi and M. maritima in an unresolved trichotomy with (M. hapla, M. microtyla); and clade III = (M. exigua (M. graminicola, M. chitwoodi)). Monophyly of [(clade I, clade II) clade III] was given maximal bootstrap support (mbs). M. artiellia was always a sister taxon to this joint clade, while M. ichinohei was consistently placed with mbs as a basal taxon within the genus. Affinities with the outgroup taxa remain unclear, although G. pallida and S. radicicola were never placed as closest relatives of Meloidogyne. Our results show that SSU sequence data are useful in addressing deeper phylogeny within Meloidogyne, and that both M. ichinohei and M. artiellia are credible outgroups for phylogenetic analysis of speciations among the major species.
Phylogenetic Analyses of Meloidogyne Small Subunit rDNA
De Ley, Irma Tandingan; De Ley, Paul; Vierstraete, Andy; Karssen, Gerrit; Moens, Maurice; Vanfleteren, Jacques
2002-01-01
Phylogenies were inferred from nearly complete small subunit (SSU) 18S rDNA sequences of 12 species of Meloidogyne and 4 outgroup taxa (Globodera pallida, Nacobbus abberans, Subanguina radicicola, and Zygotylenchus guevarai). Alignments were generated manually from a secondary structure model, and computationally using ClustalX and Treealign. Trees were constructed using distance, parsimony, and likelihood algorithms in PAUP* 4.0b4a. Obtained tree topologies were stable across algorithms and alignments, supporting 3 clades: clade I = [M. incognita (M. javanica, M. arenaria)]; clade II = M. duytsi and M. maritima in an unresolved trichotomy with (M. hapla, M. microtyla); and clade III = (M. exigua (M. graminicola, M. chitwoodi)). Monophyly of [(clade I, clade II) clade III] was given maximal bootstrap support (mbs). M. artiellia was always a sister taxon to this joint clade, while M. ichinohei was consistently placed with mbs as a basal taxon within the genus. Affinities with the outgroup taxa remain unclear, although G. pallida and S. radicicola were never placed as closest relatives of Meloidogyne. Our results show that SSU sequence data are useful in addressing deeper phylogeny within Meloidogyne, and that both M. ichinohei and M. artiellia are credible outgroups for phylogenetic analysis of speciations among the major species. PMID:19265950
ERIC Educational Resources Information Center
Fulmer, Gavin W.; Polikoff, Morgan S.
2014-01-01
An essential component in school accountability efforts is for assessments to be well-aligned with the standards or curriculum they are intended to measure. However, relatively little prior research has explored methods to determine statistical significance of alignment or misalignment. This study explores analyses of alignment as a special case…
Forced Alignment for Understudied Language Varieties: Testing Prosodylab-Aligner with Tongan Data
ERIC Educational Resources Information Center
Johnson, Lisa M.; Di Paolo, Marianna; Bell, Adrian
2018-01-01
Automated alignment of transcriptions to audio files expedites the process of preparing data for acoustic analysis. Unfortunately, the benefits of auto-alignment have generally been available only to researchers studying majority languages, for which large corpora exist and for which acoustic models have been created by large-scale research…
NASA Astrophysics Data System (ADS)
Merkel, Philipp M.; Schäfer, Björn Malte
2017-10-01
Cross-correlating the lensing signals of galaxies and cosmic microwave background (CMB) fluctuations is expected to provide valuable cosmological information. In particular, it may help tighten constraints on parameters describing the properties of intrinsically aligned galaxies at high redshift. To access the information conveyed by the cross-correlation signal, its accurate theoretical description is required. We compute the bias to CMB lensing-galaxy shape cross-correlation measurements induced by non-linear structure growth. Using tree-level perturbation theory for the large-scale structure bispectrum, we find that the bias is negative on most angular scales, therefore mimicking the signal of intrinsic alignments. Combining Euclid-like galaxy lensing data with a CMB experiment comparable to the Planck satellite mission, the bias becomes significant only on smallest scales (ℓ ≳ 2500). For improved CMB observations, however, the corrections amount to 10-15 per cent of the CMB lensing-intrinsic alignment signal over a wide multipole range (10 ≲ ℓ ≲ 2000). Accordingly, the power spectrum bias, if uncorrected, translates into 2σ and 3σ errors in the determination of the intrinsic alignment amplitude in the case of CMB stage III and stage IV experiments, respectively.
NASA Astrophysics Data System (ADS)
Stefan Devlin, Benjamin; Nakura, Toru; Ikeda, Makoto; Asada, Kunihiro
We detail a self synchronous field programmable gate array (SSFPGA) with dual-pipeline (DP) architecture to conceal pre-charge time for dynamic logic, and its throughput optimization by using pipeline alignment implemented on benchmark circuits. A self synchronous LUT (SSLUT) consists of a three input tree-type structure with 8bits of SRAM for programming. A self synchronous switch box (SSSB) consists of both pass transistors and buffers to route signals, with 12bits of SRAM. One common block with one SSLUT and one SSSB occupies 2.2Mλ² area with 35bits of SRAM, and the prototype SSFPGA with 34 × 30 (1020) blocks is designed and fabricated using 65nm CMOS. Measured results at 1.2V show 430MHz and 647MHz operation for a 3bit ripple carry adder, without and with throughput optimization, respectively. We find that using the proposed pipeline alignment techniques we can perform at maximum throughput of 647MHz in various benchmarks on the SSFPGA. We demonstrate up to 56.1 times throughput improvement with our pipeline alignment techniques. The pipeline alignment is carried out within the number of logic elements in the array and pipeline buffers in the switching matrix.
Joe R. McBride; David J. Nowak
1989-01-01
A survey of published reports on urban park tree inventories in the United States and the United Kingdom reveal two types of inventories: (1) Tree Location Inventories and (2) Generalized Information Inventories. Tree location inventories permit managers to relocate specific park trees, along with providing individual tree characteristics and condition data. In...
Zheng, Qi; Grice, Elizabeth A
2016-10-01
Accurate mapping of next-generation sequencing (NGS) reads to reference genomes is crucial for almost all NGS applications and downstream analyses. Various repetitive elements in human and other higher eukaryotic genomes contribute in large part to ambiguously (non-uniquely) mapped reads. Most available NGS aligners attempt to address this by either removing all non-uniquely mapping reads, or reporting one random or "best" hit based on simple heuristics. Accurate estimation of the mapping quality of NGS reads is therefore critical albeit completely lacking at present. Here we developed a generalized software toolkit "AlignerBoost", which utilizes a Bayesian-based framework to accurately estimate mapping quality of ambiguously mapped NGS reads. We tested AlignerBoost with both simulated and real DNA-seq and RNA-seq datasets at various thresholds. In most cases, but especially for reads falling within repetitive regions, AlignerBoost dramatically increases the mapping precision of modern NGS aligners without significantly compromising the sensitivity even without mapping quality filters. When using higher mapping quality cutoffs, AlignerBoost achieves a much lower false mapping rate while exhibiting comparable or higher sensitivity compared to the aligner default modes, therefore significantly boosting the detection power of NGS aligners even using extreme thresholds. AlignerBoost is also SNP-aware, and higher quality alignments can be achieved if provided with known SNPs. AlignerBoost's algorithm is computationally efficient, and can process one million alignments within 30 seconds on a typical desktop computer. AlignerBoost is implemented as a uniform Java application and is freely available at https://github.com/Grice-Lab/AlignerBoost.
Cavity tree selection by red-cockaded woodpeckers in relation to tree age
D. Craig Rudolph; Richard N. Conner
1991-01-01
We aged over 1350 Red-cockaded Woodpecker (Picoides borealis) cavity trees and a comparable number of randomly selected trees. Resulting data strongly support the hypothesis that Red-cockaded Woodpeckers preferentially select older trees. Ages of recently initiated cavity trees in the Texas study areas generally were similar to those of cavity trees...
Flowering and fruiting of southern browse species
L.K. Halls
1973-01-01
Flowering and fruiting dates are reported for 14 browse species growing in the open and beneath trees in an east Texas pine-hardwood forest. Dates for individual species generally were not influenced by tree cover. In the open, plants generally produced fruit more consistently and abundantly and at an earlier age than beneath the trees.
Rooting phylogenies using gene duplications: an empirical example from the bees (Apoidea).
Brady, Seán G; Litman, Jessica R; Danforth, Bryan N
2011-09-01
The placement of the root node in a phylogeny is fundamental to characterizing evolutionary relationships. The root node of bee phylogeny remains unclear despite considerable previous attention. In order to test alternative hypotheses for the location of the root node in bees, we used the F1 and F2 paralogs of elongation factor 1-alpha (EF-1α) to compare the tree topologies that result when using outgroup versus paralogous rooting. Fifty-two taxa representing each of the seven bee families were sequenced for both copies of EF-1α. Two datasets were analyzed. In the first (the "concatenated" dataset), the F1 and F2 copies for each species were concatenated and the tree was rooted using appropriate outgroups (sphecid and crabronid wasps). In the second dataset (the "duplicated" dataset), the F1 and F2 copies were aligned to each other and each copy for all taxa was treated as a separate terminal. In this dataset, the root was placed between the F1 and F2 copies (i.e., paralog rooting). Bayesian analyses demonstrate that the outgroup rooting approach outperforms paralog rooting, recovering deeper clades and showing stronger support for groups well established by both morphological and other molecular data. Sequence characteristics of the two copies were compared at the amino acid level, but little evidence was found to suggest that one copy is more functionally conserved. Although neither approach yields an unambiguous root to the tree, both approaches strongly indicate that the root of bee phylogeny does not fall near Colletidae, as has been previously proposed. We discuss paralog rooting as a general strategy and why this approach performs relatively poorly with our particular dataset. Copyright © 2011 Elsevier Inc. All rights reserved.
Burke, Sean V.; Wysocki, William P.; Clark, Lynn G.
2018-01-01
The systematics of grasses has advanced through applications of plastome phylogenomics, although studies have been largely limited to subfamilies or other subgroups of Poaceae. Here we present a plastome phylogenomic analysis of 250 complete plastomes (179 genera) sampled from 44 of the 52 tribes of Poaceae. Plastome sequences were determined from high throughput sequencing libraries and the assemblies represent over 28.7 Mbases of sequence data. Phylogenetic signal was characterized in 14 partitions, including (1) complete plastomes; (2) protein coding regions; (3) noncoding regions; and (4) three loci commonly used in single and multi-gene studies of grasses. Each of the four main partitions was further refined, alternatively including or excluding positively selected codons and also the gaps introduced by the alignment. All 76 protein coding plastome loci were found to be predominantly under purifying selection, but specific codons were found to be under positive selection in 65 loci. The loci that have been widely used in multi-gene phylogenetic studies had among the highest proportions of positively selected codons, suggesting caution in the interpretation of these earlier results. Plastome phylogenomic analyses confirmed the backbone topology for Poaceae with maximum bootstrap support (BP). Among the 14 analyses, 82 clades out of 309 resolved were maximally supported in all trees. Analyses of newly sequenced plastomes were in agreement with current classifications. Five of seven partitions in which alignment gaps were removed retrieved Panicoideae as sister to the remaining PACMAD subfamilies. Alternative topologies were recovered in trees from partitions that included alignment gaps. This suggests that ambiguities in aligning these uncertain regions might introduce a false signal. 
Resolution of these and other critical branch points in the phylogeny of Poaceae will help to better understand the selective forces that drove the radiation of the BOP and PACMAD clades comprising more than 99.9% of grass diversity. PMID:29416954
STBase: One Million Species Trees for Comparative Biology
McMahon, Michelle M.; Deepak, Akshay; Fernández-Baca, David; Boss, Darren; Sanderson, Michael J.
2015-01-01
Comprehensively sampled phylogenetic trees provide the most compelling foundations for strong inferences in comparative evolutionary biology. Mismatches are common, however, between the taxa for which comparative data are available and the taxa sampled by published phylogenetic analyses. Moreover, many published phylogenies are gene trees, which cannot always be adapted immediately for species level comparisons because of discordance, gene duplication, and other confounding biological processes. A new database, STBase, lets comparative biologists quickly retrieve species level phylogenetic hypotheses in response to a query list of species names. The database consists of 1 million single- and multi-locus data sets, each with a confidence set of 1000 putative species trees, computed from GenBank sequence data for 413,000 eukaryotic taxa. Two bodies of theoretical work are leveraged to aid in the assembly of multi-locus concatenated data sets for species tree construction. First, multiply labeled gene trees are pruned to conflict-free singly-labeled species-level trees that can be combined between loci. Second, impacts of missing data in multi-locus data sets are ameliorated by assembling only decisive data sets. Data sets overlapping with the user’s query are ranked using a scheme that depends on user-provided weights for tree quality and for taxonomic overlap of the tree with the query. Retrieval times are independent of the size of the database, typically a few seconds. Tree quality is assessed by a real-time evaluation of bootstrap support on just the overlapping subtree. Associated sequence alignments, tree files and metadata can be downloaded for subsequent analysis. STBase provides a tool for comparative biologists interested in exploiting the most relevant sequence data available for the taxa of interest. 
It may also serve as a prototype for future species tree oriented databases and as a resource for assembly of larger species phylogenies from precomputed trees. PMID:25679219
T-RMSD: a web server for automated fine-grained protein structural classification.
Magis, Cedrik; Di Tommaso, Paolo; Notredame, Cedric
2013-07-01
This article introduces the T-RMSD web server (tree-based on root-mean-square deviation), a service allowing the online computation of structure-based protein classification. It has been developed to address the relation between structural and functional similarity in proteins, and it allows a fine-grained structural clustering of a given protein family or group of structurally related proteins using distance RMSD (dRMSD) variations. These distances are computed between all pairs of equivalent residues, as defined by the ungapped columns within a given multiple sequence alignment. Using these generated distance matrices (one per equivalent position), T-RMSD produces a structural tree with support values for each cluster node, reminiscent of bootstrap values. These values, associated with the tree topology, allow a quantitative estimate of structural distances between proteins or group of proteins defined by the tree topology. The clusters thus defined have been shown to be structurally and functionally informative. The T-RMSD web server is a free website open to all users and available at http://tcoffee.crg.cat/apps/tcoffee/do:trmsd.
T-RMSD: a web server for automated fine-grained protein structural classification
Magis, Cedrik; Di Tommaso, Paolo; Notredame, Cedric
2013-01-01
This article introduces the T-RMSD web server (tree-based on root-mean-square deviation), a service allowing the online computation of structure-based protein classification. It has been developed to address the relation between structural and functional similarity in proteins, and it allows a fine-grained structural clustering of a given protein family or group of structurally related proteins using distance RMSD (dRMSD) variations. These distances are computed between all pairs of equivalent residues, as defined by the ungapped columns within a given multiple sequence alignment. Using these generated distance matrices (one per equivalent position), T-RMSD produces a structural tree with support values for each cluster node, reminiscent of bootstrap values. These values, associated with the tree topology, allow a quantitative estimate of structural distances between proteins or group of proteins defined by the tree topology. The clusters thus defined have been shown to be structurally and functionally informative. The T-RMSD web server is a free website open to all users and available at http://tcoffee.crg.cat/apps/tcoffee/do:trmsd. PMID:23716642
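The distance RMSD (dRMSD) mentioned in the T-RMSD abstract can be illustrated with a minimal sketch. This is the generic textbook dRMSD between two structures over a shared set of equivalent residues, not the server's per-position procedure, which the abstract does not spell out. The function name `drmsd` and the coordinate layout (one 3D point per equivalent residue, in the same order for both structures) are assumptions made for illustration.

```python
import math
from itertools import combinations


def drmsd(coords_a, coords_b):
    """Generic distance RMSD between two structures.

    coords_a, coords_b: sequences of (x, y, z) tuples, one per equivalent
    residue (hypothetical layout; e.g. C-alpha atoms taken from the ungapped
    columns of an alignment), listed in the same order.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of residues")
    pairs = list(combinations(range(len(coords_a)), 2))
    if not pairs:
        raise ValueError("need at least two residues")
    squared_sum = 0.0
    for i, j in pairs:
        # Compare the intramolecular distance i-j in each structure.
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[i], coords_b[j])
        squared_sum += (d_a - d_b) ** 2
    return math.sqrt(squared_sum / len(pairs))
```

Because only intramolecular distances are compared, the measure needs no superposition step: translating or rotating either structure leaves the value unchanged.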
Knoetze, Rinus; Swart, Antoinette
2014-12-09
A survey was performed to detect the presence of cyst nematodes in the Cape Floristic Region of South Africa. Soil was collected in the rhizosphere of the dominant plant species within blocks of indigenous vegetation and cysts were extracted from them. A total of 81 blocks of indigenous vegetation were sampled as described. Cysts were detected in 7 of these samples, representing 6 different vegetation types. One set of primers was used to amplify the ITS regions from these cysts, including the 5.8S ribosomal gene, as well as short parts of the 18S and 28S ribosomal genes. ITS-rDNA sequences from the indigenous isolates were aligned with selected sequences of other species from the Heteroderidae. Phylogenetic analyses to resolve the relationships between indigenous isolates and selected representatives of the Heteroderidae were conducted using the Maximum Parsimony method. The consensus tree resulting from alignment of the circumfenestrate cysts revealed that isolates SK18, WK1 and WK26 are included in a clade of Globodera species that parasitise non-solanaceous plants, forming a monophyletic group with G. millefolii, G. artemisiae, and an unidentified Globodera sp. from Portugal. In a tree resulting from the alignment of the Heterodera spp., isolates OK14 and WK2 are included in the Afenestrata group, forming a monophyletic group with H. orientalis. This survey unearthed at least four potentially new species of cyst nematodes, which may prove invaluable for the study of the evolution and biogeography of the group.
2016-01-01
Invasive pathogens can cause considerable damage to forest ecosystems. Lack of coevolution is generally thought to enable invasive pathogens to bypass the defence and/or recognition systems in the host. Although mostly true, this argument fails to predict intermittent outcomes in space and time, underlining the need to include the roles of the environment and the phenotype in host–pathogen interactions when predicting disease impacts. We emphasize the need to consider host–tree imbalances from a phenotypic perspective, considering the lack of coevolutionary and evolutionary history with the pathogen and the environment, respectively. We describe how phenotypic plasticity and plastic responses to environmental shifts may become maladaptive when hosts are faced with novel pathogens. The lack of host–pathogen and environmental coevolution are aligned with two global processes currently driving forest damage: globalization and climate change, respectively. We suggest that globalization and climate change act synergistically, increasing the chances of both genotypic and phenotypic imbalances. Short moves on the same continent are more likely to be in balance than if the move is from another part of the world. We use Gremmeniella abietina outbreaks in Sweden to exemplify how host–pathogen phenotypic interactions can help to predict the impacts of specific invasive and emergent diseases. This article is part of the themed issue ‘Tackling emerging fungal threats to animal health, food security and ecosystem resilience’. PMID:28080981
Stenlid, Jan; Oliva, Jonàs
2016-12-05
Invasive pathogens can cause considerable damage to forest ecosystems. Lack of coevolution is generally thought to enable invasive pathogens to bypass the defence and/or recognition systems in the host. Although mostly true, this argument fails to predict intermittent outcomes in space and time, underlining the need to include the roles of the environment and the phenotype in host-pathogen interactions when predicting disease impacts. We emphasize the need to consider host-tree imbalances from a phenotypic perspective, considering the lack of coevolutionary and evolutionary history with the pathogen and the environment, respectively. We describe how phenotypic plasticity and plastic responses to environmental shifts may become maladaptive when hosts are faced with novel pathogens. The lack of host-pathogen and environmental coevolution are aligned with two global processes currently driving forest damage: globalization and climate change, respectively. We suggest that globalization and climate change act synergistically, increasing the chances of both genotypic and phenotypic imbalances. Short moves on the same continent are more likely to be in balance than if the move is from another part of the world. We use Gremmeniella abietina outbreaks in Sweden to exemplify how host-pathogen phenotypic interactions can help to predict the impacts of specific invasive and emergent diseases. This article is part of the themed issue 'Tackling emerging fungal threats to animal health, food security and ecosystem resilience'. © 2016 The Author(s).
General Education Reform: Opportunities for Institutional Alignment
ERIC Educational Resources Information Center
Fuess, Scott M., Jr.; Mitchell, Nancy D.
2011-01-01
General education reform provides strategic opportunities for departments. This article analyzes reform at the University of Nebraska-Lincoln, illustrating how departments could use the reform process to clarify their strategic planning, align with institutional goals, and steer the university closer to departmental objectives. (Contains 1 table.)
Zarco-Tejada, P J; Hornero, A; Hernández-Clemente, R; Beck, P S A
2018-03-01
The operational monitoring of forest decline requires the development of remote sensing methods that are sensitive to the spatiotemporal variations of pigment degradation and canopy defoliation. In this context, the red-edge spectral region (RESR) was proposed in the past due to its combined sensitivity to chlorophyll content and leaf area variation. In this study, the temporal dimension of the RESR was evaluated as a function of forest decline using a radiative transfer method with the PROSPECT and 3D FLIGHT models. These models were used to generate synthetic pine stands simulating decline and recovery processes over time and explore the temporal rate of change of the red-edge chlorophyll index (CI) as compared to the trajectories obtained for the structure-related Normalized Difference Vegetation Index (NDVI). The temporal trend method proposed here consisted of using synthetic spectra to calculate the theoretical boundaries of the subspace for healthy and declining pine trees in the temporal domain, defined by $\mathrm{CI}_{\mathrm{time}=n}/\mathrm{CI}_{\mathrm{time}=n+1}$ vs. $\mathrm{NDVI}_{\mathrm{time}=n}/\mathrm{NDVI}_{\mathrm{time}=n+1}$. Within these boundaries, trees undergoing decline and recovery processes showed different trajectories through this subspace. The method was then validated using three high-resolution airborne hyperspectral images acquired at 40 cm resolution and 260 spectral bands of 6.5 nm full-width half-maximum (FWHM) over a forest with widespread tree decline, along with field-based monitoring of chlorosis and defoliation (i.e., 'decline' status) in 663 trees between the years 2015 and 2016. The temporal rate of change of chlorophyll vs. structural indices, based on reflectance spectra extracted from the hyperspectral images, was different for trees undergoing decline, and aligned towards the decline baseline established using the radiative transfer models. By contrast, healthy trees over time aligned towards the theoretically obtained healthy baseline.
The applicability of this temporal trend method to the red-edge bands of the MultiSpectral Imager (MSI) instrument on board Sentinel-2a for operational forest status monitoring was also explored by comparing the temporal rate of change of the Sentinel-2-derived CI over areas with declining and healthy trees. Results demonstrated that the Sentinel-2a red-edge region was sensitive to the temporal dimension of forest condition, as the relationships obtained for pixels in healthy condition deviated from those of pixels undergoing decline.
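The temporal ratio subspace described in this abstract (CI at one date divided by CI at the next, against the same ratio for NDVI) can be sketched as follows. NDVI uses its standard definition. The form of the red-edge chlorophyll index is an assumption: the abstract does not give a formula or band choice, so a plain NIR / red-edge reflectance ratio is used here for illustration. All function names and dictionary keys are hypothetical.

```python
def ndvi(nir, red):
    """Standard Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def red_edge_ci(nir, red_edge):
    """Red-edge chlorophyll index, ASSUMED here to be a simple
    NIR / red-edge reflectance ratio (not taken from the abstract)."""
    return nir / red_edge


def trajectory_point(bands_t0, bands_t1):
    """Return (CI_t0 / CI_t1, NDVI_t0 / NDVI_t1) for one tree or pixel.

    bands_t0, bands_t1: dicts with hypothetical keys 'nir', 'red' and
    'red_edge' holding reflectances at two consecutive dates.
    """
    ci_ratio = (red_edge_ci(bands_t0["nir"], bands_t0["red_edge"])
                / red_edge_ci(bands_t1["nir"], bands_t1["red_edge"]))
    ndvi_ratio = (ndvi(bands_t0["nir"], bands_t0["red"])
                  / ndvi(bands_t1["nir"], bands_t1["red"]))
    return ci_ratio, ndvi_ratio
```

Under these assumptions, a tree whose spectrum does not change between dates maps to the point (1, 1), and a ratio above 1 on either axis means the corresponding index dropped between the two dates.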
2016-01-01
Porcelain crabs from the closely related genera Petrolisthes, Liopetrolisthes, and Allopetrolisthes are known for their diversity of lifestyles, habitats, and coloration. The evolutionary relationships among the species belonging to these three genera is not fully resolved. A molecular phylogeny of the group may help to resolve the long-standing taxonomic question about the validity of the genera Allopetrolisthes and Liopetrolisthes. Using both ‘total evidence’ and single-marker analyses based on a 362-bp alignment of the 16S rRNA mitochondrial DNA and a 328-bp alignment of the Histone 3 nuclear DNA, the phylogenetic relationships among 11 species from Petrolisthes (6 species), Liopetrolisthes (2 species), and Allopetrolisthes (3 species), all native to the south eastern Pacific, were examined. The analyses supported three pairs of sister species: L. mitra + L. patagonicus, P. tuberculatus + P. tuberculosus, and A. angulosus + A. punctatus. No complete segregation of species, according to genera, was evident from tree topologies. Bayesian-factor analyses revealed strong support for the unconstrained tree instead of an alternative tree in which monophyly of the three genera was forced. Thus, the present molecular phylogeny does not support the separation of the species within this complex into the genera Petrolisthes, Liopetrolisthes, and Allopetrolisthes. Taking into account the above and other recent molecular phylogenetic analyses focused on other representatives from the family Porcellanidae, it is tentatively proposed to eliminate the genera Liopetrolisthes and Allopetrolisthes, and to transfer their members to the genus Petrolisthes. PMID:26989636
Twistor-strings and gravity tree amplitudes
NASA Astrophysics Data System (ADS)
Adamo, Tim; Mason, Lionel
2013-04-01
Recently we discussed how Einstein supergravity tree amplitudes might be obtained from the original Witten and Berkovits twistor-string theory when external conformal gravitons are restricted to be Einstein gravitons. Here we obtain a more systematic understanding of the relationship between conformal and Einstein gravity amplitudes in that twistor-string theory. We show that although it does not in general yield Einstein amplitudes, we can nevertheless obtain a partial twistor-string interpretation of the remarkable formulae recently found by Hodges and generalized to all tree amplitudes by Cachazo and Skinner. The Hodges matrix and its higher degree generalizations encode the worldsheet correlators of the twistor string. These matrices control both Einstein amplitudes and those of the conformal gravity arising from the Witten and Berkovits twistor-string. Amplitudes in the latter case arise from products of the diagonal elements of the generalized Hodges matrices, and reduced determinants give the former. The reduced determinants arise if the contractions in the worldsheet correlator are restricted to form connected trees at MHV. The (generalized) Hodges matrices arise as weighted Laplacian matrices for the graph of possible contractions in the correlators, and the reduced determinants of these weighted Laplacian matrices give the sum of the connected tree contributions by an extension of the matrix-tree theorem.
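The matrix-tree theorem invoked in this abstract can be illustrated on an ordinary weighted graph. The following is a minimal sketch only, not the twistor-string construction itself: the function names and the example graphs are assumptions made for illustration. It builds a weighted Laplacian, deletes one row and column, and takes the determinant, which equals the sum over spanning trees of the product of edge weights.

```python
from fractions import Fraction


def weighted_laplacian(n, edges):
    """Weighted Laplacian of an undirected graph on vertices 0..n-1.

    edges is a list of (i, j, weight) triples (hypothetical input format).
    """
    lap = [[Fraction(0)] * n for _ in range(n)]
    for i, j, w in edges:
        w = Fraction(w)
        lap[i][i] += w
        lap[j][j] += w
        lap[i][j] -= w
        lap[j][i] -= w
    return lap


def determinant(matrix):
    """Exact determinant by Gaussian elimination over Fractions."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return det


def weighted_tree_sum(n, edges, removed=0):
    """Reduced determinant: delete one row/column of the Laplacian."""
    lap = weighted_laplacian(n, edges)
    minor = [[lap[r][c] for c in range(n) if c != removed]
             for r in range(n) if r != removed]
    return determinant(minor)


# Triangle with weights 2, 3, 5: the three spanning trees contribute
# 2*3 + 2*5 + 3*5.
triangle = [(0, 1, 2), (1, 2, 3), (0, 2, 5)]
print(weighted_tree_sum(3, triangle))
```

The result does not depend on which vertex is removed, which is the property that makes the reduced determinant well defined.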
Zheng, Qi; Grice, Elizabeth A.
2016-01-01
Accurate mapping of next-generation sequencing (NGS) reads to reference genomes is crucial for almost all NGS applications and downstream analyses. Various repetitive elements in human and other higher eukaryotic genomes contribute in large part to ambiguously (non-uniquely) mapped reads. Most available NGS aligners attempt to address this by either removing all non-uniquely mapping reads, or reporting one random or "best" hit based on simple heuristics. Accurate estimation of the mapping quality of NGS reads is therefore critical albeit completely lacking at present. Here we developed a generalized software toolkit "AlignerBoost", which utilizes a Bayesian-based framework to accurately estimate mapping quality of ambiguously mapped NGS reads. We tested AlignerBoost with both simulated and real DNA-seq and RNA-seq datasets at various thresholds. In most cases, but especially for reads falling within repetitive regions, AlignerBoost dramatically increases the mapping precision of modern NGS aligners without significantly compromising the sensitivity even without mapping quality filters. When using higher mapping quality cutoffs, AlignerBoost achieves a much lower false mapping rate while exhibiting comparable or higher sensitivity compared to the aligner default modes, therefore significantly boosting the detection power of NGS aligners even using extreme thresholds. AlignerBoost is also SNP-aware, and higher quality alignments can be achieved if provided with known SNPs. AlignerBoost’s algorithm is computationally efficient, and can process one million alignments within 30 seconds on a typical desktop computer. AlignerBoost is implemented as a uniform Java application and is freely available at https://github.com/Grice-Lab/AlignerBoost. PMID:27706155
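The abstract does not spell out AlignerBoost's model, so the following is only a generic sketch of how a Bayesian posterior over candidate hits can be turned into a Phred-scaled mapping quality. The function name, the uniform prior over hits, and the cap of 60 are all assumptions for illustration and are not taken from the tool.

```python
import math


def mapping_qualities(log10_likelihoods, max_q=60):
    """Phred-scaled mapping quality for each candidate hit of one read.

    Assumes a uniform prior over hits, so the posterior of hit i is its
    likelihood divided by the sum of likelihoods over all hits.
    """
    best = max(log10_likelihoods)
    # Rescale by the best hit to avoid underflow.
    weights = [10.0 ** (x - best) for x in log10_likelihoods]
    total = sum(weights)
    quals = []
    for i in range(len(weights)):
        # Probability that hit i is wrong: posterior mass of all other hits.
        others = sum(w for j, w in enumerate(weights) if j != i)
        err = others / total
        if err <= 0.0:
            quals.append(max_q)
        else:
            quals.append(min(max_q, int(round(-10.0 * math.log10(err)))))
    return quals


# A read with one strong hit and one hit that is 1000 times less likely.
print(mapping_qualities([0.0, -3.0]))
```

Under these assumptions, a read with two equally good hits in a repeat receives a low quality for both, while a unique hit receives the cap.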
Single Polygon Counting on Cayley Tree of Order 3
NASA Astrophysics Data System (ADS)
Pah, Chin Hee
2010-07-01
We showed that one form of generalized Catalan numbers solves the problem of counting the distinct connected components with a finite number of vertices that contain a fixed root in the semi-infinite Cayley tree of order 3. We give the formula for the full graph, the Cayley tree of order 3, which is derived from the generalized Catalan numbers. Using ratios of Gamma functions, two upper bounds are given for the problem defined on the semi-infinite Cayley tree of order 3 as well as on the full graph.
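As an illustration of the counting problem, the sketch below assumes that the semi-infinite Cayley tree of order 3 is the rooted tree in which every vertex has three children, and that the relevant generalized Catalan numbers are the Fuss-Catalan numbers C(3n, n)/(2n + 1). Both are assumptions, since the abstract does not state the formula. The closed form is checked against a direct recurrence over the sizes of the three child subtrees.

```python
from math import comb


def fuss_catalan_ternary(n):
    """Closed form C(3n, n) / (2n + 1), assumed here to be the relevant count."""
    return comb(3 * n, n) // (2 * n + 1)


def rooted_components(max_n):
    """Count connected n-vertex subsets containing the root, for n = 0..max_n.

    t[0] = 1 stands for an absent child subtree; a component with n vertices
    is the root plus three (possibly empty) components below its children.
    """
    t = [1] + [0] * max_n
    for n in range(1, max_n + 1):
        total = 0
        for a in range(n):
            for b in range(n - a):
                c = n - 1 - a - b
                total += t[a] * t[b] * t[c]
        t[n] = total
    return t


print(rooted_components(5)[1:])
print([fuss_catalan_ternary(n) for n in range(1, 6)])
```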
U'ren, Jana M; Dalling, James W; Gallery, Rachel E; Maddison, David R; Davis, E Christine; Gibson, Cara M; Arnold, A Elizabeth
2009-04-01
Fungi associated with seeds of tropical trees pervasively affect seed survival and germination, and thus are an important, but understudied, component of forest ecology. Here, we examine the diversity and evolutionary origins of fungi isolated from seeds of an important pioneer tree (Cecropia insignis, Cecropiaceae) following burial in soil for five months in a tropical moist forest in Panama. Our approach, which relied on molecular sequence data because most isolates did not sporulate in culture, provides an opportunity to evaluate several methods currently used to analyse environmental samples of fungi. First, intra- and interspecific divergence were estimated for the nu-rITS and 5.8S gene for four genera of Ascomycota that are commonly recovered from seeds. Using these values we estimated species boundaries for 527 isolates, showing that seed-associated fungi are highly diverse, horizontally transmitted, and genotypically congruent with some foliar endophytes from the same site. We then examined methods for inferring the taxonomic placement and phylogenetic relationships of these fungi, evaluating the effects of manual versus automated alignment, model selection, and inference methods, as well as the quality of BLAST-based identification using GenBank. We found that common methods such as neighbor-joining and Bayesian inference differ in their sensitivity to alignment methods; analyses of particular fungal genera differ in their sensitivity to alignments; and numerous and sometimes intricate disparities exist between BLAST-based versus phylogeny-based identification methods. Lastly, we used our most robust methods to infer phylogenetic relationships of seed-associated fungi in four focal genera, and reconstructed ancestral states to generate preliminary hypotheses regarding the evolutionary origins of this guild. 
Our results illustrate the dynamic evolutionary relationships among endophytic fungi, pathogens, and seed-associated fungi, and the apparent evolutionary distinctiveness of saprotrophs. Our study also elucidates the diversity, taxonomy, and ecology of an important group of plant-associated fungi and highlights some of the advantages and challenges inherent in the use of ITS data for environmental sampling of fungi.
Wu, Jian-qiang; Wang, Yi-xiang; Yang, Yi; Zhu, Ting-ting; Zhu, Xu-dan
2015-02-01
Crop trees were selected in a 26-year-old even-aged Cunninghamia lanceolata plantation in Lin'an, and compared in plots that were released and unreleased to examine growth and structure responses for 3 years after thinning. Crop tree release significantly increased the mean increments of diameter and volume of individual trees by 1.30 and 1.25 times relative to trees in control stands, respectively. The increments of diameter and volume of crop trees were significantly higher than those of general trees in thinning plots and of crop trees and general trees in control plots, which suggested that different tree types responded differently to the crop tree release treatment. Crop tree release increased the average distances of crop trees to the nearest neighboring trees, reducing competition among crop trees by about 68.2%. The 3-year stand volume increment for thinning stands did not differ significantly from that of control stands, although the number of trees was only 81.5% of the control. In thinned plots, crop trees with diameters over 14 cm reached 18.0% over 3 years, compared with 12.0% for trees without thinning, suggesting that crop tree release benefited the larger individual trees. The pattern of tree locations in thinning plots tended to be random, complying with the rule that tree distribution pattern changes with growth. Crop tree release in the C. lanceolata plantation not only promoted stand growth but also optimized stand structure, supporting sustained rapid growth of crop trees and the production of larger-diameter trees.
Method for estimating potential tree-grade distributions for northeastern forest species
Daniel A. Yaussy
1993-01-01
Generalized logistic regression was used to distribute trees into four potential tree grades for 20 northeastern species groups. The potential tree grade is defined as the tree grade based on the length and amount of clear cuttings and defects only, disregarding minimum grading diameter. The algorithms described use site index and tree diameter as the predictive...
galaxie--CGI scripts for sequence identification through automated phylogenetic analysis.
Nilsson, R Henrik; Larsson, Karl-Henrik; Ursing, Björn M
2004-06-12
The prevalent use of similarity searches like BLAST to identify sequences and species implicitly assumes the reference database to be of extensive sequence sampling. This is often not the case, restraining the correctness of the outcome as a basis for sequence identification. Phylogenetic inference outperforms similarity searches in retrieving correct phylogenies and consequently sequence identities, and a project was initiated to design a freely available script package for sequence identification through automated Web-based phylogenetic analysis. Three CGI scripts were designed to facilitate qualified sequence identification from a Web interface. Query sequences are aligned to pre-made alignments or to alignments made by ClustalW with entries retrieved from a BLAST search. The subsequent phylogenetic analysis is based on the PHYLIP package for inferring neighbor-joining and parsimony trees. The scripts are highly configurable. A service installation and a version for local use are found at http://andromeda.botany.gu.se/galaxiewelcome.html and http://galaxie.cgb.ki.se
ERIC Educational Resources Information Center
Flowers, Claudia; Wakeman, Shawnee; Browder, Diane M.; Karvonen, Meagan
2009-01-01
This article describes an alignment procedure, called Links for Academic Learning (LAL), for examining the degree of alignment of alternate assessments based on alternate achievement standards (AA-AAS) to grade-level content standards and instruction. Although some of the alignment criteria are similar to those used in general education…
Romiguier, Jonathan; Ranwez, Vincent; Delsuc, Frédéric; Galtier, Nicolas; Douzery, Emmanuel J P
2013-09-01
Despite the rapid increase of size in phylogenomic data sets, a number of important nodes on animal phylogeny are still unresolved. Among these, the rooting of the placental mammal tree is still a controversial issue. One difficulty lies in the pervasive phylogenetic conflicts among genes, with each one telling its own story, which may be reliable or not. Here, we identified a simple criterion, that is, the GC content, which substantially helps in determining which gene trees best reflect the species tree. We assessed the ability of 13,111 coding sequence alignments to correctly reconstruct the placental phylogeny. We found that GC-rich genes induced a higher amount of conflict among gene trees and performed worse than AT-rich genes in retrieving well-supported, consensual nodes on the placental tree. We interpret this GC effect mainly as a consequence of genome-wide variations in recombination rate. Indeed, recombination is known to drive GC-content evolution through GC-biased gene conversion and might be problematic for phylogenetic reconstruction, for instance, in an incomplete lineage sorting context. When we focused on the AT-richest fraction of the data set, the resolution level of the placental phylogeny was greatly increased, and a strong support was obtained in favor of an Afrotheria rooting, that is, Afrotheria as the sister group of all other placentals. We show that in mammals most conflicts among gene trees, which have so far hampered the resolution of the placental tree, are concentrated in the GC-rich regions of the genome. We argue that the GC content, because it is a reliable indicator of the long-term recombination rate, is an informative criterion that could help in identifying the most reliable molecular markers for species tree inference.
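The filtering idea in this abstract, ranking gene alignments by GC content and keeping the AT-richest fraction, can be sketched as follows. This is an illustration under stated assumptions: the function names, the input format (a dict from gene name to aligned sequences), and the use of overall GC content rather than any codon-position-specific measure are hypothetical choices, not the authors' procedure.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (gaps and N are ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def at_richest(alignments, fraction=0.5):
    """Names of the alignments with the lowest GC content.

    alignments: dict mapping gene name -> list of aligned sequences
    (hypothetical input format).
    """
    ranked = sorted(alignments,
                    key=lambda name: gc_content("".join(alignments[name])))
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


genes = {
    "g1": ["GGCC", "GGCC"],
    "g2": ["ATAT", "AT-T"],
    "g3": ["ACGT", "ACGT"],
    "g4": ["GCGA", "GCGA"],
}
print(at_richest(genes, fraction=0.5))
```

Gene trees would then be estimated only from the retained alignments; that step is outside this sketch.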
Feng, Yue; Feng, Yue-Mei; Feng, Yang; Lu, Caixia; Liu, Li; Sun, Xiaomei; Dai, Jiejie; Xia, Xueshan
2015-10-01
The Chinese tree shrew (Tupaia belangeri chinensis) is a small animal that possesses many features that make it valuable as an experimental model in biomedical research. Currently, there are numerous attempts to utilize tree shrews as models for hepatitis C virus (HCV) infection. This study aimed to construct a liver microRNA (miRNA) data set for the tree shrew. Three second filial generation tree shrews were used in this study. Total RNA was extracted from the liver of each tree shrew and mixed in equal amounts, then reverse-transcribed to complementary DNA (cDNA). The cDNAs were amplified by polymerase chain reaction and subjected to high-throughput sequencing. A total of 2060 conserved miRNAs were identified through alignment with the mature miRNAs in the miRBase 20.0 database. The gene ontology and Kyoto encyclopedia of genes and genomes analyses of the target genes of the miRNAs revealed several candidate miRNAs, genes and pathways that may be involved in the process of HCV infection. The abundance of the miR-122 and Let-7 families and their other characteristics provided further evidence for the utilization of this animal as a potential model for HCV infection and other related biomedical research. Moreover, 80 novel microRNAs were predicted using the software Mireap. The top 3 abundant miRNAs were validated in other tree samples, based on stem-loop quantitative reverse transcription-polymerase chain reaction. According to the liver microRNA data of the Chinese tree shrew, characteristics of the miR-122 and Let-7 families further highlight the suitability of the tree shrew as an animal model in HCV research.
Kumar, Rajnish; Mishra, Bharat Kumar; Lahiri, Tapobrata; Kumar, Gautam; Kumar, Nilesh; Gupta, Rahul; Pal, Manoj Kumar
2017-06-01
Online retrieval of the homologous nucleotide sequences through existing alignment techniques is a common practice against the given database of sequences. The salient point of these techniques is their dependence on local alignment techniques and scoring matrices the reliability of which is limited by computational complexity and accuracy. Toward this direction, this work offers a novel way for numerical representation of genes which can further help in dividing the data space into smaller partitions helping formation of a search tree. In this context, this paper introduces a 36-dimensional Periodicity Count Value (PCV) which is representative of a particular nucleotide sequence and created through adaptation from the concept of stochastic model of Kolekar et al. (American Institute of Physics 1298:307-312, 2010. doi: 10.1063/1.3516320 ). The PCV construct uses information on physicochemical properties of nucleotides and their positional distribution pattern within a gene. It is observed that PCV representation of gene reduces computational cost in the calculation of distances between a pair of genes while being consistent with the existing methods. The validity of PCV-based method was further tested through their use in molecular phylogeny constructs in comparison with that using existing sequence alignment methods.
The potential of the tree water potential.
Steppe, Kathy
2018-06-12
Non-invasive quantification of tree water potential is one of the grand challenges for assessing the fate of trees and forests in the coming decades. Tree water potential is a robust and direct indicator of tree water status and is preferably used to track how trees, forests and vegetation in general respond to changes in climate and drought. In this issue of Tree Physiology, Dietrich et al. (2018) predict the daily canopy water potential of mature temperate trees from tree water deficit derived from stem diameter variation measurements.
Bachvaroff, Tsvetan R.; Gornik, Sebastian G.; Concepcion, Gregory T.; Waller, Ross F.; Mendez, Gregory S.; Lippmeier, J. Casey; Delwiche, Charles F.
2014-01-01
The alveolates are composed of three major lineages, the ciliates, dinoflagellates, and apicomplexans. Together these ‘protist’ taxa play key roles in primary production and ecology, as well as in illness of humans and other animals. The interface between the dinoflagellate and apicomplexan clades has been an area of recent discovery, blurring the distinction between these two clades. Moreover, phylogenetic analysis has yet to determine the position of basal dinoflagellate clades hence the deepest branches of the dinoflagellate tree currently remain unresolved. Large-scale mRNA sequencing was applied to 11 species of dinoflagellates, including strains of the syndinean genera Hematodinium and Amoebophrya, parasites of crustaceans and dinoflagellates, respectively, to optimize and update the dinoflagellate tree. From the transcriptome-scale data a total of 73 ribosomal protein-coding genes were selected for phylogeny. After individual gene orthology assessment, the genes were concatenated into a >15,000 amino acid alignment with 76 taxa from dinoflagellates, apicomplexans, ciliates, and the outgroup heterokonts. Overall the tree was well resolved and supported, when the data was subsampled with gblocks or constraint trees were tested with the approximately unbiased test. The deepest branches of the dinoflagellate tree can now be resolved with strong support, and provides a clearer view of the evolution of the distinctive traits of dinoflagellates. PMID:24135237
Ma, Jianmin; Eisenhaber, Frank; Maurer-Stroh, Sebastian
2013-12-01
Beta lactams comprise the largest and still most effective group of antibiotics, but bacteria can gain resistance through different beta lactamases that can degrade these antibiotics. We developed a user friendly tree building web server that allows users to assign beta lactamase sequences to their respective molecular classes and subclasses. Further clinically relevant information includes if the gene is typically chromosomal or transferable through plasmids as well as listing the antibiotics which the most closely related reference sequences are known to target and cause resistance against. This web server can automatically build three phylogenetic trees: the first tree with closely related sequences from a Tachyon search against the NCBI nr database, the second tree with curated reference beta lactamase sequences, and the third tree built specifically from substrate binding pocket residues of the curated reference beta lactamase sequences. We show that the latter is better suited to recover antibiotic substrate assignments through nearest neighbor annotation transfer. The users can also choose to build a structural model for the query sequence and view the binding pocket residues of their query relative to other beta lactamases in the sequence alignment as well as in the 3D structure relative to bound antibiotics. This web server is freely available at http://blac.bii.a-star.edu.sg/.
Phylogeny of sipunculan worms: A combined analysis of four gene regions and morphology.
Schulze, Anja; Cutler, Edward B; Giribet, Gonzalo
2007-01-01
The intra-phyletic relationships of sipunculan worms were analyzed based on DNA sequence data from four gene regions and 58 morphological characters. Initially we analyzed the data under direct optimization using parsimony as optimality criterion. An implied alignment resulting from the direct optimization analysis was subsequently utilized to perform a Bayesian analysis with mixed models for the different data partitions. For this we applied a doublet model for the stem regions of the 18S rRNA. Both analyses support monophyly of Sipuncula and most of the same clades within the phylum. The analyses differ with respect to the relationships among the major groups but whereas the deep nodes in the direct optimization analysis generally show low jackknife support, they are supported by 100% posterior probability in the Bayesian analysis. Direct optimization has been useful for handling sequences of unequal length and generating conservative phylogenetic hypotheses whereas the Bayesian analysis under mixed models provided high resolution in the basal nodes of the tree.
Quinn, Terrance; Sinkala, Zachariah
2014-01-01
We develop a general method for computing extreme value distribution (Gumbel, 1958) parameters for gapped alignments. Our approach uses mixture distribution theory to obtain associated BLOSUM matrices for gapped alignments, which in turn are used for determining significance of gapped alignment scores for pairs of biological sequences. We compare our results with parameters already obtained in the literature.
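For readers unfamiliar with how extreme value parameters are used, the sketch below shows the standard Karlin-Altschul form of the score significance calculation, in which the Gumbel parameters lambda and K convert an alignment score into an expected count and a P-value. The function name and the numeric parameter values are illustrative assumptions and are not results from this paper.

```python
import math


def gumbel_pvalue(score, m, n, lam, K):
    """P(S >= score) for sequences of lengths m and n.

    Uses E = K * m * n * exp(-lam * score) and P = 1 - exp(-E).
    """
    expected = K * m * n * math.exp(-lam * score)
    # expm1 keeps precision when the expected count is very small.
    return -math.expm1(-expected)


# Illustrative parameter values only (assumed, not taken from the paper).
lam, K = 0.267, 0.041
for s in (40, 50, 60):
    print(s, gumbel_pvalue(s, m=100, n=1000, lam=lam, K=K))
```

Estimating lambda and K for a given scoring matrix and gap penalty scheme is the hard part that the paper addresses; the code above only shows where those parameters enter.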
snpTree--a web-server to identify and construct SNP trees from whole genome sequence data.
Leekitcharoenphon, Pimlapas; Kaas, Rolf S; Thomsen, Martin Christen Frølund; Friis, Carsten; Rasmussen, Simon; Aarestrup, Frank M
2012-01-01
The advances in and decreasing cost of whole genome sequencing (WGS) will soon make this technology available for routine infectious disease epidemiology. In epidemiological studies, outbreak isolates have very little diversity and require extensive genomic analysis to differentiate and classify isolates. One of the successfully and broadly used methods is analysis of single nucleotide polymorphisms (SNPs). Currently, there are different tools and methods to identify SNPs, including various options and cut-off values. Furthermore, all current methods require bioinformatic skills. Thus, we lack a standard and simple automatic tool to determine SNPs and construct phylogenetic trees from WGS data. Here we introduce snpTree, a server for online automatic SNP analysis. This tool is composed of different SNP analysis suites and perl and python scripts. snpTree can identify SNPs and construct phylogenetic trees from WGS as well as from assembled genomes or contigs. WGS data in fastq format are aligned to reference genomes by BWA, while contigs in fasta format are processed by Nucmer. SNPs are concatenated based on position on the reference genome, and a tree is constructed from concatenated SNPs using FastTree and a perl script. The online server was implemented with HTML, Java and python scripts. The server was evaluated using four published bacterial WGS data sets (V. cholerae, S. aureus CC398, S. Typhimurium and M. tuberculosis). The evaluation results for the first three cases were consistent and concordant for both raw reads and assembled genomes. In the latter case the original publication involved extensive filtering of SNPs, which could not be repeated using snpTree. The snpTree server is an easy-to-use option for rapid, standardised and automatic SNP analysis in epidemiological studies, also for users with limited bioinformatic experience. The web server is freely accessible at http://www.cbs.dtu.dk/services/snpTree-1.0/.
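The concatenation step described above, in which SNPs are ordered by reference position and joined into one pseudo-sequence per isolate, can be sketched as follows. This is not snpTree's own code: the function name, the 0-based positions, and the input format (a dict of per-isolate SNP calls) are assumptions made for illustration.

```python
def concatenate_snps(snp_calls, reference):
    """Build one SNP pseudo-sequence per isolate.

    snp_calls: dict mapping isolate name -> {0-based position: alternate base}
    reference: reference genome sequence as a string
    Isolates without a call at a variable position receive the reference base.
    """
    positions = sorted({p for calls in snp_calls.values() for p in calls})
    sequences = {
        isolate: "".join(calls.get(p, reference[p]) for p in positions)
        for isolate, calls in snp_calls.items()
    }
    return positions, sequences


reference = "ACGTACGT"
calls = {
    "isolate_A": {1: "T", 5: "G"},
    "isolate_B": {5: "G"},
    "isolate_C": {},
}
positions, sequences = concatenate_snps(calls, reference)
for name, seq in sequences.items():
    # FASTA-style output that a tree-building program could take as input.
    print(f">{name}\n{seq}")
```

A real pipeline would also need to handle positions with no coverage, which this sketch silently fills with the reference base.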
NASA Astrophysics Data System (ADS)
Park, M.; Moon, M.; Park, J.; Cho, S.; Kim, H. S.
2016-12-01
Individual tree growth rates can be affected by various factors such as species, soil fertility, stand development stage, disturbance, and climate. To estimate the effect of changes in tree growth rate on the structure and functionality of forest ecosystems in the future, we analyzed the change of species-specific growth trends using the fifth Korea national forest inventory data, which were collected from 2006 to 2010. Ring samples of average trees were collected from nationwide inventory plots, and the total number of individual tree ring series was 69,128, covering 185 tree species. Among those, fifty-one species with more than 100 tree ring series were used for our analysis. For growth-trend analysis, standardized regional curves of individual species growth were generated for three forest zones in South Korea: the subarctic, cool temperate, and warm temperate forest zones. Each individual tree ring series was then indexed by dividing the growth of the tree by the expected growth from the standardized regional curves. The ratios of all tree ring series were then aligned by year, and the Spearman's correlation coefficient of each species was calculated. The results show that most species had increasing growth rates as forests developed after the Korean War. For the last thirty years, 67.3% of species, including Quercus spp. and Zelkova serrata, had positive growth trends; on the other hand, 11.5% of species, including Pinus spp., showed negative growth trends, probably due to changes in successional stages in Korean forests and climate change. These trends also vary with climate zone and species. For example, Pinus densiflora, which showed a negative growth trend overall, had steep negative growth trends in the boreal and temperate zones, whereas it showed no specific trend in the sub-tropical climate zone. Our trend analysis of the growth of 51 temperate tree species will be essential to predict temperate forest species change for this century.
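The indexing and trend test described in this abstract can be sketched in a few lines: each ring width is divided by the expected growth from a regional curve, and the resulting index is correlated with calendar year using Spearman's rank correlation. The function names and the toy numbers are assumptions for illustration; the regional curve itself is taken as given.

```python
def growth_index(observed, expected):
    """Ratio of observed ring width to the expected width from a regional curve."""
    return [o / e for o, e in zip(observed, expected)]


def ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks (needs non-constant input)."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


years = [2001, 2002, 2003, 2004, 2005]
observed = [2.0, 2.2, 2.1, 2.6, 3.0]   # toy ring widths (mm)
expected = [2.0, 2.0, 2.0, 2.0, 2.0]   # toy regional-curve values (mm)
print(spearman(years, growth_index(observed, expected)))
```

A positive coefficient indicates growth above the regional expectation that increases over time; a negative one indicates a declining trend.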
NASA Technical Reports Server (NTRS)
Sugiura, M.; Iyemori, T.; Hoffman, R. A.; Maynard, N. C.; Burch, J. L.; Winningham, J. D.
1984-01-01
The relationships between field-aligned currents, electric fields, and particle fluxes are determined using observations from the polar orbiting low-altitude satellite Dynamics Explorer-2. It is shown that the north-south electric field and the east-west magnetic field components are usually highly correlated in the field-aligned current regions. This proportionality observationally proves that the field-aligned current equals the divergence of the height-integrated ionospheric Pedersen current in the meridional plane to a high degree of approximation. As a general rule, in the evening sector the upward field-aligned currents flow in the boundary plasma sheet region and the downward currents flow in the central plasma sheet region. The current densities determined independently from the plasma and magnetic field measurements are compared. Although the current densities deduced from the two methods are in general agreement, the degree and extent of the agreement vary in individual cases.
NASA Technical Reports Server (NTRS)
Sugiura, M.; Iyemori, T.; Hoffman, R. A.; Maynard, N. C.; Burch, J. L.; Winningham, J. D.
1983-01-01
The relationships between field-aligned currents, electric fields, and particle fluxes are determined using observations from the polar orbiting low-altitude satellite Dynamics Explorer-2. It is shown that the north-south electric field and the east-west magnetic field components are usually highly correlated in the field-aligned current regions. This proportionality observationally proves that the field-aligned current equals the divergence of the height-integrated ionospheric Pedersen current in the meridional plane to a high degree of approximation. As a general rule, in the evening sector the upward field-aligned currents flow in the boundary plasma sheet region and the downward currents flow in the central plasma sheet region. The current densities determined independently from the plasma and magnetic field measurements are compared. Although the current densities deduced from the two methods are in general agreement, the degree and extent of the agreement vary in individual cases.
Franz J. St John; Javier M. Gonzalez; Edwin Pozharski
2010-01-01
In this work glycosyl hydrolase (GH) family 30 (GH30) is analyzed and shown to consist of its currently classified member sequences as well as several homologous sequence groups currently assigned within family GH5. A large scale amino acid sequence alignment and a phylogenetic tree were generated and GH30 groups and subgroups were designated. A partial rearrangement...
Zhang, Qian; Jun, Se -Ran; Leuze, Michael; ...
2017-01-19
The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral tree of life. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. Lastly, the resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses.
Thomas, Paul D; Kejariwal, Anish; Campbell, Michael J; Mi, Huaiyu; Diemer, Karen; Guo, Nan; Ladunga, Istvan; Ulitsky-Lazareva, Betty; Muruganujan, Anushya; Rabkin, Steven; Vandergriff, Jody A; Doremieux, Olivier
2003-01-01
The PANTHER database was designed for high-throughput analysis of protein sequences. One of the key features is a simplified ontology of protein function, which allows browsing of the database by biological functions. Biologist curators have associated the ontology terms with groups of protein sequences rather than individual sequences. Statistical models (Hidden Markov Models, or HMMs) are built from each of these groups. The advantage of this approach is that new sequences can be automatically classified as they become available. To ensure accurate functional classification, HMMs are constructed not only for families, but also for functionally distinct subfamilies. Multiple sequence alignments and phylogenetic trees, including curator-assigned information, are available for each family. The current version of the PANTHER database includes training sequences from all organisms in the GenBank non-redundant protein database, and the HMMs have been used to classify gene products across the entire genomes of human and Drosophila melanogaster. The ontology terms and protein families and subfamilies, as well as Drosophila gene classifications, can be browsed and searched for free. Due to outstanding contractual obligations, access to human gene classifications and to protein family trees and multiple sequence alignments will temporarily require a nominal registration fee. PANTHER is publicly available on the web at http://panther.celera.com.
Molecular Phylogeny of the Bamboo Sharks (Chiloscyllium spp.)
Masstor, Noor Haslina; Samat, Abdullah; Nor, Shukor Md; Md-Zain, Badrul Munir
2014-01-01
Chiloscyllium, commonly called bamboo shark, can be found inhabiting the waters of the Indo-West Pacific around East Asian countries such as Malaysia, Myanmar, Thailand, Singapore, and Indonesia. The International Union for Conservation of Nature (IUCN) Red List has categorized them as Near Threatened because of their declining populations due to overexploitation. A molecular study was carried out to portray the systematic relationships within Chiloscyllium species using 12S rRNA and cytochrome b gene sequences. Maximum parsimony and Bayesian methods were used to reconstruct their phylogenetic trees. A total of 381 bp were successfully aligned in the 12S rRNA region, with 41 sites being parsimony-informative. In the cytochrome b region, a total of 1120 bp were aligned, with 352 parsimony-informative characters. All analyses yield phylogenetic trees in which C. indicum is closely related to C. plagiosum. C. punctatum is the sister taxon to both C. indicum and C. plagiosum, while C. griseum and C. hasseltii form their own clade as sister taxa. These Chiloscyllium classifications are supported by some morphological characters (lateral dermal ridges on the body, coloring patterns, and appearance of hypobranchials and basibranchial plate) that can clearly be used to differentiate each species. PMID:25013766
Zhang, Qian; Jun, Se-Ran; Leuze, Michael; Ussery, David; Nookaew, Intawat
2017-01-01
The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral “tree of life”. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. The resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses. PMID:28102365
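As a rough illustration of two of the quantities named in the abstract above, the following minimal sketch counts k-mers and computes a Shannon diversity index and a count of shared features. The abstract gives no formulas, so the standard definition H = -sum(p * ln p) over k-mer frequencies is assumed here; the function names are hypothetical and the study's exact definitions may differ.

```python
from collections import Counter
from math import log


def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping any k-mer with a non-ACGT character."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over k-mer relative frequencies (assumed form)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log(c / total) for c in counts.values())


def common_features(counts_a, counts_b):
    """Number of distinct k-mers present in both genomes."""
    return len(set(counts_a) & set(counts_b))


if __name__ == "__main__":
    # Toy sequences, not real viral genomes.
    a = kmer_counts("ACGTACGT", 2)
    b = kmer_counts("ACGGACGG", 2)
    print(dict(a), shannon_diversity(a), common_features(a, b))
```

Scanning a range of k values with functions like these is one plausible way to compare candidate feature lengths, although the combined selection strategy itself is not specified in the abstract.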
h → μτ and muon g - 2 in the alignment limit of two-Higgs-doublet model
NASA Astrophysics Data System (ADS)
Wang, Lei; Yang, Shuo; Han, Xiao-Fang
2017-06-01
We examine the h → μτ and muon g - 2 in the exact alignment limit of the two-Higgs-doublet model. In this case, the couplings of the SM-like Higgs to the SM particles are the same as the Higgs couplings in the SM at the tree level, and the tree-level lepton-flavor-violating coupling hμτ is absent. We assume the lepton-flavor-violating μτ excess observed by CMS to be respectively from the other neutral Higgses, H and A, which are almost degenerate with the SM-like Higgs at 125 GeV. After imposing the relevant theoretical constraints and experimental constraints from the precision electroweak data, B-meson decays, τ decays and Higgs searches, we find that the muon g - 2 anomaly and μτ excess favor the small lepton Yukawa coupling and top Yukawa coupling of the non-SM-like Higgs around 125 GeV, and the lepton-flavor-violating coupling is sensitive to another heavy neutral Higgs mass. In addition, if the μτ excess is from H around 125 GeV, the experimental data of the heavy Higgs decaying into μτ favor m_A > 230 GeV for a relatively large Htt̄ coupling.
Gruenstaeudl, Michael; Gerschler, Nico; Borsch, Thomas
2018-06-21
The sequencing and comparison of plastid genomes are becoming a standard method in plant genomics, and many researchers are using this approach to infer plant phylogenetic relationships. Due to the widespread availability of next-generation sequencing, plastid genome sequences are being generated at breakneck pace. This trend towards massive sequencing of plastid genomes highlights the need for standardized bioinformatic workflows. In particular, documentation and dissemination of the details of genome assembly, annotation, alignment and phylogenetic tree inference are needed, as these processes are highly sensitive to the choice of software and the precise settings used. Here, we present the procedure and results of sequencing, assembling, annotating and quality-checking of three complete plastid genomes of the aquatic plant genus Cabomba as well as subsequent gene alignment and phylogenetic tree inference. We accompany our findings by a detailed description of the bioinformatic workflow employed. Importantly, we share a total of eleven software scripts for each of these bioinformatic processes, enabling other researchers to evaluate and replicate our analyses step by step. The results of our analyses illustrate that the plastid genomes of Cabomba are highly conserved in both structure and gene content.
NASA Astrophysics Data System (ADS)
Šilhán, Karel
2017-01-01
Dendrogeomorphic methods are frequently used in landslide analyses. Although methods of landslide dating based on tree rings are well developed, they still raise many questions. The aim of this study was to evaluate the frequently used theoretical scheme based on the event-response relationship. Seventy-four individuals of Norway spruce (Picea abies (L.) Karst.) exhibiting visible external disturbance were sampled on the Girová landslide (the largest historical flow-like landslide in the Czech Republic). This landslide reactivated in May 2010, and post-landslide tree growth responses were studied in detail. These growth responses were compared with the intensity and occurrence of visible external tree disturbance: tilted stems, damaged root systems, and decapitation. Twenty-nine trees (39.2%) died within one to four years following the 2010 landslide movement. The trees that died following the landslide movement were significantly younger and displayed significantly greater stem tilting than the live trees. Abrupt growth suppression was a more frequent response among the dead trees, whereas growth release dominated among the live trees. Only two trees (2.7%) created no reaction wood in response to the landslide movement. Forty-four percent of the trees started to produce reaction wood structure after a delay, which generally spanned one year. Some eccentric growth was evident in the tree rings of the landslide year and was significant in the first years following the landslide movement. Missing rings were observed only on the upper sides of the stems, and no false tree rings were observed. The results confirm the general validity of the event-response relationship; nevertheless, this study points out the limitations and uncertainties of this generally accepted working scheme.
PFAAT version 2.0: a tool for editing, annotating, and analyzing multiple sequence alignments.
Caffrey, Daniel R; Dana, Paul H; Mathur, Vidhya; Ocano, Marco; Hong, Eun-Jong; Wang, Yaoyu E; Somaroo, Shyamal; Caffrey, Brian E; Potluri, Shobha; Huang, Enoch S
2007-10-11
By virtue of their shared ancestry, homologous sequences are similar in their structure and function. Consequently, multiple sequence alignments are routinely used to identify trends that relate to function. This type of analysis is particularly productive when it is combined with structural and phylogenetic analysis. Here we describe the release of PFAAT version 2.0, a tool for editing, analyzing, and annotating multiple sequence alignments. Support for multiple annotations is a key component of this release as it provides a framework for most of the new functionalities. The sequence annotations are accessible from the alignment and tree, where they are typically used to label sequences or hyperlink them to related databases. Sequence annotations can be created manually or extracted automatically from UniProt entries. Once a multiple sequence alignment is populated with sequence annotations, sequences can be easily selected and sorted through a sophisticated search dialog. The selected sequences can be further analyzed using statistical methods that explicitly model relationships between the sequence annotations and residue properties. Residue annotations are accessible from the alignment viewer and are typically used to designate binding sites or properties for a particular residue. Residue annotations are also searchable, and allow one to quickly select alignment columns for further sequence analysis, e.g. computing percent identities. Other features include: novel algorithms to compute sequence conservation, mapping conservation scores to a 3D structure in Jmol, displaying secondary structure elements, and sorting sequences by residue composition. PFAAT provides a framework whereby end-users can specify knowledge for a protein family in the form of annotation. The annotations can be combined with sophisticated analysis to test hypotheses that relate to sequence, structure and function.
Yu, Xianxian; Duan, Xiaoshan; Zhang, Rui; Fu, Xuehao; Ye, Lingling; Kong, Hongzhi; Xu, Guixia; Shan, Hongyan
2016-01-01
AP1/FUL, SEP, AGL6, and FLC subfamily genes play important roles in flower development. The phylogenetic relationships among them, however, have been controversial, which impedes our understanding of the origin and functional divergence of these genes. One possible reason for the controversy may be the problems caused by changes in the exon-intron structure of genes, which, according to recent studies, may generate non-homologous sites and hamper the homology-based sequence alignment. In this study, we first performed exon-by-exon alignments of these and three outgroup subfamilies (SOC1, AG, and STK). Phylogenetic trees reconstructed based on these matrices show improved resolution and better congruence with species phylogeny. In the context of these phylogenies, we traced evolutionary changes of exon-intron structures in each subfamily. We found that structural changes have occurred frequently following gene duplication and speciation events. Notably, exons 7 and 8 (if present) suffered more structural changes than others. With the knowledge of exon-intron structural changes, we generated more reasonable alignments containing all the focal subfamilies. The resulting trees showed that the SEP subfamily is sister to the monophyletic group formed by AP1/FUL and FLC subfamily genes and that the AGL6 subfamily forms a sister group to the three abovementioned subfamilies. Based on this topology, we inferred the evolutionary history of exon-intron structural changes among different subfamilies. Particularly, we found that the eighth exon originated before the divergence of AP1/FUL, FLC, SEP, and AGL6 subfamilies and degenerated in the ancestral FLC-like gene. These results provide new insights into the origin and evolution of the AP1/FUL, FLC, SEP, and AGL6 subfamilies. PMID:27200066
NASA Technical Reports Server (NTRS)
Cunningham, William C. (Inventor)
1987-01-01
A remotely controlled spray gun is described in which a nozzle and orifice plate are held in precise axial alignment by an alignment member, which in turn is held in alignment with the general outlet of the spray gun by insert. By this arrangement, the precise repeatability of spray patterns is insured.
Dong, Zheng; Zhou, Hongyu; Tao, Peng
2018-02-01
PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore functional evolutionary relationship among proteins in the PAS domain superfamily in view of the sequence-structure-dynamics-function relationship. We collected protein sequences and crystal structure data from RCSB Protein Data Bank of the PAS domain superfamily belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence-conserved residues and build phylogenetic tree. Three-dimensional structure alignment was also applied to obtain structure-conserved residues. The protein dynamics were analyzed using elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The result showed that the proteins with same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant difference in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have a lower fluctuation than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggested a direct connection in which the protein sequences were related to various functions through structural dynamics. This is a new attempt to delineate functional evolution of proteins using the integrated information of sequence, structure, and dynamics. © 2017 The Protein Society.
Accuracy Estimation and Parameter Advising for Protein Multiple Sequence Alignment
DeBlasio, Dan
2013-01-01
We develop a novel and general approach to estimating the accuracy of multiple sequence alignments without knowledge of a reference alignment, and use our approach to address a new task that we call parameter advising: the problem of choosing values for alignment scoring function parameters from a given set of choices to maximize the accuracy of a computed alignment. For protein alignments, we consider twelve independent features that contribute to a quality alignment. An accuracy estimator is learned that is a polynomial function of these features; its coefficients are determined by minimizing its error with respect to true accuracy using mathematical optimization. Compared to prior approaches for estimating accuracy, our new approach (a) introduces novel feature functions that measure nonlocal properties of an alignment yet are fast to evaluate, (b) considers more general classes of estimators beyond linear combinations of features, and (c) develops new regression formulations for learning an estimator from examples; in addition, for parameter advising, we (d) determine the optimal parameter set of a given cardinality, which specifies the best parameter values from which to choose. Our estimator, which we call Facet (for “feature-based accuracy estimator”), yields a parameter advisor that on the hardest benchmarks provides more than a 27% improvement in accuracy over the best default parameter choice, and for parameter advising significantly outperforms the best prior approaches to assessing alignment quality. PMID:23489379
36 CFR 223.4 - Exchange of trees or portions of trees.
Code of Federal Regulations, 2010 CFR
2010-07-01
... 36 Parks, Forests, and Public Property 2 2010-07-01 2010-07-01 false Exchange of trees or portions of trees. 223.4 Section 223.4 Parks, Forests, and Public Property FOREST SERVICE, DEPARTMENT OF AGRICULTURE SALE AND DISPOSAL OF NATIONAL FOREST SYSTEM TIMBER General Provisions § 223.4 Exchange of trees or...
ERIC Educational Resources Information Center
Bridgman, Anne
2017-01-01
With most 4 year olds in the United States now in center-based early care, the need for aligning instruction from preschool through the early grades (PK-3) has become more pressing. Yet so far there has been little guidance on how to create alignment. Research on PK-3 alignment seeks to provide general principles for creating instructional…
DiCanio, Christian; Nam, Hosung; Whalen, Douglas H.; Timothy Bunnell, H.; Amith, Jonathan D.; García, Rey Castillo
2013-01-01
While efforts to document endangered languages have steadily increased, the phonetic analysis of endangered language data remains a challenge. The transcription of large documentation corpora is, by itself, a tremendous feat. Yet, the process of segmentation remains a bottleneck for research with data of this kind. This paper examines whether a speech processing tool, forced alignment, can facilitate the segmentation task for small data sets, even when the target language differs from the training language. The authors also examined whether a phone set with contextualization outperforms a more general one. The accuracy of two forced aligners trained on English (hmalign and p2fa) was assessed using corpus data from Yoloxóchitl Mixtec. Overall, agreement performance was relatively good, with accuracy at 70.9% within 30 ms for hmalign and 65.7% within 30 ms for p2fa. Segmental and tonal categories influenced accuracy as well. For instance, additional stop allophones in hmalign's phone set aided alignment accuracy. Agreement differences between aligners also corresponded closely with the types of data on which the aligners were trained. Overall, using existing alignment systems was found to have potential for making phonetic analysis of small corpora more efficient, with more allophonic phone sets providing better agreement than general ones. PMID:23967953
DiCanio, Christian; Nam, Hosung; Whalen, Douglas H; Bunnell, H Timothy; Amith, Jonathan D; García, Rey Castillo
2013-09-01
While efforts to document endangered languages have steadily increased, the phonetic analysis of endangered language data remains a challenge. The transcription of large documentation corpora is, by itself, a tremendous feat. Yet, the process of segmentation remains a bottleneck for research with data of this kind. This paper examines whether a speech processing tool, forced alignment, can facilitate the segmentation task for small data sets, even when the target language differs from the training language. The authors also examined whether a phone set with contextualization outperforms a more general one. The accuracy of two forced aligners trained on English (hmalign and p2fa) was assessed using corpus data from Yoloxóchitl Mixtec. Overall, agreement performance was relatively good, with accuracy at 70.9% within 30 ms for hmalign and 65.7% within 30 ms for p2fa. Segmental and tonal categories influenced accuracy as well. For instance, additional stop allophones in hmalign's phone set aided alignment accuracy. Agreement differences between aligners also corresponded closely with the types of data on which the aligners were trained. Overall, using existing alignment systems was found to have potential for making phonetic analysis of small corpora more efficient, with more allophonic phone sets providing better agreement than general ones.
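The "accuracy within 30 ms" figures reported above can be read as the share of automatically placed segment boundaries that fall within a tolerance of the corresponding hand-placed boundaries. The sketch below shows that reading under stated assumptions: boundaries are already paired one-to-one, times are in integer milliseconds, and the function name is hypothetical. The boundary values in the usage lines are invented for illustration, not taken from the study.

```python
def agreement_within(auto_ms, manual_ms, tol_ms=30):
    """Fraction of paired boundaries whose absolute time difference is <= tol_ms.

    auto_ms and manual_ms are equal-length lists of boundary times in
    milliseconds, assumed to be matched one-to-one in order.
    """
    if len(auto_ms) != len(manual_ms):
        raise ValueError("boundary lists must have the same length")
    if not auto_ms:
        raise ValueError("at least one boundary pair is required")
    hits = sum(1 for a, m in zip(auto_ms, manual_ms) if abs(a - m) <= tol_ms)
    return hits / len(auto_ms)


if __name__ == "__main__":
    # Invented boundary times (ms), for illustration only.
    auto = [100, 250, 400, 610]
    manual = [110, 300, 395, 600]
    print(agreement_within(auto, manual))
```

Working in integer milliseconds avoids floating-point edge cases when a difference sits exactly on the tolerance.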
Accelerated probabilistic inference of RNA structure evolution
Holmes, Ian
2005-01-01
Background Pairwise stochastic context-free grammars (Pair SCFGs) are powerful tools for evolutionary analysis of RNA, including simultaneous RNA sequence alignment and secondary structure prediction, but the associated algorithms are intensive in both CPU and memory usage. The same problem is faced by other RNA alignment-and-folding algorithms based on Sankoff's 1985 algorithm. It is therefore desirable to constrain such algorithms, by pre-processing the sequences and using this first pass to limit the range of structures and/or alignments that can be considered. Results We demonstrate how flexible classes of constraint can be imposed, greatly reducing the computational costs while maintaining a high quality of structural homology prediction. Any score-attributed context-free grammar (e.g. energy-based scoring schemes, or conditionally normalized Pair SCFGs) is amenable to this treatment. It is now possible to combine independent structural and alignment constraints of unprecedented general flexibility in Pair SCFG alignment algorithms. We outline several applications to the bioinformatics of RNA sequence and structure, including Waterman-Eggert N-best alignments and progressive multiple alignment. We evaluate the performance of the algorithm on test examples from the RFAM database. Conclusion A program, Stemloc, that implements these algorithms for efficient RNA sequence alignment and structure prediction is available under the GNU General Public License. PMID:15790387
Kobayashi, Toshiki; Orendurff, Michael S; Zhang, Ming; Boone, David A
2014-05-01
The alignment of transtibial prostheses has a systematic effect on the mean socket reaction moments in amputees. However, understanding their individual differences in response to alignment perturbations is also important for prosthetists to fully utilize the socket reaction moments for dynamic alignment in each unique patient. The aim of this study was to investigate individual responses to alignment perturbations in transtibial prostheses with solid-ankle-cushion-heel feet. A custom instrumented prosthesis alignment component was used to measure the socket reaction moments while walking in 11 amputees with transtibial prostheses under 17 alignment conditions, including 3° and 6° of flexion, extension, abduction, and adduction of the socket, 5mm and 10mm of anterior, posterior, lateral, and medial translation of the socket, and an initial baseline alignment. Coronal moments at 30% of stance and maximum sagittal moments were extracted for comparisons from each amputee. In the coronal plane, varus moment at 30% of stance was generally reduced by adduction or medial translation of the socket in all the amputees. In the sagittal plane, extension moment was generally increased by posterior translation or flexion of the socket; however, this was not necessarily the case for all the amputees. Individual responses to alignment perturbations are not always consistent, and prosthetists would need to be aware of this variance when addressing individual socket reaction moments during dynamic alignment in clinical setting. Copyright © 2014 Elsevier Ltd. All rights reserved.
Phylogenetic Invariants for Metazoan Mitochondrial Genome Evolution.
Sankoff; Blanchette
1998-01-01
The method of phylogenetic invariants was developed to apply to aligned sequence data generated, according to a stochastic substitution model, for N species related through an unknown phylogenetic tree. The invariants are functions of the probabilities of the observable N-tuples, which are identically zero, over all choices of branch length, for some trees. Evaluating the invariants associated with all possible trees, using observed N-tuple frequencies over all sequence positions, enables us to rapidly infer the generating tree. An aspect of evolution at the genomic level much studied recently is the rearrangements of gene order along the chromosome from one species to another. Instead of the substitutions responsible for sequence evolution, we examine the non-local processes responsible for genome rearrangements such as inversion of arbitrarily long segments of chromosomes. By treating the potential adjacency of each possible pair of genes as a "position", an appropriate "substitution" model can be recognized as governing the rearrangement process, and a probabilistically principled phylogenetic inference can be set up. We calculate the invariants for this process for N=5, and apply them to mitochondrial genome data from coelomate metazoans, showing how they resolve key aspects of branching order.
Modeling non-linear growth responses to temperature and hydrology in wetland trees
NASA Astrophysics Data System (ADS)
Keim, R.; Allen, S. T.
2016-12-01
Growth responses of wetland trees to flooding and climate variations are difficult to model because they depend on multiple, apparently interacting factors, but are a critical link in hydrological control of wetland carbon budgets. To more generally understand tree growth responses to hydrological forcing, we modeled non-linear responses of tree ring growth to flooding and climate at sub-annual time steps, using Vaganov-Shashkin response functions. We calibrated the model to six baldcypress tree-ring chronologies from two hydrologically distinct sites in southern Louisiana, and tested several hypotheses of plasticity in wetland tree responses to interacting environmental variables. The model outperformed traditional multiple linear regression. More importantly, optimized response parameters were generally similar among sites with varying hydrological conditions, suggesting generality to the functions. Model forms that included interacting responses to multiple forcing factors were more effective than were single response functions, indicating the principle of a single limiting factor is not correct in wetlands and both climatic and hydrological variables must be considered in predicting responses to hydrological or climate change.
QueTAL: a suite of tools to classify and compare TAL effectors functionally and phylogenetically
Pérez-Quintero, Alvaro L.; Lamy, Léo; Gordon, Jonathan L.; Escalon, Aline; Cunnac, Sébastien; Szurek, Boris; Gagnevin, Lionel
2015-01-01
Transcription Activator-Like (TAL) effectors from Xanthomonas plant pathogenic bacteria can bind to the promoter region of plant genes and induce their expression. DNA-binding specificity is governed by a central domain made of nearly identical repeats, each determining the recognition of one base pair via two amino acid residues (a.k.a. Repeat Variable Di-residue, or RVD). Knowing how TAL effectors differ from each other within and between strains would be useful to infer functional and evolutionary relationships, but their repetitive nature precludes reliable use of traditional alignment methods. The suite QueTAL was therefore developed to offer tailored tools for comparison of TAL effector genes. The program DisTAL considers each repeat as a unit, transforms a TAL effector sequence into a sequence of coded repeats and makes pair-wise alignments between these coded sequences to construct trees. The program FuncTAL is aimed at finding TAL effectors with similar DNA-binding capabilities. It calculates correlations between position weight matrices of potential target DNA sequence predicted from the RVD sequence, and builds trees based on these correlations. The programs accurately represented phylogenetic and functional relationships between TAL effectors using either simulated or literature-curated data. When using the programs on a large set of TAL effector sequences, the DisTAL tree largely reflected the expected species phylogeny. In contrast, FuncTAL showed that TAL effectors with similar binding capabilities can be found between phylogenetically distant taxa. This suite will help users to rapidly analyse any TAL effector genes of interest and compare them to other available TAL genes and should improve our understanding of TAL effectors evolution. It is available at http://bioinfo-web.mpl.ird.fr/cgi-bin2/quetal/quetal.cgi. PMID:26284082
Chandrasekaran, Srinivas Niranj; Yardimci, Galip Gürkan; Erdogan, Ozgün; Roach, Jeffrey; Carter, Charles W.
2013-01-01
We tested the idea that ancestral class I and II aminoacyl-tRNA synthetases arose on opposite strands of the same gene. We assembled excerpted 94-residue Urgenes for class I tryptophanyl-tRNA synthetase (TrpRS) and class II Histidyl-tRNA synthetase (HisRS) from a diverse group of species, by identifying and catenating three blocks coding for secondary structures that position the most highly conserved, active-site residues. The codon middle-base pairing frequency was 0.35 ± 0.0002 in all-by-all sense/antisense alignments for 211 TrpRS and 207 HisRS sequences, compared with frequencies between 0.22 ± 0.0009 and 0.27 ± 0.0005 for eight different representations of the null hypothesis. Clustering algorithms demonstrate further that profiles of middle-base pairing in the synthetase antisense alignments are correlated along the sequences from one species-pair to another, whereas this is not the case for similar operations on sets representing the null hypothesis. Most probable reconstructed sequences for ancestral nodes of maximum likelihood trees show that middle-base pairing frequency increases to approximately 0.42 ± 0.002 as bacterial trees approach their roots; ancestral nodes from trees including archaeal sequences show a less pronounced increase. Thus, contemporary and reconstructed sequences all validate important bioinformatic predictions based on descent from opposite strands of the same ancestral gene. They further provide novel evidence for the hypothesis that bacteria lie closer than archaea to the origin of translation. Moreover, the inverse polarity of genetic coding, together with a priori α-helix propensities suggest that in-frame coding on opposite strands leads to similar secondary structures with opposite polarity, as observed in TrpRS and HisRS crystal structures. PMID:23576570
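To make the middle-base pairing statistic above concrete, here is a minimal sketch of one way to compute it for a single pair of equal-length coding sequences. It assumes an ungapped, in-frame sense/antisense alignment, so that codon i of one gene lies opposite codon n-1-i of the other; the function name and the toy sequences are hypothetical, and the study's actual alignment procedure is not described in the abstract.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def middle_base_pairing_frequency(sense, other):
    """Fraction of codons whose middle bases are Watson-Crick complementary
    when `other` is laid antiparallel against `sense`.

    Both inputs are coding sequences read 5'->3', of equal length divisible
    by three. Codon i of `sense` is compared with codon n-1-i of `other`.
    """
    sense, other = sense.upper(), other.upper()
    if not sense or len(sense) != len(other) or len(sense) % 3 != 0:
        raise ValueError("sequences must be non-empty, equal length, multiple of 3")
    n = len(sense) // 3
    paired = 0
    for i in range(n):
        a = sense[3 * i + 1]            # middle base of codon i
        b = other[3 * (n - 1 - i) + 1]  # middle base of the opposing codon
        if COMPLEMENT.get(a) == b:
            paired += 1
    return paired / n


if __name__ == "__main__":
    # A sequence against its own reverse complement pairs at every codon.
    print(middle_base_pairing_frequency("ATGGCC", "GGCCAT"))
```

With four equally frequent bases and no relationship between the strands, a value near 0.25 would be expected by chance, which is in line with the 0.22 to 0.27 range the abstract reports for its null-hypothesis sets.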
Creating a medical dictionary using word alignment: the influence of sources and resources.
Nyström, Mikael; Merkel, Magnus; Petersson, Håkan; Ahlfeldt, Hans
2007-11-23
Automatic word alignment of parallel texts with the same content in different languages is among other things used to generate dictionaries for new translations. The quality of the generated word alignment depends on the quality of the input resources. In this paper we report on automatic word alignment of the English and Swedish versions of the medical terminology systems ICD-10, ICF, NCSP, KSH97-P and parts of MeSH and how the terminology systems and type of resources influence the quality. We automatically word aligned the terminology systems using static resources, like dictionaries, statistical resources, like statistically derived dictionaries, and training resources, which were generated from manual word alignment. We varied which part of the terminology systems that we used to generate the resources, which parts that we word aligned and which types of resources we used in the alignment process to explore the influence the different terminology systems and resources have on the recall and precision. After the analysis, we used the best configuration of the automatic word alignment for generation of candidate term pairs. We then manually verified the candidate term pairs and included the correct pairs in an English-Swedish dictionary. The results indicate that more resources and resource types give better results but the size of the parts used to generate the resources only partly affects the quality. The most generally useful resources were generated from ICD-10 and resources generated from MeSH were not as general as other resources. Systematic inter-language differences in the structure of the terminology system rubrics make the rubrics harder to align. Manually created training resources give nearly as good results as a union of static resources, statistical resources and training resources and noticeably better results than a union of static resources and statistical resources. 
The verified English-Swedish dictionary contains 24,000 term pairs in base forms. More resources give better results in the automatic word alignment, but some resources only give small improvements. The most important type of resource is training and the most general resources were generated from ICD-10.
Creating a medical dictionary using word alignment: The influence of sources and resources
Nyström, Mikael; Merkel, Magnus; Petersson, Håkan; Åhlfeldt, Hans
2007-01-01
Background Automatic word alignment of parallel texts with the same content in different languages is, among other things, used to generate dictionaries for new translations. The quality of the generated word alignment depends on the quality of the input resources. In this paper we report on automatic word alignment of the English and Swedish versions of the medical terminology systems ICD-10, ICF, NCSP, KSH97-P and parts of MeSH, and on how the terminology systems and type of resources influence the quality. Methods We automatically word aligned the terminology systems using static resources, like dictionaries, statistical resources, like statistically derived dictionaries, and training resources, which were generated from manual word alignment. We varied which parts of the terminology systems we used to generate the resources, which parts we word aligned, and which types of resources we used in the alignment process, to explore the influence the different terminology systems and resources have on recall and precision. After the analysis, we used the best configuration of the automatic word alignment for generation of candidate term pairs. We then manually verified the candidate term pairs and included the correct pairs in an English-Swedish dictionary. Results The results indicate that more resources and resource types give better results but the size of the parts used to generate the resources only partly affects the quality. The most generally useful resources were generated from ICD-10 and resources generated from MeSH were not as general as other resources. Systematic inter-language differences in the structure of the terminology system rubrics make the rubrics harder to align. Manually created training resources give nearly as good results as a union of static resources, statistical resources and training resources and noticeably better results than a union of static resources and statistical resources. The verified English-Swedish dictionary contains 24,000 term pairs in base forms. Conclusion More resources give better results in the automatic word alignment, but some resources only give small improvements. The most important type of resource is training and the most general resources were generated from ICD-10. PMID:18036221
PARTS: Probabilistic Alignment for RNA joinT Secondary structure prediction
Harmanci, Arif Ozgun; Sharma, Gaurav; Mathews, David H.
2008-01-01
A novel method is presented for joint prediction of alignment and common secondary structures of two RNA sequences. The joint consideration of common secondary structures and alignment is accomplished by structural alignment over a search space defined by the newly introduced motif called matched helical regions. The matched helical region formulation generalizes previously employed constraints for structural alignment and thereby better accommodates the structural variability within RNA families. A probabilistic model based on pseudo free energies obtained from precomputed base pairing and alignment probabilities is utilized for scoring structural alignments. Maximum a posteriori (MAP) common secondary structures, sequence alignment and joint posterior probabilities of base pairing are obtained from the model via a dynamic programming algorithm called PARTS. The advantage of the more general structural alignment of PARTS is seen in secondary structure predictions for the RNase P family. For this family, the PARTS MAP predictions of secondary structures and alignment perform significantly better than prior methods that utilize a more restrictive structural alignment model. For the tRNA and 5S rRNA families, the richer structural alignment model of PARTS does not offer a benefit and the method therefore performs comparably with existing alternatives. For all RNA families studied, the posterior probability estimates obtained from PARTS offer an improvement over posterior probability estimates from a single sequence prediction. When considering the base pairings predicted over a threshold value of confidence, the combination of sensitivity and positive predictive value is superior for PARTS than for the single sequence prediction. PARTS source code is available for download under the GNU public license at http://rna.urmc.rochester.edu. PMID:18304945
16 CFR 501.2 - Christmas tree ornaments.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 16 Commercial Practices 1 2010-01-01 2010-01-01 false Christmas tree ornaments. 501.2 Section 501.2 Commercial Practices FEDERAL TRADE COMMISSION RULES, REGULATIONS, STATEMENT OF GENERAL POLICY OR... PROHIBITIONS UNDER PART 500 § 501.2 Christmas tree ornaments. Christmas tree ornaments packaged and labeled for...
Probabilistic atlas based labeling of the cerebral vessel tree
NASA Astrophysics Data System (ADS)
Van de Giessen, Martijn; Janssen, Jasper P.; Brouwer, Patrick A.; Reiber, Johan H. C.; Lelieveldt, Boudewijn P. F.; Dijkstra, Jouke
2015-03-01
Preoperative imaging of the cerebral vessel tree is essential for planning therapy on intracranial stenoses and aneurysms. Usually, a magnetic resonance angiography (MRA) or computed tomography angiography (CTA) is acquired from which the cerebral vessel tree is segmented. Accurate analysis is helped by the labeling of the cerebral vessels, but labeling is non-trivial due to anatomical topological variability and missing branches due to acquisition issues. In recent literature, labeling the cerebral vasculature around the Circle of Willis has mainly been approached as a graph-based problem. The most successful method, however, requires the definition of all possible permutations of missing vessels, which limits application to subsets of the tree and ignores spatial information about the vessel locations. This research aims to perform labeling using probabilistic atlases that model spatial vessel and label likelihoods. A cerebral vessel tree is aligned to a probabilistic atlas and subsequently each vessel is labeled by computing the maximum label likelihood per segment from label-specific atlases. The proposed method was validated on 25 segmented cerebral vessel trees. Labeling accuracies were close to 100% for large vessels, but dropped to 50-60% for small vessels that were only present in less than 50% of the set. With this work we showed that using solely spatial information of the vessel labels, vessel segments from stable vessels (>50% presence) were reliably classified. This spatial information will form the basis for a future labeling strategy with a very loose topological model.
Generalized Processing Tree Models: Jointly Modeling Discrete and Continuous Variables.
Heck, Daniel W; Erdfelder, Edgar; Kieslich, Pascal J
2018-05-24
Multinomial processing tree models assume that discrete cognitive states determine observed response frequencies. Generalized processing tree (GPT) models extend this conceptual framework to continuous variables such as response times, process-tracing measures, or neurophysiological variables. GPT models assume finite-mixture distributions, with weights determined by a processing tree structure, and continuous components modeled by parameterized distributions such as Gaussians with separate or shared parameters across states. We discuss identifiability, parameter estimation, model testing, a modeling syntax, and the improved precision of GPT estimates. Finally, a GPT version of the feature comparison model of semantic categorization is applied to computer-mouse trajectories.
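The finite-mixture idea in this abstract can be illustrated with a small Python sketch. It is not the authors' modeling syntax or software: the two-parameter tree (parameters `d` and `g`), the branch names, and the Gaussian component values are hypothetical choices made only to show how tree-derived branch probabilities act as mixture weights for a continuous variable.

```python
import math
from typing import Dict, Tuple


def gaussian_pdf(y: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def branch_weights(d: float, g: float) -> Dict[str, float]:
    """Branch probabilities of a hypothetical two-parameter processing tree.

    With probability d a 'detect' state is entered; otherwise a guess is made,
    which is 'yes' with probability g and 'no' with probability 1 - g.
    """
    return {
        "detect": d,
        "guess_yes": (1.0 - d) * g,
        "guess_no": (1.0 - d) * (1.0 - g),
    }


def mixture_density(
    y: float,
    weights: Dict[str, float],
    components: Dict[str, Tuple[float, float]],
) -> float:
    """Finite-mixture density: branch probabilities weight per-branch Gaussians."""
    return sum(w * gaussian_pdf(y, *components[name]) for name, w in weights.items())


# Illustrative (made-up) response-time components in milliseconds: (mean, sd).
# The two guessing branches share their standard deviation.
COMPONENTS = {
    "detect": (400.0, 50.0),
    "guess_yes": (700.0, 100.0),
    "guess_no": (750.0, 100.0),
}
```

Calling `mixture_density(500.0, branch_weights(0.6, 0.5), COMPONENTS)` evaluates the mixture at a response time of 500 ms under these assumed values.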
Multilevel Evaluation Alignment: An Explication of a Four-Step Model
ERIC Educational Resources Information Center
Yang, Huilan; Shen, Jianping; Cao, Honggao; Warfield, Charles
2004-01-01
Using the evaluation work on the W.K. Kellogg Foundation's Unleashing Resources Initiative as an example, in this article we explicate a general four-step model appropriate for multilevel evaluation alignment. We review the relevant literature, argue for the need for evaluation alignment in a multilevel context, explain the four-step model,…
Teacher Education, Experience, and the Practice of Aligned Instruction
ERIC Educational Resources Information Center
Polikoff, Morgan S.
2013-01-01
Research over the past two decades has shown the alignment of teachers’ instruction with state standards is generally weak. Proposing that alignment is a useful measure of teachers' curricular knowledge (Shulman, 1986), this study uses a large database of teacher reports of their content coverage to understand the relationship of teacher…
What happens to living cull trees left after heavy cutting in mixed hardwood stands?
George R., Jr. Trimble; Henry Clay Smith
1963-01-01
In the Appalachian Mountains, the logging operator usually cuts only those trees that he thinks will yield a profit, and leaves the trees that appear to be unprofitable. Generally these unprofitable trees are either below merchantable size or are culls, that is, trees of merchantable size that contain too little sound material to justify harvesting costs.
Growth of black walnut trees in eight midwestern states -- a provenance test.
Calvin F. Bey
1973-01-01
At age six, survival of black walnut trees was not related to latitude of source at six out of eight locations. Trees from as far as 200 miles south of the planting generally grew as large or larger than trees from local or northern sources.
Tree-Ring-Based Reconstruction of Precipitation in the Bighorn Basin, Wyoming, since 1260 a.d.
NASA Astrophysics Data System (ADS)
Gray, Stephen T.; Fastie, Christopher L.; Jackson, Stephen T.; Betancourt, Julio L.
2004-10-01
C-semiring Frameworks for Minimum Spanning Tree Problems
NASA Astrophysics Data System (ADS)
Bistarelli, Stefano; Santini, Francesco
In this paper we define general algebraic frameworks for the Minimum Spanning Tree problem based on the structure of c-semirings. We propose general algorithms that can compute such trees by following different cost criteria, which must all be specific instantiations of c-semirings. Our algorithms are extensions of well-known procedures, such as Prim's or Kruskal's, and show the expressivity of these algebraic structures. They can also deal with partially ordered costs on the edges.
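As a rough illustration of the idea, the Python sketch below runs a Kruskal-style procedure whose cost criterion is supplied as a c-semiring. It is an assumption-laden sketch, not the authors' algorithm: the `CSemiring` class and its field names are hypothetical, and it only handles totally ordered costs, whereas the abstract also covers partially ordered ones. The two instances shown are the standard weighted (min, +) and fuzzy (max, min) c-semirings.

```python
from dataclasses import dataclass
from functools import cmp_to_key
from typing import Callable, List, Tuple

Edge = Tuple[int, int, float]  # (endpoint u, endpoint v, cost)


@dataclass(frozen=True)
class CSemiring:
    plus: Callable[[float, float], float]   # idempotent "choose"; induces the order
    times: Callable[[float, float], float]  # combines the costs of selected edges
    zero: float                             # worst element
    one: float                              # best element (identity of times)

    def better_or_equal(self, a: float, b: float) -> bool:
        # a is at least as good as b exactly when plus(a, b) returns a
        return self.plus(a, b) == a


WEIGHTED = CSemiring(plus=min, times=lambda a, b: a + b, zero=float("inf"), one=0.0)
FUZZY = CSemiring(plus=max, times=min, zero=0.0, one=1.0)


def kruskal(n: int, edges: List[Edge], sr: CSemiring) -> Tuple[List[Edge], float]:
    """Kruskal-style spanning tree; assumes the semiring order on costs is total."""

    def cmp(e: Edge, f: Edge) -> int:
        if e[2] == f[2]:
            return 0
        return -1 if sr.better_or_equal(e[2], f[2]) else 1

    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree: List[Edge] = []
    total = sr.one
    for u, v, c in sorted(edges, key=cmp_to_key(cmp)):  # best edges first
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components, so it cannot close a cycle
            parent[ru] = rv
            tree.append((u, v, c))
            total = sr.times(total, c)
    return tree, total
```

With `WEIGHTED` this reduces to the classical minimum-total-weight tree; with `FUZZY` it returns a tree whose worst edge preference is as high as possible.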
Measuring the distance between multiple sequence alignments.
Blackburne, Benjamin P; Whelan, Simon
2012-02-15
Multiple sequence alignment (MSA) is a core method in bioinformatics. The accuracy of such alignments may influence the success of downstream analyses such as phylogenetic inference, protein structure prediction, and functional prediction. The importance of MSA has led to the proliferation of MSA methods, with different objective functions and heuristics to search for the optimal MSA. Different methods of inferring MSAs produce different results in all but the most trivial cases. By measuring the differences between inferred alignments, we may be able to develop an understanding of how these differences (i) relate to the objective functions and heuristics used in MSA methods, and (ii) affect downstream analyses. We introduce four metrics to compare MSAs, which include the position in a sequence where a gap occurs or the location on a phylogenetic tree where an insertion or deletion (indel) event occurs. We use both real and synthetic data to explore the information given by these metrics and demonstrate how the different metrics in combination can yield more information about MSA methods and the differences between them. MetAl is a free software implementation of these metrics in Haskell. Source and binaries for Windows, Linux and Mac OS X are available from http://kumiho.smith.man.ac.uk/whelan/software/metal/.
Choosing and Using Introns in Molecular Phylogenetics
Creer, Simon
2007-01-01
Introns are now commonly used in molecular phylogenetics in an attempt to recover gene trees that are concordant with species trees, but there are a range of genomic, logistical and analytical considerations that are infrequently discussed in empirical studies that utilize intron data. This review outlines expedient approaches for locus selection, overcoming paralogy problems, recombination detection methods and the identification and incorporation of LVHs in molecular systematics. A range of parsimony and Bayesian analytical approaches are also described in order to highlight the methods that can currently be employed to align sequences and treat indels in subsequent analyses. By covering the main points associated with the generation and analysis of intron data, this review aims to provide a comprehensive introduction to using introns (or any non-coding nuclear data partition) in contemporary phylogenetics. PMID:19461984
SimPhy: Phylogenomic Simulation of Gene, Locus, and Species Trees
Mallo, Diego; De Oliveira Martins, Leonardo; Posada, David
2016-01-01
We present a fast and flexible software package—SimPhy—for the simulation of multiple gene families evolving under incomplete lineage sorting, gene duplication and loss, horizontal gene transfer—all three potentially leading to species tree/gene tree discordance—and gene conversion. SimPhy implements a hierarchical phylogenetic model in which the evolution of species, locus, and gene trees is governed by global and local parameters (e.g., genome-wide, species-specific, locus-specific), that can be fixed or be sampled from a priori statistical distributions. SimPhy also incorporates comprehensive models of substitution rate variation among lineages (uncorrelated relaxed clocks) and the capability of simulating partitioned nucleotide, codon, and protein multilocus sequence alignments under a plethora of substitution models using the program INDELible. We validate SimPhy's output using theoretical expectations and other programs, and show that it scales extremely well with complex models and/or large trees, being an order of magnitude faster than the most similar program (DLCoal-Sim). In addition, we demonstrate how SimPhy can be useful to understand interactions among different evolutionary processes, conducting a simulation study to characterize the systematic overestimation of the duplication time when using standard reconciliation methods. SimPhy is available at https://github.com/adamallo/SimPhy, where users can find the source code, precompiled executables, a detailed manual and example cases. PMID:26526427
Leaf area compounds height-related hydraulic costs of water transport in Oregon White Oak trees.
N. Phillips; B. J. Bond; N. G. McDowell; Michael G. Ryan; A. Schauer
2003-01-01
The ratio of leaf to sapwood area generally decreases with tree size, presumably to moderate hydraulic costs of tree height. This study assessed consequences of tree size and leaf area on water flux in Quercus garryana Dougl. ex. Hook (Oregon White Oak), a species in which leaf to sapwood area ratio increases with tree size. We tested hypotheses that...
Baldwin, Elizabeth; Plotto, Anne; Bai, Jinhe; Manthey, John; Zhao, Wei; Raithore, Smita; Irey, Mike
2018-03-21
Orange trees affected by huanglongbing (HLB) exhibit excessive fruit drop, and fruit loosely attached to the tree may have inferior flavor. Fruit were collected from healthy and HLB-infected (Candidatus Liberibacter asiaticus) 'Hamlin' and 'Valencia' trees. Prior to harvest, the trees were shaken, fruit that dropped collected, tree-retained fruit harvested, and all fruit juiced. For chemical analyses, sugars and acids were generally lowest in HLB dropped (HLB-D) fruit juice compared to nonshaken healthy (H), healthy retained (H-R), and healthy dropped fruit (H-D) in early season (December) but not for the late season (January) 'Hamlin' or 'Valencia' except for sugar/acid ratio. The bitter limonoids, many flavonoids, and terpenoid volatiles were generally higher in HLB juice, especially HLB-D juice, compared to the other samples. The lower sugars, higher bitter limonoids, flavonoids, and terpenoid volatiles in HLB-D fruit, loosely attached to the tree, contributed to off-flavor, as was confirmed by sensory analyses.
Reasoning over taxonomic change: exploring alignments for the Perelleschus use case.
Franz, Nico M; Chen, Mingmin; Yu, Shizhuo; Kianmajd, Parisa; Bowers, Shawn; Ludäscher, Bertram
2015-01-01
Classifications and phylogenetic inferences of organismal groups change in light of new insights. Over time these changes can result in an imperfect tracking of taxonomic perspectives through the re-/use of Code-compliant or informal names. To mitigate these limitations, we introduce a novel approach for aligning taxonomies through the interaction of human experts and logic reasoners. We explore the performance of this approach with the Perelleschus use case of Franz & Cardona-Duque (2013). The use case includes six taxonomies published from 1936 to 2013, 54 taxonomic concepts (i.e., circumscriptions of names individuated according to their respective source publications), and 75 expert-asserted Region Connection Calculus articulations (e.g., congruence, proper inclusion, overlap, or exclusion). An Open Source reasoning toolkit is used to analyze 13 paired Perelleschus taxonomy alignments under heterogeneous constraints and interpretations. The reasoning workflow optimizes the logical consistency and expressiveness of the input and infers the set of maximally informative relations among the entailed taxonomic concepts. The latter are then used to produce merge visualizations that represent all congruent and non-congruent taxonomic elements among the aligned input trees. In this small use case with 6-53 input concepts per alignment, the information gained through the reasoning process is on average one order of magnitude greater than in the input. The approach offers scalable solutions for tracking provenance among succeeding taxonomic perspectives that may have differential biases in naming conventions, phylogenetic resolution, ingroup and outgroup sampling, or ostensive (member-referencing) versus intensional (property-referencing) concepts and articulations.
76 FR 71241 - Christmas Tree Promotion, Research, and Information Order; Stay of Regulations
Federal Register 2010, 2011, 2012, 2013, 2014
2011-11-17
...-0008-FR-1A] RIN 0581-AD00 Christmas Tree Promotion, Research, and Information Order; Stay of...-funded promotion, research, and information program for fresh cut Christmas trees, effective November 9..., including the Christmas tree industry and the general public, an opportunity to become more familiar with...
7 CFR 82.5 - General requirements.
Code of Federal Regulations, 2010 CFR
2010-01-01
... program, the trees to be removed must be fruit-bearing and have been planted after the 1987 and before the 2003 calendar years. Abandoned orchards and dead trees will not qualify. The block of trees for removal must be easily definable by separations from other blocks of eligible trees and contain at least 1,000...
7 CFR 81.5 - General requirements.
Code of Federal Regulations, 2010 CFR
2010-01-01
... program, the trees to be removed must have yielded at least 1.5 tons of dried prune/plums per net-planted...-plum trees. Abandoned orchards and dead trees will not qualify. In new orchards diverted, qualifying trees must be at least 5 years of age (6th leaf), contain at least two scaffolds, and be capable of...
The Tree Worker's Manual. [Revised.
ERIC Educational Resources Information Center
Lilly, S. J.
This manual acquaints readers with the general operations of the tree care industry. The manual covers subjects important to a tree worker and serves as a training aid for workers at the entry level as tree care professionals. Each chapter begins with a set of objectives and may include figures, tables, and photographs. Ten chapters are included:…
7 CFR 82.5 - General requirements.
Code of Federal Regulations, 2013 CFR
2013-01-01
... program, the trees to be removed must be fruit-bearing and have been planted after the 1987 and before the 2003 calendar years. Abandoned orchards and dead trees will not qualify. The block of trees for removal must be easily definable by separations from other blocks of eligible trees and contain at least 1,000...
36 CFR 294.42 - Prohibition on tree cutting, sale, or removal.
Code of Federal Regulations, 2013 CFR
2013-07-01
... 36 Parks, Forests, and Public Property 2 2013-07-01 2013-07-01 false Prohibition on tree cutting... OF AGRICULTURE SPECIAL AREAS Colorado Roadless Area Management § 294.42 Prohibition on tree cutting, sale, or removal. (a) General. Trees may not be cut, sold, or removed in Colorado Roadless Areas...
7 CFR 81.5 - General requirements.
Code of Federal Regulations, 2014 CFR
2014-01-01
... separations from other blocks and contain at least 1,000 eligible trees or comprise an entire orchard. (b) Any... program, the trees to be removed must have yielded at least 1.5 tons of dried prune/plums per net-planted...-plum trees. Abandoned orchards and dead trees will not qualify. In new orchards diverted, qualifying...
7 CFR 81.5 - General requirements.
Code of Federal Regulations, 2013 CFR
2013-01-01
... separations from other blocks and contain at least 1,000 eligible trees or comprise an entire orchard. (b) Any... program, the trees to be removed must have yielded at least 1.5 tons of dried prune/plums per net-planted...-plum trees. Abandoned orchards and dead trees will not qualify. In new orchards diverted, qualifying...
7 CFR 82.5 - General requirements.
Code of Federal Regulations, 2012 CFR
2012-01-01
... program, the trees to be removed must be fruit-bearing and have been planted after the 1987 and before the 2003 calendar years. Abandoned orchards and dead trees will not qualify. The block of trees for removal must be easily definable by separations from other blocks of eligible trees and contain at least 1,000...
7 CFR 82.5 - General requirements.
Code of Federal Regulations, 2011 CFR
2011-01-01
... program, the trees to be removed must be fruit-bearing and have been planted after the 1987 and before the 2003 calendar years. Abandoned orchards and dead trees will not qualify. The block of trees for removal must be easily definable by separations from other blocks of eligible trees and contain at least 1,000...
7 CFR 81.5 - General requirements.
Code of Federal Regulations, 2012 CFR
2012-01-01
... separations from other blocks and contain at least 1,000 eligible trees or comprise an entire orchard. (b) Any... program, the trees to be removed must have yielded at least 1.5 tons of dried prune/plums per net-planted...-plum trees. Abandoned orchards and dead trees will not qualify. In new orchards diverted, qualifying...
7 CFR 82.5 - General requirements.
Code of Federal Regulations, 2014 CFR
2014-01-01
... program, the trees to be removed must be fruit-bearing and have been planted after the 1987 and before the 2003 calendar years. Abandoned orchards and dead trees will not qualify. The block of trees for removal must be easily definable by separations from other blocks of eligible trees and contain at least 1,000...
7 CFR 81.5 - General requirements.
Code of Federal Regulations, 2011 CFR
2011-01-01
... separations from other blocks and contain at least 1,000 eligible trees or comprise an entire orchard. (b) Any... program, the trees to be removed must have yielded at least 1.5 tons of dried prune/plums per net-planted...-plum trees. Abandoned orchards and dead trees will not qualify. In new orchards diverted, qualifying...
36 CFR 294.42 - Prohibition on tree cutting, sale, or removal.
Code of Federal Regulations, 2014 CFR
2014-07-01
... 36 Parks, Forests, and Public Property 2 2014-07-01 2014-07-01 false Prohibition on tree cutting... OF AGRICULTURE SPECIAL AREAS Colorado Roadless Area Management § 294.42 Prohibition on tree cutting, sale, or removal. (a) General. Trees may not be cut, sold, or removed in Colorado Roadless Areas...
A generalized system of models forecasting Central States tree growth.
Stephen R. Shifley
1987-01-01
Describes the development and testing of a system of individual tree-based growth projection models applicable to species in Indiana, Missouri, and Ohio. Annual tree basal area growth is estimated as a function of tree size, crown ratio, stand density, and site index. Models are compatible with the STEMS and TWIGS Projection System.
Impersonating the Standard Model Higgs boson: Alignment without decoupling
Carena, Marcela; Low, Ian; Shah, Nausheen R.; ...
2014-04-03
In models with an extended Higgs sector there exists an alignment limit, in which the lightest CP-even Higgs boson mimics the Standard Model Higgs. The alignment limit is commonly associated with the decoupling limit, where all non-standard scalars are significantly heavier than the Z boson. However, alignment can occur irrespective of the mass scale of the rest of the Higgs sector. In this work we discuss the general conditions that lead to “alignment without decoupling”, therefore allowing for the existence of additional non-standard Higgs bosons at the weak scale. The values of tan β for which this happens are derived in terms of the effective Higgs quartic couplings in general two-Higgs-doublet models as well as in supersymmetric theories, including the MSSM and the NMSSM. In addition, we study the information encoded in the variations of the SM Higgs-fermion couplings to explore regions in the m_A – tan β parameter space.
Code of Federal Regulations, 2013 CFR
2013-01-01
... AND ORDERS; MISCELLANEOUS COMMODITIES), DEPARTMENT OF AGRICULTURE CHRISTMAS TREE PROMOTION, RESEARCH... eligible domestic producers and importers of Christmas trees favor the continuance, amendment, suspension, or termination of the Christmas Tree Promotion, Research, and Information Order shall be conducted in...
Code of Federal Regulations, 2014 CFR
2014-01-01
... AND ORDERS; MISCELLANEOUS COMMODITIES), DEPARTMENT OF AGRICULTURE CHRISTMAS TREE PROMOTION, RESEARCH... eligible domestic producers and importers of Christmas trees favor the continuance, amendment, suspension, or termination of the Christmas Tree Promotion, Research, and Information Order shall be conducted in...
Code of Federal Regulations, 2012 CFR
2012-01-01
... AND ORDERS; MISCELLANEOUS COMMODITIES), DEPARTMENT OF AGRICULTURE CHRISTMAS TREE PROMOTION, RESEARCH... eligible domestic producers and importers of Christmas trees favor the continuance, amendment, suspension, or termination of the Christmas Tree Promotion, Research, and Information Order shall be conducted in...
78 FR 24665 - Gypsy Moth Generally Infested Areas; Additions in Wisconsin
Federal Register 2010, 2011, 2012, 2013, 2014
2013-04-26
... forest, shade, and commercial trees such as nursery stock and Christmas trees. The gypsy moth regulations... tree growers, and 2 nurseries. We expect that most if not all of these businesses are small according...
Ronald E. McRoberts; Paolo Moser; Laio Zimermann Oliveira; Alexander C. Vibrans
2015-01-01
Forest inventory estimates of tree volume for large areas are typically calculated by adding the model predictions of volumes for individual trees at the plot level, calculating the mean over plots, and expressing the result on a per unit area basis. The uncertainty in the model predictions is generally ignored, with the result that the precision of the large-area...
On simulated annealing phase transitions in phylogeny reconstruction.
Strobl, Maximilian A R; Barker, Daniel
2016-08-01
Phylogeny reconstruction with global criteria is NP-complete or NP-hard, hence in general requires a heuristic search. We investigate the powerful, physically inspired, general-purpose heuristic simulated annealing, applied to phylogeny reconstruction. Simulated annealing mimics the physical process of annealing, where a liquid is gently cooled to form a crystal. During the search, periods of elevated specific heat occur, analogous to physical phase transitions. These simulated annealing phase transitions play a crucial role in the outcome of the search. Nevertheless, they have received comparably little attention, for phylogeny or other optimisation problems. We analyse simulated annealing phase transitions during searches for the optimal phylogenetic tree for 34 real-world multiple alignments. In the same way in which melting temperatures differ between materials, we observe distinct specific heat profiles for each input file. We propose this reflects differences in the search landscape and can serve as a measure for problem difficulty and for suitability of the algorithm's parameters. We discuss application in algorithmic optimisation and as a diagnostic to assess parameterisation before computationally costly, large phylogeny reconstructions are launched. Whilst the focus here lies on phylogeny reconstruction under maximum parsimony, it is plausible that our results are more widely applicable to optimisation procedures in science and industry. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
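The specific heat discussed in this abstract can be estimated, under the usual statistical-physics convention, as the variance of the sampled energies at a temperature divided by the square of that temperature. The Python sketch below shows that bookkeeping on a toy one-dimensional landscape. It is not the authors' implementation: `toy_energy`, `toy_neighbour`, the cooling schedule, and the step counts are hypothetical stand-ins for a parsimony score and tree rearrangement moves.

```python
import math
import random
from statistics import pvariance
from typing import Callable, List, Sequence, Tuple


def toy_energy(x: int) -> float:
    """Made-up rugged landscape on the integers 0..99 (stands in for a tree score)."""
    return abs(x - 70) / 10.0 + 2.0 * math.sin(x / 3.0) ** 2


def toy_neighbour(x: int, rng: random.Random) -> int:
    """Propose a step of +1 or -1, clamped to the range 0..99."""
    return min(99, max(0, x + rng.choice((-1, 1))))


def anneal_with_specific_heat(
    energy: Callable[[int], float],
    neighbour: Callable[[int, random.Random], int],
    state: int,
    temps: Sequence[float],
    steps_per_temp: int,
    seed: int = 0,
) -> Tuple[int, float, List[Tuple[float, float]]]:
    """Metropolis simulated annealing that records Var(E) / T^2 per temperature."""
    rng = random.Random(seed)
    e = energy(state)
    best_state, best_e = state, e
    profile: List[Tuple[float, float]] = []
    for t in temps:
        samples: List[float] = []
        for _ in range(steps_per_temp):
            cand = neighbour(state, rng)
            ce = energy(cand)
            # Always accept improvements; accept worsening moves with
            # probability exp(-(ce - e) / t)
            if ce <= e or rng.random() < math.exp((e - ce) / t):
                state, e = cand, ce
                if e < best_e:
                    best_state, best_e = state, e
            samples.append(e)
        profile.append((t, pvariance(samples) / (t * t)))
    return best_state, best_e, profile
```

Plotting the second element of each `profile` entry against temperature gives a specific heat profile of the kind the abstract describes; peaks would mark candidate phase transitions for this toy problem.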
Perception of socket alignment perturbations in amputees with transtibial prostheses.
Boone, David A; Kobayashi, Toshiki; Chou, Teri G; Arabian, Adam K; Coleman, Kim L; Orendurff, Michael S; Zhang, Ming
2012-01-01
A person with amputation's subjective perception is the only tool available to describe fit and comfort to a prosthetist. However, few studies have investigated the effect of alignment on this perception. The aim of this article is to determine whether people with amputation could perceive the alignment perturbations of their prostheses and effectively communicate them. A randomized controlled perturbation of angular (3 and 6 degrees) and translational (5 and 10 mm) alignments in the sagittal (flexion, extension, and anterior and posterior translations) and coronal (abduction, adduction, and medial and lateral translations) planes were induced from an aligned condition in 11 subjects with transtibial prostheses. The perception was evaluated when standing (static) and immediately after walking (dynamic) using software that used a visual analog scale under each alignment condition. In the coronal plane, Friedman test demonstrated general statistical differences in static (p < 0.001) and dynamic (p < 0.001) measures of perceptions with angular perturbations. In the sagittal plane, it also demonstrated general statistical differences in late-stance dynamic measures of perceptions (p < 0.001) with angular perturbations, as well as in early-stance dynamic measures of perceptions (p < 0.05) with translational perturbations. Fisher exact test suggested that people with amputation's perceptions were good indicators for coronal angle malalignments but less reliable when defining other alignment conditions.
Photosynthetic capacity peaks at intermediate size in temperate deciduous trees.
Thomas, Sean C
2010-05-01
Studies of age-related changes in leaf functional biology have generally been based on dichotomous comparisons of young and mature individuals (e.g., saplings and mature canopy trees), with little data available to describe changes through the entire ontogeny of trees, particularly of broadleaf angiosperms. Leaf-level gas-exchange and morphological parameters were quantified in situ in the upper canopy of trees acclimated to high light conditions, spanning a wide range of ontogenetic stages from saplings (approximately 1 cm in stem diameter) to trees >60 cm d.b.h. and nearing their maximum lifespan, in three temperate deciduous tree species in central Ontario, Canada. Traits associated with growth performance, including leaf photosynthetic capacity (expressed on either an area, mass or leaf N basis), stomatal conductance, leaf size and leaf N content, generally showed a unimodal ('hump-shaped') pattern, with peak values at an intermediate ontogenetic stage. In contrast, leaf mass per area (LMA) and related morphological parameters (leaf thickness, leaf tissue density, leaf C content) increased monotonically with tree size, as did water-use efficiency; these monotonic relationships were well described by simple allometric functions of the form Y = aX^b. For traits showing unimodal patterns, tree size corresponding to the trait maximum differed markedly among traits: all three species showed a similar pattern in which the peak for leaf size occurred in trees approximately 2-6 cm d.b.h., followed by leaf chemical traits and photosynthetic capacity on a mass or leaf N basis and finally by photosynthetic capacity on a leaf area basis, which peaked approximately at the size of reproductive onset. It is argued that ontogenetic increases in photosynthetic capacity and related traits early in tree ontogeny are general among relatively shade-tolerant tree species that have a low capacity for leaf-level acclimation, as are declines in this set of traits late in tree ontogeny.
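A simple allometric power function (Y equal to a times X raised to the exponent b) becomes a straight line after taking logarithms, log Y = log a + b log X, so its parameters can be estimated by ordinary least squares on log-transformed data. The Python sketch below shows that standard approach; the function name `fit_allometric` is hypothetical, and the abstract does not state which fitting procedure the study actually used.

```python
import math
from typing import Sequence, Tuple


def fit_allometric(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Estimate (a, b) in Y = a * X**b by least squares on log-log data.

    All x and y values must be strictly positive.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                # slope of the log-log line is the exponent
    a = math.exp(my - b * mx)    # intercept of the log-log line is log(a)
    return a, b
```

For example, `fit_allometric(diameters, lma_values)` with stem diameters and leaf mass per area measurements would return the scaling coefficient and exponent; both argument names are placeholders for real data.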
NASA Technical Reports Server (NTRS)
English, Thomas
2005-01-01
A standard tool of reliability analysis used at NASA-JSC is the event tree. An event tree is simply a probability tree, with the probabilities determining the next step through the tree specified at each node. The nodal probabilities are determined by a reliability study of the physical system at work for a particular node. The reliability study performed at a node is typically referred to as a fault tree analysis, with the potential of a fault tree existing for each node on the event tree. When examining an event tree it is obvious why the event tree/fault tree approach has been adopted. Typical event trees are quite complex in nature, and the event tree/fault tree approach provides a systematic and organized approach to reliability analysis. The purpose of this study was twofold. Firstly, we wanted to explore the possibility that a semi-Markov process can create dependencies between sojourn times (the times it takes to transition from one state to the next) that can decrease the uncertainty when estimating time to failure. Using a generalized semi-Markov model, we studied a four element reliability model and were able to demonstrate such sojourn time dependencies. Secondly, we wanted to study the use of semi-Markov processes to introduce a time variable into the event tree diagrams that are commonly developed in PRA (Probabilistic Risk Assessment) analyses. Event tree end states which change with time are more representative of failure scenarios than are the usual static probability-derived end states.
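The static event tree described at the start of this abstract can be sketched in a few lines of Python: the probability of each end state is the sum, over all paths reaching it, of the product of the branch probabilities along the path. This covers only the conventional static case, not the semi-Markov extension studied in the report; the nested-list representation and the example probabilities are hypothetical.

```python
from typing import Dict, List, Optional, Tuple, Union

# A node is either an end-state label or a list of (branch probability, child) pairs.
Node = Union[str, List[Tuple[float, "Node"]]]


def end_state_probabilities(
    node: Node,
    path_probability: float = 1.0,
    totals: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Accumulate the probability of every end state of a static event tree."""
    if totals is None:
        totals = {}
    if isinstance(node, str):
        # Leaf: add the probability of the path that led here
        totals[node] = totals.get(node, 0.0) + path_probability
        return totals
    for branch_probability, child in node:
        end_state_probabilities(child, path_probability * branch_probability, totals)
    return totals


# Made-up example: a primary system works with probability 0.99; if it fails,
# a backup works with probability 0.9.
EXAMPLE_TREE: Node = [
    (0.99, "ok"),
    (0.01, [(0.9, "ok"), (0.1, "failure")]),
]
```

In practice each branch probability would come from a fault tree analysis of the corresponding node, as the abstract describes.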
dCITE: Measuring Necessary Cladistic Information Can Help You Reduce Polytomy Artefacts in Trees.
Wise, Michael J
2016-01-01
Biologists regularly create phylogenetic trees to better understand the evolutionary origins of their species of interest, and often use genomes as their data source. However, as more and more incomplete genomes are published, in many cases it may not be possible to compute genome-based phylogenetic trees due to large gaps in the assembled sequences. In addition, comparison of complete genomes may not even be desirable due to the presence of horizontally acquired and homologous genes. A decision must therefore be made about which gene, or gene combinations, should be used to compute a tree. Deflated Cladistic Information based on Total Entropy (dCITE) is proposed as an easily computed metric for measuring the cladistic information in multiple sequence alignments representing a range of taxa, without the need to first compute the corresponding trees. dCITE scores can be used to rank candidate genes or decide whether input sequences provide insufficient cladistic information, making artefactual polytomies more likely. The dCITE method can be applied to protein, nucleotide or encoded phenotypic data, so can be used to select which data-type is most appropriate, given the choice. In a series of experiments the dCITE method was compared with related measures. Then, as a practical demonstration, the ideas developed in the paper were applied to a dataset representing species from the order Campylobacterales; trees based on sequence combinations, selected on the basis of their dCITE scores, were compared with a tree constructed to mimic Multi-Locus Sequence Typing (MLST) combinations of fragments. We see that the greater the dCITE score the more likely it is that the computed phylogenetic tree will be free of artefactual polytomies. Secondly, cladistic information saturates, beyond which little additional cladistic information can be obtained by adding additional sequences. Finally, sequences with high cladistic information produce more consistent trees for the same taxa.
PMID:27898695
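The abstract above describes dCITE as an entropy-based score computed directly on a multiple sequence alignment, but it does not give the formula. The sketch below is therefore not dCITE; it only illustrates the simpler underlying idea of measuring per-column Shannon entropy in an alignment without building a tree. The function names, the gap handling and the toy alignment are all assumptions made for illustration.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gaps ('-')."""
    counts = Counter(c for c in column if c != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def mean_column_entropy(alignment):
    """Average column entropy over equal-length aligned sequences."""
    columns = list(zip(*alignment))
    return sum(column_entropy(col) for col in columns) / len(columns)

# Toy alignment of four sequences (hypothetical data).
alignment = ["ACGT", "ACGA", "ACTA", "AC-A"]
print(round(mean_column_entropy(alignment), 4))
```

Fully conserved columns contribute zero entropy, so a score of this kind is low both for uninformative (invariant) alignments and, depending on normalisation, may saturate for highly variable ones; how dCITE itself handles these cases is described in the paper, not here.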
Basin-Wide Amazon Forest Tree Mortality From a Large 2005 Storm
NASA Astrophysics Data System (ADS)
Negron Juarez, R. I.; Chambers, J. Q.; Guimaraes, G.; Zeng, H.; Raupp, C.; Marra, D. M.; Ribeiro, G.; Saatchi, S. S.; Higuchi, N.
2010-12-01
Blowdowns are a recurrent characteristic of Amazon forests and are produced, among other causes, by squall lines. Squall lines are aligned clusters (typical length of 1000 km, width of 200 km) of deep convective cells that produce heavy rainfall during the dry season and significant rainfall during the wet season. These squall lines (accompanied by intense downbursts from convective cells) have been associated with large blowdowns characterized by uprooted, snapped trees, and trees being dragged down by other falling trees. Most squall lines in Amazonia form along the northeastern coast of South America as sea breeze-induced instability lines and propagate inside the continent. They occur frequently (~4 times per month), and can reach the central and even extreme western parts of Amazonia. Squall lines can also be generated inside the Amazon and propagate toward the equator. In January 2005 a squall line propagated from south to north across the entire Amazon basin, producing widespread forest tree mortality and contributing to the elevated mortality observed that year. Over the Manaus region (3.4 × 10^4 km^2), disturbed forest patches generated by the squall produced a mortality of 0.3-0.5 million trees, equivalent to 30% of the observed annual deforestation reported in 2005 over the same area. The elevated mortality observed in the Central Amazon in 2005 is unlikely to be related to the 2005 Amazon drought since drought did not affect Central or Eastern Amazonia. Assuming a similar rate of forest mortality across the basin, the squall line could have potentially produced tree mortality estimated at 542 ± 121 million trees, equivalent to 23% of the mean annual biomass accumulation estimated for these forests. Our results highlight the vulnerability of Amazon trees to wind-driven mortality associated with convective storms. This vulnerability is likely to increase in a warming climate with models projecting an increase in storm intensity.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carlson, C.L.; Adriano, D.C.
Differences in aboveground tissue concentrations of trace elements were assessed for sweetgum (Liquidambar styraciflua L.) and sycamore (Platanus occidentalis L.) growing on two abandoned coal fly ash basins and a control soil. The wet basin (pH = 5.58) had originally received precipitator ash in an ash-water slurry, while the dry basin (pH = 8.26) had received both precipitator and bottom ash in dry form. In general, trees from the wet basin exhibited elevated trace element concentrations in comparison to the controls, while the dry basin trees exhibited reduced concentrations. One of the most striking differences in elemental concentrations among the ash basin and control trees was observed for Mn, with the control trees exhibiting concentrations orders of magnitude greater than the ash basin trees. Differences in foliar trace element concentrations among the sites can generally be explained by differences in substrate trace element concentrations and/or substrate pH. While trees from the wet ash basin generally had the highest trace element concentrations, these trees also attained the greatest height and diameter growth, suggesting that the elevated trace element concentrations in the wet basin substrate are not limiting the establishment of these two species. The greater height and diameter growth of the wet basin trees is presumably a result of the greater water-holding capacity of the substrate on this site. Differences in growth and tissue concentrations between sweetgum and sycamore highlight the importance of using more than one species when assessing metal toxicity or deficiency on a given substrate.
Code of Federal Regulations, 2013 CFR
2013-01-01
... AND ORDERS; MISCELLANEOUS COMMODITIES), DEPARTMENT OF AGRICULTURE CHRISTMAS TREE PROMOTION, RESEARCH, AND INFORMATION ORDER Christmas Tree Promotion, Research, and Information Order Definitions § 1214.18... favorable image of Christmas trees to the general public with the intent of improving the perception and...
Code of Federal Regulations, 2014 CFR
2014-01-01
... AND ORDERS; MISCELLANEOUS COMMODITIES), DEPARTMENT OF AGRICULTURE CHRISTMAS TREE PROMOTION, RESEARCH, AND INFORMATION ORDER Christmas Tree Promotion, Research, and Information Order Definitions § 1214.18... favorable image of Christmas trees to the general public with the intent of improving the perception and...
Code of Federal Regulations, 2012 CFR
2012-01-01
... AND ORDERS; MISCELLANEOUS COMMODITIES), DEPARTMENT OF AGRICULTURE CHRISTMAS TREE PROMOTION, RESEARCH, AND INFORMATION ORDER Christmas Tree Promotion, Research, and Information Order Definitions § 1214.18... favorable image of Christmas trees to the general public with the intent of improving the perception and...
Modeling and experimental characterization of electromigration in interconnect trees
NASA Astrophysics Data System (ADS)
Thompson, C. V.; Hau-Riege, S. P.; Andleigh, V. K.
1999-11-01
Most modeling and experimental characterization of interconnect reliability is focussed on simple straight lines terminating at pads or vias. However, laid-out integrated circuits often have interconnects with junctions and wide-to-narrow transitions. In carrying out circuit-level reliability assessments it is important to be able to assess the reliability of these more complex shapes, generally referred to as `trees.' An interconnect tree consists of continuously connected high-conductivity metal within one layer of metallization. Trees terminate at diffusion barriers at vias and contacts, and, in the general case, can have more than one terminating branch when they include junctions. We have extended the understanding of `immortality' demonstrated and analyzed for straight stud-to-stud lines, to trees of arbitrary complexity. This leads to a hierarchical approach for identifying immortal trees for specific circuit layouts and models for operation. To complete a circuit-level-reliability analysis, it is also necessary to estimate the lifetimes of the mortal trees. We have developed simulation tools that allow modeling of stress evolution and failure in arbitrarily complex trees. We are testing our models and simulations through comparisons with experiments on simple trees, such as lines broken into two segments with different currents in each segment. Models, simulations and early experimental results on the reliability of interconnect trees are shown to be consistent.
The algebra of the general Markov model on phylogenetic trees and networks.
Sumner, J G; Holland, B R; Jarvis, P D
2012-04-01
It is known that the Kimura 3ST model of sequence evolution on phylogenetic trees can be extended quite naturally to arbitrary split systems. However, this extension relies heavily on mathematical peculiarities of the associated Hadamard transformation, and providing an analogous augmentation of the general Markov model has thus far been elusive. In this paper, we rectify this shortcoming by showing how to extend the general Markov model on trees to include incompatible edges; and even further to more general network models. This is achieved by exploring the algebra of the generators of the continuous-time Markov chain together with the “splitting” operator that generates the branching process on phylogenetic trees. For simplicity, we proceed by discussing the two state case and then show that our results are easily extended to more states with little complication. Intriguingly, upon restriction of the two state general Markov model to the parameter space of the binary symmetric model, our extension is indistinguishable from the Hadamard approach only on trees; as soon as any incompatible splits are introduced the two approaches give rise to differing probability distributions with disparate structure. Through exploration of a simple example, we give an argument that our extension to more general networks has desirable properties that the previous approaches do not share. In particular, our construction allows for convergent evolution of previously divergent lineages; a property that is of significant interest for biological applications.
Conserved structure and inferred evolutionary history of long terminal repeats (LTRs)
2013-01-01
Background Long terminal repeats (LTRs, consisting of U3-R-U5 portions) are important elements of retroviruses and related retrotransposons. They are difficult to analyse due to their variability. The aim was to obtain a more comprehensive view of structure, diversity and phylogeny of LTRs than hitherto possible. Results Hidden Markov models (HMM) were created for 11 clades of LTRs belonging to Retroviridae (class III retroviruses), animal Metaviridae (Gypsy/Ty3) elements and plant Pseudoviridae (Copia/Ty1) elements, complementing our work with Orthoretrovirus HMMs. The great variation in LTR length of plant Metaviridae and the few divergent animal Pseudoviridae prevented building HMMs from both of these groups. Animal Metaviridae LTRs had the same conserved motifs as retroviral LTRs, confirming that the two groups are closely related. The conserved motifs were the short inverted repeats (SIRs), integrase recognition signals (5´TGTTRNR…YNYAACA 3´); the polyadenylation signal or AATAAA motif; a GT-rich stretch downstream of the polyadenylation signal; and a less conserved AT-rich stretch corresponding to the core promoter element, the TATA box. Plant Pseudoviridae LTRs differed slightly in having a conserved TATA-box, TATATA, but no conserved polyadenylation signal, plus a much shorter R region. The sensitivity of the HMMs for detection in genomic sequences was around 50% for most models, at a relatively high specificity, suitable for genome screening. The HMMs yielded consensus sequences, which were aligned by creating an HMM model (a ‘Superviterbi’ alignment). This yielded a phylogenetic tree that was compared with a Pol-based tree. Both LTR and Pol trees supported monophyly of retroviruses. In both, Pseudoviridae was ancestral to all other LTR retrotransposons. However, the LTR trees showed the chromovirus portion of Metaviridae clustering together with Pseudoviridae, dividing Metaviridae into two portions with distinct phylogeny. 
Conclusion The HMMs clearly demonstrated a unitary conserved structure of LTRs, supporting the view that they arose once during evolution. We attempted to follow the evolution of LTRs by tracing their functional foundations, that is, acquisition of RNase H, a combined promoter/polyadenylation site, integrase, hairpin priming and the primer binding site (PBS). Available information did not support a simple evolutionary chain of events. PMID:23369192
Some correlations between sugar maple tree characteristics and sap and sugar yields
Barton M. Blum
1971-01-01
Simple correlation coefficients between various characteristics of sugar maple trees and sap sugar concentration, sap volume yield, and total sugar production are given for the 1968 sap season. Correlation coefficients in general indicated that individual tree characteristics that express tree and crown size are significantly related to sap volume yield and total sugar...
Estimating moisture content of tree-length roundwood
Alexander Clark; Richard F. Daniels
2000-01-01
The green weight of southern pine tree-length roundwood delivered to the pulp mill is generally known. However, for optimum mill efficiency it is desirable to know dry weight. The moisture content of tree-length pine logs is quite variable. The moisture content of pine tree-length logs increases significantly with increasing stem height. Moisture content also varies...
Machine Learning Through Signature Trees. Applications to Human Speech.
ERIC Educational Resources Information Center
White, George M.
A signature tree is a binary decision tree used to classify unknown patterns. An attempt was made to develop a computer program for manipulating signature trees as a general research tool for exploring machine learning and pattern recognition. The program was applied to the problem of speech recognition to test its effectiveness for a specific…
Releasing 75- to 80-year-old Appalachian hardwood sawtimber trees--5-year d.b.h. response
H.C. Smith; G.W. Miller
1991-01-01
Generally, mature trees on good growing sites are seldom thinned or released. Instead, at maturity the trees are harvested. Data were summarized from north-central West Virginia study areas (northern red oak site index 70 feet and above) where mature trees were released on all sides of the crown (full release).
Optimization of sequence alignment for simple sequence repeat regions.
Jighly, Abdulqader; Hamwieh, Aladdin; Ogbonnaya, Francis C
2011-07-20
Microsatellites, or simple sequence repeats (SSRs), are tandemly repeated DNA sequences, including tandem copies of specific sequences no longer than six bases, that are distributed in the genome. SSRs have been used as molecular markers because they are easy to detect and serve a range of applications, including genetic diversity, genome mapping, and marker-assisted selection. They are also very mutable because of slipping of the DNA polymerase during DNA replication. This unique mutation increases the insertion/deletion (INDEL) mutation frequency to a high ratio, more than for other types of molecular markers such as single nucleotide polymorphisms (SNPs). SNPs are more frequent than INDELs. Therefore, all designed algorithms for sequence alignment fit the vast majority of the genomic sequence without considering microsatellite regions as unique sequences that require special consideration. The old algorithm is limited in its application because there are many overlaps between different repeat units, which result in false evolutionary relationships. To overcome the limitation of the aligning algorithm when dealing with SSR loci, a new algorithm was developed using a PERL script with a Tk graphical interface. This program is based on aligning sequences after first determining the repeated units and the last SSR nucleotide positions. This results in a shifting process according to the inserted repeated unit type. When studying the phylogenetic relations before and after applying the new algorithm, many differences in the trees were obtained by increasing the SSR length and complexity. However, less distance between different lineages was observed after applying the new algorithm. The new algorithm produces better estimates for aligning SSR loci because it reflects more reliable evolutionary relations between different lineages. It reduces overlapping during SSR alignment, which results in a more realistic phylogenetic relationship.
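The first step described above, determining the repeated units and where each SSR sits in the sequence, can be sketched with a regular expression. This is not the authors' PERL/Tk program; the pattern, the function name and the threshold of at least three tandem copies are assumptions chosen for illustration, with the unit length capped at six bases as in the abstract's definition.

```python
import re

# A unit of 1-6 bases (shortest unit preferred) followed by at least two
# more tandem copies of the same unit, i.e. three or more copies in total.
# The three-copy minimum is an illustrative assumption.
SSR_PATTERN = re.compile(r"([ACGT]{1,6}?)\1{2,}")

def find_ssrs(seq):
    """Return (start, end, unit, copies) for each tandem repeat found."""
    hits = []
    for m in SSR_PATTERN.finditer(seq):
        unit = m.group(1)
        copies = len(m.group(0)) // len(unit)
        hits.append((m.start(), m.end(), unit, copies))
    return hits

# Hypothetical sequence containing one (AC)4 microsatellite.
seq = "TTGACACACACAGGT"
print(find_ssrs(seq))
```

Knowing the unit and the start and end positions of each locus is what allows an aligner to shift whole repeat units rather than individual bases when two sequences differ in copy number.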
Local-scale drivers of tree survival in a temperate forest.
Wang, Xugao; Comita, Liza S; Hao, Zhanqing; Davies, Stuart J; Ye, Ji; Lin, Fei; Yuan, Zuoqiang
2012-01-01
Tree survival plays a central role in forest ecosystems. Although many factors such as tree size, abiotic and biotic neighborhoods have been proposed as being important in explaining patterns of tree survival, their contributions are still subject to debate. We used generalized linear mixed models to examine the relative importance of tree size, local abiotic conditions and the density and identity of neighbors on tree survival in an old-growth temperate forest in northeastern China at three levels (community, guild and species). Tree size and both abiotic and biotic neighborhood variables influenced tree survival under current forest conditions, but their relative importance varied dramatically within and among the community, guild and species levels. Of the variables tested, tree size was typically the most important predictor of tree survival, followed by biotic and then abiotic variables. The effect of tree size on survival varied from strongly positive for small trees (1-20 cm dbh) and medium trees (20-40 cm dbh), to slightly negative for large trees (>40 cm dbh). Among the biotic factors, we found strong evidence for negative density and frequency dependence in this temperate forest, as indicated by negative effects of both total basal area of neighbors and the frequency of conspecific neighbors. Among the abiotic factors tested, soil nutrients tended to be more important in affecting tree survival than topographic variables. Abiotic factors generally influenced survival for species with relatively high abundance, for individuals in smaller size classes and for shade-tolerant species. Our study demonstrates that the relative importance of variables driving patterns of tree survival differs greatly among size classes, species guilds and abundance classes in temperate forest, which can further understanding of forest dynamics and offer important insights into forest management.
PMID:22347996
Gauging Item Alignment through Online Systems While Controlling for Rater Effects
ERIC Educational Resources Information Center
Anderson, Daniel; Irvin, Shawn; Alonzo, Julie; Tindal, Gerald A.
2015-01-01
The alignment of test items to content standards is critical to the validity of decisions made from standards-based tests. Generally, alignment is determined based on judgments made by a panel of content experts with either ratings averaged or via a consensus reached through discussion. When the pool of items to be reviewed is large, or the…
MultiPhyl: a high-throughput phylogenomics webserver using distributed computing
Keane, Thomas M.; Naughton, Thomas J.; McInerney, James O.
2007-01-01
With the number of fully sequenced genomes increasing steadily, there is greater interest in performing large-scale phylogenomic analyses from large numbers of individual gene families. Maximum likelihood (ML) has been shown repeatedly to be one of the most accurate methods for phylogenetic construction. Recently, there have been a number of algorithmic improvements in maximum-likelihood-based tree search methods. However, it can still take a long time to analyse the evolutionary history of many gene families using a single computer. Distributed computing refers to a method of combining the computing power of multiple computers in order to perform some larger overall calculation. In this article, we present the first high-throughput implementation of a distributed phylogenetics platform, MultiPhyl, capable of using the idle computational resources of many heterogeneous non-dedicated machines to form a phylogenetics supercomputer. MultiPhyl allows a user to upload hundreds or thousands of amino acid or nucleotide alignments simultaneously and perform computationally intensive tasks such as model selection, tree searching and bootstrapping of each of the alignments using many desktop machines. The program implements a set of 88 amino acid models and 56 nucleotide maximum likelihood models and a variety of statistical methods for choosing between alternative models. A MultiPhyl webserver is available for public use at: http://www.cs.nuim.ie/distributed/multiphyl.php. PMID:17553837
Dynamic evaluation of anterior dental alignment in a sample of 8- to 11-year-old children.
Lombardo, Luca; Berveglieri, Chiara; Guarneri, Antonio; Siciliani, Giuseppe
2012-06-01
To describe perceptions related to different anterior dental alignments in a sample of 106 children aged between 8- and 11-years-old. We employed dynamic media (videos) showing a smile in four different arrangements (ideal incisal occlusion - N, median diastema - D, incisal crowding - A, protruding incisors - P), with and without general contextual attractiveness. The perception is the same both for the whole face and for the frontal smile alone and there are no significant differences between the answers from male and female interviews. Smiles with normal alignment gain higher scores for esthetics and are associated with more positive qualities. In contrast, smiles with proclined and crowded teeth obtain lower scores. Analysis of the results showed that there are no significant differences between perceptions, sensations and judgments related to smiles presented either as part of the whole face or in only the lower facial third: general facial attractiveness does not influence evaluations of the smile. The study confirms the general tendency to award higher scores to smiles with normal alignment. Copyright © 2012 CEO. Published by Elsevier Masson SAS. All rights reserved.
USDA-ARS's Scientific Manuscript database
The major tree nuts include almonds, Brazil nuts, cashew nuts, hazelnuts, macadamia nuts, pecans, pine nuts, pistachio nuts, and walnuts. Tree nut oils are appreciated in food applications because of their flavors and are generally more expensive than other gourmet oils. Research during the last de...
Using simple artificial intelligence methods for predicting amyloidogenesis in antibodies
2010-01-01
Background All polypeptide backbones have the potential to form amyloid fibrils, which are associated with a number of degenerative disorders. However, the likelihood that amyloidosis would actually occur under physiological conditions depends largely on the amino acid composition of a protein. We explore using a naive Bayesian classifier and a weighted decision tree for predicting the amyloidogenicity of immunoglobulin sequences. Results The average accuracy based on leave-one-out (LOO) cross validation of a Bayesian classifier generated from 143 amyloidogenic sequences is 60.84%. This is consistent with the average accuracy of 61.15% for a holdout test set comprised of 103 AM and 28 non-amyloidogenic sequences. The LOO cross validation accuracy increases to 81.08% when the training set is augmented by the holdout test set. In comparison, the average classification accuracy for the holdout test set obtained using a decision tree is 78.64%. Non-amyloidogenic sequences are predicted with average LOO cross validation accuracies between 74.05% and 77.24% using the Bayesian classifier, depending on the training set size. The accuracy for the holdout test set was 89%. For the decision tree, the non-amyloidogenic prediction accuracy is 75.00%. Conclusions This exploratory study indicates that both classification methods may be promising in providing straightforward predictions on the amyloidogenicity of a sequence. Nevertheless, the number of available sequences that satisfy the premises of this study are limited, and are consequently smaller than the ideal training set size. Increasing the size of the training set clearly increases the accuracy, and the expansion of the training set to include not only more derivatives, but more alignments, would make the method more sound. The accuracy of the classifiers may also be improved when additional factors, such as structural and physico-chemical data, are considered. 
The development of this type of classifier has significant applications in evaluating engineered antibodies, and may be adapted for evaluating engineered proteins in general. PMID:20144194
Using simple artificial intelligence methods for predicting amyloidogenesis in antibodies.
David, Maria Pamela C; Concepcion, Gisela P; Padlan, Eduardo A
2010-02-08
All polypeptide backbones have the potential to form amyloid fibrils, which are associated with a number of degenerative disorders. However, the likelihood that amyloidosis would actually occur under physiological conditions depends largely on the amino acid composition of a protein. We explore using a naive Bayesian classifier and a weighted decision tree for predicting the amyloidogenicity of immunoglobulin sequences. The average accuracy based on leave-one-out (LOO) cross validation of a Bayesian classifier generated from 143 amyloidogenic sequences is 60.84%. This is consistent with the average accuracy of 61.15% for a holdout test set comprised of 103 AM and 28 non-amyloidogenic sequences. The LOO cross validation accuracy increases to 81.08% when the training set is augmented by the holdout test set. In comparison, the average classification accuracy for the holdout test set obtained using a decision tree is 78.64%. Non-amyloidogenic sequences are predicted with average LOO cross validation accuracies between 74.05% and 77.24% using the Bayesian classifier, depending on the training set size. The accuracy for the holdout test set was 89%. For the decision tree, the non-amyloidogenic prediction accuracy is 75.00%. This exploratory study indicates that both classification methods may be promising in providing straightforward predictions on the amyloidogenicity of a sequence. Nevertheless, the number of available sequences that satisfy the premises of this study are limited, and are consequently smaller than the ideal training set size. Increasing the size of the training set clearly increases the accuracy, and the expansion of the training set to include not only more derivatives, but more alignments, would make the method more sound. The accuracy of the classifiers may also be improved when additional factors, such as structural and physico-chemical data, are considered. 
The development of this type of classifier has significant applications in evaluating engineered antibodies, and may be adapted for evaluating engineered proteins in general.
Tamura, Koichiro; Peterson, Daniel; Peterson, Nicholas; Stecher, Glen; Nei, Masatoshi; Kumar, Sudhir
2011-01-01
Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of Molecular Evolutionary Genetics Analysis version 5 (MEGA5), which is a user-friendly software for mining online databases, building sequence alignments and phylogenetic trees, and using methods of evolutionary bioinformatics in basic biology, biomedicine, and evolution. The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models (nucleotide or amino acid), inferring ancestral states and sequences (along with probabilities), and estimating evolutionary rates site-by-site. In computer simulation analyses, ML tree inference algorithms in MEGA5 compared favorably with other software packages in terms of computational efficiency and the accuracy of the estimates of phylogenetic trees, substitution parameters, and rate variation among sites. The MEGA user interface has now been enhanced to be activity driven to make it easier for the use of both beginners and experienced scientists. This version of MEGA is intended for the Windows platform, and it has been configured for effective use on Mac OS X and Linux desktops. It is available free of charge from http://www.megasoftware.net. PMID:21546353
A single-probe heat pulse method for estimating sap velocity in trees.
López-Bernal, Álvaro; Testi, Luca; Villalobos, Francisco J
2017-10-01
Available sap flow methods are still far from being simple, cheap and reliable enough to be used beyond very specific research purposes. This study presents and tests a new single-probe heat pulse (SPHP) method for monitoring sap velocity in trees using a single-probe sensor, rather than the multi-probe arrangements used up to now. Based on the fundamental conduction-convection principles of heat transport in sapwood, convective velocity (Vh) is estimated from the temperature increase in the heater after the application of a heat pulse (ΔT). The method was validated against measurements performed with the compensation heat pulse (CHP) technique in field trees of six different species. To do so, a dedicated three-probe sensor capable of simultaneously applying both methods was produced and used. Experimental measurements in the six species showed an excellent agreement between SPHP and CHP outputs for moderate to high flow rates, confirming the applicability of the method. In relation to other sap flow methods, SPHP presents several significant advantages: it requires low power inputs, it uses technically simpler and potentially cheaper instrumentation, the physical damage to the tree is minimal and artefacts caused by incorrect probe spacing and alignment are removed. © 2017 The Authors. New Phytologist © 2017 New Phytologist Trust.
Soft context clustering for F0 modeling in HMM-based speech synthesis
NASA Astrophysics Data System (ADS)
Khorram, Soheil; Sameti, Hossein; King, Simon
2015-12-01
This paper proposes the use of a new binary decision tree, which we call a soft decision tree, to improve generalization performance compared to the conventional `hard' decision tree method that is used to cluster context-dependent model parameters in statistical parametric speech synthesis. We apply the method to improve the modeling of fundamental frequency, which is an important factor in synthesizing natural-sounding high-quality speech. Conventionally, hard decision tree-clustered hidden Markov models (HMMs) are used, in which each model parameter is assigned to a single leaf node. However, this `divide-and-conquer' approach leads to data sparsity, with the consequence that it suffers from poor generalization, meaning that it is unable to accurately predict parameters for models of unseen contexts: the hard decision tree is a weak function approximator. To alleviate this, we propose the soft decision tree, which is a binary decision tree with soft decisions at the internal nodes. In this soft clustering method, internal nodes select both their children with certain membership degrees; therefore, each node can be viewed as a fuzzy set with a context-dependent membership function. The soft decision tree improves model generalization and provides a superior function approximator because it is able to assign each context to several overlapped leaves. In order to use such a soft decision tree to predict the parameters of the HMM output probability distribution, we derive the smoothest (maximum entropy) distribution which captures all partial first-order moments and a global second-order moment of the training samples. Employing such a soft decision tree architecture with maximum entropy distributions, a novel speech synthesis system is trained using maximum likelihood (ML) parameter re-estimation and synthesis is achieved via maximum output probability parameter generation. 
In addition, a soft decision tree construction algorithm optimizing a log-likelihood measure is developed. Both subjective and objective evaluations were conducted and indicate a considerable improvement over the conventional method.
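The difference between hard and soft decisions can be made concrete with a small sketch. In the code below, each internal node passes a context to both children with complementary membership degrees, so every leaf receives a share of the context. The sigmoid gate, the tuple layout of the tree, and the leaf values are illustrative assumptions, not the parameterization used in the paper.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def leaf_memberships(node, x, weight=1.0):
    """Return {leaf_name: membership degree} for a context feature vector x.

    A node is ("leaf", name) or ("split", w, b, left, right).
    The gate sigmoid(w.x + b) is a hypothetical membership function.
    Leaf names are assumed to be unique.
    """
    if node[0] == "leaf":
        return {node[1]: weight}
    _, w, b, left, right = node
    g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    out = leaf_memberships(left, x, weight * g)              # soft "yes" branch
    out.update(leaf_memberships(right, x, weight * (1.0 - g)))  # soft "no" branch
    return out

def soft_predict(node, x, leaf_values):
    """Membership-weighted average of the leaf values."""
    m = leaf_memberships(node, x)
    return sum(m[name] * leaf_values[name] for name in m)

# Toy tree with three leaves; the values stand in for per-leaf F0 means (made up).
tree = ("split", [1.0, 0.0], 0.0,
        ("leaf", "A"),
        ("split", [0.0, 1.0], 0.0, ("leaf", "B"), ("leaf", "C")))
leaf_f0 = {"A": 120.0, "B": 180.0, "C": 220.0}

memberships = leaf_memberships(tree, [0.0, 0.0])
prediction = soft_predict(tree, [0.0, 0.0], leaf_f0)
```

A hard tree is the limiting case in which every gate returns exactly 0 or 1, so a single leaf receives membership 1 and the others receive 0.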
The tree to the left, the forest to the right: political attitude and perceptual bias.
Caparos, Serge; Fortier-St-Pierre, Simon; Gosselin, Jérémie; Blanchette, Isabelle; Brisson, Benoit
2015-01-01
A prominent model suggests that individuals to the right of the political spectrum are more cognitively rigid and less tolerant of ambiguity than individuals to the left. On the basis of this model, we predicted that a psychological mechanism linked to the resolution of visual ambiguity--perceptual bias--would be linked to political attitude. Perceptual bias causes western individuals to favour a global interpretation when scrutinizing ambiguous hierarchical displays (e.g., alignment of trees) that can be perceived either in terms of their local elements (e.g., several trees) or in terms of their global structure (e.g., a forest). Using three tasks (based on Navon-like hierarchical figures or on the Ebbinghaus illusion), we demonstrate (1) that right-oriented Westerners present a stronger bias towards global perception than left-oriented Westerners and (2) that this stronger bias is linked to higher cognitive rigidity. This study establishes for the first time that political ideology, a high-level construct, is directly reflected in low-level perception. Right- and left-oriented individuals actually see the world differently. Copyright © 2014 Elsevier B.V. All rights reserved.
A template-finding algorithm and a comprehensive benchmark for homology modeling of proteins
Vallat, Brinda Kizhakke; Pillardy, Jaroslaw; Elber, Ron
2010-01-01
The first step in homology modeling is to identify a template protein for the target sequence. The template structure is used in later phases of the calculation to construct an atomically detailed model for the target. We have built from the Protein Data Bank a large-scale learning set that includes tens of millions of pair matches that can be either a true template or a false one. Discriminatory learning (learning from positive and negative examples) is employed to train a decision tree. Each branch of the tree is a mathematical programming model. The decision tree is tested on an independent set from PDB entries and on the sequences of CASP7. It provides significant enrichment of true templates (between 50-100 percent) when compared to PSI-BLAST. The model is further verified by building atomically detailed structures for each of the tentative true templates with modeller. The probability that a true match does not yield an acceptable structural model (within 6Å RMSD from the native structure), decays linearly as a function of the TM structural-alignment score. PMID:18300226
Projected power iteration for network alignment
NASA Astrophysics Data System (ADS)
Onaran, Efe; Villar, Soledad
2017-08-01
The network alignment problem asks for the best correspondence between two given graphs, so that the largest possible number of edges are matched. This problem arises in many scientific settings (such as the study of protein-protein interactions) and it is very closely related to the quadratic assignment problem, which has the graph isomorphism, traveling salesman and minimum bisection problems as particular cases. The graph matching problem is NP-hard in general. However, under some restrictive models for the graphs, algorithms can approximate the alignment efficiently. In that spirit, the recent work by Feizi and collaborators introduces EigenAlign, a fast spectral method with convergence guarantees for Erdős-Rényi graphs. In this work we propose the algorithm Projected Power Alignment, which is a projected power iteration version of EigenAlign. We numerically show it improves the recovery rates of EigenAlign and we describe the theory that may be used to provide performance guarantees for Projected Power Alignment.
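The objective described above, the number of edges matched by a correspondence, can be written down directly. The sketch below counts matched edges for a given node mapping and, for tiny graphs only, finds the best mapping by brute force. It illustrates the problem statement, not EigenAlign or Projected Power Alignment; the function names and the toy graphs are assumptions made for illustration.

```python
from itertools import permutations

def matched_edges(edges_g, edges_h, mapping):
    """Count edges (u, v) of G whose image (mapping[u], mapping[v]) is an edge of H."""
    h_edges = {frozenset(e) for e in edges_h}  # undirected edges
    return sum(1 for u, v in edges_g
               if frozenset((mapping[u], mapping[v])) in h_edges)

def best_alignment(nodes_g, nodes_h, edges_g, edges_h):
    """Exhaustive search over injective mappings; factorial cost, toy sizes only."""
    best_score, best_map = -1, None
    for perm in permutations(nodes_h, len(nodes_g)):
        mapping = dict(zip(nodes_g, perm))
        score = matched_edges(edges_g, edges_h, mapping)
        if score > best_score:
            best_score, best_map = score, mapping
    return best_score, best_map

# Two 4-node paths whose node labels are listed in different orders.
nodes_g, edges_g = [0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)]
nodes_h, edges_h = ["a", "b", "c", "d"], [("b", "a"), ("a", "d"), ("d", "c")]

naive_score = matched_edges(edges_g, edges_h, dict(zip(nodes_g, nodes_h)))
best_score, best_map = best_alignment(nodes_g, nodes_h, edges_g, edges_h)
```

The exhaustive search makes the NP-hardness tangible: the number of candidate mappings grows factorially with the number of nodes, which is why spectral relaxations are attractive.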
Using structure to explore the sequence alignment space of remote homologs.
Kuziemko, Andrew; Honig, Barry; Petrey, Donald
2011-10-01
Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is "optimal" in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are "suboptimal" in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for "modelability", we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.
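For reference, the "optimal" DP score mentioned above is the value returned by a standard global alignment recurrence. The following is a minimal Needleman-Wunsch scoring sketch with a simple match/mismatch/linear-gap scheme; the scoring values are placeholder assumptions, and this is not the fragment-based method the authors propose.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of strings a and b (linear gap penalty)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag,               # align a[i-1] with b[j-1]
                           prev[j] + gap,      # gap in b
                           cur[j - 1] + gap))  # gap in a
        prev = cur
    return prev[-1]
```

The recurrence maximizes only the sequence-based score, which is why the structurally correct alignment of remote homologs can score below the DP optimum.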
Triangular Alignment (TAME). A Tensor-based Approach for Higher-order Network Alignment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mohammadi, Shahin; Gleich, David F.; Kolda, Tamara G.
2015-11-01
Network alignment is an important tool with extensive applications in comparative interactomics. Traditional approaches aim to simultaneously maximize the number of conserved edges and the underlying similarity of aligned entities. We propose a novel formulation of the network alignment problem that extends topological similarity to higher-order structures and provide a new objective function that maximizes the number of aligned substructures. This objective function corresponds to an integer programming problem, which is NP-hard. Consequently, we approximate this objective function as a surrogate function whose maximization results in a tensor eigenvalue problem. Based on this formulation, we present an algorithm called Triangular AlignMEnt (TAME), which attempts to maximize the number of aligned triangles across networks. We focus on alignment of triangles because of their enrichment in complex networks; however, our formulation and resulting algorithms can be applied to general motifs. Using a case study on the NAPABench dataset, we show that TAME is capable of producing alignments with up to 99% accuracy in terms of aligned nodes. We further evaluate our method by aligning yeast and human interactomes. Our results indicate that TAME outperforms the state-of-art alignment methods both in terms of biological and topological quality of the alignments.
Accelerating large-scale protein structure alignments with graphics processing units
2012-01-01
Background Large-scale protein structure alignment, an indispensable tool to structural bioinformatics, poses a tremendous challenge on computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings We present ppsAlign, a parallel protein structure Alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign could take many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card, and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues from protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using massive parallel computing power of GPU. PMID:22357132
Xinhua Zhou; Michele M. Schoeneberger; James R. Brandle; Tala N. Awada; Jianmin Chu; Derrel L. Martin; Jihong Li; Yuqiang Li; Carl W. Mize
2014-01-01
Quantifying carbon in agroforestry trees requires biomass equations that capture the growth differences (e.g., tree specific gravity and architecture) created in the more open canopies of agroforestry plantings compared with those generally encountered in forests. Whereas forest-derived equations are available, equations for open-grown trees are not. Data from...
Tentative guides for the selection of plus trees and superior stands in Douglas-fir.
Leo A. Isaac
1955-01-01
Interest among foresters in forest tree improvement has increased rapidly in recent years. Geneticists have learned that some individual trees greatly excel their neighbors in desirable characteristics, and that some entire stands are superior to other stands of the same species in a general locality. They have learned that many of the desirable tree characteristics...
Alaska Is Our Home--Book 3: A Natural Science Handbook for Alaskan Students.
ERIC Educational Resources Information Center
Bury, John; Bury, Susan
The third book in a series of natural science handbooks for Alaskan students focuses on Alaskan plantlife. The first chapter, on trees, gives general information about trees and explains how to identify and locate trees in the three main Alaskan tree families: pine, willow, and birch. The second chapter, on plants, describes 14 kinds of edible…
Natural falling of beetle-killed ponderosa pine
J. M. Schmid; S. A. Mata; W. F. McCambridge
1985-01-01
Beetle-killed trees in the Front Range of Colorado were observed for their rate and direction of falling. No trees fell within the 2 years following infestation. Thereafter, trees generally fell at the rate of 3-5% per year unless winds exceeded 75 mph. Most trees fell to the east and broke off between ground level and 2 feet above ground.
NASA Astrophysics Data System (ADS)
Kisi, Ozgur; Kilic, Yasin
2016-11-01
The generalization ability of artificial neural networks (ANNs) and M5 model tree (M5Tree) in modeling reference evapotranspiration (ET0) is investigated in this study. Daily climatic data, average temperature, solar radiation, wind speed, and relative humidity from six different stations operated by California Irrigation Management Information System (CIMIS) located in two different regions of the USA were used in the applications. King-City Oasis Rd., Arroyo Seco, and Salinas North stations are located in San Joaquin region, and San Luis Obispo, Santa Monica, and Santa Barbara stations are located in the Southern region. In the first part of the study, the ANN and M5Tree models were used for estimating ET0 of six stations and results were compared with the empirical methods. The ANN and M5Tree models were found to be better than the empirical models. In the second part of the study, the ANN and M5Tree models obtained from one station were tested using the data from the other two stations for each region. ANN models performed better than the CIMIS Penman, Hargreaves, Ritchie, and Turc models in two stations while the M5Tree models generally showed better accuracy than the corresponding empirical models in all stations. In the third part of the study, the ANN and M5Tree models were calibrated using three stations located in San Joaquin region and tested using the data from the other three stations located in the Southern region. Four-input ANN and M5Tree models performed better than the CIMIS Penman in only one station while the two-input ANN models were found to be better than the Hargreaves, Ritchie, and Turc models in two stations.
Large longitudinal spin alignment generated in inelastic nuclear reactions
NASA Astrophysics Data System (ADS)
Hoff, D. E. M.; Potel, G.; Brown, K. W.; Charity, R. J.; Pruitt, C. D.; Sobotka, L. G.; Webb, T. B.; Roeder, B.; Saastamoinen, A.
2018-05-01
Large longitudinal spin alignment of E/A = 24 MeV 7Li projectiles inelastically excited by Be, C, and Al targets was observed when the latter remain in their ground state. This alignment is a consequence of an angular-momentum-excitation-energy mismatch, which is well described by a DWBA cluster model (α + t). The longitudinal alignment of several other systems is also well described by DWBA calculations, including one where a cluster model is inappropriate, demonstrating that the alignment mechanism is a more general phenomenon. Predictions are made for inelastic excitation of 12C for beam energies above and below the mismatch threshold.
Irrational exuberance for resolved species trees.
Hahn, Matthew W; Nakhleh, Luay
2016-01-01
Phylogenomics has largely succeeded in its aim of accurately inferring species trees, even when there are high levels of discordance among individual gene trees. These resolved species trees can be used to ask many questions about trait evolution, including the direction of change and number of times traits have evolved. However, the mapping of traits onto trees generally uses only a single representation of the species tree, ignoring variation in the gene trees used to construct it. Recognizing that genes underlie traits, these results imply that many traits follow topologies that are discordant with the species topology. As a consequence, standard methods for character mapping will incorrectly infer the number of times a trait has evolved. This phenomenon, dubbed "hemiplasy," poses many problems in analyses of character evolution. Here we outline these problems, explaining where and when they are likely to occur. We offer several ways in which the possible presence of hemiplasy can be diagnosed, and discuss multiple approaches to dealing with the problems presented by underlying gene tree discordance when carrying out character mapping. Finally, we discuss the implications of hemiplasy for general phylogenetic inference, including the possible drawbacks of the widespread push for "resolved" species trees. © 2015 The Author(s). Evolution © 2015 The Society for the Study of Evolution.
Fokkema, M; Smits, N; Zeileis, A; Hothorn, T; Kelderman, H
2017-10-25
Identification of subgroups of patients for whom treatment A is more effective than treatment B, and vice versa, is of key importance to the development of personalized medicine. Tree-based algorithms are helpful tools for the detection of such interactions, but none of the available algorithms allow for taking into account clustered or nested dataset structures, which are particularly common in psychological research. Therefore, we propose the generalized linear mixed-effects model tree (GLMM tree) algorithm, which allows for the detection of treatment-subgroup interactions, while accounting for the clustered structure of a dataset. The algorithm uses model-based recursive partitioning to detect treatment-subgroup interactions, and a GLMM to estimate the random-effects parameters. In a simulation study, GLMM trees show higher accuracy in recovering treatment-subgroup interactions, higher predictive accuracy, and lower type II error rates than linear-model-based recursive partitioning and mixed-effects regression trees. Also, GLMM trees show somewhat higher predictive accuracy than linear mixed-effects models with pre-specified interaction effects, on average. We illustrate the application of GLMM trees on an individual patient-level data meta-analysis on treatments for depression. We conclude that GLMM trees are a promising exploratory tool for the detection of treatment-subgroup interactions in clustered datasets.
Zuo, Shu-di; Ren, Yin; Weng, Xian; Ding, Hong-feng; Luo, Yun-jian
2015-02-01
The biomass allometric equation (BAE), considered a simple and reliable method for estimating forest biomass and carbon, has been widely used. In China, numerous studies have focused on BAEs for coniferous forest and pure broadleaved forest, and generalized BAEs have frequently been used to estimate the biomass and carbon of mixed broadleaved forest, although they can introduce large uncertainty in the estimates. In this study, we developed species-specific and generalized BAEs using biomass measurements for 9 common broadleaved trees (Castanopsis fargesii, C. lamontii, C. tibetana, Lithocarpus glaber, Sloanea sinensis, Daphniphyllum oldhami, Alniphyllum fortunei, Manglietia yuyuanensis, and Engelhardtia fenzlii) of subtropical evergreen broadleaved forest, and compared the species-specific and generalized BAEs. The results showed that D (diameter at breast height) was a better independent variable than the combined variable D²H (where H is tree height) for estimating the biomass of branch, leaf, root, aboveground section and total tree, but D²H was better than D for estimating stem biomass. R² (coefficient of determination) values of BAEs for 6 species decreased when H was added as a second independent variable to D-only BAEs; the R² value for S. sinensis decreased by 5.6%. Compared with generalized D- and D²H-based BAEs, the standard errors of estimate (SEE) of the BAEs for 8 tree species decreased, and a similar decreasing trend was observed for the different components; the SEEs for the branch decreased by 13.0% and 20.3%. Therefore, estimates of biomass carbon storage and its dynamics were strongly influenced by tree species and model type. To improve the accuracy of biomass and carbon estimates, differences in tree species and model types should be considered.
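The abstract does not state the functional form of the fitted equations. A common choice for allometric models is a power law, B = a·D^b, fitted by ordinary least squares on log-transformed values; the sketch below assumes that form. The data are synthetic and generated from known parameters purely to check the fit; they are not measurements from the study.

```python
import math

def fit_power_law(diameters, biomass):
    """Fit B = a * D**b by least squares on ln(B) = ln(a) + b * ln(D)."""
    xs = [math.log(d) for d in diameters]
    ys = [math.log(v) for v in biomass]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope in log-log space
    a = math.exp(mean_y - b * mean_x)   # back-transformed intercept
    return a, b

# Synthetic, noise-free data from a = 0.1, b = 2.5 (hypothetical values).
d_cm = [5.0, 10.0, 15.0, 20.0, 30.0]
b_kg = [0.1 * d ** 2.5 for d in d_cm]
a_hat, b_hat = fit_power_law(d_cm, b_kg)
```

A D²H-based variant would use the same routine with D²·H in place of D as the predictor. With real, noisy data, back-transforming from log space introduces a bias that is usually handled with a correction factor, which this sketch omits.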
Inferring species trees from incongruent multi-copy gene trees using the Robinson-Foulds distance
2013-01-01
Background Constructing species trees from multi-copy gene trees remains a challenging problem in phylogenetics. One difficulty is that the underlying genes can be incongruent due to evolutionary processes such as gene duplication and loss, deep coalescence, or lateral gene transfer. Gene tree estimation errors may further exacerbate the difficulties of species tree estimation. Results We present a new approach for inferring species trees from incongruent multi-copy gene trees that is based on a generalization of the Robinson-Foulds (RF) distance measure to multi-labeled trees (mul-trees). We prove that it is NP-hard to compute the RF distance between two mul-trees; however, it is easy to calculate this distance between a mul-tree and a singly-labeled species tree. Motivated by this, we formulate the RF problem for mul-trees (MulRF) as follows: Given a collection of multi-copy gene trees, find a singly-labeled species tree that minimizes the total RF distance from the input mul-trees. We develop and implement a fast SPR-based heuristic algorithm for the NP-hard MulRF problem. We compare the performance of the MulRF method (available at http://genome.cs.iastate.edu/CBL/MulRF/) with several gene tree parsimony approaches using gene tree simulations that incorporate gene tree error, gene duplications and losses, and/or lateral transfer. The MulRF method produces more accurate species trees than gene tree parsimony approaches. We also demonstrate that the MulRF method infers in minutes a credible plant species tree from a collection of nearly 2,000 gene trees. Conclusions Our new phylogenetic inference method, based on a generalized RF distance, makes it possible to quickly estimate species trees from large genomic data sets. 
Since the MulRF method, unlike gene tree parsimony, is based on a generic tree distance measure, it is appealing for analyses of genomic data sets, in which many processes such as deep coalescence, recombination, gene duplication and losses as well as phylogenetic error may contribute to gene tree discord. In experiments, the MulRF method estimated species trees accurately and quickly, demonstrating MulRF as an efficient alternative approach for phylogenetic inference from large-scale genomic data sets. PMID:24180377
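As background for the distance being generalized here, the sketch below computes the ordinary Robinson-Foulds distance for two rooted, singly-labeled trees on the same leaf set, as the number of clusters found in one tree but not the other. The nested-tuple tree encoding and the function names are assumptions for illustration; this is the plain RF distance, not the mul-tree generalization or the MulRF heuristic.

```python
def clusters(tree):
    """Return the set of leaf clusters (one per internal node) of a rooted tree.

    A tree is a leaf label (str) or a tuple of subtrees.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found

def rf_distance(t1, t2):
    """Size of the symmetric difference between the two cluster sets."""
    return len(clusters(t1) ^ clusters(t2))

t1 = ((("A", "B"), "C"), "D")
t2 = ((("A", "C"), "B"), "D")
```

Here `t1` and `t2` differ only in which pair is grouped first, so each contributes exactly one cluster the other lacks. Unrooted formulations use bipartitions instead of clusters, but the symmetric-difference idea is the same.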
Cvicek, Vaclav; Goddard, William A.; Abrol, Ravinder
2016-01-01
The understanding of G-protein coupled receptors (GPCRs) is undergoing a revolution due to increased information about their signaling and the experimental determination of structures for more than 25 receptors. The availability of at least one receptor structure for each of the GPCR classes, well separated in sequence space, enables an integrated superfamily-wide analysis to identify signatures involving the role of conserved residues, conserved contacts, and downstream signaling in the context of receptor structures. In this study, we align the transmembrane (TM) domains of all experimental GPCR structures to maximize the conserved inter-helical contacts. The resulting superfamily-wide GpcR Sequence-Structure (GRoSS) alignment of the TM domains for all human GPCR sequences is sufficient to generate a phylogenetic tree that correctly distinguishes all different GPCR classes, suggesting that the class-level differences in the GPCR superfamily are encoded at least partly in the TM domains. The inter-helical contacts conserved across all GPCR classes describe the evolutionarily conserved GPCR structural fold. The corresponding structural alignment of the inactive and active conformations, available for a few GPCRs, identifies activation hot-spot residues in the TM domains that get rewired upon activation. Many GPCR mutations, known to alter receptor signaling and cause disease, are located at these conserved contact and activation hot-spot residue positions. The GRoSS alignment places the chemosensory receptor subfamilies for bitter taste (TAS2R) and pheromones (Vomeronasal, VN1R) in the rhodopsin family, known to contain the chemosensory olfactory receptor subfamily. The GRoSS alignment also enables the quantification of the structural variability in the TM regions of experimental structures, useful for homology modeling and structure prediction of receptors. Furthermore, this alignment identifies structurally and functionally important residues in all human GPCRs. 
These residues can be used to make testable hypotheses about the structural basis of receptor function and about the molecular basis of disease-associated single nucleotide polymorphisms. PMID:27028541
Leveraging the rice genome sequence for monocot comparative and translational genomics.
Lohithaswa, H C; Feltus, F A; Singh, H P; Bacon, C D; Bailey, C D; Paterson, A H
2007-07-01
Common genome anchor points across many taxa greatly facilitate translational and comparative genomics and will improve our understanding of the Tree of Life. To add to the repertoire of genomic tools applicable to the study of monocotyledonous plants in general, we aligned Allium and Musa ESTs to Oryza BAC sequences and identified candidate Allium-Oryza and Musa-Oryza conserved intron-scanning primers (CISPs). A random sampling of 96 CISP primer pairs, representing loci from 11 of the 12 chromosomes in rice, were tested on seven members of the order Poales and on representatives of the Arecales, Asparagales, and Zingiberales monocot orders. The single-copy amplification success rates of Allium (31.3%), Cynodon (31.4%), Hordeum (30.2%), Musa (37.5%), Oryza (61.5%), Pennisetum (33.3%), Sorghum (47.9%), Zea (33.3%), Triticum (30.2%), and representatives of the palm family (32.3%) suggest that subsets of these primers will provide DNA markers suitable for comparative and translational genomics in orphan crops, as well as for applications in conservation biology, ecology, invasion biology, population biology, systematic biology, and related fields.
duVerle, David A; Yotsukura, Sohiya; Nomura, Seitaro; Aburatani, Hiroyuki; Tsuda, Koji
2016-09-13
Single-cell RNA sequencing is fast becoming one of the standard methods for gene expression measurement, providing unique insights into cellular processes. A number of methods, based on general dimensionality reduction techniques, have been suggested to help infer and visualise the underlying structure of cell populations from single-cell expression levels, yet their models generally lack proper biological grounding and struggle to identify complex differentiation paths. Here we introduce cellTree: an R/Bioconductor package that uses a novel statistical approach, based on document analysis techniques, to produce tree structures outlining the hierarchical relationship between single-cell samples, while identifying latent groups of genes that can provide biological insights. With cellTree, we provide experimentalists with an easy-to-use tool, based on statistically and biologically sound algorithms, to efficiently explore and visualise single-cell RNA data. The cellTree package is publicly available in the online Bioconductor repository at: http://bioconductor.org/packages/cellTree/ .
Damkliang, Kasikrit; Tandayya, Pichaya; Sangket, Unitsa; Pasomsub, Ekawat
2016-11-28
At present, coding sequences (CDS) have been discovered and larger CDS are being revealed frequently. Approaches and related tools have also been developed and upgraded concurrently, especially for phylogenetic tree analysis. This paper proposes an integrated automatic Taverna workflow for phylogenetic tree inference using public-access web services at the European Bioinformatics Institute (EMBL-EBI) and the Swiss Institute of Bioinformatics (SIB), and our own deployed local web services. The workflow input is a set of CDS in FASTA format. The workflow supports 1,000 to 20,000 bootstrap replicates. The workflow performs tree inference with the Parsimony (PARS), Distance Matrix - Neighbor Joining (DIST-NJ), and Maximum Likelihood (ML) algorithms of the EMBOSS PHYLIPNEW package, based on our proposed Multiple Sequence Alignment (MSA) similarity score. The local web services are implemented and deployed in two ways, using Soaplab2 and Apache Axis2. SOAP and Java Web Service (JWS) services provide WSDL endpoints to Taverna Workbench, a workflow manager. The workflow has been validated, the performance has been measured, and its results have been verified. Our workflow's execution time is less than ten minutes for inferring a tree with 10,000 bootstrap replicates. This paper proposes a new integrated automatic workflow which will be beneficial to bioinformaticians with an intermediate level of knowledge and experience. All local services have been deployed at our portal http://bioservices.sci.psu.ac.th.
Damkliang, Kasikrit; Tandayya, Pichaya; Sangket, Unitsa; Pasomsub, Ekawat
2016-03-01
At present, coding sequences (CDS) have been discovered and larger CDS are being revealed frequently. Approaches and related tools have also been developed and upgraded concurrently, especially for phylogenetic tree analysis. This paper proposes an integrated automatic Taverna workflow for phylogenetic tree inference using public-access web services at the European Bioinformatics Institute (EMBL-EBI) and the Swiss Institute of Bioinformatics (SIB), and our own deployed local web services. The workflow input is a set of CDS in FASTA format. The workflow supports 1,000 to 20,000 bootstrap replicates. The workflow performs tree inference with the Parsimony (PARS), Distance Matrix - Neighbor Joining (DIST-NJ), and Maximum Likelihood (ML) algorithms of the EMBOSS PHYLIPNEW package, based on our proposed Multiple Sequence Alignment (MSA) similarity score. The local web services are implemented and deployed in two ways, using Soaplab2 and Apache Axis2. SOAP and Java Web Service (JWS) services provide WSDL endpoints to Taverna Workbench, a workflow manager. The workflow has been validated, the performance has been measured, and its results have been verified. Our workflow's execution time is less than ten minutes for inferring a tree with 10,000 bootstrap replicates. This paper proposes a new integrated automatic workflow which will be beneficial to bioinformaticians with an intermediate level of knowledge and experience. All local services have been deployed at our portal http://bioservices.sci.psu.ac.th.
NASA Astrophysics Data System (ADS)
Mora, R.; Barahona, A.; Aguilar, H.
2015-04-01
This paper presents a method for using high detail volumetric information, captured with a land based photogrammetric survey, to obtain information from individual trees. Applying LIDAR analysis techniques it is possible to measure diameter at breast height, height at first branch (commercial height), basal area and volume of an individual tree. Given this information it is possible to calculate how much of that tree can be exploited as wood. The main objective is to develop a methodology for successfully surveying one individual tree, capturing every side of the stem using a high resolution digital camera and reference marks with GPS coordinates. The process is executed for several individuals of two species present in the metropolitan area in San Jose, Costa Rica, Delonix regia (Bojer) Raf. and Tabebuia rosea (Bertol.) DC., each one with different height, stem shape and crown area. Using a photogrammetry suite all the pictures are aligned, geo-referenced and a dense point cloud is generated with enough detail to perform the required measurements, as well as a solid tridimensional model for volume measurement. This research will open the way to develop a capture methodology with an airborne camera using close range UAVs. An airborne platform will make it possible to capture every individual in a forest plantation; furthermore, if the analysis techniques applied in this research are automated, it will be possible to calculate with high precision the exploit potential of a forest plantation and improve its management.
Reasoning over Taxonomic Change: Exploring Alignments for the Perelleschus Use Case
Franz, Nico M.; Chen, Mingmin; Yu, Shizhuo; Kianmajd, Parisa; Bowers, Shawn; Ludäscher, Bertram
2015-01-01
Classifications and phylogenetic inferences of organismal groups change in light of new insights. Over time these changes can result in an imperfect tracking of taxonomic perspectives through the re-/use of Code-compliant or informal names. To mitigate these limitations, we introduce a novel approach for aligning taxonomies through the interaction of human experts and logic reasoners. We explore the performance of this approach with the Perelleschus use case of Franz & Cardona-Duque (2013). The use case includes six taxonomies published from 1936 to 2013, 54 taxonomic concepts (i.e., circumscriptions of names individuated according to their respective source publications), and 75 expert-asserted Region Connection Calculus articulations (e.g., congruence, proper inclusion, overlap, or exclusion). An Open Source reasoning toolkit is used to analyze 13 paired Perelleschus taxonomy alignments under heterogeneous constraints and interpretations. The reasoning workflow optimizes the logical consistency and expressiveness of the input and infers the set of maximally informative relations among the entailed taxonomic concepts. The latter are then used to produce merge visualizations that represent all congruent and non-congruent taxonomic elements among the aligned input trees. In this small use case with 6-53 input concepts per alignment, the information gained through the reasoning process is on average one order of magnitude greater than in the input. The approach offers scalable solutions for tracking provenance among succeeding taxonomic perspectives that may have differential biases in naming conventions, phylogenetic resolution, ingroup and outgroup sampling, or ostensive (member-referencing) versus intensional (property-referencing) concepts and articulations. PMID:25700173
Saturn Orbits Car Making into the Twenty-First Century. A Case Study
1993-04-01
two engine variations of the 1.9 liter four-cylinder aluminum block, a standard 85-horsepower, single overhead camshaft (SOHC) 8-valve and a high...performance, 124-horsepower, dual overhead camshafts (DOHC) 16-valve version. Its optional anti-lock braking system was a safety addition not normally found...Treece, James B. "The Planets May be Perfectly Aligned For Saturn’s Lift-Off." Business Week Oct. 22, 1990: 40. Treece, James B. "War, Recession
NASA Technical Reports Server (NTRS)
Fox, G. E.
1985-01-01
Comparisons of complete 16S ribosomal ribonucleic acid (rRNA) sequences established that the secondary structure of these molecules is highly conserved. Earlier work with 5S rRNA secondary structure revealed that when structural conservation exists the alignment of sequences is straightforward. The constancy of structure implies minimal functional change. Under these conditions a uniform evolutionary rate can be expected so that conditions are favorable for phylogenetic tree construction.
77 FR 42694 - Helena National Forest, Montana, Telegraph Vegetation Project
Federal Register 2010, 2011, 2012, 2013, 2014
2012-07-20
... slashing generally small diameter trees followed by prescribed burning within the Jericho Mountain... dead and dying trees, promoting desirable regeneration, reducing fuels and the risk of wildfire, and... for Action Wide-scale tree mortality has occurred throughout the project area due to the mountain pine...
Precommercial crop-tree release increases diameter growth of Appalachian hardwood saplings
H. Clay Smith; Neil I. Lamson
1983-01-01
Codominant seedling-origin crop trees 25 to 39 feet tall in even-aged, precommercial-size hardwood stands were released in West Virginia. Trees were located on two sites: good oak site index 75 and fair oak site 63. Species studied were black cherry, sweet birch, and yellow-poplar. Three-year results indicated that the trees generally responded to release; the 3-year...
NASA Astrophysics Data System (ADS)
Carlson, Eric D.; Foley, Lee M.; Guzman, Edward; Korblova, Eva D.; Visvanathan, Rayshan; Ryu, SeongHo; Gim, Min-Jun; Tuchband, Michael R.; Yoon, Dong Ki; Clark, Noel A.; Walba, David M.
2017-08-01
The control of the molecular orientation of liquid crystals (LCs) is important in both understanding phase properties and the continuing development of new LC technologies including displays, organic transistors, and electro-optic devices. Many techniques have been developed for successfully inducing alignment of calamitic LCs, though these techniques typically do not translate to the alignment of bent-core liquid crystals (BCLCs). Some techniques have been utilized to align various phases of BCLCs, but these techniques are often unsuccessful for general alignment of multiple materials and/or multiple phases. Here, we demonstrate that glass cells treated with polydimethylsiloxane (PDMS) thin films induce high quality homeotropic alignment of multiple mesophases of four BCLCs. On cooling to the lowest temperature phase the homeotropic alignment is lost, and spherulitic growth is seen in crystal and crystal-like phases including the dark conglomerate (DC) and helical nanofilament (HNF) phases. Evidence of homeotropic alignment is observed using polarized optical microscopy. We speculate that the methyl groups on the surface of the PDMS films strongly interact with the aliphatic tails of each mesogens, resulting in homeotropic alignment.
Some aspects of SR beamline alignment
NASA Astrophysics Data System (ADS)
Gaponov, Yu. A.; Cerenius, Y.; Nygaard, J.; Ursby, T.; Larsson, K.
2011-09-01
Based on the Synchrotron Radiation (SR) beamline optical element-by-element alignment with analysis of the alignment results an optimized beamline alignment algorithm has been designed and developed. The alignment procedures have been designed and developed for the MAX-lab I911-4 fixed energy beamline. It has been shown that the intermediate information received during the monochromator alignment stage can be used for the correction of both monochromator and mirror without the next stages of alignment of mirror, slits, sample holder, etc. Such an optimization of the beamline alignment procedures decreases the time necessary for the alignment and becomes useful and helpful in the case of any instability of the beamline optical elements, storage ring electron orbit or the wiggler insertion device, which could result in the instability of angular and positional parameters of the SR beam. A general purpose software package for manual, semi-automatic and automatic SR beamline alignment has been designed and developed using the developed algorithm. The TANGO control system is used as the middle-ware between the stand-alone beamline control applications BLTools, BPMonitor and the beamline equipment.
Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments.
Daily, Jeff
2016-02-10
Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. A faster intra-sequence local pairwise alignment implementation is described and benchmarked, including new global and semi-global variants. Using a 375 residue query sequence a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 24-core processor system, the highest reported for an implementation based on Farrar's 'striped' approach. Rognes's SWIPE optimal database search application is still generally the fastest available at 1.2 to at best 2.4 times faster than Parasail for sequences shorter than 500 amino acids. However, Parasail was faster for longer sequences. For global alignments, Parasail's prefix scan implementation is generally the fastest, faster even than Farrar's 'striped' approach; however, the opal library is faster for single-threaded applications. The software library is designed for 64 bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. Applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.
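For readers unfamiliar with the "cell updates" that the GCUPS figure counts, the following is a minimal scalar sketch of Smith-Waterman local alignment scoring. It is not Parasail's API or its SIMD implementation: the function name, the scoring values, and the linear (non-affine) gap penalty are all assumptions chosen for brevity.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b.

    Illustrative scoring only (assumed values): +2 match, -1 mismatch,
    -2 per gap character (linear gap penalty).
    """
    prev = [0] * (len(b) + 1)          # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)      # current row; column 0 stays 0
        for j in range(1, len(b) + 1):
            # One "cell update": choose the best of restart, diagonal, up, left.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

The table has len(a) × len(b) cells, so throughput in cell updates per second is that product divided by the run time; the striped and prefix-scan approaches named in the abstract compute the same recurrence with vector instructions.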
The evolutionary history of ferns inferred from 25 low-copy nuclear genes.
Rothfels, Carl J; Li, Fay-Wei; Sigel, Erin M; Huiet, Layne; Larsson, Anders; Burge, Dylan O; Ruhsam, Markus; Deyholos, Michael; Soltis, Douglas E; Stewart, C Neal; Shaw, Shane W; Pokorny, Lisa; Chen, Tao; dePamphilis, Claude; DeGironimo, Lisa; Chen, Li; Wei, Xiaofeng; Sun, Xiao; Korall, Petra; Stevenson, Dennis W; Graham, Sean W; Wong, Gane K-S; Pryer, Kathleen M
2015-07-01
• Understanding fern (monilophyte) phylogeny and its evolutionary timescale is critical for broad investigations of the evolution of land plants, and for providing the point of comparison necessary for studying the evolution of the fern sister group, seed plants. Molecular phylogenetic investigations have revolutionized our understanding of fern phylogeny; however, to date, these studies have relied almost exclusively on plastid data. • Here we take a curated phylogenomics approach to infer the first broad fern phylogeny from multiple nuclear loci, by combining broad taxon sampling (73 ferns and 12 outgroup species) with focused character sampling (25 loci comprising 35877 bp), along with rigorous alignment, orthology inference and model selection. • Our phylogeny corroborates some earlier inferences and provides novel insights; in particular, we find strong support for Equisetales as sister to the rest of ferns, Marattiales as sister to leptosporangiate ferns, and Dennstaedtiaceae as sister to the eupolypods. Our divergence-time analyses reveal that divergences among the extant fern orders all occurred prior to ∼200 MYA. Finally, our species-tree inferences are congruent with analyses of concatenated data, but generally with lower support. Those cases where species-tree support values are higher than expected involve relationships that have been supported by smaller plastid datasets, suggesting that deep coalescence may be reducing support from the concatenated nuclear data. • Our study demonstrates the utility of a curated phylogenomics approach to inferring fern phylogeny, and highlights the need to consider underlying data characteristics, along with data quantity, in phylogenetic studies. © 2015 Botanical Society of America, Inc.
Gibbs measures with memory of length 2 on an arbitrary-order Cayley tree
NASA Astrophysics Data System (ADS)
Akın, Hasan
In this paper, we consider the Ising-Vannimenus model on an arbitrary-order Cayley tree. We generalize the results conjectured by Akın [Chinese J. Phys. 54(4), 635-649 (2016) and Int. J. Mod. Phys. B 31(13), 1750093 (2017)] for an arbitrary-order Cayley tree. We establish the existence and a full classification of translation-invariant Gibbs measures (TIGMs) with a memory of length 2 associated with the model on an arbitrary-order Cayley tree. We construct the recurrence equations corresponding to the generalized ANNNI model. We satisfy the Kolmogorov consistency condition. We propose a rigorous measure-theoretical approach to investigate the Gibbs measures with a memory of length 2 for the model. We examine whether the number of branches of the tree changes the number of Gibbs measures. Also, we try to determine when the phase transition occurs.
Li, Weibin; Hartmann, Henrik; Adams, Henry D; Zhang, Hongxia; Jin, Changjie; Zhao, Chuanyan; Guan, Dexin; Wang, Anzhi; Yuan, Fenghui; Wu, Jiabing
2018-06-11
Non-structural carbohydrates (NSC) play a central role in plant functioning as energy carriers and building blocks for primary and secondary metabolism. Many studies have investigated how environmental and anthropogenic changes, like increasingly frequent and severe drought episodes, elevated CO2 and atmospheric nitrogen (N) deposition, influence NSC concentrations in individual trees. However, this wealth of data has not been analyzed yet to identify general trends using a common statistical framework. A thorough understanding of tree responses to global change is required for making realistic predictions of vegetation dynamics. Here we compiled data from 57 experimental studies on 71 tree species and conducted a meta-analysis to evaluate general responses of stored soluble sugars, starch and total NSC (soluble sugars + starch) concentrations in different tree organs (foliage, above-ground wood and roots) to drought, elevated CO2 and N deposition. We found that drought significantly decreased total NSC in roots (-17.3%), but not in foliage and above-ground woody tissues (bole, branch, stem and/or twig). Elevated CO2 significantly increased total NSC in foliage (+26.2%) and roots (+12.8%), but not in above-ground wood. By contrast, total NSC significantly decreased in roots (-17.9%), increased in above-ground wood (+6.1%), but was unaffected in foliage from N fertilization. In addition, the response of NSC to three global change drivers was strongly affected by tree taxonomic type, leaf habit, tree age and treatment intensity. Our results pave the way for a better understanding of general tree function responses to drought, elevated CO2 and N fertilization. The existing data also reveal that more long-term studies on mature trees that allow testing interactions between these factors are urgently needed to provide a basis for forecasting tree responses to environmental change at the global scale.
Optimized graph-based mosaicking for virtual microscopy
NASA Astrophysics Data System (ADS)
Steckhan, Dirk G.; Wittenberg, Thomas
2009-02-01
Virtual microscopy has the potential to partially replace traditional microscopy. For virtualization, the slide is scanned once by a fully automatized robotic microscope and saved digitally. Typically, such a scan results in several hundreds to thousands of fields of view. Since robotic stages have positioning errors, these fields of view have to be registered locally and globally in an additional step. In this work we propose a new global mosaicking method for the creation of virtual slides based on sub-pixel exact phase correlation for local alignment in combination with Prim's minimum spanning tree algorithm for global alignment. Our algorithm allows for a robust reproduction of the original slide even in the presence of views with little to no information content. This makes it especially suitable for the mosaicking of cervical smears. These smears often exhibit large empty areas, which do not contain enough information for common stitching approaches.
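The global step described above can be sketched as follows: treat each field of view as a graph node, treat each pairwise registration as a weighted edge, grow a minimum spanning tree with Prim's algorithm, and accumulate offsets along the tree. This is only an illustration under stated assumptions, not the authors' implementation: the edge weight (lower means a more trusted registration), the offset convention, and the function name `mst_positions` are all hypothetical.

```python
import heapq

def mst_positions(n_tiles, edges, root=0):
    """Place tiles by chaining pairwise offsets along a Prim minimum spanning tree.

    edges: list of (weight, i, j, (dx, dy)), where (dx, dy) is the measured
    offset of tile j relative to tile i and a lower weight means a more
    reliable registration (assumed convention).
    Returns {tile: (x, y)} for every tile connected to the root.
    """
    adj = {k: [] for k in range(n_tiles)}
    for w, i, j, (dx, dy) in edges:
        adj[i].append((w, j, (dx, dy)))
        adj[j].append((w, i, (-dx, -dy)))   # reverse edge has the negated offset

    pos = {root: (0.0, 0.0)}
    heap = [(w, root, j, off) for w, j, off in adj[root]]
    heapq.heapify(heap)
    while heap:
        _, i, j, (dx, dy) = heapq.heappop(heap)   # cheapest edge leaving the tree
        if j in pos:
            continue                               # tile already placed
        pos[j] = (pos[i][0] + dx, pos[i][1] + dy)
        for w2, k, off in adj[j]:
            if k not in pos:
                heapq.heappush(heap, (w2, j, k, off))
    return pos
```

With this convention, an unreliable registration (for example, between two nearly empty fields of view) gets a high weight and is bypassed whenever a more trusted path to the same tile exists.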
Diseases of Ornamental and Shade Trees, Shrubs, Vines, and Ground Covers.
ERIC Educational Resources Information Center
Nichols, Lester P.
This agriculture extension service publication from Pennsylvania State University covers the identification and control of common ornamental trees, shrubs, and ground cover diseases. The publication is divided into sections. The first section discusses the diseases of ornamental and shade trees, including general diseases and diseases of specific…
Tree improvement research in oak species
Franklin C. Cech
1971-01-01
Early efforts at developing new forms of oak by interspecific hybridization have not been very successful, although spontaneous hybrids appear rather readily in nature. Preliminary reports on a few seed-source or provenance studies indicate that differences among sources are generally less than differences among trees within sources. This directs tree-improvement...
KaDonna C. Randolph
2018-01-01
Tree crown conditions are visually assessed by the U.S. Department of Agriculture, Forest Service, Forest Inventory and Analysis (FIA) Program as an indicator of forest health. These assessments are useful because individual tree photosynthetic capacity is dependent upon the size and condition of the crown. In general, trees with full, vigorous crowns are associated...
Automated interferometric alignment system for paraboloidal mirrors
Maxey, L.C.
1993-09-28
A systematic method is described for interpreting interference fringes obtained by using a corner cube retroreflector as an alignment aid when aligning a paraboloid to a spherical wavefront. This is applicable to any general case where such alignment is required, but is specifically applicable in the case of aligning an autocollimating test using a diverging beam wavefront. In addition, the method provides information which can be systematically interpreted such that independent information about pitch, yaw and focus errors can be obtained. Thus, the system lends itself readily to automation. Finally, although the method is developed specifically for paraboloids, it can be seen to be applicable to a variety of other aspheric optics when applied in combination with a wavefront corrector that produces a wavefront which, when reflected from the correctly aligned aspheric surface, will produce a collimated wavefront like that obtained from the paraboloid when it is correctly aligned to a spherical wavefront. 14 figures.
ERIC Educational Resources Information Center
Schuder, Donald L.
This guide presents information on controlling insect pests of ornamental trees and shrubs. It is organized for easy reference by nurserymen, arborists, and others desirous of controlling insect damage. General information given includes notes on spraying and sprayers, insecticides, general purpose sprays, phytotoxicity, and health precautions.…
T.O. Veteli; W.J. Mattson; P. Niemela; R. Julkunen-Tiitto; S. Kellomaki; K. Kuokkanen; A. Lavola
2007-01-01
Global climate change includes concomitant changes in many components of the abiotic flux necessary for plant life. In this paper, we investigate the combined effects of elevated CO2 (720 ppm) and temperature (+2 K) on the phytochemistry of three deciduous tree species. The analysis revealed that elevated CO2 generally...
Hernandez, J E; Epstein, L D; Rodriguez, M H; Rodriguez, A D; Rejmankova, E; Roberts, D R
1997-03-01
We propose the use of generalized tree models (GTMs) to analyze data from entomological field studies. Generalized tree models can be used to characterize environments with different mosquito breeding capacity. A GTM simultaneously analyzes a set of predictor variables (e.g., vegetation coverage) in relation to a response variable (e.g., counts of Anopheles albimanus larvae), and how it varies with respect to a set of criterion variables (e.g., presence of predators). The algorithm produces a treelike graphical display with its root at the top and 2 branches stemming down from each node. At each node, conditions on the value of predictors partition the observations into subgroups (environments) in which the relation between response and criterion variables is most homogeneous.
DOE Office of Scientific and Technical Information (OSTI.GOV)
DeWalle, D.R.; Swistock, B.R.; Sharpe, W.E.
Studies were conducted at five Appalachian sites to determine if chemical element concentrations in sapwood tree rings from six tree species varied with soil and soil leachate acidity. The most recent 5-yr-growth increment was extracted from 10 tree boles of each species at each site and analyzed for chemical content using plasma emission spectroscopy. Sapwood tree rings generally showed higher concentrations of Mn and lower concentrations of Sr at sites with lower soil pH. Differences in tree-ring concentrations for Ca and Mn among sites were also found in soil water samples at these sites. Significant differences in soil leachate Al between sites were not duplicated in tree rings. Sapwood tree-ring chemistry in red oak (Quercus rubra L.), black cherry (Prunus serotina Ehrh.), eastern white pine (Pinus strobus L.) and eastern hemlock (Tsuga canadensis (L.) Carr.) was generally responsive to differences in soil chemistry between sites. Chestnut oak (Quercus prinus L.) and pignut hickory (Carya glabra (Mill.) Sweet) were the least responsive species tested. Overall, results show that several common tree species and selected elements are potentially useful for studying historic soil acidification trends at these study sites.
Roca, Alberto I
2014-01-01
The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org.
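The matrix underlying a ProfileGrid, as described above, holds the frequency of each residue at each alignment column. A minimal sketch of that computation follows; it is not JProfileGrid code, the function name is hypothetical, and counting the gap character "-" as a residue is an assumption made here for simplicity.

```python
from collections import Counter

def residue_frequencies(alignment):
    """Per-column residue frequencies for a list of equal-length aligned sequences.

    Returns one dict per column mapping residue -> fraction of sequences.
    Gap characters are counted like any other symbol (assumed behaviour).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    grid = []
    for column in zip(*alignment):          # iterate over alignment columns
        counts = Counter(column)
        grid.append({res: n / len(column) for res, n in counts.items()})
    return grid
```

Colour-coding each cell of this grid by its frequency gives the kind of overview the abstract describes: fully conserved columns show a single cell at 1.0, while variable columns spread their mass over several residues.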
TreeScaper: Visualizing and Extracting Phylogenetic Signal from Sets of Trees.
Huang, Wen; Zhou, Guifang; Marchand, Melissa; Ash, Jeremy R; Morris, David; Van Dooren, Paul; Brown, Jeremy M; Gallivan, Kyle A; Wilgenbusch, Jim C
2016-12-01
Modern phylogenomic analyses often result in large collections of phylogenetic trees representing uncertainty in individual gene trees, variation across genes, or both. Extracting phylogenetic signal from these tree sets can be challenging, as they are difficult to visualize, explore, and quantify. To overcome some of these challenges, we have developed TreeScaper, an application for tree set visualization as well as the identification of distinct phylogenetic signals. GUI and command-line versions of TreeScaper and a manual with tutorials can be downloaded from https://github.com/whuang08/TreeScaper/releases. TreeScaper is distributed under the GNU General Public License. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Sharma, Virag; Hiller, Michael
2017-08-21
Genome alignments provide a powerful basis to transfer gene annotations from a well-annotated reference genome to many other aligned genomes. The completeness of these annotations crucially depends on the sensitivity of the underlying genome alignment. Here, we investigated the impact of the genome alignment parameters and found that parameters with a higher sensitivity allow the detection of thousands of novel alignments between orthologous exons that have been missed before. In particular, comparisons between species separated by an evolutionary distance of >0.75 substitutions per neutral site, like human and other non-placental vertebrates, benefit from increased sensitivity. To systematically test if increased sensitivity improves comparative gene annotations, we built a multiple alignment of 144 vertebrate genomes and used this alignment to map human genes to the other 143 vertebrates with CESAR. We found that higher alignment sensitivity substantially improves the completeness of comparative gene annotations by adding on average 2382 and 7440 novel exons and 117 and 317 novel genes for mammalian and non-mammalian species, respectively. Our results suggest a more sensitive alignment strategy that should generally be used for genome alignments between distantly-related species. Our 144-vertebrate genome alignment and the comparative gene annotations (https://bds.mpi-cbg.de/hillerlab/144VertebrateAlignment_CESAR/) are a valuable resource for comparative genomics. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Field alignment of bent-core smectic liquid crystals for analog optical phase modulation
NASA Astrophysics Data System (ADS)
Shen, Y.; Goodhew, L.; Shao, R.; Moran, M.; Korblova, E.; Walba, D. M.; Clark, N. A.; Maclennan, J. E.; Rudquist, P.
2015-05-01
A general method for aligning bent-core smectic liquid crystal materials is described. Alternating electric fields between interdigitated electrodes patterned on one cell surface create torques on the liquid crystal that result in uniform "bookshelf" orientation of the smectic layers. The aligned cell can then be driven in the conventional way by applying an electric field between all of the stripe electrodes connected together and a monolithic electrode on the other cell surface. Fast, analog, optical phase-only modulation is demonstrated in a device containing a polar, bent-core SmAPF material aligned using this technique.
Efficient and robust model-to-image alignment using 3D scale-invariant features.
Toews, Matthew; Wells, William M
2013-04-01
This paper presents feature-based alignment (FBA), a general method for efficient and robust model-to-image alignment. Volumetric images, e.g. CT scans of the human body, are modeled probabilistically as a collage of 3D scale-invariant image features within a normalized reference space. Features are incorporated as a latent random variable and marginalized out in computing a maximum a posteriori alignment solution. The model is learned from features extracted in pre-aligned training images, then fit to features extracted from a new image to identify a globally optimal locally linear alignment solution. Novel techniques are presented for determining local feature orientation and efficiently encoding feature intensity in 3D. Experiments involving difficult magnetic resonance (MR) images of the human brain demonstrate FBA achieves alignment accuracy similar to widely-used registration methods, while requiring a fraction of the memory and computation resources and offering a more robust, globally optimal solution. Experiments on CT human body scans demonstrate FBA as an effective system for automatic human body alignment where other alignment methods break down. Copyright © 2012 Elsevier B.V. All rights reserved.
Efficient and Robust Model-to-Image Alignment using 3D Scale-Invariant Features
Toews, Matthew; Wells, William M.
2013-01-01
This paper presents feature-based alignment (FBA), a general method for efficient and robust model-to-image alignment. Volumetric images, e.g. CT scans of the human body, are modeled probabilistically as a collage of 3D scale-invariant image features within a normalized reference space. Features are incorporated as a latent random variable and marginalized out in computing a maximum a-posteriori alignment solution. The model is learned from features extracted in pre-aligned training images, then fit to features extracted from a new image to identify a globally optimal locally linear alignment solution. Novel techniques are presented for determining local feature orientation and efficiently encoding feature intensity in 3D. Experiments involving difficult magnetic resonance (MR) images of the human brain demonstrate FBA achieves alignment accuracy similar to widely-used registration methods, while requiring a fraction of the memory and computation resources and offering a more robust, globally optimal solution. Experiments on CT human body scans demonstrate FBA as an effective system for automatic human body alignment where other alignment methods break down. PMID:23265799
REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era
Leonard, Guy; Stevens, Jamie R.; Richards, Thomas A.
2009-01-01
The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends. PMID:19812722
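The renaming step described above can be illustrated with a short sketch. It is not the REFGEN or TREENAMER code: the header layout (`>ACCESSION description [Genus species]`), the label scheme (three letters of genus, three of species, then the accession), and both function names are assumptions made for this example.

```python
import re

def short_label(header, n_genus=3, n_species=3):
    """Build a compact label from a header of the assumed form
    '>ACCESSION description [Genus species]'."""
    m = re.match(r">?(\S+).*\[(\w+) (\w+)[^\]]*\]", header)
    if not m:
        raise ValueError(f"unrecognised header: {header!r}")
    accession, genus, species = m.groups()
    return f"{genus[:n_genus]}{species[:n_species]}_{accession}"

def rename_fasta(lines):
    """Rename every header line; return (new_lines, {new_label: old_header})."""
    mapping = {}
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            label = short_label(line)
            mapping[label] = line[1:]       # keep the original for later lookup
            out.append(">" + label)
        else:
            out.append(line)                # sequence lines pass through unchanged
    return out, mapping
```

Keeping the mapping from new label to original header is what makes the second step possible: after tree inference, the short labels in the tree file can be swapped back for any combination of species name and accession number.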
Approaches of multilayer overlay process control for 28nm FD-SOI derivative applications
NASA Astrophysics Data System (ADS)
Duclaux, Benjamin; De Caunes, Jean; Perrier, Robin; Gatefait, Maxime; Le Gratiet, Bertrand; Chapon, Jean-Damien; Monget, Cédric
2018-03-01
Derivative technology like embedded Non-Volatile Memories (eNVM) is raising new types of challenges on the "more than Moore" path. By its construction, overlay is critical across multiple layers; by its running mode, the use of high voltages stresses leakage and breakdown; and finally, its targeted markets (automotive, industry automation, secure transactions…) all demand high device reliability (typically below the 1 ppm level). As a consequence, overlay specifications are tight, not only between one layer and its reference, but also among the critical layers sharing the same reference. This work describes a broad picture of the key points for multilayer overlay process control in the case of a 28nm FD-SOI technology and its derivative flows. First, the alignment trees of the different flow options have been optimized using a realistic process assumptions calculation for indirect overlay. Then, in the case of a complex alignment tree involving a heterogeneous scanner toolset, the criticality of tool matching between the reference layer and the critical layers of the flow has been highlighted. Improving the APC control loops of these multilayer dependencies has been studied with simulations of feed-forward as well as by implementing a new rework algorithm based on multi-measures. Finally, the management of these measurement steps raises some issues for inline support, and using calculations or "virtual overlay" could help to gain some tool capability. A first step towards multilayer overlay process control has been taken.
Wysocki, William P; Ruiz-Sanchez, Eduardo; Yin, Yanbin; Duvall, Melvin R
2016-05-20
Next-generation sequencing now allows for total RNA extracts to be sequenced in non-model organisms such as bamboos, an economically and ecologically important group of grasses. Bamboos are divided into three lineages, two of which are woody perennials with bisexual flowers, which undergo gregarious monocarpy. The third lineage, which are herbaceous perennials, possesses unisexual flowers that undergo annual flowering events. Transcriptomes were assembled using both reference-based and de novo methods. These two methods were tested by characterizing transcriptome content using sequence alignment to previously characterized reference proteomes and by identifying Pfam domains. Because of the striking differences in floral morphology and phenology between the herbaceous and woody bamboo lineages, MADS-box genes, transcription factors that control floral development and timing, were characterized and analyzed in this study. Transcripts were identified using phylogenetic methods and categorized as A, B, C, D or E-class genes, which control floral development, or SOC or SVP-like genes, which control the timing of flowering events. Putative nuclear orthologues were also identified in bamboos to use as phylogenetic markers. Instances of gene copies exhibiting topological patterns that correspond to shared phenotypes were observed in several gene families including floral development and timing genes. Alignments and phylogenetic trees were generated for 3,878 genes and for all genes in a concatenated analysis. Both the concatenated analysis and those of 2,412 separate gene trees supported monophyly among the woody bamboos, which is incongruent with previous phylogenetic studies using plastid markers.
77 FR 61024 - Notice of Public Meeting and Request for Comments
Federal Register 2010, 2011, 2012, 2013, 2014
2012-10-05
... public meeting and public comments--The National Christmas Tree Lighting and the subsequent 26-day event... National Christmas Tree Lighting and the subsequent 26-day event. The general plan and theme for the event... comments and suggestions on the planning of the 2012 National Christmas Tree Lighting and the subsequent 26...
Data quality in citizen science urban tree inventories
Lara A. Roman; Bryant C. Scharenbroch; Johan P.A. Ostberg; Lee S. Mueller; Jason G. Henning; Andrew K. Koeser; Jessica R. Sanders; Daniel R. Betz; Rebecca C. Jordan
2017-01-01
Citizen science has been gaining popularity in ecological research and resource management in general and in urban forestry specifically. As municipalities and nonprofits engage volunteers in tree data collection, it is critical to understand data quality. We investigated observation error by comparing street tree data collected by experts to data collected by less...
Market-based approaches to tree valuation
Geoffrey H. Donovan; David T. Butry
2008-01-01
A recent four-part series in Arborist News outlined different appraisal processes used to value urban trees. The final article in the series described the three generally accepted approaches to tree valuation: the sales comparison approach, the cost approach, and the income capitalization approach. The author, D. Logan Nelson, noted that the sales comparison approach...
Best predictors for postfire mortality of ponderosa pine trees in the Intermountain West
Carolyn Hull Sieg; Joel D. McMillin; James F. Fowler; Kurt K. Allen; Jose F. Negron; Linda L. Wadleigh; John A. Anhold; Ken E. Gibson
2006-01-01
Numerous wildfires in recent years have highlighted managers' needs for reliable tools to predict postfire mortality of ponderosa pine (Pinus ponderosa Dougl. ex Laws.) trees. General applicability of existing mortality models is uncertain, as researchers have used different sets of variables. We quantified tree attributes, crown and bole fire...
A guide for salvaging white pine injured by forest fires
Thomas W. McConkey; Donald R. Gedney
1951-01-01
White pine forests are severely damaged by forest fires. Generally a fire kills all trees less than 20 feet high immediately. Larger trees may die later, depending on the degree of injury. Salvage operations must be started soon after a fire, because insects and fungi quickly attack trees that are killed.
Northeastern Forest Experiment Station
1973-01-01
This booklet outlines what happens most of the time as decay develops in a living tree. The drawings are designed to give an accurate general presentation of the decay process by focusing only on the major portions of an extremely complex process that involves the interactions among microorganisms, environmental factors, and the tree. The better we understand these...
Wayne D. Shepperd; John R. Jones
1985-01-01
In forestry, a nurse crop generally is a crop of trees or shrubs that fosters the development of another tree species, usually by protecting the second species, during its youth, from frost, insolation, or wind (Ford-Robertson 1971). Aspen may be a nurse crop for shade-tolerant tree species that do not become established in full sunlight (e.g., Engelmann spruce)....
Twentieth-century decline of large-diameter trees in Yosemite National Park, California, USA
Lutz, J.A.; van Wagtendonk, J.W.; Franklin, J.F.
2009-01-01
Studies of forest change in western North America often focus on increased densities of small-diameter trees rather than on changes in the large tree component. Large trees generally have lower rates of mortality than small trees and are more resilient to climate change, but these assumptions have rarely been examined in long-term studies. We combined data from 655 historical (1932-1936) and 210 modern (1988-1999) vegetation plots to examine changes in density of large-diameter trees in Yosemite National Park (3027 km2). We tested the assumption of stability for large-diameter trees, as both individual species and communities of large-diameter trees. Between the 1930s and 1990s, large-diameter tree density in Yosemite declined 24%. Although the decrease was apparent in all forest types, declines were greatest in subalpine and upper montane forests (57.0% of park area), and least in lower montane forests (15.3% of park area). Large-diameter tree densities of 11 species declined while only 3 species increased. Four general patterns emerged: (1) Pinus albicaulis, Quercus chrysolepis, and Quercus kelloggii had increases in density of large-diameter trees occur throughout their ranges; (2) Pinus jeffreyi, Pinus lambertiana, and Pinus ponderosa, had disproportionately larger decreases in large-diameter tree densities in lower-elevation portions of their ranges; (3) Abies concolor and Pinus contorta, had approximately uniform decreases in large-diameter trees throughout their elevational ranges; and (4) Abies magnifica, Calocedrus decurrens, Juniperus occidentalis, Pinus monticola, Pseudotsuga menziesii, and Tsuga mertensiana displayed little or no change in large-diameter tree densities. In Pinus ponderosa-Calocedrus decurrens forests, modern large-diameter tree densities were equivalent whether or not plots had burned since 1936. However, in unburned plots, the large-diameter trees were predominantly A. concolor, C. decurrens, and Q. chrysolepis, whereas P. 
ponderosa dominated the large-diameter component of burned plots. Densities of large-diameter P. ponderosa were 8.1 trees ha-1 in plots that had experienced fire, but only 0.5 trees ha-1 in plots that remained unburned. © 2009 Elsevier B.V. All rights reserved.
Improving generalized inverted index lock wait times
NASA Astrophysics Data System (ADS)
Borodin, A.; Mirvoda, S.; Porshnev, S.; Ponomareva, O.
2018-01-01
Concurrent operations on tree-like data structures are a cornerstone of any database system. Concurrency is intended to improve read/write performance and is usually implemented via some form of locking. Deadlock-free methods of concurrency control are known as tree locking protocols. These protocols provide basic operations (verbs) and algorithms (ways of invoking those operations) that can be applied to any tree-like data structure. The algorithms operate on data managed by a storage engine, and storage engines differ widely among RDBMS implementations. In this paper, we discuss a tree locking protocol implementation for the Generalized Inverted Index (GIN) applied to the multiversion concurrency control (MVCC) storage engine inside the PostgreSQL RDBMS. We then introduce improvements to the locking protocol and provide usage statistics from evaluating our improvement in a very high load environment at one of the world's largest IT companies.
DE 1 observations of type 1 counterstreaming electrons and field-aligned currents
NASA Technical Reports Server (NTRS)
Lin, C. S.; Burch, J. L.; Barfield, J. N.; Sugiura, M.; Nielsen, E.
1984-01-01
Dynamics Explorer 1 satellite observations of plasma and magnetic fields during type one counterstreaming electron events are presented. Counterstreaming electrons are observed at high altitudes in the region of field-aligned current. The total current density computed from the plasma data in the 18-10,000 eV energy range is generally about 1-2 micro-A/sq m. For the downward current, low-energy electrons contribute more than 40 percent of the total plasma current density integrated above 18 eV. For the upward current, such electrons contribute less than 50 percent of that current density. Electron beams in the field-aligned direction are occasionally detected. The pitch angle distributions of counterstreaming electrons are generally enhanced at both small and large pitch angles. STARE simultaneous observations for one DE 1 pass indicated that the field-aligned current was closed through Pedersen currents in the ionosphere. The directions of the ionospheric current systems are consistent with the DE 1 observations at high altitudes.
Ghaffari, Mahsa; Tangen, Kevin; Alaraj, Ali; Du, Xinjian; Charbel, Fady T; Linninger, Andreas A
2017-12-01
In this paper, we present a novel technique for automatic parametric mesh generation of subject-specific cerebral arterial trees. This technique generates high-quality and anatomically accurate computational meshes for fast blood flow simulations, extending the scope of 3D vascular modeling to a large portion of cerebral arterial trees. For this purpose, a parametric meshing procedure was developed to automatically decompose the vascular skeleton, extract geometric features and generate hexahedral meshes using a body-fitted coordinate system that optimally follows the vascular network topology. To validate the anatomical accuracy of the reconstructed vasculature, we performed statistical analysis to quantify the alignment between parametric meshes and raw vascular images using receiver operating characteristic curves. Geometric accuracy evaluation showed agreement between the constructed mesh and raw MRA data sets, with an area-under-the-curve value of 0.87. Parametric meshing yielded, on average, 36.6% and 21.7% improvements in orthogonal and equiangular skew quality, respectively, over unstructured tetrahedral meshes. The parametric meshing and processing pipeline constitutes an automated technique to reconstruct and simulate blood flow throughout a large portion of the cerebral arterial tree down to the level of pial vessels. This study is the first step towards fast large-scale subject-specific hemodynamic analysis for clinical applications. Copyright © 2017 Elsevier Ltd. All rights reserved.
Modeling Answer Change Behavior: An Application of a Generalized Item Response Tree Model
ERIC Educational Resources Information Center
Jeon, Minjeong; De Boeck, Paul; van der Linden, Wim
2017-01-01
We present a novel application of a generalized item response tree model to investigate test takers' answer change behavior. The model allows us to simultaneously model the observed patterns of the initial and final responses after an answer change as a function of a set of latent traits and item parameters. The proposed application is illustrated…
29 CFR 780.216 - Nursery activities generally and Christmas tree production.
Code of Federal Regulations, 2013 CFR
2013-07-01
...; on-going treatment with fertilizer, herbicides, and pesticides as necessary; (2) After approximately... trees in cultivated soil with continued treatment with fertilizers, herbicides, and pesticides as...
29 CFR 780.216 - Nursery activities generally and Christmas tree production.
Code of Federal Regulations, 2012 CFR
2012-07-01
...; on-going treatment with fertilizer, herbicides, and pesticides as necessary; (2) After approximately... trees in cultivated soil with continued treatment with fertilizers, herbicides, and pesticides as...
29 CFR 780.216 - Nursery activities generally and Christmas tree production.
Code of Federal Regulations, 2014 CFR
2014-07-01
...; on-going treatment with fertilizer, herbicides, and pesticides as necessary; (2) After approximately... trees in cultivated soil with continued treatment with fertilizers, herbicides, and pesticides as...
29 CFR 780.216 - Nursery activities generally and Christmas tree production.
Code of Federal Regulations, 2011 CFR
2011-07-01
...; on-going treatment with fertilizer, herbicides, and pesticides as necessary; (2) After approximately... trees in cultivated soil with continued treatment with fertilizers, herbicides, and pesticides as...
Reynolds, Robert F; Bauerle, William L; Wang, Ying
2009-09-01
Deciduous trees have a seasonal carbon dioxide exchange pattern that is attributed to changes in leaf biochemical properties. However, it is not known if the pattern in leaf biochemical properties - maximum Rubisco carboxylation (V(cmax)) and electron transport (J(max)) - differs between species. This study explored whether a general pattern of changes in V(cmax), J(max), and a standardized soil moisture response accounted for carbon dioxide exchange of deciduous trees throughout the growing season. The model MAESTRA was used to examine V(cmax) and J(max) of leaves of five deciduous trees, Acer rubrum 'Summer Red', Betula nigra, Quercus nuttallii, Quercus phellos and Paulownia elongata, and their response to soil moisture. MAESTRA was parameterized using data from in situ measurements on organs. Linking the changes in biochemical properties of leaves to the whole tree, MAESTRA integrated the general pattern in V(cmax) and J(max) from gas exchange parameters of leaves with a standardized soil moisture response to describe carbon dioxide exchange throughout the growing season. The model estimates were tested against measurements made on the five species under both irrigated and water-stressed conditions. Measurements and modelling demonstrate that the seasonal pattern of biochemical activity in leaves and soil moisture response can be parameterized with straightforward general relationships. Over the course of the season, differences in carbon exchange between measured and modelled values were within 6-12 % under well-watered conditions and 2-25 % under water stress conditions. Hence, a generalized seasonal pattern in the leaf-level physiological change of V(cmax) and J(max), and a standardized response to soil moisture, were sufficient to parameterize carbon dioxide exchange for large-scale evaluations. Simplification in parameterization of the seasonal pattern of leaf biochemical activity and soil moisture response of deciduous forest species is demonstrated. 
This allows reliable modelling of carbon exchange for deciduous trees, thus circumventing the need for extensive gas exchange experiments on different species.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hughes, T.P.; Clark, R.M.; Mostrom, M.A.
This report discusses the following topics on the LAMDA program: General maintenance; CTSS FCL script; DOS batch files; Macintosh MPW scripts; UNICOS FCL script; VAX/MS command file; LINC calling tree; and LAMDA calling tree.
Inferring Phylogenetic Networks Using PhyloNet.
Wen, Dingqiao; Yu, Yun; Zhu, Jiafan; Nakhleh, Luay
2018-07-01
PhyloNet was released in 2008 as a software package for representing and analyzing phylogenetic networks. At the time of its release, the main functionalities in PhyloNet consisted of measures for comparing network topologies and a single heuristic for reconciling gene trees with a species tree. Since then, PhyloNet has grown significantly. The software package now includes a wide array of methods for inferring phylogenetic networks from data sets of unlinked loci while accounting for both reticulation (e.g., hybridization) and incomplete lineage sorting. In particular, PhyloNet now allows for maximum parsimony, maximum likelihood, and Bayesian inference of phylogenetic networks from gene tree estimates. Furthermore, Bayesian inference directly from sequence data (sequence alignments or biallelic markers) is implemented. Maximum parsimony is based on an extension of the "minimizing deep coalescences" criterion to phylogenetic networks, whereas maximum likelihood and Bayesian inference are based on the multispecies network coalescent. All methods allow for multiple individuals per species. As computing the likelihood of a phylogenetic network is computationally hard, PhyloNet allows for evaluation and inference of networks using a pseudolikelihood measure. PhyloNet summarizes the results of the various analyses and generates phylogenetic networks in the extended Newick format, which is readily viewable by existing visualization software.
Stamatakis, Alexandros
2006-11-01
RAxML-VI-HPC (randomized axelerated maximum likelihood for high performance computing) is a sequential and parallel program for inference of large phylogenies with maximum likelihood (ML). Low-level technical optimizations, a modification of the search algorithm, and the use of the GTR+CAT approximation as a replacement for GTR+Gamma yield a program that is between 2.7 and 52 times faster than the previous version of RAxML. A large-scale performance comparison with GARLI, PHYML, IQPNNI and MrBayes on real data containing 1000 up to 6722 taxa shows that RAxML requires at least 5.6 times less main memory and yields better trees in similar times than the best competing program (GARLI) on datasets up to 2500 taxa. On datasets with ≥4000 taxa it also runs 2-3 times faster than GARLI. RAxML has been parallelized with MPI to conduct parallel multiple bootstraps and inferences on distinct starting trees. The program has been used to compute ML trees on two of the largest alignments to date containing 25,057 (1463 bp) and 2182 (51,089 bp) taxa, respectively. icwww.epfl.ch/~stamatak
Mathematics and evolutionary biology make bioinformatics education comprehensible.
Jungck, John R; Weisstein, Anton E
2013-09-01
The patterns of variation within a molecular sequence data set result from the interplay between population genetic, molecular evolutionary and macroevolutionary processes-the standard purview of evolutionary biologists. Elucidating these patterns, particularly for large data sets, requires an understanding of the structure, assumptions and limitations of the algorithms used by bioinformatics software-the domain of mathematicians and computer scientists. As a result, bioinformatics often suffers a 'two-culture' problem because of the lack of broad overlapping expertise between these two groups. Collaboration among specialists in different fields has greatly mitigated this problem among active bioinformaticians. However, science education researchers report that much of bioinformatics education does little to bridge the cultural divide, the curriculum too focused on solving narrow problems (e.g. interpreting pre-built phylogenetic trees) rather than on exploring broader ones (e.g. exploring alternative phylogenetic strategies for different kinds of data sets). Herein, we present an introduction to the mathematics of tree enumeration, tree construction, split decomposition and sequence alignment. We also introduce off-line downloadable software tools developed by the BioQUEST Curriculum Consortium to help students learn how to interpret and critically evaluate the results of standard bioinformatics analyses.
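As a small illustration of the tree enumeration topic named in this abstract, the sketch below counts labelled, fully bifurcating trees using the standard double-factorial results: (2n-5)!! unrooted trees and (2n-3)!! rooted trees on n taxa. The function names are hypothetical, and this is not code from the BioQUEST tools.

```python
def count_unrooted_trees(n):
    """Unrooted, fully bifurcating labelled trees on n taxa: (2n-5)!!."""
    if n < 3:
        return 1                      # one trivial tree for 1 or 2 taxa
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n-5
        count *= k
    return count


def count_rooted_trees(n):
    """Rooted bifurcating trees on n taxa: (2n-3)!!.

    Rooting is equivalent to attaching one extra 'root' taxon,
    so the count equals the unrooted count for n + 1 taxa.
    """
    return count_unrooted_trees(n + 1)


for taxa in (4, 5, 10, 20):
    print(taxa, count_unrooted_trees(taxa), count_rooted_trees(taxa))
```

The rapid growth of these counts is the usual argument for why exhaustive tree search is impractical beyond a handful of taxa and heuristic construction methods are used instead.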
Mathematics and evolutionary biology make bioinformatics education comprehensible
Weisstein, Anton E.
2013-01-01
The patterns of variation within a molecular sequence data set result from the interplay between population genetic, molecular evolutionary and macroevolutionary processes—the standard purview of evolutionary biologists. Elucidating these patterns, particularly for large data sets, requires an understanding of the structure, assumptions and limitations of the algorithms used by bioinformatics software—the domain of mathematicians and computer scientists. As a result, bioinformatics often suffers a ‘two-culture’ problem because of the lack of broad overlapping expertise between these two groups. Collaboration among specialists in different fields has greatly mitigated this problem among active bioinformaticians. However, science education researchers report that much of bioinformatics education does little to bridge the cultural divide, the curriculum too focused on solving narrow problems (e.g. interpreting pre-built phylogenetic trees) rather than on exploring broader ones (e.g. exploring alternative phylogenetic strategies for different kinds of data sets). Herein, we present an introduction to the mathematics of tree enumeration, tree construction, split decomposition and sequence alignment. We also introduce off-line downloadable software tools developed by the BioQUEST Curriculum Consortium to help students learn how to interpret and critically evaluate the results of standard bioinformatics analyses. PMID:23821621
77 FR 4239 - Sexual Assault Prevention and Response (SAPR) Program
Federal Register 2010, 2011, 2012, 2013, 2014
2012-01-27
.... Affected Public: Federal Government; Individuals or Households; Business or Other For-Profit; Not-For... General of the Military Departments and IG, DoD respectively. (2) Develop strategic program guidance...) Align Service SAPR Strategic Plans with the DoD SAPR Strategic Plan. (5) Align Service prevention...
On fractal properties of arterial trees.
Zamir, M
1999-04-21
The question of fractal properties of arterial trees is considered in light of data from the extensive tree structure of the right coronary artery of a human heart. Because of the highly non-uniform structure of this tree, the study focuses on the purely geometrical rather than statistical aspects of fractal properties. The large number of arterial bifurcations comprising the tree were found to have a mixed degree of asymmetry at all levels of the tree, including the depth of the tree where it has been generally supposed that they would be symmetrical. Cross-sectional area ratios of daughter to parent vessels were also found to be highly mixed at all levels, having values both above and below 1.0, rather than consistently above as has been generally supposed in the past. Calculated values of the power law index which describes the theoretical relation between the diameters of the three vessel segments at an arterial bifurcation were found to range far beyond the two values associated with the cube and square laws, and not clearly favoring one or the other. On the whole the tree structure was found to have what we have termed "pseudo-fractal" properties, in the sense that vessels of different calibers displayed the same branching pattern but with a range of values of the branching parameters. The results suggest that a higher degree of fractal character, one in which the branching parameters are constant throughout the tree structure, is unlikely to be attained in non-uniform vascular structures. Copyright 1999 Academic Press.
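The two bifurcation measures discussed in this abstract can be sketched numerically. The sketch assumes the usual form of the power law, d0^k = d1^k + d2^k, where d0 is the parent diameter and d1, d2 are the daughter diameters, with k = 3 for the cube law and k = 2 for the square law. The function names are hypothetical, and the solver handles only the case where both daughters are narrower than the parent; other cases have no positive solution and are rejected.

```python
def area_ratio(d0, d1, d2):
    """Combined daughter cross-sectional area divided by parent area."""
    return (d1 ** 2 + d2 ** 2) / d0 ** 2


def power_law_index(d0, d1, d2, tol=1e-10):
    """Solve d0**k == d1**k + d2**k for k by bisection.

    Requires 0 < d1 < d0 and 0 < d2 < d0, in which case
    f(k) = (d1/d0)**k + (d2/d0)**k - 1 decreases from 1 at k = 0
    towards -1, so exactly one positive root exists.
    """
    if not (0 < d1 < d0 and 0 < d2 < d0):
        raise ValueError("both daughters must be narrower than the parent")

    def f(k):
        return (d1 / d0) ** k + (d2 / d0) ** k - 1.0

    lo, hi = 0.0, 1.0
    while f(hi) > 0:          # expand the bracket until the sign changes
        hi *= 2
    while hi - lo > tol:      # bisect on the sign of f
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical diameters, chosen only to exercise the functions.
print(power_law_index(4.0, 3.5, 2.5), area_ratio(4.0, 3.5, 2.5))
```

An area ratio of exactly 1.0 corresponds to k = 2, so ratios scattered above and below 1.0, as the abstract reports, imply index values on both sides of the square law.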
RBT-GA: a novel metaheuristic for solving the Multiple Sequence Alignment problem.
Taheri, Javid; Zomaya, Albert Y
2009-07-07
Multiple Sequence Alignment (MSA) has always been an active area of research in Bioinformatics. MSA is mainly focused on discovering biologically meaningful relationships among different sequences or proteins in order to investigate their underlying main characteristics/functions. This information is also used to generate phylogenetic trees. This paper presents a novel approach, namely RBT-GA, to solve the MSA problem using a hybrid solution methodology combining the Rubber Band Technique (RBT) and the Genetic Algorithm (GA) metaheuristic. RBT is inspired by the behavior of an elastic Rubber Band (RB) on a plate with several poles, which are analogous to locations in the input sequences that could potentially be biologically related. A GA attempts to mimic the evolutionary processes of life in order to locate optimal solutions in an often very complex landscape. RBT-GA is a population-based optimization algorithm designed to find the optimal alignment for a set of input protein sequences. In this novel technique, each alignment answer is modeled as a chromosome consisting of several poles in the RBT framework. These poles resemble locations in the input sequences that are most likely to be correlated and/or biologically related. A GA-based optimization process improves these chromosomes gradually, yielding a set of mostly optimal answers for the MSA problem. RBT-GA is tested with one of the well-known benchmark suites in this area (BAliBASE 2.0). The results show the superiority of the proposed technique, even in the case of formidable sequences.
Automatic bone segmentation in knee MR images using a coarse-to-fine strategy
NASA Astrophysics Data System (ADS)
Park, Sang Hyun; Lee, Soochahn; Yun, Il Dong; Lee, Sang Uk
2012-02-01
Segmentation of bone and cartilage from a three-dimensional knee magnetic resonance (MR) image is a crucial element in monitoring and understanding the development and progression of osteoarthritis. Various segmentation methods have been proposed to separate bone from other tissues, but the problem remains challenging because of the differing modalities of MR images, low contrast between bone and surrounding tissues, and shape irregularity. In this paper, we present a new fully automatic segmentation method for bone compartments using relevant bone atlases from a training set. To find the relevant bone atlases and obtain the segmentation, a coarse-to-fine strategy is proposed. In the coarse step, the best atlas in the training set and an initial segmentation are detected simultaneously using branch-and-bound tree search. Since the best atlas from the coarse step is not accurately aligned, all atlases from the training set are aligned to the initial segmentation in the middle step, and the best-aligned atlas is selected. Finally, in the fine step, segmentation is performed by adaptively integrating the shape of the best-aligned atlas with an appearance prior based on the characteristics of local regions. In the experiments, femur and tibia bones in forty test MR images were segmented by the proposed method using sixty training MR images. The results show that segmentation and registration performance improves at each step towards the fine step, and that the proposed method achieves performance comparable with state-of-the-art methods.
Assessing crown dynamics and inter-tree competition in southern pines
Timothy A. Martin; Angelica Garcia; Tania Quesada; Eric J. Jokela; Salvador Gezan
2015-01-01
Genetic improvement of southern pines has been underway for 50 years and during this time, deployment of germplasm has generally evolved from more genetically diverse to less genetically diverse. Information is needed on how deployment of individual genotypes in pure blocks will affect traits such as within-stand variation in individual tree traits, as well as tree-...
A. Morani; D. Nowak; S. Hirabayashi; G. Guidolotti; M. Medori; V. Muzzini; S. Fares; G. Scarascia Mugnozza; C. Calfapietra
2014-01-01
Ozone flux estimates from the i-Tree model were compared with ozone flux measurements using the Eddy Covariance technique in a periurban Mediterranean forest near Rome (Castelporziano). For the first time i-Tree model outputs were compared with field measurements in relation to dry deposition estimates. Results showed generally a...
Kathleen L. Kavanaugh; Matthew B. Dickinson; Anthony S. Bova
2010-01-01
Current operational methods for predicting tree mortality from fire injury are regression-based models that only indirectly consider underlying causes and, thus, have limited generality. A better understanding of the physiological consequences of tree heating and injury is needed to develop biophysical process models that can make predictions under changing or novel...
A modeling study of the impact of urban trees on ozone
David J. Nowak; Kevin L. Civerolo; S. Trivikrama Rao; Gopal Sistla; Christopher J. Luley; Daniel E. Crane
2000-01-01
Modeling the effects of increased urban tree cover on ozone concentrations (July 13-15, 1995) from Washington, DC, to central Massachusetts reveals that urban trees generally reduce ozone concentrations in cities, but tend to increase average ozone concentrations in the overall modeling domain. During the daytime, average ozone reductions in urban areas (1 ppb) were...
Ecological and economic determinants of invasive tree species on Alabama forestland
Anwar Hussain; Changyou Sun; Xiaoping Zhou; Ian A. Munn
2008-01-01
The spread of invasive tree species has caused increasing harm to the environment. This study was motivated by the considerations that earlier studies generally ignored the role of economic factors related to the occurrence and abundance of invasive species, and empirical analyses of invasive trees on forestland have been inadequate. We assessed the impact of...
A tree classification for the selection forests of the Sierra Nevada
Duncan Dunning
1928-01-01
Individuality in man is accepted without question. In domestic animals, also, good and bad individuals are generally recognized. Even in some cultivated plants —orange trees and rubber trees— the poor producers are searched out and eliminated. Indeed, individual variability is a normal condition in all groups of organisms. Yet forest trees are...
Christopher E. Looney; Anthony W. D' Amato; Brian J. Palik; Robert A. Slesak
2017-01-01
Fraxinus nigra Marsh. (black ash), a dominant tree species of wetland forests in northern Minnesota, USA, is imperiled by the invasive insect emerald ash borer (EAB; Agrilus planipennis Fairmaire, 1888). Regeneration of associated tree species is generally low in F. nigra forests and could be impacted...
Projecting a Stand Table Through Time
Quang V. Cao; V. Clark Baldwin
1999-01-01
Stand tables provide number of trees per acre for each diameter class. This paper presents a general technique to predict a future stand table, based on the current stand table and future stand summary statistics such as trees and basal area per acre, and average diameter. The stand projection technique involves (a) predicting surviving trees for each class, and (b)...
Richard N. Conner; Daniel Saenz; D. Craig Rudolph; Richard R. Schaefer
2004-01-01
Extent of Phellinus pini decay in loblolly pines and red-cockaded woodpecker cavity trees in eastern Texas. Memoirs of The New York Botanical Garden 89: 315-321, 2004. To determine the prevalence of Phellinus pini in pines generally and red-cockaded woodpecker cavity trees specifically, we dissected 24 loblolly pines (...
Naturalized Exotic Tree Species in Puerto Rico
John K. Francis; Henri A. Liogier
1991-01-01
Many exotic tree species have been imported into Puerto Rico for their wood, fruit, and use as coffee shade and ornamentals. Some of these trees have naturalized (reproduced without human intervention) and some have escaped into natural forests. At least 118 exotic species are reproducing in Puerto Rico. Estimates are given for the general rate of spread and future...
Dinucleotide controlled null models for comparative RNA gene prediction.
Gesell, Tanja; Washietl, Stefan
2008-05-27
Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak et al. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those programs using a thermodynamic folding model including stacking energies. As a consequence, there is a need for dinucleotide-preserving control strategies to assess the significance of such predictions. While there have been randomization algorithms for single sequences for many years, the problem has remained challenging for multiple alignments and there is currently no algorithm available. We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance-based approach, a tree is estimated under this model, which is then used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments and the effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm, giving a new variant of a thermodynamic structure-based RNA gene finding program that is not biased by the dinucleotide content. SISSIz implements an efficient algorithm to randomize multiple alignments while preserving dinucleotide content. It can be used to get more accurate estimates of false positive rates of existing programs, to produce negative controls for the training of machine learning-based programs, or as a standalone RNA gene finding program. Other applications in comparative genomics that require randomization of multiple alignments can be considered.
SISSIz is available as open source C code that can be compiled for every major platform and downloaded here: http://sourceforge.net/projects/sissiz.
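As a rough illustration of the quantity a dinucleotide-preserving null model has to keep fixed, the sketch below measures the average dinucleotide content of an alignment, so that an original and a randomized alignment could be compared. It is not SISSIz code: the function names are hypothetical, and dropping gap characters before counting overlapping pairs is an assumption made here for simplicity.

```python
from collections import Counter


def dinucleotide_frequencies(sequence):
    """Relative frequencies of overlapping dinucleotides in one sequence.

    Assumption: gap characters ('-') are removed before counting, so the
    residues on either side of a gap are treated as neighbours.
    """
    seq = sequence.upper().replace("-", "")
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        return {}
    counts = Counter(pairs)
    return {pair: n / len(pairs) for pair, n in counts.items()}


def mean_dinucleotide_frequencies(alignment):
    """Average the per-sequence dinucleotide frequencies over all rows."""
    per_row = [dinucleotide_frequencies(row) for row in alignment]
    per_row = [freqs for freqs in per_row if freqs]
    if not per_row:
        return {}
    totals = {}
    for freqs in per_row:
        for pair, value in freqs.items():
            totals[pair] = totals.get(pair, 0.0) + value
    return {pair: value / len(per_row) for pair, value in totals.items()}
```

Calling `mean_dinucleotide_frequencies` on both the input alignment and a simulated one gives two dictionaries that should be close if the randomization preserved the average dinucleotide content.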
Comparative transcriptomics of 5 high-altitude vertebrates and their low-altitude relatives
Tang, Qianzi; Zhou, Xuming; Jin, Long; Guan, Jiuqiang; Liu, Rui; Li, Jing; Long, Kereng; Tian, Shilin; Che, Tiandong; Hu, Silu; Liang, Yan; Yang, Xuemei; Tao, Xuan; Zhong, Zhijun; Wang, Guosong; Chen, Xiaohui; Li, Diyan; Ma, Jideng; Wang, Xun; Mai, Miaomiao; Jiang, An’an; Luo, Xiaolin; Lv, Xuebin; Gladyshev, Vadim N; Li, Xuewei
2017-01-01
Background: Species living at high altitude are subject to strong selective pressures due to inhospitable environments (e.g., hypoxia, low temperature, high solar radiation, and lack of biological production), making these species valuable models for comparative analyses of local adaptation. Studies that have examined high-altitude adaptation have identified a vast array of rapidly evolving genes that characterize the dramatic phenotypic changes in high-altitude animals. However, how high-altitude environment shapes gene expression programs remains largely unknown. Findings: We generated a total of 910 Gb of high-quality RNA-seq data for 180 samples derived from 6 tissues of 5 agriculturally important high-altitude vertebrates (Tibetan chicken, Tibetan pig, Tibetan sheep, Tibetan goat, and yak) and their cross-fertile relatives living in geographically neighboring low-altitude regions. Of these, ∼75% of reads could be aligned to their respective reference genomes, and on average ∼60% of annotated protein coding genes in each organism showed FPKM expression values greater than 0.5. We observed a general concordance in topological relationships between the nucleotide alignments and gene expression–based trees. Tissue and species accounted for markedly more variance than altitude based on either the expression or the alternative splicing patterns. Cross-species clustering analyses showed a tissue-dominated pattern of gene expression and a species-dominated pattern for alternative splicing. We also identified numerous differentially expressed genes that could potentially be involved in phenotypic divergence shaped by high-altitude adaptation. Conclusions: These data serve as a valuable resource for examining the convergence and divergence of gene expression changes between species as they adapt or acclimatize to high-altitude environments. PMID:29149296
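The FPKM threshold quoted in this abstract can be made concrete with a small sketch. It uses the standard definition of FPKM (fragments per kilobase of transcript per million mapped fragments). The function names and the example numbers are illustrative assumptions, not values or code from the study.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    # count / (length in kb) / (library size in millions)
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)


def fraction_expressed(fpkm_values, threshold=0.5):
    """Share of genes whose FPKM is strictly above the threshold."""
    values = list(fpkm_values)
    if not values:
        return 0.0
    return sum(1 for v in values if v > threshold) / len(values)
```

For example, 500 fragments on a 2,000 bp gene in a library of 25 million mapped fragments gives an FPKM of 10, which would count as expressed under the 0.5 cutoff.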
Comparative transcriptomics of 5 high-altitude vertebrates and their low-altitude relatives.
Tang, Qianzi; Gu, Yiren; Zhou, Xuming; Jin, Long; Guan, Jiuqiang; Liu, Rui; Li, Jing; Long, Kereng; Tian, Shilin; Che, Tiandong; Hu, Silu; Liang, Yan; Yang, Xuemei; Tao, Xuan; Zhong, Zhijun; Wang, Guosong; Chen, Xiaohui; Li, Diyan; Ma, Jideng; Wang, Xun; Mai, Miaomiao; Jiang, An'an; Luo, Xiaolin; Lv, Xuebin; Gladyshev, Vadim N; Li, Xuewei; Li, Mingzhou
2017-12-01
Species living at high altitude are subject to strong selective pressures due to inhospitable environments (e.g., hypoxia, low temperature, high solar radiation, and lack of biological production), making these species valuable models for comparative analyses of local adaptation. Studies that have examined high-altitude adaptation have identified a vast array of rapidly evolving genes that characterize the dramatic phenotypic changes in high-altitude animals. However, how high-altitude environment shapes gene expression programs remains largely unknown. We generated a total of 910 Gb of high-quality RNA-seq data for 180 samples derived from 6 tissues of 5 agriculturally important high-altitude vertebrates (Tibetan chicken, Tibetan pig, Tibetan sheep, Tibetan goat, and yak) and their cross-fertile relatives living in geographically neighboring low-altitude regions. Of these, ∼75% of reads could be aligned to their respective reference genomes, and on average ∼60% of annotated protein coding genes in each organism showed FPKM expression values greater than 0.5. We observed a general concordance in topological relationships between the nucleotide alignments and gene expression-based trees. Tissue and species accounted for markedly more variance than altitude based on either the expression or the alternative splicing patterns. Cross-species clustering analyses showed a tissue-dominated pattern of gene expression and a species-dominated pattern for alternative splicing. We also identified numerous differentially expressed genes that could potentially be involved in phenotypic divergence shaped by high-altitude adaptation. These data serve as a valuable resource for examining the convergence and divergence of gene expression changes between species as they adapt or acclimatize to high-altitude environments. © The Authors 2017. Published by Oxford University Press.
VC-dimension of univariate decision trees.
Yildiz, Olcay Taner
2015-02-01
In this paper, we give and prove the lower bounds of the Vapnik-Chervonenkis (VC)-dimension of the univariate decision tree hypothesis class. The VC-dimension of the univariate decision tree depends on the VC-dimension values of its subtrees and the number of inputs. Via a search algorithm that calculates the VC-dimension of univariate decision trees exhaustively, we show that our VC-dimension bounds are tight for simple trees. To verify that the VC-dimension bounds are useful, we also use them to get VC-generalization bounds for complexity control using structural risk minimization in decision trees, i.e., pruning. Our simulation results show that structural risk minimization pruning using the VC-dimension bounds finds trees that are more accurate than those pruned using cross validation.
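To show how a VC-dimension estimate can drive pruning by structural risk minimization, the sketch below scores candidate trees with the classical Vapnik bound, training error plus sqrt((h(ln(2N/h) + 1) - ln(η/4)) / N), and keeps the candidate with the smallest bound. This is a generic textbook form and not necessarily the bound used in the paper. The candidate names, error rates, and VC-dimension values are hypothetical.

```python
import math


def vc_confidence_term(h, n, eta=0.05):
    """Classical VC confidence term for VC-dimension h and sample size n."""
    if h <= 0 or n <= 0 or not (0.0 < eta < 1.0):
        raise ValueError("need h > 0, n > 0 and 0 < eta < 1")
    return math.sqrt((h * (math.log(2.0 * n / h) + 1.0) - math.log(eta / 4.0)) / n)


def srm_select(candidates, n, eta=0.05):
    """Return the (name, training_error, vc_dimension) tuple with the lowest bound."""
    return min(candidates, key=lambda c: c[1] + vc_confidence_term(c[2], n, eta))


# Hypothetical pruning candidates: (name, training error, VC-dimension estimate).
candidates = [
    ("full tree", 0.02, 200),
    ("pruned tree", 0.05, 20),
    ("stump", 0.30, 2),
]
best = srm_select(candidates, n=1000)
```

With these made-up numbers the moderately pruned tree wins: the full tree pays a large complexity penalty, while the stump has too high a training error.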
Parental alignments and rejection: an empirical study of alienation in children of divorce.
Johnston, Janet R
2003-01-01
This study of family relationships after divorce examined the frequency and extent of child-parent alignments and correlates of children's rejection of a parent, these being basic components of the controversial idea of "parental alienation syndrome." The sample consisted of 215 children from the family courts and general community two to three years after parental separation. The findings indicate that children's attitudes toward their parents range from positive to negative, with relatively few being extremely aligned or rejecting. Rejection of a parent has multiple determinants, with both the aligned and rejected parents contributing to the problem, in addition to vulnerabilities within children themselves.
Automated interferometric alignment system for paraboloidal mirrors
Maxey, L. Curtis
1993-01-01
A systematic method is described for interpreting interference fringes obtained by using a corner cube retroreflector as an alignment aid when aligning a paraboloid to a spherical wavefront. This is applicable to any general case where such alignment is required, but is specifically applicable in the case of aligning an autocollimating test using a diverging beam wavefront. In addition, the method provides information which can be systematically interpreted such that independent information about pitch, yaw and focus errors can be obtained. Thus, the system lends itself readily to automation. Finally, although the method is developed specifically for paraboloids, it can be seen to be applicable to a variety of other aspheric optics when applied in combination with a wavefront corrector that produces a wavefront which, when reflected from the correctly aligned aspheric surface, will produce a collimated wavefront like that obtained from the paraboloid when it is correctly aligned to a spherical wavefront.
Rand, Karin; Bar, Einat; Ben-Ari, Matan; Lewinsohn, Efraim; Inbar, Moshe
2014-06-01
Pistacia palaestina Boiss. (Anacardiaceae), a sibling species of P. terebinthus also known as turpentine tree or terebinth tree, is common in the Levant region. The aphid Baizongia pistaciae L. manipulates the leaves of the plant to form large galls, which provide both food and protection for its developing offspring. We analyzed the levels and composition of mono- and sesquiterpenes in both leaves and galls of ten naturally growing trees. Our results show that monoterpene hydrocarbons are the main constituents of P. palaestina leaves and galls, but terpene levels and composition vary among trees. Despite this inter-tree variation, terpene levels and compositions in galls from different trees resemble each other more than the patterns displayed by leaves from the same trees. Generally, galls contain 10- to 60-fold higher total terpene amounts than leaves, especially of the monoterpenes α-pinene and limonene. Conversely, the leaves generally accumulate more sesquiterpenes, in particular E-caryophyllene, germacrene D and δ-cadinene, in comparison to galls. Our results clearly show that the terpene pattern in the galls is not a simple reflection of that of the leaves and suggest that aphids have a strong impact on the metabolism of their host plant, possibly for their own defense.
Kane, Jeffrey M.; van Mantgem, Phillip J.; Lalemand, Laura; Keifer, MaryBeth
2017-01-01
Managers require accurate models to predict post-fire tree mortality to plan prescribed fire treatments and examine their effectiveness. Here we assess the performance of a common post-fire tree mortality model with an independent dataset of 11 tree species from 13 National Park Service units in the western USA. Overall model discrimination was generally strong, but performance varied considerably among species and sites. The model tended to have higher sensitivity (proportion of correctly classified dead trees) and lower specificity (proportion of correctly classified live trees) for many species, indicating an overestimation of mortality. Variation in model accuracy (percentage of live and dead trees correctly classified) among species was not related to sample size or percentage observed mortality. However, we observed a positive relationship between specificity and a species-specific bark thickness multiplier, indicating that overestimation was more common in thin-barked species. Accuracy was also quite low for thinner bark classes (<1 cm) for many species, leading to poorer model performance. Our results indicate that a common post-fire mortality model generally performs well across a range of species and sites; however, some thin-barked species and size classes would benefit from further refinement to improve model specificity.
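The evaluation measures named in this abstract can be written out as a short sketch, using its definitions: sensitivity is the proportion of dead trees correctly classified, specificity the proportion of live trees correctly classified, and accuracy the proportion of all trees correctly classified. The function name and the example data are hypothetical and do not come from the study.

```python
def mortality_metrics(observed_dead, predicted_dead):
    """Sensitivity, specificity and accuracy for a dead/live classifier.

    Both arguments are equal-length sequences of booleans (True = dead).
    A metric is None when its denominator is zero.
    """
    if len(observed_dead) != len(predicted_dead):
        raise ValueError("observed and predicted must have the same length")
    pairs = list(zip(observed_dead, predicted_dead))
    tp = sum(1 for obs, pred in pairs if obs and pred)          # dead, predicted dead
    fn = sum(1 for obs, pred in pairs if obs and not pred)      # dead, predicted live
    tn = sum(1 for obs, pred in pairs if not obs and not pred)  # live, predicted live
    fp = sum(1 for obs, pred in pairs if not obs and pred)      # live, predicted dead
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
        "accuracy": (tp + tn) / len(pairs) if pairs else None,
    }


# Hypothetical example: every dead tree is caught, but half the live trees
# are also predicted dead, i.e. mortality is overestimated.
observed = [True, True, True, False, False, False, False]
predicted = [True, True, True, True, True, False, False]
metrics = mortality_metrics(observed, predicted)
```

High sensitivity combined with low specificity, as in this toy case, is the pattern the abstract describes as an overestimation of mortality.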
2014-01-01
Background: The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. Results: The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. Conclusions: The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org. PMID:25237393
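The matrix underlying a ProfileGrid, the residue frequency at every aligned position, can be sketched in a few lines. This is not JProfileGrid code: the function name is hypothetical, and counting the gap character as just another symbol is an assumption made for the example.

```python
from collections import Counter


def residue_frequencies(alignment):
    """Per-column residue frequencies for equal-length aligned sequences.

    Returns one dict per alignment column mapping each symbol (gaps included,
    by assumption) to its relative frequency in that column.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_rows = len(alignment)
    grid = []
    for column in zip(*alignment):
        counts = Counter(column)
        grid.append({residue: count / n_rows for residue, count in counts.items()})
    return grid


# Toy alignment of four short protein fragments.
grid = residue_frequencies(["MKV-", "MRVA", "MKIA", "MKVA"])
```

Each entry of `grid` corresponds to one column of the visualization. A color scale applied to the frequencies would give the color-coded matrix the abstract describes.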
ERIC Educational Resources Information Center
Carroll, Kathleen
2015-01-01
The challenge of updating curriculum to align with Common Core State Standards is a national one felt by states, districts, and teachers alike. Teachers generally express enthusiasm for the Common Core, but consistently cite a lack of high-quality curricula as an impediment to teaching them. The demand for core-aligned quality materials has far…
NEURONAL ACTION ON THE DEVELOPING BLOOD VESSEL PATTERN
James, Jennifer M.; Mukouyama, Yoh-suke
2011-01-01
The nervous system relies on a highly specialized network of blood vessels for development and neuronal survival. Recent evidence suggests that both the central and peripheral nervous systems (CNS and PNS) employ multiple mechanisms to shape the vascular tree to meet their specific metabolic demands, such as promoting nerve-artery alignment in the PNS or the development of the blood-brain barrier in the CNS. In this article we discuss how the nervous system directly influences blood vessel patterning, resulting in neuro-vascular congruence that is maintained throughout development and in the adult. PMID:21978864
Personologic alignment and the treatment of posttraumatic distress.
Everly, G
2001-01-01
The therapeutic alliance is generally considered the sine qua non of successful psychotherapy. Yet, establishing the therapeutic alliance with patients suffering from syndromes of posttraumatic distress (including posttraumatic stress disorder) represents an unusual challenge. This paper describes the use of a personality-based approach to the establishment of the therapeutic alliance. This approach is referred to as personologic alignment and consists of alignment with preferential processes, as well as thematic belief systems. It represents an integration of the personology of Theodore Millon and the rhetoric of Aristotle.
TreeQ-VISTA: An Interactive Tree Visualization Tool with Functional Annotation Query Capabilities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gu, Shengyin; Anderson, Iain; Kunin, Victor
2007-05-07
Summary: We describe a general multiplatform exploratory tool called TreeQ-Vista, designed for presenting functional annotations in a phylogenetic context. Traits, such as phenotypic and genomic properties, are interactively queried from a relational database with a user-friendly interface which provides a set of tools for users with or without SQL knowledge. The query results are projected onto a phylogenetic tree and can be displayed in multiple color groups. A rich set of browsing, grouping and query tools are provided to facilitate trait exploration, comparison and analysis. Availability: The program, detailed tutorial and examples are available online at http://genome-test.lbl.gov/vista/TreeQVista.
STELLAR: fast and exact local alignments
2011-01-01
Background: Large-scale comparison of genomic sequences requires reliable tools for the search of local alignments. Practical local aligners are in general fast, but heuristic, and hence sometimes miss significant matches. Results: We present here the local pairwise aligner STELLAR that has full sensitivity for ε-alignments, i.e. guarantees to report all local alignments of a given minimal length and maximal error rate. The aligner is composed of two steps, filtering and verification. We apply the SWIFT algorithm for lossless filtering, and have developed a new verification strategy that we prove to be exact. Our results on simulated and real genomic data confirm and quantify the conjecture that heuristic tools like BLAST or BLAT miss a large percentage of significant local alignments. Conclusions: STELLAR is very practical and fast on very long sequences, which makes it a suitable new tool for finding local alignments between genomic sequences under the edit distance model. Binaries are freely available for Linux, Windows, and Mac OS X at http://www.seqan.de/projects/stellar. The source code is freely distributed with the SeqAn C++ library version 1.3 and later at http://www.seqan.de. PMID:22151882
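The acceptance criterion described in this abstract, a minimal length together with a maximal error rate under the edit distance model, can be illustrated with a small check on an already computed pairwise alignment. This is not STELLAR's filtering or verification algorithm. The function name is hypothetical, and taking the error rate as error columns divided by alignment columns is an assumption of the sketch.

```python
def is_epsilon_match(row_a, row_b, min_length, max_error_rate):
    """Check whether a gapped pairwise alignment meets length and error limits.

    row_a and row_b are the two alignment rows, with '-' marking gaps.
    An error is any column holding a mismatch or a gap (edit-distance model).
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    columns = len(row_a)
    if columns == 0:
        return False
    errors = sum(1 for a, b in zip(row_a, row_b) if a != b or a == "-")
    return columns >= min_length and errors / columns <= max_error_rate


# One mismatch in ten columns: accepted at a 10% error rate, rejected at 5%.
accepted = is_epsilon_match("ACGTACGTAC", "ACGTTCGTAC", 10, 0.10)
rejected = is_epsilon_match("ACGTACGTAC", "ACGTTCGTAC", 10, 0.05)
```

An exact aligner in the sense of the abstract guarantees that every local alignment passing such a check is reported, whereas a heuristic tool may miss some of them.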
Heterogeneous Compression of Large Collections of Evolutionary Trees.
Matthews, Suzanne J
2015-01-01
Compressing heterogeneous collections of trees is an open problem in computational phylogenetics. In a heterogeneous tree collection, each tree can contain a unique set of taxa. An ideal compression method would allow for the efficient archival of large tree collections and enable scientists to identify common evolutionary relationships over disparate analyses. In this paper, we extend TreeZip to compress heterogeneous collections of trees. TreeZip is the most efficient algorithm for compressing homogeneous tree collections. To the best of our knowledge, no other domain-based compression algorithm exists for large heterogeneous tree collections or enables their rapid analysis. Our experimental results indicate that TreeZip averages 89.03 percent (72.69 percent) space savings on unweighted (weighted) collections of trees when the level of heterogeneity in a collection is moderate. The organization of the TRZ file allows for efficient computations over heterogeneous data. For example, consensus trees can be computed in mere seconds. Lastly, combining the TreeZip compressed (TRZ) file with general-purpose compression yields average space savings of 97.34 percent (81.43 percent) on unweighted (weighted) collections of trees. Our results lead us to believe that TreeZip will prove invaluable in the efficient archival of tree collections and will enable scientists to develop novel methods for relating heterogeneous collections of trees.
Evolution of physician-hospital alignment models: a case study of comanagement.
Sowers, Kevin W; Newman, Paul R; Langdon, Jeffrey C
2013-06-01
Recently, quality, financial, and regulatory demands have driven physicians to seek alignment opportunities with hospitals. The motivation for alignment on the part of physicians and hospitals is now accelerating because the new paradigm under healthcare reform requires an increased focus on improving quality, cost, and efficiency. We (1) identify the key drivers for physician-hospital alignment models; (2) summarize comanagement as a physician-hospital alignment model; and (3) explore a detailed case study of comanagement as an option to better align physicians with hospital goals on quality, safety, and outcomes. A Medline abstract review was performed that identified 45 references that discuss options for physician-hospital alignment. None of the articles identified provide a detailed example of successful alignment structures. A detailed case study of a successful comanagement alignment program is reviewed. The key drivers for alignment are inpatient growth rates, declining reimbursements, and the opportunity to improve quality, decrease costs, and increase efficiency. Two general strategies of alignment involve noneconomic and/or economic integration. In our example, comanagement with economic integration was chosen as the preferred structure for physician-hospital alignment. The choice of structure will vary depending on the existing relationships and governance of the hospital and the physicians in the targeted area of focus. Success in building physician-hospital alignment is measured by improvements in care for the patient, reduced cost of care delivery, and improved relations between physicians and hospital leadership.
Soil hydrology of agroforestry systems: Competition for water or positive tree-crops interactions?
NASA Astrophysics Data System (ADS)
Gerjets, Rowena; Richter, Falk; Jansen, Martin; Carminati, Andrea
2017-04-01
In dry periods during the growing season crops may suffer from severe water stress. The question arises whether the alternation of crop and tree strips might enhance and sustain soil water resources available for crops during drought events. Trees reduce wind exposure, decreasing the potential evapotranspiration of crops and soils; additionally hydraulic lift from the deep roots of trees to the drier top soil might provide additional water for shallow-rooted crops. To understand the above and belowground water relations of agroforestry systems, we measured soil moisture and soil water potential in crop strips as a function of distance to the trees at varying depth as well as meteorological parameters. At the agroforestry site Reiffenhausen, Lower Saxony, Germany, two different tree species are planted, each in one separated tree strip: willow breed Tordis ((Salix viminalis x Salix Schwerinii) x Salix viminalis) and poplar clone Max 1 (Populus nigra x Populus maximowiczii). In between the tree strips a crop strip of 24 m width was established with annual crop rotation, managed the same way as the reference site. During a drought period in May 2016 with less than 2 mm rain in four weeks, an overall positive effect on hydrological conditions of the agroforestry system was observed. The results show that trees shaded the soil surface, lowering the air temperature and further increasing the soil moisture in the crop strips compared to the reference site, which was located far from the trees. At the reference site the crops took up water in the upper soil (<20 cm depth); after the soil reached water potentials below -100 kPa, root water uptake moved to deeper soil layers (<40 cm). Because of the higher wind and solar radiation exposure the reference soil profile was severely dried out. Also in the crop strips of the agroforestry system, crops took up water in the upper soil. However, the lower soil layers remained wet for an extended period of time. 
The tree strips reduced the wind speed, hence lowering evapotranspiration in the crop strip. The plot was not aligned directly to North and we observed steeper soil water potential gradients in the part of the crop strip more exposed to sunlight. The two tree species behaved differently. The poplar strips showed more marked diurnal changes in soil water potential, with fast drying during daytime and rewetting during nighttime. We suppose that the rewetting during nighttime was caused by hydraulic lift, which passively supplies the drier upper soil with water from the wetter, lower soil layers. This experimental study shows the importance of above- and belowground tree-crop interactions and demonstrates the positive effect of tree strips in reducing drought stress in crops.
NASA Astrophysics Data System (ADS)
Ogle, K.; Fell, M.; Barber, J. J.
2016-12-01
Empirical, field studies of plant functional traits have revealed important trade-offs among pairs or triplets of traits, such as the leaf (LES) and wood (WES) economics spectra. Trade-offs include correlations between leaf longevity (LL) vs specific leaf area (SLA), LL vs mass-specific leaf respiration rate (RmL), SLA vs RmL, and resistance to breakage vs wood density. Ordination analyses (e.g., PCA) show groupings of traits that tend to align with different life-history strategies or taxonomic groups. It is unclear, however, what underlies such trade-offs and emergent spectra. Do they arise from inherent physiological constraints on growth, or are they more reflective of environmental filtering? The relative importance of these mechanisms has implications for predicting biogeochemical cycling, which is influenced by trait distributions of the plant community. We address this question using an individual-based model of tree growth (ACGCA) to quantify the theoretical trait space of trees that emerges from physiological constraints. ACGCA's inputs include 32 physiological, anatomical, and allometric traits, many of which are related to the LES and WES. We fit ACGCA to 1.6 million USFS FIA observations of tree diameters and heights to obtain vectors of trait values that produce realistic growth, and we explored the structure of this trait space. No notable correlations emerged among the 496 trait pairs, but stepwise regressions revealed complicated multi-variate structure: e.g., relationships between pairs of traits (e.g., RmL and SLA) are governed by other traits (e.g., LL, radiation-use efficiency [RUE]). We also simulated growth under various canopy gap scenarios that impose varying degrees of environmental filtering to explore the multi-dimensional trait space (hypervolume) of trees that died vs survived. The centroid and volume of the hypervolumes differed among dead and live trees, especially under gap conditions leading to low mortality. 
Traits most predictive of tree-level mortality were maximum tree height, RUE, xylem conducting area, and branch turn-over rate. We are using these hypervolumes as priors to an emulator that approximates the ACGCA, which we are fitting to the FIA data to quantify species-specific trait spectra and to explore factors giving rise to species differences.
Alignments of Dark Matter Halos with Large-scale Tidal Fields: Mass and Redshift Dependence
NASA Astrophysics Data System (ADS)
Chen, Sijie; Wang, Huiyuan; Mo, H. J.; Shi, Jingjing
2016-07-01
Large-scale tidal fields estimated directly from the distribution of dark matter halos are used to investigate how halo shapes and spin vectors are aligned with the cosmic web. The major, intermediate, and minor axes of halos are aligned with the corresponding tidal axes, and halo spin axes tend to be parallel with the intermediate axes and perpendicular to the major axes of the tidal field. The strengths of these alignments generally increase with halo mass and redshift, but the dependence is only on the peak height, $\nu \equiv \delta_{\rm c}/\sigma(M_{\rm h}, z)$. The scaling relations of the alignment strengths with the value of $\nu$ indicate that the alignment strengths remain roughly constant when the structures within which the halos reside are still in a quasi-linear regime, but decrease as nonlinear evolution becomes more important. We also calculate the alignments in projection so that our results can be compared directly with observations. Finally, we investigate the alignments of tidal tensors on large scales, and use the results to understand alignments of halo pairs separated at various distances. Our results suggest that the coherent structure of the tidal field is the underlying reason for the alignments of halos and galaxies seen in numerical simulations and in observations.
Thinning northern hardwoods in New England by dominant-tree removal — early results
William Leak
2007-01-01
Commercial thinning is a widely accepted practice in northern hardwood stands of New England. Commercial thinning guidelines for eastern hardwoods generally recommend releasing selected crop trees or the removal of trees in less-than-dominant crown classes unless they are of poor health or quality. However, many northern hardwood stands in New England have a dominant...
A.R. Weiskittel; D. Maguire; R.A. Monserud
2007-01-01
Static models of individual tree crown attributes such as height to crown base and maximum branch diameter profile have been developed for several commercially important species. Dynamic models of individual branch growth and mortality have received less attention, but have generally been developed retrospectively by dissecting felled trees; however, this approach is...
Martin Wilmking; Glenn P. Juday; Valerie A. Barber; Harold S.J. Zald
2004-01-01
Northern and high-latitude alpine treelines are generally thought to be limited by available warmth. Most studies of tree-growth-climate interaction at treeline as well as climate reconstructions using dendrochronology report positive growth response of treeline trees to warmer temperatures. However, population-wide responses of treeline trees to climate remain largely...
A call to improve methods for estimating tree biomass for regional and national assessments
Aaron R. Weiskittel; David W. MacFarlane; Philip J. Radtke; David L.R. Affleck; Hailemariam Temesgen; Christopher W. Woodall; James A. Westfall; John W. Coulston
2015-01-01
Tree biomass is typically estimated using statistical models. This review highlights five limitations of most tree biomass models, which include the following: (1) biomass data are costly to collect and alternative sampling methods are used; (2) belowground data and models are generally lacking; (3) models are often developed from small and geographically limited data...
Robert A. Haack; Robert K. Lawrence; George C. Heaton
2001-01-01
Overwintering behavior of Tomicus piniperda (L.) was studied in a Scotch pine (Pinus sylvestris L.) Christmas tree plantation in Indiana (1992-1994) and a plantation in Michigan (1994). In general, adults feed inside shoots during summer, then move to overwintering sites at the base of trees in autumn. In early autumn, adults were...
Adjusting forest density estimates for surveyor bias in historical tree surveys
Brice B. Hanberry; Jian Yang; John M. Kabrick; Hong S. He
2012-01-01
The U.S. General Land Office surveys, conducted from the late 1700s to the early 1900s, provide records of trees prior to widespread European and American colonial settlement. However, potential and documented surveyor bias raises questions about the reliability of historical tree density estimates and other metrics based on density estimated from these records. In this...
Selected mechanical and physical properties of Chinese tallow tree juvenile wood
Todd F. Shupe; Leslie H. Groom; Thomas L. Eberhardt; Thomas C. Pesacreta; Timothy G. Rials
2008-01-01
Chinese tallow tree is a noxious, invasive plant in the Southeastern United States. It is generally considered a nuisance and has no current commercial use. The objective of this research was to determine the moduli of rupture (MOR) and elasticity (MOE) of the stem wood of this species at different vertical sampling locations. Three Chinese tallow trees were felled and...
Local and general above-stump biomass functions for loblolly pine and slash pine trees
Carlos A. Gonzalez-Benecke; Salvador Gezan; Timothy J. Albaugh; H. Lee Allen; Harold E. Burkhart; Thomas R. Fox; Eric J. Jokela; Christopher Maier; Timothy A. Martin; Rafael A. Rubilar; Lisa J. Samuelson
2014-01-01
There is an increasing interest in estimating biomass for loblolly pine (Pinus taeda L.) and slash pine (Pinus elliottii Engelm. var. elliottii), two of the most ecologically and commercially important tree species in North America. The majority of the available individual-tree allometric models are local, relying on stem diameter outside bark at breast height (dbh)...
The Effect of Trees on Crime in Portland, Oregon
Geoffrey H. Donovan; Jeffrey P. Prestemon
2012-01-01
The authors estimate the relationship between trees and three crime aggregates (all crime, violent crime, and property crime) and two individual crimes (burglary and vandalism) in Portland, Oregon. During the study period (2005-2007), 431 crimes were reported at the 2,813 single-family homes in our sample. In general, the authors find that trees in the public right of...
LAMDA programmer's manual. [Final report, Part 1]
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hughes, T.P.; Clark, R.M.; Mostrom, M.A.
This report discusses the following topics on the LAMDA program: General maintenance; CTSS FCL script; DOS batch files; Macintosh MPW scripts; UNICOS FCL script; VAX/MS command file; LINC calling tree; and LAMDA calling tree.
Nagy, László G; Házi, Judit; Szappanos, Balázs; Kocsubé, Sándor; Bálint, Balázs; Rákhely, Gábor; Vágvölgyi, Csaba; Papp, Tamás
2012-07-01
Bursts of diversification are known to have contributed significantly to the extant morphological and species diversity, but evidence for many of the theoretical predictions about adaptive radiations has remained contentious. Despite their tremendous diversity, patterns of evolutionary diversification and the contribution of explosive episodes in fungi are largely unknown. Here, using the genus Coprinellus (Psathyrellaceae, Agaricales) as a model, we report the first explosive fungal radiation and infer that the onset of the radiation correlates with a change from a multilayered to a much simpler defense structure on the fruiting bodies. We hypothesize that this change constitutes a key innovation, probably relaxing constraints on diversification imposed by nutritional investment into the development of protective tissues of fruiting bodies. Fossil calibration suggests that Coprinellus mushrooms radiated during the Miocene coinciding with global radiation of large grazing mammals following expansion of dry open grasslands. In addition to diversification rate-based methods, we test the hard polytomy hypothesis, by analyzing the resolvability of internal nodes of the backbone of the putative radiation using Reversible-Jump MCMC. We discuss potential applications and pitfalls of this approach as well as how biologically meaningful polytomies can be distinguished from alignment shortcomings. Our data provide insights into the nature of adaptive radiations in general by revealing a deceleration of morphological diversification through time. The dynamics of morphological diversification was approximated by obtaining the temporal distribution of state changes in discrete traits along the trees and comparing it with the tempo of lineage accumulation.
We found that the number of state changes correlates with the number of lineages, even in parts of the tree with short internal branches, and peaks around the onset of the explosive radiation followed by a slowdown, most likely because of the decrease in available niches.
Analysis of codon usage in beta-tubulin sequences of helminths.
von Samson-Himmelstjerna, G; Harder, A; Failing, K; Pape, M; Schnieder, T
2003-07-01
Codon usage bias has been shown to be correlated with gene expression levels in many organisms, including the nematode Caenorhabditis elegans. Here, the codon usage (cu) characteristics for a set of currently available beta-tubulin coding sequences of helminths were assessed by calculating several indices, including the effective codon number (Nc), the intrinsic codon deviation index (ICDI), the P2 value and the mutational response index (MRI). The P2 value gives a measure of translational pressure, which has been shown to be correlated to high gene expression levels in some organisms, but it has not yet been analysed in that respect in helminths. For all but two of the C. elegans beta-tubulin coding sequences investigated, the P2 value was the only index that indicated the presence of codon usage bias. Therefore, we propose that in general the helminth beta-tubulin sequences investigated here are not expressed at high levels. Furthermore, we calculated the correlation coefficients for the cu patterns of the helminth beta-tubulin sequences compared with those of highly expressed genes in organisms such as Escherichia coli and C. elegans. It was found that beta-tubulin cu patterns for all sequences of members of the Strongylida were significantly correlated to those for highly expressed C. elegans genes. This approach provides a new measure for comparing the adaptation of cu of a particular coding sequence with that of highly expressed genes in possible expression systems. Finally, using the cu patterns of the sequences studied, a phylogenetic tree was constructed. The topology of this tree was very much in concordance with that of a phylogeny based on small subunit ribosomal DNA sequence alignments.
IcyTree: rapid browser-based visualization for phylogenetic trees and networks
2017-01-01
Summary: IcyTree is an easy-to-use application which can be used to visualize a wide variety of phylogenetic trees and networks. While numerous phylogenetic tree viewers exist already, IcyTree distinguishes itself by being a purely online tool, having a responsive user interface, supporting phylogenetic networks (ancestral recombination graphs in particular), and efficiently drawing trees that include information such as ancestral locations or trait values. IcyTree also provides intuitive panning and zooming utilities that make exploring large phylogenetic trees of many thousands of taxa feasible. Availability and Implementation: IcyTree is a web application and can be accessed directly at http://tgvaughan.github.com/icytree. Currently supported web browsers include Mozilla Firefox and Google Chrome. IcyTree is written entirely in client-side JavaScript (no plugin required) and, once loaded, does not require network access to run. IcyTree is free software, and the source code is made available at http://github.com/tgvaughan/icytree under version 3 of the GNU General Public License. Contact: tgvaughan@gmail.com PMID:28407035
IcyTree: rapid browser-based visualization for phylogenetic trees and networks.
Vaughan, Timothy G
2017-08-01
IcyTree is an easy-to-use application which can be used to visualize a wide variety of phylogenetic trees and networks. While numerous phylogenetic tree viewers exist already, IcyTree distinguishes itself by being a purely online tool, having a responsive user interface, supporting phylogenetic networks (ancestral recombination graphs in particular), and efficiently drawing trees that include information such as ancestral locations or trait values. IcyTree also provides intuitive panning and zooming utilities that make exploring large phylogenetic trees of many thousands of taxa feasible. IcyTree is a web application and can be accessed directly at http://tgvaughan.github.com/icytree. Currently supported web browsers include Mozilla Firefox and Google Chrome. IcyTree is written entirely in client-side JavaScript (no plugin required) and, once loaded, does not require network access to run. IcyTree is free software, and the source code is made available at http://github.com/tgvaughan/icytree under version 3 of the GNU General Public License. Contact: tgvaughan@gmail.com. © The Author(s) 2017. Published by Oxford University Press.
Reconstructing Unrooted Phylogenetic Trees from Symbolic Ternary Metrics.
Grünewald, Stefan; Long, Yangjing; Wu, Yaokun
2018-03-09
Böcker and Dress (Adv Math 138:105-125, 1998) presented a 1-to-1 correspondence between symbolically dated rooted trees and symbolic ultrametrics. We consider the corresponding problem for unrooted trees. More precisely, given a tree T with leaf set X and a proper vertex coloring of its interior vertices, we can map every triple of three different leaves to the color of its median vertex. We characterize all ternary maps that can be obtained in this way in terms of 4- and 5-point conditions, and we show that the corresponding tree and its coloring can be reconstructed from a ternary map that satisfies those conditions. Further, we give an additional condition that characterizes whether the tree is binary, and we describe an algorithm that reconstructs general trees in a bottom-up fashion.
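The forward direction of the construction described above (from a vertex-colored tree to a ternary map) can be illustrated with a small sketch. It is not the authors' reconstruction algorithm; the tree, the vertex names, the colors, and the helper names `path`, `median`, and `ternary_map` are hypothetical and chosen only for illustration. The sketch relies on the standard fact that, in a tree, the three pairwise paths between three distinct leaves share exactly one vertex, the median.

```python
from collections import deque
from itertools import combinations


def path(adj, src, dst):
    """Return the unique src-dst path in a tree given as an adjacency dict."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    # Walk back from dst to src through the recorded parents.
    out = []
    node = dst
    while node is not None:
        out.append(node)
        node = parent[node]
    return out[::-1]


def median(adj, a, b, c):
    """Median of three distinct leaves: the one vertex on all three pairwise paths."""
    shared = set(path(adj, a, b)) & set(path(adj, a, c)) & set(path(adj, b, c))
    (m,) = shared  # exactly one vertex in a tree
    return m


def ternary_map(adj, leaves, color):
    """Map every triple of distinct leaves to the color of its median vertex."""
    return {frozenset(t): color[median(adj, *t)] for t in combinations(leaves, 3)}


# Hypothetical example: leaves a, b, c, d; interior vertices u and v joined by an edge.
adj = {
    "a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
    "u": ["a", "b", "v"], "v": ["c", "d", "u"],
}
color = {"u": "red", "v": "blue"}  # proper coloring: adjacent interior vertices differ
tmap = ternary_map(adj, ["a", "b", "c", "d"], color)
```

Here the triples containing both a and b map to the color of u, and the triples containing both c and d map to the color of v. Recovering the tree from such a map is the harder direction that the abstract addresses.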
Anchoring quartet-based phylogenetic distances and applications to species tree reconstruction.
Sayyari, Erfan; Mirarab, Siavash
2016-11-11
Inferring species trees from gene trees using the coalescent-based summary methods has been the subject of much attention, yet new scalable and accurate methods are needed. We introduce DISTIQUE, a new statistically consistent summary method for inferring species trees from gene trees under the coalescent model. We generalize our results to arbitrary phylogenetic inference problems; we show that two arbitrarily chosen leaves, called anchors, can be used to estimate relative distances between all other pairs of leaves by inferring relevant quartet trees. This results in a family of distance-based tree inference methods, with running times ranging from quadratic to quartic in the number of leaves. We show in simulated studies that DISTIQUE has comparable accuracy to leading coalescent-based summary methods and reduced running times.
ARYANA: Aligning Reads by Yet Another Approach
2014-01-01
Motivation: Although there are many different algorithms and software tools for aligning sequencing reads, fast gapped sequence search is far from solved. Strong interest in fast alignment is best reflected in the $10^6 prize for the Innocentive competition on aligning a collection of reads to a given database of reference genomes. In addition, de novo assembly of next-generation sequencing long reads requires fast overlap-layout-consensus algorithms which depend on fast and accurate alignment. Contribution: We introduce ARYANA, a fast gapped read aligner, developed on the base of BWA indexing infrastructure with a completely new alignment engine that makes it significantly faster than three other aligners: Bowtie2, BWA and SeqAlto, with comparable generality and accuracy. Instead of the time-consuming backtracking procedures for handling mismatches, ARYANA comes with the seed-and-extend algorithmic framework and a significantly improved efficiency by integrating novel algorithmic techniques including dynamic seed selection, bidirectional seed extension, reset-free hash tables, and gap-filling dynamic programming. As the read length increases ARYANA's superiority in terms of speed and alignment rate becomes more evident. This is in perfect harmony with the read length trend as the sequencing technologies evolve. The algorithmic platform of ARYANA makes it easy to develop mission-specific aligners for other applications using ARYANA engine. Availability: ARYANA with complete source code can be obtained from http://github.com/aryana-aligner PMID:25252881
ARYANA: Aligning Reads by Yet Another Approach.
Gholami, Milad; Arbabi, Aryan; Sharifi-Zarchi, Ali; Chitsaz, Hamidreza; Sadeghi, Mehdi
2014-01-01
Although there are many different algorithms and software tools for aligning sequencing reads, fast gapped sequence search is far from solved. Strong interest in fast alignment is best reflected in the $10^6 prize for the Innocentive competition on aligning a collection of reads to a given database of reference genomes. In addition, de novo assembly of next-generation sequencing long reads requires fast overlap-layout-consensus algorithms which depend on fast and accurate alignment. We introduce ARYANA, a fast gapped read aligner, developed on the base of BWA indexing infrastructure with a completely new alignment engine that makes it significantly faster than three other aligners: Bowtie2, BWA and SeqAlto, with comparable generality and accuracy. Instead of the time-consuming backtracking procedures for handling mismatches, ARYANA comes with the seed-and-extend algorithmic framework and a significantly improved efficiency by integrating novel algorithmic techniques including dynamic seed selection, bidirectional seed extension, reset-free hash tables, and gap-filling dynamic programming. As the read length increases ARYANA's superiority in terms of speed and alignment rate becomes more evident. This is in perfect harmony with the read length trend as the sequencing technologies evolve. The algorithmic platform of ARYANA makes it easy to develop mission-specific aligners for other applications using ARYANA engine. ARYANA with complete source code can be obtained from http://github.com/aryana-aligner.
AlexSys: a knowledge-based expert system for multiple sequence alignment construction and analysis
Aniba, Mohamed Radhouene; Poch, Olivier; Marchler-Bauer, Aron; Thompson, Julie Dawn
2010-01-01
Multiple sequence alignment (MSA) is a cornerstone of modern molecular biology and represents a unique means of investigating the patterns of conservation and diversity in complex biological systems. Many different algorithms have been developed to construct MSAs, but previous studies have shown that no single aligner consistently outperforms the rest. This has led to the development of a number of ‘meta-methods’ that systematically run several aligners and merge the output into one single solution. Although these methods generally produce more accurate alignments, they are inefficient because all the aligners need to be run first and the choice of the best solution is made a posteriori. Here, we describe the development of a new expert system, AlexSys, for the multiple alignment of protein sequences. AlexSys incorporates an intelligent inference engine to automatically select an appropriate aligner a priori, depending only on the nature of the input sequences. The inference engine was trained on a large set of reference multiple alignments, using a novel machine learning approach. Applying AlexSys to a test set of 178 alignments, we show that the expert system represents a good compromise between alignment quality and running time, making it suitable for high throughput projects. AlexSys is freely available from http://alnitak.u-strasbg.fr/∼aniba/alexsys. PMID:20530533
A new method to cluster genomes based on cumulative Fourier power spectrum.
Dong, Rui; Zhu, Ziyue; Yin, Changchuan; He, Rong L; Yau, Stephen S-T
2018-06-20
Analyzing phylogenetic relationships using mathematical methods has always been of importance in bioinformatics. Quantitative research may interpret the raw biological data in a precise way. Multiple Sequence Alignment (MSA) is used frequently to analyze biological evolution, but is very time-consuming. When the scale of data is large, alignment methods cannot finish calculation in reasonable time. Therefore, we present a new method using moments of the cumulative Fourier power spectrum to cluster DNA sequences. Each sequence is translated into a vector in Euclidean space. Distances between the vectors can reflect the relationships between sequences. The mapping between the spectra and moment vector is one-to-one, which means that no information is lost in the power spectra during the calculation. We cluster and classify several datasets including Influenza A, primates, and human rhinovirus (HRV) datasets to build up the phylogenetic trees. Results show that the newly proposed cumulative Fourier power spectrum method is much faster and more accurate than MSA and another alignment-free method known as k-mer. The research provides new insights into the study of phylogeny, evolution, and efficient DNA comparison algorithms for large genomes. The computer programs of the cumulative Fourier power spectrum are available at GitHub (https://github.com/YaulabTsinghua/cumulative-Fourier-power-spectrum). Copyright © 2018. Published by Elsevier B.V.
eHive: an artificial intelligence workflow system for genomic analysis.
Severin, Jessica; Beal, Kathryn; Vilella, Albert J; Fitzgerald, Stephen; Schuster, Michael; Gordon, Leo; Ureta-Vidal, Abel; Flicek, Paul; Herrero, Javier
2010-05-11
The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future. We present eHive, a new fault tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system a MySQL database serves as the central blackboard and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real case scenarios. eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/.
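The quadratic growth mentioned above can be made concrete with a short sketch. It assumes, purely for illustration, that one calculation is needed per unordered pair of species; the abstract only says the growth is approximately quadratic, and the function names below are hypothetical and are not part of eHive.

```python
from itertools import combinations


def pairwise_jobs(species):
    """List one (hypothetical) alignment job per unordered pair of species."""
    return list(combinations(species, 2))


def pair_count(n):
    """Number of unordered pairs among n species: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Illustrative species names only.
jobs = pairwise_jobs(["human", "mouse", "zebrafish", "chicken"])

# Under the one-job-per-pair assumption, doubling the species count
# roughly quadruples the work.
count_50 = pair_count(50)
count_100 = pair_count(100)
```

Under this assumption, the 50 species mentioned in the abstract would correspond to 1225 pairs, and 100 species to 4950.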
Increased spruce tree growth in Central Europe since 1960s.
Cienciala, Emil; Altman, Jan; Doležal, Jiří; Kopáček, Jiří; Štěpánek, Petr; Ståhl, Göran; Tumajer, Jan
2018-04-01
Tree growth response to recent environmental changes is of key interest for forest ecology. This study addressed the following questions with respect to Norway spruce (Picea abies, L. Karst.) in Central Europe: Has tree growth accelerated during the last five decades? What are the main environmental drivers of the observed tree radial stem growth and how much variability can be explained by them? Using a nationwide dendrochronological sampling of Norway spruce in the Czech Republic (1246 trees, 266 plots), novel regional tree-ring width chronologies for 40(±10)- and 60(±10)-year old trees were assembled, averaged across three elevation zones (break points at 500 and 700m). Correspondingly averaged drivers, including temperature, precipitation, nitrogen (N) deposition and ambient CO2 concentration, were used in a general linear model (GLM) to analyze the contribution of these in explaining tree ring width variability for the period from 1961 to 2013. Spruce tree radial stem growth responded strongly to the changing environment in Central Europe during the period, with a mean tree ring width increase of 24 and 32% for the 40- and 60-year old trees, respectively. The indicative General Linear Model analysis identified CO2, precipitation during the vegetation season, spring air temperature (March-May) and N-deposition as the significant covariates of growth, with the latter including interactions with elevation zones. The regression models explained 57% and 55% of the variability in the two tree ring width chronologies, respectively. Growth response to N-deposition showed the highest variability along the elevation gradient with growth stimulation/limitation at sites below/above 700m. A strong sensitivity of stem growth to CO2 was also indicated, suggesting that the effect of rising ambient CO2 concentration (direct or indirect by increased water use efficiency) should be considered in analyses of long-term growth together with climatic factors and N-deposition.
Copyright © 2017 Elsevier B.V. All rights reserved.
CTE Alignment with 21st Century Skills
ERIC Educational Resources Information Center
Drysielski, Robert
2015-01-01
Career and technical education generally has focused on helping people to understand the relationship between education and work to acquire employment skills. There is a need for action to ensure that the programs being offered in Career and Technical Education (CTE) align with the needs of the 21st century workforce. This research will attempt…
Adult Basic Education: Aligning Adult Basic Education and Postsecondary Education
ERIC Educational Resources Information Center
Texas Higher Education Coordinating Board, 2008
2008-01-01
In 2007, the 80th Texas Legislature included a rider to the General Appropriations Act for the Texas Higher Education Coordinating Board. The rider directed the agency to coordinate with the Texas Education Agency to develop and implement plans to align adult basic education with postsecondary education. The Coordinating Board, in collaboration…
36 CFR 223.2 - Disposal of timber for administrative use.
Code of Federal Regulations, 2013 CFR
2013-07-01
... PRODUCTS General Provisions § 223.2 Disposal of timber for administrative use. Trees, portions of trees, or... of value for the protection or the administration of Federal lands. (b) For fuel in Federal camps...
36 CFR 223.2 - Disposal of timber for administrative use.
Code of Federal Regulations, 2011 CFR
2011-07-01
... PRODUCTS General Provisions § 223.2 Disposal of timber for administrative use. Trees, portions of trees, or... of value for the protection or the administration of Federal lands. (b) For fuel in Federal camps...
36 CFR 223.2 - Disposal of timber for administrative use.
Code of Federal Regulations, 2014 CFR
2014-07-01
... PRODUCTS General Provisions § 223.2 Disposal of timber for administrative use. Trees, portions of trees, or... of value for the protection or the administration of Federal lands. (b) For fuel in Federal camps...
36 CFR 223.2 - Disposal of timber for administrative use.
Code of Federal Regulations, 2012 CFR
2012-07-01
... PRODUCTS General Provisions § 223.2 Disposal of timber for administrative use. Trees, portions of trees, or... of value for the protection or the administration of Federal lands. (b) For fuel in Federal camps...
Influence of Wind Speed on RGB-D Images in Tree Plantations
Andújar, Dionisio; Dorado, José; Bengochea-Guevara, José María; Conesa-Muñoz, Jesús; Fernández-Quintanilla, César; Ribeiro, Ángela
2017-01-01
Weather conditions can affect sensors’ readings when sampling outdoors. Although sensors are usually set up covering a wide range of conditions, their operational range must be established. In recent years, depth cameras have been shown as a promising tool for plant phenotyping and other related uses. However, the use of these devices is still challenged by prevailing field conditions. Although the influence of lighting conditions on the performance of these cameras has already been established, the effect of wind is still unknown. This study establishes the associated errors when modeling some tree characteristics at different wind speeds. A system using a Kinect v2 sensor and custom software was tested from zero wind speed up to 10 m·s−1. Two tree species with contrasting architecture, poplars and plums, were used as model plants. The results showed different responses depending on tree species and wind speed. Estimations of Leaf Area (LA) and tree volume were generally more consistent at high wind speeds in plum trees. Poplars were particularly affected by wind speeds higher than 5 m·s−1. On the contrary, height measurements were more consistent for poplars than for plum trees. These results show that the use of depth cameras for tree characterization must take into consideration wind conditions in the field. In general, 5 m·s−1 (18 km·h−1) could be established as a conservative limit for good estimations. PMID:28430119
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rivasseau, Vincent, E-mail: vincent.rivasseau@th.u-psud.fr; Tanasa, Adrian, E-mail: adrian.tanasa@ens-lyon.org
The Loop Vertex Expansion (LVE) is a quantum field theory (QFT) method which explicitly computes the Borel sum of Feynman perturbation series. This LVE relies in a crucial way on symmetric tree weights which define a measure on the set of spanning trees of any connected graph. In this paper we generalize this method by defining new tree weights. They depend on the choice of a partition of a set of vertices of the graph, and when the partition is non-trivial, they are no longer symmetric under permutation of vertices. Nevertheless we prove they have the required positivity property to lead to a convergent LVE; in fact we formulate this positivity property precisely for the first time. Our generalized tree weights are inspired by the Brydges-Battle-Federbush work on cluster expansions and could be particularly suited to the computation of connected functions in QFT. Several concrete examples are explicitly given.
Universal statistics of terminal dynamics before collapse
NASA Astrophysics Data System (ADS)
Lenner, Nicolas; Eule, Stephan; Wolf, Fred
Recent biological developments have drastically increased both the precision and the amount of generated data, allowing a switch from pure mean-value characterization of the process under consideration to an analysis of the whole ensemble, exploiting the stochastic nature of biology. We focus on the general class of non-equilibrium processes with distinguished terminal points, as found in cell fate decisions, checkpoints, or cognitive neuroscience. Aligning the data to a terminal point (e.g. represented as an absorbing boundary) allows one to devise a general methodology to characterize and reverse engineer the terminating history. Using a small-noise approximation, we derive the mean, variance, and covariance of the aligned data for general finite-time singularities.
Coalescent histories for caterpillar-like families.
Rosenberg, Noah A
2013-01-01
A coalescent history is an assignment of branches of a gene tree to branches of a species tree on which coalescences in the gene tree occur. The number of coalescent histories for a pair consisting of a labeled gene tree topology and a labeled species tree topology is important in gene tree probability computations, and more generally, in studying evolutionary possibilities for gene trees on species trees. Defining the Tr-caterpillar-like family as a sequence of n-taxon trees constructed by replacing the r-taxon subtree of n-taxon caterpillars by a specific r-taxon labeled topology Tr, we examine the number of coalescent histories for caterpillar-like families with matching gene tree and species tree labeled topologies. For each Tr with size r≤8, we compute the number of coalescent histories for n-taxon trees in the Tr-caterpillar-like family. Next, as n→∞, we find that the limiting ratio of the numbers of coalescent histories for the Tr family and caterpillars themselves is correlated with the number of labeled histories for Tr. The results support a view that large numbers of coalescent histories occur when a tree has both a relatively balanced subtree and a high tree depth, contributing to deeper understanding of the combinatorics of gene trees and species trees.
NASA Technical Reports Server (NTRS)
Wheeler, Ward C.
2003-01-01
A method to align sequence data based on parsimonious synapomorphy schemes generated by direct optimization (DO; earlier termed optimization alignment) is proposed. DO directly diagnoses sequence data on cladograms without an intervening multiple-alignment step, thereby creating topology-specific, dynamic homology statements. Hence, no multiple-alignment is required to generate cladograms. Unlike general and globally optimal multiple-alignment procedures, the method described here, implied alignment (IA), takes these dynamic homologies and traces them back through a single cladogram, linking the unaligned sequence positions in the terminal taxa via DO transformation series. These "lines of correspondence" link ancestor-descendant states and, when displayed as linearly arrayed columns without hypothetical ancestors, are largely indistinguishable from standard multiple alignment. Since this method is based on synapomorphy, the treatment of certain classes of insertion-deletion (indel) events may be different from that of other alignment procedures. As with all alignment methods, results are dependent on parameter assumptions such as indel cost and transversion:transition ratios. Such an IA could be used as a basis for phylogenetic search, but this would be questionable since the homologies derived from the implied alignment depend on its natal cladogram and any variance, between DO and IA + Search, due to heuristic approach. The utility of this procedure in heuristic cladogram searches using DO and the improvement of heuristic cladogram cost calculations are discussed. © 2003 The Willi Hennig Society. Published by Elsevier Science (USA). All rights reserved.
Using confidence intervals to evaluate the focus alignment of spectrograph detector arrays.
Sawyer, Travis W; Hawkins, Kyle S; Damento, Michael
2017-06-20
High-resolution spectrographs extract detailed spectral information of a sample and are frequently used in astronomy, laser-induced breakdown spectroscopy, and Raman spectroscopy. These instruments employ dispersive elements such as prisms and diffraction gratings to spatially separate different wavelengths of light, which are then detected by a charge-coupled device (CCD) or complementary metal-oxide-semiconductor (CMOS) detector array. Precise alignment along the optical axis (focus position) of the detector array is critical to maximize the instrumental resolution; however, traditional approaches of scanning the detector through focus lack a quantitative measure of precision, limiting the repeatability and relying on one's experience. Here we propose a method to evaluate the focus alignment of spectrograph detector arrays by establishing confidence intervals to measure the alignment precision. We show that propagation of uncertainty can be used to estimate the variance in an alignment, thus providing a quantitative and repeatable means to evaluate the precision and confidence of an alignment. We test the approach by aligning the detector array of a prototype miniature echelle spectrograph. The results indicate that the procedure effectively quantifies alignment precision, enabling one to objectively determine when an alignment has reached an acceptable level. This quantitative approach also provides a foundation for further optimization, including automated alignment. Furthermore, the procedure introduced here can be extended to other alignment techniques that rely on numerically fitting data to a model, providing a general framework for evaluating the precision of alignment methods.
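The idea of using propagation of uncertainty to attach a confidence interval to a fitted focus position can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes a focus metric (for example a spot width) has been fitted by a parabola w(z) = a z^2 + b z + c, so that best focus is z0 = -b / (2a), and that the fit supplies the variances and covariance of a and b. The function name, parameter names, numbers, and the normal-approximation factor 1.96 are all hypothetical choices.

```python
import math


def focus_with_interval(a, b, var_a, var_b, cov_ab, z_score=1.96):
    """Best-focus position z0 = -b / (2a) with a first-order confidence interval.

    The variance of z0 comes from linear propagation of uncertainty:
    var(z0) ~= (dz0/da)^2 var_a + (dz0/db)^2 var_b + 2 (dz0/da)(dz0/db) cov_ab.
    """
    z0 = -b / (2.0 * a)
    dz_da = b / (2.0 * a * a)   # partial derivative of z0 with respect to a
    dz_db = -1.0 / (2.0 * a)    # partial derivative of z0 with respect to b
    var_z0 = dz_da ** 2 * var_a + dz_db ** 2 * var_b + 2.0 * dz_da * dz_db * cov_ab
    sigma = math.sqrt(var_z0)
    half_width = z_score * sigma
    return z0, sigma, (z0 - half_width, z0 + half_width)


# Made-up fit results, for illustration only.
z0, sigma, interval = focus_with_interval(
    a=2.0, b=-4.0, var_a=0.01, var_b=0.04, cov_ab=0.0
)
```

One possible use is as a stopping rule: alignment could be considered complete once the interval half-width falls below a chosen tolerance. That threshold is a design choice and is not taken from the abstract.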
Sambles, Christine M; Salmon, Deborah L; Florance, Hannah; Howard, Thomas P; Smirnoff, Nicholas; Nielsen, Lene R; McKinney, Lea V; Kjær, Erik D; Buggs, Richard J A; Studholme, David J; Grant, Murray
2017-12-19
European common ash, Fraxinus excelsior, is currently threatened by Ash dieback (ADB) caused by the fungus Hymenoscyphus fraxineus. To detect and identify metabolites that may be products of pathways important in contributing to resistance against H. fraxineus, we performed untargeted metabolomic profiling on leaves from five high-susceptibility and five low-susceptibility F. excelsior individuals identified during Danish field trials. In this study, we describe two datasets. The first is untargeted LC-MS metabolomics raw data from ash leaves with high-susceptibility and low-susceptibility to ADB in positive and negative mode. These data allow the application of peak picking, alignment, gap-filling and retention-time correlation analyses to be performed in alternative ways. The second, a processed dataset containing abundances of aligned features across all samples, enables further mining of the data. Here we illustrate the utility of this dataset, which has previously been used to identify putative iridoid glycosides, well known anti-herbivory terpenoid derivatives, and show differential abundance in tolerant and susceptible ash samples.
Sambles, Christine M.; Salmon, Deborah L.; Florance, Hannah; Howard, Thomas P.; Smirnoff, Nicholas; Nielsen, Lene R.; McKinney, Lea V.; Kjær, Erik D.; Buggs, Richard J. A.; Studholme, David J.; Grant, Murray
2017-01-01
European common ash, Fraxinus excelsior, is currently threatened by ash dieback (ADB), caused by the fungus Hymenoscyphus fraxineus. To detect and identify metabolites that may be products of pathways important in contributing to resistance against H. fraxineus, we performed untargeted metabolomic profiling on leaves from five high-susceptibility and five low-susceptibility F. excelsior individuals identified during Danish field trials. We describe two datasets in this study. The first is untargeted LC-MS metabolomics raw data, acquired in positive and negative mode, from ash leaves with high and low susceptibility to ADB. These data allow peak picking, alignment, gap-filling and retention-time correlation analyses to be performed in alternative ways. The second, a processed dataset containing abundances of aligned features across all samples, enables further mining of the data. Here we illustrate the utility of this dataset, which has previously been used to identify putative iridoid glycosides, well-known anti-herbivory terpenoid derivatives, and show differential abundance in tolerant and susceptible ash samples. PMID:29257137
B → X_s γ rate and CP asymmetry within the aligned two-Higgs-doublet model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jung, Martin; Pich, Antonio; Tuzon, Paula
In the two-Higgs-doublet model the alignment of the Yukawa matrices in flavor space guarantees the absence of flavor-changing neutral currents at tree level, while introducing new sources for CP violation parametrized in a very economical way [Antonio Pich and Paula Tuzon, Phys. Rev. D 80, 091702 (2009)]. This implies a potentially large influence in a number of processes, b → sγ being a prominent example where rather high experimental and theoretical precision meet. We analyze the CP rate asymmetry in this inclusive decay and determine the resulting constraints on the model parameters. We demonstrate the compatibility with previously obtained limits [Martin Jung, Antonio Pich, and Paula Tuzon, J. High Energy Phys. 11 (2010) 003]. Moreover, we extend the phenomenological analysis of the branching ratio, and examine the influence of resulting correlations on the like-sign dimuon charge asymmetry in B decays.
Fairweather-Tait, Susan J
2011-08-01
Dietary reference values for micronutrients vary considerably among countries, and harmonization is needed to facilitate nutrition policy and public health strategies at the European and global levels. The EURopean micronutrient RECommendations Aligned (EURRECA) Network of Excellence is developing generic instruments for systematically deriving and updating micronutrient reference values and dietary recommendations. These include best practice guidelines, interlinked web pages, online databases, and decision trees. Journal supplements have been published on micronutrient intakes and status, and an ongoing activity of EURRECA is the completion of systematic reviews on associations between intakes, status, and various health outcomes for priority micronutrients (ie, iron, zinc, folate, vitamin B-12, and iodine), which were selected by using a triage technique. Future activities include meta-analyses to identify dose-response relations and the variability, factorial estimates of requirements, bioavailability from whole diets, effects of genotype, and modeling techniques for addressing dietary recommendations for combinations of nutrients with common health endpoints.
Power law tails in phylogenetic systems.
Qin, Chongli; Colwell, Lucy J
2018-01-23
Covariance analysis of protein sequence alignments uses coevolving pairs of sequence positions to predict features of protein structure and function. However, current methods ignore the phylogenetic relationships between sequences, potentially corrupting the identification of covarying positions. Here, we use random matrix theory to demonstrate the existence of a power law tail that distinguishes the spectrum of covariance caused by phylogeny from that caused by structural interactions. The power law is essentially independent of the phylogenetic tree topology, depending on just two parameters: the sequence length and the average branch length. We demonstrate that these power law tails are ubiquitous in the large protein sequence alignments used to predict contacts in 3D structure, as predicted by our theory. This suggests that to decouple phylogenetic effects from the interactions between sequence-distal sites that control biological function, it is necessary to remove or down-weight the eigenvectors of the covariance matrix with the largest eigenvalues. We confirm that truncating these eigenvectors improves contact prediction.
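The truncation step mentioned at the end of the abstract above can be sketched as follows. This is a minimal illustration, assuming a symmetric covariance matrix and a user-chosen number `k` of leading eigenmodes to discard; the function name `truncate_top_eigenmodes` is hypothetical, and the abstract does not say how the authors choose the cutoff.

```python
import numpy as np

def truncate_top_eigenmodes(cov, k):
    """Rebuild a symmetric covariance matrix without its k largest eigenmodes."""
    cov = np.asarray(cov, dtype=float)
    if k <= 0:
        return cov.copy()

    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Keep everything except the last k (largest) eigenpairs.
    keep_vals = eigvals[:-k]
    keep_vecs = eigvecs[:, :-k]

    # Reassemble: sum over kept modes of lambda_i * v_i v_i^T.
    return (keep_vecs * keep_vals) @ keep_vecs.T

# Toy example: the mode with eigenvalue 5 is removed.
toy = np.diag([5.0, 2.0, 1.0])
filtered = truncate_top_eigenmodes(toy, k=1)
```

Down-weighting rather than removing the leading modes would replace the slice with a multiplication of the largest eigenvalues by a factor below one.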
Nonbinary Tree-Based Phylogenetic Networks.
Jetten, Laura; van Iersel, Leo
2018-01-01
Rooted phylogenetic networks are used to describe evolutionary histories that contain non-treelike evolutionary events such as hybridization and horizontal gene transfer. In some cases, such histories can be described by a phylogenetic base-tree with additional linking arcs, which can, for example, represent gene transfer events. Such phylogenetic networks are called tree-based. Here, we consider two possible generalizations of this concept to nonbinary networks, which we call tree-based and strictly-tree-based nonbinary phylogenetic networks. We give simple graph-theoretic characterizations of tree-based and strictly-tree-based nonbinary phylogenetic networks. Moreover, we show for each of these two classes that it can be decided in polynomial time whether a given network is contained in the class. Our approach also provides a new view on tree-based binary phylogenetic networks. Finally, we discuss two examples of nonbinary phylogenetic networks in biology and show how our results can be applied to them.
More on the Best Evolutionary Rate for Phylogenetic Analysis
Massingham, Tim; Goldman, Nick
2017-01-01
The accumulation of genome-scale molecular data sets for nonmodel taxa brings us ever closer to resolving the tree of life of all living organisms. However, despite the depth of data available, a number of studies that each used thousands of genes have reported conflicting results. The focus of phylogenomic projects must thus shift to more careful experimental design. Even though we still have a limited understanding of what are the best predictors of the phylogenetic informativeness of a gene, there is wide agreement that one key factor is its evolutionary rate; but there is no consensus as to whether the rates derived as optimal in various analytical, empirical, and simulation approaches have any general applicability. We here use simulations to infer optimal rates in a set of realistic phylogenetic scenarios with varying tree sizes, numbers of terminals, and tree shapes. Furthermore, we study the relationship between the optimal rate and rate variation among sites and among lineages. Finally, we examine how well the predictions made by a range of experimental design methods correlate with the observed performance in our simulations. We find that the optimal level of divergence is surprisingly robust to differences in taxon sampling and even to among-site and among-lineage rate variation as often encountered in empirical data sets. This finding encourages the use of methods that rely on a single optimal rate to predict a gene’s utility. Focusing on correct recovery either of the most basal node in the phylogeny or of the entire topology, the optimal rate is about 0.45 substitutions from root to tip in average Yule trees and about 0.2 in difficult trees with short basal and long apical branches, but all rates leading to divergence levels between about 0.1 and 0.5 perform reasonably well.
Testing the performance of six methods that can be used to predict a gene’s utility against our simulation results, we find that the probability of resolution, signal-noise analysis, and Fisher information are good predictors of phylogenetic informativeness, but they require specification of at least part of a model tree. Likelihood quartet mapping also shows very good performance but only requires sequence alignments and is thus applicable without making assumptions about the phylogeny. Despite being the most commonly used methods for experimental design, geometric quartet mapping and the integration of phylogenetic informativeness curves perform rather poorly in our comparison. Instead of derived predictors of phylogenetic informativeness, we suggest that the number of sites in a gene that evolve at near-optimal rates (as inferred here) could be used directly to prioritize genes for phylogenetic inference. In combination with measures of model fit, especially with respect to compositional biases and among-site and among-lineage rate variation, such an approach has the potential to greatly improve marker choice and should be tested on empirical data. PMID:28595363
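The suggestion above, to prioritize genes by the number of sites evolving at near-optimal rates, might be sketched like this. The sketch assumes that per-site rates have already been estimated, that divergence is the site rate multiplied by the root-to-tip time, and that "near-optimal" means the 0.1 to 0.5 window quoted in the abstract; the function names and the input layout are hypothetical.

```python
def count_near_optimal_sites(site_rates, root_to_tip_time=1.0, low=0.1, high=0.5):
    """Count sites whose expected root-to-tip divergence lies in [low, high]."""
    count = 0
    for rate in site_rates:
        # Assumed definition: expected substitutions from root to tip.
        divergence = rate * root_to_tip_time
        if low <= divergence <= high:
            count += 1
    return count

def rank_genes(genes, root_to_tip_time=1.0):
    """Order gene names by descending number of near-optimal sites.

    genes: dict mapping a gene name to a list of per-site rates.
    """
    return sorted(
        genes,
        key=lambda name: count_near_optimal_sites(genes[name], root_to_tip_time),
        reverse=True,
    )

# Hypothetical per-site rates for two genes.
example = {"geneA": [0.05, 0.9, 1.4], "geneB": [0.2, 0.3, 0.45]}
ranking = rank_genes(example)
```

As the abstract notes, such a count would be used together with measures of model fit rather than on its own.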
Characterization of tannase protein sequences of bacteria and fungi: an in silico study.
Banerjee, Amrita; Jana, Arijit; Pati, Bikash R; Mondal, Keshab C; Das Mohapatra, Pradeep K
2012-04-01
The tannase protein sequences of 149 bacteria and 36 fungi were retrieved from the NCBI database. Of these, only the 77 bacterial and 31 fungal tannase sequences with different amino acid compositions were retained. These sequences were analysed for physical and chemical properties, superfamily membership, multiple sequence alignment, phylogenetic tree construction and motif finding, in order to identify functional motifs and the evolutionary relationships among them. The superfamily search for these tannases revealed the occurrence of the proline iminopeptidase-like, biotin biosynthesis protein BioH, O-acetyltransferase, carboxylesterase/thioesterase 1, carbon-carbon bond hydrolase, haloperoxidase, prolyl oligopeptidase, C-terminal domain and mycobacterial antigens families and the alpha/beta hydrolase superfamily. Some bacterial and fungal sequences showed similarity with different families individually. The multiple sequence alignment of these tannase protein sequences showed conserved regions at different stretches, with maximum homology at amino acid residues 389-469 and 482-523, which could be used for designing degenerate primers or probes specific for tannase-producing bacterial and fungal species. The phylogenetic tree showed two different clusters: one contains only bacteria, and the other contains both fungi and bacteria, indicating some relationship between these different genera. In the second cluster, however, nearly all fungal species were grouped together, which indicates sequence-level similarity among fungal genera. Analysis of the distribution of fourteen motifs revealed that Motif 1, with a signature sequence of 29 amino acids (GCSTGGREALKQAQRWPHDYDGIIANNPA), was uniformly observed in 83.3 % of the studied tannase sequences, suggesting its involvement in the structure and enzymatic function.
Dimensional Reduction for the General Markov Model on Phylogenetic Trees.
Sumner, Jeremy G
2017-03-01
We present a method of dimensional reduction for the general Markov model of sequence evolution on a phylogenetic tree. We show that taking certain linear combinations of the associated random variables (site pattern counts) reduces the dimensionality of the model from exponential in the number of extant taxa, to quadratic in the number of taxa, while retaining the ability to statistically identify phylogenetic divergence events. A key feature is the identification of an invariant subspace which depends only bilinearly on the model parameters, in contrast to the usual multi-linear dependence in the full space. We discuss potential applications including the computation of split (edge) weights on phylogenetic trees from observed sequence data.
Gretchen G. Moisen; Elizabeth A. Freeman; Jock A. Blackard; Tracey S. Frescino; Niklaus E. Zimmermann; Thomas C. Edwards
2006-01-01
Many efforts are underway to produce broad-scale forest attribute maps by modelling forest class and structure variables collected in forest inventories as functions of satellite-based and biophysical information. Typically, variants of classification and regression trees implemented in Rulequest's© See5 and Cubist (for binary and continuous responses,...