computing pangenome trees: Topics by Science.gov

Sample records for computing pangenome trees

GET_PHYLOMARKERS, a Software Package to Select Optimal Orthologous Clusters for Phylogenomics and Inferring Pan-Genome Phylogenies, Used for a Critical Geno-Taxonomic Revision of the Genus Stenotrophomonas.

PubMed

Vinuesa, Pablo; Ochoa-Sánchez, Luz E; Contreras-Moreira, Bruno

2018-01-01

The massive accumulation of genome-sequences in public databases promoted the proliferation of genome-level phylogenetic analyses in many areas of biological research. However, due to diverse evolutionary and genetic processes, many loci have undesirable properties for phylogenetic reconstruction. These, if undetected, can result in erroneous or biased estimates, particularly when estimating species trees from concatenated datasets. To deal with these problems, we developed GET_PHYLOMARKERS, a pipeline designed to identify high-quality markers to estimate robust genome phylogenies from the orthologous clusters, or the pan-genome matrix (PGM), computed by GET_HOMOLOGUES. In the first context, a set of sequential filters are applied to exclude recombinant alignments and those producing anomalous or poorly resolved trees. Multiple sequence alignments and maximum likelihood (ML) phylogenies are computed in parallel on multi-core computers. A ML species tree is estimated from the concatenated set of top-ranking alignments at the DNA or protein levels, using either FastTree or IQ-TREE (IQT). The latter is used by default due to its superior performance revealed in an extensive benchmark analysis. In addition, parsimony and ML phylogenies can be estimated from the PGM. We demonstrate the practical utility of the software by analyzing 170 Stenotrophomonas genome sequences available in RefSeq and 10 new complete genomes of Mexican environmental S. maltophilia complex (Smc) isolates reported herein. A combination of core-genome and PGM analyses was used to revise the molecular systematics of the genus. An unsupervised learning approach that uses a goodness of clustering statistic identified 20 groups within the Smc at a core-genome average nucleotide identity (cgANIb) of 95.9% that are perfectly consistent with strongly supported clades on the core- and pan-genome trees. In addition, we identified 16 misclassified RefSeq genome sequences, 14 of them labeled as S. maltophilia , demonstrating the broad utility of the software for phylogenomics and geno-taxonomic studies. The code, a detailed manual and tutorials are freely available for Linux/UNIX servers under the GNU GPLv3 license at https://github.com/vinuesa/get_phylomarkers. A docker image bundling GET_PHYLOMARKERS with GET_HOMOLOGUES is available at https://hub.docker.com/r/csicunam/get_homologues/, which can be easily run on any platform.
Endosymbiotic gene transfer from prokaryotic pangenomes: Inherited chimerism in eukaryotes.

PubMed

Ku, Chuan; Nelson-Sathi, Shijulal; Roettger, Mayo; Garg, Sriram; Hazkani-Covo, Einat; Martin, William F

2015-08-18

Endosymbiotic theory in eukaryotic-cell evolution rests upon a foundation of three cornerstone partners--the plastid (a cyanobacterium), the mitochondrion (a proteobacterium), and its host (an archaeon)--and carries a corollary that, over time, the majority of genes once present in the organelle genomes were relinquished to the chromosomes of the host (endosymbiotic gene transfer). However, notwithstanding eukaryote-specific gene inventions, single-gene phylogenies have never traced eukaryotic genes to three single prokaryotic sources, an issue that hinges crucially upon factors influencing phylogenetic inference. In the age of genomes, single-gene trees, once used to test the predictions of endosymbiotic theory, now spawn new theories that stand to eventually replace endosymbiotic theory with descriptive, gene tree-based variants featuring supernumerary symbionts: prokaryotic partners distinct from the cornerstone trio and whose existence is inferred solely from single-gene trees. We reason that the endosymbiotic ancestors of mitochondria and chloroplasts brought into the eukaryotic--and plant and algal--lineage a genome-sized sample of genes from the proteobacterial and cyanobacterial pangenomes of their respective day and that, even if molecular phylogeny were artifact-free, sampling prokaryotic pangenomes through endosymbiotic gene transfer would lead to inherited chimerism. Recombination in prokaryotes (transduction, conjugation, transformation) differs from recombination in eukaryotes (sex). Prokaryotic recombination leads to pangenomes, and eukaryotic recombination leads to vertical inheritance. Viewed from the perspective of endosymbiotic theory, the critical transition at the eukaryote origin that allowed escape from Muller's ratchet--the origin of eukaryotic recombination, or sex--might have required surprisingly little evolutionary innovation.
Computational pan-genomics: status, promises and challenges.

PubMed

2018-01-01

Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example for a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains. © The Author 2016. Published by Oxford University Press.
Clustering of Pan- and Core-genome of Lactobacillus provides Novel Evolutionary Insights for Differentiation.

PubMed

Inglin, Raffael C; Meile, Leo; Stevens, Marc J A

2018-04-24

Bacterial taxonomy aims to classify bacteria based on true evolutionary events and relies on a polyphasic approach that includes phenotypic, genotypic and chemotaxonomic analyses. Until now, complete genomes are largely ignored in taxonomy. The genus Lactobacillus consists of 173 species and many genomes are available to study taxonomy and evolutionary events. We analyzed and clustered 98 completely sequenced genomes of the genus Lactobacillus and 234 draft genomes of 5 different Lactobacillus species, i.e. L. reuteri, L. delbrueckii, L. plantarum, L. rhamnosus and L. helveticus. The core-genome of the genus Lactobacillus contains 266 genes and the pan-genome 20'800 genes. Clustering of the Lactobacillus pan- and core-genome resulted in two highly similar trees. This shows that evolutionary history is traceable in the core-genome and that clustering of the core-genome is sufficient to explore relationships. Clustering of core- and pan-genomes at species' level resulted in similar trees as well. Detailed analyses of the core-genomes showed that the functional class "genetic information processing" is conserved in the core-genome but that "signaling and cellular processes" is not. The latter class encodes functions that are involved in environmental interactions. Evolution of lactobacilli seems therefore directed by the environment. The type species L. delbrueckii was analyzed in detail and its pan-genome based tree contained two major clades whose members contained different genes yet identical functions. In addition, evidence for horizontal gene transfer between strains of L. delbrueckii, L. plantarum, and L. rhamnosus, and between species of the genus Lactobacillus is presented. Our data provide evidence for evolution of some lactobacilli according to a parapatric-like model for species differentiation. Core-genome trees are useful to detect evolutionary relationships in lactobacilli and might be useful in taxonomic analyses. Lactobacillus' evolution is directed by the environment and HGT.
EUPAN enables pan-genome studies of a large number of eukaryotic genomes.

PubMed

Hu, Zhiqiang; Sun, Chen; Lu, Kuang-Chen; Chu, Xixia; Zhao, Yue; Lu, Jinyuan; Shi, Jianxin; Wei, Chaochun

2017-08-01

Pan-genome analyses are routinely carried out for bacteria to interpret the within-species gene presence/absence variations (PAVs). However, pan-genome analyses are rare for eukaryotes due to the large sizes and higher complexities of their genomes. Here we proposed EUPAN, a eukaryotic pan-genome analysis toolkit, enabling automatic large-scale eukaryotic pan-genome analyses and detection of gene PAVs at a relatively low sequencing depth. In the previous studies, we demonstrated the effectiveness and high accuracy of EUPAN in the pan-genome analysis of 453 rice genomes, in which we also revealed widespread gene PAVs among individual rice genomes. Moreover, EUPAN can be directly applied to the current re-sequencing projects primarily focusing on single nucleotide polymorphisms. EUPAN is implemented in Perl, R and C ++. It is supported under Linux and preferred for a computer cluster with LSF and SLURM job scheduling system. EUPAN together with its standard operating procedure (SOP) is freely available for non-commercial use (CC BY-NC 4.0) at http://cgm.sjtu.edu.cn/eupan/index.html . ccwei@sjtu.edu.cn or jianxin.shi@sjtu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
PanACEA: a bioinformatics tool for the exploration and visualization of bacterial pan-chromosomes.

PubMed

Clarke, Thomas H; Brinkac, Lauren M; Inman, Jason M; Sutton, Granger; Fouts, Derrick E

2018-06-27

Bacterial pan-genomes, comprised of conserved and variable genes across multiple sequenced bacterial genomes, allow for identification of genomic regions that are phylogenetically discriminating or functionally important. Pan-genomes consist of large amounts of data, which can restrict researchers ability to locate and analyze these regions. Multiple software packages are available to visualize pan-genomes, but currently their ability to address these concerns are limited by using only pre-computed data sets, prioritizing core over variable gene clusters, or by not accounting for pan-chromosome positioning in the viewer. We introduce PanACEA (Pan-genome Atlas with Chromosome Explorer and Analyzer), which utilizes locally-computed interactive web-pages to view ordered pan-genome data. It consists of multi-tiered, hierarchical display pages that extend from pan-chromosomes to both core and variable regions to single genes. Regions and genes are functionally annotated to allow for rapid searching and visual identification of regions of interest with the option that user-supplied genomic phylogenies and metadata can be incorporated. PanACEA's memory and time requirements are within the capacities of standard laptops. The capability of PanACEA as a research tool is demonstrated by highlighting a variable region important in differentiating strains of Enterobacter hormaechei. PanACEA can rapidly translate the results of pan-chromosome programs into an intuitive and interactive visual representation. It will empower researchers to visually explore and identify regions of the pan-chromosome that are most biologically interesting, and to obtain publication quality images of these regions.
PanWeb: A web interface for pan-genomic analysis.

PubMed

Pantoja, Yan; Pinheiro, Kenny; Veras, Allan; Araújo, Fabrício; Lopes de Sousa, Ailton; Guimarães, Luis Carlos; Silva, Artur; Ramos, Rommel T J

2017-01-01

With increased production of genomic data since the advent of next-generation sequencing (NGS), there has been a need to develop new bioinformatics tools and areas, such as comparative genomics. In comparative genomics, the genetic material of an organism is directly compared to that of another organism to better understand biological species. Moreover, the exponentially growing number of deposited prokaryote genomes has enabled the investigation of several genomic characteristics that are intrinsic to certain species. Thus, a new approach to comparative genomics, termed pan-genomics, was developed. In pan-genomics, various organisms of the same species or genus are compared. Currently, there are many tools that can perform pan-genomic analyses, such as PGAP (Pan-Genome Analysis Pipeline), Panseq (Pan-Genome Sequence Analysis Program) and PGAT (Prokaryotic Genome Analysis Tool). Among these software tools, PGAP was developed in the Perl scripting language and its reliance on UNIX platform terminals and its requirement for an extensive parameterized command line can become a problem for users without previous computational knowledge. Thus, the aim of this study was to develop a web application, known as PanWeb, that serves as a graphical interface for PGAP. In addition, using the output files of the PGAP pipeline, the application generates graphics using custom-developed scripts in the R programming language. PanWeb is freely available at http://www.computationalbiology.ufpa.br/panweb.
ITEP: an integrated toolkit for exploration of microbial pan-genomes.

PubMed

Benedict, Matthew N; Henriksen, James R; Metcalf, William W; Whitaker, Rachel J; Price, Nathan D

2014-01-03

Comparative genomics is a powerful approach for studying variation in physiological traits as well as the evolution and ecology of microorganisms. Recent technological advances have enabled sequencing large numbers of related genomes in a single project, requiring computational tools for their integrated analysis. In particular, accurate annotations and identification of gene presence and absence are critical for understanding and modeling the cellular physiology of newly sequenced genomes. Although many tools are available to compare the gene contents of related genomes, new tools are necessary to enable close examination and curation of protein families from large numbers of closely related organisms, to integrate curation with the analysis of gain and loss, and to generate metabolic networks linking the annotations to observed phenotypes. We have developed ITEP, an Integrated Toolkit for Exploration of microbial Pan-genomes, to curate protein families, compute similarities to externally-defined domains, analyze gene gain and loss, and generate draft metabolic networks from one or more curated reference network reconstructions in groups of related microbial species among which the combination of core and variable genes constitute the their "pan-genomes". The ITEP toolkit consists of: (1) a series of modular command-line scripts for identification, comparison, curation, and analysis of protein families and their distribution across many genomes; (2) a set of Python libraries for programmatic access to the same data; and (3) pre-packaged scripts to perform common analysis workflows on a collection of genomes. ITEP's capabilities include de novo protein family prediction, ortholog detection, analysis of functional domains, identification of core and variable genes and gene regions, sequence alignments and tree generation, annotation curation, and the integration of cross-genome analysis and metabolic networks for study of metabolic network evolution. ITEP is a powerful, flexible toolkit for generation and curation of protein families. ITEP's modular design allows for straightforward extension as analysis methods and tools evolve. By integrating comparative genomics with the development of draft metabolic networks, ITEP harnesses the power of comparative genomics to build confidence in links between genotype and phenotype and helps disambiguate gene annotations when they are evaluated in both evolutionary and metabolic network contexts.
Pan-genome and phylogeny of Bacillus cereus sensu lato.

PubMed

Bazinet, Adam L

2017-08-02

Bacillus cereus sensu lato (s. l.) is an ecologically diverse bacterial group of medical and agricultural significance. In this study, I use publicly available genomes and novel bioinformatic workflows to characterize the B. cereus s. l. pan-genome and perform the largest phylogenetic and population genetic analyses of this group to date in terms of the number of genes and taxa included. With these fundamental data in hand, I identify genes associated with particular phenotypic traits (i.e., "pan-GWAS" analysis), and quantify the degree to which taxa sharing common attributes are phylogenetically clustered. A rapid k-mer based approach (Mash) was used to create reduced representations of selected Bacillus genomes, and a fast distance-based phylogenetic analysis of this data (FastME) was performed to determine which species should be included in B. cereus s. l. The complete genomes of eight B. cereus s. l. species were annotated de novo with Prokka, and these annotations were used by Roary to produce the B. cereus s. l. pan-genome. Scoary was used to associate gene presence and absence patterns with various phenotypes. The orthologous protein sequence clusters produced by Roary were filtered and used to build HaMStR databases of gene models that were used in turn to construct phylogenetic data matrices. Phylogenetic analyses used RAxML, DendroPy, ClonalFrameML, PAUP*, and SplitsTree. Bayesian model-based population genetic analysis assigned taxa to clusters using hierBAPS. The genealogical sorting index was used to quantify the phylogenetic clustering of taxa sharing common attributes. The B. cereus s. l. pan-genome currently consists of ≈60,000 genes, ≈600 of which are "core" (common to at least 99% of taxa sampled). Pan-GWAS analysis revealed genes associated with phenotypes such as isolation source, oxygen requirement, and ability to cause diseases such as anthrax or food poisoning. Extensive phylogenetic analyses using an unprecedented amount of data produced phylogenies that were largely concordant with each other and with previous studies. Phylogenetic support as measured by bootstrap probabilities increased markedly when all suitable pan-genome data was included in phylogenetic analyses, as opposed to when only core genes were used. Bayesian population genetic analysis recommended subdividing the three major clades of B. cereus s. l. into nine clusters. Taxa sharing common traits and species designations exhibited varying degrees of phylogenetic clustering. All phylogenetic analyses recapitulated two previously used classification systems, and taxa were consistently assigned to the same major clade and group. By including accessory genes from the pan-genome in the phylogenetic analyses, I produced an exceptionally well-supported phylogeny of 114 complete B. cereus s. l. genomes. The best-performing methods were used to produce a phylogeny of all 498 publicly available B. cereus s. l. genomes, which was in turn used to compare three different classification systems and to test the monophyly status of various B. cereus s. l. species. The majority of the methodology used in this study is generic and could be leveraged to produce pan-genome estimates and similarly robust phylogenetic hypotheses for other bacterial groups.
Pan-genome analysis of Aeromonas hydrophila, Aeromonas veronii and Aeromonas caviae indicates phylogenomic diversity and greater pathogenic potential for Aeromonas hydrophila.

PubMed

Ghatak, Sandeep; Blom, Jochen; Das, Samir; Sanjukta, Rajkumari; Puro, Kekungu; Mawlong, Michael; Shakuntala, Ingudam; Sen, Arnab; Goesmann, Alexander; Kumar, Ashok; Ngachan, S V

2016-07-01

Aeromonas species are important pathogens of fishes and aquatic animals capable of infecting humans and other animals via food. Due to the paucity of pan-genomic studies on aeromonads, the present study was undertaken to analyse the pan-genome of three clinically important Aeromonas species (A. hydrophila, A. veronii, A. caviae). Results of pan-genome analysis revealed an open pan-genome for all three species with pan-genome sizes of 9181, 7214 and 6884 genes for A. hydrophila, A. veronii and A. caviae, respectively. Core-genome: pan-genome ratio (RCP) indicated greater genomic diversity for A. hydrophila and interestingly RCP emerged as an effective indicator to gauge genomic diversity which could possibly be extended to other organisms too. Phylogenomic network analysis highlighted the influence of homologous recombination and lateral gene transfer in the evolution of Aeromonas spp. Prediction of virulence factors indicated no significant difference among the three species though analysis of pathogenic potential and acquired antimicrobial resistance genes revealed greater hazards from A. hydrophila. In conclusion, the present study highlighted the usefulness of whole genome analyses to infer evolutionary cues for Aeromonas species which indicated considerable phylogenomic diversity for A. hydrophila and hitherto unknown genomic evidence for pathogenic potential of A. hydrophila compared to A. veronii and A. caviae.
Initiation of a pan-genomic research project for Xylella fastidiosa

USDA-ARS?s Scientific Manuscript database

Differences in genomic structure and nucleotide polymorphism among strains form the genetic basis for adaptability of a bacterial species. This can be described by a bacterial pan-genome, which is defined as the full complement of genes in all strains of a species. The pan-genome is composed of a "c...
New Insights into the Diversity of the Genus Faecalibacterium.

PubMed

Benevides, Leandro; Burman, Sriti; Martin, Rebeca; Robert, Véronique; Thomas, Muriel; Miquel, Sylvie; Chain, Florian; Sokol, Harry; Bermudez-Humaran, Luis G; Morrison, Mark; Langella, Philippe; Azevedo, Vasco A; Chatel, Jean-Marc; Soares, Siomar

2017-01-01

Faecalibacterium prausnitzii is a commensal bacterium, ubiquitous in the gastrointestinal tracts of animals and humans. This species is a functionally important member of the microbiota and studies suggest it has an impact on the physiology and health of the host. F. prausnitzii is the only identified species in the genus Faecalibacterium , but a recent study clustered strains of this species in two different phylogroups. Here, we propose the existence of distinct species in this genus through the use of comparative genomics. Briefly, we performed analyses of 16S rRNA gene phylogeny, phylogenomics, whole genome Multi-Locus Sequence Typing (wgMLST), Average Nucleotide Identity (ANI), gene synteny, and pangenome to better elucidate the phylogenetic relationships among strains of Faecalibacterium . For this, we used 12 newly sequenced, assembled, and curated genomes of F. prausnitzii , which were isolated from feces of healthy volunteers from France and Australia, and combined these with published data from 5 strains downloaded from public databases. The phylogenetic analysis of the 16S rRNA sequences, together with the wgMLST profiles and a phylogenomic tree based on comparisons of genome similarity, all supported the clustering of Faecalibacterium strains in different genospecies. Additionally, the global analysis of gene synteny among all strains showed a highly fragmented profile, whereas the intra-cluster analyses revealed larger and more conserved collinear blocks. Finally, ANI analysis substantiated the presence of three distinct clusters-A, B, and C-composed of five, four, and four strains, respectively. The pangenome analysis of each cluster corroborated the classification of these clusters into three distinct species, each containing less variability than that found within the global pangenome of all strains. Here, we propose that comparison of pangenome subsets and their associated α values may be used as an alternative approach, together with ANI, in the in silico classification of new species. Altogether, our results provide evidence not only for the reconsideration of the phylogenetic and genomic relatedness among strains currently assigned to F. prausnitzii , but also the need for lineage (strain-based) differentiation of this taxon to better define how specific members might be associated with positive or negative host interactions.
Probing Genomic Aspects of the Multi-Host Pathogen Clostridium perfringens Reveals Significant Pangenome Diversity, and a Diverse Array of Virulence Factors

PubMed Central

Kiu, Raymond; Caim, Shabhonam; Alexander, Sarah; Pachori, Purnima; Hall, Lindsay J.

2017-01-01

Clostridium perfringens is an important cause of animal and human infections, however information about the genetic makeup of this pathogenic bacterium is currently limited. In this study, we sought to understand and characterise the genomic variation, pangenomic diversity, and key virulence traits of 56 C. perfringens strains which included 51 public, and 5 newly sequenced and annotated genomes using Whole Genome Sequencing. Our investigation revealed that C. perfringens has an “open” pangenome comprising 11667 genes and 12.6% of core genes, identified as the most divergent single-species Gram-positive bacterial pangenome currently reported. Our computational analyses also defined C. perfringens phylogeny (16S rRNA gene) in relation to some 25 Clostridium species, with C. baratii and C. sardiniense determined to be the closest relatives. Profiling virulence-associated factors confirmed presence of well-characterised C. perfringens-associated exotoxins genes including α-toxin (plc), enterotoxin (cpe), and Perfringolysin O (pfo or pfoA), although interestingly there did not appear to be a close correlation with encoded toxin type and disease phenotype. Furthermore, genomic analysis indicated significant horizontal gene transfer events as defined by presence of prophage genomes, and notably absence of CRISPR defence systems in >70% (40/56) of the strains. In relation to antimicrobial resistance mechanisms, tetracycline resistance genes (tet) and anti-defensins genes (mprF) were consistently detected in silico (tet: 75%; mprF: 100%). However, pre-antibiotic era strain genomes did not encode for tet, thus implying antimicrobial selective pressures in C. perfringens evolutionary history over the past 80 years. This study provides new genomic understanding of this genetically divergent multi-host bacterium, and further expands our knowledge on this medically and veterinary important pathogen. PMID:29312194
Probing Genomic Aspects of the Multi-Host Pathogen Clostridium perfringens Reveals Significant Pangenome Diversity, and a Diverse Array of Virulence Factors.

PubMed

Kiu, Raymond; Caim, Shabhonam; Alexander, Sarah; Pachori, Purnima; Hall, Lindsay J

2017-01-01

Clostridium perfringens is an important cause of animal and human infections, however information about the genetic makeup of this pathogenic bacterium is currently limited. In this study, we sought to understand and characterise the genomic variation, pangenomic diversity, and key virulence traits of 56 C. perfringens strains which included 51 public, and 5 newly sequenced and annotated genomes using Whole Genome Sequencing. Our investigation revealed that C. perfringens has an "open" pangenome comprising 11667 genes and 12.6% of core genes, identified as the most divergent single-species Gram-positive bacterial pangenome currently reported. Our computational analyses also defined C. perfringens phylogeny (16S rRNA gene) in relation to some 25 Clostridium species, with C. baratii and C. sardiniense determined to be the closest relatives. Profiling virulence-associated factors confirmed presence of well-characterised C. perfringens -associated exotoxins genes including α-toxin ( plc ), enterotoxin ( cpe ), and Perfringolysin O ( pfo or pfoA ), although interestingly there did not appear to be a close correlation with encoded toxin type and disease phenotype. Furthermore, genomic analysis indicated significant horizontal gene transfer events as defined by presence of prophage genomes, and notably absence of CRISPR defence systems in >70% (40/56) of the strains. In relation to antimicrobial resistance mechanisms, tetracycline resistance genes ( tet ) and anti-defensins genes ( mprF ) were consistently detected in silico ( tet : 75%; mprF : 100%). However, pre-antibiotic era strain genomes did not encode for tet , thus implying antimicrobial selective pressures in C. perfringens evolutionary history over the past 80 years. This study provides new genomic understanding of this genetically divergent multi-host bacterium, and further expands our knowledge on this medically and veterinary important pathogen.
Genome-Wide Analysis Provides Evidence on the Genetic Relatedness of the Emergent Xylella fastidiosa Genotype in Italy to Isolates from Central America.

PubMed

Giampetruzzi, Annalisa; Saponari, Maria; Loconsole, Giuliana; Boscia, Donato; Savino, Vito Nicola; Almeida, Rodrigo P P; Zicca, Stefania; Landa, Blanca B; Chacón-Diaz, Carlos; Saldarelli, Pasquale

2017-07-01

Xylella fastidiosa is a plant-pathogenic bacterium recently introduced in Europe that is causing decline in olive trees in the South of Italy. Genetic studies have consistently shown that the bacterial genotype recovered from infected olive trees belongs to the sequence type ST53 within subspecies pauca. This genotype, ST53, has also been reported to occur in Costa Rica. The ancestry of ST53 was recently clarified, showing it contains alleles that are monophyletic with those of subsp. pauca in South America. To more robustly determine the phylogenetic placement of ST53 within X. fastidiosa, we performed a comparative analysis based on single nucleotide polymorphisms (SNPs) and the study of the pan-genome of the 27 currently public available whole genome sequences of X. fastidiosa. The resulting maximum-parsimony and maximum likelihood trees constructed using the SNPs and the pan-genome analysis are consistent with previously described X. fastidiosa taxonomy, distinguishing the subsp. fastidiosa, multiplex, pauca, sandyi, and morus. Within the subsp. pauca, the Italian and three Costa Rican isolates, all belonging to ST53, formed a compact phylotype in a clade divergent from the South American pauca isolates, also distinct from the recently described coffee isolate CFBP8072 imported into Europe from Ecuador. These findings were also supported by the gene characterization of a conjugative plasmid shared by all the four ST53 isolates. Furthermore, isolates of the ST53 clade possess an exclusive locus encoding a putative ATP-binding protein belonging to the family of histidine kinase-like ATPase gene, which is not present in isolates from the subspecies multiplex, sandyi, and pauca, but was detected in ST21 isolates of the subspecies fastidiosa from Costa Rica. The clustering and distinctiveness of the ST53 isolates supports the hypothesis of their common origin, and the limited genetic diversity among these isolates suggests this is an emerging clade within subsp. pauca.
The pangenome of the genus Clostridium.

PubMed

Udaondo, Zulema; Duque, Estrella; Ramos, Juan-Luis

2017-07-01

The pangenome for the genus Clostridium sensu stricto, which was obtained using highly curated and annotated genomes from 16 species is presented; some of these cause disease, while others are used for the production of added-value chemicals. Multilocus sequencing analysis revealed that species of this genus group into at least two clades that include non-pathogenic and pathogenic strains, suggesting that pathogenicity is dispersed across the phylogenetic tree. The core genome of the genus includes 546 protein families, which mainly comprise those involved in protein translation and DNA repair. The GS-GOGAT may represent the central pathway for generating organic nitrogen from inorganic nitrogen sources. Glycerol and glucose metabolism genes are well represented in the core genome together with a set of energy conservation systems. A metabolic network comprising proteins/enzymes, RNAs and metabolites, whose topological structure is a non-random and scale-free network with hierarchically structured modules was built. These modules shed light on the interactions between RNAs, proteins and metabolites, revealing biological features of transcription and translation, cell wall biosynthesis, C1 metabolism and N metabolism. Network analysis identified four nodes that function as hubs and bottlenecks, namely, coenzyme A, HPr kinases, S-adenosylmethionine and the ribonuclease P-protein, suggesting pivotal roles for them in Clostridium. © 2017 Society for Applied Microbiology and John Wiley & Sons Ltd.
RPAN: rice pan-genome browser for ∼3000 rice genomes.

PubMed

Sun, Chen; Hu, Zhiqiang; Zheng, Tianqing; Lu, Kuangchen; Zhao, Yue; Wang, Wensheng; Shi, Jianxin; Wang, Chunchao; Lu, Jinyuan; Zhang, Dabing; Li, Zhikang; Wei, Chaochun

2017-01-25

A pan-genome is the union of the gene sets of all the individuals of a clade or a species and it provides a new dimension of genome complexity with the presence/absence variations (PAVs) of genes among these genomes. With the progress of sequencing technologies, pan-genome study is becoming affordable for eukaryotes with large-sized genomes. The Asian cultivated rice, Oryza sativa L., is one of the major food sources for the world and a model organism in plant biology. Recently, the 3000 Rice Genome Project (3K RGP) sequenced more than 3000 rice genomes with a mean sequencing depth of 14.3×, which provided a tremendous resource for rice research. In this paper, we present a genome browser, Rice Pan-genome Browser (RPAN), as a tool to search and visualize the rice pan-genome derived from 3K RGP. RPAN contains a database of the basic information of 3010 rice accessions, including genomic sequences, gene annotations, PAV information and gene expression data of the rice pan-genome. At least 12 000 novel genes absent in the reference genome were included. RPAN also provides multiple search and visualization functions. RPAN can be a rich resource for rice biology and rice breeding. It is available at http://cgm.sjtu.edu.cn/3kricedb/ or http://www.rmbreeding.cn/pan3k. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
NGSPanPipe: A Pipeline for Pan-genome Identification in Microbial Strains from Experimental Reads.

PubMed

Kulsum, Umay; Kapil, Arti; Singh, Harpreet; Kaur, Punit

2018-01-01

Recent advancements in sequencing technologies have decreased both time span and cost for sequencing the whole bacterial genome. High-throughput Next-Generation Sequencing (NGS) technology has led to the generation of enormous data concerning microbial populations publically available across various repositories. As a consequence, it has become possible to study and compare the genomes of different bacterial strains within a species or genus in terms of evolution, ecology and diversity. Studying the pan-genome provides insights into deciphering microevolution, global composition and diversity in virulence and pathogenesis of a species. It can also assist in identifying drug targets and proposing vaccine candidates. The effective analysis of these large genome datasets necessitates the development of robust tools. Current methods to develop pan-genome do not support direct input of raw reads from the sequencer machine but require preprocessing of reads as an assembled protein/gene sequence file or the binary matrix of orthologous genes/proteins. We have designed an easy-to-use integrated pipeline, NGSPanPipe, which can directly identify the pan-genome from short reads. The output from the pipeline is compatible with other pan-genome analysis tools. We evaluated our pipeline with other methods for developing pan-genome, i.e. reference-based assembly and de novo assembly using simulated reads of Mycobacterium tuberculosis. The single script pipeline (pipeline.pl) is applicable for all bacterial strains. It integrates multiple in-house Perl scripts and is freely accessible from https://github.com/Biomedinformatics/NGSPanPipe .
Analysis of the Pantoea ananatis pan-genome reveals factors underlying its ability to colonize and interact with plant, insect and vertebrate hosts.

PubMed

De Maayer, Pieter; Chan, Wai Yin; Rubagotti, Enrico; Venter, Stephanus N; Toth, Ian K; Birch, Paul R J; Coutinho, Teresa A

2014-05-27

Pantoea ananatis is found in a wide range of natural environments, including water, soil, as part of the epi- and endophytic flora of various plant hosts, and in the insect gut. Some strains have proven effective as biological control agents and plant-growth promoters, while other strains have been implicated in diseases of a broad range of plant hosts and humans. By analysing the pan-genome of eight sequenced P. ananatis strains isolated from different sources we identified factors potentially underlying its ability to colonize and interact with hosts in both the plant and animal Kingdoms. The pan-genome of the eight compared P. ananatis strains consisted of a core genome comprised of 3,876 protein coding sequences (CDSs) and a sizeable accessory genome consisting of 1,690 CDSs. We estimate that ~106 unique CDSs would be added to the pan-genome with each additional P. ananatis genome sequenced in the future. The accessory fraction is derived mainly from integrated prophages and codes mostly for proteins of unknown function. Comparison of the translated CDSs on the P. ananatis pan-genome with the proteins encoded on all sequenced bacterial genomes currently available revealed that P. ananatis carries a number of CDSs with orthologs restricted to bacteria associated with distinct hosts, namely plant-, animal- and insect-associated bacteria. These CDSs encode proteins with putative roles in transport and metabolism of carbohydrate and amino acid substrates, adherence to host tissues, protection against plant and animal defense mechanisms and the biosynthesis of potential pathogenicity determinants including insecticidal peptides, phytotoxins and type VI secretion system effectors. P. ananatis has an 'open' pan-genome typical of bacterial species that colonize several different environments. The pan-genome incorporates a large number of genes encoding proteins that may enable P. ananatis to colonize, persist in and potentially cause disease symptoms in a wide range of plant and animal hosts.
Extensive gene content variation in the Brachypodium distachyon pan-genome correlates with population structure

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gordon, Sean P.; Contreras-Moreira, Bruno; Woods, Daniel P.

While prokaryotic pan-genomes have been shown to contain many more genes than any individual organism, the prevalence and functional significance of differentially present genes in eukaryotes remains poorly understood. Whole-genome de novo assembly and annotation of 54 lines of the grass Brachypodium distachyon yield a pan-genome containing nearly twice the number of genes found in any individual genome. Genes present in all lines are enriched for essential biological functions, while genes present in only some lines are enriched for conditionally beneficial functions (e.g., defense and development), display faster evolutionary rates, lie closer to transposable elements and are less likely tomore » be syntenic with orthologous genes in other grasses. Our data suggest that differentially present genes contribute substantially to phenotypic variation within a eukaryote species, these genes have a major influence in population genetics, and transposable elements play a key role in pan-genome evolution.« less

Extensive gene content variation in the Brachypodium distachyon pan-genome correlates with population structure.

PubMed

Gordon, Sean P; Contreras-Moreira, Bruno; Woods, Daniel P; Des Marais, David L; Burgess, Diane; Shu, Shengqiang; Stritt, Christoph; Roulin, Anne C; Schackwitz, Wendy; Tyler, Ludmila; Martin, Joel; Lipzen, Anna; Dochy, Niklas; Phillips, Jeremy; Barry, Kerrie; Geuten, Koen; Budak, Hikmet; Juenger, Thomas E; Amasino, Richard; Caicedo, Ana L; Goodstein, David; Davidson, Patrick; Mur, Luis A J; Figueroa, Melania; Freeling, Michael; Catalan, Pilar; Vogel, John P

2017-12-19

While prokaryotic pan-genomes have been shown to contain many more genes than any individual organism, the prevalence and functional significance of differentially present genes in eukaryotes remains poorly understood. Whole-genome de novo assembly and annotation of 54 lines of the grass Brachypodium distachyon yield a pan-genome containing nearly twice the number of genes found in any individual genome. Genes present in all lines are enriched for essential biological functions, while genes present in only some lines are enriched for conditionally beneficial functions (e.g., defense and development), display faster evolutionary rates, lie closer to transposable elements and are less likely to be syntenic with orthologous genes in other grasses. Our data suggest that differentially present genes contribute substantially to phenotypic variation within a eukaryote species, these genes have a major influence in population genetics, and transposable elements play a key role in pan-genome evolution.
Extensive gene content variation in the Brachypodium distachyon pan-genome correlates with population structure

DOE PAGES

Gordon, Sean P.; Contreras-Moreira, Bruno; Woods, Daniel P.; ...

2017-12-19

While prokaryotic pan-genomes have been shown to contain many more genes than any individual organism, the prevalence and functional significance of differentially present genes in eukaryotes remains poorly understood. Whole-genome de novo assembly and annotation of 54 lines of the grass Brachypodium distachyon yield a pan-genome containing nearly twice the number of genes found in any individual genome. Genes present in all lines are enriched for essential biological functions, while genes present in only some lines are enriched for conditionally beneficial functions (e.g., defense and development), display faster evolutionary rates, lie closer to transposable elements and are less likely tomore » be syntenic with orthologous genes in other grasses. Our data suggest that differentially present genes contribute substantially to phenotypic variation within a eukaryote species, these genes have a major influence in population genetics, and transposable elements play a key role in pan-genome evolution.« less
Design and verification of a pangenome microarray oligonucleotide probe set for Dehalococcoides spp.

PubMed

Hug, Laura A; Salehi, Maryam; Nuin, Paulo; Tillier, Elisabeth R; Edwards, Elizabeth A

2011-08-01

Dehalococcoides spp. are an industrially relevant group of Chloroflexi bacteria capable of reductively dechlorinating contaminants in groundwater environments. Existing Dehalococcoides genomes revealed a high level of sequence identity within this group, including 98 to 100% 16S rRNA sequence identity between strains with diverse substrate specificities. Common molecular techniques for identification of microbial populations are often not applicable for distinguishing Dehalococcoides strains. Here we describe an oligonucleotide microarray probe set designed based on clustered Dehalococcoides genes from five different sources (strain DET195, CBDB1, BAV1, and VS genomes and the KB-1 metagenome). This "pangenome" probe set provides coverage of core Dehalococcoides genes as well as strain-specific genes while optimizing the potential for hybridization to closely related, previously unknown Dehalococcoides strains. The pangenome probe set was compared to probe sets designed independently for each of the five Dehalococcoides strains. The pangenome probe set demonstrated better predictability and higher detection of Dehalococcoides genes than strain-specific probe sets on nontarget strains with <99% average nucleotide identity. An in silico analysis of the expected probe hybridization against the recently released Dehalococcoides strain GT genome and additional KB-1 metagenome sequence data indicated that the pangenome probe set performs more robustly than the combined strain-specific probe sets in the detection of genes not included in the original design. The pangenome probe set represents a highly specific, universal tool for the detection and characterization of Dehalococcoides from contaminated sites. It has the potential to become a common platform for Dehalococcoides-focused research, allowing meaningful comparisons between microarray experiments regardless of the strain examined.
PanFP: Pangenome-based functional profiles for microbial communities

DOE PAGES

Jun, Se -Ran; Hauser, Loren John; Schadt, Christopher Warren; ...

2015-09-26

For decades there has been increasing interest in understanding the relationships between microbial communities and ecosystem functions. Current DNA sequencing technologies allows for the exploration of microbial communities in two principle ways: targeted rRNA gene surveys and shotgun metagenomics. For large study designs, it is often still prohibitively expensive to sequence metagenomes at both the breadth and depth necessary to statistically capture the true functional diversity of a community. Although rRNA gene surveys provide no direct evidence of function, they do provide a reasonable estimation of microbial diversity, while being a very cost effective way to screen samples of interestmore » for later shotgun metagenomic analyses. However, there is a great deal of 16S rRNA gene survey data currently available from diverse environments, and thus a need for tools to infer functional composition of environmental samples based on 16S rRNA gene survey data. As a result, we present a computational method called pangenome based functional profiles (PanFP), which infers functional profiles of microbial communities from 16S rRNA gene survey data for Bacteria and Archaea. PanFP is based on pangenome reconstruction of a 16S rRNA gene operational taxonomic unit (OTU) from known genes and genomes pooled from the OTU s taxonomic lineage. From this lineage, we derive an OTU functional profile by weighting a pangenome s functional profile with the OTUs abundance observed in a given sample. We validated our method by comparing PanFP to the functional profiles obtained from the direct shotgun metagenomic measurement of 65 diverse communities via Spearman correlation coefficients. These correlations improved with increasing sequencing depth, within the range of 0.8 0.9 for the most deeply sequenced Human Microbiome Project mock community samples. PanFP is very similar in performance to another recently released tool, PICRUSt, for almost all of survey data analysed here. But, our method is unique in that any OTU building method can be used, as opposed to being limited to closed reference OTU picking strategies against specific reference sequence databases. In conclusion, we developed an automated computational method, which derives an inferred functional profile based on the 16S rRNA gene surveys of microbial communities. The inferred functional profile provides a cost effective way to study complex ecosystems through predicted comparative functional metagenomes and metadata analysis. All PanFP source code and additional documentation are freely available online at GitHub.« less
PanFP: pangenome-based functional profiles for microbial communities.

PubMed

Jun, Se-Ran; Robeson, Michael S; Hauser, Loren J; Schadt, Christopher W; Gorin, Andrey A

2015-09-26

For decades there has been increasing interest in understanding the relationships between microbial communities and ecosystem functions. Current DNA sequencing technologies allows for the exploration of microbial communities in two principle ways: targeted rRNA gene surveys and shotgun metagenomics. For large study designs, it is often still prohibitively expensive to sequence metagenomes at both the breadth and depth necessary to statistically capture the true functional diversity of a community. Although rRNA gene surveys provide no direct evidence of function, they do provide a reasonable estimation of microbial diversity, while being a very cost-effective way to screen samples of interest for later shotgun metagenomic analyses. However, there is a great deal of 16S rRNA gene survey data currently available from diverse environments, and thus a need for tools to infer functional composition of environmental samples based on 16S rRNA gene survey data. We present a computational method called pangenome-based functional profiles (PanFP), which infers functional profiles of microbial communities from 16S rRNA gene survey data for Bacteria and Archaea. PanFP is based on pangenome reconstruction of a 16S rRNA gene operational taxonomic unit (OTU) from known genes and genomes pooled from the OTU's taxonomic lineage. From this lineage, we derive an OTU functional profile by weighting a pangenome's functional profile with the OTUs abundance observed in a given sample. We validated our method by comparing PanFP to the functional profiles obtained from the direct shotgun metagenomic measurement of 65 diverse communities via Spearman correlation coefficients. These correlations improved with increasing sequencing depth, within the range of 0.8-0.9 for the most deeply sequenced Human Microbiome Project mock community samples. PanFP is very similar in performance to another recently released tool, PICRUSt, for almost all of survey data analysed here. But, our method is unique in that any OTU building method can be used, as opposed to being limited to closed-reference OTU picking strategies against specific reference sequence databases. We developed an automated computational method, which derives an inferred functional profile based on the 16S rRNA gene surveys of microbial communities. The inferred functional profile provides a cost effective way to study complex ecosystems through predicted comparative functional metagenomes and metadata analysis. All PanFP source code and additional documentation are freely available online at GitHub ( https://github.com/srjun/PanFP ).
Piggy: a rapid, large-scale pan-genome analysis tool for intergenic regions in bacteria.

PubMed

Thorpe, Harry A; Bayliss, Sion C; Sheppard, Samuel K; Feil, Edward J

2018-04-01

The concept of the "pan-genome," which refers to the total complement of genes within a given sample or species, is well established in bacterial genomics. Rapid and scalable pipelines are available for managing and interpreting pan-genomes from large batches of annotated assemblies. However, despite overwhelming evidence that variation in intergenic regions in bacteria can directly influence phenotypes, most current approaches for analyzing pan-genomes focus exclusively on protein-coding sequences. To address this we present Piggy, a novel pipeline that emulates Roary except that it is based only on intergenic regions. A key utility provided by Piggy is the detection of highly divergent ("switched") intergenic regions (IGRs) upstream of genes. We demonstrate the use of Piggy on large datasets of clinically important lineages of Staphylococcus aureus and Escherichia coli. For S. aureus, we show that highly divergent (switched) IGRs are associated with differences in gene expression and we establish a multilocus reference database of IGR alleles (igMLST; implemented in BIGSdb).
Comparative Genomics Analysis of Streptomyces Species Reveals Their Adaptation to the Marine Environment and Their Diversity at the Genomic Level

PubMed Central

Tian, Xinpeng; Zhang, Zhewen; Yang, Tingting; Chen, Meili; Li, Jie; Chen, Fei; Yang, Jin; Li, Wenjie; Zhang, Bing; Zhang, Zhang; Wu, Jiayan; Zhang, Changsheng; Long, Lijuan; Xiao, Jingfa

2016-01-01

Over 200 genomes of streptomycete strains that were isolated from various environments are available from the NCBI. However, little is known about the characteristics that are linked to marine adaptation in marine-derived streptomycetes. The particularity and complexity of the marine environment suggest that marine streptomycetes are genetically diverse. Here, we sequenced nine strains from the Streptomyces genus that were isolated from different longitudes, latitudes, and depths of the South China Sea. Then we compared these strains to 22 NCBI downloaded streptomycete strains. Thirty-one streptomycete strains are clearly grouped into a marine-derived subgroup and multiple source subgroup-based phylogenetic tree. The phylogenetic analyses have revealed the dynamic process underlying streptomycete genome evolution, and lateral gene transfer is an important driving force during the process. Pan-genomics analyses have revealed that streptomycetes have an open pan-genome, which reflects the diversity of these streptomycetes and guarantees the species a quick and economical response to diverse environments. Functional and comparative genomics analyses indicate that the marine-derived streptomycetes subgroup possesses some common characteristics of marine adaptation. Our findings have expanded our knowledge of how ocean isolates of streptomycete strains adapt to marine environments. The availability of streptomycete genomes from the South China Sea will be beneficial for further analysis on marine streptomycetes and will enrich the South China Sea’s genetic data sources. PMID:27446038
Genomic taxonomy of vibrios

PubMed Central

Thompson, Cristiane C; Vicente, Ana Carolina P; Souza, Rangel C; Vasconcelos, Ana Tereza R; Vesth, Tammi; Alves, Nelson; Ussery, David W; Iida, Tetsuya; Thompson, Fabiano L

2009-01-01

Background Vibrio taxonomy has been based on a polyphasic approach. In this study, we retrieve useful taxonomic information (i.e. data that can be used to distinguish different taxonomic levels, such as species and genera) from 32 genome sequences of different vibrio species. We use a variety of tools to explore the taxonomic relationship between the sequenced genomes, including Multilocus Sequence Analysis (MLSA), supertrees, Average Amino Acid Identity (AAI), genomic signatures, and Genome BLAST atlases. Our aim is to analyse the usefulness of these tools for species identification in vibrios. Results We have generated four new genome sequences of three Vibrio species, i.e., V. alginolyticus 40B, V. harveyi-like 1DA3, and V. mimicus strains VM573 and VM603, and present a broad analyses of these genomes along with other sequenced Vibrio species. The genome atlas and pangenome plots provide a tantalizing image of the genomic differences that occur between closely related sister species, e.g. V. cholerae and V. mimicus. The vibrio pangenome contains around 26504 genes. The V. cholerae core genome and pangenome consist of 1520 and 6923 genes, respectively. Pangenomes might allow different strains of V. cholerae to occupy different niches. MLSA and supertree analyses resulted in a similar phylogenetic picture, with a clear distinction of four groups (Vibrio core group, V. cholerae-V. mimicus, Aliivibrio spp., and Photobacterium spp.). A Vibrio species is defined as a group of strains that share > 95% DNA identity in MLSA and supertree analysis, > 96% AAI, ≤ 10 genome signature dissimilarity, and > 61% proteome identity. Strains of the same species and species of the same genus will form monophyletic groups on the basis of MLSA and supertree. Conclusion The combination of different analytical and bioinformatics tools will enable the most accurate species identification through genomic computational analysis. This endeavour will culminate in the birth of the online genomic taxonomy whereby researchers and end-users of taxonomy will be able to identify their isolates through a web-based server. This novel approach to microbial systematics will result in a tremendous advance concerning biodiversity discovery, description, and understanding. PMID:19860885
SOPanG: online text searching over a pan-genome.

PubMed

Cislak, Aleksander; Grabowski, Szymon; Holub, Jan

2018-06-22

The many thousands of high-quality genomes available nowadays imply a shift from single genome to pan-genomic analyses. A basic algorithmic building brick for such a scenario is online search over a collection of similar texts, a problem with surprisingly few solutions presented so far. We present SOPanG, a simple tool for exact pattern matching over an elastic-degenerate string, a recently proposed simplified model for the pan-genome. Thanks to bit-parallelism, it achieves pattern matching speeds above 400MB/s, more than an order of magnitude higher than of other software. SOPanG is available for free from: https://github.com/MrAlexSee/sopang. Supplementary data are available at Bioinformatics online.
PanCoreGen - Profiling, detecting, annotating protein-coding genes in microbial genomes.

PubMed

Paul, Sandip; Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V; Chattopadhyay, Sujay

2015-12-01

A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing the pan-genome (i.e. the sum of genetic repertoire) of microbial species is crucial in understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen - a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of the existing pan-genomic analysis tools, and develops an integrated annotation-structure for a species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, analyzing an example-set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars - Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics study. Copyright © 2015 Elsevier Inc. All rights reserved.
Calculating orthologs in bacteria and Archaea: a divide and conquer approach.

PubMed

Halachev, Mihail R; Loman, Nicholas J; Pallen, Mark J

2011-01-01

Among proteins, orthologs are defined as those that are derived by vertical descent from a single progenitor in the last common ancestor of their host organisms. Our goal is to compute a complete set of protein orthologs derived from all currently available complete bacterial and archaeal genomes. Traditional approaches typically rely on all-against-all BLAST searching which is prohibitively expensive in terms of hardware requirements or computational time (requiring an estimated 18 months or more on a typical server). Here, we present xBASE-Orth, a system for ongoing ortholog annotation, which applies a "divide and conquer" approach and adopts a pragmatic scheme that trades accuracy for speed. Starting at species level, xBASE-Orth carefully constructs and uses pan-genomes as proxies for the full collections of coding sequences at each level as it progressively climbs the taxonomic tree using the previously computed data. This leads to a significant decrease in the number of alignments that need to be performed, which translates into faster computation, making ortholog computation possible on a global scale. Using xBASE-Orth, we analyzed an NCBI collection of 1,288 bacterial and 94 archaeal complete genomes with more than 4 million coding sequences in 5 weeks and predicted more than 700 million ortholog pairs, clustered in 175,531 orthologous groups. We have also identified sets of highly conserved bacterial and archaeal orthologs and in so doing have highlighted anomalies in genome annotation and in the proposed composition of the minimal bacterial genome. In summary, our approach allows for scalable and efficient computation of the bacterial and archaeal ortholog annotations. In addition, due to its hierarchical nature, it is suitable for incorporating novel complete genomes and alternative genome annotations. The computed ortholog data and a continuously evolving set of applications based on it are integrated in the xBASE database, available at http://www.xbase.ac.uk/.
Comparative Pan-Genome Analysis of Piscirickettsia salmonis Reveals Genomic Divergences within Genogroups.

PubMed

Nourdin-Galindo, Guillermo; Sánchez, Patricio; Molina, Cristian F; Espinoza-Rojas, Daniela A; Oliver, Cristian; Ruiz, Pamela; Vargas-Chacoff, Luis; Cárcamo, Juan G; Figueroa, Jaime E; Mancilla, Marcos; Maracaja-Coutinho, Vinicius; Yañez, Alejandro J

2017-01-01

Piscirickettsia salmonis is the etiological agent of salmonid rickettsial septicemia, a disease that seriously affects the salmonid industry. Despite efforts to genomically characterize P. salmonis , functional information on the life cycle, pathogenesis mechanisms, diagnosis, treatment, and control of this fish pathogen remain lacking. To address this knowledge gap, the present study conducted an in silico pan-genome analysis of 19 P. salmonis strains from distinct geographic locations and genogroups. Results revealed an expected open pan-genome of 3,463 genes and a core-genome of 1,732 genes. Two marked genogroups were identified, as confirmed by phylogenetic and phylogenomic relationships to the LF-89 and EM-90 reference strains, as well as by assessments of genomic structures. Different structural configurations were found for the six identified copies of the ribosomal operon in the P. salmonis genome, indicating translocation throughout the genetic material. Chromosomal divergences in genomic localization and quantity of genetic cassettes were also found for the Dot/Icm type IVB secretion system. To determine divergences between core-genomes, additional pan-genome descriptions were compiled for the so-termed LF and EM genogroups. Open pan-genomes composed of 2,924 and 2,778 genes and core-genomes composed of 2,170 and 2,228 genes were respectively found for the LF and EM genogroups. The core-genomes were functionally annotated using the Gene Ontology, KEGG, and Virulence Factor databases, revealing the presence of several shared groups of genes related to basic function of intracellular survival and bacterial pathogenesis. Additionally, the specific pan-genomes for the LF and EM genogroups were defined, resulting in the identification of 148 and 273 exclusive proteins, respectively. Notably, specific virulence factors linked to adherence, colonization, invasion factors, and endotoxins were established. The obtained data suggest that these genes could be directly associated with inter-genogroup differences in pathogenesis and host-pathogen interactions, information that could be useful in designing novel strategies for diagnosing and controlling P. salmonis infection.
Comparative Pan-Genome Analysis of Piscirickettsia salmonis Reveals Genomic Divergences within Genogroups

PubMed Central

Nourdin-Galindo, Guillermo; Sánchez, Patricio; Molina, Cristian F.; Espinoza-Rojas, Daniela A.; Oliver, Cristian; Ruiz, Pamela; Vargas-Chacoff, Luis; Cárcamo, Juan G.; Figueroa, Jaime E.; Mancilla, Marcos; Maracaja-Coutinho, Vinicius; Yañez, Alejandro J.

2017-01-01

Piscirickettsia salmonis is the etiological agent of salmonid rickettsial septicemia, a disease that seriously affects the salmonid industry. Despite efforts to genomically characterize P. salmonis, functional information on the life cycle, pathogenesis mechanisms, diagnosis, treatment, and control of this fish pathogen remain lacking. To address this knowledge gap, the present study conducted an in silico pan-genome analysis of 19 P. salmonis strains from distinct geographic locations and genogroups. Results revealed an expected open pan-genome of 3,463 genes and a core-genome of 1,732 genes. Two marked genogroups were identified, as confirmed by phylogenetic and phylogenomic relationships to the LF-89 and EM-90 reference strains, as well as by assessments of genomic structures. Different structural configurations were found for the six identified copies of the ribosomal operon in the P. salmonis genome, indicating translocation throughout the genetic material. Chromosomal divergences in genomic localization and quantity of genetic cassettes were also found for the Dot/Icm type IVB secretion system. To determine divergences between core-genomes, additional pan-genome descriptions were compiled for the so-termed LF and EM genogroups. Open pan-genomes composed of 2,924 and 2,778 genes and core-genomes composed of 2,170 and 2,228 genes were respectively found for the LF and EM genogroups. The core-genomes were functionally annotated using the Gene Ontology, KEGG, and Virulence Factor databases, revealing the presence of several shared groups of genes related to basic function of intracellular survival and bacterial pathogenesis. Additionally, the specific pan-genomes for the LF and EM genogroups were defined, resulting in the identification of 148 and 273 exclusive proteins, respectively. Notably, specific virulence factors linked to adherence, colonization, invasion factors, and endotoxins were established. The obtained data suggest that these genes could be directly associated with inter-genogroup differences in pathogenesis and host-pathogen interactions, information that could be useful in designing novel strategies for diagnosing and controlling P. salmonis infection. PMID:29164068
Inter- and intra-specific pan-genomes of Borrelia burgdorferi sensu lato: genome stability and adaptive radiation

PubMed Central

2013-01-01

Background Lyme disease is caused by spirochete bacteria from the Borrelia burgdorferi sensu lato (B. burgdorferi s.l.) species complex. To reconstruct the evolution of B. burgdorferi s.l. and identify the genomic basis of its human virulence, we compared the genomes of 23 B. burgdorferi s.l. isolates from Europe and the United States, including B. burgdorferi sensu stricto (B. burgdorferi s.s., 14 isolates), B. afzelii (2), B. garinii (2), B. “bavariensis” (1), B. spielmanii (1), B. valaisiana (1), B. bissettii (1), and B. “finlandensis” (1). Results Robust B. burgdorferi s.s. and B. burgdorferi s.l. phylogenies were obtained using genome-wide single-nucleotide polymorphisms, despite recombination. Phylogeny-based pan-genome analysis showed that the rate of gene acquisition was higher between species than within species, suggesting adaptive speciation. Strong positive natural selection drives the sequence evolution of lipoproteins, including chromosomally-encoded genes 0102 and 0404, cp26-encoded ospC and b08, and lp54-encoded dbpA, a07, a22, a33, a53, a65. Computer simulations predicted rapid adaptive radiation of genomic groups as population size increases. Conclusions Intra- and inter-specific pan-genome sizes of B. burgdorferi s.l. expand linearly with phylogenetic diversity. Yet gene-acquisition rates in B. burgdorferi s.l. are among the lowest in bacterial pathogens, resulting in high genome stability and few lineage-specific genes. Genome adaptation of B. burgdorferi s.l. is driven predominantly by copy-number and sequence variations of lipoprotein genes. New genomic groups are likely to emerge if the current trend of B. burgdorferi s.l. population expansion continues. PMID:24112474
The pangenome of hexaploid bread wheat.

PubMed

Montenegro, Juan D; Golicz, Agnieszka A; Bayer, Philipp E; Hurgobin, Bhavna; Lee, HueyTyng; Chan, Chon-Kit Kenneth; Visendi, Paul; Lai, Kaitao; Doležel, Jaroslav; Batley, Jacqueline; Edwards, David

2017-06-01

There is an increasing understanding that variation in gene presence-absence plays an important role in the heritability of agronomic traits; however, there have been relatively few studies on variation in gene presence-absence in crop species. Hexaploid wheat is one of the most important food crops in the world and intensive breeding has reduced the genetic diversity of elite cultivars. Major efforts have produced draft genome assemblies for the cultivar Chinese Spring, but it is unknown how well this represents the genome diversity found in current modern elite cultivars. In this study we build an improved reference for Chinese Spring and explore gene diversity across 18 wheat cultivars. We predict a pangenome size of 140 500 ± 102 genes, a core genome of 81 070 ± 1631 genes and an average of 128 656 genes in each cultivar. Functional annotation of the variable gene set suggests that it is enriched for genes that may be associated with important agronomic traits. In addition to variation in gene presence, more than 36 million intervarietal single nucleotide polymorphisms were identified across the pangenome. This study of the wheat pangenome provides insight into genome diversity in elite wheat as a basis for genomics-based improvement of this important crop. A wheat pangenome, GBrowse, is available at http://appliedbioinformatics.com.au/cgi-bin/gb2/gbrowse/WheatPan/, and data are available to download from http://wheatgenome.info/wheat_genome_databases.php. © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd.
Pangenome Analysis of Burkholderia pseudomallei: Genome Evolution Preserves Gene Order despite High Recombination Rates.

PubMed

Spring-Pearson, Senanu M; Stone, Joshua K; Doyle, Adina; Allender, Christopher J; Okinaka, Richard T; Mayo, Mark; Broomall, Stacey M; Hill, Jessica M; Karavis, Mark A; Hubbard, Kyle S; Insalaco, Joseph M; McNew, Lauren A; Rosenzweig, C Nicole; Gibbons, Henry S; Currie, Bart J; Wagner, David M; Keim, Paul; Tuanyok, Apichai

2015-01-01

The pangenomic diversity in Burkholderia pseudomallei is high, with approximately 5.8% of the genome consisting of genomic islands. Genomic islands are known hotspots for recombination driven primarily by site-specific recombination associated with tRNAs. However, recombination rates in other portions of the genome are also high, a feature we expected to disrupt gene order. We analyzed the pangenome of 37 isolates of B. pseudomallei and demonstrate that the pangenome is 'open', with approximately 136 new genes identified with each new genome sequenced, and that the global core genome consists of 4568±16 homologs. Genes associated with metabolism were statistically overrepresented in the core genome, and genes associated with mobile elements, disease, and motility were primarily associated with accessory portions of the pangenome. The frequency distribution of genes present in between 1 and 37 of the genomes analyzed matches well with a model of genome evolution in which 96% of the genome has very low recombination rates but 4% of the genome recombines readily. Using homologous genes among pairs of genomes, we found that gene order was highly conserved among strains, despite the high recombination rates previously observed. High rates of gene transfer and recombination are incompatible with retaining gene order unless these processes are either highly localized to specific sites within the genome, or are characterized by symmetrical gene gain and loss. Our results demonstrate that both processes occur: localized recombination introduces many new genes at relatively few sites, and recombination throughout the genome generates the novel multi-locus sequence types previously observed while preserving gene order.
The impact of serotype-specific vaccination on phylodynamic parameters of Streptococcus pneumoniae and the pneumococcal pan-genome.

PubMed

Azarian, Taj; Grant, Lindsay R; Arnold, Brian J; Hammitt, Laura L; Reid, Raymond; Santosham, Mathuram; Weatherholtz, Robert; Goklish, Novalene; Thompson, Claudette M; Bentley, Stephen D; O'Brien, Katherine L; Hanage, William P; Lipsitch, Marc

2018-04-01

In the United States, the introduction of the heptavalent pneumococcal conjugate vaccine (PCV) largely eliminated vaccine serotypes (VT); non-vaccine serotypes (NVT) subsequently increased in carriage and disease. Vaccination also disrupts the composition of the pneumococcal pangenome, which includes mobile genetic elements and polymorphic non-capsular antigens important for virulence, transmission, and pneumococcal ecology. Antigenic proteins are of interest for future vaccines; yet, little is known about how the they are affected by PCV use. To investigate the evolutionary impact of vaccination, we assessed recombination, evolution, and pathogen demographic history of 937 pneumococci collected from 1998-2012 among Navajo and White Mountain Apache Native American communities. We analyzed changes in the pneumococcal pangenome, focusing on metabolic loci and 19 polymorphic protein antigens. We found the impact of PCV on the pneumococcal population could be observed in reduced diversity, a smaller pangenome, and changing frequencies of accessory clusters of orthologous groups (COGs). Post-PCV7, diversity rebounded through clonal expansion of NVT lineages and inferred in-migration of two previously unobserved lineages. Accessory COGs frequencies trended toward pre-PCV7 values with increasing time since vaccine introduction. Contemporary frequencies of protein antigen variants are better predicted by pre-PCV7 values (1998-2000) than the preceding period (2006-2008), suggesting balancing selection may have acted in maintaining variant frequencies in this population. Overall, we present the largest genomic analysis of pneumococcal carriage in the United States to date, which includes a snapshot of a true vaccine-naïve community prior to the introduction of PCV7. These data improve our understanding of pneumococcal evolution and emphasize the need to consider pangenome composition when inferring the impact of vaccination and developing future protein-based pneumococcal vaccines.
The impact of serotype-specific vaccination on phylodynamic parameters of Streptococcus pneumoniae and the pneumococcal pan-genome

PubMed Central

Hammitt, Laura L.; Santosham, Mathuram; Goklish, Novalene; Thompson, Claudette M.; Bentley, Stephen D.; O’Brien, Katherine L.

2018-01-01

In the United States, the introduction of the heptavalent pneumococcal conjugate vaccine (PCV) largely eliminated vaccine serotypes (VT); non-vaccine serotypes (NVT) subsequently increased in carriage and disease. Vaccination also disrupts the composition of the pneumococcal pangenome, which includes mobile genetic elements and polymorphic non-capsular antigens important for virulence, transmission, and pneumococcal ecology. Antigenic proteins are of interest for future vaccines; yet, little is known about how the they are affected by PCV use. To investigate the evolutionary impact of vaccination, we assessed recombination, evolution, and pathogen demographic history of 937 pneumococci collected from 1998–2012 among Navajo and White Mountain Apache Native American communities. We analyzed changes in the pneumococcal pangenome, focusing on metabolic loci and 19 polymorphic protein antigens. We found the impact of PCV on the pneumococcal population could be observed in reduced diversity, a smaller pangenome, and changing frequencies of accessory clusters of orthologous groups (COGs). Post-PCV7, diversity rebounded through clonal expansion of NVT lineages and inferred in-migration of two previously unobserved lineages. Accessory COGs frequencies trended toward pre-PCV7 values with increasing time since vaccine introduction. Contemporary frequencies of protein antigen variants are better predicted by pre-PCV7 values (1998–2000) than the preceding period (2006–2008), suggesting balancing selection may have acted in maintaining variant frequencies in this population. Overall, we present the largest genomic analysis of pneumococcal carriage in the United States to date, which includes a snapshot of a true vaccine-naïve community prior to the introduction of PCV7. These data improve our understanding of pneumococcal evolution and emphasize the need to consider pangenome composition when inferring the impact of vaccination and developing future protein-based pneumococcal vaccines. PMID:29617440
Comparative functional pan-genome analyses to build connections between genomic dynamics and phenotypic evolution in polycyclic aromatic hydrocarbon metabolism in the genus Mycobacterium.

PubMed

Kweon, Ohgew; Kim, Seong-Jae; Blom, Jochen; Kim, Sung-Kwan; Kim, Bong-Soo; Baek, Dong-Heon; Park, Su Inn; Sutherland, John B; Cerniglia, Carl E

2015-02-14

The bacterial genus Mycobacterium is of great interest in the medical and biotechnological fields. Despite a flood of genome sequencing and functional genomics data, significant gaps in knowledge between genome and phenome seriously hinder efforts toward the treatment of mycobacterial diseases and practical biotechnological applications. In this study, we propose the use of systematic, comparative functional pan-genomic analysis to build connections between genomic dynamics and phenotypic evolution in polycyclic aromatic hydrocarbon (PAH) metabolism in the genus Mycobacterium. Phylogenetic, phenotypic, and genomic information for 27 completely genome-sequenced mycobacteria was systematically integrated to reconstruct a mycobacterial phenotype network (MPN) with a pan-genomic concept at a network level. In the MPN, mycobacterial phenotypes show typical scale-free relationships. PAH degradation is an isolated phenotype with the lowest connection degree, consistent with phylogenetic and environmental isolation of PAH degraders. A series of functional pan-genomic analyses provide conserved and unique types of genomic evidence for strong epistatic and pleiotropic impacts on evolutionary trajectories of the PAH-degrading phenotype. Under strong natural selection, the detailed gene gain/loss patterns from horizontal gene transfer (HGT)/deletion events hypothesize a plausible evolutionary path, an epistasis-based birth and pleiotropy-dependent death, for PAH metabolism in the genus Mycobacterium. This study generated a practical mycobacterial compendium of phenotypic and genomic changes, focusing on the PAH-degrading phenotype, with a pan-genomic perspective of the evolutionary events and the environmental challenges. Our findings suggest that when selection acts on PAH metabolism, only a small fraction of possible trajectories is likely to be observed, owing mainly to a combination of the ambiguous phenotypic effects of PAHs and the corresponding pleiotropy- and epistasis-dependent evolutionary adaptation. Evolutionary constraints on the selection of trajectories, like those seen in PAH-degrading phenotypes, are likely to apply to the evolution of other phenotypes in the genus Mycobacterium.
A Genomic View of Lactobacilli and Pediococci Demonstrates that Phylogeny Matches Ecology and Physiology

PubMed Central

Zheng, Jinshui; Ruan, Lifang; Sun, Ming

2015-01-01

Lactobacilli are used widely in food, feed, and health applications. The taxonomy of the genus Lactobacillus, however, is confounded by the apparent lack of physiological markers for phylogenetic groups of lactobacilli and the unclear relationships between the diverse phylogenetic groups. This study used the core and pan-genomes of 174 type strains of Lactobacillus and Pediococcus to establish phylogenetic relationships and to identify metabolic properties differentiating phylogenetic groups. The core genome phylogenetic tree separated homofermentative lactobacilli and pediococci from heterofermentative lactobacilli. Aldolase and phosphofructokinase were generally present in homofermentative but not in heterofermentative lactobacilli; a two-domain alcohol dehydrogenase and mannitol dehydrogenase were present in most heterofermentative lactobacilli but absent in most homofermentative organisms. Other genes were predominantly present in homofermentative lactobacilli (pyruvate formate lyase) or heterofermentative lactobacilli (lactaldehyde dehydrogenase and glycerol dehydratase). Cluster analysis of the phylogenomic tree and the average nucleotide identity grouped the genus Lactobacillus sensu lato into 24 phylogenetic groups, including pediococci, with stable intra- and intergroup relationships. Individual groups may be differentiated by characteristic metabolic properties. The link between phylogeny and physiology that is proposed in this study facilitates future studies on the ecology, physiology, and industrial applications of lactobacilli. PMID:26253671

Pangenomic Definition of Prokaryotic Species and the Phylogenetic Structure of Prochlorococcus spp.

PubMed

Moldovan, Mikhail A; Gelfand, Mikhail S

2018-01-01

The pangenome is the collection of all groups of orthologous genes (OGGs) from a set of genomes. We apply the pangenome analysis to propose a definition of prokaryotic species based on identification of lineage-specific gene sets. While being similar to the classical biological definition based on allele flow, it does not rely on DNA similarity levels and does not require analysis of homologous recombination. Hence this definition is relatively objective and independent of arbitrary thresholds. A systematic analysis of 110 accepted species with the largest numbers of sequenced strains yields results largely consistent with the existing nomenclature. However, it has revealed that abundant marine cyanobacteria Prochlorococcus marinus should be divided into two species. As a control we have confirmed the paraphyletic origin of Yersinia pseudotuberculosis (with embedded, monophyletic Y. pestis ) and Burkholderia pseudomallei (with B. mallei ). We also demonstrate that by our definition and in accordance with recent studies Escherichia coli and Shigella spp. are one species.
De novo assembly of soybean wild relatives for pan-genome analysis of diversity and agronomic traits.

PubMed

Li, Ying-hui; Zhou, Guangyu; Ma, Jianxin; Jiang, Wenkai; Jin, Long-guo; Zhang, Zhouhao; Guo, Yong; Zhang, Jinbo; Sui, Yi; Zheng, Liangtao; Zhang, Shan-shan; Zuo, Qiyang; Shi, Xue-hui; Li, Yan-fei; Zhang, Wan-ke; Hu, Yiyao; Kong, Guanyi; Hong, Hui-long; Tan, Bing; Song, Jian; Liu, Zhang-xiong; Wang, Yaoshen; Ruan, Hang; Yeung, Carol K L; Liu, Jian; Wang, Hailong; Zhang, Li-juan; Guan, Rong-xia; Wang, Ke-jing; Li, Wen-bin; Chen, Shou-yi; Chang, Ru-zhen; Jiang, Zhi; Jackson, Scott A; Li, Ruiqiang; Qiu, Li-juan

2014-10-01

Wild relatives of crops are an important source of genetic diversity for agriculture, but their gene repertoire remains largely unexplored. We report the establishment and analysis of a pan-genome of Glycine soja, the wild relative of cultivated soybean Glycine max, by sequencing and de novo assembly of seven phylogenetically and geographically representative accessions. Intergenomic comparisons identified lineage-specific genes and genes with copy number variation or large-effect mutations, some of which show evidence of positive selection and may contribute to variation of agronomic traits such as biotic resistance, seed composition, flowering and maturity time, organ size and final biomass. Approximately 80% of the pan-genome was present in all seven accessions (core), whereas the rest was dispensable and exhibited greater variation than the core genome, perhaps reflecting a role in adaptation to diverse environments. This work will facilitate the harnessing of untapped genetic diversity from wild soybean for enhancement of elite cultivars.
Pan-Genomic Analysis Provides Insights into the Genomic Variation and Evolution of Salmonella Paratyphi A

PubMed Central

Chen, Chunxia; Cui, Xiaoying; Yu, Jun; Xiao, Jingfa; Kan, Biao

2012-01-01

Salmonella Paratyphi A (S. Paratyphi A) is a highly adapted, human-specific pathogen that causes paratyphoid fever. Cases of paratyphoid fever have recently been increasing, and the disease is becoming a major public health concern, especially in Eastern and Southern Asia. To investigate the genomic variation and evolution of S. Paratyphi A, a pan-genomic analysis was performed on five newly sequenced S. Paratyphi A strains and two other reference strains. A whole genome comparison revealed that the seven genomes are collinear and that their organization is highly conserved. The high rate of substitutions in part of the core genome indicates that there are frequent homologous recombination events. Based on the changes in the pan-genome size and cluster number (both in the core functional genes and core pseudogenes), it can be inferred that the sharply increasing number of pseudogene clusters may have strong correlation with the inactivation of functional genes, and indicates that the S. Paratyphi A genome is being degraded. PMID:23028950
Exploring the Genomic Traits of Non-toxigenic Vibrio parahaemolyticus Strains Isolated in Southern Chile

PubMed Central

Castillo, Daniel; Pérez-Reytor, Diliana; Plaza, Nicolás; Ramírez-Araya, Sebastián; Blondel, Carlos J.; Corsini, Gino; Bastías, Roberto; Loyola, David E.; Jaña, Víctor; Pavez, Leonardo; García, Katherine

2018-01-01

Vibrio parahaemolyticus is the leading cause of seafood-borne gastroenteritis worldwide. As reported in other countries, after the rise and fall of the pandemic strain in Chile, other post-pandemic strains have been associated with clinical cases, including strains lacking the major toxins TDH and TRH. Since the presence or absence of tdh and trh genes has been used for diagnostic purposes and as a proxy of the virulence of V. parahaemolyticus isolates, the understanding of virulence in V. parahaemolyticus strains lacking toxins is essential to detect these strains present in water and marine products to avoid possible food-borne infection. In this study, we characterized the genome of four environmental and two clinical non-toxigenic strains (tdh-, trh-, and T3SS2-). Using whole-genome sequencing, phylogenetic, and comparative genome analysis, we identified the core and pan-genome of V. parahaemolyticus of strains of southern Chile. The phylogenetic tree based on the core genome showed low genetic diversity but the analysis of the pan-genome revealed that all strains harbored genomic islands carrying diverse virulence and fitness factors or prophage-like elements that encode toxins like Zot and RTX. Interestingly, the three strains carrying Zot-like toxin have a different sequence, although the alignment showed some conserved areas with the zot sequence found in V. cholerae. In addition, we identified an unexpected diversity in the genetic architecture of the T3SS1 gene cluster and the presence of the T3SS2 gene cluster in a non-pandemic environmental strain. Our study sheds light on the diversity of V. parahaemolyticus strains from the southern Pacific which increases our current knowledge regarding the global diversity of this organism. PMID:29472910
Consensus pan-genome assembly of the specialised wine bacterium Oenococcus oeni.

PubMed

Sternes, Peter R; Borneman, Anthony R

2016-04-27

Oenococcus oeni is a lactic acid bacterium that is specialised for growth in the ecological niche of wine, where it is noted for its ability to perform the secondary, malolactic fermentation that is often required for many types of wine. Expanding the understanding of strain-dependent genetic variations in its small and streamlined genome is important for realising its full potential in industrial fermentation processes. Whole genome comparison was performed on 191 strains of O. oeni; from this rich source of genomic information consensus pan-genome assemblies of the invariant (core) and variable (flexible) regions of this organism were established. Genetic variation in amino acid biosynthesis and sugar transport and utilisation was found to be common between strains. Furthermore, we characterised previously-unreported intra-specific genetic variations in the natural competence of this microbe. By assembling a consensus pan-genome from a large number of strains, this study provides a tool for researchers to readily compare protein-coding genes across strains and infer functional relationships between genes in conserved syntenic regions. This establishes a foundation for further genetic, and thus phenotypic, research of this industrially-important species.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Jun, Se -Ran; Hauser, Loren John; Schadt, Christopher Warren

For decades there has been increasing interest in understanding the relationships between microbial communities and ecosystem functions. Current DNA sequencing technologies allows for the exploration of microbial communities in two principle ways: targeted rRNA gene surveys and shotgun metagenomics. For large study designs, it is often still prohibitively expensive to sequence metagenomes at both the breadth and depth necessary to statistically capture the true functional diversity of a community. Although rRNA gene surveys provide no direct evidence of function, they do provide a reasonable estimation of microbial diversity, while being a very cost effective way to screen samples of interestmore » for later shotgun metagenomic analyses. However, there is a great deal of 16S rRNA gene survey data currently available from diverse environments, and thus a need for tools to infer functional composition of environmental samples based on 16S rRNA gene survey data. As a result, we present a computational method called pangenome based functional profiles (PanFP), which infers functional profiles of microbial communities from 16S rRNA gene survey data for Bacteria and Archaea. PanFP is based on pangenome reconstruction of a 16S rRNA gene operational taxonomic unit (OTU) from known genes and genomes pooled from the OTU s taxonomic lineage. From this lineage, we derive an OTU functional profile by weighting a pangenome s functional profile with the OTUs abundance observed in a given sample. We validated our method by comparing PanFP to the functional profiles obtained from the direct shotgun metagenomic measurement of 65 diverse communities via Spearman correlation coefficients. These correlations improved with increasing sequencing depth, within the range of 0.8 0.9 for the most deeply sequenced Human Microbiome Project mock community samples. PanFP is very similar in performance to another recently released tool, PICRUSt, for almost all of survey data analysed here. But, our method is unique in that any OTU building method can be used, as opposed to being limited to closed reference OTU picking strategies against specific reference sequence databases. In conclusion, we developed an automated computational method, which derives an inferred functional profile based on the 16S rRNA gene surveys of microbial communities. The inferred functional profile provides a cost effective way to study complex ecosystems through predicted comparative functional metagenomes and metadata analysis. All PanFP source code and additional documentation are freely available online at GitHub.« less
AGAPE (Automated Genome Analysis PipelinE) for Pan-Genome Analysis of Saccharomyces cerevisiae

PubMed Central

Song, Giltae; Dickins, Benjamin J. A.; Demeter, Janos; Engel, Stacia; Dunn, Barbara; Cherry, J. Michael

2015-01-01

The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes due to the complexity of the genotype-phenotype map. One approach to this is the intensive study of model systems for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism, with well-known protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. To initiate the yeast pan-genome, we newly sequenced or re-sequenced the genomes of 25 strains that are commonly used in the yeast research community using advanced sequencing technology at high quality. We also developed a pipeline for automated pan-genome analysis, which integrates the steps of assembly, annotation, and variation calling. To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. We classified these according to their presence or absence across strains and characterized each group of genes with known functional and phenotypic features. The functional roles of novel genes not found in the reference genome and associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community. PMID:25781462
Pan-genome analysis of human gastric pathogen H. pylori: comparative genomics and pathogenomics approaches to identify regions associated with pathogenicity and prediction of potential core therapeutic targets.

PubMed

Ali, Amjad; Naz, Anam; Soares, Siomar C; Bakhtiar, Marriam; Tiwari, Sandeep; Hassan, Syed S; Hanan, Fazal; Ramos, Rommel; Pereira, Ulisses; Barh, Debmalya; Figueiredo, Henrique César Pereira; Ussery, David W; Miyoshi, Anderson; Silva, Artur; Azevedo, Vasco

2015-01-01

Helicobacter pylori is a human gastric pathogen implicated as the major cause of peptic ulcer and second leading cause of gastric cancer (~70%) around the world. Conversely, an increased resistance to antibiotics and hindrances in the development of vaccines against H. pylori are observed. Pan-genome analyses of the global representative H. pylori isolates consisting of 39 complete genomes are presented in this paper. Phylogenetic analyses have revealed close relationships among geographically diverse strains of H. pylori. The conservation among these genomes was further analyzed by pan-genome approach; the predicted conserved gene families (1,193) constitute ~77% of the average H. pylori genome and 45% of the global gene repertoire of the species. Reverse vaccinology strategies have been adopted to identify and narrow down the potential core-immunogenic candidates. Total of 28 nonhost homolog proteins were characterized as universal therapeutic targets against H. pylori based on their functional annotation and protein-protein interaction. Finally, pathogenomics and genome plasticity analysis revealed 3 highly conserved and 2 highly variable putative pathogenicity islands in all of the H. pylori genomes been analyzed.
Comparative Genomics of 12 Strains of Erwinia amylovora Identifies a Pan-Genome with a Large Conserved Core

PubMed Central

Mann, Rachel A.; Smits, Theo H. M.; Bühlmann, Andreas; Blom, Jochen; Goesmann, Alexander; Frey, Jürg E.; Plummer, Kim M.; Beer, Steven V.; Luck, Joanne; Duffy, Brion; Rodoni, Brendan

2013-01-01

The plant pathogen Erwinia amylovora can be divided into two host-specific groupings; strains infecting a broad range of hosts within the Rosaceae subfamily Spiraeoideae (e.g., Malus, Pyrus, Crataegus, Sorbus) and strains infecting Rubus (raspberries and blackberries). Comparative genomic analysis of 12 strains representing distinct populations (e.g., geographic, temporal, host origin) of E. amylovora was used to describe the pan-genome of this major pathogen. The pan-genome contains 5751 coding sequences and is highly conserved relative to other phytopathogenic bacteria comprising on average 89% conserved, core genes. The chromosomes of Spiraeoideae-infecting strains were highly homogeneous, while greater genetic diversity was observed between Spiraeoideae- and Rubus-infecting strains (and among individual Rubus-infecting strains), the majority of which was attributed to variable genomic islands. Based on genomic distance scores and phylogenetic analysis, the Rubus-infecting strain ATCC BAA-2158 was genetically more closely related to the Spiraeoideae-infecting strains of E. amylovora than it was to the other Rubus-infecting strains. Analysis of the accessory genomes of Spiraeoideae- and Rubus-infecting strains has identified putative host-specific determinants including variation in the effector protein HopX1Ea and a putative secondary metabolite pathway only present in Rubus-infecting strains. PMID:23409014
Comparative genomics of 12 strains of Erwinia amylovora identifies a pan-genome with a large conserved core.

PubMed

Mann, Rachel A; Smits, Theo H M; Bühlmann, Andreas; Blom, Jochen; Goesmann, Alexander; Frey, Jürg E; Plummer, Kim M; Beer, Steven V; Luck, Joanne; Duffy, Brion; Rodoni, Brendan

2013-01-01

The plant pathogen Erwinia amylovora can be divided into two host-specific groupings; strains infecting a broad range of hosts within the Rosaceae subfamily Spiraeoideae (e.g., Malus, Pyrus, Crataegus, Sorbus) and strains infecting Rubus (raspberries and blackberries). Comparative genomic analysis of 12 strains representing distinct populations (e.g., geographic, temporal, host origin) of E. amylovora was used to describe the pan-genome of this major pathogen. The pan-genome contains 5751 coding sequences and is highly conserved relative to other phytopathogenic bacteria comprising on average 89% conserved, core genes. The chromosomes of Spiraeoideae-infecting strains were highly homogeneous, while greater genetic diversity was observed between Spiraeoideae- and Rubus-infecting strains (and among individual Rubus-infecting strains), the majority of which was attributed to variable genomic islands. Based on genomic distance scores and phylogenetic analysis, the Rubus-infecting strain ATCC BAA-2158 was genetically more closely related to the Spiraeoideae-infecting strains of E. amylovora than it was to the other Rubus-infecting strains. Analysis of the accessory genomes of Spiraeoideae- and Rubus-infecting strains has identified putative host-specific determinants including variation in the effector protein HopX1(Ea) and a putative secondary metabolite pathway only present in Rubus-infecting strains.
Caldicellulosiruptor Core and Pangenomes Reveal Determinants for Noncellulosomal Thermophilic Deconstruction of Plant Biomass

PubMed Central

Blumer-Schuette, Sara E.; Giannone, Richard J.; Zurawski, Jeffrey V.; Ozdemir, Inci; Ma, Qin; Yin, Yanbin; Xu, Ying; Kataeva, Irina; Poole, Farris L.; Adams, Michael W. W.; Hamilton-Brehm, Scott D.; Elkins, James G.; Larimer, Frank W.; Land, Miriam L.; Hauser, Loren J.; Cottingham, Robert W.; Hettich, Robert L.

2012-01-01

Extremely thermophilic bacteria of the genus Caldicellulosiruptor utilize carbohydrate components of plant cell walls, including cellulose and hemicellulose, facilitated by a diverse set of glycoside hydrolases (GHs). From a biofuel perspective, this capability is crucial for deconstruction of plant biomass into fermentable sugars. While all species from the genus grow on xylan and acid-pretreated switchgrass, growth on crystalline cellulose is variable. The basis for this variability was examined using microbiological, genomic, and proteomic analyses of eight globally diverse Caldicellulosiruptor species. The open Caldicellulosiruptor pangenome (4,009 open reading frames [ORFs]) encodes 106 GHs, representing 43 GH families, but only 26 GHs from 17 families are included in the core (noncellulosic) genome (1,543 ORFs). Differentiating the strongly cellulolytic Caldicellulosiruptor species from the others is a specific genomic locus that encodes multidomain cellulases from GH families 9 and 48, which are associated with cellulose-binding modules. This locus also encodes a novel adhesin associated with type IV pili, which was identified in the exoproteome bound to crystalline cellulose. Taking into account the core genomes, pangenomes, and individual genomes, the ancestral Caldicellulosiruptor was likely cellulolytic and evolved, in some cases, into species that lost the ability to degrade crystalline cellulose while maintaining the capacity to hydrolyze amorphous cellulose and hemicellulose. PMID:22636774
Locating hardware faults in a data communications network of a parallel computer

DOEpatents

Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.

2010-01-12

Hardware faults location in a data communications network of a parallel computer. Such a parallel computer includes a plurality of compute nodes and a data communications network that couples the compute nodes for data communications and organizes the compute node as a tree. Locating hardware faults includes identifying a next compute node as a parent node and a root of a parent test tree, identifying for each child compute node of the parent node a child test tree having the child compute node as root, running a same test suite on the parent test tree and each child test tree, and identifying the parent compute node as having a defective link connected from the parent compute node to a child compute node if the test suite fails on the parent test tree and succeeds on all the child test trees.
Calling Chromosome Alterations, DNA Methylation Statuses, and Mutations in Tumors by Simple Targeted Next-Generation Sequencing: A Solution for Transferring Integrated Pangenomic Studies into Routine Practice?

PubMed

Garinet, Simon; Néou, Mario; de La Villéon, Bruno; Faillot, Simon; Sakat, Julien; Da Fonseca, Juliana P; Jouinot, Anne; Le Tourneau, Christophe; Kamal, Maud; Luscap-Rondof, Windy; Boeva, Valentina; Gaujoux, Sebastien; Vidaud, Michel; Pasmant, Eric; Letourneur, Franck; Bertherat, Jérôme; Assié, Guillaume

2017-09-01

Pangenomic studies identified distinct molecular classes for many cancers, with major clinical applications. However, routine use requires cost-effective assays. We assessed whether targeted next-generation sequencing (NGS) could call chromosomal alterations and DNA methylation status. A training set of 77 tumors and a validation set of 449 (43 tumor types) were analyzed by targeted NGS and single-nucleotide polymorphism (SNP) arrays. Thirty-two tumors were analyzed by NGS after bisulfite conversion, and compared to methylation array or methylation-specific multiplex ligation-dependent probe amplification. Considering allelic ratios, correlation was strong between targeted NGS and SNP arrays (r = 0.88). In contrast, considering DNA copy number, for variations of one DNA copy, correlation was weaker between read counts and SNP array (r = 0.49). Thus, we generated TARGOMICs, optimized for detecting chromosome alterations by combining allelic ratios and read counts generated by targeted NGS. Sensitivity for calling normal, lost, and gained chromosomes was 89%, 72%, and 31%, respectively. Specificity was 81%, 93%, and 98%, respectively. These results were confirmed in the validation set. Finally, TARGOMICs could efficiently align and compute proportions of methylated cytosines from bisulfite-converted DNA from targeted NGS. In conclusion, beyond calling mutations, targeted NGS efficiently calls chromosome alterations and methylation status in tumors. A single run and minor design/protocol adaptations are sufficient. Optimizing targeted NGS should expand translation of genomics to clinical routine. Copyright © 2017 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
Comparative genomics of Lactobacillus salivarius strains focusing on their host adaptation.

PubMed

Lee, Jun-Yeong; Han, Geon Goo; Kim, Eun Bae; Choi, Yun-Jaie

2017-12-01

Lactobacillus salivarius is an important member of the animal gut microflora and is a promising probiotic bacterium. However, there is a lack of research on the genomic diversity of L. salivarius species. In this study, we generated 21 L. salivarius draft genomes, and investigated the pan-genome of L. salivarius strains isolated from humans, pigs and chickens using all available genomes, focusing on host adaptation. Phylogenetic clustering showed a distinct categorization of L. salivarius strains depending on their hosts. In the pan-genome, 15 host-specific genes and 16 dual-host-shared genes that only one host isolate did not possess were identified. Comparison of 56 extracellular protein encoding genes and 124 orthologs related to exopolysaccharide production in the pan-genome revealed that extracellular components of the assayed bacteria have been globally acquired and mutated under the selection pressure for host adaptation. We also found the three host-specific genes that are responsible for energy production in L. salivarius. These results showed that L. salivarius has evolved to adapt to host habitats in two ways, by gaining the abilities for niche adhesion and efficient utilization of nutrients. Our study offers a deeper understanding of the probiotic species L. salivarius, and provides a basis for future studies on L. salivarius and other mutualistic bacteria. Copyright © 2017 Elsevier GmbH. All rights reserved.
A Pan-Genomic Approach to Understand the Basis of Host Adaptation in Achromobacter

PubMed Central

Jeukens, Julie; Freschi, Luca; Vincent, Antony T.; Emond-Rheault, Jean-Guillaume; Kukavica-Ibrulj, Irena; Charette, Steve J.

2017-01-01

Over the past decade, there has been a rising interest in Achromobacter sp., an emerging opportunistic pathogen responsible for nosocomial and cystic fibrosis lung infections. Species of this genus are ubiquitous in the environment, can outcompete resident microbiota, and are resistant to commonly used disinfectants as well as antibiotics. Nevertheless, the Achromobacter genus suffers from difficulties in diagnosis, unresolved taxonomy and limited understanding of how it adapts to the cystic fibrosis lung, not to mention other host environments. The goals of this first genus-wide comparative genomics study were to clarify the taxonomy of this genus and identify genomic features associated with pathogenicity and host adaptation. This was done with a widely applicable approach based on pan-genome analysis. First, using all publicly available genomes, a combination of phylogenetic analysis based on 1,780 conserved genes with average nucleotide identity and accessory genome composition allowed the identification of a largely clinical lineage composed of Achromobacter xylosoxidans, Achromobacter insuavis, Achromobacter dolens, and Achromobacter ruhlandii. Within this lineage, we identified 35 positively selected genes involved in metabolism, regulation and efflux-mediated antibiotic resistance. Second, resistome analysis showed that this clinical lineage carried additional antibiotic resistance genes compared with other isolates. Finally, we identified putative mobile elements that contribute 53% of the genus’s resistome and support horizontal gene transfer between Achromobacter and other ecologically similar genera. This study provides strong phylogenetic and pan-genomic bases to motivate further research on Achromobacter, and contributes to the understanding of opportunistic pathogen evolution. PMID:28383665
The pangenome of (Antarctic) Pseudoalteromonas bacteria: evolutionary and functional insights.

PubMed

Bosi, Emanuele; Fondi, Marco; Orlandini, Valerio; Perrin, Elena; Maida, Isabel; de Pascale, Donatella; Tutino, Maria Luisa; Parrilli, Ermenegilda; Lo Giudice, Angelina; Filloux, Alain; Fani, Renato

2017-01-17

Pseudoalteromonas is a genus of ubiquitous marine bacteria used as model organisms to study the biological mechanisms involved in the adaptation to cold conditions. A remarkable feature shared by these bacteria is their ability to produce secondary metabolites with a strong antimicrobial and antitumor activity. Despite their biotechnological relevance, representatives of this genus are still lacking (with few exceptions) an extensive genomic characterization, including features involved in the evolution of secondary metabolites production. Indeed, biotechnological applications would greatly benefit from such analysis. Here, we analyzed the genomes of 38 strains belonging to different Pseudoalteromonas species and isolated from diverse ecological niches, including extreme ones (i.e. Antarctica). These sequences were used to reconstruct the largest Pseudoalteromonas pangenome computed so far, including also the two main groups of Pseudoalteromonas strains (pigmented and not pigmented strains). The downstream analyses were conducted to describe the genomic diversity, both at genus and group levels. This allowed highlighting a remarkable genomic heterogeneity, even for closely related strains. We drafted all the main evolutionary steps that led to the current structure and gene content of Pseudoalteromonas representatives. These, most likely, included an extensive genome reduction and a strong contribution of Horizontal Gene Transfer (HGT), which affected biotechnologically relevant gene sets and occurred in a strain-specific fashion. Furthermore, this study also identified the genomic determinants related to some of the most interesting features of the Pseudoalteromonas representatives, such as the production of secondary metabolites, the adaptation to cold temperatures and the resistance to abiotic compounds. This study poses the bases for a comprehensive understanding of the evolutionary trajectories followed in time by this peculiar bacterial genus and for a focused exploitation of their biotechnological potential.
Beyond genomic variation--comparison and functional annotation of three Brassica rapa genomes: a turnip, a rapid cycling and a Chinese cabbage.

PubMed

Lin, Ke; Zhang, Ningwen; Severing, Edouard I; Nijveen, Harm; Cheng, Feng; Visser, Richard G F; Wang, Xiaowu; de Ridder, Dick; Bonnema, Guusje

2014-03-31

Brassica rapa is an economically important crop species. During its long breeding history, a large number of morphotypes have been generated, including leafy vegetables such as Chinese cabbage and pakchoi, turnip tuber crops and oil crops. To investigate the genetic variation underlying this morphological variation, we re-sequenced, assembled and annotated the genomes of two B. rapa subspecies, turnip crops (turnip) and a rapid cycling. We then analysed the two resulting genomes together with the Chinese cabbage Chiifu reference genome to obtain an impression of the B. rapa pan-genome. The number of genes with protein-coding changes between the three genotypes was lower than that among different accessions of Arabidopsis thaliana, which can be explained by the smaller effective population size of B. rapa due to its domestication. Based on orthology to a number of non-brassica species, we estimated the date of divergence among the three B. rapa morphotypes at approximately 250,000 YA, far predating Brassica domestication (5,000-10,000 YA). By analysing genes unique to turnip we found evidence for copy number differences in peroxidases, pointing to a role for the phenylpropanoid biosynthesis pathway in the generation of morphological variation. The estimated date of divergence among three B. rapa morphotypes implies that prior to domestication there was already considerably divergence among B. rapa genotypes. Our study thus provides two new B. rapa reference genomes, delivers a set of computer tools to analyse the resulting pan-genome and uses these to shed light on genetic drivers behind the rich morphological variation found in B. rapa.
The evolution of metabolic networks of E. coli

PubMed Central

2011-01-01

Background Despite the availability of numerous complete genome sequences from E. coli strains, published genome-scale metabolic models exist only for two commensal E. coli strains. These models have proven useful for many applications, such as engineering strains for desired product formation, and we sought to explore how constructing and evaluating additional metabolic models for E. coli strains could enhance these efforts. Results We used the genomic information from 16 E. coli strains to generate an E. coli pangenome metabolic network by evaluating their collective 76,990 ORFs. Each of these ORFs was assigned to one of 17,647 ortholog groups including ORFs associated with reactions in the most recent metabolic model for E. coli K-12. For orthologous groups that contain an ORF already represented in the MG1655 model, the gene to protein to reaction associations represented in this model could then be easily propagated to other E. coli strain models. All remaining orthologous groups were evaluated to see if new metabolic reactions could be added to generate a pangenome-scale metabolic model (iEco1712_pan). The pangenome model included reactions from a metabolic model update for E. coli K-12 MG1655 (iEco1339_MG1655) and enabled development of five additional strain-specific genome-scale metabolic models. These additional models include a second K-12 strain (iEco1335_W3110) and four pathogenic strains (two enterohemorrhagic E. coli O157:H7 and two uropathogens). When compared to the E. coli K-12 models, the metabolic models for the enterohemorrhagic (iEco1344_EDL933 and iEco1345_Sakai) and uropathogenic strains (iEco1288_CFT073 and iEco1301_UTI89) contained numerous lineage-specific gene and reaction differences. All six E. coli models were evaluated by comparing model predictions to carbon source utilization measurements under aerobic and anaerobic conditions, and to batch growth profiles in minimal media with 0.2% (w/v) glucose. An ancestral genome-scale metabolic model based on conserved ortholog groups in all 16 E. coli genomes was also constructed, reflecting the conserved ancestral core of E. coli metabolism (iEco1053_core). Comparative analysis of all six strain-specific E. coli models revealed that some of the pathogenic E. coli strains possess reactions in their metabolic networks enabling higher biomass yields on glucose. Finally the lineage-specific metabolic traits were compared to the ancestral core model predictions to derive new insight into the evolution of metabolism within this species. Conclusion Our findings demonstrate that a pangenome-scale metabolic model can be used to rapidly construct additional E. coli strain-specific models, and that quantitative models of different strains of E. coli can accurately predict strain-specific phenotypes. Such pangenome and strain-specific models can be further used to engineer metabolic phenotypes of interest, such as designing new industrial E. coli strains. PMID:22044664
BGDMdocker: a Docker workflow for data mining and visualization of bacterial pan-genomes and biosynthetic gene clusters.

PubMed

Cheng, Gong; Lu, Quan; Ma, Ling; Zhang, Guocai; Xu, Liang; Zhou, Zongshan

2017-01-01

Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run Container easily.
BGDMdocker: a Docker workflow for data mining and visualization of bacterial pan-genomes and biosynthetic gene clusters

PubMed Central

Cheng, Gong; Zhang, Guocai; Xu, Liang

2017-01-01

Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run Container easily. PMID:29204317

Evolution of the core and pan-genome of Streptococcus: positive selection, recombination, and genome composition

PubMed Central

Lefébure, Tristan; Stanhope, Michael J

2007-01-01

Background The genus Streptococcus is one of the most diverse and important human and agricultural pathogens. This study employs comparative evolutionary analyses of 26 Streptococcus genomes to yield an improved understanding of the relative roles of recombination and positive selection in pathogen adaptation to their hosts. Results Streptococcus genomes exhibit extreme levels of evolutionary plasticity, with high levels of gene gain and loss during species and strain evolution. S. agalactiae has a large pan-genome, with little recombination in its core-genome, while S. pyogenes has a smaller pan-genome and much more recombination of its core-genome, perhaps reflecting the greater habitat, and gene pool, diversity for S. agalactiae compared to S. pyogenes. Core-genome recombination was evident in all lineages (18% to 37% of the core-genome judged to be recombinant), while positive selection was mainly observed during species differentiation (from 11% to 34% of the core-genome). Positive selection pressure was unevenly distributed across lineages and biochemical main role categories. S. suis was the lineage with the greatest level of positive selection pressure, the largest number of unique loci selected, and the largest amount of gene gain and loss. Conclusion Recombination is an important evolutionary force in shaping Streptococcus genomes, not only in the acquisition of significant portions of the genome as lineage specific loci, but also in facilitating rapid evolution of the core-genome. Positive selection, although undoubtedly a slower process, has nonetheless played an important role in adaptation of the core-genome of different Streptococcus species to different hosts. PMID:17475002
Big cat phylogenies, consensus trees, and computational thinking.

PubMed

Sul, Seung-Jin; Williams, Tiffani L

2011-07-01

Phylogenetics seeks to deduce the pattern of relatedness between organisms by using a phylogeny or evolutionary tree. For a given set of organisms or taxa, there may be many evolutionary trees depicting how these organisms evolved from a common ancestor. As a result, consensus trees are a popular approach for summarizing the shared evolutionary relationships in a group of trees. We examine these consensus techniques by studying how the pantherine lineage of cats (clouded leopard, jaguar, leopard, lion, snow leopard, and tiger) evolved, which is hotly debated. While there are many phylogenetic resources that describe consensus trees, there is very little information, written for biologists, regarding the underlying computational techniques for building them. The pantherine cats provide us with a small, relevant example to explore the computational techniques (such as sorting numbers, hashing functions, and traversing trees) for constructing consensus trees. Our hope is that life scientists enjoy peeking under the computational hood of consensus tree construction and share their positive experiences with others in their community.
Fueling Future with Algal Genomics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grigoriev, Igor

Algae constitute a major component of fundamental eukaryotic diversity, play profound roles in the carbon cycle, and are prominent candidates for biofuel production. The US Department of Energy Joint Genome Institute (JGI) is leading the world in algal genome sequencing (http://jgi.doe.gov/Algae) and contributes of the algal genome projects worldwide (GOLD database, 2012). The sequenced algal genomes offer catalogs of genes, networks, and pathways. The sequenced first of its kind genomes of a haptophyte E.huxleyii, chlorarachniophyte B.natans, and cryptophyte G.theta fill the gaps in the eukaryotic tree of life and carry unique genes and pathways as well as molecular fossils ofmore » secondary endosymbiosis. Natural adaptation to conditions critical for industrial production is encoded in algal genomes, for example, growth of A.anophagefferens at very high cell densities during the harmful algae blooms or a global distribution across diverse environments of E.huxleyii, able to live on sparse nutrients due to its expanded pan-genome. Communications and signaling pathways can be derived from simple symbiotic systems like lichens or complex marine algae metagenomes. Collectively these datasets derived from algal genomics contribute to building a comprehensive parts list essential for algal biofuel development.« less
Identifying failure in a tree network of a parallel computer

DOEpatents

Archer, Charles J.; Pinnow, Kurt W.; Wallenfelt, Brian P.

2010-08-24

Methods, parallel computers, and products are provided for identifying failure in a tree network of a parallel computer. The parallel computer includes one or more processing sets including an I/O node and a plurality of compute nodes. For each processing set embodiments include selecting a set of test compute nodes, the test compute nodes being a subset of the compute nodes of the processing set; measuring the performance of the I/O node of the processing set; measuring the performance of the selected set of test compute nodes; calculating a current test value in dependence upon the measured performance of the I/O node of the processing set, the measured performance of the set of test compute nodes, and a predetermined value for I/O node performance; and comparing the current test value with a predetermined tree performance threshold. If the current test value is below the predetermined tree performance threshold, embodiments include selecting another set of test compute nodes. If the current test value is not below the predetermined tree performance threshold, embodiments include selecting from the test compute nodes one or more potential problem nodes and testing individually potential problem nodes and links to potential problem nodes.
Comparative genome analysis of the Flavobacteriales bacterium strain UJ101, isolated from the gut of Atergatis reticulatus.

PubMed

Yang, Jhung-Ahn; Yang, Sung-Hyun; Kim, Junghee; Kwon, Kae Kyoung; Oh, Hyun-Myung

2017-07-01

Here we report the comparative genomic analysis of strain UJ101 with 15 strains from the family Flavobacteriaceae, using the CGExplorer program. Flavobacteriales bacterium strain UJ101 was isolated from a xanthid crab, Atergatis reticulatus, from the East Sea near Korea. The complete genome of strain UJ101 is a 3,074,209 bp, single, circular chromosome with 30.74% GC content. While the UJ101 genome contains a number of annotated genes for many metabolic pathways, such as the Embden-Meyerhof pathway, the pentose phosphate pathway, the tricarboxylic acid (TCA) cycle, and the glyoxylate cycle, genes for the Entner-Douddoroff pathway are not found in the UJ101 genome. Overall, carbon fixation processes were absent but nitrate reduction and denitrification pathways were conserved. The UJ101 genome was compared to genomes from other marine animals (three invertebrate strains and 5 fish strains) and other marine animal- derived genera. Notable results by genome comparisons showed that UJ101 is capable of denitrification and nitrate reduction, and that biotin-thiamine pathway participation varies among marine bacteria; fish-dwelling bacteria, freeliving bacteria, invertebrate-dwelling bacteria, and strain UJ101. Pan-genome analysis of the 16 strains in this study included 7,220 non-redundant genes that covered 62% of the pan-genome. A core-genome of 994 genes was present and consisted of 8% of the genes from the pan-genome. Strain UJ101 is a symbiotic hetero-organotroph isolated from xanthid crab, and is a metabolic generalist with nitrate-reducing abilities but without the ability to synthesize biotin. There is a general tendency of UJ101 and some fish pathogens to prefer thiamine-dependent glycolysis to gluconeogenesis. Biotin and thiamine auxotrophy or prototrophy may be used as important markers in microbial community studies.
Pan-genome multilocus sequence typing and outbreak-specific reference-based single nucleotide polymorphism analysis to resolve two concurrent Staphylococcus aureus outbreaks in neonatal services.

PubMed

Roisin, S; Gaudin, C; De Mendonça, R; Bellon, J; Van Vaerenbergh, K; De Bruyne, K; Byl, B; Pouseele, H; Denis, O; Supply, P

2016-06-01

We used a two-step whole genome sequencing analysis for resolving two concurrent outbreaks in two neonatal services in Belgium, caused by exfoliative toxin A-encoding-gene-positive (eta+) methicillin-susceptible Staphylococcus aureus with an otherwise sporadic spa-type t209 (ST-109). Outbreak A involved 19 neonates and one healthcare worker in a Brussels hospital from May 2011 to October 2013. After a first episode interrupted by decolonization procedures applied over 7 months, the outbreak resumed concomitantly with the onset of outbreak B in a hospital in Asse, comprising 11 neonates and one healthcare worker from mid-2012 to January 2013. Pan-genome multilocus sequence typing, defined on the basis of 42 core and accessory reference genomes, and single-nucleotide polymorphisms mapped on an outbreak-specific de novo assembly were used to compare 28 available outbreak isolates and 19 eta+/spa-type t209 isolates identified by routine or nationwide surveillance. Pan-genome multilocus sequence typing showed that the outbreaks were caused by independent clones not closely related to any of the surveillance isolates. Isolates from only ten cases with overlapping stays in outbreak A, including four pairs of twins, showed no or only a single nucleotide polymorphism variation, indicating limited sequential transmission. Detection of larger genomic variation, even from the start of the outbreak, pointed to sporadic seeding from a pre-existing exogenous source, which persisted throughout the whole course of outbreak A. Whole genome sequencing analysis can provide unique fine-tuned insights into transmission pathways of complex outbreaks even at their inception, which, with timely use, could valuably guide efforts for early source identification. Copyright © 2016 European Society of Clinical Microbiology and Infectious Diseases. Published by Elsevier Ltd. All rights reserved.
Legionella pneumophila pangenome reveals strain-specific virulence factors.

PubMed

D'Auria, Giuseppe; Jiménez-Hernández, Nuria; Peris-Bondia, Francesc; Moya, Andrés; Latorre, Amparo

2010-03-17

Legionella pneumophila subsp. pneumophila is a gram-negative gamma-Proteobacterium and the causative agent of Legionnaires' disease, a form of epidemic pneumonia. It has a water-related life cycle. In industrialized cities L. pneumophila is commonly encountered in refrigeration towers and water pipes. Infection is always via infected aerosols to humans. Although many efforts have been made to eradicate Legionella from buildings, it still contaminates the water systems. The town of Alcoy (Valencian Region, Spain) has had recurrent outbreaks since 1999. The strain "Alcoy 2300/99" is a particularly persistent and recurrent strain that was isolated during one of the most significant outbreaks between the years 1999-2000. We have sequenced the genome of the particularly persistent L. pneumophila strain Alcoy 2300/99 and have compared it with four previously sequenced strains known as Philadelphia (USA), Lens (France), Paris (France) and Corby (England).Pangenome analysis facilitated the identification of strain-specific features, as well as some that are shared by two or more strains. We identified: (1) three islands related to anti-drug resistance systems; (2) a system for transport and secretion of heavy metals; (3) three systems related to DNA transfer; (4) two CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) systems, known to provide resistance against phage infections, one similar in the Lens and Alcoy strains, and another specific to the Paris strain; and (5) seven islands of phage-related proteins, five of which seem to be strain-specific and two shared. The dispensable genome disclosed by the pangenomic analysis seems to be a reservoir of new traits that have mainly been acquired by horizontal gene transfer and could confer evolutionary advantages over strains lacking them.
A pan-genomic approach to understand the basis of host adaptation in Achromobacter.

PubMed

Jeukens, J; Freschi, L; Vincent, A T; Emond-Rheault, J G; Kukavica-Ibrulj, I; Charette, S J; Levesque, R C

2017-04-05

Over the past decade, there has been a rising interest in Achromobacter sp., an emerging opportunistic pathogen responsible for nosocomial and cystic fibrosis (CF) lung infections. Species of this genus are ubiquitous in the environment, can outcompete resident microbiota, and are resistant to commonly used disinfectants as well as antibiotics. Nevertheless, the Achromobacter genus suffers from difficulties in diagnosis, unresolved taxonomy and limited understanding of how it adapts to the CF lung, not to mention other host environments. The goals of this first genus-wide comparative genomics study were to clarify the taxonomy of this genus and identify genomic features associated with pathogenicity and host adaptation. This was done with a widely applicable approach based on pan-genome analysis. First, using all publicly available genomes, a combination of phylogenetic analysis based on 1,780 conserved genes with average nucleotide identity and accessory genome composition allowed the identification of a largely clinical lineage composed of A. xylosoxidans A insuavis A. dolens and A. ruhlandii. Within this lineage, we identified 35 positively selected genes involved in metabolism, regulation and efflux-mediated antibiotic resistance. Second, resistome analysis showed that this clinical lineage carried additional antibiotic resistance genes compared to other isolates. Finally, we identified putative mobile elements that contribute 53% of the genus's resistome and support horizontal gene transfer between Achromobacter and other ecologically similar genera. This study provides strong phylogenetic and pan-genomic bases to motivate further research on Achromobacter, and contributes to the understanding of opportunistic pathogen evolution. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Homoeologous exchange is a major cause of gene presence/absence variation in the amphidiploid Brassica napus.

PubMed

Hurgobin, Bhavna; Golicz, Agnieszka A; Bayer, Philipp E; Chan, Chon-Kit Kenneth; Tirnaz, Soodeh; Dolatabadian, Aria; Schiessl, Sarah V; Samans, Birgit; Montenegro, Juan D; Parkin, Isobel A P; Pires, J Chris; Chalhoub, Boulos; King, Graham J; Snowdon, Rod; Batley, Jacqueline; Edwards, David

2018-07-01

Homoeologous exchanges (HEs) have been shown to generate novel gene combinations and phenotypes in a range of polyploid species. Gene presence/absence variation (PAV) is also a major contributor to genetic diversity. In this study, we show that there is an association between these two events, particularly in recent Brassica napus synthetic accessions, and that these represent a novel source of genetic diversity, which can be captured for the improvement of this important crop species. By assembling the pangenome of B. napus, we show that 38% of the genes display PAV behaviour, with some of these variable genes predicted to be involved in important agronomic traits including flowering time, disease resistance, acyl lipid metabolism and glucosinolate metabolism. This study is a first and provides a detailed characterization of the association between HEs and PAVs in B. napus at the pangenome level. © 2017 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Heterogeneous Compression of Large Collections of Evolutionary Trees.

PubMed

Matthews, Suzanne J

2015-01-01

Compressing heterogeneous collections of trees is an open problem in computational phylogenetics. In a heterogeneous tree collection, each tree can contain a unique set of taxa. An ideal compression method would allow for the efficient archival of large tree collections and enable scientists to identify common evolutionary relationships over disparate analyses. In this paper, we extend TreeZip to compress heterogeneous collections of trees. TreeZip is the most efficient algorithm for compressing homogeneous tree collections. To the best of our knowledge, no other domain-based compression algorithm exists for large heterogeneous tree collections or enable their rapid analysis. Our experimental results indicate that TreeZip averages 89.03 percent (72.69 percent) space savings on unweighted (weighted) collections of trees when the level of heterogeneity in a collection is moderate. The organization of the TRZ file allows for efficient computations over heterogeneous data. For example, consensus trees can be computed in mere seconds. Lastly, combining the TreeZip compressed (TRZ) file with general-purpose compression yields average space savings of 97.34 percent (81.43 percent) on unweighted (weighted) collections of trees. Our results lead us to believe that TreeZip will prove invaluable in the efficient archival of tree collections, and enables scientists to develop novel methods for relating heterogeneous collections of trees.
Pan-Genomic Analysis Permits Differentiation of Virulent and Non-virulent Strains of Xanthomonas arboricola That Cohabit Prunus spp. and Elucidate Bacterial Virulence Factors

PubMed Central

Garita-Cambronero, Jerson; Palacio-Bielsa, Ana; López, María M.; Cubero, Jaime

2017-01-01

Xanthomonas arboricola is a plant-associated bacterial species that causes diseases on several plant hosts. One of the most virulent pathovars within this species is X. arboricola pv. pruni (Xap), the causal agent of bacterial spot disease of stone fruit trees and almond. Recently, a non-virulent Xap-look-a-like strain isolated from Prunus was characterized and its genome compared to pathogenic strains of Xap, revealing differences in the profile of virulence factors, such as the genes related to the type III secretion system (T3SS) and type III effectors (T3Es). The existence of this atypical strain arouses several questions associated with the abundance, the pathogenicity, and the evolutionary context of X. arboricola on Prunus hosts. After an initial characterization of a collection of Xanthomonas strains isolated from Prunus bacterial spot outbreaks in Spain during the past decade, six Xap-look-a-like strains, that did not clustered with the pathogenic strains of Xap according to a multi locus sequence analysis, were identified. Pathogenicity of these strains was analyzed and the genome sequences of two Xap-look-a-like strains, CITA 14 and CITA 124, non-virulent to Prunus spp., were obtained and compared to those available genomes of X. arboricola associated with this host plant. Differences were found among the genomes of the virulent and the Prunus non-virulent strains in several characters related to the pathogenesis process. Additionally, a pan-genomic analysis that included the available genomes of X. arboricola, revealed that the atypical strains associated with Prunus were related to a group of non-virulent or low virulent strains isolated from a wide host range. The repertoire of the genes related to T3SS and T3Es varied among the strains of this cluster and those strains related to the most virulent pathovars of the species, corylina, juglandis, and pruni. This variability provides information about the potential evolutionary process associated to the acquisition of pathogenicity and host specificity in X. arboricola. Finally, based in the genomic differences observed between the virulent and the non-virulent strains isolated from Prunus, a sensitive and specific real-time PCR protocol was designed to detect and identify Xap strains. This method avoids miss-identifications due to atypical strains of X. arboricola that can cohabit Prunus. PMID:28450852
Computer programs for optical dendrometer measurements of standing tree profiles

Treesearch

Jacob R. Beard; Thomas G. Matney; Emily B. Schultz

2015-01-01

Tree profile equations are effective volume predictors. Diameter data for building these equations are collected from felled trees using diameter tapes and calipers or from standing trees using optical dendrometers. Developing and implementing a profile function from the collected data is a tedious and error prone task. This study created a computer program, Profile...
Pangenome and taxonomic analysis of Salmonella enterica subspecies enterica

USDA-ARS?s Scientific Manuscript database

Salmonella enterica subspecies enterica (S. enterica ssp. I) contains almost all the major pathogens in this genus. We sequenced 354 new S. enterica ssp. I genomes using paired end 100 base reads to ~80-fold coverage. These genomes were chosen to maximize genetic diversity, representing at least 100...
Decoding the ecological function of accessory genome

USDA-ARS?s Scientific Manuscript database

Shiga toxin-producing Escherichia coli O157:H7 primarily resides in cattle asymptomatically, and can be transmitted to humans through food. A study by Lupolova et al applied a machine-learning approach to complex pan-genome information and predicted that only a small subset of bovine isolates have t...
Apps for Angiosperms: The Usability of Mobile Computers and Printed Field Guides for UK Wild Flower and Winter Tree Identification

ERIC Educational Resources Information Center

Stagg, Bethan C.; Donkin, Maria E.

2017-01-01

We investigated usability of mobile computers and field guide books with adult botanical novices, for the identification of wildflowers and deciduous trees in winter. Identification accuracy was significantly higher for wildflowers using a mobile computer app than field guide books but significantly lower for deciduous trees. User preference…
Application of a fast skyline computation algorithm for serendipitous searching problems

NASA Astrophysics Data System (ADS)

Koizumi, Kenichi; Hiraki, Kei; Inaba, Mary

2018-02-01

Skyline computation is a method of extracting interesting entries from a large population with multiple attributes. These entries, called skyline or Pareto optimal entries, are known to have extreme characteristics that cannot be found by outlier detection methods. Skyline computation is an important task for characterizing large amounts of data and selecting interesting entries with extreme features. When the population changes dynamically, the task of calculating a sequence of skyline sets is called continuous skyline computation. This task is known to be difficult to perform for the following reasons: (1) information of non-skyline entries must be stored since they may join the skyline in the future; (2) the appearance or disappearance of even a single entry can change the skyline drastically; (3) it is difficult to adopt a geometric acceleration algorithm for skyline computation tasks with high-dimensional datasets. Our new algorithm called jointed rooted-tree (JR-tree) manages entries using a rooted tree structure. JR-tree delays extend the tree to deep levels to accelerate tree construction and traversal. In this study, we presented the difficulties in extracting entries tagged with a rare label in high-dimensional space and the potential of fast skyline computation in low-latency cell identification technology.
Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling

NASA Astrophysics Data System (ADS)

Galelli, S.; Castelletti, A.

2013-02-01

Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modeling. In this paper we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modeling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalization property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally very efficient; and, (iii) allows to infer the relative importance of the input variables, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analyzed on two real-world case studies (Marina catchment (Singapore) and Canning River (Western Australia)) representing two different morphoclimatic contexts comparatively with other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparatively well to the best of the benchmarks (i.e. M5) in both the watersheds, while outperforming the other approaches in terms of computational requirement when adopted on large datasets. In addition, the ranking of the input variable provided can be given a physically meaningful interpretation.
Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling

NASA Astrophysics Data System (ADS)

Galelli, S.; Castelletti, A.

2013-07-01

Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modelling. In this paper, we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modelling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalisation property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally efficient; and, (iii) allows to infer the relative importance of the input variables, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analysed on two real-world case studies - Marina catchment (Singapore) and Canning River (Western Australia) - representing two different morphoclimatic contexts. The evaluation is performed against other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparatively well to the best of the benchmarks (i.e. M5) in both the watersheds, while outperforming the other approaches in terms of computational requirement when adopted on large datasets. In addition, the ranking of the input variable provided can be given a physically meaningful interpretation.
Tree-space statistics and approximations for large-scale analysis of anatomical trees.

PubMed

Feragen, Aasa; Owen, Megan; Petersen, Jens; Wille, Mathilde M W; Thomsen, Laura H; Dirksen, Asger; de Bruijne, Marleen

2013-01-01

Statistical analysis of anatomical trees is hard to perform due to differences in the topological structure of the trees. In this paper we define statistical properties of leaf-labeled anatomical trees with geometric edge attributes by considering the anatomical trees as points in the geometric space of leaf-labeled trees. This tree-space is a geodesic metric space where any two trees are connected by a unique shortest path, which corresponds to a tree deformation. However, tree-space is not a manifold, and the usual strategy of performing statistical analysis in a tangent space and projecting onto tree-space is not available. Using tree-space and its shortest paths, a variety of statistical properties, such as mean, principal component, hypothesis testing and linear discriminant analysis can be defined. For some of these properties it is still an open problem how to compute them; others (like the mean) can be computed, but efficient alternatives are helpful in speeding up algorithms that use means iteratively, like hypothesis testing. In this paper, we take advantage of a very large dataset (N = 8016) to obtain computable approximations, under the assumption that the data trees parametrize the relevant parts of tree-space well. Using the developed approximate statistics, we illustrate how the structure and geometry of airway trees vary across a population and show that airway trees with Chronic Obstructive Pulmonary Disease come from a different distribution in tree-space than healthy ones. Software is available from http://image.diku.dk/aasa/software.php.
Extended Full Computation-Tree Logic with Sequence Modal Operator: Representing Hierarchical Tree Structures

NASA Astrophysics Data System (ADS)

Kamide, Norihiro; Kaneiwa, Ken

An extended full computation-tree logic, CTLS*, is introduced as a Kripke semantics with a sequence modal operator. This logic can appropriately represent hierarchical tree structures where sequence modal operators in CTLS* are applied to tree structures. An embedding theorem of CTLS* into CTL* is proved. The validity, satisfiability and model-checking problems of CTLS* are shown to be decidable. An illustrative example of biological taxonomy is presented using CTLS* formulas.

Phenetic Comparison of Prokaryotic Genomes Using k-mers

PubMed Central

Déraspe, Maxime; Raymond, Frédéric; Boisvert, Sébastien; Culley, Alexander; Roy, Paul H.; Laviolette, François; Corbeil, Jacques

2017-01-01

Abstract Bacterial genomics studies are getting more extensive and complex, requiring new ways to envision analyses. Using the Ray Surveyor software, we demonstrate that comparison of genomes based on their k-mer content allows reconstruction of phenetic trees without the need of prior data curation, such as core genome alignment of a species. We validated the methodology using simulated genomes and previously published phylogenomic studies of Streptococcus pneumoniae and Pseudomonas aeruginosa. We also investigated the relationship of specific genetic determinants with bacterial population structures. By comparing clusters from the complete genomic content of a genome population with clusters from specific functional categories of genes, we can determine how the population structures are correlated. Indeed, the strain clustering based on a subset of k-mers allows determination of its similarity with the whole genome clusters. We also applied this methodology on 42 species of bacteria to determine the correlational significance of five important bacterial genomic characteristics. For example, intrinsic resistance is more important in P. aeruginosa than in S. pneumoniae, and the former has increased correlation of its population structure with antibiotic resistance genes. The global view of the pangenome of bacteria also demonstrated the taxa-dependent interaction of population structure with antibiotic resistance, bacteriophage, plasmid, and mobile element k-mer data sets. PMID:28957508
Pruning a decision tree for selecting computer-related assistive devices for people with disabilities.

PubMed

Chi, Chia-Fen; Tseng, Li-Kai; Jang, Yuh

2012-07-01

Many disabled individuals lack extensive knowledge about assistive technology, which could help them use computers. In 1997, Denis Anson developed a decision tree of 49 evaluative questions designed to evaluate the functional capabilities of the disabled user and choose an appropriate combination of assistive devices, from a selection of 26, that enable the individual to use a computer. In general, occupational therapists guide the disabled users through this process. They often have to go over repetitive questions in order to find an appropriate device. A disabled user may require an alphanumeric entry device, a pointing device, an output device, a performance enhancement device, or some combination of these. Therefore, the current research eliminates redundant questions and divides Anson's decision tree into multiple independent subtrees to meet the actual demand of computer users with disabilities. The modified decision tree was tested by six disabled users to prove it can determine a complete set of assistive devices with a smaller number of evaluative questions. The means to insert new categories of computer-related assistive devices was included to ensure the decision tree can be expanded and updated. The current decision tree can help the disabled users and assistive technology practitioners to find appropriate computer-related assistive devices that meet with clients' individual needs in an efficient manner.
Consequences of Common Topological Rearrangements for Partition Trees in Phylogenomic Inference.

PubMed

Chernomor, Olga; Minh, Bui Quang; von Haeseler, Arndt

2015-12-01

In phylogenomic analysis the collection of trees with identical score (maximum likelihood or parsimony score) may hamper tree search algorithms. Such collections are coined phylogenetic terraces. For sparse supermatrices with a lot of missing data, the number of terraces and the number of trees on the terraces can be very large. If terraces are not taken into account, a lot of computation time might be unnecessarily spent to evaluate many trees that in fact have identical score. To save computation time during the tree search, it is worthwhile to quickly identify such cases. The score of a species tree is the sum of scores for all the so-called induced partition trees. Therefore, if the topological rearrangement applied to a species tree does not change the induced partition trees, the score of these partition trees is unchanged. Here, we provide the conditions under which the three most widely used topological rearrangements (nearest neighbor interchange, subtree pruning and regrafting, and tree bisection and reconnection) change the topologies of induced partition trees. During the tree search, these conditions allow us to quickly identify whether we can save computation time on the evaluation of newly encountered trees. We also introduce the concept of partial terraces and demonstrate that they occur more frequently than the original "full" terrace. Hence, partial terrace is the more important factor of timesaving compared to full terrace. Therefore, taking into account the above conditions and the partial terrace concept will help to speed up the tree search in phylogenomic inference.
Understanding the Scalability of Bayesian Network Inference using Clique Tree Growth Curves

NASA Technical Reports Server (NTRS)

Mengshoel, Ole Jakob

2009-01-01

Bayesian networks (BNs) are used to represent and efficiently compute with multi-variate probability distributions in a wide range of disciplines. One of the main approaches to perform computation in BNs is clique tree clustering and propagation. In this approach, BN computation consists of propagation in a clique tree compiled from a Bayesian network. There is a lack of understanding of how clique tree computation time, and BN computation time in more general, depends on variations in BN size and structure. On the one hand, complexity results tell us that many interesting BN queries are NP-hard or worse to answer, and it is not hard to find application BNs where the clique tree approach in practice cannot be used. On the other hand, it is well-known that tree-structured BNs can be used to answer probabilistic queries in polynomial time. In this article, we develop an approach to characterizing clique tree growth as a function of parameters that can be computed in polynomial time from BNs, specifically: (i) the ratio of the number of a BN's non-root nodes to the number of root nodes, or (ii) the expected number of moral edges in their moral graphs. Our approach is based on combining analytical and experimental results. Analytically, we partition the set of cliques in a clique tree into different sets, and introduce a growth curve for each set. For the special case of bipartite BNs, we consequently have two growth curves, a mixed clique growth curve and a root clique growth curve. In experiments, we systematically increase the degree of the root nodes in bipartite Bayesian networks, and find that root clique growth is well-approximated by Gompertz growth curves. It is believed that this research improves the understanding of the scaling behavior of clique tree clustering, provides a foundation for benchmarking and developing improved BN inference and machine learning algorithms, and presents an aid for analytical trade-off studies of clique tree clustering using growth curves.
Comprehensive genomic analysis of a plant growth-promoting rhizobacterium Pantoea agglomerans strain P5.

PubMed

Shariati J, Vahid; Malboobi, Mohammad Ali; Tabrizi, Zeinab; Tavakol, Elahe; Owilia, Parviz; Safari, Maryam

2017-11-15

In this study, we provide a comparative genomic analysis of Pantoea agglomerans strain P5 and 10 closely related strains based on phylogenetic analyses. A next-generation shotgun strategy was implemented using the Illumina HiSeq 2500 technology followed by core- and pan-genome analysis. The genome of P. agglomerans strain P5 contains an assembly size of 5082485 bp with 55.4% G + C content. P. agglomerans consists of 2981 core and 3159 accessory genes for Coding DNA Sequences (CDSs) based on the pan-genome analysis. Strain P5 can be grouped closely with strains PG734 and 299 R using pan and core genes, respectively. All the predicted and annotated gene sequences were allocated to KEGG pathways. Accordingly, genes involved in plant growth-promoting (PGP) ability, including phosphate solubilization, IAA and siderophore production, acetoin and 2,3-butanediol synthesis and bacterial secretion, were assigned. This study provides an in-depth view of the PGP characteristics of strain P5, highlighting its potential use in agriculture as a biofertilizer.
Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial “pan-genome”

PubMed Central

Tettelin, Hervé; Masignani, Vega; Cieslewicz, Michael J.; Donati, Claudio; Medini, Duccio; Ward, Naomi L.; Angiuoli, Samuel V.; Crabtree, Jonathan; Jones, Amanda L.; Durkin, A. Scott; DeBoy, Robert T.; Davidsen, Tanja M.; Mora, Marirosa; Scarselli, Maria; Margarit y Ros, Immaculada; Peterson, Jeremy D.; Hauser, Christopher R.; Sundaram, Jaideep P.; Nelson, William C.; Madupu, Ramana; Brinkac, Lauren M.; Dodson, Robert J.; Rosovitz, Mary J.; Sullivan, Steven A.; Daugherty, Sean C.; Haft, Daniel H.; Selengut, Jeremy; Gwinn, Michelle L.; Zhou, Liwei; Zafar, Nikhat; Khouri, Hoda; Radune, Diana; Dimitrov, George; Watkins, Kisha; O'Connor, Kevin J. B.; Smith, Shannon; Utterback, Teresa R.; White, Owen; Rubens, Craig E.; Grandi, Guido; Madoff, Lawrence C.; Kasper, Dennis L.; Telford, John L.; Wessels, Michael R.; Rappuoli, Rino; Fraser, Claire M.

2005-01-01

The development of efficient and inexpensive genome sequencing methods has revolutionized the study of human bacterial pathogens and improved vaccine design. Unfortunately, the sequence of a single genome does not reflect how genetic variability drives pathogenesis within a bacterial species and also limits genome-wide screens for vaccine candidates or for antimicrobial targets. We have generated the genomic sequence of six strains representing the five major disease-causing serotypes of Streptococcus agalactiae, the main cause of neonatal infection in humans. Analysis of these genomes and those available in databases showed that the S. agalactiae species can be described by a pan-genome consisting of a core genome shared by all isolates, accounting for ≈80% of any single genome, plus a dispensable genome consisting of partially shared and strain-specific genes. Mathematical extrapolation of the data suggests that the gene reservoir available for inclusion in the S. agalactiae pan-genome is vast and that unique genes will continue to be identified even after sequencing hundreds of genomes. PMID:16172379
Biogeography of the Sulfolobus islandicus pan-genome

PubMed Central

Reno, Michael L.; Held, Nicole L.; Fields, Christopher J.; Burke, Patricia V.; Whitaker, Rachel J.

2009-01-01

Variation in gene content has been hypothesized to be the primary mode of adaptive evolution in microorganisms; however, very little is known about the spatial and temporal distribution of variable genes. Through population-scale comparative genomics of 7 Sulfolobus islandicus genomes from 3 locations, we demonstrate the biogeographical structure of the pan-genome of this species, with no evidence of gene flow between geographically isolated populations. The evolutionary independence of each population allowed us to assess genome dynamics over very recent evolutionary time, beginning ≈910,000 years ago. On this time scale, genome variation largely consists of recent strain-specific integration of mobile elements. Localized sectors of parallel gene loss are identified; however, the balance between the gain and loss of genetic material suggests that S. islandicus genomes acquire material slowly over time, primarily from closely related Sulfolobus species. Examination of the genome dynamics through population genomics in S. islandicus exposes the process of allopatric speciation in thermophilic Archaea and brings us closer to a generalized framework for understanding microbial genome evolution in a spatial context. PMID:19435847
An implementation of a tree code on a SIMD, parallel computer

NASA Technical Reports Server (NTRS)

Olson, Kevin M.; Dorband, John E.

1994-01-01

We describe a fast tree algorithm for gravitational N-body simulation on SIMD parallel computers. The tree construction uses fast, parallel sorts. The sorted lists are recursively divided along their x, y and z coordinates. This data structure is a completely balanced tree (i.e., each particle is paired with exactly one other particle) and maintains good spatial locality. An implementation of this tree-building algorithm on a 16k processor Maspar MP-1 performs well and constitutes only a small fraction (approximately 15%) of the entire cycle of finding the accelerations. Each node in the tree is treated as a monopole. The tree search and the summation of accelerations also perform well. During the tree search, node data that is needed from another processor is simply fetched. Roughly 55% of the tree search time is spent in communications between processors. We apply the code to two problems of astrophysical interest. The first is a simulation of the close passage of two gravitationally, interacting, disk galaxies using 65,636 particles. We also simulate the formation of structure in an expanding, model universe using 1,048,576 particles. Our code attains speeds comparable to one head of a Cray Y-MP, so single instruction, multiple data (SIMD) type computers can be used for these simulations. The cost/performance ratio for SIMD machines like the Maspar MP-1 make them an extremely attractive alternative to either vector processors or large multiple instruction, multiple data (MIMD) type parallel computers. With further optimizations (e.g., more careful load balancing), speeds in excess of today's vector processing computers should be possible.
A Comparative Pan-Genome Perspective of Niche-Adaptable Cell-Surface Protein Phenotypes in Lactobacillus rhamnosus

PubMed Central

Kant, Ravi; Sigvart-Mattila, Pia; Paulin, Lars; Mecklin, Jukka-Pekka; Saarela, Maria; Palva, Airi; von Ossowski, Ingemar

2014-01-01

Lactobacillus rhamnosus is a ubiquitously adaptable Gram-positive bacterium and as a typical commensal can be recovered from various microbe-accessible bodily orifices and cavities. Then again, other isolates are food-borne, with some of these having been long associated with naturally fermented cheeses and yogurts. Additionally, because of perceived health benefits to humans and animals, numerous L. rhamnosus strains have been selected for use as so-called probiotics and are often taken in the form of dietary supplements and functional foods. At the genome level, it is anticipated that certain genetic variances will have provided the niche-related phenotypes that augment the flexible adaptiveness of this species, thus enabling its strains to grow and survive in their respective host environments. For this present study, we considered it functionally informative to examine and catalogue the genotype-phenotype variation existing at the cell surface between different L. rhamnosus strains, with the presumption that this might be relatable to habitat preferences and ecological adaptability. Here, we conducted a pan-genomic study involving 13 genomes from L. rhamnosus isolates with various origins. In using a benchmark strain (gut-adapted L. rhamnosus GG) for our pan-genome comparison, we had focused our efforts on a detailed examination and description of gene products for certain functionally relevant surface-exposed proteins, each of which in effect might also play a part in niche adaptability among the other strains. Perhaps most significantly of the surface protein loci we had analyzed, it would appear that the spaCBA operon (known to encode SpaCBA-called pili having a mucoadhesive phenotype) is a genomic rarity and an uncommon occurrence in L. rhamnosus. However, for any of the so-piliated L. rhamnosus strains, they will likely possess an increased niche-specific fitness, which functionally might presumably be manifested by a protracted transient colonization of the gut mucosa or some similar microhabitat. PMID:25032833
Consequences of Common Topological Rearrangements for Partition Trees in Phylogenomic Inference

PubMed Central

Minh, Bui Quang; von Haeseler, Arndt

2015-01-01

Abstract In phylogenomic analysis the collection of trees with identical score (maximum likelihood or parsimony score) may hamper tree search algorithms. Such collections are coined phylogenetic terraces. For sparse supermatrices with a lot of missing data, the number of terraces and the number of trees on the terraces can be very large. If terraces are not taken into account, a lot of computation time might be unnecessarily spent to evaluate many trees that in fact have identical score. To save computation time during the tree search, it is worthwhile to quickly identify such cases. The score of a species tree is the sum of scores for all the so-called induced partition trees. Therefore, if the topological rearrangement applied to a species tree does not change the induced partition trees, the score of these partition trees is unchanged. Here, we provide the conditions under which the three most widely used topological rearrangements (nearest neighbor interchange, subtree pruning and regrafting, and tree bisection and reconnection) change the topologies of induced partition trees. During the tree search, these conditions allow us to quickly identify whether we can save computation time on the evaluation of newly encountered trees. We also introduce the concept of partial terraces and demonstrate that they occur more frequently than the original “full” terrace. Hence, partial terrace is the more important factor of timesaving compared to full terrace. Therefore, taking into account the above conditions and the partial terrace concept will help to speed up the tree search in phylogenomic inference. PMID:26448206
Labeled trees and the efficient computation of derivations

NASA Technical Reports Server (NTRS)

Grossman, Robert; Larson, Richard G.

1989-01-01

The effective parallel symbolic computation of operators under composition is discussed. Examples include differential operators under composition and vector fields under the Lie bracket. Data structures consisting of formal linear combinations of rooted labeled trees are discussed. A multiplication on rooted labeled trees is defined, thereby making the set of these data structures into an associative algebra. An algebra homomorphism is defined from the original algebra of operators into this algebra of trees. An algebra homomorphism from the algebra of trees into the algebra of differential operators is then described. The cancellation which occurs when noncommuting operators are expressed in terms of commuting ones occurs naturally when the operators are represented using this data structure. This leads to an algorithm which, for operators which are derivations, speeds up the computation exponentially in the degree of the operator. It is shown that the algebra of trees leads naturally to a parallel version of the algorithm.
A Hybrid Shared-Memory Parallel Max-Tree Algorithm for Extreme Dynamic-Range Images.

PubMed

Moschini, Ugo; Meijster, Arnold; Wilkinson, Michael H F

2018-03-01

Max-trees, or component trees, are graph structures that represent the connected components of an image in a hierarchical way. Nowadays, many application fields rely on images with high-dynamic range or floating point values. Efficient sequential algorithms exist to build trees and compute attributes for images of any bit depth. However, we show that the current parallel algorithms perform poorly already with integers at bit depths higher than 16 bits per pixel. We propose a parallel method combining the two worlds of flooding and merging max-tree algorithms. First, a pilot max-tree of a quantized version of the image is built in parallel using a flooding method. Later, this structure is used in a parallel leaf-to-root approach to compute efficiently the final max-tree and to drive the merging of the sub-trees computed by the threads. We present an analysis of the performance both on simulated and actual 2D images and 3D volumes. Execution times are about better than the fastest sequential algorithm and speed-up goes up to on 64 threads.
Parallel peak pruning for scalable SMP contour tree computation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Carr, Hamish A.; Weber, Gunther H.; Sewell, Christopher M.

As data sets grow to exascale, automated data analysis and visualisation are increasingly important, to intermediate human understanding and to reduce demands on disk storage via in situ analysis. Trends in architecture of high performance computing systems necessitate analysis algorithms to make effective use of combinations of massively multicore and distributed systems. One of the principal analytic tools is the contour tree, which analyses relationships between contours to identify features of more than local importance. Unfortunately, the predominant algorithms for computing the contour tree are explicitly serial, and founded on serial metaphors, which has limited the scalability of this formmore » of analysis. While there is some work on distributed contour tree computation, and separately on hybrid GPU-CPU computation, there is no efficient algorithm with strong formal guarantees on performance allied with fast practical performance. Here in this paper, we report the first shared SMP algorithm for fully parallel contour tree computation, withfor-mal guarantees of O(lgnlgt) parallel steps and O(n lgn) work, and implementations with up to 10x parallel speed up in OpenMP and up to 50x speed up in NVIDIA Thrust.« less
Autumn Algorithm-Computation of Hybridization Networks for Realistic Phylogenetic Trees.

PubMed

Huson, Daniel H; Linz, Simone

2018-01-01

A minimum hybridization network is a rooted phylogenetic network that displays two given rooted phylogenetic trees using a minimum number of reticulations. Previous mathematical work on their calculation has usually assumed the input trees to be bifurcating, correctly rooted, or that they both contain the same taxa. These assumptions do not hold in biological studies and "realistic" trees have multifurcations, are difficult to root, and rarely contain the same taxa. We present a new algorithm for computing minimum hybridization networks for a given pair of "realistic" rooted phylogenetic trees. We also describe how the algorithm might be used to improve the rooting of the input trees. We introduce the concept of "autumn trees", a nice framework for the formulation of algorithms based on the mathematics of "maximum acyclic agreement forests". While the main computational problem is hard, the run-time depends mainly on how different the given input trees are. In biological studies, where the trees are reasonably similar, our parallel implementation performs well in practice. The algorithm is available in our open source program Dendroscope 3, providing a platform for biologists to explore rooted phylogenetic networks. We demonstrate the utility of the algorithm using several previously studied data sets.
PanFunPro: Bacterial Pan-Genome Analysis Based on the Functional Profiles (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lukjancenko, Oksana

2012-06-01

Julien Tremblay from DOE JGI presents "Evaluation of Multiplexed 16S rRNA Microbial Population Surveys Using Illumina MiSeq Platorm" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.
PanFunPro: Bacterial Pan-Genome Analysis Based on the Functional Profiles (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

ScienceCinema

Lukjancenko, Oksana

2018-01-10

Julien Tremblay from DOE JGI presents "Evaluation of Multiplexed 16S rRNA Microbial Population Surveys Using Illumina MiSeq Platorm" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.
Analysis of mutational spectra by denaturant capillary electrophoresis

PubMed Central

Ekstrøm, Per O.; Khrapko, Konstantin; Li-Sucholeiki, Xiao-Cheng; Hunter, Ian W.; Thilly, William G.

2009-01-01

Numbers and kinds of point mutant within DNA from cells, tissues and human population may be discovered for nearly any 75–250bp DNA sequence. High fidelity DNA amplification incorporating a thermally stable DNA “clamp” is followed by separation by denaturing capillary electrophoresis (DCE). DCE allows for peak collection and verification sequencing. DCE in a mode of cycling temperature, e.g.+/− 5°C, CyDCE, permits high resolution of mutant sequences using computer defined analytes without preliminary optimization experiments. DNA sequencers have been modified to permit higher throughput CyDCE and a massively parallel,~25,000 capillary system, has been designed for pangenomic scans in large human populations. DCE has been used to define quantitative point mutational spectra for study a wide variety of genetic phenomena: errors of DNA polymerases, mutations induced in human cells by chemicals and irradiation, testing of human gene-common disease associations and the discovery of origins of point mutations in human development and carcinogenesis. PMID:18600220
Update on RefSeq microbial genomes resources.

PubMed

Tatusova, Tatiana; Ciufo, Stacy; Federhen, Scott; Fedorov, Boris; McVeigh, Richard; O'Neill, Kathleen; Tolstoy, Igor; Zaslavsky, Leonid

2015-01-01

NCBI RefSeq genome collection http://www.ncbi.nlm.nih.gov/genome represents all three major domains of life: Eukarya, Bacteria and Archaea as well as Viruses. Prokaryotic genome sequences are the most rapidly growing part of the collection. During the year of 2014 more than 10,000 microbial genome assemblies have been publicly released bringing the total number of prokaryotic genomes close to 30,000. We continue to improve the quality and usability of the microbial genome resources by providing easy access to the data and the results of the pre-computed analysis, and improving analysis and visualization tools. A number of improvements have been incorporated into the Prokaryotic Genome Annotation Pipeline. Several new features have been added to RefSeq prokaryotic genomes data processing pipeline including the calculation of genome groups (clades) and the optimization of protein clusters generation using pan-genome approach. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by US Government employees and is in the public domain in the US.
Tutorial: Advanced fault tree applications using HARP

NASA Technical Reports Server (NTRS)

Dugan, Joanne Bechta; Bavuso, Salvatore J.; Boyd, Mark A.

1993-01-01

Reliability analysis of fault tolerant computer systems for critical applications is complicated by several factors. These modeling difficulties are discussed and dynamic fault tree modeling techniques for handling them are described and demonstrated. Several advanced fault tolerant computer systems are described, and fault tree models for their analysis are presented. HARP (Hybrid Automated Reliability Predictor) is a software package developed at Duke University and NASA Langley Research Center that is capable of solving the fault tree models presented.
Structural system reliability calculation using a probabilistic fault tree analysis method

NASA Technical Reports Server (NTRS)

Torng, T. Y.; Wu, Y.-T.; Millwater, H. R.

1992-01-01

The development of a new probabilistic fault tree analysis (PFTA) method for calculating structural system reliability is summarized. The proposed PFTA procedure includes: developing a fault tree to represent the complex structural system, constructing an approximation function for each bottom event, determining a dominant sampling sequence for all bottom events, and calculating the system reliability using an adaptive importance sampling method. PFTA is suitable for complicated structural problems that require computer-intensive computer calculations. A computer program has been developed to implement the PFTA.

Reliability computation using fault tree analysis

NASA Technical Reports Server (NTRS)

Chelson, P. O.

1971-01-01

A method is presented for calculating event probabilities from an arbitrary fault tree. The method includes an analytical derivation of the system equation and is not a simulation program. The method can handle systems that incorporate standby redundancy and it uses conditional probabilities for computing fault trees where the same basic failure appears in more than one fault path.
UrbanCrowns: an assessment and monitoring tool for urban trees

Treesearch

Matthew F. Winn; Philip A. Araman; Sang-Mook Lee

2011-01-01

UrbanCrowns is a Windows®-based computer program used to assess the crown characteristics of urban trees. The software analyzes side-view digital photographs of trees to compute several crown metrics, including crown height, crown diameter, live crown ratio, crown volume, crown density, and crown transparency. Potential uses of the UrbanCrowns program include...
Automatic translation of digraph to fault-tree models

NASA Technical Reports Server (NTRS)

Iverson, David L.

1992-01-01

The author presents a technique for converting digraph models, including those models containing cycles, to a fault-tree format. A computer program which automatically performs this translation using an object-oriented representation of the models has been developed. The fault-trees resulting from translations can be used for fault-tree analysis and diagnosis. Programs to calculate fault-tree and digraph cut sets and perform diagnosis with fault-tree models have also been developed. The digraph to fault-tree translation system has been successfully tested on several digraphs of varying size and complexity. Details of some representative translation problems are presented. Most of the computation performed by the program is dedicated to finding minimal cut sets for digraph nodes in order to break cycles in the digraph. Fault-trees produced by the translator have been successfully used with NASA's Fault-Tree Diagnosis System (FTDS) to produce automated diagnostic systems.
Asymptotic Properties of the Number of Matching Coalescent Histories for Caterpillar-Like Families of Species Trees.

PubMed

Disanto, Filippo; Rosenberg, Noah A

2016-01-01

Coalescent histories provide lists of species tree branches on which gene tree coalescences can take place, and their enumerative properties assist in understanding the computational complexity of calculations central in the study of gene trees and species trees. Here, we solve an enumerative problem left open by Rosenberg (IEEE/ACM Transactions on Computational Biology and Bioinformatics 10: 1253-1262, 2013) concerning the number of coalescent histories for gene trees and species trees with a matching labeled topology that belongs to a generic caterpillar-like family. By bringing a generating function approach to the study of coalescent histories, we prove that for any caterpillar-like family with seed tree t , the sequence (h n ) n ≥ 0 describing the number of matching coalescent histories of the n th tree of the family grows asymptotically as a constant multiple of the Catalan numbers. Thus, h n ∼ β t c n , where the asymptotic constant β t > 0 depends on the shape of the seed tree t. The result extends a claim demonstrated only for seed trees with at most eight taxa to arbitrary seed trees, expanding the set of cases for which detailed enumerative properties of coalescent histories can be determined. We introduce a procedure that computes from t the constant β t as well as the algebraic expression for the generating function of the sequence (h n ) n ≥ 0 .
Using trees to compute approximate solutions to ordinary differential equations exactly

NASA Technical Reports Server (NTRS)

Grossman, Robert

1991-01-01

Some recent work is reviewed which relates families of trees to symbolic algorithms for the exact computation of series which approximate solutions of ordinary differential equations. It turns out that the vector space whose basis is the set of finite, rooted trees carries a natural multiplication related to the composition of differential operators, making the space of trees an algebra. This algebraic structure can be exploited to yield a variety of algorithms for manipulating vector fields and the series and algebras they generate.
The decision tree approach to classification

NASA Technical Reports Server (NTRS)

Wu, C.; Landgrebe, D. A.; Swain, P. H.

1975-01-01

A class of multistage decision tree classifiers is proposed and studied relative to the classification of multispectral remotely sensed data. The decision tree classifiers are shown to have the potential for improving both the classification accuracy and the computation efficiency. Dimensionality in pattern recognition is discussed and two theorems on the lower bound of logic computation for multiclass classification are derived. The automatic or optimization approach is emphasized. Experimental results on real data are reported, which clearly demonstrate the usefulness of decision tree classifiers.
Reassessment of the Listeria monocytogenes pan-genome reveals dynamic integration hotspots and mobile genetic elements as major components of the accessory genome.

PubMed

Kuenne, Carsten; Billion, André; Mraheil, Mobarak Abu; Strittmatter, Axel; Daniel, Rolf; Goesmann, Alexander; Barbuddhe, Sukhadeo; Hain, Torsten; Chakraborty, Trinad

2013-01-22

Listeria monocytogenes is an important food-borne pathogen and model organism for host-pathogen interaction, thus representing an invaluable target considering research on the forces governing the evolution of such microbes. The diversity of this species has not been exhaustively explored yet, as previous efforts have focused on analyses of serotypes primarily implicated in human listeriosis. We conducted complete genome sequencing of 11 strains employing 454 GS FLX technology, thereby achieving full coverage of all serotypes including the first complete strains of serotypes 1/2b, 3c, 3b, 4c, 4d, and 4e. These were comparatively analyzed in conjunction with publicly available data and assessed for pathogenicity in the Galleria mellonella insect model. The species pan-genome of L. monocytogenes is highly stable but open, suggesting an ability to adapt to new niches by generating or including new genetic information. The majority of gene-scale differences represented by the accessory genome resulted from nine hyper variable hotspots, a similar number of different prophages, three transposons (Tn916, Tn554, IS3-like), and two mobilizable islands. Only a subset of strains showed CRISPR/Cas bacteriophage resistance systems of different subtypes, suggesting a supplementary function in maintenance of chromosomal stability. Multiple phylogenetic branches of the genus Listeria imply long common histories of strains of each lineage as revealed by a SNP-based core genome tree highlighting the impact of small mutations for the evolution of species L. monocytogenes. Frequent loss or truncation of genes described to be vital for virulence or pathogenicity was confirmed as a recurring pattern, especially for strains belonging to lineages III and II. New candidate genes implicated in virulence function were predicted based on functional domains and phylogenetic distribution. A comparative analysis of small regulatory RNA candidates supports observations of a differential distribution of trans-encoded RNA, hinting at a diverse range of adaptations and regulatory impact. This study determined commonly occurring hyper variable hotspots and mobile elements as primary effectors of quantitative gene-scale evolution of species L. monocytogenes, while gene decay and SNPs seem to represent major factors influencing long-term evolution. The discovery of common and disparately distributed genes considering lineages, serogroups, serotypes and strains of species L. monocytogenes will assist in diagnostic, phylogenetic and functional research, supported by the comparative genomic GECO-LisDB analysis server (http://bioinfo.mikrobio.med.uni-giessen.de/geco2lisdb).
Computing all hybridization networks for multiple binary phylogenetic input trees.

PubMed

Albrecht, Benjamin

2015-07-30

The computation of phylogenetic trees on the same set of species that are based on different orthologous genes can lead to incongruent trees. One possible explanation for this behavior are interspecific hybridization events recombining genes of different species. An important approach to analyze such events is the computation of hybridization networks. This work presents the first algorithm computing the hybridization number as well as a set of representative hybridization networks for multiple binary phylogenetic input trees on the same set of taxa. To improve its practical runtime, we show how this algorithm can be parallelized. Moreover, we demonstrate the efficiency of the software Hybroscale, containing an implementation of our algorithm, by comparing it to PIRNv2.0, which is so far the best available software computing the exact hybridization number for multiple binary phylogenetic trees on the same set of taxa. The algorithm is part of the software Hybroscale, which was developed specifically for the investigation of hybridization networks including their computation and visualization. Hybroscale is freely available(1) and runs on all three major operating systems. Our simulation study indicates that our approach is on average 100 times faster than PIRNv2.0. Moreover, we show how Hybroscale improves the interpretation of the reported hybridization networks by adding certain features to its graphical representation.
The Probability of a Gene Tree Topology within a Phylogenetic Network with Applications to Hybridization Detection

PubMed Central

Yu, Yun; Degnan, James H.; Nakhleh, Luay

2012-01-01

Gene tree topologies have proven a powerful data source for various tasks, including species tree inference and species delimitation. Consequently, methods for computing probabilities of gene trees within species trees have been developed and widely used in probabilistic inference frameworks. All these methods assume an underlying multispecies coalescent model. However, when reticulate evolutionary events such as hybridization occur, these methods are inadequate, as they do not account for such events. Methods that account for both hybridization and deep coalescence in computing the probability of a gene tree topology currently exist for very limited cases. However, no such methods exist for general cases, owing primarily to the fact that it is currently unknown how to compute the probability of a gene tree topology within the branches of a phylogenetic network. Here we present a novel method for computing the probability of gene tree topologies on phylogenetic networks and demonstrate its application to the inference of hybridization in the presence of incomplete lineage sorting. We reanalyze a Saccharomyces species data set for which multiple analyses had converged on a species tree candidate. Using our method, though, we show that an evolutionary hypothesis involving hybridization in this group has better support than one of strict divergence. A similar reanalysis on a group of three Drosophila species shows that the data is consistent with hybridization. Further, using extensive simulation studies, we demonstrate the power of gene tree topologies at obtaining accurate estimates of branch lengths and hybridization probabilities of a given phylogenetic network. Finally, we discuss identifiability issues with detecting hybridization, particularly in cases that involve extinction or incomplete sampling of taxa. PMID:22536161
Global tree network for computing structures enabling global processing operations

DOEpatents

Blumrich; Matthias A.; Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Hoenicke, Dirk; Steinmacher-Burow, Burkhard D.; Takken, Todd E.; Vranas, Pavlos M.

2010-01-19

A system and method for enabling high-speed, low-latency global tree network communications among processing nodes interconnected according to a tree network structure. The global tree network enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the tree via links to facilitate performance of low-latency global processing operations at nodes of the virtual tree and sub-tree structures. The global operations performed include one or more of: broadcast operations downstream from a root node to leaf nodes of a virtual tree, reduction operations upstream from leaf nodes to the root node in the virtual tree, and point-to-point message passing from any node to the root node. The global tree network is configurable to provide global barrier and interrupt functionality in asynchronous or synchronized manner, and, is physically and logically partitionable.
Multi-level tree analysis of pulmonary artery/vein trees in non-contrast CT images

NASA Astrophysics Data System (ADS)

Gao, Zhiyun; Grout, Randall W.; Hoffman, Eric A.; Saha, Punam K.

2012-02-01

Diseases like pulmonary embolism and pulmonary hypertension are associated with vascular dystrophy. Identifying such pulmonary artery/vein (A/V) tree dystrophy in terms of quantitative measures via CT imaging significantly facilitates early detection of disease or a treatment monitoring process. A tree structure, consisting of nodes and connected arcs, linked to the volumetric representation allows multi-level geometric and volumetric analysis of A/V trees. Here, a new theory and method is presented to generate multi-level A/V tree representation of volumetric data and to compute quantitative measures of A/V tree geometry and topology at various tree hierarchies. The new method is primarily designed on arc skeleton computation followed by a tree construction based topologic and geometric analysis of the skeleton. The method starts with a volumetric A/V representation as input and generates its topologic and multi-level volumetric tree representations long with different multi-level morphometric measures. A new recursive merging and pruning algorithms are introduced to detect bad junctions and noisy branches often associated with digital geometric and topologic analysis. Also, a new notion of shortest axial path is introduced to improve the skeletal arc joining two junctions. The accuracy of the multi-level tree analysis algorithm has been evaluated using computer generated phantoms and pulmonary CT images of a pig vessel cast phantom while the reproducibility of method is evaluated using multi-user A/V separation of in vivo contrast-enhanced CT images of a pig lung at different respiratory volumes.
The use of copulas to practical estimation of multivariate stochastic differential equation mixed effects models

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rupšys, P.

A system of stochastic differential equations (SDE) with mixed-effects parameters and multivariate normal copula density function were used to develop tree height model for Scots pine trees in Lithuania. A two-step maximum likelihood parameter estimation method is used and computational guidelines are given. After fitting the conditional probability density functions to outside bark diameter at breast height, and total tree height, a bivariate normal copula distribution model was constructed. Predictions from the mixed-effects parameters SDE tree height model calculated during this research were compared to the regression tree height equations. The results are implemented in the symbolic computational language MAPLE.
dCITE: Measuring Necessary Cladistic Information Can Help You Reduce Polytomy Artefacts in Trees.

PubMed

Wise, Michael J

2016-01-01

Biologists regularly create phylogenetic trees to better understand the evolutionary origins of their species of interest, and often use genomes as their data source. However, as more and more incomplete genomes are published, in many cases it may not be possible to compute genome-based phylogenetic trees due to large gaps in the assembled sequences. In addition, comparison of complete genomes may not even be desirable due to the presence of horizontally acquired and homologous genes. A decision must therefore be made about which gene, or gene combinations, should be used to compute a tree. Deflated Cladistic Information based on Total Entropy (dCITE) is proposed as an easily computed metric for measuring the cladistic information in multiple sequence alignments representing a range of taxa, without the need to first compute the corresponding trees. dCITE scores can be used to rank candidate genes or decide whether input sequences provide insufficient cladistic information, making artefactual polytomies more likely. The dCITE method can be applied to protein, nucleotide or encoded phenotypic data, so can be used to select which data-type is most appropriate, given the choice. In a series of experiments the dCITE method was compared with related measures. Then, as a practical demonstration, the ideas developed in the paper were applied to a dataset representing species from the order Campylobacterales; trees based on sequence combinations, selected on the basis of their dCITE scores, were compared with a tree constructed to mimic Multi-Locus Sequence Typing (MLST) combinations of fragments. We see that the greater the dCITE score the more likely it is that the computed phylogenetic tree will be free of artefactual polytomies. Secondly, cladistic information saturates, beyond which little additional cladistic information can be obtained by adding additional sequences. Finally, sequences with high cladistic information produce more consistent trees for the same taxa.
dCITE: Measuring Necessary Cladistic Information Can Help You Reduce Polytomy Artefacts in Trees

PubMed Central

2016-01-01

Biologists regularly create phylogenetic trees to better understand the evolutionary origins of their species of interest, and often use genomes as their data source. However, as more and more incomplete genomes are published, in many cases it may not be possible to compute genome-based phylogenetic trees due to large gaps in the assembled sequences. In addition, comparison of complete genomes may not even be desirable due to the presence of horizontally acquired and homologous genes. A decision must therefore be made about which gene, or gene combinations, should be used to compute a tree. Deflated Cladistic Information based on Total Entropy (dCITE) is proposed as an easily computed metric for measuring the cladistic information in multiple sequence alignments representing a range of taxa, without the need to first compute the corresponding trees. dCITE scores can be used to rank candidate genes or decide whether input sequences provide insufficient cladistic information, making artefactual polytomies more likely. The dCITE method can be applied to protein, nucleotide or encoded phenotypic data, so can be used to select which data-type is most appropriate, given the choice. In a series of experiments the dCITE method was compared with related measures. Then, as a practical demonstration, the ideas developed in the paper were applied to a dataset representing species from the order Campylobacterales; trees based on sequence combinations, selected on the basis of their dCITE scores, were compared with a tree constructed to mimic Multi-Locus Sequence Typing (MLST) combinations of fragments. We see that the greater the dCITE score the more likely it is that the computed phylogenetic tree will be free of artefactual polytomies. Secondly, cladistic information saturates, beyond which little additional cladistic information can be obtained by adding additional sequences. Finally, sequences with high cladistic information produce more consistent trees for the same taxa. PMID:27898695
Efficient tree tensor network states (TTNS) for quantum chemistry: Generalizations of the density matrix renormalization group algorithm

NASA Astrophysics Data System (ADS)

Nakatani, Naoki; Chan, Garnet Kin-Lic

2013-04-01

We investigate tree tensor network states for quantum chemistry. Tree tensor network states represent one of the simplest generalizations of matrix product states and the density matrix renormalization group. While matrix product states encode a one-dimensional entanglement structure, tree tensor network states encode a tree entanglement structure, allowing for a more flexible description of general molecules. We describe an optimal tree tensor network state algorithm for quantum chemistry. We introduce the concept of half-renormalization which greatly improves the efficiency of the calculations. Using our efficient formulation we demonstrate the strengths and weaknesses of tree tensor network states versus matrix product states. We carry out benchmark calculations both on tree systems (hydrogen trees and π-conjugated dendrimers) as well as non-tree molecules (hydrogen chains, nitrogen dimer, and chromium dimer). In general, tree tensor network states require much fewer renormalized states to achieve the same accuracy as matrix product states. In non-tree molecules, whether this translates into a computational savings is system dependent, due to the higher prefactor and computational scaling associated with tree algorithms. In tree like molecules, tree network states are easily superior to matrix product states. As an illustration, our largest dendrimer calculation with tree tensor network states correlates 110 electrons in 110 active orbitals.
Coalescent histories for caterpillar-like families.

PubMed

Rosenberg, Noah A

2013-01-01

A coalescent history is an assignment of branches of a gene tree to branches of a species tree on which coalescences in the gene tree occur. The number of coalescent histories for a pair consisting of a labeled gene tree topology and a labeled species tree topology is important in gene tree probability computations, and more generally, in studying evolutionary possibilities for gene trees on species trees. Defining the Tr-caterpillar-like family as a sequence of n-taxon trees constructed by replacing the r-taxon subtree of n-taxon caterpillars by a specific r-taxon labeled topology Tr, we examine the number of coalescent histories for caterpillar-like families with matching gene tree and species tree labeled topologies. For each Tr with size r≤8, we compute the number of coalescent histories for n-taxon trees in the Tr-caterpillar-like family. Next, as n→∞, we find that the limiting ratio of the numbers of coalescent histories for the Tr family and caterpillars themselves is correlated with the number of labeled histories for Tr. The results support a view that large numbers of coalescent histories occur when a tree has both a relatively balanced subtree and a high tree depth, contributing to deeper understanding of the combinatorics of gene trees and species trees.
A fast bottom-up algorithm for computing the cut sets of noncoherent fault trees

DOE Office of Scientific and Technical Information (OSTI.GOV)

Corynen, G.C.

1987-11-01

An efficient procedure for finding the cut sets of large fault trees has been developed. Designed to address coherent or noncoherent systems, dependent events, shared or common-cause events, the method - called SHORTCUT - is based on a fast algorithm for transforming a noncoherent tree into a quasi-coherent tree (COHERE), and on a new algorithm for reducing cut sets (SUBSET). To assure sufficient clarity and precision, the procedure is discussed in the language of simple sets, which is also developed in this report. Although the new method has not yet been fully implemented on the computer, we report theoretical worst-casemore » estimates of its computational complexity. 12 refs., 10 figs.« less
Locating hardware faults in a parallel computer

DOEpatents

Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.

2010-04-13

Locating hardware faults in a parallel computer, including defining within a tree network of the parallel computer two or more sets of non-overlapping test levels of compute nodes of the network that together include all the data communications links of the network, each non-overlapping test level comprising two or more adjacent tiers of the tree; defining test cells within each non-overlapping test level, each test cell comprising a subtree of the tree including a subtree root compute node and all descendant compute nodes of the subtree root compute node within a non-overlapping test level; performing, separately on each set of non-overlapping test levels, an uplink test on all test cells in a set of non-overlapping test levels; and performing, separately from the uplink tests and separately on each set of non-overlapping test levels, a downlink test on all test cells in a set of non-overlapping test levels.
A Computational Model for Path Loss in Wireless Sensor Networks in Orchard Environments

PubMed Central

Anastassiu, Hristos T.; Vougioukas, Stavros; Fronimos, Theodoros; Regen, Christian; Petrou, Loukas; Zude, Manuela; Käthner, Jana

2014-01-01

A computational model for radio wave propagation through tree orchards is presented. Trees are modeled as collections of branches, geometrically approximated by cylinders, whose dimensions are determined on the basis of measurements in a cherry orchard. Tree canopies are modeled as dielectric spheres of appropriate size. A single row of trees was modeled by creating copies of a representative tree model positioned on top of a rectangular, lossy dielectric slab that simulated the ground. The complete scattering model, including soil and trees, enhanced by periodicity conditions corresponding to the array, was characterized via a commercial computational software tool for simulating the wave propagation by means of the Finite Element Method. The attenuation of the simulated signal was compared to measurements taken in the cherry orchard, using two ZigBee receiver-transmitter modules. Near the top of the tree canopies (at 3 m), the predicted attenuation was close to the measured one—just slightly underestimated. However, at 1.5 m the solver underestimated the measured attenuation significantly, especially when leaves were present and, as distances grew longer. This suggests that the effects of scattering from neighboring tree rows need to be incorporated into the model. However, complex geometries result in ill conditioned linear systems that affect the solver's convergence. PMID:24625738
Theory and Programs for Dynamic Modeling of Tree Rings from Climate

Treesearch

Paul C. van Deusen; Jennifer Koretz

1988-01-01

Computer programs written in GAUSS(TM) for IBM compatible personal computers are described that perform dynamic tree ring modeling with climate data; the underlying theory is also described. The programs and a separate users manual are available from the authors, although users must have the GAUSS software package on their personal computer. An example application of...

ANTLR Tree Grammar Generator and Extensions

NASA Technical Reports Server (NTRS)

Craymer, Loring

2005-01-01

A computer program implements two extensions of ANTLR (Another Tool for Language Recognition), which is a set of software tools for translating source codes between different computing languages. ANTLR supports predicated- LL(k) lexer and parser grammars, a notation for annotating parser grammars to direct tree construction, and predicated tree grammars. [ LL(k) signifies left-right, leftmost derivation with k tokens of look-ahead, referring to certain characteristics of a grammar.] One of the extensions is a syntax for tree transformations. The other extension is the generation of tree grammars from annotated parser or input tree grammars. These extensions can simplify the process of generating source-to-source language translators and they make possible an approach, called "polyphase parsing," to translation between computing languages. The typical approach to translator development is to identify high-level semantic constructs such as "expressions," "declarations," and "definitions" as fundamental building blocks in the grammar specification used for language recognition. The polyphase approach is to lump ambiguous syntactic constructs during parsing and then disambiguate the alternatives in subsequent tree transformation passes. Polyphase parsing is believed to be useful for generating efficient recognizers for C++ and other languages that, like C++, have significant ambiguities.
User's Manual for Total-Tree Multiproduct Cruise Program

Treesearch

Alexander Clark; Thomas M. Burgan; Richard C. Field; Peter E. Dress

1985-01-01

This interactive computer program uses standard tree-cruise data to estimate the weight and volume of the total tree, saw logs, plylogs, chipping logs, pulpwood, crown firewood, and logging residue in timber stands.Input is cumulative cruise data for tree counts by d.b.h. and height. Output is in tables: board-foot volume by d.b.h.; total-tree and tree-component...
Coalescence computations for large samples drawn from populations of time-varying sizes

PubMed Central

Polanski, Andrzej; Szczesna, Agnieszka; Garbulowski, Mateusz; Kimmel, Marek

2017-01-01

We present new results concerning probability distributions of times in the coalescence tree and expected allele frequencies for coalescent with large sample size. The obtained results are based on computational methodologies, which involve combining coalescence time scale changes with techniques of integral transformations and using analytical formulae for infinite products. We show applications of the proposed methodologies for computing probability distributions of times in the coalescence tree and their limits, for evaluation of accuracy of approximate expressions for times in the coalescence tree and expected allele frequencies, and for analysis of large human mitochondrial DNA dataset. PMID:28170404
Simple chained guide trees give high-quality protein multiple sequence alignments

PubMed Central

Boyce, Kieran; Sievers, Fabian; Higgins, Desmond G.

2014-01-01

Guide trees are used to decide the order of sequence alignment in the progressive multiple sequence alignment heuristic. These guide trees are often the limiting factor in making large alignments, and considerable effort has been expended over the years in making these quickly or accurately. In this article we show that, at least for protein families with large numbers of sequences that can be benchmarked with known structures, simple chained guide trees give the most accurate alignments. These also happen to be the fastest and simplest guide trees to construct, computationally. Such guide trees have a striking effect on the accuracy of alignments produced by some of the most widely used alignment packages. There is a marked increase in accuracy and a marked decrease in computational time, once the number of sequences goes much above a few hundred. This is true, even if the order of sequences in the guide tree is random. PMID:25002495
The explicit computation of integration algorithms and first integrals for ordinary differential equations with polynomials coefficients using trees

NASA Technical Reports Server (NTRS)

Crouch, P. E.; Grossman, Robert

1992-01-01

This note is concerned with the explicit symbolic computation of expressions involving differential operators and their actions on functions. The derivation of specialized numerical algorithms, the explicit symbolic computation of integrals of motion, and the explicit computation of normal forms for nonlinear systems all require such computations. More precisely, if R = k(x(sub 1),...,x(sub N)), where k = R or C, F denotes a differential operator with coefficients from R, and g member of R, we describe data structures and algorithms for efficiently computing g. The basic idea is to impose a multiplicative structure on the vector space with basis the set of finite rooted trees and whose nodes are labeled with the coefficients of the differential operators. Cancellations of two trees with r + 1 nodes translates into cancellation of O(N(exp r)) expressions involving the coefficient functions and their derivatives.
Tree volume and biomass equations for the Lake States.

Treesearch

Jerold T. Hahn

1984-01-01

Presents species specific equations and methods for computing tree height, cubic foot, and board foot volume, and biomass for the Lake States (Michigan, Minnesota, and Wisconsin). Height equations compute either total or merchantable height to a variable top d.o.b. from d.b.h., site index, and basal area. Volumes and biomass are computed from d.b.h. and height.
The Reliability and Stability of an Inferred Phylogenetic Tree from Empirical Data.

PubMed

Katsura, Yukako; Stanley, Craig E; Kumar, Sudhir; Nei, Masatoshi

2017-03-01

The reliability of a phylogenetic tree obtained from empirical data is usually measured by the bootstrap probability (Pb) of interior branches of the tree. If the bootstrap probability is high for most branches, the tree is considered to be reliable. If some interior branches show relatively low bootstrap probabilities, we are not sure that the inferred tree is really reliable. Here, we propose another quantity measuring the reliability of the tree called the stability of a subtree. This quantity refers to the probability of obtaining a subtree (Ps) of an inferred tree obtained. We then show that if the tree is to be reliable, both Pb and Ps must be high. We also show that Ps is given by a bootstrap probability of the subtree with the closest outgroup sequence, and computer program RESTA for computing the Pb and Ps values will be presented. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Challenges in Species Tree Estimation Under the Multispecies Coalescent Model

PubMed Central

Xu, Bo; Yang, Ziheng

2016-01-01

The multispecies coalescent (MSC) model has emerged as a powerful framework for inferring species phylogenies while accounting for ancestral polymorphism and gene tree-species tree conflict. A number of methods have been developed in the past few years to estimate the species tree under the MSC. The full likelihood methods (including maximum likelihood and Bayesian inference) average over the unknown gene trees and accommodate their uncertainties properly but involve intensive computation. The approximate or summary coalescent methods are computationally fast and are applicable to genomic datasets with thousands of loci, but do not make an efficient use of information in the multilocus data. Most of them take the two-step approach of reconstructing the gene trees for multiple loci by phylogenetic methods and then treating the estimated gene trees as observed data, without accounting for their uncertainties appropriately. In this article we review the statistical nature of the species tree estimation problem under the MSC, and explore the conceptual issues and challenges of species tree estimation by focusing mainly on simple cases of three or four closely related species. We use mathematical analysis and computer simulation to demonstrate that large differences in statistical performance may exist between the two classes of methods. We illustrate that several counterintuitive behaviors may occur with the summary methods but they are due to inefficient use of information in the data by summary methods and vanish when the data are analyzed using full-likelihood methods. These include (i) unidentifiability of parameters in the model, (ii) inconsistency in the so-called anomaly zone, (iii) singularity on the likelihood surface, and (iv) deterioration of performance upon addition of more data. We discuss the challenges and strategies of species tree inference for distantly related species when the molecular clock is violated, and highlight the need for improving the computational efficiency and model realism of the likelihood methods as well as the statistical efficiency of the summary methods. PMID:27927902
Benefits and costs of street trees in Lisbon

Treesearch

A.L. Soares; F.C. Rego; E.G. McPherson; J.R. Simpson; P.J. Peper; Q. Xiao

2011-01-01

It is well known that urban trees produce various types of benefits and costs. The computer tool i-Tree STRATUM helps quantify tree structure and function, as well as the value of some of these tree services in different municipalities. This study describes one of the first applications of STRATUM outside the U.S. Lisbonâs street trees are dominated by Celtis australis...
Oxygen Distributions-Evaluation of Computational Methods, Using a Stochastic Model for Large Tumour Vasculature, to Elucidate the Importance of Considering a Complete Vascular Network.

PubMed

Lagerlöf, Jakob H; Bernhardt, Peter

2016-01-01

To develop a general model that utilises a stochastic method to generate a vessel tree based on experimental data, and an associated irregular, macroscopic tumour. These will be used to evaluate two different methods for computing oxygen distribution. A vessel tree structure, and an associated tumour of 127 cm3, were generated, using a stochastic method and Bresenham's line algorithm to develop trees on two different scales and fusing them together. The vessel dimensions were adjusted through convolution and thresholding and each vessel voxel was assigned an oxygen value. Diffusion and consumption were modelled using a Green's function approach together with Michaelis-Menten kinetics. The computations were performed using a combined tree method (CTM) and an individual tree method (ITM). Five tumour sub-sections were compared, to evaluate the methods. The oxygen distributions of the same tissue samples, using different methods of computation, were considerably less similar (root mean square deviation, RMSD≈0.02) than the distributions of different samples using CTM (0.001< RMSD<0.01). The deviations of ITM from CTM increase with lower oxygen values, resulting in ITM severely underestimating the level of hypoxia in the tumour. Kolmogorov Smirnov (KS) tests showed that millimetre-scale samples may not represent the whole. The stochastic model managed to capture the heterogeneous nature of hypoxic fractions and, even though the simplified computation did not considerably alter the oxygen distribution, it leads to an evident underestimation of tumour hypoxia, and thereby radioresistance. For a trustworthy computation of tumour oxygenation, the interaction between adjacent microvessel trees must not be neglected, why evaluation should be made using high resolution and the CTM, applied to the entire tumour.
Oxygen Distributions—Evaluation of Computational Methods, Using a Stochastic Model for Large Tumour Vasculature, to Elucidate the Importance of Considering a Complete Vascular Network

PubMed Central

Bernhardt, Peter

2016-01-01

Purpose To develop a general model that utilises a stochastic method to generate a vessel tree based on experimental data, and an associated irregular, macroscopic tumour. These will be used to evaluate two different methods for computing oxygen distribution. Methods A vessel tree structure, and an associated tumour of 127 cm3, were generated, using a stochastic method and Bresenham’s line algorithm to develop trees on two different scales and fusing them together. The vessel dimensions were adjusted through convolution and thresholding and each vessel voxel was assigned an oxygen value. Diffusion and consumption were modelled using a Green’s function approach together with Michaelis-Menten kinetics. The computations were performed using a combined tree method (CTM) and an individual tree method (ITM). Five tumour sub-sections were compared, to evaluate the methods. Results The oxygen distributions of the same tissue samples, using different methods of computation, were considerably less similar (root mean square deviation, RMSD≈0.02) than the distributions of different samples using CTM (0.001< RMSD<0.01). The deviations of ITM from CTM increase with lower oxygen values, resulting in ITM severely underestimating the level of hypoxia in the tumour. Kolmogorov Smirnov (KS) tests showed that millimetre-scale samples may not represent the whole. Conclusions The stochastic model managed to capture the heterogeneous nature of hypoxic fractions and, even though the simplified computation did not considerably alter the oxygen distribution, it leads to an evident underestimation of tumour hypoxia, and thereby radioresistance. For a trustworthy computation of tumour oxygenation, the interaction between adjacent microvessel trees must not be neglected, why evaluation should be made using high resolution and the CTM, applied to the entire tumour. PMID:27861529
Quasi-Optimal Elimination Trees for 2D Grids with Singularities

DOE PAGES

Paszyńska, A.; Paszyński, M.; Jopek, K.; ...

2015-01-01

We consmore » truct quasi-optimal elimination trees for 2D finite element meshes with singularities. These trees minimize the complexity of the solution of the discrete system. The computational cost estimates of the elimination process model the execution of the multifrontal algorithms in serial and in parallel shared-memory executions. Since the meshes considered are a subspace of all possible mesh partitions, we call these minimizers quasi-optimal. We minimize the cost functionals using dynamic programming. Finding these minimizers is more computationally expensive than solving the original algebraic system. Nevertheless, from the insights provided by the analysis of the dynamic programming minima, we propose a heuristic construction of the elimination trees that has cost O N e log ⁡ N e , where N e is the number of elements in the mesh. We show that this heuristic ordering has similar computational cost to the quasi-optimal elimination trees found with dynamic programming and outperforms state-of-the-art alternatives in our numerical experiments.« less
Quasi-Optimal Elimination Trees for 2D Grids with Singularities

DOE Office of Scientific and Technical Information (OSTI.GOV)

Paszyńska, A.; Paszyński, M.; Jopek, K.

We consmore » truct quasi-optimal elimination trees for 2D finite element meshes with singularities. These trees minimize the complexity of the solution of the discrete system. The computational cost estimates of the elimination process model the execution of the multifrontal algorithms in serial and in parallel shared-memory executions. Since the meshes considered are a subspace of all possible mesh partitions, we call these minimizers quasi-optimal. We minimize the cost functionals using dynamic programming. Finding these minimizers is more computationally expensive than solving the original algebraic system. Nevertheless, from the insights provided by the analysis of the dynamic programming minima, we propose a heuristic construction of the elimination trees that has cost O N e log ⁡ N e , where N e is the number of elements in the mesh. We show that this heuristic ordering has similar computational cost to the quasi-optimal elimination trees found with dynamic programming and outperforms state-of-the-art alternatives in our numerical experiments.« less
Calibrated birth-death phylogenetic time-tree priors for bayesian inference.

PubMed

Heled, Joseph; Drummond, Alexei J

2015-05-01

Here we introduce a general class of multiple calibration birth-death tree priors for use in Bayesian phylogenetic inference. All tree priors in this class separate ancestral node heights into a set of "calibrated nodes" and "uncalibrated nodes" such that the marginal distribution of the calibrated nodes is user-specified whereas the density ratio of the birth-death prior is retained for trees with equal values for the calibrated nodes. We describe two formulations, one in which the calibration information informs the prior on ranked tree topologies, through the (conditional) prior, and the other which factorizes the prior on divergence times and ranked topologies, thus allowing uniform, or any arbitrary prior distribution on ranked topologies. Although the first of these formulations has some attractive properties, the algorithm we present for computing its prior density is computationally intensive. However, the second formulation is always faster and computationally efficient for up to six calibrations. We demonstrate the utility of the new class of multiple-calibration tree priors using both small simulations and a real-world analysis and compare the results to existing schemes. The two new calibrated tree priors described in this article offer greater flexibility and control of prior specification in calibrated time-tree inference and divergence time dating, and will remove the need for indirect approaches to the assessment of the combined effect of calibration densities and tree priors in Bayesian phylogenetic inference. © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
Decision tree and ensemble learning algorithms with their applications in bioinformatics.

PubMed

Che, Dongsheng; Liu, Qi; Rasheed, Khaled; Tao, Xiuping

2011-01-01

Machine learning approaches have wide applications in bioinformatics, and decision tree is one of the successful approaches applied in this field. In this chapter, we briefly review decision tree and related ensemble algorithms and show the successful applications of such approaches on solving biological problems. We hope that by learning the algorithms of decision trees and ensemble classifiers, biologists can get the basic ideas of how machine learning algorithms work. On the other hand, by being exposed to the applications of decision trees and ensemble algorithms in bioinformatics, computer scientists can get better ideas of which bioinformatics topics they may work on in their future research directions. We aim to provide a platform to bridge the gap between biologists and computer scientists.
An automated approach to the design of decision tree classifiers

NASA Technical Reports Server (NTRS)

Argentiero, P.; Chin, R.; Beaudet, P.

1982-01-01

An automated technique is presented for designing effective decision tree classifiers predicated only on a priori class statistics. The procedure relies on linear feature extractions and Bayes table look-up decision rules. Associated error matrices are computed and utilized to provide an optimal design of the decision tree at each so-called 'node'. A by-product of this procedure is a simple algorithm for computing the global probability of correct classification assuming the statistical independence of the decision rules. Attention is given to a more precise definition of decision tree classification, the mathematical details on the technique for automated decision tree design, and an example of a simple application of the procedure using class statistics acquired from an actual Landsat scene.
Phylogenetic tree and community structure from a Tangled Nature model.

PubMed

Canko, Osman; Taşkın, Ferhat; Argın, Kamil

2015-10-07

In evolutionary biology, the taxonomy and origination of species are widely studied subjects. An estimation of the evolutionary tree can be done via available DNA sequence data. The calculation of the tree is made by well-known and frequently used methods such as maximum likelihood and neighbor-joining. In order to examine the results of these methods, an evolutionary tree is pursued computationally by a mathematical model, called Tangled Nature. A relatively small genome space is investigated due to computational burden and it is found that the actual and predicted trees are in reasonably good agreement in terms of shape. Moreover, the speciation and the resulting community structure of the food-web are investigated by modularity. Copyright © 2015 Elsevier Ltd. All rights reserved.
Effects of competitor spacing in a new class of individual-tree indices of competition: semidistance-independent indices computed for Bitterlich versus fixed-area plots

Treesearch

Albert R. Stage; Thomas Ledermann

2008-01-01

We illustrate effects of competitor spacing for a new class of individual-tree indices of competition that we call semi-distance-independent. This new class is similar to the class of distance-independent indices except that the index is computed independently at each subsampling plot surrounding a subject tree for which growth is to be modelled. We derive the effects...
Efficiency of the neighbor-joining method in reconstructing deep and shallow evolutionary relationships in large phylogenies.

PubMed

Kumar, S; Gadagkar, S R

2000-12-01

The neighbor-joining (NJ) method is widely used in reconstructing large phylogenies because of its computational speed and the high accuracy in phylogenetic inference as revealed in computer simulation studies. However, most computer simulation studies have quantified the overall performance of the NJ method in terms of the percentage of branches inferred correctly or the percentage of replications in which the correct tree is recovered. We have examined other aspects of its performance, such as the relative efficiency in correctly reconstructing shallow (close to the external branches of the tree) and deep branches in large phylogenies; the contribution of zero-length branches to topological errors in the inferred trees; and the influence of increasing the tree size (number of sequences), evolutionary rate, and sequence length on the efficiency of the NJ method. Results show that the correct reconstruction of deep branches is no more difficult than that of shallower branches. The presence of zero-length branches in realized trees contributes significantly to the overall error observed in the NJ tree, especially in large phylogenies or slowly evolving genes. Furthermore, the tree size does not influence the efficiency of NJ in reconstructing shallow and deep branches in our simulation study, in which the evolutionary process is assumed to be homogeneous in all lineages.
GSHR-Tree: a spatial index tree based on dynamic spatial slot and hash table in grid environments

NASA Astrophysics Data System (ADS)

Chen, Zhanlong; Wu, Xin-cai; Wu, Liang

2008-12-01

Computation Grids enable the coordinated sharing of large-scale distributed heterogeneous computing resources that can be used to solve computationally intensive problems in science, engineering, and commerce. Grid spatial applications are made possible by high-speed networks and a new generation of Grid middleware that resides between networks and traditional GIS applications. The integration of the multi-sources and heterogeneous spatial information and the management of the distributed spatial resources and the sharing and cooperative of the spatial data and Grid services are the key problems to resolve in the development of the Grid GIS. The performance of the spatial index mechanism is the key technology of the Grid GIS and spatial database affects the holistic performance of the GIS in Grid Environments. In order to improve the efficiency of parallel processing of a spatial mass data under the distributed parallel computing grid environment, this paper presents a new grid slot hash parallel spatial index GSHR-Tree structure established in the parallel spatial indexing mechanism. Based on the hash table and dynamic spatial slot, this paper has improved the structure of the classical parallel R tree index. The GSHR-Tree index makes full use of the good qualities of R-Tree and hash data structure. This paper has constructed a new parallel spatial index that can meet the needs of parallel grid computing about the magnanimous spatial data in the distributed network. This arithmetic splits space in to multi-slots by multiplying and reverting and maps these slots to sites in distributed and parallel system. Each sites constructs the spatial objects in its spatial slot into an R tree. On the basis of this tree structure, the index data was distributed among multiple nodes in the grid networks by using large node R-tree method. The unbalance during process can be quickly adjusted by means of a dynamical adjusting algorithm. This tree structure has considered the distributed operation, reduplication operation transfer operation of spatial index in the grid environment. The design of GSHR-Tree has ensured the performance of the load balance in the parallel computation. This tree structure is fit for the parallel process of the spatial information in the distributed network environments. Instead of spatial object's recursive comparison where original R tree has been used, the algorithm builds the spatial index by applying binary code operation in which computer runs more efficiently, and extended dynamic hash code for bit comparison. In GSHR-Tree, a new server is assigned to the network whenever a split of a full node is required. We describe a more flexible allocation protocol which copes with a temporary shortage of storage resources. It uses a distributed balanced binary spatial tree that scales with insertions to potentially any number of storage servers through splits of the overloaded ones. The application manipulates the GSHR-Tree structure from a node in the grid environment. The node addresses the tree through its image that the splits can make outdated. This may generate addressing errors, solved by the forwarding among the servers. In this paper, a spatial index data distribution algorithm that limits the number of servers has been proposed. We improve the storage utilization at the cost of additional messages. The structure of GSHR-Tree is believed that the scheme of this grid spatial index should fit the needs of new applications using endlessly larger sets of spatial data. Our proposal constitutes a flexible storage allocation method for a distributed spatial index. The insertion policy can be tuned dynamically to cope with periods of storage shortage. In such cases storage balancing should be favored for better space utilization, at the price of extra message exchanges between servers. This structure makes a compromise in the updating of the duplicated index and the transformation of the spatial index data. Meeting the needs of the grid computing, GSHRTree has a flexible structure in order to satisfy new needs in the future. The GSHR-Tree provides the R-tree capabilities for large spatial datasets stored over interconnected servers. The analysis, including the experiments, confirmed the efficiency of our design choices. The scheme should fit the needs of new applications of spatial data, using endlessly larger datasets. Using the system response time of the parallel processing of spatial scope query algorithm as the performance evaluation factor, According to the result of the simulated the experiments, GSHR-Tree is performed to prove the reasonable design and the high performance of the indexing structure that the paper presented.

Complete and Draft Genome Sequences of Nine Lactobacillus sakei Strains Selected from the Three Known Phylogenetic Lineages and Their Main Clonal Complexes.

PubMed

Loux, Valentin; Coeuret, Gwendoline; Zagorec, Monique; Champomier Vergès, Marie-Christine; Chaillou, Stéphane

2018-04-19

We present here the complete and draft genome sequences of nine Lactobacillus sakei strains, selected from the entire range of clonal complexes from the three known lineages of the species. The strains were chosen to provide a wide view of pangenomic and plasmidic diversity for this important foodborne species. Copyright © 2018 Loux et al.
Comparative genomic and plasmid analysis of beer-spoiling and non-beer-spoiling Lactobacillus brevis isolates.

PubMed

Bergsveinson, Jordyn; Ziola, Barry

2017-12-01

Beer-spoilage-related lactic acid bacteria (BSR LAB) belong to multiple genera and species; however, beer-spoilage capacity is isolate-specific and partially acquired via horizontal gene transfer within the brewing environment. Thus, the extent to which genus-, species-, or environment- (i.e., brewery-) level genetic variability influences beer-spoilage phenotype is unknown. Publicly available Lactobacillus brevis genomes were analyzed via BlAst Diagnostic Gene findEr (BADGE) for BSR genes and assessed for pangenomic relationships. Also analyzed were functional coding capacities of plasmids of LAB inhabiting extreme niche environments. Considerable genetic variation was observed in L. brevis isolated from clinical samples, whereas 16 candidate genes distinguish BSR and non-BSR L. brevis genomes. These genes are related to nutrient scavenging of gluconate or pentoses, mannose, and metabolism of pectin. BSR L. brevis isolates also have higher average nucleotide identity and stronger pangenome association with one another, though isolation source (i.e., specific brewery) also appears to influence the plasmid coding capacity of BSR LAB. Finally, it is shown that niche-specific adaptation and phenotype are plasmid-encoded for both BSR and non-BSR LAB. The ultimate combination of plasmid-encoded genes dictates the ability of L. brevis to survive in the most extreme beer environment, namely, gassed (i.e., pressurized) beer.
An in silico pan-genomic probe for the molecular traits behind Lactobacillus ruminis gut autochthony.

PubMed

Kant, Ravi; Palva, Airi; von Ossowski, Ingemar

2017-01-01

As an ecological niche, the mammalian intestine provides the ideal habitat for a variety of bacterial microorganisms. Purportedly, some commensal genera and species offer a beneficial mix of metabolic, protective, and structural processes that help sustain the natural digestive health of the host. Among these sort of gut inhabitants is the Gram-positive lactic acid bacterium Lactobacillus ruminis, a strict anaerobe with both pili and flagella on its cell surface, but also known for being autochthonous (indigenous) to the intestinal environment. Given that the molecular basis of gut autochthony for this species is largely unexplored and unknown, we undertook a study at the genome level to pinpoint some of the adaptive traits behind its colonization behavior. In our pan-genomic probe of L. ruminis, the genomes of nine different strains isolated from human, bovine, porcine, and equine host guts were compiled and compared for in silico analysis. For this, we conducted a geno-phenotypic assessment of protein-coding genes, with an emphasis on those products involved with cell-surface morphology and anaerobic fermentation and respiration. We also categorized and examined the core and accessory genes that define the L. ruminis species and its strains. Here, we made an attempt to identify those genes having ecologically relevant phenotypes that might support or bring about intestinal indigenousness.
Analysis of the core genome and pangenome of Pseudomonas putida.

PubMed

Udaondo, Zulema; Molina, Lázaro; Segura, Ana; Duque, Estrella; Ramos, Juan L

2016-10-01

Pseudomonas putida are strict aerobes that proliferate in a range of temperate niches and are of interest for environmental applications due to their capacity to degrade pollutants and ability to promote plant growth. Furthermore solvent-tolerant strains are useful for biosynthesis of added-value chemicals. We present a comprehensive comparative analysis of nine strains and the first characterization of the Pseudomonas putida pangenome. The core genome of P. putida comprises approximately 3386 genes. The most abundant genes within the core genome are those that encode nutrient transporters. Other conserved genes include those for central carbon metabolism through the Entner-Doudoroff pathway, the pentose phosphate cycle, arginine and proline metabolism, and pathways for degradation of aromatic chemicals. Genes that encode transporters, enzymes and regulators for amino acid metabolism (synthesis and degradation) are all part of the core genome, as well as various electron transporters, which enable aerobic metabolism under different oxygen regimes. Within the core genome are 30 genes for flagella biosynthesis and 12 key genes for biofilm formation. Pseudomonas putida strains share 85% of the coding regions with Pseudomonas aeruginosa; however, in P. putida, virulence factors such as exotoxins and type III secretion systems are absent. © 2015 Society for Applied Microbiology and John Wiley & Sons Ltd.
Physarum machines: encapsulating reaction-diffusion to compute spanning tree

NASA Astrophysics Data System (ADS)

Adamatzky, Andrew

2007-12-01

The Physarum machine is a biological computing device, which employs plasmodium of Physarum polycephalum as an unconventional computing substrate. A reaction-diffusion computer is a chemical computing device that computes by propagating diffusive or excitation wave fronts. Reaction-diffusion computers, despite being computationally universal machines, are unable to construct certain classes of proximity graphs without the assistance of an external computing device. I demonstrate that the problem can be solved if the reaction-diffusion system is enclosed in a membrane with few ‘growth points’, sites guiding the pattern propagation. Experimental approximation of spanning trees by P. polycephalum slime mold demonstrates the feasibility of the approach. Findings provided advance theory of reaction-diffusion computation by enriching it with ideas of slime mold computation.
A tree-parenchyma coupled model for lung ventilation simulation.

PubMed

Pozin, Nicolas; Montesantos, Spyridon; Katz, Ira; Pichelin, Marine; Vignon-Clementel, Irene; Grandmont, Céline

2017-11-01

In this article, we develop a lung ventilation model. The parenchyma is described as an elastic homogenized media. It is irrigated by a space-filling dyadic resistive pipe network, which represents the tracheobronchial tree. In this model, the tree and the parenchyma are strongly coupled. The tree induces an extra viscous term in the system constitutive relation, which leads, in the finite element framework, to a full matrix. We consider an efficient algorithm that takes advantage of the tree structure to enable a fast matrix-vector product computation. This framework can be used to model both free and mechanically induced respiration, in health and disease. Patient-specific lung geometries acquired from computed tomography scans are considered. Realistic Dirichlet boundary conditions can be deduced from surface registration on computed tomography images. The model is compared to a more classical exit compartment approach. Results illustrate the coupling between the tree and the parenchyma, at global and regional levels, and how conditions for the purely 0D model can be inferred. Different types of boundary conditions are tested, including a nonlinear Robin model of the surrounding lung structures. Copyright © 2017 John Wiley & Sons, Ltd.
Infinite tension limit of the pure spinor superstring

NASA Astrophysics Data System (ADS)

Berkovits, Nathan

2014-03-01

Mason and Skinner recently constructed a chiral infinite tension limit of the Ramond-Neveu-Schwarz superstring which was shown to compute the Cachazo-He-Yuan formulae for tree-level d = 10 Yang-Mills amplitudes and the NS-NS sector of tree-level d = 10 supergravity amplitudes. In this letter, their chiral infinite tension limit is generalized to the pure spinor superstring which computes a d = 10 superspace version of the Cachazo-He-Yuan formulae for tree-level d = 10 super-Yang-Mills and supergravity amplitudes.
A method of estimating cubic volume in felled trees

Treesearch

Frederick E. Hampf; C. Allen Bickford

1959-01-01

This paper describes a new method of computing the gross volume of felled trees or sections of trees. The method has many applications, but most are limited to forest research and management, especially in making surveys.
On the Number of Non-equivalent Ancestral Configurations for Matching Gene Trees and Species Trees.

PubMed

Disanto, Filippo; Rosenberg, Noah A

2017-09-14

An ancestral configuration is one of the combinatorially distinct sets of gene lineages that, for a given gene tree, can reach a given node of a specified species tree. Ancestral configurations have appeared in recursive algebraic computations of the conditional probability that a gene tree topology is produced under the multispecies coalescent model for a given species tree. For matching gene trees and species trees, we study the number of ancestral configurations, considered up to an equivalence relation introduced by Wu (Evolution 66:763-775, 2012) to reduce the complexity of the recursive probability computation. We examine the largest number of non-equivalent ancestral configurations possible for a given tree size n. Whereas the smallest number of non-equivalent ancestral configurations increases polynomially with n, we show that the largest number increases with [Formula: see text], where k is a constant that satisfies [Formula: see text]. Under a uniform distribution on the set of binary labeled trees with a given size n, the mean number of non-equivalent ancestral configurations grows exponentially with n. The results refine an earlier analysis of the number of ancestral configurations considered without applying the equivalence relation, showing that use of the equivalence relation does not alter the exponential nature of the increase with tree size.
HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

PubMed

Wan, Shixiang; Zou, Quan

2017-01-01

Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. Extreme increase in next-generation sequencing results in shortage of efficient ultra-large biological sequence alignment approaches for coping with different sequence types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files more than 1 GB) sequence analyses. Based on HAlign and Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. The experiments in the DNA and protein large scale data sets, which are more than 1GB files, showed that HAlign II could save time and space. It outperformed the current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resource. THAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.
Mathematical form models of tree trunks

Treesearch

Rudolfs Ozolins

2000-01-01

Assortment structure analysis of tree trunks is a characteristic and proper problem that can be solved by using mathematical modeling and standard computer programs. Mathematical form model of tree trunks consists of tapering curve equations and their parameters. Parameters for nine species were obtained by processing measurements of 2,794 model trees and studying the...
Exact solutions for species tree inference from discordant gene trees.

PubMed

Chang, Wen-Chieh; Górecki, Paweł; Eulenstein, Oliver

2013-10-01

Phylogenetic analysis has to overcome the grant challenge of inferring accurate species trees from evolutionary histories of gene families (gene trees) that are discordant with the species tree along whose branches they have evolved. Two well studied approaches to cope with this challenge are to solve either biologically informed gene tree parsimony (GTP) problems under gene duplication, gene loss, and deep coalescence, or the classic RF supertree problem that does not rely on any biological model. Despite the potential of these problems to infer credible species trees, they are NP-hard. Therefore, these problems are addressed by heuristics that typically lack any provable accuracy and precision. We describe fast dynamic programming algorithms that solve the GTP problems and the RF supertree problem exactly, and demonstrate that our algorithms can solve instances with data sets consisting of as many as 22 taxa. Extensions of our algorithms can also report the number of all optimal species trees, as well as the trees themselves. To better asses the quality of the resulting species trees that best fit the given gene trees, we also compute the worst case species trees, their numbers, and optimization score for each of the computational problems. Finally, we demonstrate the performance of our exact algorithms using empirical and simulated data sets, and analyze the quality of heuristic solutions for the studied problems by contrasting them with our exact solutions.
Development of a model of the coronary arterial tree for the 4D XCAT phantom

NASA Astrophysics Data System (ADS)

Fung, George S. K.; Segars, W. Paul; Gullberg, Grant T.; Tsui, Benjamin M. W.

2011-09-01

A detailed three-dimensional (3D) model of the coronary artery tree with cardiac motion has great potential for applications in a wide variety of medical imaging research areas. In this work, we first developed a computer-generated 3D model of the coronary arterial tree for the heart in the extended cardiac-torso (XCAT) phantom, thereby creating a realistic computer model of the human anatomy. The coronary arterial tree model was based on two datasets: (1) a gated cardiac dual-source computed tomography (CT) angiographic dataset obtained from a normal human subject and (2) statistical morphometric data of porcine hearts. The initial proximal segments of the vasculature and the anatomical details of the boundaries of the ventricles were defined by segmenting the CT data. An iterative rule-based generation method was developed and applied to extend the coronary arterial tree beyond the initial proximal segments. The algorithm was governed by three factors: (1) statistical morphometric measurements of the connectivity, lengths and diameters of the arterial segments; (2) avoidance forces from other vessel segments and the boundaries of the myocardium, and (3) optimality principles which minimize the drag force at the bifurcations of the generated tree. Using this algorithm, the 3D computational model of the largest six orders of the coronary arterial tree was generated, which spread across the myocardium of the left and right ventricles. The 3D coronary arterial tree model was then extended to 4D to simulate different cardiac phases by deforming the original 3D model according to the motion vector map of the 4D cardiac model of the XCAT phantom at the corresponding phases. As a result, a detailed and realistic 4D model of the coronary arterial tree was developed for the XCAT phantom by imposing constraints of anatomical and physiological characteristics of the coronary vasculature. This new 4D coronary artery tree model provides a unique simulation tool that can be used in the development and evaluation of instrumentation and methods for imaging normal and pathological hearts with myocardial perfusion defects.
Using Histories to Implement Atomic Objects

NASA Technical Reports Server (NTRS)

Ng, Pui

1987-01-01

In this paper we describe an approach of implementing atomicity. Atomicity requires that computations appear to be all-or-nothing and executed in a serialization order. The approach we describe has three characteristics. First, it utilizes the semantics of an application to improve concurrency. Second, it reduces the complexity of application-dependent synchronization code by analyzing the process of writing it. In fact, the process can be automated with logic programming. Third, our approach hides the protocol used to arrive at a serialization order from the applications. As a result, different protocols can be used without affecting the applications. Our approach uses a history tree abstraction. The history tree captures the ordering relationship among concurrent computations. By determining what types of computations exist in the history tree and their parameters, a computation can determine whether it can proceed.
Decision-Tree Program

NASA Technical Reports Server (NTRS)

Buntine, Wray

1994-01-01

IND computer program introduces Bayesian and Markov/maximum-likelihood (MML) methods and more-sophisticated methods of searching in growing trees. Produces more-accurate class-probability estimates important in applications like diagnosis. Provides range of features and styles with convenience for casual user, fine-tuning for advanced user or for those interested in research. Consists of four basic kinds of routines: data-manipulation, tree-generation, tree-testing, and tree-display. Written in C language.
Million Trees Los Angeles: Carbon dioxide sink or source?

Treesearch

E.G. McPherson; A. Kendall; S. Albers

2015-01-01

This study seeks to answer the question, 'Will the Million Trees LA (MTLA) programme be a CO2 sink or source?' Using surveys, interviews, field sampling and computer simulation of tree growth and survival over a 40-year period, we developed the first process-based life cycle inventory of CO2 for a large tree...
Machine Learning Through Signature Trees. Applications to Human Speech.

ERIC Educational Resources Information Center

White, George M.

A signature tree is a binary decision tree used to classify unknown patterns. An attempt was made to develop a computer program for manipulating signature trees as a general research tool for exploring machine learning and pattern recognition. The program was applied to the problem of speech recognition to test its effectiveness for a specific…
Differences in Computed Individual-Tree Volumes Caused by Differences in Field Measurements

Treesearch

James A. Westfall

2008-01-01

Individual-tree volumes are primarily predicted using volume equations that rely on measured tree attributes. In the northeastern United States, the Forest Inventory and Analysis program determines tree volume using dbh, bole height, proportion of cull, and species information. These measurements are subject to variability due to a host of factors. The sensitivity of...
SETs: stand evaluation tools: II. tree value conversion standards for hardwood sawtimber

Treesearch

Joseph J. Mendel; Paul S. DeBald; Martin E. Dale

1976-01-01

Tree quatity index tables are presented for 12 important hardwood species of the oak-hickory forest. From these, tree value conversion standards are developed for each species, log grade, merchantable height, and diameter at breast height. The method of calculating tree value conversion standards and adapting them to different conditions is explained. A computer...
On Tree-Based Phylogenetic Networks.

PubMed

Zhang, Louxin

2016-07-01

A large class of phylogenetic networks can be obtained from trees by the addition of horizontal edges between the tree edges. These networks are called tree-based networks. We present a simple necessary and sufficient condition for tree-based networks and prove that a universal tree-based network exists for any number of taxa that contains as its base every phylogenetic tree on the same set of taxa. This answers two problems posted by Francis and Steel recently. A byproduct is a computer program for generating random binary phylogenetic networks under the uniform distribution model.

Walking tree heuristics for biological string alignment, gene location, and phylogenies

NASA Astrophysics Data System (ADS)

Cull, P.; Holloway, J. L.; Cavener, J. D.

1999-03-01

Basic biological information is stored in strings of nucleic acids (DNA, RNA) or amino acids (proteins). Teasing out the meaning of these strings is a central problem of modern biology. Matching and aligning strings brings out their shared characteristics. Although string matching is well-understood in the edit-distance model, biological strings with transpositions and inversions violate this model's assumptions. We propose a family of heuristics called walking trees to align biologically reasonable strings. Both edit-distance and walking tree methods can locate specific genes within a large string when the genes' sequences are given. When we attempt to match whole strings, the walking tree matches most genes, while the edit-distance method fails. We also give examples in which the walking tree matches substrings even if they have been moved or inverted. The edit-distance method was not designed to handle these problems. We include an example in which the walking tree "discovered" a gene. Calculating scores for whole genome matches gives a method for approximating evolutionary distance. We show two evolutionary trees for the picornaviruses which were computed by the walking tree heuristic. Both of these trees show great similarity to previously constructed trees. The point of this demonstration is that WHOLE genomes can be matched and distances calculated. The first tree was created on a Sequent parallel computer and demonstrates that the walking tree heuristic can be efficiently parallelized. The second tree was created using a network of work stations and demonstrates that there is suffient parallelism in the phylogenetic tree calculation that the sequential walking tree can be used effectively on a network.
Fault-Tree Compiler Program

NASA Technical Reports Server (NTRS)

Butler, Ricky W.; Martensen, Anna L.

1992-01-01

FTC, Fault-Tree Compiler program, is reliability-analysis software tool used to calculate probability of top event of fault tree. Five different types of gates allowed in fault tree: AND, OR, EXCLUSIVE OR, INVERT, and M OF N. High-level input language of FTC easy to understand and use. Program supports hierarchical fault-tree-definition feature simplifying process of description of tree and reduces execution time. Solution technique implemented in FORTRAN, and user interface in Pascal. Written to run on DEC VAX computer operating under VMS operating system.
Cloud computing method for dynamically scaling a process across physical machine boundaries

DOEpatents

Gillen, Robert E.; Patton, Robert M.; Potok, Thomas E.; Rojas, Carlos C.

2014-09-02

A cloud computing platform includes first device having a graph or tree structure with a node which receives data. The data is processed by the node or communicated to a child node for processing. A first node in the graph or tree structure determines the reconfiguration of a portion of the graph or tree structure on a second device. The reconfiguration may include moving a second node and some or all of its descendant nodes. The second and descendant nodes may be copied to the second device.
Understanding the Scalability of Bayesian Network Inference Using Clique Tree Growth Curves

NASA Technical Reports Server (NTRS)

Mengshoel, Ole J.

2010-01-01

One of the main approaches to performing computation in Bayesian networks (BNs) is clique tree clustering and propagation. The clique tree approach consists of propagation in a clique tree compiled from a Bayesian network, and while it was introduced in the 1980s, there is still a lack of understanding of how clique tree computation time depends on variations in BN size and structure. In this article, we improve this understanding by developing an approach to characterizing clique tree growth as a function of parameters that can be computed in polynomial time from BNs, specifically: (i) the ratio of the number of a BN s non-root nodes to the number of root nodes, and (ii) the expected number of moral edges in their moral graphs. Analytically, we partition the set of cliques in a clique tree into different sets, and introduce a growth curve for the total size of each set. For the special case of bipartite BNs, there are two sets and two growth curves, a mixed clique growth curve and a root clique growth curve. In experiments, where random bipartite BNs generated using the BPART algorithm are studied, we systematically increase the out-degree of the root nodes in bipartite Bayesian networks, by increasing the number of leaf nodes. Surprisingly, root clique growth is well-approximated by Gompertz growth curves, an S-shaped family of curves that has previously been used to describe growth processes in biology, medicine, and neuroscience. We believe that this research improves the understanding of the scaling behavior of clique tree clustering for a certain class of Bayesian networks; presents an aid for trade-off studies of clique tree clustering using growth curves; and ultimately provides a foundation for benchmarking and developing improved BN inference and machine learning algorithms.
The growth of the UniTree mass storage system at the NASA Center for Computational Sciences

NASA Technical Reports Server (NTRS)

Tarshish, Adina; Salmon, Ellen

1993-01-01

In October 1992, the NASA Center for Computational Sciences made its Convex-based UniTree system generally available to users. The ensuing months saw the growth of near-online data from nil to nearly three terabytes, a doubling of the number of CPU's on the facility's Cray YMP (the primary data source for UniTree), and the necessity for an aggressive regimen for repacking sparse tapes and hierarchical 'vaulting' of old files to freestanding tape. Connectivity was enhanced as well with the addition of UltraNet HiPPI. This paper describes the increasing demands placed on the storage system's performance and throughput that resulted from the significant augmentation of compute-server processor power and network speed.
Finding Frequent Closed Itemsets in Sliding Window in Linear Time

NASA Astrophysics Data System (ADS)

Chen, Junbo; Zhou, Bo; Chen, Lu; Wang, Xinyu; Ding, Yiqun

One of the most well-studied problems in data mining is computing the collection of frequent itemsets in large transactional databases. Since the introduction of the famous Apriori algorithm [14], many others have been proposed to find the frequent itemsets. Among such algorithms, the approach of mining closed itemsets has raised much interest in data mining community. The algorithms taking this approach include TITANIC [8], CLOSET+[6], DCI-Closed [4], FCI-Stream [3], GC-Tree [15], TGC-Tree [16] etc. Among these algorithms, FCI-Stream, GC-Tree and TGC-Tree are online algorithms work under sliding window environments. By the performance evaluation in [16], GC-Tree [15] is the fastest one. In this paper, an improved algorithm based on GC-Tree is proposed, the computational complexity of which is proved to be a linear combination of the average transaction size and the average closed itemset size. The algorithm is based on the essential theorem presented in Sect. 4.2. Empirically, the new algorithm is several orders of magnitude faster than the state of art algorithm, GC-Tree.
Enumerating Substituted Benzene Isomers of Tree-Like Chemical Graphs.

PubMed

Li, Jinghui; Nagamochi, Hiroshi; Akutsu, Tatsuya

2018-01-01

Enumeration of chemical structures is useful for drug design, which is one of the main targets of computational biology and bioinformatics. A chemical graph with no other cycles than benzene rings is called tree-like, and becomes a tree possibly with multiple edges if we contract each benzene ring into a single virtual atom of valence 6. All tree-like chemical graphs with a given tree representation are called the substituted benzene isomers of . When we replace each virtual atom in with a benzene ring to obtain a substituted benzene isomer, distinct isomers of are caused by the difference in arrangements of atom groups around a benzene ring. In this paper, we propose an efficient algorithm that enumerates all substituted benzene isomers of a given tree representation . Our algorithm first counts the number of all the isomers of the tree representation by a dynamic programming method. To enumerate all the isomers, for each , our algorithm then generates the th isomer by backtracking the counting phase of the dynamic programming. We also implemented our algorithm for computational experiments.
Tree-structured information file and its subprogram subtree

NASA Technical Reports Server (NTRS)

Mesztenyi, C. K.

1970-01-01

Development documentation programs are considered. A document tree is defined as the syntactic representation of a document when it is divided into subdivisions such as chapters and sections; a developmental tree is also defined as a tree of information obtained during the course of the development of the computer program. A developmental subtree is emphasized and described. A printed subprogram is also included.
Tree and forest effects on air quality and human health in the United States

Treesearch

David J. Nowak; Satoshi Hirabayashi; Allison Bodine; Eric Greenfield

2014-01-01

Trees remove air pollution by the interception of particulate matter on plant surfaces and the absorption of gaseous pollutants through the leaf stomata. However, the magnitude and value of the effects of trees and forests on air quality and human health across the United States remains unknown. Computer simulations with local environmental data reveal that trees and...
A System to Derive Optimal Tree Diameter Increment Models from the Eastwide Forest Inventory Data Base (EFIDB)

Treesearch

Don C. Bragg

2002-01-01

This article is an introduction to the computer software used by the Potential Relative Increment (PRI) approach to optimal tree diameter growth modeling. These DOS programs extract qualified tree and plot data from the Eastwide Forest Inventory Data Base (EFIDB), calculate relative tree increment, sort for the highest relative increments by diameter class, and...
Computer simulations of melts of randomly branching polymers

NASA Astrophysics Data System (ADS)

Rosa, Angelo; Everaers, Ralf

2016-10-01

Randomly branching polymers with annealed connectivity are model systems for ring polymers and chromosomes. In this context, the branched structure represents transient folding induced by topological constraints. Here we present computer simulations of melts of annealed randomly branching polymers of 3 ≤ N ≤ 1800 segments in d = 2 and d = 3 dimensions. In all cases, we perform a detailed analysis of the observed tree connectivities and spatial conformations. Our results are in excellent agreement with an asymptotic scaling of the average tree size of R ˜ N1/d, suggesting that the trees behave as compact, territorial fractals. The observed swelling relative to the size of ideal trees, R ˜ N1/4, demonstrates that excluded volume interactions are only partially screened in melts of annealed trees. Overall, our results are in good qualitative agreement with the predictions of Flory theory. In particular, we find that the trees swell by the combination of modified branching and path stretching. However, the former effect is subdominant and difficult to detect in d = 3 dimensions.
Concurrent computation of attribute filters on shared memory parallel machines.

PubMed

Wilkinson, Michael H F; Gao, Hui; Hesselink, Wim H; Jonker, Jan-Eppo; Meijster, Arnold

2008-10-01

Morphological attribute filters have not previously been parallelized, mainly because they are both global and non-separable. We propose a parallel algorithm that achieves efficient parallelism for a large class of attribute filters, including attribute openings, closings, thinnings and thickenings, based on Salembier's Max-Trees and Min-trees. The image or volume is first partitioned in multiple slices. We then compute the Max-trees of each slice using any sequential Max-Tree algorithm. Subsequently, the Max-trees of the slices can be merged to obtain the Max-tree of the image. A C-implementation yielded good speed-ups on both a 16-processor MIPS 14000 parallel machine, and a dual-core Opteron-based machine. It is shown that the speed-up of the parallel algorithm is a direct measure of the gain with respect to the sequential algorithm used. Furthermore, the concurrent algorithm shows a speed gain of up to 72 percent on a single-core processor, due to reduced cache thrashing.
Boolean logic tree of graphene-based chemical system for molecular computation and intelligent molecular search query.

PubMed

Huang, Wei Tao; Luo, Hong Qun; Li, Nian Bing

2014-05-06

The most serious, and yet unsolved, problem of constructing molecular computing devices consists in connecting all of these molecular events into a usable device. This report demonstrates the use of Boolean logic tree for analyzing the chemical event network based on graphene, organic dye, thrombin aptamer, and Fenton reaction, organizing and connecting these basic chemical events. And this chemical event network can be utilized to implement fluorescent combinatorial logic (including basic logic gates and complex integrated logic circuits) and fuzzy logic computing. On the basis of the Boolean logic tree analysis and logic computing, these basic chemical events can be considered as programmable "words" and chemical interactions as "syntax" logic rules to construct molecular search engine for performing intelligent molecular search query. Our approach is helpful in developing the advanced logic program based on molecules for application in biosensing, nanotechnology, and drug delivery.
Evaluation of Content-Matched Range Monitoring Queries over Moving Objects in Mobile Computing Environments.

PubMed

Jung, HaRim; Song, MoonBae; Youn, Hee Yong; Kim, Ung Mo

2015-09-18

A content-matched (CM) rangemonitoring query overmoving objects continually retrieves the moving objects (i) whose non-spatial attribute values are matched to given non-spatial query values; and (ii) that are currently located within a given spatial query range. In this paper, we propose a new query indexing structure, called the group-aware query region tree (GQR-tree) for efficient evaluation of CMrange monitoring queries. The primary role of the GQR-tree is to help the server leverage the computational capabilities of moving objects in order to improve the system performance in terms of the wireless communication cost and server workload. Through a series of comprehensive simulations, we verify the superiority of the GQR-tree method over the existing methods.
An in silico pan-genomic probe for the molecular traits behind Lactobacillus ruminis gut autochthony

PubMed Central

Kant, Ravi; Palva, Airi

2017-01-01

As an ecological niche, the mammalian intestine provides the ideal habitat for a variety of bacterial microorganisms. Purportedly, some commensal genera and species offer a beneficial mix of metabolic, protective, and structural processes that help sustain the natural digestive health of the host. Among these sort of gut inhabitants is the Gram-positive lactic acid bacterium Lactobacillus ruminis, a strict anaerobe with both pili and flagella on its cell surface, but also known for being autochthonous (indigenous) to the intestinal environment. Given that the molecular basis of gut autochthony for this species is largely unexplored and unknown, we undertook a study at the genome level to pinpoint some of the adaptive traits behind its colonization behavior. In our pan-genomic probe of L. ruminis, the genomes of nine different strains isolated from human, bovine, porcine, and equine host guts were compiled and compared for in silico analysis. For this, we conducted a geno-phenotypic assessment of protein-coding genes, with an emphasis on those products involved with cell-surface morphology and anaerobic fermentation and respiration. We also categorized and examined the core and accessory genes that define the L. ruminis species and its strains. Here, we made an attempt to identify those genes having ecologically relevant phenotypes that might support or bring about intestinal indigenousness. PMID:28414739
“Pathotyping” Multiplex PCR Assay for Haemophilus parasuis: a Tool for Prediction of Virulence

PubMed Central

Weinert, Lucy A.; Peters, Sarah E.; Wang, Jinhong; Hernandez-Garcia, Juan; Chaudhuri, Roy R.; Luan, Shi-Lu; Angen, Øystein; Aragon, Virginia; Williamson, Susanna M.; Rycroft, Andrew N.; Wren, Brendan W.; Maskell, Duncan J.; Tucker, Alexander W.

2017-01-01

ABSTRACT Haemophilus parasuis is a diverse bacterial species that is found in the upper respiratory tracts of pigs and can also cause Glässer's disease and pneumonia. A previous pangenome study of H. parasuis identified 48 genes that were associated with clinical disease. Here, we describe the development of a generalized linear model (termed a pathotyping model) to predict the potential virulence of isolates of H. parasuis based on a subset of 10 genes from the pangenome. A multiplex PCR (mPCR) was constructed based on these genes, the results of which were entered into the pathotyping model to yield a prediction of virulence. This new diagnostic mPCR was tested on 143 field isolates of H. parasuis that had previously been whole-genome sequenced and a further 84 isolates from the United Kingdom from cases of H. parasuis-related disease in pigs collected between 2013 and 2014. The combination of the mPCR and the pathotyping model predicted the virulence of an isolate with 78% accuracy for the original isolate collection and 90% for the additional isolate collection, providing an overall accuracy of 83% (81% sensitivity and 93% specificity) compared with that of the “current standard” of detailed clinical metadata. This new pathotyping assay has the potential to aid surveillance and disease control in addition to serotyping data. PMID:28615466
Global interrupt and barrier networks

DOEpatents

Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E; Heidelberger, Philip; Kopcsay, Gerard V.; Steinmacher-Burow, Burkhard D.; Takken, Todd E.

2008-10-28

A system and method for generating global asynchronous signals in a computing structure. Particularly, a global interrupt and barrier network is implemented that implements logic for generating global interrupt and barrier signals for controlling global asynchronous operations performed by processing elements at selected processing nodes of a computing structure in accordance with a processing algorithm; and includes the physical interconnecting of the processing nodes for communicating the global interrupt and barrier signals to the elements via low-latency paths. The global asynchronous signals respectively initiate interrupt and barrier operations at the processing nodes at times selected for optimizing performance of the processing algorithms. In one embodiment, the global interrupt and barrier network is implemented in a scalable, massively parallel supercomputing device structure comprising a plurality of processing nodes interconnected by multiple independent networks, with each node including one or more processing elements for performing computation or communication activity as required when performing parallel algorithm operations. One multiple independent network includes a global tree network for enabling high-speed global tree communications among global tree network nodes or sub-trees thereof. The global interrupt and barrier network may operate in parallel with the global tree network for providing global asynchronous sideband signals.
Efficient computation of hashes

NASA Astrophysics Data System (ADS)

Lopes, Raul H. C.; Franqueira, Virginia N. L.; Hobson, Peter R.

2014-06-01

The sequential computation of hashes at the core of many distributed storage systems and found, for example, in grid services can hinder efficiency in service quality and even pose security challenges that can only be addressed by the use of parallel hash tree modes. The main contributions of this paper are, first, the identification of several efficiency and security challenges posed by the use of sequential hash computation based on the Merkle-Damgard engine. In addition, alternatives for the parallel computation of hash trees are discussed, and a prototype for a new parallel implementation of the Keccak function, the SHA-3 winner, is introduced.
Using Educational Games and Simulation Software in a Computer Science Course: Learning Achievements and Student Flow Experiences

ERIC Educational Resources Information Center

Liu, Tsung-Yu

2016-01-01

This study investigates how educational games impact on students' academic performance and multimedia flow experiences in a computer science course. A curriculum consists of five basic learning units, that is, the stack, queue, sort, tree traversal, and binary search tree, was conducted for 110 university students during one semester. Two groups…
An algorithm for computing the gene tree probability under the multispecies coalescent and its application in the inference of population tree

PubMed Central

2016-01-01

Motivation: Gene tree represents the evolutionary history of gene lineages that originate from multiple related populations. Under the multispecies coalescent model, lineages may coalesce outside the species (population) boundary. Given a species tree (with branch lengths), the gene tree probability is the probability of observing a specific gene tree topology under the multispecies coalescent model. There are two existing algorithms for computing the exact gene tree probability. The first algorithm is due to Degnan and Salter, where they enumerate all the so-called coalescent histories for the given species tree and the gene tree topology. Their algorithm runs in exponential time in the number of gene lineages in general. The second algorithm is the STELLS algorithm (2012), which is usually faster but also runs in exponential time in almost all the cases. Results: In this article, we present a new algorithm, called CompactCH, for computing the exact gene tree probability. This new algorithm is based on the notion of compact coalescent histories: multiple coalescent histories are represented by a single compact coalescent history. The key advantage of our new algorithm is that it runs in polynomial time in the number of gene lineages if the number of populations is fixed to be a constant. The new algorithm is more efficient than the STELLS algorithm both in theory and in practice when the number of populations is small and there are multiple gene lineages from each population. As an application, we show that CompactCH can be applied in the inference of population tree (i.e. the population divergence history) from population haplotypes. Simulation results show that the CompactCH algorithm enables efficient and accurate inference of population trees with much more haplotypes than a previous approach. Availability: The CompactCH algorithm is implemented in the STELLS software package, which is available for download at http://www.engr.uconn.edu/ywu/STELLS.html. Contact: ywu@engr.uconn.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307621

Object-Oriented Algorithm For Evaluation Of Fault Trees

NASA Technical Reports Server (NTRS)

Patterson-Hine, F. A.; Koen, B. V.

1992-01-01

Algorithm for direct evaluation of fault trees incorporates techniques of object-oriented programming. Reduces number of calls needed to solve trees with repeated events. Provides significantly improved software environment for such computations as quantitative analyses of safety and reliability of complicated systems of equipment (e.g., spacecraft or factories).
Tree growth visualization

Treesearch

L. Linsen; B.J. Karis; E.G. McPherson; B. Hamann

2005-01-01

In computer graphics, models describing the fractal branching structure of trees typically exploit the modularity of tree structures. The models are based on local production rules, which are applied iteratively and simultaneously to create a complex branching system. The objective is to generate three-dimensional scenes of often many realistic- looking and non-...
Multi-level Hierarchical Poly Tree computer architectures

NASA Technical Reports Server (NTRS)

Padovan, Joe; Gute, Doug

1990-01-01

Based on the concept of hierarchical substructuring, this paper develops an optimal multi-level Hierarchical Poly Tree (HPT) parallel computer architecture scheme which is applicable to the solution of finite element and difference simulations. Emphasis is given to minimizing computational effort, in-core/out-of-core memory requirements, and the data transfer between processors. In addition, a simplified communications network that reduces the number of I/O channels between processors is presented. HPT configurations that yield optimal superlinearities are also demonstrated. Moreover, to generalize the scope of applicability, special attention is given to developing: (1) multi-level reduction trees which provide an orderly/optimal procedure by which model densification/simplification can be achieved, as well as (2) methodologies enabling processor grading that yields architectures with varying types of multi-level granularity.
Applications and Benefits for Big Data Sets Using Tree Distances and The T-SNE Algorithm

DTIC Science & Technology

2016-03-01

BENEFITS FOR BIG DATA SETS USING TREE DISTANCES AND THE T-SNE ALGORITHM by Suyoung Lee March 2016 Thesis Advisor: Samuel E. Buttrey...REPORT TYPE AND DATES COVERED Master’s thesis 4. TITLE AND SUBTITLE APPLICATIONS AND BENEFITS FOR BIG DATA SETS USING TREE DISTANCES AND THE T-SNE...this work we use tree distance computed using Buttrey’s treeClust package in R, as discussed by Buttrey and Whitaker in 2015, to process mixed data
Traveling front solutions to directed diffusion-limited aggregation, digital search trees, and the Lempel-Ziv data compression algorithm.

PubMed

Majumdar, Satya N

2003-08-01

We use the traveling front approach to derive exact asymptotic results for the statistics of the number of particles in a class of directed diffusion-limited aggregation models on a Cayley tree. We point out that some aspects of these models are closely connected to two different problems in computer science, namely, the digital search tree problem in data structures and the Lempel-Ziv algorithm for data compression. The statistics of the number of particles studied here is related to the statistics of height in digital search trees which, in turn, is related to the statistics of the length of the longest word formed by the Lempel-Ziv algorithm. Implications of our results to these computer science problems are pointed out.
Traveling front solutions to directed diffusion-limited aggregation, digital search trees, and the Lempel-Ziv data compression algorithm

NASA Astrophysics Data System (ADS)

Majumdar, Satya N.

2003-08-01

We use the traveling front approach to derive exact asymptotic results for the statistics of the number of particles in a class of directed diffusion-limited aggregation models on a Cayley tree. We point out that some aspects of these models are closely connected to two different problems in computer science, namely, the digital search tree problem in data structures and the Lempel-Ziv algorithm for data compression. The statistics of the number of particles studied here is related to the statistics of height in digital search trees which, in turn, is related to the statistics of the length of the longest word formed by the Lempel-Ziv algorithm. Implications of our results to these computer science problems are pointed out.
Evaluation of Content-Matched Range Monitoring Queries over Moving Objects in Mobile Computing Environments

PubMed Central

Jung, HaRim; Song, MoonBae; Youn, Hee Yong; Kim, Ung Mo

2015-01-01

A content-matched (CM) range monitoring query over moving objects continually retrieves the moving objects (i) whose non-spatial attribute values are matched to given non-spatial query values; and (ii) that are currently located within a given spatial query range. In this paper, we propose a new query indexing structure, called the group-aware query region tree (GQR-tree) for efficient evaluation of CM range monitoring queries. The primary role of the GQR-tree is to help the server leverage the computational capabilities of moving objects in order to improve the system performance in terms of the wireless communication cost and server workload. Through a series of comprehensive simulations, we verify the superiority of the GQR-tree method over the existing methods. PMID:26393613
PROGRAM HTVOL: The Determination of Tree Crown Volume by Layers

Treesearch

Joseph C. Mawson; Jack Ward Thomas; Richard M. DeGraaf

1976-01-01

A FORTRAN IV computer program calculates, from a few field measurements, the volume of tree crowns. This volume is in layers of a specified thickness of trees or large shrubs. Each tree is assigned one of 15 solid forms, formed by using one of five side shapes (a circle, an ellipse, a neiloid, a triangle, or a parabolalike shape), and one of three bottom shapes (a...
Grow--a computer subroutine that projects the growth of trees in the Lake States' forests.

Treesearch

Gary J. Brand

1981-01-01

A computer subroutine, Grow, has been written in 1977 Standard FORTRAN to implement a distance-independent, individual tree growth model for Lake States' forests. Grow is a small and easy-to-use version of the growth model. All the user has to do is write a calling program to read initial conditions, call Grow, and summarize the results.
Assessing College Student Interest in Math and/or Computer Science in a Cross-National Sample Using Classification and Regression Trees

ERIC Educational Resources Information Center

Kitsantas, Anastasia; Kitsantas, Panagiota; Kitsantas, Thomas

2012-01-01

The purpose of this exploratory study was to assess the relative importance of a number of variables in predicting students' interest in math and/or computer science. Classification and regression trees (CART) were employed in the analysis of survey data collected from 276 college students enrolled in two U.S. and Greek universities. The results…
Measuring Financial Gains from Genetically Superior Trees

Treesearch

George Dutrow; Clark Row

1976-01-01

Planting genetically superior loblolly pines will probably yield high profits.Forest economists have made computer simulations that predict financial gains expected from a tree improvement program under actual field conditions.
Tools of the Future: How Decision Tree Analysis Will Impact Mission Planning

NASA Technical Reports Server (NTRS)

Otterstatter, Matthew R.

2005-01-01

The universe is infinitely complex; however, the human mind has a finite capacity. The multitude of possible variables, metrics, and procedures in mission planning are far too many to address exhaustively. This is unfortunate because, in general, considering more possibilities leads to more accurate and more powerful results. To compensate, we can get more insightful results by employing our greatest tool, the computer. The power of the computer will be utilized through a technology that considers every possibility, decision tree analysis. Although decision trees have been used in many other fields, this is innovative for space mission planning. Because this is a new strategy, no existing software is able to completely accommodate all of the requirements. This was determined through extensive research and testing of current technologies. It was necessary to create original software, for which a short-term model was finished this summer. The model was built into Microsoft Excel to take advantage of the familiar graphical interface for user input, computation, and viewing output. Macros were written to automate the process of tree construction, optimization, and presentation. The results are useful and promising. If this tool is successfully implemented in mission planning, our reliance on old-fashioned heuristics, an error-prone shortcut for handling complexity, will be reduced. The computer algorithms involved in decision trees will revolutionize mission planning. The planning will be faster and smarter, leading to optimized missions with the potential for more valuable data.
Gap-free segmentation of vascular networks with automatic image processing pipeline.

PubMed

Hsu, Chih-Yang; Ghaffari, Mahsa; Alaraj, Ali; Flannery, Michael; Zhou, Xiaohong Joe; Linninger, Andreas

2017-03-01

Current image processing techniques capture large vessels reliably but often fail to preserve connectivity in bifurcations and small vessels. Imaging artifacts and noise can create gaps and discontinuity of intensity that hinders segmentation of vascular trees. However, topological analysis of vascular trees require proper connectivity without gaps, loops or dangling segments. Proper tree connectivity is also important for high quality rendering of surface meshes for scientific visualization or 3D printing. We present a fully automated vessel enhancement pipeline with automated parameter settings for vessel enhancement of tree-like structures from customary imaging sources, including 3D rotational angiography, magnetic resonance angiography, magnetic resonance venography, and computed tomography angiography. The output of the filter pipeline is a vessel-enhanced image which is ideal for generating anatomical consistent network representations of the cerebral angioarchitecture for further topological or statistical analysis. The filter pipeline combined with computational modeling can potentially improve computer-aided diagnosis of cerebrovascular diseases by delivering biometrics and anatomy of the vasculature. It may serve as the first step in fully automatic epidemiological analysis of large clinical datasets. The automatic analysis would enable rigorous statistical comparison of biometrics in subject-specific vascular trees. The robust and accurate image segmentation using a validated filter pipeline would also eliminate operator dependency that has been observed in manual segmentation. Moreover, manual segmentation is time prohibitive given that vascular trees have more than thousands of segments and bifurcations so that interactive segmentation consumes excessive human resources. Subject-specific trees are a first step toward patient-specific hemodynamic simulations for assessing treatment outcomes. Copyright © 2017 Elsevier Ltd. All rights reserved.
Estimating and validating ground-based timber harvesting production through computer simulation

Treesearch

Jingxin Wang; Chris B. LeDoux

2003-01-01

Estimating ground-based timber harvesting systems production with an object oriented methodology was investigated. The estimation model developed generates stands of trees, simulates chain saw, drive-to-tree feller-buncher, swing-to-tree single-grip harvester felling, and grapple skidder and forwarder extraction activities, and analyzes costs and productivity. It also...
Key algorithms used in GR02: A computer simulation model for predicting tree and stand growth

Treesearch

Garrett A. Hughes; Paul E. Sendak; Paul E. Sendak

1985-01-01

GR02 is an individual tree, distance-independent simulation model for predicting tree and stand growth over time. It performs five major functions during each run: (1) updates diameter at breast height, (2) updates total height, (3) estimates mortality, (4) determines regeneration, and (5) updates crown class.
Fels-Rand: an Xlisp-Stat program for the comparative analysis of data under phylogenetic uncertainty.

PubMed

Blomberg, S

2000-11-01

Currently available programs for the comparative analysis of phylogenetic data do not perform optimally when the phylogeny is not completely specified (i.e. the phylogeny contains polytomies). Recent literature suggests that a better way to analyse the data would be to create random trees from the known phylogeny that are fully-resolved but consistent with the known tree. A computer program is presented, Fels-Rand, that performs such analyses. A randomisation procedure is used to generate trees that are fully resolved but whose structure is consistent with the original tree. Statistics are then calculated on a large number of these randomly-generated trees. Fels-Rand uses the object-oriented features of Xlisp-Stat to manipulate internal tree representations. Xlisp-Stat's dynamic graphing features are used to provide heuristic tools to aid in analysis, particularly outlier analysis. The usefulness of Xlisp-Stat as a system for phylogenetic computation is discussed. Available from the author or at http://www.uq.edu.au/~ansblomb/Fels-Rand.sit.hqx. Xlisp-Stat is available from http://stat.umn.edu/~luke/xls/xlsinfo/xlsinfo.html. s.blomberg@abdn.ac.uk
Reconstructing evolutionary trees in parallel for massive sequences.

PubMed

Zou, Quan; Wan, Shixiang; Zeng, Xiangxiang; Ma, Zhanshan Sam

2017-12-14

Building the evolutionary trees for massive unaligned DNA sequences is challenging and crucial. However, reconstructing evolutionary tree for ultra-large sequences is hard. Massive multiple sequence alignment is also challenging and time/space consuming. Hadoop and Spark are developed recently, which bring spring light for the classical computational biology problems. In this paper, we tried to solve the multiple sequence alignment and evolutionary reconstruction in parallel. HPTree, which is developed in this paper, can deal with big DNA sequence files quickly. It works well on the >1GB files, and gets better performance than other evolutionary reconstruction tools. Users could use HPTree for reonstructing evolutioanry trees on the computer clusters or cloud platform (eg. Amazon Cloud). HPTree could help on population evolution research and metagenomics analysis. In this paper, we employ the Hadoop and Spark platform and design an evolutionary tree reconstruction software tool for unaligned massive DNA sequences. Clustering and multiple sequence alignment are done in parallel. Neighbour-joining model was employed for the evolutionary tree building. We opened our software together with source codes via http://lab.malab.cn/soft/HPtree/ .
Rapid and accurate species tree estimation for phylogeographic investigations using replicated subsampling.

PubMed

Hird, Sarah; Kubatko, Laura; Carstens, Bryan

2010-11-01

We describe a method for estimating species trees that relies on replicated subsampling of large data matrices. One application of this method is phylogeographic research, which has long depended on large datasets that sample intensively from the geographic range of the focal species; these datasets allow systematicists to identify cryptic diversity and understand how contemporary and historical landscape forces influence genetic diversity. However, analyzing any large dataset can be computationally difficult, particularly when newly developed methods for species tree estimation are used. Here we explore the use of replicated subsampling, a potential solution to the problem posed by large datasets, with both a simulation study and an empirical analysis. In the simulations, we sample different numbers of alleles and loci, estimate species trees using STEM, and compare the estimated to the actual species tree. Our results indicate that subsampling three alleles per species for eight loci nearly always results in an accurate species tree topology, even in cases where the species tree was characterized by extremely rapid divergence. Even more modest subsampling effort, for example one allele per species and two loci, was more likely than not (>50%) to identify the correct species tree topology, indicating that in nearly all cases, computing the majority-rule consensus tree from replicated subsampling provides a good estimate of topology. These results were supported by estimating the correct species tree topology and reasonable branch lengths for an empirical 10-locus great ape dataset. Copyright © 2010 Elsevier Inc. All rights reserved.
Reducing computation in an i-vector speaker recognition system using a tree-structured universal background model

DOE Office of Scientific and Technical Information (OSTI.GOV)

McClanahan, Richard; De Leon, Phillip L.

The majority of state-of-the-art speaker recognition systems (SR) utilize speaker models that are derived from an adapted universal background model (UBM) in the form of a Gaussian mixture model (GMM). This is true for GMM supervector systems, joint factor analysis systems, and most recently i-vector systems. In all of the identified systems, the posterior probabilities and sufficient statistics calculations represent a computational bottleneck in both enrollment and testing. We propose a multi-layered hash system, employing a tree-structured GMM–UBM which uses Runnalls’ Gaussian mixture reduction technique, in order to reduce the number of these calculations. Moreover, with this tree-structured hash, wemore » can trade-off reduction in computation with a corresponding degradation of equal error rate (EER). As an example, we also reduce this computation by a factor of 15× while incurring less than 10% relative degradation of EER (or 0.3% absolute EER) when evaluated with NIST 2010 speaker recognition evaluation (SRE) telephone data.« less
Reducing computation in an i-vector speaker recognition system using a tree-structured universal background model

DOE PAGES

McClanahan, Richard; De Leon, Phillip L.

2014-08-20

The majority of state-of-the-art speaker recognition systems (SR) utilize speaker models that are derived from an adapted universal background model (UBM) in the form of a Gaussian mixture model (GMM). This is true for GMM supervector systems, joint factor analysis systems, and most recently i-vector systems. In all of the identified systems, the posterior probabilities and sufficient statistics calculations represent a computational bottleneck in both enrollment and testing. We propose a multi-layered hash system, employing a tree-structured GMM–UBM which uses Runnalls’ Gaussian mixture reduction technique, in order to reduce the number of these calculations. Moreover, with this tree-structured hash, wemore » can trade-off reduction in computation with a corresponding degradation of equal error rate (EER). As an example, we also reduce this computation by a factor of 15× while incurring less than 10% relative degradation of EER (or 0.3% absolute EER) when evaluated with NIST 2010 speaker recognition evaluation (SRE) telephone data.« less

iGLASS: An Improvement to the GLASS Method for Estimating Species Trees from Gene Trees

PubMed Central

Rosenberg, Noah A.

2012-01-01

Abstract Several methods have been designed to infer species trees from gene trees while taking into account gene tree/species tree discordance. Although some of these methods provide consistent species tree topology estimates under a standard model, most either do not estimate branch lengths or are computationally slow. An exception, the GLASS method of Mossel and Roch, is consistent for the species tree topology, estimates branch lengths, and is computationally fast. However, GLASS systematically overestimates divergence times, leading to biased estimates of species tree branch lengths. By assuming a multispecies coalescent model in which multiple lineages are sampled from each of two taxa at L independent loci, we derive the distribution of the waiting time until the first interspecific coalescence occurs between the two taxa, considering all loci and measuring from the divergence time. We then use the mean of this distribution to derive a correction to the GLASS estimator of pairwise divergence times. We show that our improved estimator, which we call iGLASS, consistently estimates the divergence time between a pair of taxa as the number of loci approaches infinity, and that it is an unbiased estimator of divergence times when one lineage is sampled per taxon. We also show that many commonly used clustering methods can be combined with the iGLASS estimator of pairwise divergence times to produce a consistent estimator of the species tree topology. Through simulations, we show that iGLASS can greatly reduce the bias and mean squared error in obtaining estimates of divergence times in a species tree. PMID:22216756
Estimating the weight of Douglas-fir tree boles and logs with an iterative computer model.

Treesearch

Dale R. Waddell; Dale L Weyermann; Michael B. Lambert

1987-01-01

A computer model that estimates the green weights of standing trees was developed and validated for old-growth Douglas-fir. The model calculates the green weight for the entire bole, for the bole to any merchantable top, and for any log length within the bole. The model was validated by estimating the bias and accuracy of an independent subsample selected from the...
Method and system for dynamic probabilistic risk assessment

NASA Technical Reports Server (NTRS)

Dugan, Joanne Bechta (Inventor); Xu, Hong (Inventor)

2013-01-01

The DEFT methodology, system and computer readable medium extends the applicability of the PRA (Probabilistic Risk Assessment) methodology to computer-based systems, by allowing DFT (Dynamic Fault Tree) nodes as pivot nodes in the Event Tree (ET) model. DEFT includes a mathematical model and solution algorithm, supports all common PRA analysis functions and cutsets. Additional capabilities enabled by the DFT include modularization, phased mission analysis, sequence dependencies, and imperfect coverage.
The symbolic computation of series solutions to ordinary differential equations using trees (extended abstract)

NASA Technical Reports Server (NTRS)

Grossman, Robert

1991-01-01

Algorithms previously developed by the author give formulas which can be used for the efficient symbolic computation of series expansions to solutions of nonlinear systems of ordinary differential equations. As a by product of this analysis, formulas are derived which relate to trees to the coefficients of the series expansions, similar to the work of Leroux and Viennot, and Lamnabhi, Leroux and Viennot.
Mirroring co-evolving trees in the light of their topologies.

PubMed

Hajirasouliha, Iman; Schönhuth, Alexander; de Juan, David; Valencia, Alfonso; Sahinalp, S Cenk

2012-05-01

Determining the interaction partners among protein/domain families poses hard computational problems, in particular in the presence of paralogous proteins. Available approaches aim to identify interaction partners among protein/domain families through maximizing the similarity between trimmed versions of their phylogenetic trees. Since maximization of any natural similarity score is computationally difficult, many approaches employ heuristics to evaluate the distance matrices corresponding to the tree topologies in question. In this article, we devise an efficient deterministic algorithm which directly maximizes the similarity between two leaf labeled trees with edge lengths, obtaining a score-optimal alignment of the two trees in question. Our algorithm is significantly faster than those methods based on distance matrix comparison: 1 min on a single processor versus 730 h on a supercomputer. Furthermore, we outperform the current state-of-the-art exhaustive search approach in terms of precision, while incurring acceptable losses in recall. A C implementation of the method demonstrated in this article is available at http://compbio.cs.sfu.ca/mirrort.htm
Metabolite identification through multiple kernel learning on fragmentation trees.

PubMed

Shen, Huibin; Dührkop, Kai; Böcker, Sebastian; Rousu, Juho

2014-06-15

Metabolite identification from tandem mass spectrometric data is a key task in metabolomics. Various computational methods have been proposed for the identification of metabolites from tandem mass spectra. Fragmentation tree methods explore the space of possible ways in which the metabolite can fragment, and base the metabolite identification on scoring of these fragmentation trees. Machine learning methods have been used to map mass spectra to molecular fingerprints; predicted fingerprints, in turn, can be used to score candidate molecular structures. Here, we combine fragmentation tree computations with kernel-based machine learning to predict molecular fingerprints and identify molecular structures. We introduce a family of kernels capturing the similarity of fragmentation trees, and combine these kernels using recently proposed multiple kernel learning approaches. Experiments on two large reference datasets show that the new methods significantly improve molecular fingerprint prediction accuracy. These improvements result in better metabolite identification, doubling the number of metabolites ranked at the top position of the candidates list. © The Author 2014. Published by Oxford University Press.
Fast correspondences search in anatomical trees

NASA Astrophysics Data System (ADS)

dos Santos, Thiago R.; Gergel, Ingmar; Meinzer, Hans-Peter; Maier-Hein, Lena

2010-03-01

Registration of multiple medical images commonly comprises the steps feature extraction, correspondences search and transformation computation. In this paper, we present a new method for a fast and pose independent search of correspondences using as features anatomical trees such as the bronchial system in the lungs or the vessel system in the liver. Our approach scores the similarities between the trees' nodes (bifurcations) taking into account both, topological properties extracted from their graph representations and anatomical properties extracted from the trees themselves. The node assignment maximizes the global similarity (sum of the scores of each pair of assigned nodes), assuring that the matches are distributed throughout the trees. Furthermore, the proposed method is able to deal with distortions in the data, such as noise, motion, artifacts, and problems associated with the extraction method, such as missing or false branches. According to an evaluation on swine lung data sets, the method requires less than one second on average to compute the matching and yields a high rate of correct matches compared to state of the art work.
Multispectral sensing of citrus young tree decline

NASA Technical Reports Server (NTRS)

Edwards, G. J.; Ducharme, E. P.; Schehl, T.

1975-01-01

Computer processing of MSS data to identify and map citrus trees affected by young tree decline is analyzed. The data were obtained at 1500-feet altitude in six discrete spectral bands covering regions from 0.53 to 1.3 millimicrons as well as from instrumental ground truths of tree crowns. Measurable spectral reflectance intensity differences are observed in the leaves of healthy and diseased trees, especially at wavelengths of 500 to 600 nm and 700 to 800 nm. The overall accuracy of the method is found to be 89%.
Fast Image Texture Classification Using Decision Trees

NASA Technical Reports Server (NTRS)

Thompson, David R.

2011-01-01

Texture analysis would permit improved autonomous, onboard science data interpretation for adaptive navigation, sampling, and downlink decisions. These analyses would assist with terrain analysis and instrument placement in both macroscopic and microscopic image data products. Unfortunately, most state-of-the-art texture analysis demands computationally expensive convolutions of filters involving many floating-point operations. This makes them infeasible for radiation- hardened computers and spaceflight hardware. A new method approximates traditional texture classification of each image pixel with a fast decision-tree classifier. The classifier uses image features derived from simple filtering operations involving integer arithmetic. The texture analysis method is therefore amenable to implementation on FPGA (field-programmable gate array) hardware. Image features based on the "integral image" transform produce descriptive and efficient texture descriptors. Training the decision tree on a set of training data yields a classification scheme that produces reasonable approximations of optimal "texton" analysis at a fraction of the computational cost. A decision-tree learning algorithm employing the traditional k-means criterion of inter-cluster variance is used to learn tree structure from training data. The result is an efficient and accurate summary of surface morphology in images. This work is an evolutionary advance that unites several previous algorithms (k-means clustering, integral images, decision trees) and applies them to a new problem domain (morphology analysis for autonomous science during remote exploration). Advantages include order-of-magnitude improvements in runtime, feasibility for FPGA hardware, and significant improvements in texture classification accuracy.
Genomic, transcriptomic, and proteomic approaches towards understanding the molecular mechanisms of salt tolerance in Frankia strains isolated from Casuarina trees.

PubMed

Oshone, Rediet; Ngom, Mariama; Chu, Feixia; Mansour, Samira; Sy, Mame Ourèye; Champion, Antony; Tisa, Louis S

2017-08-18

Soil salinization is a worldwide problem that is intensifying because of the effects of climate change. An effective method for the reclamation of salt-affected soils involves initiating plant succession using fast growing, nitrogen fixing actinorhizal trees such as the Casuarina. The salt tolerance of Casuarina is enhanced by the nitrogen-fixing symbiosis that they form with the actinobacterium Frankia. Identification and molecular characterization of salt-tolerant Casuarina species and associated Frankia is imperative for the successful utilization of Casuarina trees in saline soil reclamation efforts. In this study, salt-tolerant and salt-sensitive Casuarina associated Frankia strains were identified and comparative genomics, transcriptome profiling, and proteomics were employed to elucidate the molecular mechanisms of salt and osmotic stress tolerance. Salt-tolerant Frankia strains (CcI6 and Allo2) that could withstand up to 1000 mM NaCl and a salt-sensitive Frankia strain (CcI3) which could withstand only up to 475 mM NaCl were identified. The remaining isolates had intermediate levels of salt tolerance with MIC values ranging from 650 mM to 750 mM. Comparative genomic analysis showed that all of the Frankia isolates from Casuarina belonged to the same species (Frankia casuarinae). Pangenome analysis revealed a high abundance of singletons among all Casuarina isolates. The two salt-tolerant strains contained 153 shared single copy genes (most of which code for hypothetical proteins) that were not found in the salt-sensitive(CcI3) and moderately salt-tolerant (CeD) strains. RNA-seq analysis of one of the two salt-tolerant strains (Frankia sp. strain CcI6) revealed hundreds of genes differentially expressed under salt and/or osmotic stress. Among the 153 genes, 7 and 7 were responsive to salt and osmotic stress, respectively. Proteomic profiling confirmed the transcriptome results and identified 19 and 8 salt and/or osmotic stress-responsive proteins in the salt-tolerant (CcI6) and the salt-sensitive (CcI3) strains, respectively. Genetic differences between salt-tolerant and salt-sensitive Frankia strains isolated from Casuarina were identified. Transcriptome and proteome profiling of a salt-tolerant strain was used to determine molecular differences correlated with differential salt-tolerance and several candidate genes were identified. Mechanisms involving transcriptional and translational regulation, cell envelop remodeling, and previously uncharacterized proteins appear to be important for salt tolerance. Physiological and mutational analyses will further shed light on the molecular mechanism of salt tolerance in Casuarina associated Frankia isolates.
Networks and Spanning Trees: The Juxtaposition of Prüfer and Boruvka

ERIC Educational Resources Information Center

Lodder, Jerry

2014-01-01

This paper outlines a method for teaching topics in undergraduate mathematics or computer science via historical curricular modules. The contents of one module, "Networks and Spanning Trees," are discussed from the original work of Arthur Cayley, Heinz Prüfer, and Otakar Boruvka that motivates the enumeration and application of trees in…
Characterizing individual particles on tree leaves using computer automated scanning electron microscopy

Treesearch

D. L. Johnson; D. J. Nowak; V. A. Jouraeva

1999-01-01

Leaves from twenty-three deciduous tree species and five conifer species were collected within a limited geographic range (1 km radius) and evaluated for possible application of scanning electron microscopy and X-ray microanalysis techniques of individual particle analysis (IPA). The goal was to identify tree species with leaves suitable for the automated...
Tree failures and accidents in recreation areas: a guide to data management for hazard control

Treesearch

Lee A. Paine; James W. Clarke

1978-01-01

A data management system has been developed for storage and retrieval of tree failure and hazard data, with provision for computer analyses and presentation of results in useful tables. This system emphasizes important relationships between tree characteristics, environmental factors, and the resulting hazard. The analysis programs permit easy selection of subsets of...
Optimal tree-stem bucking of northeastern species of China

Treesearch

Jingxin Wang; Chris B. LeDoux; Joseph McNeel

2004-01-01

An application of optimal tree-stem bucking to the northeastern tree species of China is reported. The bucking procedures used in this region are summarized, which are the basic guidelines for the optimal bucking design. The directed graph approach was adopted to generate the bucking patterns by using the network analysis labeling algorithm. A computer-based bucking...
The k-d Tree: A Hierarchical Model for Human Cognition.

ERIC Educational Resources Information Center

Vandendorpe, Mary M.

This paper discusses a model of information storage and retrieval, the k-d tree (Bentley, 1975), a binary, hierarchical tree with multiple associate terms, which has been explored in computer research, and it is suggested that this model could be useful for describing human cognition. Included are two models of human long-term memory--networks and…
Selecting reference cities for i-Tree Streets

Treesearch

E.G. McPherson

2010-01-01

The i-Tree Streets (formerly STRATUM) computer program quantifies municipal forest structure, function, and value using tree growth and geographic data from sixteen U.S. reference cities, one for each of sixteen climate zones. Selecting the reference city that best matches a subject city is problematic when the subject city is outside the U.S., lays on the border...
A benefit-cost analysis of ten tree species in Modesto, California, U.S.A

Treesearch

E.G. McPherson

2003-01-01

Tree work records for ten species were analyzed to estimate average annual management costs by dbh class for six activity areas. Average annual benefits were calculated by dbh class for each species with computer modeling. Average annual net benefits per tree were greatest for London plane (Platanus acerifolia) ($178.57), hackberry (...
The Variable Regions of Lactobacillus rhamnosus Genomes Reveal the Dynamic Evolution of Metabolic and Host-Adaptation Repertoires

PubMed Central

Ceapa, Corina; Davids, Mark; Ritari, Jarmo; Lambert, Jolanda; Wels, Michiel; Douillard, François P.; Smokvina, Tamara; de Vos, Willem M.; Knol, Jan; Kleerebezem, Michiel

2016-01-01

Lactobacillus rhamnosus is a diverse Gram-positive species with strains isolated from different ecological niches. Here, we report the genome sequence analysis of 40 diverse strains of L. rhamnosus and their genomic comparison, with a focus on the variable genome. Genomic comparison of 40 L. rhamnosus strains discriminated the conserved genes (core genome) and regions of plasticity involving frequent rearrangements and horizontal transfer (variome). The L. rhamnosus core genome encompasses 2,164 genes, out of 4,711 genes in total (the pan-genome). The accessory genome is dominated by genes encoding carbohydrate transport and metabolism, extracellular polysaccharides (EPS) biosynthesis, bacteriocin production, pili production, the cas system, and the associated clustered regularly interspaced short palindromic repeat (CRISPR) loci, and more than 100 transporter functions and mobile genetic elements like phages, plasmid genes, and transposons. A clade distribution based on amino acid differences between core (shared) proteins matched with the clade distribution obtained from the presence–absence of variable genes. The phylogenetic and variome tree overlap indicated that frequent events of gene acquisition and loss dominated the evolutionary segregation of the strains within this species, which is paralleled by evolutionary diversification of core gene functions. The CRISPR-Cas system could have contributed to this evolutionary segregation. Lactobacillus rhamnosus strains contain the genetic and metabolic machinery with strain-specific gene functions required to adapt to a large range of environments. A remarkable congruency of the evolutionary relatedness of the strains’ core and variome functions, possibly favoring interspecies genetic exchanges, underlines the importance of gene-acquisition and loss within the L. rhamnosus strain diversification. PMID:27358423
Enumerating all maximal frequent subtrees in collections of phylogenetic trees

PubMed Central

2014-01-01

Background A common problem in phylogenetic analysis is to identify frequent patterns in a collection of phylogenetic trees. The goal is, roughly, to find a subset of the species (taxa) on which all or some significant subset of the trees agree. One popular method to do so is through maximum agreement subtrees (MASTs). MASTs are also used, among other things, as a metric for comparing phylogenetic trees, computing congruence indices and to identify horizontal gene transfer events. Results We give algorithms and experimental results for two approaches to identify common patterns in a collection of phylogenetic trees, one based on agreement subtrees, called maximal agreement subtrees, the other on frequent subtrees, called maximal frequent subtrees. These approaches can return subtrees on larger sets of taxa than MASTs, and can reveal new common phylogenetic relationships not present in either MASTs or the majority rule tree (a popular consensus method). Our current implementation is available on the web at https://code.google.com/p/mfst-miner/. Conclusions Our computational results confirm that maximal agreement subtrees and all maximal frequent subtrees can reveal a more complete phylogenetic picture of the common patterns in collections of phylogenetic trees than maximum agreement subtrees; they are also often more resolved than the majority rule tree. Further, our experiments show that enumerating maximal frequent subtrees is considerably more practical than enumerating ordinary (not necessarily maximal) frequent subtrees. PMID:25061474
Enumerating all maximal frequent subtrees in collections of phylogenetic trees.

PubMed

Deepak, Akshay; Fernández-Baca, David

2014-01-01

A common problem in phylogenetic analysis is to identify frequent patterns in a collection of phylogenetic trees. The goal is, roughly, to find a subset of the species (taxa) on which all or some significant subset of the trees agree. One popular method to do so is through maximum agreement subtrees (MASTs). MASTs are also used, among other things, as a metric for comparing phylogenetic trees, computing congruence indices and to identify horizontal gene transfer events. We give algorithms and experimental results for two approaches to identify common patterns in a collection of phylogenetic trees, one based on agreement subtrees, called maximal agreement subtrees, the other on frequent subtrees, called maximal frequent subtrees. These approaches can return subtrees on larger sets of taxa than MASTs, and can reveal new common phylogenetic relationships not present in either MASTs or the majority rule tree (a popular consensus method). Our current implementation is available on the web at https://code.google.com/p/mfst-miner/. Our computational results confirm that maximal agreement subtrees and all maximal frequent subtrees can reveal a more complete phylogenetic picture of the common patterns in collections of phylogenetic trees than maximum agreement subtrees; they are also often more resolved than the majority rule tree. Further, our experiments show that enumerating maximal frequent subtrees is considerably more practical than enumerating ordinary (not necessarily maximal) frequent subtrees.

Genomic characterization of two novel SAR11 isolates from the Red Sea, including the first strain of the SAR11 Ib clade.

PubMed

Jimenez-Infante, Francy; Ngugi, David Kamanda; Vinu, Manikandan; Blom, Jochen; Alam, Intikhab; Bajic, Vladimir B; Stingl, Ulrich

2017-07-01

The SAR11 clade (Pelagibacterales) is a diverse group that forms a monophyletic clade within the Alphaproteobacteria, and constitutes up to one third of all prokaryotic cells in the photic zone of most oceans. Pelagibacterales are very abundant in the warm and highly saline surface waters of the Red Sea, raising the question of adaptive traits of SAR11 populations in this water body and warmer oceans through the world. In this study, two pure cultures were successfully obtained from surface waters on the Red Sea: one isolate of subgroup Ia and one of the previously uncultured SAR11 Ib lineage. The novel genomes were very similar to each other and to genomes of isolates of SAR11 subgroup Ia (Ia pan-genome), both in terms of gene content and synteny. Among the genes that were not present in the Ia pan-genome, 108 (RS39, Ia) and 151 genes (RS40, Ib) were strain specific. Detailed analyses showed that only 51 (RS39, Ia) and 55 (RS40, Ib) of these strain-specific genes had not reported before on genome fragments of Pelagibacterales. Further analyses revealed the potential production of phosphonates by some SAR11 members and possible adaptations for oligotrophic life, including pentose sugar utilization and adhesion to marine particulate matter. © FEMS 2017. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Analysis of pan-genome to identify the core genes and essential genes of Brucella spp.

PubMed

Yang, Xiaowen; Li, Yajie; Zang, Juan; Li, Yexia; Bie, Pengfei; Lu, Yanli; Wu, Qingmin

2016-04-01

Brucella spp. are facultative intracellular pathogens, that cause a contagious zoonotic disease, that can result in such outcomes as abortion or sterility in susceptible animal hosts and grave, debilitating illness in humans. For deciphering the survival mechanism of Brucella spp. in vivo, 42 Brucella complete genomes from NCBI were analyzed for the pan-genome and core genome by identification of their composition and function of Brucella genomes. The results showed that the total 132,143 protein-coding genes in these genomes were divided into 5369 clusters. Among these, 1710 clusters were associated with the core genome, 1182 clusters with strain-specific genes and 2477 clusters with dispensable genomes. COG analysis indicated that 44 % of the core genes were devoted to metabolism, which were mainly responsible for energy production and conversion (COG category C), and amino acid transport and metabolism (COG category E). Meanwhile, approximately 35 % of the core genes were in positive selection. In addition, 1252 potential essential genes were predicted in the core genome by comparison with a prokaryote database of essential genes. The results suggested that the core genes in Brucella genomes are relatively conservation, and the energy and amino acid metabolism play a more important role in the process of growth and reproduction in Brucella spp. This study might help us to better understand the mechanisms of Brucella persistent infection and provide some clues for further exploring the gene modules of the intracellular survival in Brucella spp.
Multi-hop path tracing of mobile robot with multi-range image

NASA Astrophysics Data System (ADS)

Choudhury, Ramakanta; Samal, Chandrakanta; Choudhury, Umakanta

2010-02-01

It is well known that image processing depends heavily upon image representation technique . This paper intends to find out the optimal path of mobile robots for a specified area where obstacles are predefined as well as modified. Here the optimal path is represented by using the Quad tree method. Since there has been rising interest in the use of quad tree, we have tried to use the successive subdivision of images into quadrants from which the quad tree is developed. In the quad tree, obstacles-free area and the partial filled area are represented with different notations. After development of quad tree the algorithm is used to find the optimal path by employing neighbor finding technique, with a view to move the robot from the source to destination. The algorithm, here , permeates through the entire tree, and tries to locate the common ancestor for computation. The computation and the algorithm, aim at easing the ability of the robot to trace the optimal path with the help of adjacencies between the neighboring nodes as well as determining such adjacencies in the horizontal, vertical and diagonal directions. In this paper efforts have been made to determine the movement of the adjacent block in the quad tree and to detect the transition between the blocks equal size and finally generate the result.
Maximum parsimony, substitution model, and probability phylogenetic trees.

PubMed

Weng, J F; Thomas, D A; Mareels, I

2011-01-01

The problem of inferring phylogenies (phylogenetic trees) is one of the main problems in computational biology. There are three main methods for inferring phylogenies-Maximum Parsimony (MP), Distance Matrix (DM) and Maximum Likelihood (ML), of which the MP method is the most well-studied and popular method. In the MP method the optimization criterion is the number of substitutions of the nucleotides computed by the differences in the investigated nucleotide sequences. However, the MP method is often criticized as it only counts the substitutions observable at the current time and all the unobservable substitutions that really occur in the evolutionary history are omitted. In order to take into account the unobservable substitutions, some substitution models have been established and they are now widely used in the DM and ML methods but these substitution models cannot be used within the classical MP method. Recently the authors proposed a probability representation model for phylogenetic trees and the reconstructed trees in this model are called probability phylogenetic trees. One of the advantages of the probability representation model is that it can include a substitution model to infer phylogenetic trees based on the MP principle. In this paper we explain how to use a substitution model in the reconstruction of probability phylogenetic trees and show the advantage of this approach with examples.
A Metric on Phylogenetic Tree Shapes

PubMed Central

Plazzotta, G.

2018-01-01

Abstract The shapes of evolutionary trees are influenced by the nature of the evolutionary process but comparisons of trees from different processes are hindered by the challenge of completely describing tree shape. We present a full characterization of the shapes of rooted branching trees in a form that lends itself to natural tree comparisons. We use this characterization to define a metric, in the sense of a true distance function, on tree shapes. The metric distinguishes trees from random models known to produce different tree shapes. It separates trees derived from tropical versus USA influenza A sequences, which reflect the differing epidemiology of tropical and seasonal flu. We describe several metrics based on the same core characterization, and illustrate how to extend the metric to incorporate trees’ branch lengths or other features such as overall imbalance. Our approach allows us to construct addition and multiplication on trees, and to create a convex metric on tree shapes which formally allows computation of average tree shapes. PMID:28472435
Multi-terminal pipe routing by Steiner minimal tree and particle swarm optimisation

NASA Astrophysics Data System (ADS)

Liu, Qiang; Wang, Chengen

2012-08-01

Computer-aided design of pipe routing is of fundamental importance for complex equipments' developments. In this article, non-rectilinear branch pipe routing with multiple terminals that can be formulated as a Euclidean Steiner Minimal Tree with Obstacles (ESMTO) problem is studied in the context of an aeroengine-integrated design engineering. Unlike the traditional methods that connect pipe terminals sequentially, this article presents a new branch pipe routing algorithm based on the Steiner tree theory. The article begins with a new algorithm for solving the ESMTO problem by using particle swarm optimisation (PSO), and then extends the method to the surface cases by using geodesics to meet the requirements of routing non-rectilinear pipes on the surfaces of aeroengines. Subsequently, the adaptive region strategy and the basic visibility graph method are adopted to increase the computation efficiency. Numeral computations show that the proposed routing algorithm can find satisfactory routing layouts while running in polynomial time.
Automatic Classification of Trees from Laser Scanning Point Clouds

NASA Astrophysics Data System (ADS)

Sirmacek, B.; Lindenbergh, R.

2015-08-01

Development of laser scanning technologies has promoted tree monitoring studies to a new level, as the laser scanning point clouds enable accurate 3D measurements in a fast and environmental friendly manner. In this paper, we introduce a probability matrix computation based algorithm for automatically classifying laser scanning point clouds into 'tree' and 'non-tree' classes. Our method uses the 3D coordinates of the laser scanning points as input and generates a new point cloud which holds a label for each point indicating if it belongs to the 'tree' or 'non-tree' class. To do so, a grid surface is assigned to the lowest height level of the point cloud. The grids are filled with probability values which are calculated by checking the point density above the grid. Since the tree trunk locations appear with very high values in the probability matrix, selecting the local maxima of the grid surface help to detect the tree trunks. Further points are assigned to tree trunks if they appear in the close proximity of trunks. Since heavy mathematical computations (such as point cloud organization, detailed shape 3D detection methods, graph network generation) are not required, the proposed algorithm works very fast compared to the existing methods. The tree classification results are found reliable even on point clouds of cities containing many different objects. As the most significant weakness, false detection of light poles, traffic signs and other objects close to trees cannot be prevented. Nevertheless, the experimental results on mobile and airborne laser scanning point clouds indicate the possible usage of the algorithm as an important step for tree growth observation, tree counting and similar applications. While the laser scanning point cloud is giving opportunity to classify even very small trees, accuracy of the results is reduced in the low point density areas further away than the scanning location. These advantages and disadvantages of two laser scanning point cloud sources are discussed in detail.
Retrieving and Indexing Spatial Data in the Cloud Computing Environment

NASA Astrophysics Data System (ADS)

Wang, Yonggang; Wang, Sheng; Zhou, Daliang

In order to solve the drawbacks of spatial data storage in common Cloud Computing platform, we design and present a framework for retrieving, indexing, accessing and managing spatial data in the Cloud environment. An interoperable spatial data object model is provided based on the Simple Feature Coding Rules from the OGC such as Well Known Binary (WKB) and Well Known Text (WKT). And the classic spatial indexing algorithms like Quad-Tree and R-Tree are re-designed in the Cloud Computing environment. In the last we develop a prototype software based on Google App Engine to implement the proposed model.
From Greeks to Today: Cipher Trees and Computer Cryptography.

ERIC Educational Resources Information Center

Grady, M. Tim; Brumbaugh, Doug

1988-01-01

Explores the use of computers for teaching mathematical models of transposition ciphers. Illustrates the ideas, includes activities and extensions, provides a mathematical model and includes computer programs to implement these topics. (MVL)
MDTS: automatic complex materials design using Monte Carlo tree search.

PubMed

M Dieb, Thaer; Ju, Shenghong; Yoshizoe, Kazuki; Hou, Zhufeng; Shiomi, Junichiro; Tsuda, Koji

2017-01-01

Complex materials design is often represented as a black-box combinatorial optimization problem. In this paper, we present a novel python library called MDTS (Materials Design using Tree Search). Our algorithm employs a Monte Carlo tree search approach, which has shown exceptional performance in computer Go game. Unlike evolutionary algorithms that require user intervention to set parameters appropriately, MDTS has no tuning parameters and works autonomously in various problems. In comparison to a Bayesian optimization package, our algorithm showed competitive search efficiency and superior scalability. We succeeded in designing large Silicon-Germanium (Si-Ge) alloy structures that Bayesian optimization could not deal with due to excessive computational cost. MDTS is available at https://github.com/tsudalab/MDTS.
MDTS: automatic complex materials design using Monte Carlo tree search

NASA Astrophysics Data System (ADS)

Dieb, Thaer M.; Ju, Shenghong; Yoshizoe, Kazuki; Hou, Zhufeng; Shiomi, Junichiro; Tsuda, Koji

2017-12-01

Complex materials design is often represented as a black-box combinatorial optimization problem. In this paper, we present a novel python library called MDTS (Materials Design using Tree Search). Our algorithm employs a Monte Carlo tree search approach, which has shown exceptional performance in computer Go game. Unlike evolutionary algorithms that require user intervention to set parameters appropriately, MDTS has no tuning parameters and works autonomously in various problems. In comparison to a Bayesian optimization package, our algorithm showed competitive search efficiency and superior scalability. We succeeded in designing large Silicon-Germanium (Si-Ge) alloy structures that Bayesian optimization could not deal with due to excessive computational cost. MDTS is available at https://github.com/tsudalab/MDTS.
Dynamics of flexible bodies in tree topology - A computer oriented approach

NASA Technical Reports Server (NTRS)

Singh, R. P.; Vandervoort, R. J.; Likins, P. W.

1984-01-01

An approach suited for automatic generation of the equations of motion for large mechanical systems (i.e., large space structures, mechanisms, robots, etc.) is presented. The system topology is restricted to a tree configuration. The tree is defined as an arbitrary set of rigid and flexible bodies connected by hinges characterizing relative translations and rotations of two adjoining bodies. The equations of motion are derived via Kane's method. The resulting equation set is of minimum dimension. Dynamical equations are imbedded in a computer program called TREETOPS. Extensive control simulation capability is built in the TREETOPS program. The simulation is driven by an interactive set-up program resulting in an easy to use analysis tool.
Genus-Wide Assessment of Lignocellulose Utilization in the Extremely Thermophilic Genus Caldicellulosiruptor by Genomic, Pangenomic, and Metagenomic Analyses.

PubMed

Lee, Laura L; Blumer-Schuette, Sara E; Izquierdo, Javier A; Zurawski, Jeffrey V; Loder, Andrew J; Conway, Jonathan M; Elkins, James G; Podar, Mircea; Clum, Alicia; Jones, Piet C; Piatek, Marek J; Weighill, Deborah A; Jacobson, Daniel A; Adams, Michael W W; Kelly, Robert M

2018-05-01

Metagenomic data from Obsidian Pool (Yellowstone National Park, USA) and 13 genome sequences were used to reassess genus-wide biodiversity for the extremely thermophilic Caldicellulosiruptor The updated core genome contains 1,401 ortholog groups (average genome size for 13 species = 2,516 genes). The pangenome, which remains open with a revised total of 3,493 ortholog groups, encodes a variety of multidomain glycoside hydrolases (GHs). These include three cellulases with GH48 domains that are colocated in the glucan degradation locus (GDL) and are specific determinants for microcrystalline cellulose utilization. Three recently sequenced species, Caldicellulosiruptor sp. strain Rt8.B8 (renamed here Caldicellulosiruptor morganii ), Thermoanaerobacter cellulolyticus strain NA10 (renamed here Caldicellulosiruptor naganoensis ), and Caldicellulosiruptor sp. strain Wai35.B1 (renamed here Caldicellulosiruptor danielii ), degraded Avicel and lignocellulose (switchgrass). C. morganii was more efficient than Caldicellulosiruptor bescii in this regard and differed from the other 12 species examined, both based on genome content and organization and in the specific domain features of conserved GHs. Metagenomic analysis of lignocellulose-enriched samples from Obsidian Pool revealed limited new information on genus biodiversity. Enrichments yielded genomic signatures closely related to that of Caldicellulosiruptor obsidiansis , but there was also evidence for other thermophilic fermentative anaerobes ( Caldanaerobacter , Fervidobacterium , Caloramator , and Clostridium ). One enrichment, containing 89.8% Caldicellulosiruptor and 9.7% Caloramator , had a capacity for switchgrass solubilization comparable to that of C. bescii These results refine the known biodiversity of Caldicellulosiruptor and indicate that microcrystalline cellulose degradation at temperatures above 70°C, based on current information, is limited to certain members of this genus that produce GH48 domain-containing enzymes. IMPORTANCE The genus Caldicellulosiruptor contains the most thermophilic bacteria capable of lignocellulose deconstruction, which are promising candidates for consolidated bioprocessing for the production of biofuels and bio-based chemicals. The focus here is on the extant capability of this genus for plant biomass degradation and the extent to which this can be inferred from the core and pangenomes, based on analysis of 13 species and metagenomic sequence information from environmental samples. Key to microcrystalline hydrolysis is the content of the glucan degradation locus (GDL), a set of genes encoding glycoside hydrolases (GHs), several of which have GH48 and family 3 carbohydrate binding module domains, that function as primary cellulases. Resolving the relationship between the GDL and lignocellulose degradation will inform efforts to identify more prolific members of the genus and to develop metabolic engineering strategies to improve this characteristic. Copyright © 2018 American Society for Microbiology.
Genus-wide assessment of lignocellulose utilization in the extremely thermophilic Caldicellulosiruptor by genomic, pan-genomic and metagenomic analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lee, Laura L.; Blumer-Schuette, Sara E.; Izquierdo, Javier A.

Metagenomic data from Obsidian Pool (Yellowstone National Park, USA) and thirteen genome sequences were used to re-assess genus-wide biodiversity for the extremely thermophilicCaldicellulosiruptor. The updated core-genome contains 1,401 ortholog groups (average genome size for thirteen species = 2,516 genes). The pan-genome, which remains open with a revised total of 3,493 ortholog groups, encodes a variety of multi-domain glycoside hydrolases (GH). These include three cellulases with GH48 domains that are co-located in the Glucan Degradation Locus (GDL) and are specific determinants for microcrystalline cellulose utilization. Three recently sequenced species,Caldicellulosiruptorsp. str. Rt8.B8 (re-named hereCaldicellulosiruptor morganii),Thermoanaerobacter cellulolyticusstr. NA10 (re-named hereCaldicellulosiruptor naganoensisNA10), andCaldicellulosiruptorsp. str.more » Wai35.B1 (re-named hereCaldicellulosiruptor danielii) degraded Avicel and lignocellulose (switchgrass).C. morganiiwas more efficient thanC. besciiin this regard and differed from the other twelve species examined here, both based on genome content and organization and in the specific domain features of conserved GHs. Metagenomic analysis of lignocellulose-enriched samples from Obsidian Pool revealed limited new information on genus biodiversity. Enrichments yielded genomic signatures closely related toCaldicellulosiruptor obsidiansis, but there was also evidence for other thermophilic fermentative anaerobes (Caldanaerobacter,Fervidobacterium,Caloramator, andClostridium). One enrichment, containing 89.7%Caldicellulosiruptorand 9.7%Caloramator, had a capacity for switchgrass solubilization comparable toC. bescii. These results refine the known biodiversity ofCaldicellulosiruptorand indicate that microcrystalline cellulose degradation at temperatures above 70°C, based on current information, is limited to certain members of this genus that produce GH48 domain-containing enzymes. The genusCaldicellulosiruptorcontains the most thermophilic bacteria capable of lignocellulose deconstruction and are promising candidates for consolidated bioprocessing for the production of biofuels and bio-based chemicals. The focus here is on the extant capability of this genus for plant biomass degradation and the extent to which this can be inferred from the core and pan-genomes, based on analysis of thirteen species and metagenomic sequence information from environmental samples. Key to microcrystalline hydrolysis is the content of the Glucan Degradation Locus (GDL), a set of genes encoding glycoside hydrolases (GH), several of which have GH48 and family 3 carbohydrate binding module domains, that function as primary cellulases. Resolving the relationship between the GDL and lignocellulose degradation will inform efforts to identify more prolific members of the genus and to develop metabolic engineering strategies to improve this characteristic« less
Genus-wide assessment of lignocellulose utilization in the extremely thermophilic Caldicellulosiruptor by genomic, pan-genomic and metagenomic analysis

DOE PAGES

Lee, Laura L.; Blumer-Schuette, Sara E.; Izquierdo, Javier A.; ...

2018-02-23

Metagenomic data from Obsidian Pool (Yellowstone National Park, USA) and thirteen genome sequences were used to re-assess genus-wide biodiversity for the extremely thermophilicCaldicellulosiruptor. The updated core-genome contains 1,401 ortholog groups (average genome size for thirteen species = 2,516 genes). The pan-genome, which remains open with a revised total of 3,493 ortholog groups, encodes a variety of multi-domain glycoside hydrolases (GH). These include three cellulases with GH48 domains that are co-located in the Glucan Degradation Locus (GDL) and are specific determinants for microcrystalline cellulose utilization. Three recently sequenced species,Caldicellulosiruptorsp. str. Rt8.B8 (re-named hereCaldicellulosiruptor morganii),Thermoanaerobacter cellulolyticusstr. NA10 (re-named hereCaldicellulosiruptor naganoensisNA10), andCaldicellulosiruptorsp. str.more » Wai35.B1 (re-named hereCaldicellulosiruptor danielii) degraded Avicel and lignocellulose (switchgrass).C. morganiiwas more efficient thanC. besciiin this regard and differed from the other twelve species examined here, both based on genome content and organization and in the specific domain features of conserved GHs. Metagenomic analysis of lignocellulose-enriched samples from Obsidian Pool revealed limited new information on genus biodiversity. Enrichments yielded genomic signatures closely related toCaldicellulosiruptor obsidiansis, but there was also evidence for other thermophilic fermentative anaerobes (Caldanaerobacter,Fervidobacterium,Caloramator, andClostridium). One enrichment, containing 89.7%Caldicellulosiruptorand 9.7%Caloramator, had a capacity for switchgrass solubilization comparable toC. bescii. These results refine the known biodiversity ofCaldicellulosiruptorand indicate that microcrystalline cellulose degradation at temperatures above 70°C, based on current information, is limited to certain members of this genus that produce GH48 domain-containing enzymes. The genusCaldicellulosiruptorcontains the most thermophilic bacteria capable of lignocellulose deconstruction and are promising candidates for consolidated bioprocessing for the production of biofuels and bio-based chemicals. The focus here is on the extant capability of this genus for plant biomass degradation and the extent to which this can be inferred from the core and pan-genomes, based on analysis of thirteen species and metagenomic sequence information from environmental samples. Key to microcrystalline hydrolysis is the content of the Glucan Degradation Locus (GDL), a set of genes encoding glycoside hydrolases (GH), several of which have GH48 and family 3 carbohydrate binding module domains, that function as primary cellulases. Resolving the relationship between the GDL and lignocellulose degradation will inform efforts to identify more prolific members of the genus and to develop metabolic engineering strategies to improve this characteristic« less
Growth and Yield of Appalachian Mixed Hardwoods After Thinning

Treesearch

Wade C. Harrison; Harold E. Burkhart; Thomas E. Burk; Donald E. Beckand

1986-01-01

G-RAT (Growth of Hardwoods After Thinning) is a system of computer programs used to predict growth and yield of Appalachian mixed hardwoods after thinning. Given a tree list or stand table, along with inputs of stand age, site index, and stand basal area before thinning, G-RAT software uses species-specific individual tree equations to predict tree basal area...
Comprehensive national database of tree effects on air quality and human health in the United States

Treesearch

Satoshi Hirabayashi; David J. Nowak

2016-01-01

Trees remove air pollutants through dry deposition processes depending upon forest structure, meteorology, and air quality that vary across space and time. Employing nationally available forest, weather, air pollution and human population data for 2010, computer simulations were performed for deciduous and evergreen trees with varying leaf area index for rural and...
A field-to-desktop toolchain for X-ray CT densitometry enables tree ring analysis

PubMed Central

De Mil, Tom; Vannoppen, Astrid; Beeckman, Hans; Van Acker, Joris; Van den Bulcke, Jan

2016-01-01

Background and Aims Disentangling tree growth requires more than ring width data only. Densitometry is considered a valuable proxy, yet laborious wood sample preparation and lack of dedicated software limit the widespread use of density profiling for tree ring analysis. An X-ray computed tomography-based toolchain of tree increment cores is presented, which results in profile data sets suitable for visual exploration as well as density-based pattern matching. Methods Two temperate (Quercus petraea, Fagus sylvatica) and one tropical species (Terminalia superba) were used for density profiling using an X-ray computed tomography facility with custom-made sample holders and dedicated processing software. Key Results Density-based pattern matching is developed and able to detect anomalies in ring series that can be corrected via interactive software. Conclusions A digital workflow allows generation of structure-corrected profiles of large sets of cores in a short time span that provide sufficient intra-annual density information for tree ring analysis. Furthermore, visual exploration of such data sets is of high value. The dated profiles can be used for high-resolution chronologies and also offer opportunities for fast screening of lesser studied tropical tree species. PMID:27107414
The user's guide to STEMS (Stand and Tree Evaluation and Modeling System).

Treesearch

David M. Belcher

1981-01-01

Presents the structure of STEMS, a computer program for projecting growth of individual trees within the Lake States Region, and discusses its input, processing, major subsystems, and output. Includes an example projection.
Comparative analysis of techniques for evaluating the effectiveness of aircraft computing systems

NASA Technical Reports Server (NTRS)

Hitt, E. F.; Bridgman, M. S.; Robinson, A. C.

1981-01-01

Performability analysis is a technique developed for evaluating the effectiveness of fault-tolerant computing systems in multiphase missions. Performability was evaluated for its accuracy, practical usefulness, and relative cost. The evaluation was performed by applying performability and the fault tree method to a set of sample problems ranging from simple to moderately complex. The problems involved as many as five outcomes, two to five mission phases, permanent faults, and some functional dependencies. Transient faults and software errors were not considered. A different analyst was responsible for each technique. Significantly more time and effort were required to learn performability analysis than the fault tree method. Performability is inherently as accurate as fault tree analysis. For the sample problems, fault trees were more practical and less time consuming to apply, while performability required less ingenuity and was more checkable. Performability offers some advantages for evaluating very complex problems.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rivasseau, Vincent, E-mail: vincent.rivasseau@th.u-psud.fr, E-mail: adrian.tanasa@ens-lyon.org; Tanasa, Adrian, E-mail: vincent.rivasseau@th.u-psud.fr, E-mail: adrian.tanasa@ens-lyon.org

The Loop Vertex Expansion (LVE) is a quantum field theory (QFT) method which explicitly computes the Borel sum of Feynman perturbation series. This LVE relies in a crucial way on symmetric tree weights which define a measure on the set of spanning trees of any connected graph. In this paper we generalize this method by defining new tree weights. They depend on the choice of a partition of a set of vertices of the graph, and when the partition is non-trivial, they are no longer symmetric under permutation of vertices. Nevertheless we prove they have the required positivity property tomore » lead to a convergent LVE; in fact we formulate this positivity property precisely for the first time. Our generalized tree weights are inspired by the Brydges-Battle-Federbush work on cluster expansions and could be particularly suited to the computation of connected functions in QFT. Several concrete examples are explicitly given.« less
Traversing the tangle: algorithms and applications for cophylogenetic studies.

PubMed

Charleston, Michael A; Perkins, Susan L

2006-02-01

Cophylogenetic analysis supposes that two or more phylogenetic trees for linked groups have been constructed, and explores the relationships the trees have with each other. These types of analyses are most commonly used to assess relationships between hosts and their parasites, however the methodology can also be applied to diverse types of problems such as an examination of the phylogenies of genes with respect to those of organisms or those of geographic areas and the organisms that reside there. The working hypothesis is that the trees are correct, though sometimes attempts are made to take into account their uncertainty. Cophylogeny is computationally hard: that is, there are no known fast methods to compute relationships among such trees for any but the simplest of models. A review of methodology that has been developed to examine cophylogenetic relationships is presented and a brief discussion of some medically relevant examples is given.
Polynomial-Time Algorithms for Building a Consensus MUL-Tree

PubMed Central

Cui, Yun; Jansson, Jesper

2012-01-01

Abstract A multi-labeled phylogenetic tree, or MUL-tree, is a generalization of a phylogenetic tree that allows each leaf label to be used many times. MUL-trees have applications in biogeography, the study of host–parasite cospeciation, gene evolution studies, and computer science. Here, we consider the problem of inferring a consensus MUL-tree that summarizes a given set of conflicting MUL-trees, and present the first polynomial-time algorithms for solving it. In particular, we give a straightforward, fast algorithm for building a strict consensus MUL-tree for any input set of MUL-trees with identical leaf label multisets, as well as a polynomial-time algorithm for building a majority rule consensus MUL-tree for the special case where every leaf label occurs at most twice. We also show that, although it is NP-hard to find a majority rule consensus MUL-tree in general, the variant that we call the singular majority rule consensus MUL-tree can be constructed efficiently whenever it exists. PMID:22963134
Polynomial-time algorithms for building a consensus MUL-tree.

PubMed

Cui, Yun; Jansson, Jesper; Sung, Wing-Kin

2012-09-01

A multi-labeled phylogenetic tree, or MUL-tree, is a generalization of a phylogenetic tree that allows each leaf label to be used many times. MUL-trees have applications in biogeography, the study of host-parasite cospeciation, gene evolution studies, and computer science. Here, we consider the problem of inferring a consensus MUL-tree that summarizes a given set of conflicting MUL-trees, and present the first polynomial-time algorithms for solving it. In particular, we give a straightforward, fast algorithm for building a strict consensus MUL-tree for any input set of MUL-trees with identical leaf label multisets, as well as a polynomial-time algorithm for building a majority rule consensus MUL-tree for the special case where every leaf label occurs at most twice. We also show that, although it is NP-hard to find a majority rule consensus MUL-tree in general, the variant that we call the singular majority rule consensus MUL-tree can be constructed efficiently whenever it exists.
Bioinformatics in proteomics: application, terminology, and pitfalls.

PubMed

Wiemer, Jan C; Prokudin, Alexander

2004-01-01

Bioinformatics applies data mining, i.e., modern computer-based statistics, to biomedical data. It leverages on machine learning approaches, such as artificial neural networks, decision trees and clustering algorithms, and is ideally suited for handling huge data amounts. In this article, we review the analysis of mass spectrometry data in proteomics, starting with common pre-processing steps and using single decision trees and decision tree ensembles for classification. Special emphasis is put on the pitfall of overfitting, i.e., of generating too complex single decision trees. Finally, we discuss the pros and cons of the two different decision tree usages.
Computational analyses of spectral trees from electrospray multi-stage mass spectrometry to aid metabolite identification.

PubMed

Cao, Mingshu; Fraser, Karl; Rasmussen, Susanne

2013-10-31

Mass spectrometry coupled with chromatography has become the major technical platform in metabolomics. Aided by peak detection algorithms, the detected signals are characterized by mass-over-charge ratio (m/z) and retention time. Chemical identities often remain elusive for the majority of the signals. Multi-stage mass spectrometry based on electrospray ionization (ESI) allows collision-induced dissociation (CID) fragmentation of selected precursor ions. These fragment ions can assist in structural inference for metabolites of low molecular weight. Computational investigations of fragmentation spectra have increasingly received attention in metabolomics and various public databases house such data. We have developed an R package "iontree" that can capture, store and analyze MS2 and MS3 mass spectral data from high throughput metabolomics experiments. The package includes functions for ion tree construction, an algorithm (distMS2) for MS2 spectral comparison, and tools for building platform-independent ion tree (MS2/MS3) libraries. We have demonstrated the utilization of the package for the systematic analysis and annotation of fragmentation spectra collected in various metabolomics platforms, including direct infusion mass spectrometry, and liquid chromatography coupled with either low resolution or high resolution mass spectrometry. Assisted by the developed computational tools, we have demonstrated that spectral trees can provide informative evidence complementary to retention time and accurate mass to aid with annotating unknown peaks. These experimental spectral trees once subjected to a quality control process, can be used for querying public MS2 databases or de novo interpretation. The putatively annotated spectral trees can be readily incorporated into reference libraries for routine identification of metabolites.
Algorithmic Complexity. Volume II.

DTIC Science & Technology

1982-06-01

digital computers, this improvement will go unnoticed if only a few complex products are to be taken, however it can become increasingly important as...computed in the reverse order. If the products are formed moving from the top of the tree downward, and then the divisions are performed going from the...the reverse order, going up the tree. (r- a mod m means that r is the remainder when a is divided by M.) The overall running time of the algorithm is
Rapid Calculation of Max-Min Fair Rates for Multi-Commodity Flows in Fat-Tree Networks

DOE PAGES

Mollah, Md Atiqul; Yuan, Xin; Pakin, Scott; ...

2017-08-29

Max-min fairness is often used in the performance modeling of interconnection networks. Existing methods to compute max-min fair rates for multi-commodity flows have high complexity and are computationally infeasible for large networks. In this paper, we show that by considering topological features, this problem can be solved efficiently for the fat-tree topology that is widely used in data centers and high performance compute clusters. Several efficient new algorithms are developed for this problem, including a parallel algorithm that can take advantage of multi-core and shared-memory architectures. Using these algorithms, we demonstrate that it is possible to find the max-min fairmore » rate allocation for multi-commodity flows in fat-tree networks that support tens of thousands of nodes. We evaluate the run-time performance of the proposed algorithms and show improvement in orders of magnitude over the previously best known method. Finally, we further demonstrate a new application of max-min fair rate allocation that is only computationally feasible using our new algorithms.« less
A program to compute the soft Robinson-Foulds distance between phylogenetic networks.

PubMed

Lu, Bingxin; Zhang, Louxin; Leong, Hon Wai

2017-03-14

Over the past two decades, phylogenetic networks have been studied to model reticulate evolutionary events. The relationships among phylogenetic networks, phylogenetic trees and clusters serve as the basis for reconstruction and comparison of phylogenetic networks. To understand these relationships, two problems are raised: the tree containment problem, which asks whether a phylogenetic tree is displayed in a phylogenetic network, and the cluster containment problem, which asks whether a cluster is represented at a node in a phylogenetic network. Both the problems are NP-complete. A fast exponential-time algorithm for the cluster containment problem on arbitrary networks is developed and implemented in C. The resulting program is further extended into a computer program for fast computation of the Soft Robinson-Foulds distance between phylogenetic networks. Two computer programs are developed for facilitating reconstruction and validation of phylogenetic network models in evolutionary and comparative genomics. Our simulation tests indicated that they are fast enough for use in practice. Additionally, the distribution of the Soft Robinson-Foulds distance between phylogenetic networks is demonstrated to be unlikely normal by our simulation data.
Rapid Calculation of Max-Min Fair Rates for Multi-Commodity Flows in Fat-Tree Networks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mollah, Md Atiqul; Yuan, Xin; Pakin, Scott

Max-min fairness is often used in the performance modeling of interconnection networks. Existing methods to compute max-min fair rates for multi-commodity flows have high complexity and are computationally infeasible for large networks. In this paper, we show that by considering topological features, this problem can be solved efficiently for the fat-tree topology that is widely used in data centers and high performance compute clusters. Several efficient new algorithms are developed for this problem, including a parallel algorithm that can take advantage of multi-core and shared-memory architectures. Using these algorithms, we demonstrate that it is possible to find the max-min fairmore » rate allocation for multi-commodity flows in fat-tree networks that support tens of thousands of nodes. We evaluate the run-time performance of the proposed algorithms and show improvement in orders of magnitude over the previously best known method. Finally, we further demonstrate a new application of max-min fair rate allocation that is only computationally feasible using our new algorithms.« less
A Metric on Phylogenetic Tree Shapes.

PubMed

Colijn, C; Plazzotta, G

2018-01-01

The shapes of evolutionary trees are influenced by the nature of the evolutionary process but comparisons of trees from different processes are hindered by the challenge of completely describing tree shape. We present a full characterization of the shapes of rooted branching trees in a form that lends itself to natural tree comparisons. We use this characterization to define a metric, in the sense of a true distance function, on tree shapes. The metric distinguishes trees from random models known to produce different tree shapes. It separates trees derived from tropical versus USA influenza A sequences, which reflect the differing epidemiology of tropical and seasonal flu. We describe several metrics based on the same core characterization, and illustrate how to extend the metric to incorporate trees' branch lengths or other features such as overall imbalance. Our approach allows us to construct addition and multiplication on trees, and to create a convex metric on tree shapes which formally allows computation of average tree shapes. © The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
Forensic Analysis of Compromised Computers

NASA Technical Reports Server (NTRS)

Wolfe, Thomas

2004-01-01

Directory Tree Analysis File Generator is a Practical Extraction and Reporting Language (PERL) script that simplifies and automates the collection of information for forensic analysis of compromised computer systems. During such an analysis, it is sometimes necessary to collect and analyze information about files on a specific directory tree. Directory Tree Analysis File Generator collects information of this type (except information about directories) and writes it to a text file. In particular, the script asks the user for the root of the directory tree to be processed, the name of the output file, and the number of subtree levels to process. The script then processes the directory tree and puts out the aforementioned text file. The format of the text file is designed to enable the submission of the file as input to a spreadsheet program, wherein the forensic analysis is performed. The analysis usually consists of sorting files and examination of such characteristics of files as ownership, time of creation, and time of most recent access, all of which characteristics are among the data included in the text file.
Dendritic and Axonal Wiring Optimization of Cortical GABAergic Interneurons.

PubMed

Anton-Sanchez, Laura; Bielza, Concha; Benavides-Piccione, Ruth; DeFelipe, Javier; Larrañaga, Pedro

2016-10-01

The way in which a neuronal tree expands plays an important role in its functional and computational characteristics. We aimed to study the existence of an optimal neuronal design for different types of cortical GABAergic neurons. To do this, we hypothesized that both the axonal and dendritic trees of individual neurons optimize brain connectivity in terms of wiring length. We took the branching points of real three-dimensional neuronal reconstructions of the axonal and dendritic trees of different types of cortical interneurons and searched for the minimal wiring arborization structure that respects the branching points. We compared the minimal wiring arborization with real axonal and dendritic trees. We tested this optimization problem using a new approach based on graph theory and evolutionary computation techniques. We concluded that neuronal wiring is near-optimal in most of the tested neurons, although the wiring length of dendritic trees is generally nearer to the optimum. Therefore, wiring economy is related to the way in which neuronal arborizations grow irrespective of the marked differences in the morphology of the examined interneurons.
GRAPE-6A: A Single-Card GRAPE-6 for Parallel PC-GRAPE Cluster Systems

NASA Astrophysics Data System (ADS)

Fukushige, Toshiyuki; Makino, Junichiro; Kawai, Atsushi

2005-12-01

In this paper, we describe the design and performance of GRAPE-6A, a special-purpose computer for gravitational many-body simulations. It was designed to be used with a PC cluster, in which each node has one GRAPE-6A. Such a configuration is particularly cost-effective in running parallel tree algorithms. Though the use of parallel tree algorithms was possible with the original GRAPE-6 hardware, it was not very cost-effective since a single GRAPE-6 board was still too fast and too expensive. Therefore, we designed GRAPE-6A as a single PCI card to minimize the reproduction cost and to optimize the computing speed. The peak performance is 130 Gflops for one GRAPE-6A board and 3.1 Tflops for our 24 node cluster. We describe the implementation of the tree, TreePM and individual timestep algorithms on both a single GRAPE-6A system and GRAPE-6A cluster. Using the tree algorithm on our 16-node GRAPE-6A system, we can complete a collisionless simulation with 100 million particles (8000 steps) within 10 days.
The integrated microbial genome resource of analysis.

PubMed

Checcucci, Alice; Mengoni, Alessio

2015-01-01

Integrated Microbial Genomes and Metagenomes (IMG) is a biocomputational system that allows to provide information and support for annotation and comparative analysis of microbial genomes and metagenomes. IMG has been developed by the US Department of Energy (DOE)-Joint Genome Institute (JGI). IMG platform contains both draft and complete genomes, sequenced by Joint Genome Institute and other public and available genomes. Genomes of strains belonging to Archaea, Bacteria, and Eukarya domains are present as well as those of viruses and plasmids. Here, we provide some essential features of IMG system and case study for pangenome analysis.
An Extension of CART's Pruning Algorithm. Program Statistics Research Technical Report No. 91-11.

ERIC Educational Resources Information Center

Kim, Sung-Ho

Among the computer-based methods used for the construction of trees such as AID, THAID, CART, and FACT, the only one that uses an algorithm that first grows a tree and then prunes the tree is CART. The pruning component of CART is analogous in spirit to the backward elimination approach in regression analysis. This idea provides a tool in…
STX--Fortran-4 program for estimates of tree populations from 3P sample-tree-measurements

Treesearch

L. R. Grosenbaugh

1967-01-01

Describes how to use an improved and greatly expanded version of an earlier computer program (1964) that converts dendrometer measurements of 3P-sample trees to population values in terms of whatever units user desires. Many new options are available, including that of obtaining a product-yield and appraisal report based on regression coefficients supplied by user....
Live phylogeny with polytomies: Finding the most compact parsimonious trees.

PubMed

Papamichail, D; Huang, A; Kennedy, E; Ott, J-L; Miller, A; Papamichail, G

2017-08-01

Construction of phylogenetic trees has traditionally focused on binary trees where all species appear on leaves, a problem for which numerous efficient solutions have been developed. Certain application domains though, such as viral evolution and transmission, paleontology, linguistics, and phylogenetic stemmatics, often require phylogeny inference that involves placing input species on ancestral tree nodes (live phylogeny), and polytomies. These requirements, despite their prevalence, lead to computationally harder algorithmic solutions and have been sparsely examined in the literature to date. In this article we prove some unique properties of most parsimonious live phylogenetic trees with polytomies, and their mapping to traditional binary phylogenetic trees. We show that our problem reduces to finding the most compact parsimonious tree for n species, and describe a novel efficient algorithm to find such trees without resorting to exhaustive enumeration of all possible tree topologies. Copyright © 2017 Elsevier Ltd. All rights reserved.
Not seeing the forest for the trees: size of the minimum spanning trees (MSTs) forest and branch significance in MST-based phylogenetic analysis.

PubMed

Teixeira, Andreia Sofia; Monteiro, Pedro T; Carriço, João A; Ramirez, Mário; Francisco, Alexandre P

2015-01-01

Trees, including minimum spanning trees (MSTs), are commonly used in phylogenetic studies. But, for the research community, it may be unclear that the presented tree is just a hypothesis, chosen from among many possible alternatives. In this scenario, it is important to quantify our confidence in both the trees and the branches/edges included in such trees. In this paper, we address this problem for MSTs by introducing a new edge betweenness metric for undirected and weighted graphs. This spanning edge betweenness metric is defined as the fraction of equivalent MSTs where a given edge is present. The metric provides a per edge statistic that is similar to that of the bootstrap approach frequently used in phylogenetics to support the grouping of taxa. We provide methods for the exact computation of this metric based on the well known Kirchhoff's matrix tree theorem. Moreover, we implement and make available a module for the PHYLOViZ software and evaluate the proposed metric concerning both effectiveness and computational performance. Analysis of trees generated using multilocus sequence typing data (MLST) and the goeBURST algorithm revealed that the space of possible MSTs in real data sets is extremely large. Selection of the edge to be represented using bootstrap could lead to unreliable results since alternative edges are present in the same fraction of equivalent MSTs. The choice of the MST to be presented, results from criteria implemented in the algorithm that must be based in biologically plausible models.
Do Branch Lengths Help to Locate a Tree in a Phylogenetic Network?

PubMed

Gambette, Philippe; van Iersel, Leo; Kelk, Steven; Pardi, Fabio; Scornavacca, Celine

2016-09-01

Phylogenetic networks are increasingly used in evolutionary biology to represent the history of species that have undergone reticulate events such as horizontal gene transfer, hybrid speciation and recombination. One of the most fundamental questions that arise in this context is whether the evolution of a gene with one copy in all species can be explained by a given network. In mathematical terms, this is often translated in the following way: is a given phylogenetic tree contained in a given phylogenetic network? Recently this tree containment problem has been widely investigated from a computational perspective, but most studies have only focused on the topology of the phylogenies, ignoring a piece of information that, in the case of phylogenetic trees, is routinely inferred by evolutionary analyses: branch lengths. These measure the amount of change (e.g., nucleotide substitutions) that has occurred along each branch of the phylogeny. Here, we study a number of versions of the tree containment problem that explicitly account for branch lengths. We show that, although length information has the potential to locate more precisely a tree within a network, the problem is computationally hard in its most general form. On a positive note, for a number of special cases of biological relevance, we provide algorithms that solve this problem efficiently. This includes the case of networks of limited complexity, for which it is possible to recover, among the trees contained by the network with the same topology as the input tree, the closest one in terms of branch lengths.

Accurate Phylogenetic Tree Reconstruction from Quartets: A Heuristic Approach

PubMed Central

Reaz, Rezwana; Bayzid, Md. Shamsuzzoha; Rahman, M. Sohel

2014-01-01

Supertree methods construct trees on a set of taxa (species) combining many smaller trees on the overlapping subsets of the entire set of taxa. A ‘quartet’ is an unrooted tree over taxa, hence the quartet-based supertree methods combine many -taxon unrooted trees into a single and coherent tree over the complete set of taxa. Quartet-based phylogeny reconstruction methods have been receiving considerable attentions in the recent years. An accurate and efficient quartet-based method might be competitive with the current best phylogenetic tree reconstruction methods (such as maximum likelihood or Bayesian MCMC analyses), without being as computationally intensive. In this paper, we present a novel and highly accurate quartet-based phylogenetic tree reconstruction method. We performed an extensive experimental study to evaluate the accuracy and scalability of our approach on both simulated and biological datasets. PMID:25117474
Statistical Methods in Ai: Rare Event Learning Using Associative Rules and Higher-Order Statistics

NASA Astrophysics Data System (ADS)

Iyer, V.; Shetty, S.; Iyengar, S. S.

2015-07-01

Rare event learning has not been actively researched since lately due to the unavailability of algorithms which deal with big samples. The research addresses spatio-temporal streams from multi-resolution sensors to find actionable items from a perspective of real-time algorithms. This computing framework is independent of the number of input samples, application domain, labelled or label-less streams. A sampling overlap algorithm such as Brooks-Iyengar is used for dealing with noisy sensor streams. We extend the existing noise pre-processing algorithms using Data-Cleaning trees. Pre-processing using ensemble of trees using bagging and multi-target regression showed robustness to random noise and missing data. As spatio-temporal streams are highly statistically correlated, we prove that a temporal window based sampling from sensor data streams converges after n samples using Hoeffding bounds. Which can be used for fast prediction of new samples in real-time. The Data-cleaning tree model uses a nonparametric node splitting technique, which can be learned in an iterative way which scales linearly in memory consumption for any size input stream. The improved task based ensemble extraction is compared with non-linear computation models using various SVM kernels for speed and accuracy. We show using empirical datasets the explicit rule learning computation is linear in time and is only dependent on the number of leafs present in the tree ensemble. The use of unpruned trees (t) in our proposed ensemble always yields minimum number (m) of leafs keeping pre-processing computation to n × t log m compared to N2 for Gram Matrix. We also show that the task based feature induction yields higher Qualify of Data (QoD) in the feature space compared to kernel methods using Gram Matrix.
The growth of the UniTree mass storage system at the NASA Center for Computational Sciences: Some lessons learned

NASA Technical Reports Server (NTRS)

Tarshish, Adina; Salmon, Ellen

1994-01-01

In October 1992, the NASA Center for Computational Sciences made its Convex-based UniTree system generally available to users. The ensuing months saw growth in every area. Within 26 months, data under UniTree control grew from nil to over 12 terabytes, nearly all of it stored on robotically mounted tape. HiPPI/UltraNet was added to enhance connectivity, and later HiPPI/TCP was added as well. Disks and robotic tape silos were added to those already under UniTree's control, and 18-track tapes were upgraded to 36-track. The primary data source for UniTree, the facility's Cray Y-MP/4-128, first doubled its processing power and then was replaced altogether by a C98/6-256 with nearly two-and-a-half times the Y-MP's combined peak gigaflops. The Convex/UniTree software was upgraded from version 1.5 to 1.7.5, and then to 1.7.6. Finally, the server itself, a Convex C3240, was upgraded to a C3830 with a second I/O bay, doubling the C3240's memory and capacity for I/O. This paper describes insights gained and reinforced with the burgeoning demands on the UniTree storage system and the significant increases in performance gained from the many upgrades.
Diameter-Constrained Steiner Tree

NASA Astrophysics Data System (ADS)

Ding, Wei; Lin, Guohui; Xue, Guoliang

Given an edge-weighted undirected graph G = (V,E,c,w), where each edge e ∈ E has a cost c(e) and a weight w(e), a set S ⊆ V of terminals and a positive constant D 0, we seek a minimum cost Steiner tree where all terminals appear as leaves and its diameter is bounded by D 0. Note that the diameter of a tree represents the maximum weight of path connecting two different leaves in the tree. Such problem is called the minimum cost diameter-constrained Steiner tree problem. This problem is NP-hard even when the topology of Steiner tree is fixed. In present paper we focus on this restricted version and present a fully polynomial time approximation scheme (FPTAS) for computing a minimum cost diameter-constrained Steiner tree under a fixed topology.
A fully resolved consensus between fully resolved phylogenetic trees.

PubMed

Quitzau, José Augusto Amgarten; Meidanis, João

2006-03-31

Nowadays, there are many phylogeny reconstruction methods, each with advantages and disadvantages. We explored the advantages of each method, putting together the common parts of trees constructed by several methods, by means of a consensus computation. A number of phylogenetic consensus methods are already known. Unfortunately, there is also a taboo concerning consensus methods, because most biologists see them mainly as comparators and not as phylogenetic tree constructors. We challenged this taboo by defining a consensus method that builds a fully resolved phylogenetic tree based on the most common parts of fully resolved trees in a given collection. We also generated results showing that this consensus is in a way a kind of "median" of the input trees; as such it can be closer to the correct tree in many situations.
A Very Fast and Angular Momentum Conserving Tree Code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Marcello, Dominic C., E-mail: dmarce504@gmail.com

There are many methods used to compute the classical gravitational field in astrophysical simulation codes. With the exception of the typically impractical method of direct computation, none ensure conservation of angular momentum to machine precision. Under uniform time-stepping, the Cartesian fast multipole method of Dehnen (also known as the very fast tree code) conserves linear momentum to machine precision. We show that it is possible to modify this method in a way that conserves both angular and linear momenta.
Digital computer processing of peach orchard multispectral aerial photography

NASA Technical Reports Server (NTRS)

Atkinson, R. J.

1976-01-01

Several methods of analysis using digital computers applicable to digitized multispectral aerial photography, are described, with particular application to peach orchard test sites. This effort was stimulated by the recent premature death of peach trees in the Southeastern United States. The techniques discussed are: (1) correction of intensity variations by digital filtering, (2) automatic detection and enumeration of trees in five size categories, (3) determination of unhealthy foliage by infrared reflectances, and (4) four band multispectral classification into healthy and declining categories.
Software For Fault-Tree Diagnosis Of A System

NASA Technical Reports Server (NTRS)

Iverson, Dave; Patterson-Hine, Ann; Liao, Jack

1993-01-01

Fault Tree Diagnosis System (FTDS) computer program is automated-diagnostic-system program identifying likely causes of specified failure on basis of information represented in system-reliability mathematical models known as fault trees. Is modified implementation of failure-cause-identification phase of Narayanan's and Viswanadham's methodology for acquisition of knowledge and reasoning in analyzing failures of systems. Knowledge base of if/then rules replaced with object-oriented fault-tree representation. Enhancement yields more-efficient identification of causes of failures and enables dynamic updating of knowledge base. Written in C language, C++, and Common LISP.
A Mixtures-of-Trees Framework for Multi-Label Classification

PubMed Central

Hong, Charmgil; Batal, Iyad; Hauskrecht, Milos

2015-01-01

We propose a new probabilistic approach for multi-label classification that aims to represent the class posterior distribution P(Y|X). Our approach uses a mixture of tree-structured Bayesian networks, which can leverage the computational advantages of conditional tree-structured models and the abilities of mixtures to compensate for tree-structured restrictions. We develop algorithms for learning the model from data and for performing multi-label predictions using the learned model. Experiments on multiple datasets demonstrate that our approach outperforms several state-of-the-art multi-label classification methods. PMID:25927011
Learning classification trees

NASA Technical Reports Server (NTRS)

Buntine, Wray

1991-01-01

Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. How a tree learning algorithm can be derived from Bayesian decision theory is outlined. This introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule turns out to be similar to Quinlan's information gain splitting rule, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, Quinlan's C4 and Breiman et al. Cart show the full Bayesian algorithm is consistently as good, or more accurate than these other approaches though at a computational price.
Weighting factors for computing the relation between tree volume and d.b.h. in the Pacific Northwest.

Treesearch

Donald R. Gedney; Floyd A. Johnson

1959-01-01

Timber cruising is frequently made easier through use of local volume tables based on d.b.h. alone. These tables are made by establishing the relation between volume and d.b.h. from measurements (including height) made on sample trees in the stand. The sample-tree measurements are converted to volumes through use of standard volume tables, and a volume-diameter curve...
Leaves to landscapes: using high performance computing to assess patch-scale forest response to regional temperature and trace gas gradients

Treesearch

George E. Host; Harlan W. Stech; Kathryn E. Lenz; Kyle Roskoski; Richard Mather; Michael Donahue

2007-01-01

ECOPHYS is one of the early FSTM's that integrated plant physiological and tree architectural models to assess the relative importance of genetic traits in tree growth, and explore the growth response to interacting environmental stresses (Host et al 1999, Isebrands et al 1999, Martin et al 2001). This paper will describe extensions of the ECOPHYS individual tree...
A field-to-desktop toolchain for X-ray CT densitometry enables tree ring analysis.

PubMed

De Mil, Tom; Vannoppen, Astrid; Beeckman, Hans; Van Acker, Joris; Van den Bulcke, Jan

2016-06-01

Disentangling tree growth requires more than ring width data only. Densitometry is considered a valuable proxy, yet laborious wood sample preparation and lack of dedicated software limit the widespread use of density profiling for tree ring analysis. An X-ray computed tomography-based toolchain of tree increment cores is presented, which results in profile data sets suitable for visual exploration as well as density-based pattern matching. Two temperate (Quercus petraea, Fagus sylvatica) and one tropical species (Terminalia superba) were used for density profiling using an X-ray computed tomography facility with custom-made sample holders and dedicated processing software. Density-based pattern matching is developed and able to detect anomalies in ring series that can be corrected via interactive software. A digital workflow allows generation of structure-corrected profiles of large sets of cores in a short time span that provide sufficient intra-annual density information for tree ring analysis. Furthermore, visual exploration of such data sets is of high value. The dated profiles can be used for high-resolution chronologies and also offer opportunities for fast screening of lesser studied tropical tree species. © The Author 2016. Published by Oxford University Press on behalf of the Annals of Botany Company. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Multispectral color photography for mineral exploration by the remote sensing of biogeochemical anomalies

NASA Technical Reports Server (NTRS)

Yost, E.

1975-01-01

Selected band multispectral photography was evaluated as a mineral exploration tool by detecting stress on trees caused by underground mineralization. Ground truth consisted of two test sites in the Prescott National Forest within which the mineralization had been established by a drilling program. Species of trees were categorized as background, intermediate, and anomalous based upon where they grew with respect to this underlying mineralization. Soil geochemistry and the metal content of ashed samples of the trees were studied in relation to the inferred locus of mineralization. Computer analysis of the reflectance spectra of mineralized trees confirmed that the relative percent reflectance differences of trees growing in anomalous areas was less than that of the same tree species growing in background areas.
Integrating information about location and value of resources by white-faced saki monkeys (Pithecia pithecia).

PubMed

Cunningham, Elena; Janson, Charles

2007-07-01

Most studies of spatial memory in primates focus on species that inhabit large home ranges and have dispersed, patchy resources. Researchers assume that primates use memory to minimize distances traveled between resources. We investigated the use of spatial memory in a group of six white-faced sakis (Pithecia pithecia) on 12.8-ha Round Island, Guri Lake, Venezuela during a period of fruit abundance. The sakis' movements were analyzed with logistic regressions, a predictive computer model and a computer model that simulates movements. We considered all the resources available to the sakis and compared observed distances to predicted distances from a computer model for foragers who know nothing about the location of resources. Surprisingly, the observed distances were four times greater than the predicted distances, suggesting that the sakis passed by a majority of the available fruit trees without feeding. The odds of visiting a food tree, however, were significantly increased if the tree had been visited in the previous 3 days and had more than 100 fruit. The sakis' preferred resources were highly productive fruit trees, Capparis trees, and trees with water holes. They traveled efficiently to these sites. The sakis choice of feeding sites indicate that they combined knowledge acquired by repeatedly traveling through their home range with 'what' and 'where' information gained from individual visits to resources. Although the sakis' foraging choices increased the distance they traveled overall, choosing more valued sites allowed the group to minimize intra-group feeding competition, maintain intergroup dominance over important resources, and monitor the state of resources throughout their home range. The sakis' foraging decisions appear to have used spatial memory, elements of episodic-like memory and social and nutritional considerations.
Lognormal Approximations of Fault Tree Uncertainty Distributions.

PubMed

El-Shanawany, Ashraf Ben; Ardron, Keith H; Walker, Simon P

2018-01-26

Fault trees are used in reliability modeling to create logical models of fault combinations that can lead to undesirable events. The output of a fault tree analysis (the top event probability) is expressed in terms of the failure probabilities of basic events that are input to the model. Typically, the basic event probabilities are not known exactly, but are modeled as probability distributions: therefore, the top event probability is also represented as an uncertainty distribution. Monte Carlo methods are generally used for evaluating the uncertainty distribution, but such calculations are computationally intensive and do not readily reveal the dominant contributors to the uncertainty. In this article, a closed-form approximation for the fault tree top event uncertainty distribution is developed, which is applicable when the uncertainties in the basic events of the model are lognormally distributed. The results of the approximate method are compared with results from two sampling-based methods: namely, the Monte Carlo method and the Wilks method based on order statistics. It is shown that the closed-form expression can provide a reasonable approximation to results obtained by Monte Carlo sampling, without incurring the computational expense. The Wilks method is found to be a useful means of providing an upper bound for the percentiles of the uncertainty distribution while being computationally inexpensive compared with full Monte Carlo sampling. The lognormal approximation method and Wilks's method appear attractive, practical alternatives for the evaluation of uncertainty in the output of fault trees and similar multilinear models. © 2018 Society for Risk Analysis.
Treecode with a Special-Purpose Processor

NASA Astrophysics Data System (ADS)

Makino, Junichiro

1991-08-01

We describe an implementation of the modified Barnes-Hut tree algorithm for a gravitational N-body calculation on a GRAPE (GRAvity PipE) backend processor. GRAPE is a special-purpose computer for N-body calculations. It receives the positions and masses of particles from a host computer and then calculates the gravitational force at each coordinate specified by the host. To use this GRAPE processor with the hierarchical tree algorithm, the host computer must maintain a list of all nodes that exert force on a particle. If we create this list for each particle of the system at each timestep, the number of floating-point operations on the host and that on GRAPE would become comparable, and the increased speed obtained by using GRAPE would be small. In our modified algorithm, we create a list of nodes for many particles. Thus, the amount of the work required of the host is significantly reduced. This algorithm was originally developed by Barnes in order to vectorize the force calculation on a Cyber 205. With this algorithm, the computing time of the force calculation becomes comparable to that of the tree construction, if the GRAPE backend processor is sufficiently fast. The obtained speed-up factor is 30 to 50 for a RISC-based host computer and GRAPE-1A with a peak speed of 240 Mflops.
Software Reviews. Programs Worth a Second Look.

ERIC Educational Resources Information Center

Schneider, Roxanne; Eiser, Leslie

1989-01-01

Reviewed are three computer software packages for use in middle/high school classrooms. Included are "MacWrite II," a word-processing program for MacIntosh computers; "Super Story Tree," a word-processing program for Apple and IBM computers; and "Math Blaster Mystery," for IBM, Apple, and Tandy computers. (CW)
PhyloExplorer: a web server to validate, explore and query phylogenetic trees

PubMed Central

Ranwez, Vincent; Clairon, Nicolas; Delsuc, Frédéric; Pourali, Saeed; Auberval, Nicolas; Diser, Sorel; Berry, Vincent

2009-01-01

Background Many important problems in evolutionary biology require molecular phylogenies to be reconstructed. Phylogenetic trees must then be manipulated for subsequent inclusion in publications or analyses such as supertree inference and tree comparisons. However, no tool is currently available to facilitate the management of tree collections providing, for instance: standardisation of taxon names among trees with respect to a reference taxonomy; selection of relevant subsets of trees or sub-trees according to a taxonomic query; or simply computation of descriptive statistics on the collection. Moreover, although several databases of phylogenetic trees exist, there is currently no easy way to find trees that are both relevant and complementary to a given collection of trees. Results We propose a tool to facilitate assessment and management of phylogenetic tree collections. Given an input collection of rooted trees, PhyloExplorer provides facilities for obtaining statistics describing the collection, correcting invalid taxon names, extracting taxonomically relevant parts of the collection using a dedicated query language, and identifying related trees in the TreeBASE database. Conclusion PhyloExplorer is a simple and interactive website implemented through underlying Python libraries and MySQL databases. It is available at: and the source code can be downloaded from: . PMID:19450253
Polysaccharide utilization loci and nutritional specialization in a dominant group of butyrate-producing human colonic Firmicutes.

PubMed

Sheridan, Paul O; Martin, Jennifer C; Lawley, Trevor D; Browne, Hilary P; Harris, Hugh M B; Bernalier-Donadille, Annick; Duncan, Sylvia H; O'Toole, Paul W; Scott, Karen P; Flint, Harry J

2016-02-01

Firmicutes and Bacteroidetes are the predominant bacterial phyla colonizing the healthy human large intestine. Whilst both ferment dietary fibre, genes responsible for this important activity have been analysed only in the Bacteroidetes , with very little known about the Firmicutes . This work investigates the carbohydrate-active enzymes (CAZymes) in a group of Firmicutes , Roseburia spp. and Eubacterium rectale , which play an important role in producing butyrate from dietary carbohydrates and in health maintenance. Genome sequences of 11 strains representing E. rectale and four Roseburia spp. were analysed for carbohydrate-active genes. Following assembly into a pan-genome, core, variable and unique genes were identified. The 1840 CAZyme genes identified in the pan-genome were assigned to 538 orthologous groups, of which only 26 were present in all strains, indicating considerable inter-strain variability. This analysis was used to categorize the 11 strains into four carbohydrate utilization ecotypes (CUEs), which were shown to correspond to utilization of different carbohydrates for growth. Many glycoside hydrolase genes were found linked to genes encoding oligosaccharide transporters and regulatory elements in the genomes of Roseburia spp. and E. rectale , forming distinct polysaccharide utilization loci (PULs). Whilst PULs are also a common feature in Bacteroidetes , key differences were noted in these Firmicutes , including the absence of close homologues of Bacteroides polysaccharide utilization genes, hence we refer to Gram-positive PULs (gpPULs). Most CAZyme genes in the Roseburia / E. rectale group are organized into gpPULs. Variation in gpPULs can explain the high degree of nutritional specialization at the species level within this group.

Genetic variability of mutans streptococci revealed by wide whole-genome sequencing

PubMed Central

2013-01-01

Background Mutans streptococci are a group of bacteria significantly contributing to tooth decay. Their genetic variability is however still not well understood. Results Genomes of 6 clinical S. mutans isolates of different origins, one isolate of S. sobrinus (DSM 20742) and one isolate of S. ratti (DSM 20564) were sequenced and comparatively analyzed. Genome alignment revealed a mosaic-like structure of genome arrangement. Genes related to pathogenicity are found to have high variations among the strains, whereas genes for oxidative stress resistance are well conserved, indicating the importance of this trait in the dental biofilm community. Analysis of genome-scale metabolic networks revealed significant differences in 42 pathways. A striking dissimilarity is the unique presence of two lactate oxidases in S. sobrinus DSM 20742, probably indicating an unusual capability of this strain in producing H2O2 and expanding its ecological niche. In addition, lactate oxidases may form with other enzymes a novel energetic pathway in S. sobrinus DSM 20742 that can remedy its deficiency in citrate utilization pathway. Using 67 S. mutans genomes currently available including the strains sequenced in this study, we estimates the theoretical core genome size of S. mutans, and performed modeling of S. mutans pan-genome by applying different fitting models. An “open” pan-genome was inferred. Conclusions The comparative genome analyses revealed diversities in the mutans streptococci group, especially with respect to the virulence related genes and metabolic pathways. The results are helpful for better understanding the evolution and adaptive mechanisms of these oral pathogen microorganisms and for combating them. PMID:23805886
Comparative genome-scale modelling of Staphylococcus aureus strains identifies strain-specific metabolic capabilities linked to pathogenicity

PubMed Central

Bosi, Emanuele; Monk, Jonathan M.; Aziz, Ramy K.; Fondi, Marco; Nizet, Victor; Palsson, Bernhard Ø.

2016-01-01

Staphylococcus aureus is a preeminent bacterial pathogen capable of colonizing diverse ecological niches within its human host. We describe here the pangenome of S. aureus based on analysis of genome sequences from 64 strains of S. aureus spanning a range of ecological niches, host types, and antibiotic resistance profiles. Based on this set, S. aureus is expected to have an open pangenome composed of 7,411 genes and a core genome composed of 1,441 genes. Metabolism was highly conserved in this core genome; however, differences were identified in amino acid and nucleotide biosynthesis pathways between the strains. Genome-scale models (GEMs) of metabolism were constructed for the 64 strains of S. aureus. These GEMs enabled a systems approach to characterizing the core metabolic and panmetabolic capabilities of the S. aureus species. All models were predicted to be auxotrophic for the vitamins niacin (vitamin B3) and thiamin (vitamin B1), whereas strain-specific auxotrophies were predicted for riboflavin (vitamin B2), guanosine, leucine, methionine, and cysteine, among others. GEMs were used to systematically analyze growth capabilities in more than 300 different growth-supporting environments. The results identified metabolic capabilities linked to pathogenic traits and virulence acquisitions. Such traits can be used to differentiate strains responsible for mild vs. severe infections and preference for hosts (e.g., animals vs. humans). Genome-scale analysis of multiple strains of a species can thus be used to identify metabolic determinants of virulence and increase our understanding of why certain strains of this deadly pathogen have spread rapidly throughout the world. PMID:27286824
Pantoea ananatis Genetic Diversity Analysis Reveals Limited Genomic Diversity as Well as Accessory Genes Correlated with Onion Pathogenicity.

PubMed

Stice, Shaun P; Stumpf, Spencer D; Gitaitis, Ron D; Kvitko, Brian H; Dutta, Bhabesh

2018-01-01

Pantoea ananatis is a member of the family Enterobacteriaceae and an enigmatic plant pathogen with a broad host range. Although P. ananatis strains can be aggressive on onion causing foliar necrosis and onion center rot, previous genomic analysis has shown that P. ananatis lacks the primary virulence secretion systems associated with other plant pathogens. We assessed a collection of fifty P. ananatis strains collected from Georgia over three decades to determine genetic factors that correlated with onion pathogenic potential. Previous genetic analysis studies have compared strains isolated from different hosts with varying diseases potential and isolation sources. Strains varied greatly in their pathogenic potential and aggressiveness on different cultivated Allium species like onion, leek, shallot, and chive. Using multi-locus sequence analysis (MLSA) and repetitive extragenic palindrome repeat (rep)-PCR techniques, we did not observe any correlation between onion pathogenic potential and genetic diversity among strains. Whole genome sequencing and pan-genomic analysis of a sub-set of 10 strains aided in the identification of a novel series of genetic regions, likely plasmid borne, and correlating with onion pathogenicity observed on single contigs of the genetic assemblies. We named these loci Onion Virulence Regions (OVR) A-D. The OVR loci contain genes involved in redox regulation as well as pectate lyase and rhamnogalacturonase genes. Previous studies have not identified distinct genetic loci or plasmids correlating with onion foliar pathogenicity or pathogenicity on a single host pathosystem. The lack of focus on a single host system for this phytopathgenic disease necessitates the pan-genomic analysis performed in this study.
Pan-Genome Analysis Links the Hereditary Variation of Leptospirillum ferriphilum With Its Evolutionary Adaptation

PubMed Central

Zhang, Xian; Liu, Xueduan; Yang, Fei; Chen, Lv

2018-01-01

Niche adaptation has long been recognized to drive intra-species differentiation and speciation, yet knowledge about its relatedness with hereditary variation of microbial genomes is relatively limited. Using Leptospirillum ferriphilum species as a case study, we present a detailed analysis of genomic features of five recognized strains. Genome-to-genome distance calculation preliminarily determined the roles of spatial distance and environmental heterogeneity that potentially contribute to intra-species variation within L. ferriphilum species at the genome level. Mathematical models were further constructed to extrapolate the expansion of L. ferriphilum genomes (an ‘open’ pan-genome), indicating the emergence of novel genes with new sequenced genomes. The identification of diverse mobile genetic elements (MGEs) (such as transposases, integrases, and phage-associated genes) revealed the prevalence of horizontal gene transfer events, which is an important evolutionary mechanism that provides avenues for the recruitment of novel functionalities and further for the genetic divergence of microbial genomes. Comprehensive analysis also demonstrated that the genome reduction by gene loss in a broad sense might contribute to the observed diversification. We thus inferred a plausible explanation to address this observation: the community-dependent adaptation that potentially economizes the limiting resources of the entire community. Now that the introduction of new genes is accompanied by a parallel abandonment of some other ones, our results provide snapshots on the biological fitness cost of environmental adaptation within the L. ferriphilum genomes. In short, our genome-wide analyses bridge the relation between genetic variation of L. ferriphilum with its evolutionary adaptation. PMID:29636744
Pantoea ananatis Genetic Diversity Analysis Reveals Limited Genomic Diversity as Well as Accessory Genes Correlated with Onion Pathogenicity

PubMed Central

Stice, Shaun P.; Stumpf, Spencer D.; Gitaitis, Ron D.; Kvitko, Brian H.; Dutta, Bhabesh

2018-01-01

Pantoea ananatis is a member of the family Enterobacteriaceae and an enigmatic plant pathogen with a broad host range. Although P. ananatis strains can be aggressive on onion causing foliar necrosis and onion center rot, previous genomic analysis has shown that P. ananatis lacks the primary virulence secretion systems associated with other plant pathogens. We assessed a collection of fifty P. ananatis strains collected from Georgia over three decades to determine genetic factors that correlated with onion pathogenic potential. Previous genetic analysis studies have compared strains isolated from different hosts with varying diseases potential and isolation sources. Strains varied greatly in their pathogenic potential and aggressiveness on different cultivated Allium species like onion, leek, shallot, and chive. Using multi-locus sequence analysis (MLSA) and repetitive extragenic palindrome repeat (rep)-PCR techniques, we did not observe any correlation between onion pathogenic potential and genetic diversity among strains. Whole genome sequencing and pan-genomic analysis of a sub-set of 10 strains aided in the identification of a novel series of genetic regions, likely plasmid borne, and correlating with onion pathogenicity observed on single contigs of the genetic assemblies. We named these loci Onion Virulence Regions (OVR) A-D. The OVR loci contain genes involved in redox regulation as well as pectate lyase and rhamnogalacturonase genes. Previous studies have not identified distinct genetic loci or plasmids correlating with onion foliar pathogenicity or pathogenicity on a single host pathosystem. The lack of focus on a single host system for this phytopathgenic disease necessitates the pan-genomic analysis performed in this study. PMID:29491851
Polysaccharide utilization loci and nutritional specialization in a dominant group of butyrate-producing human colonic Firmicutes

PubMed Central

O. Sheridan, Paul; Martin, Jennifer C.; Lawley, Trevor D.; Browne, Hilary P.; Harris, Hugh M. B.; Bernalier-Donadille, Annick; Duncan, Sylvia H.; O'Toole, Paul W.; J. Flint, Harry

2016-01-01

Firmicutes and Bacteroidetes are the predominant bacterial phyla colonizing the healthy human large intestine. Whilst both ferment dietary fibre, genes responsible for this important activity have been analysed only in the Bacteroidetes, with very little known about the Firmicutes. This work investigates the carbohydrate-active enzymes (CAZymes) in a group of Firmicutes, Roseburia spp. and Eubacterium rectale, which play an important role in producing butyrate from dietary carbohydrates and in health maintenance. Genome sequences of 11 strains representing E. rectale and four Roseburia spp. were analysed for carbohydrate-active genes. Following assembly into a pan-genome, core, variable and unique genes were identified. The 1840 CAZyme genes identified in the pan-genome were assigned to 538 orthologous groups, of which only 26 were present in all strains, indicating considerable inter-strain variability. This analysis was used to categorize the 11 strains into four carbohydrate utilization ecotypes (CUEs), which were shown to correspond to utilization of different carbohydrates for growth. Many glycoside hydrolase genes were found linked to genes encoding oligosaccharide transporters and regulatory elements in the genomes of Roseburia spp. and E. rectale, forming distinct polysaccharide utilization loci (PULs). Whilst PULs are also a common feature in Bacteroidetes, key differences were noted in these Firmicutes, including the absence of close homologues of Bacteroides polysaccharide utilization genes, hence we refer to Gram-positive PULs (gpPULs). Most CAZyme genes in the Roseburia/E. rectale group are organized into gpPULs. Variation in gpPULs can explain the high degree of nutritional specialization at the species level within this group. PMID:28348841
Comparative genomics of the Bifidobacterium breve taxon.

PubMed

Bottacini, Francesca; O'Connell Motherway, Mary; Kuczynski, Justin; O'Connell, Kerry Joan; Serafini, Fausta; Duranti, Sabrina; Milani, Christian; Turroni, Francesca; Lugli, Gabriele Andrea; Zomer, Aldert; Zhurina, Daria; Riedel, Christian; Ventura, Marco; van Sinderen, Douwe

2014-03-01

Bifidobacteria are commonly found as part of the microbiota of the gastrointestinal tract (GIT) of a broad range of hosts, where their presence is positively correlated with the host's health status. In this study, we assessed the genomes of thirteen representatives of Bifidobacterium breve, which is not only a frequently encountered component of the (adult and infant) human gut microbiota, but can also be isolated from human milk and vagina. In silico analysis of genome sequences from thirteen B. breve strains isolated from different environments (infant and adult faeces, human milk, human vagina) shows that the genetic variability of this species principally consists of hypothetical genes and mobile elements, but, interestingly, also genes correlated with the adaptation to host environment and gut colonization. These latter genes specify the biosynthetic machinery for sortase-dependent pili and exopolysaccharide production, as well as genes that provide protection against invasion of foreign DNA (i.e. CRISPR loci and restriction/modification systems), and genes that encode enzymes responsible for carbohydrate fermentation. Gene-trait matching analysis showed clear correlations between known metabolic capabilities and characterized genes, and it also allowed the identification of a gene cluster involved in the utilization of the alcohol-sugar sorbitol. Genome analysis of thirteen representatives of the B. breve species revealed that the deduced pan-genome exhibits an essentially close trend. For this reason our analyses suggest that this number of B. breve representatives is sufficient to fully describe the pan-genome of this species. Comparative genomics also facilitated the genetic explanation for differential carbon source utilization phenotypes previously observed in different strains of B. breve.
Genetic Competence Drives Genome Diversity in Bacillus subtilis

PubMed Central

Chevreux, Bastien; Serra, Cláudia R; Schyns, Ghislain; Henriques, Adriano O

2018-01-01

Abstract Prokaryote genomes are the result of a dynamic flux of genes, with increases achieved via horizontal gene transfer and reductions occurring through gene loss. The ecological and selective forces that drive this genomic flexibility vary across species. Bacillus subtilis is a naturally competent bacterium that occupies various environments, including plant-associated, soil, and marine niches, and the gut of both invertebrates and vertebrates. Here, we quantify the genomic diversity of B. subtilis and infer the genome dynamics that explain the high genetic and phenotypic diversity observed. Phylogenomic and comparative genomic analyses of 42 B. subtilis genomes uncover a remarkable genome diversity that translates into a core genome of 1,659 genes and an asymptotic pangenome growth rate of 57 new genes per new genome added. This diversity is due to a large proportion of low-frequency genes that are acquired from closely related species. We find no gene-loss bias among wild isolates, which explains why the cloud genome, 43% of the species pangenome, represents only a small proportion of each genome. We show that B. subtilis can acquire xenologous copies of core genes that propagate laterally among strains within a niche. While not excluding the contributions of other mechanisms, our results strongly suggest a process of gene acquisition that is largely driven by competence, where the long-term maintenance of acquired genes depends on local and global fitness effects. This competence-driven genomic diversity provides B. subtilis with its generalist character, enabling it to occupy a wide range of ecological niches and cycle through them. PMID:29272410
Pan-Genomic Approaches in Lactobacillus reuteri as a Porcine Probiotic: Investigation of Host Adaptation and Antipathogenic Activity.

PubMed

Lee, Jun-Yeong; Han, Geon Goo; Choi, Jaeyun; Jin, Gwi-Deuk; Kang, Sang-Kee; Chae, Byung Jo; Kim, Eun Bae; Choi, Yun-Jaie

2017-10-01

After the introduction of a ban on the use of antibiotic growth promoters (AGPs) for livestock, reuterin-producing Lactobacillus reuteri is getting attention as an alternative to AGPs. In this study, we investigated genetic features of L. reuteri associated with host specificity and antipathogenic effect. We isolated 104 L. reuteri strains from porcine feces, and 16 strains, composed of eight strains exhibiting the higher antipathogenic effect (group HS) and eight strains exhibiting the lower effect (group LS), were selected for genomic comparison. We generated draft genomes of the 16 isolates and investigated their pan-genome together with the 26 National Center for Biotechnology Information-registered genomes. L. reuteri genomes organized six clades with multi-locus sequence analysis, and the clade IV includes the 16 isolates. First, we identified six L. reuteri clade IV-specific genes including three hypothetical protein-coding genes. The three annotated genes encode transposases and cell surface proteins, indicating that these genes are the result of adaptation to the host gastrointestinal epithelia and that these host-specific traits were acquired by horizontal gene transfer. We also identified differences between groups HS and LS in the pdu-cbi-cob-hem gene cluster, which is essential for reuterin and cobalamin synthesis, and six genes specific to group HS are revealed. While the strains of group HS possessed all genes of this cluster, LS strains have lost many genes of the cluster. This study provides a deeper understanding of the relationship between probiotic properties and genomic features of L. reuteri.
Efficient Delaunay Tessellation through K-D Tree Decomposition

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morozov, Dmitriy; Peterka, Tom

Delaunay tessellations are fundamental data structures in computational geometry. They are important in data analysis, where they can represent the geometry of a point set or approximate its density. The algorithms for computing these tessellations at scale perform poorly when the input data is unbalanced. We investigate the use of k-d trees to evenly distribute points among processes and compare two strategies for picking split points between domain regions. Because resulting point distributions no longer satisfy the assumptions of existing parallel Delaunay algorithms, we develop a new parallel algorithm that adapts to its input and prove its correctness. We evaluatemore » the new algorithm using two late-stage cosmology datasets. The new running times are up to 50 times faster using k-d tree compared with regular grid decomposition. Moreover, in the unbalanced data sets, decomposing the domain into a k-d tree is up to five times faster than decomposing it into a regular grid.« less
Analytical simulation and PROFAT II: a new methodology and a computer automated tool for fault tree analysis in chemical process industries.

PubMed

Khan, F I; Abbasi, S A

2000-07-10

Fault tree analysis (FTA) is based on constructing a hypothetical tree of base events (initiating events) branching into numerous other sub-events, propagating the fault and eventually leading to the top event (accident). It has been a powerful technique used traditionally in identifying hazards in nuclear installations and power industries. As the systematic articulation of the fault tree is associated with assigning probabilities to each fault, the exercise is also sometimes called probabilistic risk assessment. But powerful as this technique is, it is also very cumbersome and costly, limiting its area of application. We have developed a new algorithm based on analytical simulation (named as AS-II), which makes the application of FTA simpler, quicker, and cheaper; thus opening up the possibility of its wider use in risk assessment in chemical process industries. Based on the methodology we have developed a computer-automated tool. The details are presented in this paper.
Joint amalgamation of most parsimonious reconciled gene trees

PubMed Central

Scornavacca, Celine; Jacox, Edwin; Szöllősi, Gergely J.

2015-01-01

Motivation: Traditionally, gene phylogenies have been reconstructed solely on the basis of molecular sequences; this, however, often does not provide enough information to distinguish between statistically equivalent relationships. To address this problem, several recent methods have incorporated information on the species phylogeny in gene tree reconstruction, leading to dramatic improvements in accuracy. Although probabilistic methods are able to estimate all model parameters but are computationally expensive, parsimony methods—generally computationally more efficient—require a prior estimate of parameters and of the statistical support. Results: Here, we present the Tree Estimation using Reconciliation (TERA) algorithm, a parsimony based, species tree aware method for gene tree reconstruction based on a scoring scheme combining duplication, transfer and loss costs with an estimate of the sequence likelihood. TERA explores all reconciled gene trees that can be amalgamated from a sample of gene trees. Using a large scale simulated dataset, we demonstrate that TERA achieves the same accuracy as the corresponding probabilistic method while being faster, and outperforms other parsimony-based methods in both accuracy and speed. Running TERA on a set of 1099 homologous gene families from complete cyanobacterial genomes, we find that incorporating knowledge of the species tree results in a two thirds reduction in the number of apparent transfer events. Availability and implementation: The algorithm is implemented in our program TERA, which is freely available from http://mbb.univ-montp2.fr/MBB/download_sources/16__TERA. Contact: celine.scornavacca@univ-montp2.fr, ssolo@angel.elte.hu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25380957
Forest Stand Segmentation Using Airborne LIDAR Data and Very High Resolution Multispectral Imagery

NASA Astrophysics Data System (ADS)

Dechesne, Clément; Mallet, Clément; Le Bris, Arnaud; Gouet, Valérie; Hervieu, Alexandre

2016-06-01

Forest stands are the basic units for forest inventory and mapping. Stands are large forested areas (e.g., ≥ 2 ha) of homogeneous tree species composition. The accurate delineation of forest stands is usually performed by visual analysis of human operators on very high resolution (VHR) optical images. This work is highly time consuming and should be automated for scalability purposes. In this paper, a method based on the fusion of airborne laser scanning data (or lidar) and very high resolution multispectral imagery for automatic forest stand delineation and forest land-cover database update is proposed. The multispectral images give access to the tree species whereas 3D lidar point clouds provide geometric information on the trees. Therefore, multi-modal features are computed, both at pixel and object levels. The objects are individual trees extracted from lidar data. A supervised classification is performed at the object level on the computed features in order to coarsely discriminate the existing tree species in the area of interest. The analysis at tree level is particularly relevant since it significantly improves the tree species classification. A probability map is generated through the tree species classification and inserted with the pixel-based features map in an energetical framework. The proposed energy is then minimized using a standard graph-cut method (namely QPBO with α-expansion) in order to produce a segmentation map with a controlled level of details. Comparison with an existing forest land cover database shows that our method provides satisfactory results both in terms of stand labelling and delineation (matching ranges between 94% and 99%).
Decision-Tree Formulation With Order-1 Lateral Execution

NASA Technical Reports Server (NTRS)

James, Mark

2007-01-01

A compact symbolic formulation enables mapping of an arbitrarily complex decision tree of a certain type into a highly computationally efficient multidimensional software object. The type of decision trees to which this formulation applies is that known in the art as the Boolean class of balanced decision trees. Parallel lateral slices of an object created by means of this formulation can be executed in constant time considerably less time than would otherwise be required. Decision trees of various forms are incorporated into almost all large software systems. A decision tree is a way of hierarchically solving a problem, proceeding through a set of true/false responses to a conclusion. By definition, a decision tree has a tree-like structure, wherein each internal node denotes a test on an attribute, each branch from an internal node represents an outcome of a test, and leaf nodes represent classes or class distributions that, in turn represent possible conclusions. The drawback of decision trees is that execution of them can be computationally expensive (and, hence, time-consuming) because each non-leaf node must be examined to determine whether to progress deeper into a tree structure or to examine an alternative. The present formulation was conceived as an efficient means of representing a decision tree and executing it in as little time as possible. The formulation involves the use of a set of symbolic algorithms to transform a decision tree into a multi-dimensional object, the rank of which equals the number of lateral non-leaf nodes. The tree can then be executed in constant time by means of an order-one table lookup. The sequence of operations performed by the algorithms is summarized as follows: 1. Determination of whether the tree under consideration can be encoded by means of this formulation. 2. Extraction of decision variables. 3. Symbolic optimization of the decision tree to minimize its form. 4. Expansion and transformation of all nested conjunctive-disjunctive paths to a flattened conjunctive form composed only of equality checks when possible. If each reduced conjunctive form contains only equality checks and all of these forms use the same variables, then the decision tree can be reduced to an order-one operation through a table lookup. The speedup to order one is accomplished by distributing each decision variable over a surface of a multidimensional object by mapping the equality constant to an index
C-semiring Frameworks for Minimum Spanning Tree Problems

NASA Astrophysics Data System (ADS)

Bistarelli, Stefano; Santini, Francesco

In this paper we define general algebraic frameworks for the Minimum Spanning Tree problem based on the structure of c-semirings. We propose general algorithms that can compute such trees by following different cost criteria, which must be all specific instantiation of c-semirings. Our algorithms are extensions of well-known procedures, as Prim or Kruskal, and show the expressivity of these algebraic structures. They can deal also with partially-ordered costs on the edges.
An Ideal Design for an Idea Processor.

DTIC Science & Technology

1986-09-01

make it a strong, healthy tree --all this, even as he was busily tending other plants in the garden. I will always remember his unique blend of experience...Structures Hierarchies. One of the most important structures of .Y. knowledge for communication is the hierarchy or tree (16:223). Friendly describes...typically used for documenting the tree -like structure of computer programs--to help structure expository writing (14:197). Figures 3 and 4 present examples
Avalaunch

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moody, A. T.

2014-12-26

Avalaunch implements a tree-based process launcher. It first bootstraps itself on to a set of compute nodes by launching children processes, which immediately connect back to the parent process to acquire info needed t launch their own children. Once the tree is established, user processes are started by broadcasting commands and application binaries through the tree. All communication flows over high-performance network protocols via spawnnet. The goal is to start MPI jobs having hundreds of thousands of processes within seconds.
Computational path planner for product assembly in complex environments

NASA Astrophysics Data System (ADS)

Shang, Wei; Liu, Jianhua; Ning, Ruxin; Liu, Mi

2013-03-01

Assembly path planning is a crucial problem in assembly related design and manufacturing processes. Sampling based motion planning algorithms are used for computational assembly path planning. However, the performance of such algorithms may degrade much in environments with complex product structure, narrow passages or other challenging scenarios. A computational path planner for automatic assembly path planning in complex 3D environments is presented. The global planning process is divided into three phases based on the environment and specific algorithms are proposed and utilized in each phase to solve the challenging issues. A novel ray test based stochastic collision detection method is proposed to evaluate the intersection between two polyhedral objects. This method avoids fake collisions in conventional methods and degrades the geometric constraint when a part has to be removed with surface contact with other parts. A refined history based rapidly-exploring random tree (RRT) algorithm which bias the growth of the tree based on its planning history is proposed and employed in the planning phase where the path is simple but the space is highly constrained. A novel adaptive RRT algorithm is developed for the path planning problem with challenging scenarios and uncertain environment. With extending values assigned on each tree node and extending schemes applied, the tree can adapts its growth to explore complex environments more efficiently. Experiments on the key algorithms are carried out and comparisons are made between the conventional path planning algorithms and the presented ones. The comparing results show that based on the proposed algorithms, the path planner can compute assembly path in challenging complex environments more efficiently and with higher success. This research provides the references to the study of computational assembly path planning under complex environments.
Tanglegrams for rooted phylogenetic trees and networks

PubMed Central

Scornavacca, Celine; Zickmann, Franziska; Huson, Daniel H.

2011-01-01

Motivation: In systematic biology, one is often faced with the task of comparing different phylogenetic trees, in particular in multi-gene analysis or cospeciation studies. One approach is to use a tanglegram in which two rooted phylogenetic trees are drawn opposite each other, using auxiliary lines to connect matching taxa. There is an increasing interest in using rooted phylogenetic networks to represent evolutionary history, so as to explicitly represent reticulate events, such as horizontal gene transfer, hybridization or reassortment. Thus, the question arises how to define and compute a tanglegram for such networks. Results: In this article, we present the first formal definition of a tanglegram for rooted phylogenetic networks and present a heuristic approach for computing one, called the NN-tanglegram method. We compare the performance of our method with existing tree tanglegram algorithms and also show a typical application to real biological datasets. For maximum usability, the algorithm does not require that the trees or networks are bifurcating or bicombining, or that they are on identical taxon sets. Availability: The algorithm is implemented in our program Dendroscope 3, which is freely available from www.dendroscope.org. Contact: scornava@informatik.uni-tuebingen.de; huson@informatik.uni-tuebingen.de PMID:21685078
Rooting phylogenetic trees under the coalescent model using site pattern probabilities.

PubMed

Tian, Yuan; Kubatko, Laura

2017-12-19

Phylogenetic tree inference is a fundamental tool to estimate ancestor-descendant relationships among different species. In phylogenetic studies, identification of the root - the most recent common ancestor of all sampled organisms - is essential for complete understanding of the evolutionary relationships. Rooted trees benefit most downstream application of phylogenies such as species classification or study of adaptation. Often, trees can be rooted by using outgroups, which are species that are known to be more distantly related to the sampled organisms than any other species in the phylogeny. However, outgroups are not always available in evolutionary research. In this study, we develop a new method for rooting species tree under the coalescent model, by developing a series of hypothesis tests for rooting quartet phylogenies using site pattern probabilities. The power of this method is examined by simulation studies and by application to an empirical North American rattlesnake data set. The method shows high accuracy across the simulation conditions considered, and performs well for the rattlesnake data. Thus, it provides a computationally efficient way to accurately root species-level phylogenies that incorporates the coalescent process. The method is robust to variation in substitution model, but is sensitive to the assumption of a molecular clock. Our study establishes a computationally practical method for rooting species trees that is more efficient than traditional methods. The method will benefit numerous evolutionary studies that require rooting a phylogenetic tree without having to specify outgroups.

Program listing for fault tree analysis of JPL technical report 32-1542

NASA Technical Reports Server (NTRS)

Chelson, P. O.

1971-01-01

The computer program listing for the MAIN program and those subroutines unique to the fault tree analysis are described. Some subroutines are used for analyzing the reliability block diagram. The program is written in FORTRAN 5 and is running on a UNIVAC 1108.
Finding Minimum-Power Broadcast Trees for Wireless Networks

NASA Technical Reports Server (NTRS)

Arabshahi, Payman; Gray, Andrew; Das, Arindam; El-Sharkawi, Mohamed; Marks, Robert, II

2004-01-01

Some algorithms have been devised for use in a method of constructing tree graphs that represent connections among the nodes of a wireless communication network. These algorithms provide for determining the viability of any given candidate connection tree and for generating an initial set of viable trees that can be used in any of a variety of search algorithms (e.g., a genetic algorithm) to find a tree that enables the network to broadcast from a source node to all other nodes while consuming the minimum amount of total power. The method yields solutions better than those of a prior algorithm known as the broadcast incremental power algorithm, albeit at a slightly greater computational cost.
Group theoretic approach to the perturbative string S-matrix

NASA Astrophysics Data System (ADS)

Neveu, A.; West, P.

1987-07-01

A new approach to the computation of string scattering is given. From duality, unitarity and a generic overlap property, we determine entirely the N-string amplitude, including the integration measure, and its gauge properties. The techniques do not use any oscillator algebra, but the computation is reduced to a straightforward exercise in conformal group theory. This can be applied to fermionic trees and multiloop diagrams, but in this paper it is demonstrated on the open bosonic tree. Permanent address: Mathematics Department, King's College, Strand, London WC2R 2LS, UK.
Computing and visualizing time-varying merge trees for high-dimensional data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Oesterling, Patrick; Heine, Christian; Weber, Gunther H.

2017-06-03

We introduce a new method that identifies and tracks features in arbitrary dimensions using the merge tree -- a structure for identifying topological features based on thresholding in scalar fields. This method analyzes the evolution of features of the function by tracking changes in the merge tree and relates features by matching subtrees between consecutive time steps. Using the time-varying merge tree, we present a structural visualization of the changing function that illustrates both features and their temporal evolution. We demonstrate the utility of our approach by applying it to temporal cluster analysis of high-dimensional point clouds.
Generalized Processing Tree Models: Jointly Modeling Discrete and Continuous Variables.

PubMed

Heck, Daniel W; Erdfelder, Edgar; Kieslich, Pascal J

2018-05-24

Multinomial processing tree models assume that discrete cognitive states determine observed response frequencies. Generalized processing tree (GPT) models extend this conceptual framework to continuous variables such as response times, process-tracing measures, or neurophysiological variables. GPT models assume finite-mixture distributions, with weights determined by a processing tree structure, and continuous components modeled by parameterized distributions such as Gaussians with separate or shared parameters across states. We discuss identifiability, parameter estimation, model testing, a modeling syntax, and the improved precision of GPT estimates. Finally, a GPT version of the feature comparison model of semantic categorization is applied to computer-mouse trajectories.
PhyloDet: a scalable visualization tool for mapping multiple traits to large evolutionary trees

PubMed Central

Lee, Bongshin; Nachmanson, Lev; Robertson, George; Carlson, Jonathan M.; Heckerman, David

2009-01-01

Summary: Evolutionary biologists are often interested in finding correlations among biological traits across a number of species, as such correlations may lead to testable hypotheses about the underlying function. Because some species are more closely related than others, computing and visualizing these correlations must be done in the context of the evolutionary tree that relates species. In this note, we introduce PhyloDet (short for PhyloDetective), an evolutionary tree visualization tool that enables biologists to visualize multiple traits mapped to the tree. Availability: http://research.microsoft.com/cue/phylodet/ Contact: bongshin@microsoft.com. PMID:19633096
PHoToNs–A parallel heterogeneous and threads oriented code for cosmological N-body simulation

NASA Astrophysics Data System (ADS)

Wang, Qiao; Cao, Zong-Yan; Gao, Liang; Chi, Xue-Bin; Meng, Chen; Wang, Jie; Wang, Long

2018-06-01

We introduce a new code for cosmological simulations, PHoToNs, which incorporates features for performing massive cosmological simulations on heterogeneous high performance computer (HPC) systems and threads oriented programming. PHoToNs adopts a hybrid scheme to compute gravitational force, with the conventional Particle-Mesh (PM) algorithm to compute the long-range force, the Tree algorithm to compute the short range force and the direct summation Particle-Particle (PP) algorithm to compute gravity from very close particles. A self-similar space filling a Peano-Hilbert curve is used to decompose the computing domain. Threads programming is advantageously used to more flexibly manage the domain communication, PM calculation and synchronization, as well as Dual Tree Traversal on the CPU+MIC platform. PHoToNs scales well and efficiency of the PP kernel achieves 68.6% of peak performance on MIC and 74.4% on CPU platforms. We also test the accuracy of the code against the much used Gadget-2 in the community and found excellent agreement.
Modeling ground-based timber harvesting systems using computer simulation

Treesearch

Jingxin Wang; Chris B. LeDoux

2001-01-01

Modeling ground-based timber harvesting systems with an object-oriented methodology was investigated. Object-oriented modeling and design promote a better understanding of requirements, cleaner designs, and better maintainability of the harvesting simulation system. The model developed simulates chainsaw felling, drive-to-tree feller-buncher, swing-to-tree single-grip...
Assessing the benefits and economic values of trees

Treesearch

David J. Nowak

2017-01-01

Understanding the environmental, economic, and social/community benefits of nature, in particular trees and forests, can lead to better vegetation management and designs to optimize environmental quality and human health for current and future generations. Computer models have been developed to assess forest composition and its associated effects on environmental...
Evaluation of a laser scanning sensor for variable-rate tree sprayer development

USDA-ARS?s Scientific Manuscript database

Accurate canopy measurement capabilities are prerequisites to automate variable-rate sprayers. A 270° radial range laser scanning sensor was tested for its scanning accuracy to detect tree canopy profiles. Signals from the laser sensor and a ground speed sensor were processed with an embedded comput...
Genome Surfing As Driver of Microbial Genomic Diversity.

PubMed

Choudoir, Mallory J; Panke-Buisse, Kevin; Andam, Cheryl P; Buckley, Daniel H

2017-08-01

Historical changes in population size, such as those caused by demographic range expansions, can produce nonadaptive changes in genomic diversity through mechanisms such as gene surfing. We propose that demographic range expansion of a microbial population capable of horizontal gene exchange can result in genome surfing, a mechanism that can cause widespread increase in the pan-genome frequency of genes acquired by horizontal gene exchange. We explain that patterns of genetic diversity within Streptomyces are consistent with genome surfing, and we describe several predictions for testing this hypothesis both in Streptomyces and in other microorganisms. Copyright © 2017 Elsevier Ltd. All rights reserved.
A restricted Steiner tree problem is solved by Geometric Method II

NASA Astrophysics Data System (ADS)

Lin, Dazhi; Zhang, Youlin; Lu, Xiaoxu

2013-03-01

The minimum Steiner tree problem has wide application background, such as transportation system, communication network, pipeline design and VISL, etc. It is unfortunately that the computational complexity of the problem is NP-hard. People are common to find some special problems to consider. In this paper, we first put forward a restricted Steiner tree problem, which the fixed vertices are in the same side of one line L and we find a vertex on L such the length of the tree is minimal. By the definition and the complexity of the Steiner tree problem, we know that the complexity of this problem is also Np-complete. In the part one, we have considered there are two fixed vertices to find the restricted Steiner tree problem. Naturally, we consider there are three fixed vertices to find the restricted Steiner tree problem. And we also use the geometric method to solve such the problem.
The Fault Tree Compiler (FTC): Program and mathematics

NASA Technical Reports Server (NTRS)

Butler, Ricky W.; Martensen, Anna L.

1989-01-01

The Fault Tree Compiler Program is a new reliability tool used to predict the top-event probability for a fault tree. Five different gate types are allowed in the fault tree: AND, OR, EXCLUSIVE OR, INVERT, AND m OF n gates. The high-level input language is easy to understand and use when describing the system tree. In addition, the use of the hierarchical fault tree capability can simplify the tree description and decrease program execution time. The current solution technique provides an answer precisely (within the limits of double precision floating point arithmetic) within a user specified number of digits accuracy. The user may vary one failure rate or failure probability over a range of values and plot the results for sensitivity analyses. The solution technique is implemented in FORTRAN; the remaining program code is implemented in Pascal. The program is written to run on a Digital Equipment Corporation (DEC) VAX computer with the VMS operation system.
Flexibility and symmetry of prokaryotic genome rearrangement reveal lineage-associated core-gene-defined genome organizational frameworks.

PubMed

Kang, Yu; Gu, Chaohao; Yuan, Lina; Wang, Yue; Zhu, Yanmin; Li, Xinna; Luo, Qibin; Xiao, Jingfa; Jiang, Daquan; Qian, Minping; Ahmed Khan, Aftab; Chen, Fei; Zhang, Zhang; Yu, Jun

2014-11-25

The prokaryotic pangenome partitions genes into core and dispensable genes. The order of core genes, albeit assumed to be stable under selection in general, is frequently interrupted by horizontal gene transfer and rearrangement, but how a core-gene-defined genome maintains its stability or flexibility remains to be investigated. Based on data from 30 species, including 425 genomes from six phyla, we grouped core genes into syntenic blocks in the context of a pangenome according to their stability across multiple isolates. A subset of the core genes, often species specific and lineage associated, formed a core-gene-defined genome organizational framework (cGOF). Such cGOFs are either single segmental (one-third of the species analyzed) or multisegmental (the rest). Multisegment cGOFs were further classified into symmetric or asymmetric according to segment orientations toward the origin-terminus axis. The cGOFs in Gram-positive species are exclusively symmetric and often reversible in orientation, as opposed to those of the Gram-negative bacteria, which are all asymmetric and irreversible. Meanwhile, all species showing strong strand-biased gene distribution contain symmetric cGOFs and often specific DnaE (α subunit of DNA polymerase III) isoforms. Furthermore, functional evaluations revealed that cGOF genes are hub associated with regard to cellular activities, and the stability of cGOF provides efficient indexes for scaffold orientation as demonstrated by assembling virtual and empirical genome drafts. cGOFs show species specificity, and the symmetry of multisegmental cGOFs is conserved among taxa and constrained by DNA polymerase-centric strand-biased gene distribution. The definition of species-specific cGOFs provides powerful guidance for genome assembly and other structure-based analysis. Prokaryotic genomes are frequently interrupted by horizontal gene transfer (HGT) and rearrangement. To know whether there is a set of genes not only conserved in position among isolates but also functionally essential for a given species and to further evaluate the stability or flexibility of such genome structures across lineages are of importance. Based on a large number of multi-isolate pangenomic data, our analysis reveals that a subset of core genes is organized into a core-gene-defined genome organizational framework, or cGOF. Furthermore, the lineage-associated cGOFs among Gram-positive and Gram-negative bacteria behave differently: the former, composed of 2 to 4 segments, have their fragments symmetrically rearranged around the origin-terminus axis, whereas the latter show more complex segmentation and are partitioned asymmetrically into chromosomal structures. The definition of cGOFs provides new insights into prokaryotic genome organization and efficient guidance for genome assembly and analysis. Copyright © 2014 Kang et al.
CartograTree: connecting tree genomes, phenotypes and environment.

PubMed

Vasquez-Gross, Hans A; Yu, John J; Figueroa, Ben; Gessler, Damian D G; Neale, David B; Wegrzyn, Jill L

2013-05-01

Today, researchers spend a tremendous amount of time gathering, formatting, filtering and visualizing data collected from disparate sources. Under the umbrella of forest tree biology, we seek to provide a platform and leverage modern technologies to connect biotic and abiotic data. Our goal is to provide an integrated web-based workspace that connects environmental, genomic and phenotypic data via geo-referenced coordinates. Here, we connect the genomic query web-based workspace, DiversiTree and a novel geographical interface called CartograTree to data housed on the TreeGenes database. To accomplish this goal, we implemented Simple Semantic Web Architecture and Protocol to enable the primary genomics database, TreeGenes, to communicate with semantic web services regardless of platform or back-end technologies. The novelty of CartograTree lies in the interactive workspace that allows for geographical visualization and engagement of high performance computing (HPC) resources. The application provides a unique tool set to facilitate research on the ecology, physiology and evolution of forest tree species. CartograTree can be accessed at: http://dendrome.ucdavis.edu/cartogratree. © 2013 Blackwell Publishing Ltd.
High-Throughput 3-D Monitoring of Agricultural-Tree Plantations with Unmanned Aerial Vehicle (UAV) Technology

PubMed Central

Torres-Sánchez, Jorge; López-Granados, Francisca; Serrano, Nicolás; Arquero, Octavio; Peña, José M.

2015-01-01

The geometric features of agricultural trees such as canopy area, tree height and crown volume provide useful information about plantation status and crop production. However, these variables are mostly estimated after a time-consuming and hard field work and applying equations that treat the trees as geometric solids, which produce inconsistent results. As an alternative, this work presents an innovative procedure for computing the 3-dimensional geometric features of individual trees and tree-rows by applying two consecutive phases: 1) generation of Digital Surface Models with Unmanned Aerial Vehicle (UAV) technology and 2) use of object-based image analysis techniques. Our UAV-based procedure produced successful results both in single-tree and in tree-row plantations, reporting up to 97% accuracy on area quantification and minimal deviations compared to in-field estimations of tree heights and crown volumes. The maps generated could be used to understand the linkages between tree grown and field-related factors or to optimize crop management operations in the context of precision agriculture with relevant agro-environmental implications. PMID:26107174
High-Throughput 3-D Monitoring of Agricultural-Tree Plantations with Unmanned Aerial Vehicle (UAV) Technology.

PubMed

Torres-Sánchez, Jorge; López-Granados, Francisca; Serrano, Nicolás; Arquero, Octavio; Peña, José M

2015-01-01

The geometric features of agricultural trees such as canopy area, tree height and crown volume provide useful information about plantation status and crop production. However, these variables are mostly estimated after a time-consuming and hard field work and applying equations that treat the trees as geometric solids, which produce inconsistent results. As an alternative, this work presents an innovative procedure for computing the 3-dimensional geometric features of individual trees and tree-rows by applying two consecutive phases: 1) generation of Digital Surface Models with Unmanned Aerial Vehicle (UAV) technology and 2) use of object-based image analysis techniques. Our UAV-based procedure produced successful results both in single-tree and in tree-row plantations, reporting up to 97% accuracy on area quantification and minimal deviations compared to in-field estimations of tree heights and crown volumes. The maps generated could be used to understand the linkages between tree grown and field-related factors or to optimize crop management operations in the context of precision agriculture with relevant agro-environmental implications.
Investigation and Implementation of a Tree Transformation System for User Friendly Programming.

DTIC Science & Technology

1984-12-01

systems have become an important area of research because of theiL direct impact on all areas of computer science such as software engineering ...RD-i52 716 INVESTIGTIN AND IMPLEMENTATION OF A TREE I/2TRANSFORMATION SYSTEM FOR USER FRIENDLY PROGRAMMING (U) NAVAL POSTGRADUATE SCHOOL MONTEREY CA...Implementation of a Master’s Thesis Tree Transformation System for User December 1984 Friendly Programming 6. PERFORMING ORG. REPORT NUMBER 7. AU~THOR(s) S
The Design of a Polymorphous Cognitive Agent Architecture (PCAA)

DTIC Science & Technology

2008-05-01

tree, and search agents may search the tree for documents or clusters, depositing pheromones on the way down the tree. 46 19 The quality of SODAS...location in the lattice is a node, connected to its neighbors by links, and agents roam across the lattice, depositing pheromones . 49 21 A possible FPGA...provided by swarming, and also figure out a way for learning in ACT-R to trickle down to swarming computations, e.g., through the pheromones . Integration
Simulation of Tsunami Resistance of a Pinus Thunbergii tree in Coastal Forest in Japan

NASA Astrophysics Data System (ADS)

Nanko, K.; Suzuki, S.; Noguchi, H.; Hagino, H.

2015-12-01

Forests reduce fluid force of tsunami, whereas extreme tsunami sometimes breaks down the forest trees. It is difficult to estimate the interactive relationship between the fluid and the trees because fluid deform tree architecture and deformed tree changes flow field. Dynamic tree deformation and fluid behavior should be clarified by fluid-structure interaction analysis. For the initial step, we have developed dynamic simulation of tree sway and breakage caused by tsunami based on a vibrating system with multiple degrees of freedom. The target specie of the simulation was Japanese black pine (pinus thunbergii), which is major specie in the coastal forest to secure livelihood area from the damage by blown sand and salt in Japanese coastal area. For the simulation, a tree was segmented into 0.2 m long circular truncated cones. Turning moment induced by tsunami and self-weight was calculated at each segment bottom. Tree deformation was computed on multi-degree-of-freedom vibration equation. Tree sway was simulated by iterative calculation of the tree deformation with time step 0.05 second with temporally varied flow velocity of tsunami. From the calculation of bending stress and turning moment at tree base, we estimated resistance of a Pinus thunbergii tree from tsunami against tree breakage.

Tree decomposition based fast search of RNA structures including pseudoknots in genomes.

PubMed

Song, Yinglei; Liu, Chunmei; Malmberg, Russell; Pan, Fangfang; Cai, Liming

2005-01-01

Searching genomes for RNA secondary structure with computational methods has become an important approach to the annotation of non-coding RNAs. However, due to the lack of efficient algorithms for accurate RNA structure-sequence alignment, computer programs capable of fast and effectively searching genomes for RNA secondary structures have not been available. In this paper, a novel RNA structure profiling model is introduced based on the notion of a conformational graph to specify the consensus structure of an RNA family. Tree decomposition yields a small tree width t for such conformation graphs (e.g., t = 2 for stem loops and only a slight increase for pseudo-knots). Within this modelling framework, the optimal alignment of a sequence to the structure model corresponds to finding a maximum valued isomorphic subgraph and consequently can be accomplished through dynamic programming on the tree decomposition of the conformational graph in time O(k(t)N(2)), where k is a small parameter; and N is the size of the projiled RNA structure. Experiments show that the application of the alignment algorithm to search in genomes yields the same search accuracy as methods based on a Covariance model with a significant reduction in computation time. In particular; very accurate searches of tmRNAs in bacteria genomes and of telomerase RNAs in yeast genomes can be accomplished in days, as opposed to months required by other methods. The tree decomposition based searching tool is free upon request and can be downloaded at our site h t t p ://w.uga.edu/RNA-informatics/software/index.php.
Phylodynamic Inference with Kernel ABC and Its Application to HIV Epidemiology.

PubMed

Poon, Art F Y

2015-09-01

The shapes of phylogenetic trees relating virus populations are determined by the adaptation of viruses within each host, and by the transmission of viruses among hosts. Phylodynamic inference attempts to reverse this flow of information, estimating parameters of these processes from the shape of a virus phylogeny reconstructed from a sample of genetic sequences from the epidemic. A key challenge to phylodynamic inference is quantifying the similarity between two trees in an efficient and comprehensive way. In this study, I demonstrate that a new distance measure, based on a subset tree kernel function from computational linguistics, confers a significant improvement over previous measures of tree shape for classifying trees generated under different epidemiological scenarios. Next, I incorporate this kernel-based distance measure into an approximate Bayesian computation (ABC) framework for phylodynamic inference. ABC bypasses the need for an analytical solution of model likelihood, as it only requires the ability to simulate data from the model. I validate this "kernel-ABC" method for phylodynamic inference by estimating parameters from data simulated under a simple epidemiological model. Results indicate that kernel-ABC attained greater accuracy for parameters associated with virus transmission than leading software on the same data sets. Finally, I apply the kernel-ABC framework to study a recent outbreak of a recombinant HIV subtype in China. Kernel-ABC provides a versatile framework for phylodynamic inference because it can fit a broader range of models than methods that rely on the computation of exact likelihoods. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Performance analysis of a dual-tree algorithm for computing spatial distance histograms

PubMed Central

Chen, Shaoping; Tu, Yi-Cheng; Xia, Yuni

2011-01-01

Many scientific and engineering fields produce large volume of spatiotemporal data. The storage, retrieval, and analysis of such data impose great challenges to database systems design. Analysis of scientific spatiotemporal data often involves computing functions of all point-to-point interactions. One such analytics, the Spatial Distance Histogram (SDH), is of vital importance to scientific discovery. Recently, algorithms for efficient SDH processing in large-scale scientific databases have been proposed. These algorithms adopt a recursive tree-traversing strategy to process point-to-point distances in the visited tree nodes in batches, thus require less time when compared to the brute-force approach where all pairwise distances have to be computed. Despite the promising experimental results, the complexity of such algorithms has not been thoroughly studied. In this paper, we present an analysis of such algorithms based on a geometric modeling approach. The main technique is to transform the analysis of point counts into a problem of quantifying the area of regions where pairwise distances can be processed in batches by the algorithm. From the analysis, we conclude that the number of pairwise distances that are left to be processed decreases exponentially with more levels of the tree visited. This leads to the proof of a time complexity lower than the quadratic time needed for a brute-force algorithm and builds the foundation for a constant-time approximate algorithm. Our model is also general in that it works for a wide range of point spatial distributions, histogram types, and space-partitioning options in building the tree. PMID:21804753
Semiautomated skeletonization of the pulmonary arterial tree in micro-CT images

NASA Astrophysics Data System (ADS)

Hanger, Christopher C.; Haworth, Steven T.; Molthen, Robert C.; Dawson, Christopher A.

2001-05-01

We present a simple and robust approach that utilizes planar images at different angular rotations combined with unfiltered back-projection to locate the central axes of the pulmonary arterial tree. Three-dimensional points are selected interactively by the user. The computer calculates a sub- volume unfiltered back-projection orthogonal to the vector connecting the two points and centered on the first point. Because more x-rays are absorbed at the thickest portion of the vessel, in the unfiltered back-projection, the darkest pixel is assumed to be the center of the vessel. The computer replaces this point with the newly computer-calculated point. A second back-projection is calculated around the original point orthogonal to a vector connecting the newly-calculated first point and user-determined second point. The darkest pixel within the reconstruction is determined. The computer then replaces the second point with the XYZ coordinates of the darkest pixel within this second reconstruction. Following a vector based on a moving average of previously determined 3- dimensional points along the vessel's axis, the computer continues this skeletonization process until stopped by the user. The computer estimates the vessel diameter along the set of previously determined points using a method similar to the full width-half max algorithm. On all subsequent vessels, the process works the same way except that at each point, distances between the current point and all previously determined points along different vessels are determined. If the difference is less than the previously estimated diameter, the vessels are assumed to branch. This user/computer interaction continues until the vascular tree has been skeletonized.
The Garden Banks 388 horizontal tree design and development

DOE Office of Scientific and Technical Information (OSTI.GOV)

Granhaug, O.; Soul, J.

1995-12-31

This paper describes the Horizontal Subsea Production Tree System, later referred to as a SpoolTree{trademark}, developed for the Enserch Garden Banks 388 field in the Gulf of Mexico. The paper starts with a project overview followed by a comparison between the SpoolTree and the Conventional Tree design. A brief discussion explains why Enserch elected to use the SpoolTree for this field development, including available technology, workover frequency, cost etc. The rigorous safety analysis carried out for the subsea production equipment is then explained in depth. The paper continues with a technical discussion of the main features specific to the SpoolTreemore » design and the Garden Banks 388 field development. Issues discussed include the SpoolTree itself, BOP Adapter Plate (for control during installation, workover and production), Tubing Hanger and pressure barrier design, debris cap design, downhole communication (SCSSV, chemical injection, pressure and temperature) ROV intervention, template wellbay insert design and other relevant issues. The use of computer based 3-D modelling tool is also briefly described. The experience and results described in this paper have direct application to numerous subsea development prospects worldwide, particularly in deep water. In addition, the ``system development`` aspect of the project is relevant to most marine equipment development projects. This includes the use of safety analysis techniques, 3-D computer modelling tools and clearly defined engineering procedures. A full account of the final design configuration of the SpoolTree system is given in the paper. A summary of the experience gained during the extensive testing at the factory and during the template integration tests is also provided.« less
Computer-aided design of microvasculature systems for use in vascular scaffold production.

PubMed

Mondy, William Lafayette; Cameron, Don; Timmermans, Jean-Pierre; De Clerck, Nora; Sasov, Alexander; Casteleyn, Christophe; Piegl, Les A

2009-09-01

In vitro biomedical engineering of intact, functional vascular networks, which include capillary structures, is a prerequisite for adequate vascular scaffold production. Capillary structures are necessary since they provide the elements and compounds for the growth, function and maintenance of 3D tissue structures. Computer-aided modeling of stereolithographic (STL) micro-computer tomographic (micro-CT) 3D models is a technique that enables us to mimic the design of vascular tree systems containing capillary beds, found in tissues. In our first paper (Mondy et al 2009 Tissue Eng. at press), using micro-CT, we studied the possibility of using vascular tissues to produce data capable of aiding the design of vascular tree scaffolding, which would help in the reverse engineering of a complete vascular tree system including capillary bed structures. In this paper, we used STL models of large datasets of computer-aided design (CAD) data of vascular structures which contained capillary structures that mimic those in the dermal layers of rabbit skin. Using CAD software we created from 3D STL models a bio-CAD design for the development of capillary-containing vascular tree scaffolding for skin. This method is designed to enhance a variety of therapeutic protocols including, but not limited to, organ and tissue repair, systemic disease mediation and cell/tissue transplantation therapy. Our successful approach to in vitro vasculogenesis will allow the bioengineering of various other types of 3D tissue structures, and as such greatly expands the potential applications of biomedical engineering technology into the fields of biomedical research and medicine.
Global Value Trees

PubMed Central

Zhu, Zhen; Puliga, Michelangelo; Cerina, Federica; Chessa, Alessandro; Riccaboni, Massimo

2015-01-01

The fragmentation of production across countries has become an important feature of the globalization in recent decades and is often conceptualized by the term “global value chains” (GVCs). When empirically investigating the GVCs, previous studies are mainly interested in knowing how global the GVCs are rather than how the GVCs look like. From a complex networks perspective, we use the World Input-Output Database (WIOD) to study the evolution of the global production system. We find that the industry-level GVCs are indeed not chain-like but are better characterized by the tree topology. Hence, we compute the global value trees (GVTs) for all the industries available in the WIOD. Moreover, we compute an industry importance measure based on the GVTs and compare it with other network centrality measures. Finally, we discuss some future applications of the GVTs. PMID:25978067
Coherent multiscale image processing using dual-tree quaternion wavelets.

PubMed

Chan, Wai Lam; Choi, Hyeokho; Baraniuk, Richard G

2008-07-01

The dual-tree quaternion wavelet transform (QWT) is a new multiscale analysis tool for geometric image features. The QWT is a near shift-invariant tight frame representation whose coefficients sport a magnitude and three phases: two phases encode local image shifts while the third contains image texture information. The QWT is based on an alternative theory for the 2-D Hilbert transform and can be computed using a dual-tree filter bank with linear computational complexity. To demonstrate the properties of the QWT's coherent magnitude/phase representation, we develop an efficient and accurate procedure for estimating the local geometrical structure of an image. We also develop a new multiscale algorithm for estimating the disparity between a pair of images that is promising for image registration and flow estimation applications. The algorithm features multiscale phase unwrapping, linear complexity, and sub-pixel estimation accuracy.
Using minimal spanning trees to compare the reliability of network topologies

NASA Technical Reports Server (NTRS)

Leister, Karen J.; White, Allan L.; Hayhurst, Kelly J.

1990-01-01

Graph theoretic methods are applied to compute the reliability for several types of networks of moderate size. The graph theory methods used are minimal spanning trees for networks with bi-directional links and the related concept of strongly connected directed graphs for networks with uni-directional links. A comparison is conducted of ring networks and braided networks. The case is covered where just the links fail and the case where both links and nodes fail. Two different failure modes for the links are considered. For one failure mode, the link no longer carries messages. For the other failure mode, the link delivers incorrect messages. There is a description and comparison of link-redundancy versus path-redundancy as methods to achieve reliability. All the computations are carried out by means of a fault tree program.
MASTtreedist: visualization of tree space based on maximum agreement subtree.

PubMed

Huang, Hong; Li, Yongji

2013-01-01

Phylogenetic tree construction process might produce many candidate trees as the "best estimates." As the number of constructed phylogenetic trees grows, the need to efficiently compare their topological or physical structures arises. One of the tree comparison's software tools, the Mesquite's Tree Set Viz module, allows the rapid and efficient visualization of the tree comparison distances using multidimensional scaling (MDS). Tree-distance measures, such as Robinson-Foulds (RF), for the topological distance among different trees have been implemented in Tree Set Viz. New and sophisticated measures such as Maximum Agreement Subtree (MAST) can be continuously built upon Tree Set Viz. MAST can detect the common substructures among trees and provide more precise information on the similarity of the trees, but it is NP-hard and difficult to implement. In this article, we present a practical tree-distance metric: MASTtreedist, a MAST-based comparison metric in Mesquite's Tree Set Viz module. In this metric, the efficient optimizations for the maximum weight clique problem are applied. The results suggest that the proposed method can efficiently compute the MAST distances among trees, and such tree topological differences can be translated as a scatter of points in two-dimensional (2D) space. We also provide statistical evaluation of provided measures with respect to RF-using experimental data sets. This new comparison module provides a new tree-tree pairwise comparison metric based on the differences of the number of MAST leaves among constructed phylogenetic trees. Such a new phylogenetic tree comparison metric improves the visualization of taxa differences by discriminating small divergences of subtree structures for phylogenetic tree reconstruction.
Computation of NLO processes involving heavy quarks using Loop-Tree Duality

NASA Astrophysics Data System (ADS)

Driencourt-Mangin, Félix

2017-03-01

We present a new method to compute higher-order corrections to physical cross-sections, at Next-to-Leading Order and beyond. This method, based on the Loop Tree Duality, leads to locally integrable expressions in four dimensions. By introducing a physically motivated momentum mapping between the momenta involved in the real and the virtual contributions, infrared singularities naturally cancel at integrand level, without the need to introduce subtraction counter-terms. Ultraviolet singularities are dealt with by using dual representations of suitable counter-terms, with some subtleties regarding the self-energy contributions. As an example, we apply this method to compute the 1 → 2 decay rate in the context of a scalar toy model with massive particles.
Node degree distribution in spanning trees

NASA Astrophysics Data System (ADS)

Pozrikidis, C.

2016-03-01

A method is presented for computing the number of spanning trees involving one link or a specified group of links, and excluding another link or a specified group of links, in a network described by a simple graph in terms of derivatives of the spanning-tree generating function defined with respect to the eigenvalues of the Kirchhoff (weighted Laplacian) matrix. The method is applied to deduce the node degree distribution in a complete or randomized set of spanning trees of an arbitrary network. An important feature of the proposed method is that the explicit construction of spanning trees is not required. It is shown that the node degree distribution in the spanning trees of the complete network is described by the binomial distribution. Numerical results are presented for the node degree distribution in square, triangular, and honeycomb lattices.
Estimating phylogenetic trees from genome-scale data.

PubMed

Liu, Liang; Xi, Zhenxiang; Wu, Shaoyuan; Davis, Charles C; Edwards, Scott V

2015-12-01

The heterogeneity of signals in the genomes of diverse organisms poses challenges for traditional phylogenetic analysis. Phylogenetic methods known as "species tree" methods have been proposed to directly address one important source of gene tree heterogeneity, namely the incomplete lineage sorting that occurs when evolving lineages radiate rapidly, resulting in a diversity of gene trees from a single underlying species tree. Here we review theory and empirical examples that help clarify conflicts between species tree and concatenation methods, and misconceptions in the literature about the performance of species tree methods. Considering concatenation as a special case of the multispecies coalescent model helps explain differences in the behavior of the two methods on phylogenomic data sets. Recent work suggests that species tree methods are more robust than concatenation approaches to some of the classic challenges of phylogenetic analysis, including rapidly evolving sites in DNA sequences and long-branch attraction. We show that approaches, such as binning, designed to augment the signal in species tree analyses can distort the distribution of gene trees and are inconsistent. Computationally efficient species tree methods incorporating biological realism are a key to phylogenetic analysis of whole-genome data. © 2015 New York Academy of Sciences.
Coalescent methods for estimating phylogenetic trees.

PubMed

Liu, Liang; Yu, Lili; Kubatko, Laura; Pearl, Dennis K; Edwards, Scott V

2009-10-01

We review recent models to estimate phylogenetic trees under the multispecies coalescent. Although the distinction between gene trees and species trees has come to the fore of phylogenetics, only recently have methods been developed that explicitly estimate species trees. Of the several factors that can cause gene tree heterogeneity and discordance with the species tree, deep coalescence due to random genetic drift in branches of the species tree has been modeled most thoroughly. Bayesian approaches to estimating species trees utilizes two likelihood functions, one of which has been widely used in traditional phylogenetics and involves the model of nucleotide substitution, and the second of which is less familiar to phylogeneticists and involves the probability distribution of gene trees given a species tree. Other recent parametric and nonparametric methods for estimating species trees involve parsimony criteria, summary statistics, supertree and consensus methods. Species tree approaches are an appropriate goal for systematics, appear to work well in some cases where concatenation can be misleading, and suggest that sampling many independent loci will be paramount. Such methods can also be challenging to implement because of the complexity of the models and computational time. In addition, further elaboration of the simplest of coalescent models will be required to incorporate commonly known issues such as deviation from the molecular clock, gene flow and other genetic forces.
Development of a laser-guided embedded-computer-controlled air-assisted precision sprayer

USDA-ARS?s Scientific Manuscript database

An embedded computer-controlled, laser-guided, air-assisted, variable-rate precision sprayer was developed to automatically adjust spray outputs on both sides of the sprayer to match presence, size, shape, and foliage density of tree crops. The sprayer was the integration of an embedded computer, a ...
If BZ medium did spanning trees these would be the same trees as Physarum built

NASA Astrophysics Data System (ADS)

Adamatzky, Andrew

2009-03-01

A sub-excitable Belousov-Zhabotinsky (BZ) medium exhibits self-localized wave-fragments which may travel for relatively long time preserving their shape. Using Oregonator model of the BZ medium we imitate foraging behavior of a true slime mold, Physarum polycephalum, on a nutrient-poor substrate. We show that given erosion post-processing operations the BZ medium can approximate a spanning tree of a planar set and thus is computationally equivalent to Physarum in the domain of proximity graph construction.
SWPhylo - A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees.

PubMed

Yu, Xiaoyu; Reva, Oleg N

2018-01-01

Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, inappropriateness of standard phylogenetic tools to process genome-wide data, and lack of reliable substitution models which correlates with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment not mentioning the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent to the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, eg, GyrA.
SWPhylo – A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees

PubMed Central

Yu, Xiaoyu; Reva, Oleg N

2018-01-01

Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, inappropriateness of standard phylogenetic tools to process genome-wide data, and lack of reliable substitution models which correlates with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment not mentioning the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent to the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, eg, GyrA. PMID:29511354
An efficient Adaptive Mesh Refinement (AMR) algorithm for the Discontinuous Galerkin method: Applications for the computation of compressible two-phase flows

NASA Astrophysics Data System (ADS)

Papoutsakis, Andreas; Sazhin, Sergei S.; Begg, Steven; Danaila, Ionut; Luddens, Francky

2018-06-01

We present an Adaptive Mesh Refinement (AMR) method suitable for hybrid unstructured meshes that allows for local refinement and de-refinement of the computational grid during the evolution of the flow. The adaptive implementation of the Discontinuous Galerkin (DG) method introduced in this work (ForestDG) is based on a topological representation of the computational mesh by a hierarchical structure consisting of oct- quad- and binary trees. Adaptive mesh refinement (h-refinement) enables us to increase the spatial resolution of the computational mesh in the vicinity of the points of interest such as interfaces, geometrical features, or flow discontinuities. The local increase in the expansion order (p-refinement) at areas of high strain rates or vorticity magnitude results in an increase of the order of accuracy in the region of shear layers and vortices. A graph of unitarian-trees, representing hexahedral, prismatic and tetrahedral elements is used for the representation of the initial domain. The ancestral elements of the mesh can be split into self-similar elements allowing each tree to grow branches to an arbitrary level of refinement. The connectivity of the elements, their genealogy and their partitioning are described by linked lists of pointers. An explicit calculation of these relations, presented in this paper, facilitates the on-the-fly splitting, merging and repartitioning of the computational mesh by rearranging the links of each node of the tree with a minimal computational overhead. The modal basis used in the DG implementation facilitates the mapping of the fluxes across the non conformal faces. The AMR methodology is presented and assessed using a series of inviscid and viscous test cases. Also, the AMR methodology is used for the modelling of the interaction between droplets and the carrier phase in a two-phase flow. This approach is applied to the analysis of a spray injected into a chamber of quiescent air, using the Eulerian-Lagrangian approach. This enables us to refine the computational mesh in the vicinity of the droplet parcels and accurately resolve the coupling between the two phases.
Computer aided diagnosis system for the Alzheimer's disease based on partial least squares and random forest SPECT image classification.

PubMed

Ramírez, J; Górriz, J M; Segovia, F; Chaves, R; Salas-Gonzalez, D; López, M; Alvarez, I; Padilla, P

2010-03-19

This letter shows a computer aided diagnosis (CAD) technique for the early detection of the Alzheimer's disease (AD) by means of single photon emission computed tomography (SPECT) image classification. The proposed method is based on partial least squares (PLS) regression model and a random forest (RF) predictor. The challenge of the curse of dimensionality is addressed by reducing the large dimensionality of the input data by downscaling the SPECT images and extracting score features using PLS. A RF predictor then forms an ensemble of classification and regression tree (CART)-like classifiers being its output determined by a majority vote of the trees in the forest. A baseline principal component analysis (PCA) system is also developed for reference. The experimental results show that the combined PLS-RF system yields a generalization error that converges to a limit when increasing the number of trees in the forest. Thus, the generalization error is reduced when using PLS and depends on the strength of the individual trees in the forest and the correlation between them. Moreover, PLS feature extraction is found to be more effective for extracting discriminative information from the data than PCA yielding peak sensitivity, specificity and accuracy values of 100%, 92.7%, and 96.9%, respectively. Moreover, the proposed CAD system outperformed several other recently developed AD CAD systems. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.

The efficacy of using inventory data to develop optimal diameter increment models

Treesearch

Don C. Bragg

2002-01-01

Most optimal tree diameter growth models have arisen through either the conceptualization of physiological processes or the adaptation of empirical increment models. However, surprisingly little effort has been invested in the melding of these approaches even though it is possible to develop theoretically sound, computationally efficient optimal tree growth models...
Design, development and evaluation of a tree planting-site-specific fumigant applicator

USDA-ARS?s Scientific Manuscript database

The goal of this research was to use recent advances in the global positioning system (GPS) and computer technology to apply just the right amount of fumigant where it is most needed (i.e., tree-planting-site-specific application) to decrease the incidence of replant disease, and achieve the environ...
A computer simulation of full-tree field chipping and trucking.

Treesearch

Dennis P. Bradley; Frank E. Biltonen; Sharon A. Winsauer

1976-01-01

Describes a computerized model of a full-tree field chipping system from stump to mill using the GPSS simulation language. The program instructions reproduce the interactions, production, and costs for the various operations under given stand and operating conditions so a user can find the best way to operate his system.
"Tree Investigators": Supporting Families' Scientific Talk in an Arboretum with Mobile Computers

ERIC Educational Resources Information Center

Zimmerman, Heather Toomey; Land, Susan M.; McClain, Lucy R.; Mohney, Michael R.; Choi, Gi Woong; Salman, Fariha H.

2015-01-01

This research examines the "Tree Investigators" project to support science learning with mobile devices during family public programmes in an arboretum. Using a case study methodology, researchers analysed video records of 10 families (25 people) using mobile technologies with naturalists at an arboretum to understand how mobile devices…
A diagnosis system using object-oriented fault tree models

NASA Technical Reports Server (NTRS)

Iverson, David L.; Patterson-Hine, F. A.

1990-01-01

Spaceborne computing systems must provide reliable, continuous operation for extended periods. Due to weight, power, and volume constraints, these systems must manage resources very effectively. A fault diagnosis algorithm is described which enables fast and flexible diagnoses in the dynamic distributed computing environments planned for future space missions. The algorithm uses a knowledge base that is easily changed and updated to reflect current system status. Augmented fault trees represented in an object-oriented form provide deep system knowledge that is easy to access and revise as a system changes. Given such a fault tree, a set of failure events that have occurred, and a set of failure events that have not occurred, this diagnosis system uses forward and backward chaining to propagate causal and temporal information about other failure events in the system being diagnosed. Once the system has established temporal and causal constraints, it reasons backward from heuristically selected failure events to find a set of basic failure events which are a likely cause of the occurrence of the top failure event in the fault tree. The diagnosis system has been implemented in common LISP using Flavors.
A practical approximation algorithm for solving massive instances of hybridization number for binary and nonbinary trees.

PubMed

van Iersel, Leo; Kelk, Steven; Lekić, Nela; Scornavacca, Celine

2014-05-05

Reticulate events play an important role in determining evolutionary relationships. The problem of computing the minimum number of such events to explain discordance between two phylogenetic trees is a hard computational problem. Even for binary trees, exact solvers struggle to solve instances with reticulation number larger than 40-50. Here we present CycleKiller and NonbinaryCycleKiller, the first methods to produce solutions verifiably close to optimality for instances with hundreds or even thousands of reticulations. Using simulations, we demonstrate that these algorithms run quickly for large and difficult instances, producing solutions that are very close to optimality. As a spin-off from our simulations we also present TerminusEst, which is the fastest exact method currently available that can handle nonbinary trees: this is used to measure the accuracy of the NonbinaryCycleKiller algorithm. All three methods are based on extensions of previous theoretical work (SIDMA 26(4):1635-1656, TCBB 10(1):18-25, SIDMA 28(1):49-66) and are publicly available. We also apply our methods to real data.
T-RMSD: a web server for automated fine-grained protein structural classification.

PubMed

Magis, Cedrik; Di Tommaso, Paolo; Notredame, Cedric

2013-07-01

This article introduces the T-RMSD web server (tree-based on root-mean-square deviation), a service allowing the online computation of structure-based protein classification. It has been developed to address the relation between structural and functional similarity in proteins, and it allows a fine-grained structural clustering of a given protein family or group of structurally related proteins using distance RMSD (dRMSD) variations. These distances are computed between all pairs of equivalent residues, as defined by the ungapped columns within a given multiple sequence alignment. Using these generated distance matrices (one per equivalent position), T-RMSD produces a structural tree with support values for each cluster node, reminiscent of bootstrap values. These values, associated with the tree topology, allow a quantitative estimate of structural distances between proteins or group of proteins defined by the tree topology. The clusters thus defined have been shown to be structurally and functionally informative. The T-RMSD web server is a free website open to all users and available at http://tcoffee.crg.cat/apps/tcoffee/do:trmsd.
T-RMSD: a web server for automated fine-grained protein structural classification

PubMed Central

Magis, Cedrik; Di Tommaso, Paolo; Notredame, Cedric

2013-01-01

This article introduces the T-RMSD web server (tree-based on root-mean-square deviation), a service allowing the online computation of structure-based protein classification. It has been developed to address the relation between structural and functional similarity in proteins, and it allows a fine-grained structural clustering of a given protein family or group of structurally related proteins using distance RMSD (dRMSD) variations. These distances are computed between all pairs of equivalent residues, as defined by the ungapped columns within a given multiple sequence alignment. Using these generated distance matrices (one per equivalent position), T-RMSD produces a structural tree with support values for each cluster node, reminiscent of bootstrap values. These values, associated with the tree topology, allow a quantitative estimate of structural distances between proteins or group of proteins defined by the tree topology. The clusters thus defined have been shown to be structurally and functionally informative. The T-RMSD web server is a free website open to all users and available at http://tcoffee.crg.cat/apps/tcoffee/do:trmsd. PMID:23716642
Improved Frame Mode Selection for AMR-WB+ Based on Decision Tree

NASA Astrophysics Data System (ADS)

Kim, Jong Kyu; Kim, Nam Soo

In this letter, we propose a coding mode selection method for the AMR-WB+ audio coder based on a decision tree. In order to reduce computation while maintaining good performance, decision tree classifier is adopted with the closed loop mode selection results as the target classification labels. The size of the decision tree is controlled by pruning, so the proposed method does not increase the memory requirement significantly. Through an evaluation test on a database covering both speech and music materials, the proposed method is found to achieve a much better mode selection accuracy compared with the open loop mode selection module in the AMR-WB+.
Hybrid Parallel Contour Trees, Version 1.0

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sewell, Christopher; Fasel, Patricia; Carr, Hamish

A common operation in scientific visualization is to compute and render a contour of a data set. Given a function of the form f : R^d -> R, a level set is defined as an inverse image f^-1(h) for an isovalue h, and a contour is a single connected component of a level set. The Reeb graph can then be defined to be the result of contracting each contour to a single point, and is well defined for Euclidean spaces or for general manifolds. For simple domains, the graph is guaranteed to be a tree, and is called the contourmore » tree. Analysis can then be performed on the contour tree in order to identify isovalues of particular interest, based on various metrics, and render the corresponding contours, without having to know such isovalues a priori. This code is intended to be the first data-parallel algorithm for computing contour trees. Our implementation will use the portable data-parallel primitives provided by Nvidia’s Thrust library, allowing us to compile our same code for both GPUs and multi-core CPUs. Native OpenMP and purely serial versions of the code will likely also be included. It will also be extended to provide a hybrid data-parallel / distributed algorithm, allowing scaling beyond a single GPU or CPU.« less
Phylogenomic, Pan-genomic, Pathogenomic and Evolutionary Genomic Insights into the Agronomically Relevant Enterobacteria Pantoea ananatis and Pantoea stewartii

PubMed Central

De Maayer, Pieter; Aliyu, Habibu; Vikram, Surendra; Blom, Jochen; Duffy, Brion; Cowan, Don A.; Smits, Theo H. M.; Venter, Stephanus N.; Coutinho, Teresa A.

2017-01-01

Pantoea ananatis is ubiquitously found in the environment and causes disease on a wide range of plant hosts. By contrast, its sister species, Pantoea stewartii subsp. stewartii is the host-specific causative agent of the devastating maize disease Stewart’s wilt. This pathogen has a restricted lifecycle, overwintering in an insect vector before being introduced into susceptible maize cultivars, causing disease and returning to overwinter in its vector. The other subspecies of P. stewartii subsp. indologenes, has been isolated from different plant hosts and is predicted to proliferate in different environmental niches. Here we have, by the use of comparative genomics and a comprehensive suite of bioinformatic tools, analyzed the genomes of ten P. stewartii and nineteen P. ananatis strains. Our phylogenomic analyses have revealed that there are two distinct clades within P. ananatis while far less phylogenetic diversity was observed among the P. stewartii subspecies. Pan-genome analyses revealed a large core genome comprising of 3,571 protein coding sequences is shared among the twenty-nine compared strains. Furthermore, we showed that an extensive accessory genome made up largely by a mobilome of plasmids, integrated prophages, integrative and conjugative elements and insertion elements has resulted in extensive diversification of P. stewartii and P. ananatis. While these organisms share many pathogenicity determinants, our comparative genomic analyses show that they differ in terms of the secretion systems they encode. The genomic differences identified in this study have allowed us to postulate on the divergent evolutionary histories of the analyzed P. ananatis and P. stewartii strains and on the molecular basis underlying their ecological success and host range. PMID:28959245
Gene-trait matching across the Bifidobacterium longum pan-genome reveals considerable diversity in carbohydrate catabolism among human infant strains.

PubMed

Arboleya, Silvia; Bottacini, Francesca; O'Connell-Motherway, Mary; Ryan, C Anthony; Ross, R Paul; van Sinderen, Douwe; Stanton, Catherine

2018-01-08

Bifidobacterium longum is a common member of the human gut microbiota and is frequently present at high numbers in the gut microbiota of humans throughout life, thus indicative of a close symbiotic host-microbe relationship. Different mechanisms may be responsible for the high competitiveness of this taxon in its human host to allow stable establishment in the complex and dynamic intestinal microbiota environment. The objective of this study was to assess the genetic and metabolic diversity in a set of 20 B. longum strains, most of which had previously been isolated from infants, by performing whole genome sequencing and comparative analysis, and to analyse their carbohydrate utilization abilities using a gene-trait matching approach. We analysed their pan-genome and their phylogenetic relatedness. All strains clustered in the B. longum ssp. longum phylogenetic subgroup, except for one individual strain which was found to cluster in the B. longum ssp. suis phylogenetic group. The examined strains exhibit genomic diversity, while they also varied in their sugar utilization profiles. This allowed us to perform a gene-trait matching exercise enabling the identification of five gene clusters involved in the utilization of xylo-oligosaccharides, arabinan, arabinoxylan, galactan and fucosyllactose, the latter of which is an abundant human milk oligosaccharide (HMO). The results showed high diversity in terms of genes and predicted glycosyl-hydrolases, as well as the ability to metabolize a large range of sugars. Moreover, we corroborate the capability of B. longum ssp. longum to metabolise HMOs. Ultimately, their intraspecific genomic diversity and the ability to consume a wide assortment of carbohydrates, ranging from plant-derived carbohydrates to HMOs, may provide an explanation for the competitive advantage and persistence of B. longum in the human gut microbiome.
A Year of Infection in the Intensive Care Unit: Prospective Whole Genome Sequencing of Bacterial Clinical Isolates Reveals Cryptic Transmissions and Novel Microbiota

PubMed Central

Roach, David J.; Burton, Joshua N.; Lee, Choli; Stackhouse, Bethany; Butler-Wu, Susan M.; Cookson, Brad T.

2015-01-01

Bacterial whole genome sequencing holds promise as a disruptive technology in clinical microbiology, but it has not yet been applied systematically or comprehensively within a clinical context. Here, over the course of one year, we performed prospective collection and whole genome sequencing of nearly all bacterial isolates obtained from a tertiary care hospital’s intensive care units (ICUs). This unbiased collection of 1,229 bacterial genomes from 391 patients enables detailed exploration of several features of clinical pathogens. A sizable fraction of isolates identified as clinically relevant corresponded to previously undescribed species: 12% of isolates assigned a species-level classification by conventional methods actually qualified as distinct, novel genomospecies on the basis of genomic similarity. Pan-genome analysis of the most frequently encountered pathogens in the collection revealed substantial variation in pan-genome size (1,420 to 20,432 genes) and the rate of gene discovery (1 to 152 genes per isolate sequenced). Surprisingly, although potential nosocomial transmission of actively surveilled pathogens was rare, 8.7% of isolates belonged to genomically related clonal lineages that were present among multiple patients, usually with overlapping hospital admissions, and were associated with clinically significant infection in 62% of patients from which they were recovered. Multi-patient clonal lineages were particularly evident in the neonatal care unit, where seven separate Staphylococcus epidermidis clonal lineages were identified, including one lineage associated with bacteremia in 5/9 neonates. Our study highlights key differences in the information made available by conventional microbiological practices versus whole genome sequencing, and motivates the further integration of microbial genome sequencing into routine clinical care. PMID:26230489
Escherichia coli B2 strains prevalent in inflammatory bowel disease patients have distinct metabolic capabilities that enable colonization of intestinal mucosa.

PubMed

Fang, Xin; Monk, Jonathan M; Mih, Nathan; Du, Bin; Sastry, Anand V; Kavvas, Erol; Seif, Yara; Smarr, Larry; Palsson, Bernhard O

2018-06-11

Escherichia coli is considered a leading bacterial trigger of inflammatory bowel disease (IBD). E. coli isolates from IBD patients primarily belong to phylogroup B2. Previous studies have focused on broad comparative genomic analysis of E. coli B2 isolates, and identified virulence factors that allow B2 strains to reside within human intestinal mucosa. Metabolic capabilities of E. coli strains have been shown to be related to their colonization site, but remain unexplored in IBD-associated strains. In this study, we utilized pan-genome analysis and genome-scale models (GEMs) of metabolism to study metabolic capabilities of IBD-associated E. coli B2 strains. The study yielded three results: i) Pan-genome analysis of 110 E. coli strains (including 53 isolates from IBD studies) revealed discriminating metabolic genes between B2 strains and other strains; ii) Both comparative genomic analysis and GEMs suggested that B2 strains have an advantage in degrading and utilizing sugars derived from mucus glycan, and iii) GEMs revealed distinct metabolic features in B2 strains that potentially allow them to utilize energy more efficiently. For example, B2 strains lack the enzymes to degrade amadori products, but instead rely on neighboring bacteria to convert these substrates into a more readily usable and potentially less sought after product. Taken together, these results suggest that the metabolic capabilities of B2 strains vary significantly from those of other strains, enabling B2 strains to colonize intestinal mucosa.The results from this study motivate a broad experimental assessment of the nutritional effects on E. coli B2 pathophysiology in IBD patients.
Pseudomonas aeruginosa clinical and environmental isolates constitute a single population with high phenotypic diversity

PubMed Central

2014-01-01

Background Pseudomonas aeruginosa is an opportunistic pathogen with a high incidence of hospital infections that represents a threat to immune compromised patients. Genomic studies have shown that, in contrast to other pathogenic bacteria, clinical and environmental isolates do not show particular genomic differences. In addition, genetic variability of all the P. aeruginosa strains whose genomes have been sequenced is extremely low. This low genomic variability might be explained if clinical strains constitute a subpopulation of this bacterial species present in environments that are close to human populations, which preferentially produce virulence associated traits. Results In this work, we sequenced the genomes and performed phenotypic descriptions for four non-human P. aeruginosa isolates collected from a plant, the ocean, a water-spring, and from dolphin stomach. We show that the four strains are phenotypically diverse and that this is not reflected in genomic variability, since their genomes are almost identical. Furthermore, we performed a detailed comparative genomic analysis of the four strains studied in this work with the thirteen previously reported P. aeruginosa genomes by means of describing their core and pan-genomes. Conclusions Contrary to what has been described for other bacteria we have found that the P. aeruginosa core genome is constituted by a high proportion of genes and that its pan-genome is thus relatively small. Considering the high degree of genomic conservation between isolates of P. aeruginosa from diverse environments, including human tissues, some implications for the treatment of infections are discussed. This work also represents a methodological contribution for the genomic study of P. aeruginosa, since we provide a database of the comparison of all the proteins encoded by the seventeen strains analyzed. PMID:24773920
Phylogenomic, Pan-genomic, Pathogenomic and Evolutionary Genomic Insights into the Agronomically Relevant Enterobacteria Pantoea ananatis and Pantoea stewartii.

PubMed

De Maayer, Pieter; Aliyu, Habibu; Vikram, Surendra; Blom, Jochen; Duffy, Brion; Cowan, Don A; Smits, Theo H M; Venter, Stephanus N; Coutinho, Teresa A

2017-01-01

Pantoea ananatis is ubiquitously found in the environment and causes disease on a wide range of plant hosts. By contrast, its sister species, Pantoea stewartii subsp. stewartii is the host-specific causative agent of the devastating maize disease Stewart's wilt. This pathogen has a restricted lifecycle, overwintering in an insect vector before being introduced into susceptible maize cultivars, causing disease and returning to overwinter in its vector. The other subspecies of P. stewartii subsp. indologenes , has been isolated from different plant hosts and is predicted to proliferate in different environmental niches. Here we have, by the use of comparative genomics and a comprehensive suite of bioinformatic tools, analyzed the genomes of ten P. stewartii and nineteen P. ananatis strains. Our phylogenomic analyses have revealed that there are two distinct clades within P. ananatis while far less phylogenetic diversity was observed among the P. stewartii subspecies. Pan-genome analyses revealed a large core genome comprising of 3,571 protein coding sequences is shared among the twenty-nine compared strains. Furthermore, we showed that an extensive accessory genome made up largely by a mobilome of plasmids, integrated prophages, integrative and conjugative elements and insertion elements has resulted in extensive diversification of P. stewartii and P. ananatis . While these organisms share many pathogenicity determinants, our comparative genomic analyses show that they differ in terms of the secretion systems they encode. The genomic differences identified in this study have allowed us to postulate on the divergent evolutionary histories of the analyzed P. ananatis and P. stewartii strains and on the molecular basis underlying their ecological success and host range.
Diversity Arrays Technology (DArT) for Pan-Genomic Evolutionary Studies of Non-Model Organisms

PubMed Central

James, Karen E.; Schneider, Harald; Ansell, Stephen W.; Evers, Margaret; Robba, Lavinia; Uszynski, Grzegorz; Pedersen, Niklas; Newton, Angela E.; Russell, Stephen J.; Vogel, Johannes C.; Kilian, Andrzej

2008-01-01

Background High-throughput tools for pan-genomic study, especially the DNA microarray platform, have sparked a remarkable increase in data production and enabled a shift in the scale at which biological investigation is possible. The use of microarrays to examine evolutionary relationships and processes, however, is predominantly restricted to model or near-model organisms. Methodology/Principal Findings This study explores the utility of Diversity Arrays Technology (DArT) in evolutionary studies of non-model organisms. DArT is a hybridization-based genotyping method that uses microarray technology to identify and type DNA polymorphism. Theoretically applicable to any organism (even one for which no prior genetic data are available), DArT has not yet been explored in exclusively wild sample sets, nor extensively examined in a phylogenetic framework. DArT recovered 1349 markers of largely low copy-number loci in two lineages of seed-free land plants: the diploid fern Asplenium viride and the haploid moss Garovaglia elegans. Direct sequencing of 148 of these DArT markers identified 30 putative loci including four routinely sequenced for evolutionary studies in plants. Phylogenetic analyses of DArT genotypes reveal phylogeographic and substrate specificity patterns in A. viride, a lack of phylogeographic pattern in Australian G. elegans, and additive variation in hybrid or mixed samples. Conclusions/Significance These results enable methodological recommendations including procedures for detecting and analysing DArT markers tailored specifically to evolutionary investigations and practical factors informing the decision to use DArT, and raise evolutionary hypotheses concerning substrate specificity and biogeographic patterns. Thus DArT is a demonstrably valuable addition to the set of existing molecular approaches used to infer biological phenomena such as adaptive radiations, population dynamics, hybridization, introgression, ecological differentiation and phylogeography. PMID:18301759
Diversification and niche adaptations of Nitrospina-like bacteria in the polyextreme interfaces of Red Sea brines

PubMed Central

Ngugi, David Kamanda; Blom, Jochen; Stepanauskas, Ramunas; Stingl, Ulrich

2016-01-01

Nitrite-oxidizing bacteria (NOB) of the genus Nitrospina have exclusively been found in marine environments. In the brine–seawater interface layer of Atlantis II Deep (Red Sea), Nitrospina-like bacteria constitute up to one-third of the bacterial 16S ribosomal RNA (rRNA) gene sequences. This is much higher compared with that reported in other marine habitats (~10% of all bacteria), and was unexpected because no NOB culture has been observed to grow above 4.0% salinity, presumably due to the low net energy gained from their metabolism that is insufficient for both growth and osmoregulation. Using phylogenetics, single-cell genomics and metagenomic fragment recruitment approaches, we document here that these Nitrospina-like bacteria, designated as Candidatus Nitromaritima RS, are not only highly diverged from the type species Nitrospina gracilis (pairwise genome identity of 69%) but are also ubiquitous in the deeper, highly saline interface layers (up to 11.2% salinity) with temperatures of up to 52 °C. Comparative pan-genome analyses revealed that less than half of the predicted proteome of Ca. Nitromaritima RS is shared with N. gracilis. Interestingly, the capacity for nitrite oxidation is also conserved in both genomes. Although both lack acidic proteomes synonymous with extreme halophiles, the pangenome of Ca. Nitromaritima RS specifically encodes enzymes with osmoregulatory and thermoprotective roles (i.e., ectoine/hydroxyectoine biosynthesis) and of thermodynamic importance (i.e., nitrate and nitrite reductases). Ca. Nitromaritima RS also possesses many hallmark traits of microaerophiles and high-affinity NOB. The abundance of the uncultured Ca. Nitromaritima lineage in marine oxyclines suggests their unrecognized ecological significance in deoxygenated areas of the global ocean. PMID:26657763
Embedding global and collective in a torus network with message class map based tree path selection

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Dong; Coteus, Paul W.; Eisley, Noel A.

Embodiments of the invention provide a method, system and computer program product for embedding a global barrier and global interrupt network in a parallel computer system organized as a torus network. The computer system includes a multitude of nodes. In one embodiment, the method comprises taking inputs from a set of receivers of the nodes, dividing the inputs from the receivers into a plurality of classes, combining the inputs of each of the classes to obtain a result, and sending said result to a set of senders of the nodes. Embodiments of the invention provide a method, system and computermore » program product for embedding a collective network in a parallel computer system organized as a torus network. In one embodiment, the method comprises adding to a torus network a central collective logic to route messages among at least a group of nodes in a tree structure.« less
Coalescent-based species tree inference from gene tree topologies under incomplete lineage sorting by maximum likelihood.

PubMed

Wu, Yufeng

2012-03-01

Incomplete lineage sorting can cause incongruence between the phylogenetic history of genes (the gene tree) and that of the species (the species tree), which can complicate the inference of phylogenies. In this article, I present a new coalescent-based algorithm for species tree inference with maximum likelihood. I first describe an improved method for computing the probability of a gene tree topology given a species tree, which is much faster than an existing algorithm by Degnan and Salter (2005). Based on this method, I develop a practical algorithm that takes a set of gene tree topologies and infers species trees with maximum likelihood. This algorithm searches for the best species tree by starting from initial species trees and performing heuristic search to obtain better trees with higher likelihood. This algorithm, called STELLS (which stands for Species Tree InfErence with Likelihood for Lineage Sorting), has been implemented in a program that is downloadable from the author's web page. The simulation results show that the STELLS algorithm is more accurate than an existing maximum likelihood method for many datasets, especially when there is noise in gene trees. I also show that the STELLS algorithm is efficient and can be applied to real biological datasets. © 2011 The Author. Evolution© 2011 The Society for the Study of Evolution.

Asynchronous broadcast for ordered delivery between compute nodes in a parallel computing system where packet header space is limited

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kumar, Sameer

Disclosed is a mechanism on receiving processors in a parallel computing system for providing order to data packets received from a broadcast call and to distinguish data packets received at nodes from several incoming asynchronous broadcast messages where header space is limited. In the present invention, processors at lower leafs of a tree do not need to obtain a broadcast message by directly accessing the data in a root processor's buffer. Instead, each subsequent intermediate node's rank id information is squeezed into the software header of packet headers. In turn, the entire broadcast message is not transferred from the rootmore » processor to each processor in a communicator but instead is replicated on several intermediate nodes which then replicated the message to nodes in lower leafs. Hence, the intermediate compute nodes become "virtual root compute nodes" for the purpose of replicating the broadcast message to lower levels of a tree.« less
Probabilistic atlas based labeling of the cerebral vessel tree

NASA Astrophysics Data System (ADS)

Van de Giessen, Martijn; Janssen, Jasper P.; Brouwer, Patrick A.; Reiber, Johan H. C.; Lelieveldt, Boudewijn P. F.; Dijkstra, Jouke

2015-03-01

Preoperative imaging of the cerebral vessel tree is essential for planning therapy on intracranial stenoses and aneurysms. Usually, a magnetic resonance angiography (MRA) or computed tomography angiography (CTA) is acquired from which the cerebral vessel tree is segmented. Accurate analysis is helped by the labeling of the cerebral vessels, but labeling is non-trivial due to anatomical topological variability and missing branches due to acquisition issues. In recent literature, labeling the cerebral vasculature around the Circle of Willis has mainly been approached as a graph-based problem. The most successful method, however, requires the definition of all possible permutations of missing vessels, which limits application to subsets of the tree and ignores spatial information about the vessel locations. This research aims to perform labeling using probabilistic atlases that model spatial vessel and label likelihoods. A cerebral vessel tree is aligned to a probabilistic atlas and subsequently each vessel is labeled by computing the maximum label likelihood per segment from label-specific atlases. The proposed method was validated on 25 segmented cerebral vessel trees. Labeling accuracies were close to 100% for large vessels, but dropped to 50-60% for small vessels that were only present in less than 50% of the set. With this work we showed that using solely spatial information of the vessel labels, vessel segments from stable vessels (>50% presence) were reliably classified. This spatial information will form the basis for a future labeling strategy with a very loose topological model.
Worldsheet scattering in AdS3/CFT2

NASA Astrophysics Data System (ADS)

Sundin, Per; Wulff, Linus

2013-07-01

We confront the recently proposed exact S-matrices for AdS 3/ CFT 2 with direct worldsheet calculations. Utilizing the BMN and Near Flat Space (NFS) expansions for strings on AdS 3 × S 3 × S 3 × S 1 and AdS 3 × S 3 × T 4 we compute both tree-level and one-loop scattering amplitudes. Up to some minor issues we find nice agreement in the tree-level sector. At the one-loop level however we find that certain non-zero tree-level processes, which are not visible in the exact solution, contribute, via the optical theorem, and give an apparent mismatch for certain amplitudes. Furthermore we find that a proposed one-loop modification of the dressing phase correctly reproduces the worldsheet calculation while the standard Hernandez-Lopez phase does not. We also compute several massless to massless processes.
Parallel Adaptive Mesh Refinement Library

NASA Technical Reports Server (NTRS)

Mac-Neice, Peter; Olson, Kevin

2005-01-01

Parallel Adaptive Mesh Refinement Library (PARAMESH) is a package of Fortran 90 subroutines designed to provide a computer programmer with an easy route to extension of (1) a previously written serial code that uses a logically Cartesian structured mesh into (2) a parallel code with adaptive mesh refinement (AMR). Alternatively, in its simplest use, and with minimal effort, PARAMESH can operate as a domain-decomposition tool for users who want to parallelize their serial codes but who do not wish to utilize adaptivity. The package builds a hierarchy of sub-grids to cover the computational domain of a given application program, with spatial resolution varying to satisfy the demands of the application. The sub-grid blocks form the nodes of a tree data structure (a quad-tree in two or an oct-tree in three dimensions). Each grid block has a logically Cartesian mesh. The package supports one-, two- and three-dimensional models.
A new method for locating changes in a tree reveals distinct nucleotide polymorphism vs. divergence patterns in mouse mitochondrial control region.

PubMed

Galtier, N; Boursot, P

2000-03-01

A new, model-based method was devised to locate nucleotide changes in a given phylogenetic tree. For each site, the posterior probability of any possible change in each branch of the tree is computed. This probabilistic method is a valuable alternative to the maximum parsimony method when base composition is skewed (i.e., different from 25% A, 25% C, 25% G, 25% T): computer simulations showed that parsimony misses more rare --> common than common --> rare changes, resulting in biased inferred change matrices, whereas the new method appeared unbiased. The probabilistic method was applied to the analysis of the mutation and substitution processes in the mitochondrial control region of mouse. Distinct change patterns were found at the polymorphism (within species) and divergence (between species) levels, rejecting the hypothesis of a neutral evolution of base composition in mitochondrial DNA.
Derivative Trade Optimizing Model Utilizing GP Based on Behavioral Finance Theory

NASA Astrophysics Data System (ADS)

Matsumura, Koki; Kawamoto, Masaru

This paper proposed a new technique which makes the strategy trees for the derivative (option) trading investment decision based on the behavioral finance theory and optimizes it using evolutionary computation, in order to achieve high profitability. The strategy tree uses a technical analysis based on a statistical, experienced technique for the investment decision. The trading model is represented by various technical indexes, and the strategy tree is optimized by the genetic programming(GP) which is one of the evolutionary computations. Moreover, this paper proposed a method using the prospect theory based on the behavioral finance theory to set psychological bias for profit and deficit and attempted to select the appropriate strike price of option for the higher investment efficiency. As a result, this technique produced a good result and found the effectiveness of this trading model by the optimized dealings strategy.
Using fragmentation trees and mass spectral trees for identifying unknown compounds in metabolomics.

PubMed

Vaniya, Arpana; Fiehn, Oliver

2015-06-01

Identification of unknown metabolites is the bottleneck in advancing metabolomics, leaving interpretation of metabolomics results ambiguous. The chemical diversity of metabolism is vast, making structure identification arduous and time consuming. Currently, comprehensive analysis of mass spectra in metabolomics is limited to library matching, but tandem mass spectral libraries are small compared to the large number of compounds found in the biosphere, including xenobiotics. Resolving this bottleneck requires richer data acquisition and better computational tools. Multi-stage mass spectrometry (MSn) trees show promise to aid in this regard. Fragmentation trees explore the fragmentation process, generate fragmentation rules and aid in sub-structure identification, while mass spectral trees delineate the dependencies in multi-stage MS of collision-induced dissociations. This review covers advancements over the past 10 years as a tool for metabolite identification, including algorithms, software and databases used to build and to implement fragmentation trees and mass spectral annotations.
Design of Probabilistic Random Forests with Applications to Anticancer Drug Sensitivity Prediction

PubMed Central

Rahman, Raziur; Haider, Saad; Ghosh, Souparno; Pal, Ranadip

2015-01-01

Random forests consisting of an ensemble of regression trees with equal weights are frequently used for design of predictive models. In this article, we consider an extension of the methodology by representing the regression trees in the form of probabilistic trees and analyzing the nature of heteroscedasticity. The probabilistic tree representation allows for analytical computation of confidence intervals (CIs), and the tree weight optimization is expected to provide stricter CIs with comparable performance in mean error. We approached the ensemble of probabilistic trees’ prediction from the perspectives of a mixture distribution and as a weighted sum of correlated random variables. We applied our methodology to the drug sensitivity prediction problem on synthetic and cancer cell line encyclopedia dataset and illustrated that tree weights can be selected to reduce the average length of the CI without increase in mean error. PMID:27081304
Support-vector-machine tree-based domain knowledge learning toward automated sports video classification

NASA Astrophysics Data System (ADS)

Xiao, Guoqiang; Jiang, Yang; Song, Gang; Jiang, Jianmin

2010-12-01

We propose a support-vector-machine (SVM) tree to hierarchically learn from domain knowledge represented by low-level features toward automatic classification of sports videos. The proposed SVM tree adopts a binary tree structure to exploit the nature of SVM's binary classification, where each internal node is a single SVM learning unit, and each external node represents the classified output type. Such a SVM tree presents a number of advantages, which include: 1. low computing cost; 2. integrated learning and classification while preserving individual SVM's learning strength; and 3. flexibility in both structure and learning modules, where different numbers of nodes and features can be added to address specific learning requirements, and various learning models can be added as individual nodes, such as neural networks, AdaBoost, hidden Markov models, dynamic Bayesian networks, etc. Experiments support that the proposed SVM tree achieves good performances in sports video classifications.
Hierarchial parallel computer architecture defined by computational multidisciplinary mechanics

NASA Technical Reports Server (NTRS)

Padovan, Joe; Gute, Doug; Johnson, Keith

1989-01-01

The goal is to develop an architecture for parallel processors enabling optimal handling of multi-disciplinary computation of fluid-solid simulations employing finite element and difference schemes. The goals, philosphical and modeling directions, static and dynamic poly trees, example problems, interpolative reduction, the impact on solvers are shown in viewgraph form.
An evaluation of FIA's stand age variable

Treesearch

John D. Shaw

2015-01-01

The Forest Inventory and Analysis Database (FIADB) includes a large number of measured and computed variables. The definitions of measured variables are usually well-documented in FIA field and database manuals. Some computed variables, such as live basal area of the condition, are equally straightforward. Other computed variables, such as individual tree volume,...
Treetrimmer: a method for phylogenetic dataset size reduction.

PubMed

Maruyama, Shinichiro; Eveleigh, Robert J M; Archibald, John M

2013-04-12

With rapid advances in genome sequencing and bioinformatics, it is now possible to generate phylogenetic trees containing thousands of operational taxonomic units (OTUs) from a wide range of organisms. However, use of rigorous tree-building methods on such large datasets is prohibitive and manual 'pruning' of sequence alignments is time consuming and raises concerns over reproducibility. There is a need for bioinformatic tools with which to objectively carry out such pruning procedures. Here we present 'TreeTrimmer', a bioinformatics procedure that removes unnecessary redundancy in large phylogenetic datasets, alleviating the size effect on more rigorous downstream analyses. The method identifies and removes user-defined 'redundant' sequences, e.g., orthologous sequences from closely related organisms and 'recently' evolved lineage-specific paralogs. Representative OTUs are retained for more rigorous re-analysis. TreeTrimmer reduces the OTU density of phylogenetic trees without sacrificing taxonomic diversity while retaining the original tree topology, thereby speeding up downstream computer-intensive analyses, e.g., Bayesian and maximum likelihood tree reconstructions, in a reproducible fashion.
Tree-Based Unrooted Phylogenetic Networks.

PubMed

Francis, A; Huber, K T; Moulton, V

2018-02-01

Phylogenetic networks are a generalization of phylogenetic trees that are used to represent non-tree-like evolutionary histories that arise in organisms such as plants and bacteria, or uncertainty in evolutionary histories. An unrooted phylogenetic network on a non-empty, finite set X of taxa, or network, is a connected, simple graph in which every vertex has degree 1 or 3 and whose leaf set is X. It is called a phylogenetic tree if the underlying graph is a tree. In this paper we consider properties of tree-based networks, that is, networks that can be constructed by adding edges into a phylogenetic tree. We show that although they have some properties in common with their rooted analogues which have recently drawn much attention in the literature, they have some striking differences in terms of both their structural and computational properties. We expect that our results could eventually have applications to, for example, detecting horizontal gene transfer or hybridization which are important factors in the evolution of many organisms.
BEHAVE: fire behavior prediction and fuel modeling system - BURN subsystem, Part 2

Treesearch

Patricia L. Andrews; Carolyn H. Chase

1989-01-01

This is the third publication describing the BEHAVE system of computer programs for predicting behavior of wildland fires. This publication adds the following predictive capabilities: distance firebrands are lofted ahead of a wind-driven surface fire, probabilities of firebrands igniting spot fires, scorch height of trees, and percentage of tree mortality. The system...
Air pollution removal by urban forests in Canada and its effect on air quality and human health

Treesearch

David J. Nowak; Satoshi Hirabayashi; Marlene Doyle; Mark McGovern; Jon Pasher

2018-01-01

Urban trees perform a number of ecosystem services including air pollution removal, carbon sequestration, cooling air temperatures and providing aesthetic beauty to the urban landscape. Trees remove air pollution by intercepting particulate matter on plant surfaces and absorbing gaseous pollutants through the leaf stomata. Computer simulations with local environmental...
Maintenance cost, toppling risk and size of trees in a self-thinning stand.

PubMed

Larjavaara, Markku

2010-07-07

Wind routinely topples trees during storms, and the likelihood that a tree is toppled depends critically on its allometry. Yet none of the existing theories to explain tree allometry consider wind drag on tree canopies. Since leaf area index in crowded, self-thinning stands is independent of stand density, the drag force per unit land can also be assumed to be independent of stand density, with only canopy height influencing the total toppling moment. Tree stem dimensions and the self-thinning biomass can then be computed by further assuming that the risk of toppling over and stem maintenance per unit land area are independent of stand density, and that stem maintenance cost is a linear function of stem surface area and sapwood volume. These assumptions provide a novel way to understand tree allometry and lead to a self-thinning line relating tree biomass and stand density with a power between -3/2 and -2/3 depending on the ratio of maintenance of sapwood and stem surface. (c) 2010 Elsevier Ltd. All rights reserved.
Phylogenetic trees in bioinformatics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Burr, Tom L

2008-01-01

Genetic data is often used to infer evolutionary relationships among a collection of viruses, bacteria, animal or plant species, or other operational taxonomic units (OTU). A phylogenetic tree depicts such relationships and provides a visual representation of the estimated branching order of the OTUs. Tree estimation is unique for several reasons, including: the types of data used to represent each OTU; the use ofprobabilistic nucleotide substitution models; the inference goals involving both tree topology and branch length, and the huge number of possible trees for a given sample of a very modest number of OTUs, which implies that fmding themore » best tree(s) to describe the genetic data for each OTU is computationally demanding. Bioinformatics is too large a field to review here. We focus on that aspect of bioinformatics that includes study of similarities in genetic data from multiple OTUs. Although research questions are diverse, a common underlying challenge is to estimate the evolutionary history of the OTUs. Therefore, this paper reviews the role of phylogenetic tree estimation in bioinformatics, available methods and software, and identifies areas for additional research and development.« less
Individualistic and Time-Varying Tree-Ring Growth to Climate Sensitivity

PubMed Central

Carrer, Marco

2011-01-01

The development of dendrochronological time series in order to analyze climate-growth relationships usually involves first a rigorous selection of trees and then the computation of the mean tree-growth measurement series. This study suggests a change in the perspective, passing from an analysis of climate-growth relationships that typically focuses on the mean response of a species to investigating the whole range of individual responses among sample trees. Results highlight that this new approach, tested on a larch and stone pine tree-ring dataset, outperforms, in terms of information obtained, the classical one, with significant improvements regarding the strength, distribution and time-variability of the individual tree-ring growth response to climate. Moreover, a significant change over time of the tree sensitivity to climatic variability has been detected. Accordingly, the best-responder trees at any one time may not always have been the best-responders and may not continue to be so. With minor adjustments to current dendroecological protocol and adopting an individualistic approach, we can improve the quality and reliability of the ecological inferences derived from the climate-growth relationships. PMID:21829523
A Semi-Automated Machine Learning Algorithm for Tree Cover Delineation from 1-m Naip Imagery Using a High Performance Computing Architecture

NASA Astrophysics Data System (ADS)

Basu, S.; Ganguly, S.; Nemani, R. R.; Mukhopadhyay, S.; Milesi, C.; Votava, P.; Michaelis, A.; Zhang, G.; Cook, B. D.; Saatchi, S. S.; Boyda, E.

2014-12-01

Accurate tree cover delineation is a useful instrument in the derivation of Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) satellite imagery data. Numerous algorithms have been designed to perform tree cover delineation in high to coarse resolution satellite imagery, but most of them do not scale to terabytes of data, typical in these VHR datasets. In this paper, we present an automated probabilistic framework for the segmentation and classification of 1-m VHR data as obtained from the National Agriculture Imagery Program (NAIP) for deriving tree cover estimates for the whole of Continental United States, using a High Performance Computing Architecture. The results from the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on Conditional Random Field (CRF), which helps in capturing the higher order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and re-trained by incorporating expert knowledge through the relabeling of misclassified image patches. This leads to a significant improvement in the true positive rates and reduction in false positive rates. The tree cover maps were generated for the state of California, which covers a total of 11,095 NAIP tiles and spans a total geographical area of 163,696 sq. miles. Our framework produced correct detection rates of around 85% for fragmented forests and 70% for urban tree cover areas, with false positive rates lower than 3% for both regions. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR high-resolution canopy height model shows the effectiveness of our algorithm in generating accurate high-resolution tree cover maps.
Rapid scoring of genes in microbial pan-genome-wide association studies with Scoary.

PubMed

Brynildsrud, Ola; Bohlin, Jon; Scheffer, Lonneke; Eldholm, Vegard

2016-11-25

Genome-wide association studies (GWAS) have become indispensable in human medicine and genomics, but very few have been carried out on bacteria. Here we introduce Scoary, an ultra-fast, easy-to-use, and widely applicable software tool that scores the components of the pan-genome for associations to observed phenotypic traits while accounting for population stratification, with minimal assumptions about evolutionary processes. We call our approach pan-GWAS to distinguish it from traditional, single nucleotide polymorphism (SNP)-based GWAS. Scoary is implemented in Python and is available under an open source GPLv3 license at https://github.com/AdmiralenOla/Scoary .

Disturbance and density-dependent processes (competition and facilitation) influence the fine-scale genetic structure of a tree species' population.

PubMed

Fajardo, Alex; Torres-Díaz, Cristian; Till-Bottraud, Irène

2016-01-01

Disturbances, dispersal and biotic interactions are three major drivers of the spatial distribution of genotypes within populations, the last of which has been less studied than the other two. This study aimed to determine the role of competition and facilitation in the degree of conspecific genetic relatedness of nearby individuals of tree populations. It was expected that competition among conspecifics will lead to low relatedness, while facilitation will lead to high relatedness (selection for high relatedness within clusters). The stand structure and spatial genetic structure (SGS) of trees were examined within old-growth and second-growth forests (including multi-stemmed trees at the edge of forests) of Nothofagus pumilio following large-scale fires in Patagonia, Chile. Genetic spatial autocorrelations were computed on a spatially explicit sampling of the forests using five microsatellite loci. As biotic plant interactions occur among immediate neighbours, mean nearest neighbour distance (MNND) among trees was computed as a threshold for distinguishing the effects of disturbances and biotic interactions. All forests exhibited a significant SGS for distances greater than the MNND. The old-growth forest genetic and stand structure indicated gap recolonization from nearby trees (significantly related trees at distances between 4 and 10 m). At distances smaller than the MNND, trees of the second-growth interior forest showed significantly lower relatedness, suggesting a fading of the recolonization structure by competition, whereas the second-growth edge forest showed a positive and highly significant relatedness among trees (higher among stems of a cluster than among stems of different clusters), resulting from facilitation. Biotic interactions are shown to influence the genetic composition of a tree population. However, facilitation can only persist if individuals are related. Thus, the genetic composition in turn influences what type of biotic interactions will take place among immediate neighbours in post-disturbance forests. © The Author 2015. Published by Oxford University Press on behalf of the Annals of Botany Company. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
GIGA: a simple, efficient algorithm for gene tree inference in the genomic age

PubMed Central

2010-01-01

Background Phylogenetic relationships between genes are not only of theoretical interest: they enable us to learn about human genes through the experimental work on their relatives in numerous model organisms from bacteria to fruit flies and mice. Yet the most commonly used computational algorithms for reconstructing gene trees can be inaccurate for numerous reasons, both algorithmic and biological. Additional information beyond gene sequence data has been shown to improve the accuracy of reconstructions, though at great computational cost. Results We describe a simple, fast algorithm for inferring gene phylogenies, which makes use of information that was not available prior to the genomic age: namely, a reliable species tree spanning much of the tree of life, and knowledge of the complete complement of genes in a species' genome. The algorithm, called GIGA, constructs trees agglomeratively from a distance matrix representation of sequences, using simple rules to incorporate this genomic age information. GIGA makes use of a novel conceptualization of gene trees as being composed of orthologous subtrees (containing only speciation events), which are joined by other evolutionary events such as gene duplication or horizontal gene transfer. An important innovation in GIGA is that, at every step in the agglomeration process, the tree is interpreted/reinterpreted in terms of the evolutionary events that created it. Remarkably, GIGA performs well even when using a very simple distance metric (pairwise sequence differences) and no distance averaging over clades during the tree construction process. Conclusions GIGA is efficient, allowing phylogenetic reconstruction of very large gene families and determination of orthologs on a large scale. It is exceptionally robust to adding more gene sequences, opening up the possibility of creating stable identifiers for referring to not only extant genes, but also their common ancestors. We compared trees produced by GIGA to those in the TreeFam database, and they were very similar in general, with most differences likely due to poor alignment quality. However, some remaining differences are algorithmic, and can be explained by the fact that GIGA tends to put a larger emphasis on minimizing gene duplication and deletion events. PMID:20534164
GIGA: a simple, efficient algorithm for gene tree inference in the genomic age.

PubMed

Thomas, Paul D

2010-06-09

Phylogenetic relationships between genes are not only of theoretical interest: they enable us to learn about human genes through the experimental work on their relatives in numerous model organisms from bacteria to fruit flies and mice. Yet the most commonly used computational algorithms for reconstructing gene trees can be inaccurate for numerous reasons, both algorithmic and biological. Additional information beyond gene sequence data has been shown to improve the accuracy of reconstructions, though at great computational cost. We describe a simple, fast algorithm for inferring gene phylogenies, which makes use of information that was not available prior to the genomic age: namely, a reliable species tree spanning much of the tree of life, and knowledge of the complete complement of genes in a species' genome. The algorithm, called GIGA, constructs trees agglomeratively from a distance matrix representation of sequences, using simple rules to incorporate this genomic age information. GIGA makes use of a novel conceptualization of gene trees as being composed of orthologous subtrees (containing only speciation events), which are joined by other evolutionary events such as gene duplication or horizontal gene transfer. An important innovation in GIGA is that, at every step in the agglomeration process, the tree is interpreted/reinterpreted in terms of the evolutionary events that created it. Remarkably, GIGA performs well even when using a very simple distance metric (pairwise sequence differences) and no distance averaging over clades during the tree construction process. GIGA is efficient, allowing phylogenetic reconstruction of very large gene families and determination of orthologs on a large scale. It is exceptionally robust to adding more gene sequences, opening up the possibility of creating stable identifiers for referring to not only extant genes, but also their common ancestors. We compared trees produced by GIGA to those in the TreeFam database, and they were very similar in general, with most differences likely due to poor alignment quality. However, some remaining differences are algorithmic, and can be explained by the fact that GIGA tends to put a larger emphasis on minimizing gene duplication and deletion events.
Remote sensing of vegetation structure using computer vision

NASA Astrophysics Data System (ADS)

Dandois, Jonathan P.

High-spatial resolution measurements of vegetation structure are needed for improving understanding of ecosystem carbon, water and nutrient dynamics, the response of ecosystems to a changing climate, and for biodiversity mapping and conservation, among many research areas. Our ability to make such measurements has been greatly enhanced by continuing developments in remote sensing technology---allowing researchers the ability to measure numerous forest traits at varying spatial and temporal scales and over large spatial extents with minimal to no field work, which is costly for large spatial areas or logistically difficult in some locations. Despite these advances, there remain several research challenges related to the methods by which three-dimensional (3D) and spectral datasets are joined (remote sensing fusion) and the availability and portability of systems for frequent data collections at small scale sampling locations. Recent advances in the areas of computer vision structure from motion (SFM) and consumer unmanned aerial systems (UAS) offer the potential to address these challenges by enabling repeatable measurements of vegetation structural and spectral traits at the scale of individual trees. However, the potential advances offered by computer vision remote sensing also present unique challenges and questions that need to be addressed before this approach can be used to improve understanding of forest ecosystems. For computer vision remote sensing to be a valuable tool for studying forests, bounding information about the characteristics of the data produced by the system will help researchers understand and interpret results in the context of the forest being studied and of other remote sensing techniques. This research advances understanding of how forest canopy and tree 3D structure and color are accurately measured by a relatively low-cost and portable computer vision personal remote sensing system: 'Ecosynth'. Recommendations are made for optimal conditions under which forest structure measurements should be obtained with UAS-SFM remote sensing. Ultimately remote sensing of vegetation by computer vision offers the potential to provide an 'ecologist's eye view', capturing not only canopy 3D and spectral properties, but also seeing the trees in the forest and the leaves on the trees.
An efficient and extensible approach for compressing phylogenetic trees

PubMed Central

2011-01-01

Background Biologists require new algorithms to efficiently compress and store their large collections of phylogenetic trees. Our previous work showed that TreeZip is a promising approach for compressing phylogenetic trees. In this paper, we extend our TreeZip algorithm by handling trees with weighted branches. Furthermore, by using the compressed TreeZip file as input, we have designed an extensible decompressor that can extract subcollections of trees, compute majority and strict consensus trees, and merge tree collections using set operations such as union, intersection, and set difference. Results On unweighted phylogenetic trees, TreeZip is able to compress Newick files in excess of 98%. On weighted phylogenetic trees, TreeZip is able to compress a Newick file by at least 73%. TreeZip can be combined with 7zip with little overhead, allowing space savings in excess of 99% (unweighted) and 92%(weighted). Unlike TreeZip, 7zip is not immune to branch rotations, and performs worse as the level of variability in the Newick string representation increases. Finally, since the TreeZip compressed text (TRZ) file contains all the semantic information in a collection of trees, we can easily filter and decompress a subset of trees of interest (such as the set of unique trees), or build the resulting consensus tree in a matter of seconds. We also show the ease of which set operations can be performed on TRZ files, at speeds quicker than those performed on Newick or 7zip compressed Newick files, and without loss of space savings. Conclusions TreeZip is an efficient approach for compressing large collections of phylogenetic trees. The semantic and compact nature of the TRZ file allow it to be operated upon directly and quickly, without a need to decompress the original Newick file. We believe that TreeZip will be vital for compressing and archiving trees in the biological community. PMID:22165819
An efficient and extensible approach for compressing phylogenetic trees.

PubMed

Matthews, Suzanne J; Williams, Tiffani L

2011-10-18

Biologists require new algorithms to efficiently compress and store their large collections of phylogenetic trees. Our previous work showed that TreeZip is a promising approach for compressing phylogenetic trees. In this paper, we extend our TreeZip algorithm by handling trees with weighted branches. Furthermore, by using the compressed TreeZip file as input, we have designed an extensible decompressor that can extract subcollections of trees, compute majority and strict consensus trees, and merge tree collections using set operations such as union, intersection, and set difference. On unweighted phylogenetic trees, TreeZip is able to compress Newick files in excess of 98%. On weighted phylogenetic trees, TreeZip is able to compress a Newick file by at least 73%. TreeZip can be combined with 7zip with little overhead, allowing space savings in excess of 99% (unweighted) and 92%(weighted). Unlike TreeZip, 7zip is not immune to branch rotations, and performs worse as the level of variability in the Newick string representation increases. Finally, since the TreeZip compressed text (TRZ) file contains all the semantic information in a collection of trees, we can easily filter and decompress a subset of trees of interest (such as the set of unique trees), or build the resulting consensus tree in a matter of seconds. We also show the ease of which set operations can be performed on TRZ files, at speeds quicker than those performed on Newick or 7zip compressed Newick files, and without loss of space savings. TreeZip is an efficient approach for compressing large collections of phylogenetic trees. The semantic and compact nature of the TRZ file allow it to be operated upon directly and quickly, without a need to decompress the original Newick file. We believe that TreeZip will be vital for compressing and archiving trees in the biological community.
Computing Maximum Cardinality Matchings in Parallel on Bipartite Graphs via Tree-Grafting

DOE Office of Scientific and Technical Information (OSTI.GOV)

Azad, Ariful; Buluc, Aydn; Pothen, Alex

It is difficult to obtain high performance when computing matchings on parallel processors because matching algorithms explicitly or implicitly search for paths in the graph, and when these paths become long, there is little concurrency. In spite of this limitation, we present a new algorithm and its shared-memory parallelization that achieves good performance and scalability in computing maximum cardinality matchings in bipartite graphs. This algorithm searches for augmenting paths via specialized breadth-first searches (BFS) from multiple source vertices, hence creating more parallelism than single source algorithms. Algorithms that employ multiple-source searches cannot discard a search tree once no augmenting pathmore » is discovered from the tree, unlike algorithms that rely on single-source searches. We describe a novel tree-grafting method that eliminates most of the redundant edge traversals resulting from this property of multiple-source searches. We also employ the recent direction-optimizing BFS algorithm as a subroutine to discover augmenting paths faster. Our algorithm compares favorably with the current best algorithms in terms of the number of edges traversed, the average augmenting path length, and the number of iterations. Here, we provide a proof of correctness for our algorithm. Our NUMA-aware implementation is scalable to 80 threads of an Intel multiprocessor and to 240 threads on an Intel Knights Corner coprocessor. On average, our parallel algorithm runs an order of magnitude faster than the fastest algorithms available. The performance improvement is more significant on graphs with small matching number.« less
Computing Maximum Cardinality Matchings in Parallel on Bipartite Graphs via Tree-Grafting

DOE PAGES

Azad, Ariful; Buluc, Aydn; Pothen, Alex

2016-03-24

It is difficult to obtain high performance when computing matchings on parallel processors because matching algorithms explicitly or implicitly search for paths in the graph, and when these paths become long, there is little concurrency. In spite of this limitation, we present a new algorithm and its shared-memory parallelization that achieves good performance and scalability in computing maximum cardinality matchings in bipartite graphs. This algorithm searches for augmenting paths via specialized breadth-first searches (BFS) from multiple source vertices, hence creating more parallelism than single source algorithms. Algorithms that employ multiple-source searches cannot discard a search tree once no augmenting pathmore » is discovered from the tree, unlike algorithms that rely on single-source searches. We describe a novel tree-grafting method that eliminates most of the redundant edge traversals resulting from this property of multiple-source searches. We also employ the recent direction-optimizing BFS algorithm as a subroutine to discover augmenting paths faster. Our algorithm compares favorably with the current best algorithms in terms of the number of edges traversed, the average augmenting path length, and the number of iterations. Here, we provide a proof of correctness for our algorithm. Our NUMA-aware implementation is scalable to 80 threads of an Intel multiprocessor and to 240 threads on an Intel Knights Corner coprocessor. On average, our parallel algorithm runs an order of magnitude faster than the fastest algorithms available. The performance improvement is more significant on graphs with small matching number.« less
Constructing high-quality bounding volume hierarchies for N-body computation using the acceptance volume heuristic

NASA Astrophysics Data System (ADS)

Olsson, O.

2018-01-01

We present a novel heuristic derived from a probabilistic cost model for approximate N-body simulations. We show that this new heuristic can be used to guide tree construction towards higher quality trees with improved performance over current N-body codes. This represents an important step beyond the current practice of using spatial partitioning for N-body simulations, and enables adoption of a range of state-of-the-art algorithms developed for computer graphics applications to yield further improvements in N-body simulation performance. We outline directions for further developments and review the most promising such algorithms.
Implementation of Tree and Butterfly Barriers with Optimistic Time Management Algorithms for Discrete Event Simulation

NASA Astrophysics Data System (ADS)

Rizvi, Syed S.; Shah, Dipali; Riasat, Aasia

The Time Wrap algorithm [3] offers a run time recovery mechanism that deals with the causality errors. These run time recovery mechanisms consists of rollback, anti-message, and Global Virtual Time (GVT) techniques. For rollback, there is a need to compute GVT which is used in discrete-event simulation to reclaim the memory, commit the output, detect the termination, and handle the errors. However, the computation of GVT requires dealing with transient message problem and the simultaneous reporting problem. These problems can be dealt in an efficient manner by the Samadi's algorithm [8] which works fine in the presence of causality errors. However, the performance of both Time Wrap and Samadi's algorithms depends on the latency involve in GVT computation. Both algorithms give poor latency for large simulation systems especially in the presence of causality errors. To improve the latency and reduce the processor ideal time, we implement tree and butterflies barriers with the optimistic algorithm. Our analysis shows that the use of synchronous barriers such as tree and butterfly with the optimistic algorithm not only minimizes the GVT latency but also minimizes the processor idle time.
Scalable Cloning on Large-Scale GPU Platforms with Application to Time-Stepped Simulations on Grids

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yoginath, Srikanth B.; Perumalla, Kalyan S.

Cloning is a technique to efficiently simulate a tree of multiple what-if scenarios that are unraveled during the course of a base simulation. However, cloned execution is highly challenging to realize on large, distributed memory computing platforms, due to the dynamic nature of the computational load across clones, and due to the complex dependencies spanning the clone tree. In this paper, we present the conceptual simulation framework, algorithmic foundations, and runtime interface of CloneX, a new system we designed for scalable simulation cloning. It efficiently and dynamically creates whole logical copies of a dynamic tree of simulations across a largemore » parallel system without full physical duplication of computation and memory. The performance of a prototype implementation executed on up to 1,024 graphical processing units of a supercomputing system has been evaluated with three benchmarks—heat diffusion, forest fire, and disease propagation models—delivering a speed up of over two orders of magnitude compared to replicated runs. Finally, the results demonstrate a significantly faster and scalable way to execute many what-if scenario ensembles of large simulations via cloning using the CloneX interface.« less
Methods in Symbolic Computation and p-Adic Valuations of Polynomials

NASA Astrophysics Data System (ADS)

Guan, Xiao

Symbolic computation has widely appear in many mathematical fields such as combinatorics, number theory and stochastic processes. The techniques created in the area of experimental mathematics provide us efficient ways of symbolic computing and verification of complicated relations. Part I consists of three problems. The first one focuses on a unimodal sequence derived from a quartic integral. Many of its properties are explored with the help of hypergeometric representations and automatic proofs. The second problem tackles the generating function of the reciprocal of Catalan number. It springs from the closed form given by Mathematica. Furthermore, three methods in special functions are used to justify this result. The third issue addresses the closed form solutions for the moments of products of generalized elliptic integrals , which combines the experimental mathematics and classical analysis. Part II concentrates on the p-adic valuations of polynomials from the perspective of trees. For a given polynomial f( n) indexed in positive integers, the package developed in Mathematica will create certain tree structure following a couple of rules. The evolution of such trees are studied both rigorously and experimentally from the view of field extension, nonparametric statistics and random matrix.
MIP models for connected facility location: A theoretical and computational study☆

PubMed Central

Gollowitzer, Stefan; Ljubić, Ivana

2011-01-01

This article comprises the first theoretical and computational study on mixed integer programming (MIP) models for the connected facility location problem (ConFL). ConFL combines facility location and Steiner trees: given a set of customers, a set of potential facility locations and some inter-connection nodes, ConFL searches for the minimum-cost way of assigning each customer to exactly one open facility, and connecting the open facilities via a Steiner tree. The costs needed for building the Steiner tree, facility opening costs and the assignment costs need to be minimized. We model ConFL using seven compact and three mixed integer programming formulations of exponential size. We also show how to transform ConFL into the Steiner arborescence problem. A full hierarchy between the models is provided. For two exponential size models we develop a branch-and-cut algorithm. An extensive computational study is based on two benchmark sets of randomly generated instances with up to 1300 nodes and 115,000 edges. We empirically compare the presented models with respect to the quality of obtained bounds and the corresponding running time. We report optimal values for all but 16 instances for which the obtained gaps are below 0.6%. PMID:25009366
Scalable Cloning on Large-Scale GPU Platforms with Application to Time-Stepped Simulations on Grids

DOE PAGES

Yoginath, Srikanth B.; Perumalla, Kalyan S.

2018-01-31

Cloning is a technique to efficiently simulate a tree of multiple what-if scenarios that are unraveled during the course of a base simulation. However, cloned execution is highly challenging to realize on large, distributed memory computing platforms, due to the dynamic nature of the computational load across clones, and due to the complex dependencies spanning the clone tree. In this paper, we present the conceptual simulation framework, algorithmic foundations, and runtime interface of CloneX, a new system we designed for scalable simulation cloning. It efficiently and dynamically creates whole logical copies of a dynamic tree of simulations across a largemore » parallel system without full physical duplication of computation and memory. The performance of a prototype implementation executed on up to 1,024 graphical processing units of a supercomputing system has been evaluated with three benchmarks—heat diffusion, forest fire, and disease propagation models—delivering a speed up of over two orders of magnitude compared to replicated runs. Finally, the results demonstrate a significantly faster and scalable way to execute many what-if scenario ensembles of large simulations via cloning using the CloneX interface.« less
Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database.

PubMed

Winsor, Geoffrey L; Griffiths, Emma J; Lo, Raymond; Dhillon, Bhavjinder K; Shay, Julie A; Brinkman, Fiona S L

2016-01-04

The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
First Order Fire Effects Model: FOFEM 4.0, user's guide

Treesearch

Elizabeth D. Reinhardt; Robert E. Keane; James K. Brown

1997-01-01

A First Order Fire Effects Model (FOFEM) was developed to predict the direct consequences of prescribed fire and wildfire. FOFEM computes duff and woody fuel consumption, smoke production, and fire-caused tree mortality for most forest and rangeland types in the United States. The model is available as a computer program for PC or Data General computer.
The Proposal of a Evolutionary Strategy Generating the Data Structures Based on a Horizontal Tree for the Tests

NASA Astrophysics Data System (ADS)

Żukowicz, Marek; Markiewicz, Michał

2016-09-01

The aim of the article is to present a mathematical definition of the object model, that is known in computer science as TreeList and to show application of this model for design evolutionary algorithm, that purpose is to generate structures based on this object. The first chapter introduces the reader to the problem of presenting data using the TreeList object. The second chapter describes the problem of testing data structures based on TreeList. The third one shows a mathematical model of the object TreeList and the parameters, used in determining the utility of structures created through this model and in evolutionary strategy, that generates these structures for testing purposes. The last chapter provides a brief summary and plans for future research related to the algorithm presented in the article.
The role of stand history in assessing forest impacts

USGS Publications Warehouse

Dale, V.H.; Doyle, T.W.

1987-01-01

Air pollution, harvesting practices, and natural disturbances can affect the growth of trees and forest development. To make predictions about anthropogenic impacts on forests, we need to understand how these factors affect tree growth. In this study the effect of disturbance history on tree growth and stand structure was examined by using a computer model of forest development. The model was run under the climatic conditions of east Tennessee, USA, and the results compared to stand structure and tree growth data from a yellow poplar-white oak forest. Basal area growth and forest biomass were more accurately projected when rough approximations of the thinning and fire history typical of the measured plots were included in the simulation model. Stand history can influence tree growth rates and forest structure and should be included in any attempt to assess forest impacts.
A Study of Building a Resource-Based Learning Environment with the Inquiry Learning Approach: Knowledge of Family Trees

ERIC Educational Resources Information Center

Kong, Siu Cheung; So, Wing Mui Winnie

2008-01-01

This study aims to provide teachers with ways and means to facilitate learners to develop nomenclature knowledge of family trees through the establishment of resource-based learning environments (RBLEs). It discusses the design of an RBLE in the classroom by selecting an appropriate context with the assistance of computer-mediated learning…
Interference evaluation between manifold and wet Christmas tree CP systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brasil, S.L.D.C.; Baptista, W.

2000-05-01

Offshore production wells are controlled by valves installed in the marine soil, called wet Christmas trees (WCTs). A manifold receives the production of several wells and transports it to the platform. The manifold is cathodically protected by Al anodes and the WCT by Zn anodes. A computer simulation was carried out to evaluate the interference between the equipment cathodic protection systems.

Indications of vigor loss after fire in Caribbean pine (Pinus caribaea) from electrical resistance measurements

Treesearch

T.E. Paysen; A.L. Koonce; E. Taylor; M.O. Rodriquez

2006-01-01

In May 1993, electrical resistance measurements were performed on trees in burned and unburned stands of Caribbean pine (Pinus caribaea Mor.) in north-eastern Nicaragua to determine whether tree vigor was affected by fire. An Osmose model OZ-67 Shigometer with digital readout was used to collect the sample electrical resistance data. Computer-...
Urban Crowns: crown analysis software to assist in quantifying urban tree benefits

Treesearch

Matthew F. Winn; Sang-Mook Lee Bradley; Philip A. Araman

2010-01-01

UrbanCrowns is a MicrosoftÂ® WindowsÂ®-based computer program developed by the U.S. Forest Service Southern Research Station. The software assists urban forestry professionals, arborists, and community volunteers in assessing and monitoring the crown characteristics of urban trees (both deciduous and coniferous) using a single side-view digital photograph. Program output...
Effects of photographic distance on tree crown atributes calculated using urbancrowns image analysis software

Treesearch

Mason F. Patterson; P. Eric Wiseman; Matthew F. Winn; Sang-mook Lee; Philip A. Araman

2011-01-01

UrbanCrowns is a software program developed by the USDA Forest Service that computes crown attributes using a side-view digital photograph and a few basic field measurements. From an operational standpoint, it is not known how well the software performs under varying photographic conditions for trees of diverse size, which could impact measurement reproducibility and...
Sequence comparison alignment-free approach based on suffix tree and L-words frequency.

PubMed

Soares, Inês; Goios, Ana; Amorim, António

2012-01-01

The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts by a computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words with a preset length L-L-words--in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures to the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python language and is freely available on the web.
A short note on the use of the red-black tree in Cartesian adaptive mesh refinement algorithms

NASA Astrophysics Data System (ADS)

Hasbestan, Jaber J.; Senocak, Inanc

2017-12-01

Mesh adaptivity is an indispensable capability to tackle multiphysics problems with large disparity in time and length scales. With the availability of powerful supercomputers, there is a pressing need to extend time-proven computational techniques to extreme-scale problems. Cartesian adaptive mesh refinement (AMR) is one such method that enables simulation of multiscale, multiphysics problems. AMR is based on construction of octrees. Originally, an explicit tree data structure was used to generate and manipulate an adaptive Cartesian mesh. At least eight pointers are required in an explicit approach to construct an octree. Parent-child relationships are then used to traverse the tree. An explicit octree, however, is expensive in terms of memory usage and the time it takes to traverse the tree to access a specific node. For these reasons, implicit pointerless methods have been pioneered within the computer graphics community, motivated by applications requiring interactivity and realistic three dimensional visualization. Lewiner et al. [1] provides a concise review of pointerless approaches to generate an octree. Use of a hash table and Z-order curve are two key concepts in pointerless methods that we briefly discuss next.
On joint subtree distributions under two evolutionary models.

PubMed

Wu, Taoyang; Choi, Kwok Pui

2016-04-01

In population and evolutionary biology, hypotheses about micro-evolutionary and macro-evolutionary processes are commonly tested by comparing the shape indices of empirical evolutionary trees with those predicted by neutral models. A key ingredient in this approach is the ability to compute and quantify distributions of various tree shape indices under random models of interest. As a step to meet this challenge, in this paper we investigate the joint distribution of cherries and pitchforks (that is, subtrees with two and three leaves) under two widely used null models: the Yule-Harding-Kingman (YHK) model and the proportional to distinguishable arrangements (PDA) model. Based on two novel recursive formulae, we propose a dynamic approach to numerically compute the exact joint distribution (and hence the marginal distributions) for trees of any size. We also obtained insights into the statistical properties of trees generated under these two models, including a constant correlation between the cherry and the pitchfork distributions under the YHK model, and the log-concavity and unimodality of the cherry distributions under both models. In addition, we show that there exists a unique change point for the cherry distributions between these two models. Copyright © 2015 Elsevier Inc. All rights reserved.
Fault trees for decision making in systems analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lambert, Howard E.

1975-10-09

The application of fault tree analysis (FTA) to system safety and reliability is presented within the framework of system safety analysis. The concepts and techniques involved in manual and automated fault tree construction are described and their differences noted. The theory of mathematical reliability pertinent to FTA is presented with emphasis on engineering applications. An outline of the quantitative reliability techniques of the Reactor Safety Study is given. Concepts of probabilistic importance are presented within the fault tree framework and applied to the areas of system design, diagnosis and simulation. The computer code IMPORTANCE ranks basic events and cut setsmore » according to a sensitivity analysis. A useful feature of the IMPORTANCE code is that it can accept relative failure data as input. The output of the IMPORTANCE code can assist an analyst in finding weaknesses in system design and operation, suggest the most optimal course of system upgrade, and determine the optimal location of sensors within a system. A general simulation model of system failure in terms of fault tree logic is described. The model is intended for efficient diagnosis of the causes of system failure in the event of a system breakdown. It can also be used to assist an operator in making decisions under a time constraint regarding the future course of operations. The model is well suited for computer implementation. New results incorporated in the simulation model include an algorithm to generate repair checklists on the basis of fault tree logic and a one-step-ahead optimization procedure that minimizes the expected time to diagnose system failure.« less
Tree-Structured Digital Organisms Model

NASA Astrophysics Data System (ADS)

Suzuki, Teruhiko; Nobesawa, Shiho; Tahara, Ikuo

Tierra and Avida are well-known models of digital organisms. They describe a life process as a sequence of computation codes. A linear sequence model may not be the only way to describe a digital organism, though it is very simple for a computer-based model. Thus we propose a new digital organism model based on a tree structure, which is rather similar to the generic programming. With our model, a life process is a combination of various functions, as if life in the real world is. This implies that our model can easily describe the hierarchical structure of life, and it can simulate evolutionary computation through mutual interaction of functions. We verified our model by simulations that our model can be regarded as a digital organism model according to its definitions. Our model even succeeded in creating species such as viruses and parasites.
Parallel fast multipole boundary element method applied to computational homogenization

NASA Astrophysics Data System (ADS)

Ptaszny, Jacek

2018-01-01

In the present work, a fast multipole boundary element method (FMBEM) and a parallel computer code for 3D elasticity problem is developed and applied to the computational homogenization of a solid containing spherical voids. The system of equation is solved by using the GMRES iterative solver. The boundary of the body is dicretized by using the quadrilateral serendipity elements with an adaptive numerical integration. Operations related to a single GMRES iteration, performed by traversing the corresponding tree structure upwards and downwards, are parallelized by using the OpenMP standard. The assignment of tasks to threads is based on the assumption that the tree nodes at which the moment transformations are initialized can be partitioned into disjoint sets of equal or approximately equal size and assigned to the threads. The achieved speedup as a function of number of threads is examined.
Species tree inference by minimizing deep coalescences.

PubMed

Than, Cuong; Nakhleh, Luay

2009-09-01

In a 1997 seminal paper, W. Maddison proposed minimizing deep coalescences, or MDC, as an optimization criterion for inferring the species tree from a set of incongruent gene trees, assuming the incongruence is exclusively due to lineage sorting. In a subsequent paper, Maddison and Knowles provided and implemented a search heuristic for optimizing the MDC criterion, given a set of gene trees. However, the heuristic is not guaranteed to compute optimal solutions, and its hill-climbing search makes it slow in practice. In this paper, we provide two exact solutions to the problem of inferring the species tree from a set of gene trees under the MDC criterion. In other words, our solutions are guaranteed to find the tree that minimizes the total number of deep coalescences from a set of gene trees. One solution is based on a novel integer linear programming (ILP) formulation, and another is based on a simple dynamic programming (DP) approach. Powerful ILP solvers, such as CPLEX, make the first solution appealing, particularly for very large-scale instances of the problem, whereas the DP-based solution eliminates dependence on proprietary tools, and its simplicity makes it easy to integrate with other genomic events that may cause gene tree incongruence. Using the exact solutions, we analyze a data set of 106 loci from eight yeast species, a data set of 268 loci from eight Apicomplexan species, and several simulated data sets. We show that the MDC criterion provides very accurate estimates of the species tree topologies, and that our solutions are very fast, thus allowing for the accurate analysis of genome-scale data sets. Further, the efficiency of the solutions allow for quick exploration of sub-optimal solutions, which is important for a parsimony-based criterion such as MDC, as we show. We show that searching for the species tree in the compatibility graph of the clusters induced by the gene trees may be sufficient in practice, a finding that helps ameliorate the computational requirements of optimization solutions. Further, we study the statistical consistency and convergence rate of the MDC criterion, as well as its optimality in inferring the species tree. Finally, we show how our solutions can be used to identify potential horizontal gene transfer events that may have caused some of the incongruence in the data, thus augmenting Maddison's original framework. We have implemented our solutions in the PhyloNet software package, which is freely available at: http://bioinfo.cs.rice.edu/phylonet.
Study of traffic-related pollutant removal from street canyon with trees: dispersion and deposition perspective.

PubMed

Morakinyo, Tobi Eniolu; Lam, Yun Fat

2016-11-01

Numerical experiments involving street canyons of varying aspect ratio with traffic-induced pollutants (PM 2.5 ) and implanted trees of varying aspect ratio, leaf area index, leaf area density distribution, trunk height, tree-covered area, and tree planting pattern under different wind conditions were conducted using a computational fluid dynamics (CFD) model, ENVI-met. Various aspects of dispersion and deposition were investigated, which include the influence of various tree configurations and wind condition on dispersion within the street canyon, pollutant mass at the free stream layer and street canyon, and comparison between mass removal by surface (leaf) deposition and mass enhancement due to the presence of trees. Results revealed that concentration level was enhanced especially within pedestrian level in street canyons with trees relative to their tree-free counterparts. Additionally, we found a dependence of the magnitude of concentration increase (within pedestrian level) and decrease (above pedestrian level) due to tree configuration and wind condition. Furthermore, we realized that only ∼0.1-3 % of PM 2.5 was dispersed to the free stream layer while a larger percentage (∼97 %) remained in the canyon, regardless of its aspect ratio, prevailing wind condition, and either tree-free or with tree (of various configuration). Lastly, results indicate that pollutant removal due to deposition on leaf surfaces is potentially sufficient to counterbalance the enhancement of PM 2.5 by such trees under some tree planting scenarios and wind conditions.
Using traveling salesman problem algorithms for evolutionary tree construction.

PubMed

Korostensky, C; Gonnet, G H

2000-07-01

The construction of evolutionary trees is one of the major problems in computational biology, mainly due to its complexity. We present a new tree construction method that constructs a tree with minimum score for a given set of sequences, where the score is the amount of evolution measured in PAM distances. To do this, the problem of tree construction is reduced to the Traveling Salesman Problem (TSP). The input for the TSP algorithm are the pairwise distances of the sequences and the output is a circular tour through the optimal, unknown tree plus the minimum score of the tree. The circular order and the score can be used to construct the topology of the optimal tree. Our method can be used for any scoring function that correlates to the amount of changes along the branches of an evolutionary tree, for instance it could also be used for parsimony scores, but it cannot be used for least squares fit of distances. A TSP solution reduces the space of all possible trees to 2n. Using this order, we can guarantee that we reconstruct a correct evolutionary tree if the absolute value of the error for each distance measurement is smaller than f2.gif" BORDER="0">, where f3.gif" BORDER="0">is the length of the shortest edge in the tree. For data sets with large errors, a dynamic programming approach is used to reconstruct the tree. Finally simulations and experiments with real data are shown.
Knowledge Based Synthesis of Efficient Structures for Concurrent Computation Using Fat-Trees and Pipelining.

DTIC Science & Technology

1986-12-31

synthesize synchronization skeletons"Science of Computer Programming 2, 1982, pp. 241-266 [Gel85] Gelernter, David, "Generative communication in...effective computation based on given primitives . An architecture is an abstract object-type, whose instances are computing systems. By a parallel computing...explaining the language primitives on this basis. We explain how such a basis can be "simpler" than a general-purpose manual-programming language such as
Activity: Computer Talk.

ERIC Educational Resources Information Center

Clearing: Nature and Learning in the Pacific Northwest, 1985

1985-01-01

Presents an activity in which students create a computer program capable of recording and projecting paper use at school. Includes instructional strategies and background information such as requirements for pounds of paper/tree, energy needs, water consumption, and paper value at the recycling center. A sample program is included. (DH)
The fast algorithm of spark in compressive sensing

NASA Astrophysics Data System (ADS)

Xie, Meihua; Yan, Fengxia

2017-01-01

Compressed Sensing (CS) is an advanced theory on signal sampling and reconstruction. In CS theory, the reconstruction condition of signal is an important theory problem, and spark is a good index to study this problem. But the computation of spark is NP hard. In this paper, we study the problem of computing spark. For some special matrixes, for example, the Gaussian random matrix and 0-1 random matrix, we obtain some conclusions. Furthermore, for Gaussian random matrix with fewer rows than columns, we prove that its spark equals to the number of its rows plus one with probability 1. For general matrix, two methods are given to compute its spark. One is the method of directly searching and the other is the method of dual-tree searching. By simulating 24 Gaussian random matrixes and 18 0-1 random matrixes, we tested the computation time of these two methods. Numerical results showed that the dual-tree searching method had higher efficiency than directly searching, especially for those matrixes which has as much as rows and columns.
Gravitational tree-code on graphics processing units: implementation in CUDA

NASA Astrophysics Data System (ADS)

Gaburov, Evghenii; Bédorf, Jeroen; Portegies Zwart, Simon

2010-05-01

We present a new very fast tree-code which runs on massively parallel Graphical Processing Units (GPU) with NVIDIA CUDA architecture. The tree-construction and calculation of multipole moments is carried out on the host CPU, while the force calculation which consists of tree walks and evaluation of interaction list is carried out on the GPU. In this way we achieve a sustained performance of about 100GFLOP/s and data transfer rates of about 50GB/s. It takes about a second to compute forces on a million particles with an opening angle of θ ≈ 0.5. The code has a convenient user interface and is freely available for use. http://castle.strw.leidenuniv.nl/software/octgrav.html
Bootstrap confidence levels for phylogenetic trees.

PubMed

Efron, B; Halloran, E; Holmes, S

1996-07-09

Evolutionary trees are often estimated from DNA or RNA sequence data. How much confidence should we have in the estimated trees? In 1985, Felsenstein [Felsenstein, J. (1985) Evolution 39, 783-791] suggested the use of the bootstrap to answer this question. Felsenstein's method, which in concept is a straightforward application of the bootstrap, is widely used, but has been criticized as biased in the genetics literature. This paper concerns the use of the bootstrap in the tree problem. We show that Felsenstein's method is not biased, but that it can be corrected to better agree with standard ideas of confidence levels and hypothesis testing. These corrections can be made by using the more elaborate bootstrap method presented here, at the expense of considerably more computation.
Image compression using quad-tree coding with morphological dilation

NASA Astrophysics Data System (ADS)

Wu, Jiaji; Jiang, Weiwei; Jiao, Licheng; Wang, Lei

2007-11-01

In this paper, we propose a new algorithm which integrates morphological dilation operation to quad-tree coding, the purpose of doing this is to compensate each other's drawback by using quad-tree coding and morphological dilation operation respectively. New algorithm can not only quickly find the seed significant coefficient of dilation but also break the limit of block boundary of quad-tree coding. We also make a full use of both within-subband and cross-subband correlation to avoid the expensive cost of representing insignificant coefficients. Experimental results show that our algorithm outperforms SPECK and SPIHT. Without using any arithmetic coding, our algorithm can achieve good performance with low computational cost and it's more suitable to mobile devices or scenarios with a strict real-time requirement.
EEG feature selection method based on decision tree.

PubMed

Duan, Lijuan; Ge, Hui; Ma, Wei; Miao, Jun

2015-01-01

This paper aims to solve automated feature selection problem in brain computer interface (BCI). In order to automate feature selection process, we proposed a novel EEG feature selection method based on decision tree (DT). During the electroencephalogram (EEG) signal processing, a feature extraction method based on principle component analysis (PCA) was used, and the selection process based on decision tree was performed by searching the feature space and automatically selecting optimal features. Considering that EEG signals are a series of non-linear signals, a generalized linear classifier named support vector machine (SVM) was chosen. In order to test the validity of the proposed method, we applied the EEG feature selection method based on decision tree to BCI Competition II datasets Ia, and the experiment showed encouraging results.
Multitask visual learning using genetic programming.

PubMed

Jaśkowski, Wojciech; Krawiec, Krzysztof; Wieloch, Bartosz

2008-01-01

We propose a multitask learning method of visual concepts within the genetic programming (GP) framework. Each GP individual is composed of several trees that process visual primitives derived from input images. Two trees solve two different visual tasks and are allowed to share knowledge with each other by commonly calling the remaining GP trees (subfunctions) included in the same individual. The performance of a particular tree is measured by its ability to reproduce the shapes contained in the training images. We apply this method to visual learning tasks of recognizing simple shapes and compare it to a reference method. The experimental verification demonstrates that such multitask learning often leads to performance improvements in one or both solved tasks, without extra computational effort.

Reconstructing the spatial pattern of trees from routine stand examination measurements

USGS Publications Warehouse

Hanus, M.L.; Hann, D.W.; Marshall, D.D.

1998-01-01

Reconstruction of the spatial pattern of trees is important for the accurate visual display of unmapped stands. The proposed process for generating the spatial pattern is a nonsimple sequential inhibition process, with the inhibition zone proportionate to the scaled maximum crown width of an open-grown tree of the same species and same diameter at breast height as the subject tree. The results of this coordinate generation procedure are compared with mapped stem data from nine natural stands of Douglas-fir at two ages by the use of a transformed Ripley's K(d) function. The results of this comparison indicate that the proposed method, based on complete tree lists, successfully replicated the spatial patterns of the trees in all nine stands at both ages and over the range of distances examined. On the basis of these findings and the procedure's ability to model effects through time, the nonsimple sequential inhibition process has been chosen to generate tree coordinates in the VIZ4ST computer program for displaying forest stand structure in naturally regenerated young Douglas-fir stands. For. Sci.
Mixed-up trees: the structure of phylogenetic mixtures.

PubMed

Matsen, Frederick A; Mossel, Elchanan; Steel, Mike

2008-05-01

In this paper, we apply new geometric and combinatorial methods to the study of phylogenetic mixtures. The focus of the geometric approach is to describe the geometry of phylogenetic mixture distributions for the two state random cluster model, which is a generalization of the two state symmetric (CFN) model. In particular, we show that the set of mixture distributions forms a convex polytope and we calculate its dimension; corollaries include a simple criterion for when a mixture of branch lengths on the star tree can mimic the site pattern frequency vector of a resolved quartet tree. Furthermore, by computing volumes of polytopes we can clarify how "common" non-identifiable mixtures are under the CFN model. We also present a new combinatorial result which extends any identifiability result for a specific pair of trees of size six to arbitrary pairs of trees. Next we present a positive result showing identifiability of rates-across-sites models. Finally, we answer a question raised in a previous paper concerning "mixed branch repulsion" on trees larger than quartet trees under the CFN model.
treeman: an R package for efficient and intuitive manipulation of phylogenetic trees.

PubMed

Bennett, Dominic J; Sutton, Mark D; Turvey, Samuel T

2017-01-07

Phylogenetic trees are hierarchical structures used for representing the inter-relationships between biological entities. They are the most common tool for representing evolution and are essential to a range of fields across the life sciences. The manipulation of phylogenetic trees-in terms of adding or removing tips-is often performed by researchers not just for reasons of management but also for performing simulations in order to understand the processes of evolution. Despite this, the most common programming language among biologists, R, has few class structures well suited to these tasks. We present an R package that contains a new class, called TreeMan, for representing the phylogenetic tree. This class has a list structure allowing phylogenetic trees to be manipulated more efficiently. Computational running times are reduced because of the ready ability to vectorise and parallelise methods. Development is also improved due to fewer lines of code being required for performing manipulation processes. We present three use cases-pinning missing taxa to a supertree, simulating evolution with a tree-growth model and detecting significant phylogenetic turnover-that demonstrate the new package's speed and simplicity.
Predictive models for radial sap flux variation in coniferous, diffuse-porous and ring-porous temperate trees.

PubMed

Berdanier, Aaron B; Miniat, Chelcy F; Clark, James S

2016-08-01

Accurately scaling sap flux observations to tree or stand levels requires accounting for variation in sap flux between wood types and by depth into the tree. However, existing models for radial variation in axial sap flux are rarely used because they are difficult to implement, there is uncertainty about their predictive ability and calibration measurements are often unavailable. Here we compare different models with a diverse sap flux data set to test the hypotheses that radial profiles differ by wood type and tree size. We show that radial variation in sap flux is dependent on wood type but independent of tree size for a range of temperate trees. The best-fitting model predicted out-of-sample sap flux observations and independent estimates of sapwood area with small errors, suggesting robustness in the new settings. We develop a method for predicting whole-tree water use with this model and include computer code for simple implementation in other studies. Published by Oxford University Press 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.
PoMo: An Allele Frequency-Based Approach for Species Tree Estimation

PubMed Central

De Maio, Nicola; Schrempf, Dominik; Kosiol, Carolin

2015-01-01

Incomplete lineage sorting can cause incongruencies of the overall species-level phylogenetic tree with the phylogenetic trees for individual genes or genomic segments. If these incongruencies are not accounted for, it is possible to incur several biases in species tree estimation. Here, we present a simple maximum likelihood approach that accounts for ancestral variation and incomplete lineage sorting. We use a POlymorphisms-aware phylogenetic MOdel (PoMo) that we have recently shown to efficiently estimate mutation rates and fixation biases from within and between-species variation data. We extend this model to perform efficient estimation of species trees. We test the performance of PoMo in several different scenarios of incomplete lineage sorting using simulations and compare it with existing methods both in accuracy and computational speed. In contrast to other approaches, our model does not use coalescent theory but is allele frequency based. We show that PoMo is well suited for genome-wide species tree estimation and that on such data it is more accurate than previous approaches. PMID:26209413
Topological Cacti: Visualizing Contour-based Statistics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Weber, Gunther H.; Bremer, Peer-Timo; Pascucci, Valerio

2011-05-26

Contours, the connected components of level sets, play an important role in understanding the global structure of a scalar field. In particular their nestingbehavior and topology-often represented in form of a contour tree-have been used extensively for visualization and analysis. However, traditional contour trees onlyencode structural properties like number of contours or the nesting of contours, but little quantitative information such as volume or other statistics. Here we use thesegmentation implied by a contour tree to compute a large number of per-contour (interval) based statistics of both the function defining the contour tree as well asother co-located functions. We introducemore » a new visual metaphor for contour trees, called topological cacti, that extends the traditional toporrery display of acontour tree to display additional quantitative information as width of the cactus trunk and length of its spikes. We apply the new technique to scalar fields ofvarying dimension and different measures to demonstrate the effectiveness of the approach.« less
Investigating how students communicate tree-thinking

NASA Astrophysics Data System (ADS)

Boyce, Carrie Jo

Learning is often an active endeavor that requires students work at building conceptual understandings of complex topics. Personal experiences, ideas, and communication all play large roles in developing knowledge of and understanding complex topics. Sometimes these experiences can promote formation of scientifically inaccurate or incomplete ideas. Representations are tools used to help individuals understand complex topics. In biology, one way that educators help people understand evolutionary histories of organisms is by using representations called phylogenetic trees. In order to understand phylogenetics trees, individuals need to understand the conventions associated with phylogenies. My dissertation, supported by the Tree-Thinking Representational Competence and Word Association frameworks, is a mixed-methods study investigating the changes in students' tree-reading, representational competence and mental association of phylogenetic terminology after participation in varied instruction. Participants included 128 introductory biology majors from a mid-sized southern research university. Participants were enrolled in either Introductory Biology I, where they were not taught phylogenetics, or Introductory Biology II, where they were explicitly taught phylogenetics. I collected data using a pre- and post-assessment consisting of a word association task and tree-thinking diagnostic (n=128). Additionally, I recruited a subset of students from both courses (n=37) to complete a computer simulation designed to teach students about phylogenetic trees. I then conducted semi-structured interviews consisting of a word association exercise with card sort task, a retrospective pre-assessment discussion, a post-assessment discussion, and interview questions. I found that students who received explicit lecture instruction had a significantly higher increase in scores on a tree-thinking diagnostic than students who did not receive lecture instruction. Students who received both explicit lecture instruction and the computer simulation had a higher level of representational competence and were better able to understand abstract-style phylogenetic trees than students who only completed the simulation. Students who received explicit lecture instruction had a slightly more scientific association of phylogenetic terms than students who received did not receive lecture instruction. My findings suggest that technological instruction alone is not as beneficial as lecture instruction.
On the distribution of interspecies correlation for Markov models of character evolution on Yule trees.

PubMed

Mulder, Willem H; Crawford, Forrest W

2015-01-07

Efforts to reconstruct phylogenetic trees and understand evolutionary processes depend fundamentally on stochastic models of speciation and mutation. The simplest continuous-time model for speciation in phylogenetic trees is the Yule process, in which new species are "born" from existing lineages at a constant rate. Recent work has illuminated some of the structural properties of Yule trees, but it remains mostly unknown how these properties affect sequence and trait patterns observed at the tips of the phylogenetic tree. Understanding the interplay between speciation and mutation under simple models of evolution is essential for deriving valid phylogenetic inference methods and gives insight into the optimal design of phylogenetic studies. In this work, we derive the probability distribution of interspecies covariance under Brownian motion and Ornstein-Uhlenbeck models of phenotypic change on a Yule tree. We compute the probability distribution of the number of mutations shared between two randomly chosen taxa in a Yule tree under discrete Markov mutation models. Our results suggest summary measures of phylogenetic information content, illuminate the correlation between site patterns in sequences or traits of related organisms, and provide heuristics for experimental design and reconstruction of phylogenetic trees. Copyright © 2014 Elsevier Ltd. All rights reserved.
Pruning Rogue Taxa Improves Phylogenetic Accuracy: An Efficient Algorithm and Webservice

PubMed Central

Aberer, Andre J.; Krompass, Denis; Stamatakis, Alexandros

2013-01-01

Abstract The presence of rogue taxa (rogues) in a set of trees can frequently have a negative impact on the results of a bootstrap analysis (e.g., the overall support in consensus trees). We introduce an efficient graph-based algorithm for rogue taxon identification as well as an interactive webservice implementing this algorithm. Compared with our previous method, the new algorithm is up to 4 orders of magnitude faster, while returning qualitatively identical results. Because of this significant improvement in scalability, the new algorithm can now identify substantially more complex and compute-intensive rogue taxon constellations. On a large and diverse collection of real-world data sets, we show that our method yields better supported reduced/pruned consensus trees than any competing rogue taxon identification method. Using the parallel version of our open-source code, we successfully identified rogue taxa in a set of 100 trees with 116 334 taxa each. For simulated data sets, we show that when removing/pruning rogue taxa with our method from a tree set, we consistently obtain bootstrap consensus trees as well as maximum-likelihood trees that are topologically closer to the respective true trees. PMID:22962004
Pruning rogue taxa improves phylogenetic accuracy: an efficient algorithm and webservice.

PubMed

Aberer, Andre J; Krompass, Denis; Stamatakis, Alexandros

2013-01-01

The presence of rogue taxa (rogues) in a set of trees can frequently have a negative impact on the results of a bootstrap analysis (e.g., the overall support in consensus trees). We introduce an efficient graph-based algorithm for rogue taxon identification as well as an interactive webservice implementing this algorithm. Compared with our previous method, the new algorithm is up to 4 orders of magnitude faster, while returning qualitatively identical results. Because of this significant improvement in scalability, the new algorithm can now identify substantially more complex and compute-intensive rogue taxon constellations. On a large and diverse collection of real-world data sets, we show that our method yields better supported reduced/pruned consensus trees than any competing rogue taxon identification method. Using the parallel version of our open-source code, we successfully identified rogue taxa in a set of 100 trees with 116 334 taxa each. For simulated data sets, we show that when removing/pruning rogue taxa with our method from a tree set, we consistently obtain bootstrap consensus trees as well as maximum-likelihood trees that are topologically closer to the respective true trees.
Fast Dating Using Least-Squares Criteria and Algorithms.

PubMed

To, Thu-Hien; Jung, Matthieu; Lycett, Samantha; Gascuel, Olivier

2016-01-01

Phylogenies provide a useful way to understand the evolutionary history of genetic samples, and data sets with more than a thousand taxa are becoming increasingly common, notably with viruses (e.g., human immunodeficiency virus (HIV)). Dating ancestral events is one of the first, essential goals with such data. However, current sophisticated probabilistic approaches struggle to handle data sets of this size. Here, we present very fast dating algorithms, based on a Gaussian model closely related to the Langley-Fitch molecular-clock model. We show that this model is robust to uncorrelated violations of the molecular clock. Our algorithms apply to serial data, where the tips of the tree have been sampled through times. They estimate the substitution rate and the dates of all ancestral nodes. When the input tree is unrooted, they can provide an estimate for the root position, thus representing a new, practical alternative to the standard rooting methods (e.g., midpoint). Our algorithms exploit the tree (recursive) structure of the problem at hand, and the close relationships between least-squares and linear algebra. We distinguish between an unconstrained setting and the case where the temporal precedence constraint (i.e., an ancestral node must be older that its daughter nodes) is accounted for. With rooted trees, the former is solved using linear algebra in linear computing time (i.e., proportional to the number of taxa), while the resolution of the latter, constrained setting, is based on an active-set method that runs in nearly linear time. With unrooted trees the computing time becomes (nearly) quadratic (i.e., proportional to the square of the number of taxa). In all cases, very large input trees (>10,000 taxa) can easily be processed and transformed into time-scaled trees. We compare these algorithms to standard methods (root-to-tip, r8s version of Langley-Fitch method, and BEAST). Using simulated data, we show that their estimation accuracy is similar to that of the most sophisticated methods, while their computing time is much faster. We apply these algorithms on a large data set comprising 1194 strains of Influenza virus from the pdm09 H1N1 Human pandemic. Again the results show that these algorithms provide a very fast alternative with results similar to those of other computer programs. These algorithms are implemented in the LSD software (least-squares dating), which can be downloaded from http://www.atgc-montpellier.fr/LSD/, along with all our data sets and detailed results. An Online Appendix, providing additional algorithm descriptions, tables, and figures can be found in the Supplementary Material available on Dryad at http://dx.doi.org/10.5061/dryad.968t3. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
Fast Dating Using Least-Squares Criteria and Algorithms

PubMed Central

To, Thu-Hien; Jung, Matthieu; Lycett, Samantha; Gascuel, Olivier

2016-01-01

Phylogenies provide a useful way to understand the evolutionary history of genetic samples, and data sets with more than a thousand taxa are becoming increasingly common, notably with viruses (e.g., human immunodeficiency virus (HIV)). Dating ancestral events is one of the first, essential goals with such data. However, current sophisticated probabilistic approaches struggle to handle data sets of this size. Here, we present very fast dating algorithms, based on a Gaussian model closely related to the Langley–Fitch molecular-clock model. We show that this model is robust to uncorrelated violations of the molecular clock. Our algorithms apply to serial data, where the tips of the tree have been sampled through times. They estimate the substitution rate and the dates of all ancestral nodes. When the input tree is unrooted, they can provide an estimate for the root position, thus representing a new, practical alternative to the standard rooting methods (e.g., midpoint). Our algorithms exploit the tree (recursive) structure of the problem at hand, and the close relationships between least-squares and linear algebra. We distinguish between an unconstrained setting and the case where the temporal precedence constraint (i.e., an ancestral node must be older that its daughter nodes) is accounted for. With rooted trees, the former is solved using linear algebra in linear computing time (i.e., proportional to the number of taxa), while the resolution of the latter, constrained setting, is based on an active-set method that runs in nearly linear time. With unrooted trees the computing time becomes (nearly) quadratic (i.e., proportional to the square of the number of taxa). In all cases, very large input trees (>10,000 taxa) can easily be processed and transformed into time-scaled trees. We compare these algorithms to standard methods (root-to-tip, r8s version of Langley–Fitch method, and BEAST). Using simulated data, we show that their estimation accuracy is similar to that of the most sophisticated methods, while their computing time is much faster. We apply these algorithms on a large data set comprising 1194 strains of Influenza virus from the pdm09 H1N1 Human pandemic. Again the results show that these algorithms provide a very fast alternative with results similar to those of other computer programs. These algorithms are implemented in the LSD software (least-squares dating), which can be downloaded from http://www.atgc-montpellier.fr/LSD/, along with all our data sets and detailed results. An Online Appendix, providing additional algorithm descriptions, tables, and figures can be found in the Supplementary Material available on Dryad at http://dx.doi.org/10.5061/dryad.968t3. PMID:26424727
A scaling law derived from optimal dendritic wiring

PubMed Central

Cuntz, Hermann; Mathy, Alexandre; Häusser, Michael

2012-01-01

The wide diversity of dendritic trees is one of the most striking features of neural circuits. Here we develop a general quantitative theory relating the total length of dendritic wiring to the number of branch points and synapses. We show that optimal wiring predicts a 2/3 power law between these measures. We demonstrate that the theory is consistent with data from a wide variety of neurons across many different species and helps define the computational compartments in dendritic trees. Our results imply fundamentally distinct design principles for dendritic arbors compared with vascular, bronchial, and botanical trees. PMID:22715290
On the Hosoya index of a family of deterministic recursive trees

NASA Astrophysics Data System (ADS)

Chen, Xufeng; Zhang, Jingyuan; Sun, Weigang

2017-01-01

In this paper, we calculate the Hosoya index in a family of deterministic recursive trees with a special feature that includes new nodes which are connected to existing nodes with a certain rule. We then obtain a recursive solution of the Hosoya index based on the operations of a determinant. The computational complexity of our proposed algorithm is O(log2 n) with n being the network size, which is lower than that of the existing numerical methods. Finally, we give a weighted tree shrinking method as a graphical interpretation of the recurrence formula for the Hosoya index.
Development of a tree classifier for discrimination of surface mine activity from Landsat digital data

NASA Technical Reports Server (NTRS)

Solomon, J. L.; Miller, W. F.; Quattrochi, D. A.

1979-01-01

In a cooperative project with the Geological Survey of Alabama, the Mississippi State Remote Sensing Applications Program has developed a single purpose, decision-tree classifier using band-ratioing techniques to discriminate various stages of surface mining activity. The tree classifier has four levels and employs only two channels in classification at each level. An accurate computation of the amount of disturbed land resulting from the mining activity can be made as a product of the classification output. The utilization of Landsat data provides a cost-efficient, rapid, and accurate means of monitoring surface mining activities.
Full-tree utilization of southern pine and hardwoods growing on southern pine sites

Treesearch

Peter Koch

1974-01-01

in 1963, approximately 30 percent of the dry weight of above- and below-ground parts of southern pine trees ended as dry surfaced lumber or paper; the remaining 70 percent was largely unused. By 1980, computer-controlled chipping headrigs, thin-kerf saws, lamination of lumber from rotary-cut veneer, high-yield pulping processes, and more intensive use of roots, bark,...
Exploring the optimal economic timing for crop tree release treatments in hardwoods: results from simulation

Treesearch

Chris B. LeDoux; Gary W. Miller

2008-01-01

In this study we used data from 16 Appalachian hardwood stands, a growth and yield computer simulation model, and stump-to-mill logging cost-estimating software to evaluate the optimal economic timing of crop tree release (CTR) treatments. The simulated CTR treatments consisted of one-time logging operations at stand age 11, 23, 31, or 36 years, with the residual...
Whole-tree utilization of southern pine advanced by developments in mechanical conversion

Treesearch

P. Koch

1973-01-01

In 1963 approximately 30 percent of the dry weight of above- and below-ground parts of southern pine trees ended as dry-surfaced lumber or paper; the remaining 70 percent was largely unused. By 1980, computer-controlled chipping headrigs, thin-kerf saws, lamination of lumber from rotary-cut veneer, high-yield pulping processes, and more intensive use of roots, bark,...
Whole-tree utilization of southern pine advanced by developments in mechanical conversion

Treesearch

Peter Koch

1973-01-01

In 1963 approximately 30 percent of the dry weight of aboe- and below-ground parts of southern pine trees ended as dry-surfaced lumber or paper; the remaining 70 percent was largely unused. By 1980, computer-controlled chipping headrigs, think-kerf saws, lamination of lumber from rotary-cut veneer, high-yield pulping processes, and more intensive use of roots, bark,...
Random sampling of constrained phylogenies: conducting phylogenetic analyses when the phylogeny is partially known.

PubMed

Housworth, E A; Martins, E P

2001-01-01

Statistical randomization tests in evolutionary biology often require a set of random, computer-generated trees. For example, earlier studies have shown how large numbers of computer-generated trees can be used to conduct phylogenetic comparative analyses even when the phylogeny is uncertain or unknown. These methods were limited, however, in that (in the absence of molecular sequence or other data) they allowed users to assume that no phylogenetic information was available or that all possible trees were known. Intermediate situations where only a taxonomy or other limited phylogenetic information (e.g., polytomies) are available are technically more difficult. The current study describes a procedure for generating random samples of phylogenies while incorporating limited phylogenetic information (e.g., four taxa belong together in a subclade). The procedure can be used to conduct comparative analyses when the phylogeny is only partially resolved or can be used in other randomization tests in which large numbers of possible phylogenies are needed.

An improved spanning tree approach for the reliability analysis of supply chain collaborative network

NASA Astrophysics Data System (ADS)

Lam, C. Y.; Ip, W. H.

2012-11-01

A higher degree of reliability in the collaborative network can increase the competitiveness and performance of an entire supply chain. As supply chain networks grow more complex, the consequences of unreliable behaviour become increasingly severe in terms of cost, effort and time. Moreover, it is computationally difficult to calculate the network reliability of a Non-deterministic Polynomial-time hard (NP-hard) all-terminal network using state enumeration, as this may require a huge number of iterations for topology optimisation. Therefore, this paper proposes an alternative approach of an improved spanning tree for reliability analysis to help effectively evaluate and analyse the reliability of collaborative networks in supply chains and reduce the comparative computational complexity of algorithms. Set theory is employed to evaluate and model the all-terminal reliability of the improved spanning tree algorithm and present a case study of a supply chain used in lamp production to illustrate the application of the proposed approach.
Method and system for data clustering for very large databases

NASA Technical Reports Server (NTRS)

Livny, Miron (Inventor); Zhang, Tian (Inventor); Ramakrishnan, Raghu (Inventor)

1998-01-01

Multi-dimensional data contained in very large databases is efficiently and accurately clustered to determine patterns therein and extract useful information from such patterns. Conventional computer processors may be used which have limited memory capacity and conventional operating speed, allowing massive data sets to be processed in a reasonable time and with reasonable computer resources. The clustering process is organized using a clustering feature tree structure wherein each clustering feature comprises the number of data points in the cluster, the linear sum of the data points in the cluster, and the square sum of the data points in the cluster. A dense region of data points is treated collectively as a single cluster, and points in sparsely occupied regions can be treated as outliers and removed from the clustering feature tree. The clustering can be carried out continuously with new data points being received and processed, and with the clustering feature tree being restructured as necessary to accommodate the information from the newly received data points.
Augmenting computer networks

NASA Technical Reports Server (NTRS)

Bokhari, S. H.; Raza, A. D.

1984-01-01

Three methods of augmenting computer networks by adding at most one link per processor are discussed: (1) A tree of N nodes may be augmented such that the resulting graph has diameter no greater than 4log sub 2((N+2)/3)-2. Thi O(N(3)) algorithm can be applied to any spanning tree of a connected graph to reduce the diameter of that graph to O(log N); (2) Given a binary tree T and a chain C of N nodes each, C may be augmented to produce C so that T is a subgraph of C. This algorithm is O(N) and may be used to produce augmented chains or rings that have diameter no greater than 2log sub 2((N+2)/3) and are planar; (3) Any rectangular two-dimensional 4 (8) nearest neighbor array of size N = 2(k) may be augmented so that it can emulate a single step shuffle-exchange network of size N/2 in 3(t) time steps.
Fast Construction of Near Parsimonious Hybridization Networks for Multiple Phylogenetic Trees.

PubMed

Mirzaei, Sajad; Wu, Yufeng

2016-01-01

Hybridization networks represent plausible evolutionary histories of species that are affected by reticulate evolutionary processes. An established computational problem on hybridization networks is constructing the most parsimonious hybridization network such that each of the given phylogenetic trees (called gene trees) is "displayed" in the network. There have been several previous approaches, including an exact method and several heuristics, for this NP-hard problem. However, the exact method is only applicable to a limited range of data, and heuristic methods can be less accurate and also slow sometimes. In this paper, we develop a new algorithm for constructing near parsimonious networks for multiple binary gene trees. This method is more efficient for large numbers of gene trees than previous heuristics. This new method also produces more parsimonious results on many simulated datasets as well as a real biological dataset than a previous method. We also show that our method produces topologically more accurate networks for many datasets.
Comprehensive national database of tree effects on air quality and human health in the United States.

PubMed

Hirabayashi, Satoshi; Nowak, David J

2016-08-01

Trees remove air pollutants through dry deposition processes depending upon forest structure, meteorology, and air quality that vary across space and time. Employing nationally available forest, weather, air pollution and human population data for 2010, computer simulations were performed for deciduous and evergreen trees with varying leaf area index for rural and urban areas in every county in the conterminous United States. The results populated a national database of annual air pollutant removal, concentration changes, and reductions in adverse health incidences and costs for NO2, O3, PM2.5 and SO2. The developed database enabled a first order approximation of air quality and associated human health benefits provided by trees with any forest configurations anywhere in the conterminous United States over time. Comprehensive national database of tree effects on air quality and human health in the United States was developed. Copyright © 2016 Elsevier Ltd. All rights reserved.
Forest understory trees can be segmented accurately within sufficiently dense airborne laser scanning point clouds.

PubMed

Hamraz, Hamid; Contreras, Marco A; Zhang, Jun

2017-07-28

Airborne laser scanning (LiDAR) point clouds over large forested areas can be processed to segment individual trees and subsequently extract tree-level information. Existing segmentation procedures typically detect more than 90% of overstory trees, yet they barely detect 60% of understory trees because of the occlusion effect of higher canopy layers. Although understory trees provide limited financial value, they are an essential component of ecosystem functioning by offering habitat for numerous wildlife species and influencing stand development. Here we model the occlusion effect in terms of point density. We estimate the fractions of points representing different canopy layers (one overstory and multiple understory) and also pinpoint the required density for reasonable tree segmentation (where accuracy plateaus). We show that at a density of ~170 pt/m² understory trees can likely be segmented as accurately as overstory trees. Given the advancements of LiDAR sensor technology, point clouds will affordably reach this required density. Using modern computational approaches for big data, the denser point clouds can efficiently be processed to ultimately allow accurate remote quantification of forest resources. The methodology can also be adopted for other similar remote sensing or advanced imaging applications such as geological subsurface modelling or biomedical tissue analysis.
Synthesis of Efficient Structures for Concurrent Computation.

DTIC Science & Technology

1983-10-01

formal presentation of these techniques, called virtualisation and aggregation, can be found n [King-83$. 113.2 Census Functions Trees perform broadcast... Functions .. .. .. .. ... .... ... ... .... ... ... ....... 6 4 User-Assisted Aggregation .. .. .. .. ... ... ... .... ... .. .......... 6 5 Parallel...6. Simple Parallel Structure for Broadcasting .. .. .. .. .. . ... .. . .. . .... 4 Figure 7. Internal Structure of a Prefix Computation Network
Efficient algorithms for dilated mappings of binary trees

NASA Technical Reports Server (NTRS)

Iqbal, M. Ashraf

1990-01-01

The problem is addressed to find a 1-1 mapping of the vertices of a binary tree onto those of a target binary tree such that the son of a node on the first binary tree is mapped onto a descendent of the image of that node in the second binary tree. There are two natural measures of the cost of this mapping, namely the dilation cost, i.e., the maximum distance in the target binary tree between the images of vertices that are adjacent in the original tree. The other measure, expansion cost, is defined as the number of extra nodes/edges to be added to the target binary tree in order to ensure a 1-1 mapping. An efficient algorithm to find a mapping of one binary tree onto another is described. It is shown that it is possible to minimize one cost of mapping at the expense of the other. This problem arises when designing pipelined arithmetic logic units (ALU) for special purpose computers. The pipeline is composed of ALU chips connected in the form of a binary tree. The operands to the pipeline can be supplied to the leaf nodes of the binary tree which then process and pass the results up to their parents. The final result is available at the root. As each new application may require a distinct nesting of operations, it is useful to be able to find a good mapping of a new binary tree over existing ALU tree. Another problem arises if every distinct required binary tree is known beforehand. Here it is useful to hardwire the pipeline in the form of a minimal supertree that contains all required binary trees.
A generic, cost-effective, and scalable cell lineage analysis platform

PubMed Central

Biezuner, Tamir; Spiro, Adam; Raz, Ofir; Amir, Shiran; Milo, Lilach; Adar, Rivka; Chapal-Ilani, Noa; Berman, Veronika; Fried, Yael; Ainbinder, Elena; Cohen, Galit; Barr, Haim M.; Halaban, Ruth; Shapiro, Ehud

2016-01-01

Advances in single-cell genomics enable commensurate improvements in methods for uncovering lineage relations among individual cells. Current sequencing-based methods for cell lineage analysis depend on low-resolution bulk analysis or rely on extensive single-cell sequencing, which is not scalable and could be biased by functional dependencies. Here we show an integrated biochemical-computational platform for generic single-cell lineage analysis that is retrospective, cost-effective, and scalable. It consists of a biochemical-computational pipeline that inputs individual cells, produces targeted single-cell sequencing data, and uses it to generate a lineage tree of the input cells. We validated the platform by applying it to cells sampled from an ex vivo grown tree and analyzed its feasibility landscape by computer simulations. We conclude that the platform may serve as a generic tool for lineage analysis and thus pave the way toward large-scale human cell lineage discovery. PMID:27558250
Planning chemical syntheses with deep neural networks and symbolic AI

NASA Astrophysics Data System (ADS)

Segler, Marwin H. S.; Preuss, Mike; Waller, Mark P.

2018-03-01

To plan the syntheses of small organic molecules, chemists use retrosynthesis, a problem-solving technique in which target molecules are recursively transformed into increasingly simpler precursors. Computer-aided retrosynthesis would be a valuable tool but at present it is slow and provides results of unsatisfactory quality. Here we use Monte Carlo tree search and symbolic artificial intelligence (AI) to discover retrosynthetic routes. We combined Monte Carlo tree search with an expansion policy network that guides the search, and a filter network to pre-select the most promising retrosynthetic steps. These deep neural networks were trained on essentially all reactions ever published in organic chemistry. Our system solves for almost twice as many molecules, thirty times faster than the traditional computer-aided search method, which is based on extracted rules and hand-designed heuristics. In a double-blind AB test, chemists on average considered our computer-generated routes to be equivalent to reported literature routes.
Enumeration method for tree-like chemical compounds with benzene rings and naphthalene rings by breadth-first search order.

PubMed

Jindalertudomdee, Jira; Hayashida, Morihiro; Zhao, Yang; Akutsu, Tatsuya

2016-03-01

Drug discovery and design are important research fields in bioinformatics. Enumeration of chemical compounds is essential not only for the purpose, but also for analysis of chemical space and structure elucidation. In our previous study, we developed enumeration methods BfsSimEnum and BfsMulEnum for tree-like chemical compounds using a tree-structure to represent a chemical compound, which is limited to acyclic chemical compounds only. In this paper, we extend the methods, and develop BfsBenNaphEnum that can enumerate tree-like chemical compounds containing benzene rings and naphthalene rings, which include benzene isomers and naphthalene isomers such as ortho, meta, and para, by treating a benzene ring as an atom with valence six, instead of a ring of six carbon atoms, and treating a naphthalene ring as two benzene rings having a special bond. We compare our method with MOLGEN 5.0, which is a well-known general purpose structure generator, to enumerate chemical structures from a set of chemical formulas in terms of the number of enumerated structures and the computational time. The result suggests that our proposed method can reduce the computational time efficiently. We propose the enumeration method BfsBenNaphEnum for tree-like chemical compounds containing benzene rings and naphthalene rings as cyclic structures. BfsBenNaphEnum was from 50 times to 5,000,000 times faster than MOLGEN 5.0 for instances with 8 to 14 carbon atoms in our experiments.
Neural Mechanisms Underlying the Computation of Hierarchical Tree Structures in Mathematics

PubMed Central

Nakai, Tomoya; Sakai, Kuniyoshi L.

2014-01-01

Whether mathematical and linguistic processes share the same neural mechanisms has been a matter of controversy. By examining various sentence structures, we recently demonstrated that activations in the left inferior frontal gyrus (L. IFG) and left supramarginal gyrus (L. SMG) were modulated by the Degree of Merger (DoM), a measure for the complexity of tree structures. In the present study, we hypothesize that the DoM is also critical in mathematical calculations, and clarify whether the DoM in the hierarchical tree structures modulates activations in these regions. We tested an arithmetic task that involved linear and quadratic sequences with recursive computation. Using functional magnetic resonance imaging, we found significant activation in the L. IFG, L. SMG, bilateral intraparietal sulcus (IPS), and precuneus selectively among the tested conditions. We also confirmed that activations in the L. IFG and L. SMG were free from memory-related factors, and that activations in the bilateral IPS and precuneus were independent from other possible factors. Moreover, by fitting parametric models of eight factors, we found that the model of DoM in the hierarchical tree structures was the best to explain the modulation of activations in these five regions. Using dynamic causal modeling, we showed that the model with a modulatory effect for the connection from the L. IPS to the L. IFG, and with driving inputs into the L. IFG, was highly probable. The intrinsic, i.e., task-independent, connection from the L. IFG to the L. IPS, as well as that from the L. IPS to the R. IPS, would provide a feedforward signal, together with negative feedback connections. We indicate that mathematics and language share the network of the L. IFG and L. IPS/SMG for the computation of hierarchical tree structures, and that mathematics recruits the additional network of the L. IPS and R. IPS. PMID:25379713
Genome-based approaches to develop vaccines against bacterial pathogens.

PubMed

Serruto, Davide; Serino, Laura; Masignani, Vega; Pizza, Mariagrazia

2009-05-26

Bacterial infectious diseases remain the single most important threat to health worldwide. Although conventional vaccinology approaches were successful in conferring protection against several diseases, they failed to provide efficacious solutions against many others. The advent of whole-genome sequencing changed the way to think about vaccine development, enabling the targeting of possible vaccine candidates starting from the genomic information of a single bacterial isolate, with a process named reverse vaccinology. As the genomic era progressed, reverse vaccinology has evolved with a pan-genome approach and multi-strain genome analysis became fundamental for the design of universal vaccines. This review describes the applications of genome-based approaches in the development of new vaccines against bacterial pathogens.
Genome-wide high-resolution aCGH analysis of gestational choriocarcinomas.

PubMed

Poaty, Henriette; Coullin, Philippe; Peko, Jean Félix; Dessen, Philippe; Diatta, Ange Lucien; Valent, Alexander; Leguern, Eric; Prévot, Sophie; Gombé-Mbalawa, Charles; Candelier, Jean-Jacques; Picard, Jean-Yves; Bernheim, Alain

2012-01-01

Eleven samples of DNA from choriocarcinomas were studied by high resolution CGH-array 244 K. They were studied after histopathological confirmation of the diagnosis, of the androgenic etiology and after a microsatellite marker analysis confirming the absence of contamination of tumor DNA from maternal DNA. Three cell lines, BeWo, JAR, JEG were also studied by this high resolution pangenomic technique. According to aCGH analysis, the de novo choriocarcinomas exhibited simple chromosomal rearrangements or normal profiles. The cell lines showed various and complex chromosomal aberrations. 23 Minimal Critical Regions were defined that allowed us to list the genes that were potentially implicated. Among them, unusually high numbers of microRNA clusters and imprinted genes were observed.
A High Performance Computing Approach to Tree Cover Delineation in 1-m NAIP Imagery Using a Probabilistic Learning Framework

NASA Technical Reports Server (NTRS)

Basu, Saikat; Ganguly, Sangram; Michaelis, Andrew; Votava, Petr; Roy, Anshuman; Mukhopadhyay, Supratik; Nemani, Ramakrishna

2015-01-01

Tree cover delineation is a useful instrument in deriving Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) airborne imagery data. Numerous algorithms have been designed to address this problem, but most of them do not scale to these datasets, which are of the order of terabytes. In this paper, we present a semi-automated probabilistic framework for the segmentation and classification of 1-m National Agriculture Imagery Program (NAIP) for tree-cover delineation for the whole of Continental United States, using a High Performance Computing Architecture. Classification is performed using a multi-layer Feedforward Backpropagation Neural Network and segmentation is performed using a Statistical Region Merging algorithm. The results from the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on Conditional Random Field, which helps in capturing the higher order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and re-trained by relabeling misclassified image patches. This leads to a significant improvement in the true positive rates and reduction in false positive rates. The tree cover maps were generated for the whole state of California, spanning a total of 11,095 NAIP tiles covering a total geographical area of 163,696 sq. miles. The framework produced true positive rates of around 88% for fragmented forests and 74% for urban tree cover areas, with false positive rates lower than 2% for both landscapes. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR canopy height model (CHM) showed the effectiveness of our framework for generating accurate high-resolution tree-cover maps.
From cacti to carnivores: Improved phylotranscriptomic sampling and hierarchical homology inference provide further insight into the evolution of Caryophyllales.

PubMed

Walker, Joseph F; Yang, Ya; Feng, Tao; Timoneda, Alfonso; Mikenas, Jessica; Hutchison, Vera; Edwards, Caroline; Wang, Ning; Ahluwalia, Sonia; Olivieri, Julia; Walker-Hale, Nathanael; Majure, Lucas C; Puente, Raúl; Kadereit, Gudrun; Lauterbach, Maximilian; Eggli, Urs; Flores-Olvera, Hilda; Ochoterena, Helga; Brockington, Samuel F; Moore, Michael J; Smith, Stephen A

2018-03-01

The Caryophyllales contain ~12,500 species and are known for their cosmopolitan distribution, convergence of trait evolution, and extreme adaptations. Some relationships within the Caryophyllales, like those of many large plant clades, remain unclear, and phylogenetic studies often recover alternative hypotheses. We explore the utility of broad and dense transcriptome sampling across the order for resolving evolutionary relationships in Caryophyllales. We generated 84 transcriptomes and combined these with 224 publicly available transcriptomes to perform a phylogenomic analysis of Caryophyllales. To overcome the computational challenge of ortholog detection in such a large data set, we developed an approach for clustering gene families that allowed us to analyze >300 transcriptomes and genomes. We then inferred the species relationships using multiple methods and performed gene-tree conflict analyses. Our phylogenetic analyses resolved many clades with strong support, but also showed significant gene-tree discordance. This discordance is not only a common feature of phylogenomic studies, but also represents an opportunity to understand processes that have structured phylogenies. We also found taxon sampling influences species-tree inference, highlighting the importance of more focused studies with additional taxon sampling. Transcriptomes are useful both for species-tree inference and for uncovering evolutionary complexity within lineages. Through analyses of gene-tree conflict and multiple methods of species-tree inference, we demonstrate that phylogenomic data can provide unparalleled insight into the evolutionary history of Caryophyllales. We also discuss a method for overcoming computational challenges associated with homolog clustering in large data sets. © 2018 The Authors. American Journal of Botany is published by Wiley Periodicals, Inc. on behalf of the Botanical Society of America.
A High Performance Computing Approach to Tree Cover Delineation in 1-m NAIP Imagery using a Probabilistic Learning Framework

NASA Astrophysics Data System (ADS)

Basu, S.; Ganguly, S.; Michaelis, A.; Votava, P.; Roy, A.; Mukhopadhyay, S.; Nemani, R. R.

2015-12-01

Tree cover delineation is a useful instrument in deriving Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) airborne imagery data. Numerous algorithms have been designed to address this problem, but most of them do not scale to these datasets which are of the order of terabytes. In this paper, we present a semi-automated probabilistic framework for the segmentation and classification of 1-m National Agriculture Imagery Program (NAIP) for tree-cover delineation for the whole of Continental United States, using a High Performance Computing Architecture. Classification is performed using a multi-layer Feedforward Backpropagation Neural Network and segmentation is performed using a Statistical Region Merging algorithm. The results from the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on Conditional Random Field, which helps in capturing the higher order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and re-trained by relabeling misclassified image patches. This leads to a significant improvement in the true positive rates and reduction in false positive rates. The tree cover maps were generated for the whole state of California, spanning a total of 11,095 NAIP tiles covering a total geographical area of 163,696 sq. miles. The framework produced true positive rates of around 88% for fragmented forests and 74% for urban tree cover areas, with false positive rates lower than 2% for both landscapes. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR canopy height model (CHM) showed the effectiveness of our framework for generating accurate high-resolution tree-cover maps.
A new fast algorithm for solving the minimum spanning tree problem based on DNA molecules computation.

PubMed

Wang, Zhaocai; Huang, Dongmei; Meng, Huajun; Tang, Chengpei

2013-10-01

The minimum spanning tree (MST) problem is to find minimum edge connected subsets containing all the vertex of a given undirected graph. It is a vitally important NP-complete problem in graph theory and applied mathematics, having numerous real life applications. Moreover in previous studies, DNA molecular operations usually were used to solve NP-complete head-to-tail path search problems, rarely for NP-hard problems with multi-lateral path solutions result, such as the minimum spanning tree problem. In this paper, we present a new fast DNA algorithm for solving the MST problem using DNA molecular operations. For an undirected graph with n vertex and m edges, we reasonably design flexible length DNA strands representing the vertex and edges, take appropriate steps and get the solutions of the MST problem in proper length range and O(3m+n) time complexity. We extend the application of DNA molecular operations and simultaneity simplify the complexity of the computation. Results of computer simulative experiments show that the proposed method updates some of the best known values with very short time and that the proposed method provides a better performance with solution accuracy over existing algorithms. Copyright © 2013 The Authors. Published by Elsevier Ireland Ltd.. All rights reserved.
AntiClustal: Multiple Sequence Alignment by antipole clustering and linear approximate 1-median computation.

PubMed

Di Pietro, C; Di Pietro, V; Emmanuele, G; Ferro, A; Maugeri, T; Modica, E; Pigola, G; Pulvirenti, A; Purrello, M; Ragusa, M; Scalia, M; Shasha, D; Travali, S; Zimmitti, V

2003-01-01

In this paper we present a new Multiple Sequence Alignment (MSA) algorithm called AntiClusAl. The method makes use of the commonly use idea of aligning homologous sequences belonging to classes generated by some clustering algorithm, and then continue the alignment process ina bottom-up way along a suitable tree structure. The final result is then read at the root of the tree. Multiple sequence alignment in each cluster makes use of the progressive alignment with the 1-median (center) of the cluster. The 1-median of set S of sequences is the element of S which minimizes the average distance from any other sequence in S. Its exact computation requires quadratic time. The basic idea of our proposed algorithm is to make use of a simple and natural algorithmic technique based on randomized tournaments which has been successfully applied to large size search problems in general metric spaces. In particular a clustering algorithm called Antipole tree and an approximate linear 1-median computation are used. Our algorithm compared with Clustal W, a widely used tool to MSA, shows a better running time results with fully comparable alignment quality. A successful biological application showing high aminoacid conservation during evolution of Xenopus laevis SOD2 is also cited.
Knowledge Guided Evolutionary Algorithms in Financial Investing

ERIC Educational Resources Information Center

Wimmer, Hayden

2013-01-01

A large body of literature exists on evolutionary computing, genetic algorithms, decision trees, codified knowledge, and knowledge management systems; however, the intersection of these computing topics has not been widely researched. Moving through the set of all possible solutions--or traversing the search space--at random exhibits no control…

Using Computer-Assisted Multiple Representations in Learning Geometry Proofs

ERIC Educational Resources Information Center

Wong, Wing-Kwong; Yin, Sheng-Kai; Yang, Hsi-Hsun; Cheng, Ying-Hao

2011-01-01

Geometry theorem proving involves skills that are difficult to learn. Instead of working with abstract and complicated representations, students might start with concrete, graphical representations. A proof tree is a graphical representation of a formal proof, with each node representing a proposition or given conditions. A computer-assisted…
3D Tree Dimensionality Assessment Using Photogrammetry and Small Unmanned Aerial Vehicles

PubMed Central

2015-01-01

Detailed, precise, three-dimensional (3D) representations of individual trees are a prerequisite for an accurate assessment of tree competition, growth, and morphological plasticity. Until recently, our ability to measure the dimensionality, spatial arrangement, shape of trees, and shape of tree components with precision has been constrained by technological and logistical limitations and cost. Traditional methods of forest biometrics provide only partial measurements and are labor intensive. Active remote technologies such as LiDAR operated from airborne platforms provide only partial crown reconstructions. The use of terrestrial LiDAR is laborious, has portability limitations and high cost. In this work we capitalized on recent improvements in the capabilities and availability of small unmanned aerial vehicles (UAVs), light and inexpensive cameras, and developed an affordable method for obtaining precise and comprehensive 3D models of trees and small groups of trees. The method employs slow-moving UAVs that acquire images along predefined trajectories near and around targeted trees, and computer vision-based approaches that process the images to obtain detailed tree reconstructions. After we confirmed the potential of the methodology via simulation we evaluated several UAV platforms, strategies for image acquisition, and image processing algorithms. We present an original, step-by-step workflow which utilizes open source programs and original software. We anticipate that future development and applications of our method will improve our understanding of forest self-organization emerging from the competition among trees, and will lead to a refined generation of individual-tree-based forest models. PMID:26393926
3D Tree Dimensionality Assessment Using Photogrammetry and Small Unmanned Aerial Vehicles.

PubMed

Gatziolis, Demetrios; Lienard, Jean F; Vogs, Andre; Strigul, Nikolay S

2015-01-01

Detailed, precise, three-dimensional (3D) representations of individual trees are a prerequisite for an accurate assessment of tree competition, growth, and morphological plasticity. Until recently, our ability to measure the dimensionality, spatial arrangement, shape of trees, and shape of tree components with precision has been constrained by technological and logistical limitations and cost. Traditional methods of forest biometrics provide only partial measurements and are labor intensive. Active remote technologies such as LiDAR operated from airborne platforms provide only partial crown reconstructions. The use of terrestrial LiDAR is laborious, has portability limitations and high cost. In this work we capitalized on recent improvements in the capabilities and availability of small unmanned aerial vehicles (UAVs), light and inexpensive cameras, and developed an affordable method for obtaining precise and comprehensive 3D models of trees and small groups of trees. The method employs slow-moving UAVs that acquire images along predefined trajectories near and around targeted trees, and computer vision-based approaches that process the images to obtain detailed tree reconstructions. After we confirmed the potential of the methodology via simulation we evaluated several UAV platforms, strategies for image acquisition, and image processing algorithms. We present an original, step-by-step workflow which utilizes open source programs and original software. We anticipate that future development and applications of our method will improve our understanding of forest self-organization emerging from the competition among trees, and will lead to a refined generation of individual-tree-based forest models.
Inferring species trees from incongruent multi-copy gene trees using the Robinson-Foulds distance

PubMed Central

2013-01-01

Background Constructing species trees from multi-copy gene trees remains a challenging problem in phylogenetics. One difficulty is that the underlying genes can be incongruent due to evolutionary processes such as gene duplication and loss, deep coalescence, or lateral gene transfer. Gene tree estimation errors may further exacerbate the difficulties of species tree estimation. Results We present a new approach for inferring species trees from incongruent multi-copy gene trees that is based on a generalization of the Robinson-Foulds (RF) distance measure to multi-labeled trees (mul-trees). We prove that it is NP-hard to compute the RF distance between two mul-trees; however, it is easy to calculate this distance between a mul-tree and a singly-labeled species tree. Motivated by this, we formulate the RF problem for mul-trees (MulRF) as follows: Given a collection of multi-copy gene trees, find a singly-labeled species tree that minimizes the total RF distance from the input mul-trees. We develop and implement a fast SPR-based heuristic algorithm for the NP-hard MulRF problem. We compare the performance of the MulRF method (available at http://genome.cs.iastate.edu/CBL/MulRF/) with several gene tree parsimony approaches using gene tree simulations that incorporate gene tree error, gene duplications and losses, and/or lateral transfer. The MulRF method produces more accurate species trees than gene tree parsimony approaches. We also demonstrate that the MulRF method infers in minutes a credible plant species tree from a collection of nearly 2,000 gene trees. Conclusions Our new phylogenetic inference method, based on a generalized RF distance, makes it possible to quickly estimate species trees from large genomic data sets. Since the MulRF method, unlike gene tree parsimony, is based on a generic tree distance measure, it is appealing for analyses of genomic data sets, in which many processes such as deep coalescence, recombination, gene duplication and losses as well as phylogenetic error may contribute to gene tree discord. In experiments, the MulRF method estimated species trees accurately and quickly, demonstrating MulRF as an efficient alternative approach for phylogenetic inference from large-scale genomic data sets. PMID:24180377
Comparison of Taxi Time Prediction Performance Using Different Taxi Speed Decision Trees

NASA Technical Reports Server (NTRS)

Lee, Hanbong

2017-01-01

In the STBO modeler and tactical surface scheduler for ATD-2 project, taxi speed decision trees are used to calculate the unimpeded taxi times of flights taxiing on the airport surface. The initial taxi speed values in these decision trees did not show good prediction accuracy of taxi times. Using the more recent, reliable surveillance data, new taxi speed values in ramp area and movement area were computed. Before integrating these values into the STBO system, we performed test runs using live data from Charlotte airport, with different taxi speed settings: 1) initial taxi speed values and 2) new ones. Taxi time prediction performance was evaluated by comparing various metrics. The results show that the new taxi speed decision trees can calculate the unimpeded taxi-out times more accurately.
Modeling the distribution of white spruce (Picea glauca) for Alaska with high accuracy: an open access role-model for predicting tree species in last remaining wilderness areas

Treesearch

Bettina Ohse; Falk Huettmann; Stefanie M. Ickert-Bond; Glenn P. Juday

2009-01-01

Most wilderness areas still lack accurate distribution information on tree species. We met this need with a predictive GIS modeling approach, using freely available digital data and computer programs to efficiently obtain high-quality species distribution maps. Here we present a digital map with the predicted distribution of white spruce (Picea glauca...
A tool to determine crown and plot canopy transparency for forest inventory and analysis phase 3 plots using digital photographs

Treesearch

Matthew F. Winn; Philip A. Araman

2012-01-01

The USDA Forest Service Forest Inventory and Analysis (FIA) program collects crown foliage transparency estimates for individual trees on Phase 3 (P3) inventory plots. The FIA crown foliage estimate is obtained from a pair of perpendicular side views of the tree. Researchers with the USDA Forest Service Southern Research Station have developed a computer program that...
Transfer Factors for Contaminant Uptake by Fruit and Nut Trees

DOE Office of Scientific and Technical Information (OSTI.GOV)

Napier, Bruce A.; Fellows, Robert J.; Minc, Leah D.

Transfer of radionuclides from soils into plants is one of the key mechanisms for long-term contamination of the human food chain. Nearly all computer models that address soil-to-plant uptake of radionuclides use empirically-derived transfer factors to address this process. Essentially all available soil-to-plant transfer factors are based on measurements in annual crops. Because very few measurements are available for tree fruits, samples were taken of alfalfa and oats and the stems, leaves, and fruits and nuts of almond, apple, apricot, carob, fig, grape, nectarine, pecan, pistachio (natural and grafted), and pomegranate, along with local surface soil. The samples were dried,more » ground, weighed, and analyzed for trace constituents through a combination of induction-coupled plasma mass spectrometry and instrumental neutron activation analysis for a wide range of naturally-occurring elements. Analysis results are presented and converted to soil-to-plant transfer factors. These are compared to commonly used and internationally recommended values. Those determined for annual crops are very similar to commonly-used values; those determined for tree fruits show interesting differences. Most macro- and micronutrients are slightly reduced in fruits; non-essential elements are reduced further. These findings may be used in existing computer models and may allow development of tree-fruit-specific transfer models.« less
MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods

PubMed Central

Tamura, Koichiro; Peterson, Daniel; Peterson, Nicholas; Stecher, Glen; Nei, Masatoshi; Kumar, Sudhir

2011-01-01

Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of Molecular Evolutionary Genetics Analysis version 5 (MEGA5), which is a user-friendly software for mining online databases, building sequence alignments and phylogenetic trees, and using methods of evolutionary bioinformatics in basic biology, biomedicine, and evolution. The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models (nucleotide or amino acid), inferring ancestral states and sequences (along with probabilities), and estimating evolutionary rates site-by-site. In computer simulation analyses, ML tree inference algorithms in MEGA5 compared favorably with other software packages in terms of computational efficiency and the accuracy of the estimates of phylogenetic trees, substitution parameters, and rate variation among sites. The MEGA user interface has now been enhanced to be activity driven to make it easier for the use of both beginners and experienced scientists. This version of MEGA is intended for the Windows platform, and it has been configured for effective use on Mac OS X and Linux desktops. It is available free of charge from http://www.megasoftware.net. PMID:21546353
Study of the flow unsteadiness in the human airway using large eddy simulation

NASA Astrophysics Data System (ADS)

Bernate, Jorge A.; Geisler, Taylor S.; Padhy, Sourav; Shaqfeh, Eric S. G.; Iaccarino, Gianluca

2017-08-01

The unsteady flow in a patient-specific geometry of the airways is studied. The geometry comprises the oral cavity, orophrarynx, larynx, trachea, and the bronchial tree extending to generations 5-8. Simulations are carried out for a constant inspiratory flow rate of 60 liters/min, corresponding to a Reynolds number of 4213 for a nominal tracheal diameter of 2 cm. The computed mean flow field is compared extensively with magnetic resonance velocimetry measurements by Banko et al. [Exp. Fluids 56, 117 (2015), 10.1007/s00348-015-1966-y] carried out in the same computed-tomography-based geometry, showing good agreement. In particular, we focus on the dynamics of the flow in the bronchial tree. After becoming unsteady at a constriction in the oropharynx, the flow is found to be chaotic, exhibiting fluctuations with broad-band spectra even at the most distal airways in which the Reynolds numbers are as low as 300. An inertial range signature is present in the trachea but not in the bronchial tree where a narrower range of scales is observed. The unsteadiness is attributed to the convection of turbulent structures produced at the larynx as well as to local kinetic energy production throughout the bronchial tree. Production occurs predominantly at shear layers bounding geometry-induced separation regions.
The Approximability of Partial Vertex Covers in Trees.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mkrtchyan, Vahan; Parekh, Ojas D.; Segev, Danny

Motivated by applications in risk management of computational systems, we focus our attention on a special case of the partial vertex cover problem, where the underlying graph is assumed to be a tree. Here, we consider four possible versions of this setting, depending on whether vertices and edges are weighted or not. Two of these versions, where edges are assumed to be unweighted, are known to be polynomial-time solvable (Gandhi, Khuller, and Srinivasan, 2004). However, the computational complexity of this problem with weighted edges, and possibly with weighted vertices, has not been determined yet. The main contribution of this papermore » is to resolve these questions, by fully characterizing which variants of partial vertex cover remain intractable in trees, and which can be efficiently solved. In particular, we propose a pseudo-polynomial DP-based algorithm for the most general case of having weights on both edges and vertices, which is proven to be NPhard. This algorithm provides a polynomial-time solution method when weights are limited to edges, and combined with additional scaling ideas, leads to an FPTAS for the general case. A secondary contribution of this work is to propose a novel way of using centroid decompositions in trees, which could be useful in other settings as well.« less
Evaluating Fast Maximum Likelihood-Based Phylogenetic Programs Using Empirical Phylogenomic Data Sets

PubMed Central

Zhou, Xiaofan; Shen, Xing-Xing; Hittinger, Chris Todd

2018-01-01

Abstract The sizes of the data matrices assembled to resolve branches of the tree of life have increased dramatically, motivating the development of programs for fast, yet accurate, inference. For example, several different fast programs have been developed in the very popular maximum likelihood framework, including RAxML/ExaML, PhyML, IQ-TREE, and FastTree. Although these programs are widely used, a systematic evaluation and comparison of their performance using empirical genome-scale data matrices has so far been lacking. To address this question, we evaluated these four programs on 19 empirical phylogenomic data sets with hundreds to thousands of genes and up to 200 taxa with respect to likelihood maximization, tree topology, and computational speed. For single-gene tree inference, we found that the more exhaustive and slower strategies (ten searches per alignment) outperformed faster strategies (one tree search per alignment) using RAxML, PhyML, or IQ-TREE. Interestingly, single-gene trees inferred by the three programs yielded comparable coalescent-based species tree estimations. For concatenation-based species tree inference, IQ-TREE consistently achieved the best-observed likelihoods for all data sets, and RAxML/ExaML was a close second. In contrast, PhyML often failed to complete concatenation-based analyses, whereas FastTree was the fastest but generated lower likelihood values and more dissimilar tree topologies in both types of analyses. Finally, data matrix properties, such as the number of taxa and the strength of phylogenetic signal, sometimes substantially influenced the programs’ relative performance. Our results provide real-world gene and species tree phylogenetic inference benchmarks to inform the design and execution of large-scale phylogenomic data analyses. PMID:29177474
Likelihood of Tree Topologies with Fossils and Diversification Rate Estimation.

PubMed

Didier, Gilles; Fau, Marine; Laurin, Michel

2017-11-01

Since the diversification process cannot be directly observed at the human scale, it has to be studied from the information available, namely the extant taxa and the fossil record. In this sense, phylogenetic trees including both extant taxa and fossils are the most complete representations of the diversification process that one can get. Such phylogenetic trees can be reconstructed from molecular and morphological data, to some extent. Among the temporal information of such phylogenetic trees, fossil ages are by far the most precisely known (divergence times are inferences calibrated mostly with fossils). We propose here a method to compute the likelihood of a phylogenetic tree with fossils in which the only considered time information is the fossil ages, and apply it to the estimation of the diversification rates from such data. Since it is required in our computation, we provide a method for determining the probability of a tree topology under the standard diversification model. Testing our approach on simulated data shows that the maximum likelihood rate estimates from the phylogenetic tree topology and the fossil dates are almost as accurate as those obtained by taking into account all the data, including the divergence times. Moreover, they are substantially more accurate than the estimates obtained only from the exact divergence times (without taking into account the fossil record). We also provide an empirical example composed of 50 Permo-Carboniferous eupelycosaur (early synapsid) taxa ranging in age from about 315 Ma (Late Carboniferous) to 270 Ma (shortly after the end of the Early Permian). Our analyses suggest a speciation (cladogenesis, or birth) rate of about 0.1 per lineage and per myr, a marginally lower extinction rate, and a considerable hidden paleobiodiversity of early synapsids. [Extinction rate; fossil ages; maximum likelihood estimation; speciation rate.]. © The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Birth-death models and coalescent point processes: the shape and probability of reconstructed phylogenies.

PubMed

Lambert, Amaury; Stadler, Tanja

2013-12-01

Forward-in-time models of diversification (i.e., speciation and extinction) produce phylogenetic trees that grow "vertically" as time goes by. Pruning the extinct lineages out of such trees leads to natural models for reconstructed trees (i.e., phylogenies of extant species). Alternatively, reconstructed trees can be modelled by coalescent point processes (CPPs), where trees grow "horizontally" by the sequential addition of vertical edges. Each new edge starts at some random speciation time and ends at the present time; speciation times are drawn from the same distribution independently. CPPs lead to extremely fast computation of tree likelihoods and simulation of reconstructed trees. Their topology always follows the uniform distribution on ranked tree shapes (URT). We characterize which forward-in-time models lead to URT reconstructed trees and among these, which lead to CPP reconstructed trees. We show that for any "asymmetric" diversification model in which speciation rates only depend on time and extinction rates only depend on time and on a non-heritable trait (e.g., age), the reconstructed tree is CPP, even if extant species are incompletely sampled. If rates additionally depend on the number of species, the reconstructed tree is (only) URT (but not CPP). We characterize the common distribution of speciation times in the CPP description, and discuss incomplete species sampling as well as three special model cases in detail: (1) the extinction rate does not depend on a trait; (2) rates do not depend on time; (3) mass extinctions may happen additionally at certain points in the past. Copyright © 2013 Elsevier Inc. All rights reserved.
Role of virtual bronchoscopy in children with a vegetable foreign body in the tracheobronchial tree.

PubMed

Behera, G; Tripathy, N; Maru, Y K; Mundra, R K; Gupta, Y; Lodha, M

2014-12-01

Multidetector computed tomography virtual bronchoscopy is a non-invasive diagnostic tool which provides a three-dimensional view of the tracheobronchial airway. This study aimed to evaluate the usefulness of virtual bronchoscopy in cases of vegetable foreign body aspiration in children. The medical records of patients with a history of foreign body aspiration from August 2006 to August 2010 were reviewed. Data were collected regarding their clinical presentation and chest X-ray, virtual bronchoscopy and rigid bronchoscopy findings. Cases of metallic and other non-vegetable foreign bodies were excluded from the analysis. Patients with multidetector computed tomography virtual bronchoscopy showing features of vegetable foreign body were included in the analysis. For each patient, virtual bronchoscopy findings were reviewed and compared with those of rigid bronchoscopy. A total of 60 patients; all children ranging from 1 month to 8 years of age, were included. The mean age at presentation was 2.01 years. Rigid bronchoscopy confirmed the results of multidetector computed tomography virtual bronchoscopy (i.e. presence of foreign body, site of lodgement, and size and shape) in 59 patients. In the remaining case, a vegetable foreign body identified by virtual bronchoscopy was revealed by rigid bronchoscopy to be a thick mucus plug. Thus, the positive predictive value of virtual bronchoscopy was 98.3 per cent. Multidetector computed tomography virtual bronchoscopy is a sensitive and specific diagnostic tool for identifying radiolucent vegetable foreign bodies in the tracheobronchial tree. It can also provide a useful pre-operative road map for rigid bronchoscopy. Patients suspected of having an airway foreign body or chronic unexplained respiratory symptoms should undergo multidetector computed tomography virtual bronchoscopy to rule out a vegetable foreign body in the tracheobronchial tree and avoid general anaesthesia and invasive rigid bronchoscopy.
Efficient algorithms for a class of partitioning problems

NASA Technical Reports Server (NTRS)

Iqbal, M. Ashraf; Bokhari, Shahid H.

1990-01-01

The problem of optimally partitioning the modules of chain- or tree-like tasks over chain-structured or host-satellite multiple computer systems is addressed. This important class of problems includes many signal processing and industrial control applications. Prior research has resulted in a succession of faster exact and approximate algorithms for these problems. Polynomial exact and approximate algorithms are described for this class that are better than any of the previously reported algorithms. The approach is based on a preprocessing step that condenses the given chain or tree structured task into a monotonic chain or tree. The partitioning of this monotonic take can then be carried out using fast search techniques.
Exact diffusion constant in a lattice-gas wind-tree model on a Bethe lattice

NASA Astrophysics Data System (ADS)

Zhang, Guihua; Percus, J. K.

1992-02-01

Kong and Cohen [Phys. Rev. B 40, 4838 (1989)] obtained the diffusion constant of a lattice-gas wind-tree model in the Boltzmann approximation. The result is consistent with computer simulations for low tree concentration. In this Brief Report we find the exact diffusion constant of the model on a Bethe lattice, which turns out to be identical with the Kong-Cohen and Gunn-Ortuño results. Our interpretation is that the Boltzmann approximation is exact for this type of diffusion on a Bethe lattice in the same sense that the Bethe-Peierls approximation is exact for the Ising model on a Bethe lattice.
From statistics of regular tree-like graphs to distribution function and gyration radius of branched polymers

NASA Astrophysics Data System (ADS)

Grosberg, Alexander Y.; Nechaev, Sergei K.

2015-08-01

We consider flexible branched polymer, with quenched branch structure, and show that its conformational entropy as a function of its gyration radius R, at large R, obeys, in the scaling sense, Δ S˜ {R}2/({a}2L), with a bond length (or Kuhn segment) and L defined as an average spanning distance. We show that this estimate is valid up to at most the logarithmic correction for any tree. We do so by explicitly computing the largest eigenvalues of Kramers matrices for both regular and ‘sparse’ three-branched trees, uncovering on the way their peculiar mathematical properties.
Simulation of wetlands forest vegetation dynamics

USGS Publications Warehouse

Phipps, R.L.

1979-01-01

A computer program, SWAMP, was designed to simulate the effects of flood frequency and depth to water table on southern wetlands forest vegetation dynamics. By incorporating these hydrologic characteristics into the model, forest vegetation and vegetation dynamics can be simulated. The model, based on data from the White River National Wildlife Refuge near De Witt, Arkansas, "grows" individual trees on a 20 x 20-m plot taking into account effects on the tree growth of flooding, depth to water table, shade tolerance, overtopping and crowding, and probability of death and reproduction. A potential application of the model is illustrated with simulations of tree fruit production following flood-control implementation and lumbering. ?? 1979.
Signatures of microevolutionary processes in phylogenetic patterns.

PubMed

Costa, Carolina L N; Lemos-Costa, Paula; Marquitti, Flavia M D; Fernandes, Lucas D; Ramos, Marlon F; Schneider, David M; Martins, Ayana B; Aguiar, Marcus A M

2018-06-23

Phylogenetic trees are representations of evolutionary relationships among species and contain signatures of the processes responsible for the speciation events they display. Inferring processes from tree properties, however, is challenging. To address this problem we analysed a spatially-explicit model of speciation where genome size and mating range can be controlled. We simulated parapatric and sympatric (narrow and wide mating range, respectively) radiations and constructed their phylogenetic trees, computing structural properties such as tree balance and speed of diversification. We showed that parapatric and sympatric speciation are well separated by these structural tree properties. Balanced trees with constant rates of diversification only originate in sympatry and genome size affected both the balance and the speed of diversification of the simulated trees. Comparison with empirical data showed that most of the evolutionary radiations considered to have developed in parapatry or sympatry are in good agreement with model predictions. Even though additional forces other than spatial restriction of gene flow, genome size, and genetic incompatibilities, do play a role in the evolution of species formation, the microevolutionary processes modeled here capture signatures of the diversification pattern of evolutionary radiations, regarding the symmetry and speed of diversification of lineages.

Evolutionary inference via the Poisson Indel Process.

PubMed

Bouchard-Côté, Alexandre; Jordan, Michael I

2013-01-22

We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classic evolutionary process, the TKF91 model [Thorne JL, Kishino H, Felsenstein J (1991) J Mol Evol 33(2):114-124] is a continuous-time Markov chain model composed of insertion, deletion, and substitution events. Unfortunately, this model gives rise to an intractable computational problem: The computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa. In this work, we present a stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The Poisson Indel Process is closely related to the TKF91 model, differing only in its treatment of insertions, but it has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared with separate inference of phylogenies and alignments.
Evolutionary inference via the Poisson Indel Process

PubMed Central

Bouchard-Côté, Alexandre; Jordan, Michael I.

2013-01-01

We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classic evolutionary process, the TKF91 model [Thorne JL, Kishino H, Felsenstein J (1991) J Mol Evol 33(2):114–124] is a continuous-time Markov chain model composed of insertion, deletion, and substitution events. Unfortunately, this model gives rise to an intractable computational problem: The computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa. In this work, we present a stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The Poisson Indel Process is closely related to the TKF91 model, differing only in its treatment of insertions, but it has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared with separate inference of phylogenies and alignments. PMID:23275296
STBase: one million species trees for comparative biology.

PubMed

McMahon, Michelle M; Deepak, Akshay; Fernández-Baca, David; Boss, Darren; Sanderson, Michael J

2015-01-01

Comprehensively sampled phylogenetic trees provide the most compelling foundations for strong inferences in comparative evolutionary biology. Mismatches are common, however, between the taxa for which comparative data are available and the taxa sampled by published phylogenetic analyses. Moreover, many published phylogenies are gene trees, which cannot always be adapted immediately for species level comparisons because of discordance, gene duplication, and other confounding biological processes. A new database, STBase, lets comparative biologists quickly retrieve species level phylogenetic hypotheses in response to a query list of species names. The database consists of 1 million single- and multi-locus data sets, each with a confidence set of 1000 putative species trees, computed from GenBank sequence data for 413,000 eukaryotic taxa. Two bodies of theoretical work are leveraged to aid in the assembly of multi-locus concatenated data sets for species tree construction. First, multiply labeled gene trees are pruned to conflict-free singly-labeled species-level trees that can be combined between loci. Second, impacts of missing data in multi-locus data sets are ameliorated by assembling only decisive data sets. Data sets overlapping with the user's query are ranked using a scheme that depends on user-provided weights for tree quality and for taxonomic overlap of the tree with the query. Retrieval times are independent of the size of the database, typically a few seconds. Tree quality is assessed by a real-time evaluation of bootstrap support on just the overlapping subtree. Associated sequence alignments, tree files and metadata can be downloaded for subsequent analysis. STBase provides a tool for comparative biologists interested in exploiting the most relevant sequence data available for the taxa of interest. It may also serve as a prototype for future species tree oriented databases and as a resource for assembly of larger species phylogenies from precomputed trees.
A Bayesian Supertree Model for Genome-Wide Species Tree Reconstruction

PubMed Central

De Oliveira Martins, Leonardo; Mallo, Diego; Posada, David

2016-01-01

Current phylogenomic data sets highlight the need for species tree methods able to deal with several sources of gene tree/species tree incongruence. At the same time, we need to make most use of all available data. Most species tree methods deal with single processes of phylogenetic discordance, namely, gene duplication and loss, incomplete lineage sorting (ILS) or horizontal gene transfer. In this manuscript, we address the problem of species tree inference from multilocus, genome-wide data sets regardless of the presence of gene duplication and loss and ILS therefore without the need to identify orthologs or to use a single individual per species. We do this by extending the idea of Maximum Likelihood (ML) supertrees to a hierarchical Bayesian model where several sources of gene tree/species tree disagreement can be accounted for in a modular manner. We implemented this model in a computer program called guenomu whose inputs are posterior distributions of unrooted gene tree topologies for multiple gene families, and whose output is the posterior distribution of rooted species tree topologies. We conducted extensive simulations to evaluate the performance of our approach in comparison with other species tree approaches able to deal with more than one leaf from the same species. Our method ranked best under simulated data sets, in spite of ignoring branch lengths, and performed well on empirical data, as well as being fast enough to analyze relatively large data sets. Our Bayesian supertree method was also very successful in obtaining better estimates of gene trees, by reducing the uncertainty in their distributions. In addition, our results show that under complex simulation scenarios, gene tree parsimony is also a competitive approach once we consider its speed, in contrast to more sophisticated models. PMID:25281847
Automated Mobile System for Accurate Outdoor Tree Crop Enumeration Using an Uncalibrated Camera.

PubMed

Nguyen, Thuy Tuong; Slaughter, David C; Hanson, Bradley D; Barber, Andrew; Freitas, Amy; Robles, Daniel; Whelan, Erin

2015-07-28

This paper demonstrates an automated computer vision system for outdoor tree crop enumeration in a seedling nursery. The complete system incorporates both hardware components (including an embedded microcontroller, an odometry encoder, and an uncalibrated digital color camera) and software algorithms (including microcontroller algorithms and the proposed algorithm for tree crop enumeration) required to obtain robust performance in a natural outdoor environment. The enumeration system uses a three-step image analysis process based upon: (1) an orthographic plant projection method integrating a perspective transform with automatic parameter estimation; (2) a plant counting method based on projection histograms; and (3) a double-counting avoidance method based on a homography transform. Experimental results demonstrate the ability to count large numbers of plants automatically with no human effort. Results show that, for tree seedlings having a height up to 40 cm and a within-row tree spacing of approximately 10 cm, the algorithms successfully estimated the number of plants with an average accuracy of 95.2% for trees within a single image and 98% for counting of the whole plant population in a large sequence of images.
Automated Mobile System for Accurate Outdoor Tree Crop Enumeration Using an Uncalibrated Camera

PubMed Central

Nguyen, Thuy Tuong; Slaughter, David C.; Hanson, Bradley D.; Barber, Andrew; Freitas, Amy; Robles, Daniel; Whelan, Erin

2015-01-01

This paper demonstrates an automated computer vision system for outdoor tree crop enumeration in a seedling nursery. The complete system incorporates both hardware components (including an embedded microcontroller, an odometry encoder, and an uncalibrated digital color camera) and software algorithms (including microcontroller algorithms and the proposed algorithm for tree crop enumeration) required to obtain robust performance in a natural outdoor environment. The enumeration system uses a three-step image analysis process based upon: (1) an orthographic plant projection method integrating a perspective transform with automatic parameter estimation; (2) a plant counting method based on projection histograms; and (3) a double-counting avoidance method based on a homography transform. Experimental results demonstrate the ability to count large numbers of plants automatically with no human effort. Results show that, for tree seedlings having a height up to 40 cm and a within-row tree spacing of approximately 10 cm, the algorithms successfully estimated the number of plants with an average accuracy of 95.2% for trees within a single image and 98% for counting of the whole plant population in a large sequence of images. PMID:26225982
Eigenvalues of normalized Laplacian matrices of fractal trees and dendrimers: Analytical results and applications

NASA Astrophysics Data System (ADS)

Julaiti, Alafate; Wu, Bin; Zhang, Zhongzhi

2013-05-01

The eigenvalues of the normalized Laplacian matrix of a network play an important role in its structural and dynamical aspects associated with the network. In this paper, we study the spectra and their applications of normalized Laplacian matrices of a family of fractal trees and dendrimers modeled by Cayley trees, both of which are built in an iterative way. For the fractal trees, we apply the spectral decimation approach to determine analytically all the eigenvalues and their corresponding multiplicities, with the eigenvalues provided by a recursive relation governing the eigenvalues of networks at two successive generations. For Cayley trees, we show that all their eigenvalues can be obtained by computing the roots of several small-degree polynomials defined recursively. By using the relation between normalized Laplacian spectra and eigentime identity, we derive the explicit solution to the eigentime identity for random walks on the two treelike networks, the leading scalings of which follow quite different behaviors. In addition, we corroborate the obtained eigenvalues and their degeneracies through the link between them and the number of spanning trees.
Reinforcement Learning Trees

PubMed Central

Zhu, Ruoqing; Zeng, Donglin; Kosorok, Michael R.

2015-01-01

In this paper, we introduce a new type of tree-based method, reinforcement learning trees (RLT), which exhibits significantly improved performance over traditional methods such as random forests (Breiman, 2001) under high-dimensional settings. The innovations are three-fold. First, the new method implements reinforcement learning at each selection of a splitting variable during the tree construction processes. By splitting on the variable that brings the greatest future improvement in later splits, rather than choosing the one with largest marginal effect from the immediate split, the constructed tree utilizes the available samples in a more efficient way. Moreover, such an approach enables linear combination cuts at little extra computational cost. Second, we propose a variable muting procedure that progressively eliminates noise variables during the construction of each individual tree. The muting procedure also takes advantage of reinforcement learning and prevents noise variables from being considered in the search for splitting rules, so that towards terminal nodes, where the sample size is small, the splitting rules are still constructed from only strong variables. Last, we investigate asymptotic properties of the proposed method under basic assumptions and discuss rationale in general settings. PMID:26903687
Improving multivariate Horner schemes with Monte Carlo tree search

NASA Astrophysics Data System (ADS)

Kuipers, J.; Plaat, A.; Vermaseren, J. A. M.; van den Herik, H. J.

2013-11-01

Optimizing the cost of evaluating a polynomial is a classic problem in computer science. For polynomials in one variable, Horner's method provides a scheme for producing a computationally efficient form. For multivariate polynomials it is possible to generalize Horner's method, but this leaves freedom in the order of the variables. Traditionally, greedy schemes like most-occurring variable first are used. This simple textbook algorithm has given remarkably efficient results. Finding better algorithms has proved difficult. In trying to improve upon the greedy scheme we have implemented Monte Carlo tree search, a recent search method from the field of artificial intelligence. This results in better Horner schemes and reduces the cost of evaluating polynomials, sometimes by factors up to two.
Bounding the Resource Availability of Partially Ordered Events with Constant Resource Impact

NASA Technical Reports Server (NTRS)

Frank, Jeremy

2004-01-01

We compare existing techniques to bound the resource availability of partially ordered events. We first show that, contrary to intuition, two existing techniques, one due to Laborie and one due to Muscettola, are not strictly comparable in terms of the size of the search trees generated under chronological search with a fixed heuristic. We describe a generalization of these techniques called the Flow Balance Constraint to tightly bound the amount of available resource for a set of partially ordered events with piecewise constant resource impact We prove that the new technique generates smaller proof trees under chronological search with a fixed heuristic, at little increase in computational expense. We then show how to construct tighter resource bounds but at increased computational cost.
Searching molecular structure databases with tandem mass spectra using CSI:FingerID

PubMed Central

Dührkop, Kai; Shen, Huibin; Meusel, Marvin; Rousu, Juho; Böcker, Sebastian

2015-01-01

Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics experiments usually rely on tandem MS to identify the thousands of compounds in a biological sample. Today, the vast majority of metabolites remain unknown. We present a method for searching molecular structure databases using tandem MS data of small molecules. Our method computes a fragmentation tree that best explains the fragmentation spectrum of an unknown molecule. We use the fragmentation tree to predict the molecular structure fingerprint of the unknown compound using machine learning. This fingerprint is then used to search a molecular structure database such as PubChem. Our method is shown to improve on the competing methods for computational metabolite identification by a considerable margin. PMID:26392543
Priority Queues for Computer Simulations

NASA Technical Reports Server (NTRS)

Steinman, Jeffrey S. (Inventor)

1998-01-01

The present invention is embodied in new priority queue data structures for event list management of computer simulations, and includes a new priority queue data structure and an improved event horizon applied to priority queue data structures. ne new priority queue data structure is a Qheap and is made out of linked lists for robust, fast, reliable, and stable event list management and uses a temporary unsorted list to store all items until one of the items is needed. Then the list is sorted, next, the highest priority item is removed, and then the rest of the list is inserted in the Qheap. Also, an event horizon is applied to binary tree and splay tree priority queue data structures to form the improved event horizon for event management.
Identifying messaging completion in a parallel computer by checking for change in message received and transmitted count at each node

DOEpatents

Archer, Charles J [Rochester, MN; Hardwick, Camesha R [Fayetteville, NC; McCarthy, Patrick J [Rochester, MN; Wallenfelt, Brian P [Eden Prairie, MN

2009-06-23

Methods, parallel computers, and products are provided for identifying messaging completion on a parallel computer. The parallel computer includes a plurality of compute nodes, the compute nodes coupled for data communications by at least two independent data communications networks including a binary tree data communications network optimal for collective operations that organizes the nodes as a tree and a torus data communications network optimal for point to point operations that organizes the nodes as a torus. Embodiments include reading all counters at each node of the torus data communications network; calculating at each node a current node value in dependence upon the values read from the counters at each node; and determining for all nodes whether the current node value for each node is the same as a previously calculated node value for each node. If the current node is the same as the previously calculated node value for all nodes of the torus data communications network, embodiments include determining that messaging is complete and if the current node is not the same as the previously calculated node value for all nodes of the torus data communications network, embodiments include determining that messaging is currently incomplete.
Birth-death prior on phylogeny and speed dating

PubMed Central

2008-01-01

Background In recent years there has been a trend of leaving the strict molecular clock in order to infer dating of speciations and other evolutionary events. Explicit modeling of substitution rates and divergence times makes formulation of informative prior distributions for branch lengths possible. Models with birth-death priors on tree branching and auto-correlated or iid substitution rates among lineages have been proposed, enabling simultaneous inference of substitution rates and divergence times. This problem has, however, mainly been analysed in the Markov chain Monte Carlo (MCMC) framework, an approach requiring computation times of hours or days when applied to large phylogenies. Results We demonstrate that a hill-climbing maximum a posteriori (MAP) adaptation of the MCMC scheme results in considerable gain in computational efficiency. We demonstrate also that a novel dynamic programming (DP) algorithm for branch length factorization, useful both in the hill-climbing and in the MCMC setting, further reduces computation time. For the problem of inferring rates and times parameters on a fixed tree, we perform simulations, comparisons between hill-climbing and MCMC on a plant rbcL gene dataset, and dating analysis on an animal mtDNA dataset, showing that our methodology enables efficient, highly accurate analysis of very large trees. Datasets requiring a computation time of several days with MCMC can with our MAP algorithm be accurately analysed in less than a minute. From the results of our example analyses, we conclude that our methodology generally avoids getting trapped early in local optima. For the cases where this nevertheless can be a problem, for instance when we in addition to the parameters also infer the tree topology, we show that the problem can be evaded by using a simulated-annealing like (SAL) method in which we favour tree swaps early in the inference while biasing our focus towards rate and time parameter changes later on. Conclusion Our contribution leaves the field open for fast and accurate dating analysis of nucleotide sequence data. Modeling branch substitutions rates and divergence times separately allows us to include birth-death priors on the times without the assumption of a molecular clock. The methodology is easily adapted to take data from fossil records into account and it can be used together with a broad range of rate and substitution models. PMID:18318893
Birth-death prior on phylogeny and speed dating.

PubMed

Akerborg, Orjan; Sennblad, Bengt; Lagergren, Jens

2008-03-04

In recent years there has been a trend of leaving the strict molecular clock in order to infer dating of speciations and other evolutionary events. Explicit modeling of substitution rates and divergence times makes formulation of informative prior distributions for branch lengths possible. Models with birth-death priors on tree branching and auto-correlated or iid substitution rates among lineages have been proposed, enabling simultaneous inference of substitution rates and divergence times. This problem has, however, mainly been analysed in the Markov chain Monte Carlo (MCMC) framework, an approach requiring computation times of hours or days when applied to large phylogenies. We demonstrate that a hill-climbing maximum a posteriori (MAP) adaptation of the MCMC scheme results in considerable gain in computational efficiency. We demonstrate also that a novel dynamic programming (DP) algorithm for branch length factorization, useful both in the hill-climbing and in the MCMC setting, further reduces computation time. For the problem of inferring rates and times parameters on a fixed tree, we perform simulations, comparisons between hill-climbing and MCMC on a plant rbcL gene dataset, and dating analysis on an animal mtDNA dataset, showing that our methodology enables efficient, highly accurate analysis of very large trees. Datasets requiring a computation time of several days with MCMC can with our MAP algorithm be accurately analysed in less than a minute. From the results of our example analyses, we conclude that our methodology generally avoids getting trapped early in local optima. For the cases where this nevertheless can be a problem, for instance when we in addition to the parameters also infer the tree topology, we show that the problem can be evaded by using a simulated-annealing like (SAL) method in which we favour tree swaps early in the inference while biasing our focus towards rate and time parameter changes later on. Our contribution leaves the field open for fast and accurate dating analysis of nucleotide sequence data. Modeling branch substitutions rates and divergence times separately allows us to include birth-death priors on the times without the assumption of a molecular clock. The methodology is easily adapted to take data from fossil records into account and it can be used together with a broad range of rate and substitution models.
Nucleic Acid Polymers Are Active against Hepatitis Delta Virus Infection In Vitro.

PubMed

Beilstein, Frauke; Blanchet, Matthieu; Vaillant, Andrew; Sureau, Camille

2018-02-15

In this study, an in vitro infection model for the hepatitis delta virus (HDV) was used to evaluate the antiviral effects of phosphorothioate nucleic acid polymers (NAPs) and investigate their mechanism of action. The results show that NAPs inhibit HDV infection at concentrations less than 4 μM in cultures of differentiated human hepatoma cells. NAPs were shown to be active at viral entry but inactive postentry on HDV RNA replication. Inhibition was independent of the NAP nucleotide sequence but dependent on both size and amphipathicity of the polymer. NAP antiviral activity was effective against HDV virions bearing the main hepatitis B virus (HBV) immune escape substitutions (D144A and G145R) and was pangenomic with regard to HBV envelope proteins. Furthermore, similar to immobilized heparin, immobilized NAPs could bind HDV particles, suggesting that entry inhibition was due, at least in part, to preventing attachment of the virus to cell surface glycosaminoglycans. The results document NAPs as a novel class of antiviral compounds that can prevent HDV propagation. IMPORTANCE HDV infection causes the most severe form of viral hepatitis in humans and one of the most difficult to cure. Currently, treatments are limited to long-term administration of interferon at high doses, which provide only partial efficacy. There is thus an urgent need for innovative approaches to identify new antiviral against HDV. The significance of our study is in demonstrating that nucleic acid polymers (NAPs) are active against HDV by targeting the envelope of HDV virions. In an in vitro infection assay, NAP activity was recorded at concentrations less than 4 μM in the absence of cell toxicity. Furthermore, the fact that NAPs could block HDV at viral entry suggests their potential to control the spread of HDV in a chronically HBV-infected liver. In addition, NAP anti-HDV activity was pangenomic with regard to HBV envelope proteins and not circumvented by HBsAg substitutions associated with HBV immune escape. Copyright © 2018 American Society for Microbiology.
Comparative Genomics of Plant-Associated Pseudomonas spp.: Insights into Diversity and Inheritance of Traits Involved in Multitrophic Interactions

PubMed Central

Loper, Joyce E.; Hassan, Karl A.; Mavrodi, Dmitri V.; Davis, Edward W.; Lim, Chee Kent; Shaffer, Brenda T.; Elbourne, Liam D. H.; Stockwell, Virginia O.; Hartney, Sierra L.; Breakwell, Katy; Henkels, Marcella D.; Tetu, Sasha G.; Rangel, Lorena I.; Kidarsa, Teresa A.; Wilson, Neil L.; van de Mortel, Judith E.; Song, Chunxu; Blumhagen, Rachel; Radune, Diana; Hostetler, Jessica B.; Brinkac, Lauren M.; Durkin, A. Scott; Kluepfel, Daniel A.; Wechter, W. Patrick; Anderson, Anne J.; Kim, Young Cheol; Pierson, Leland S.; Pierson, Elizabeth A.; Lindow, Steven E.; Kobayashi, Donald Y.; Raaijmakers, Jos M.; Weller, David M.; Thomashow, Linda S.; Allen, Andrew E.; Paulsen, Ian T.

2012-01-01

We provide here a comparative genome analysis of ten strains within the Pseudomonas fluorescens group including seven new genomic sequences. These strains exhibit a diverse spectrum of traits involved in biological control and other multitrophic interactions with plants, microbes, and insects. Multilocus sequence analysis placed the strains in three sub-clades, which was reinforced by high levels of synteny, size of core genomes, and relatedness of orthologous genes between strains within a sub-clade. The heterogeneity of the P. fluorescens group was reflected in the large size of its pan-genome, which makes up approximately 54% of the pan-genome of the genus as a whole, and a core genome representing only 45–52% of the genome of any individual strain. We discovered genes for traits that were not known previously in the strains, including genes for the biosynthesis of the siderophores achromobactin and pseudomonine and the antibiotic 2-hexyl-5-propyl-alkylresorcinol; novel bacteriocins; type II, III, and VI secretion systems; and insect toxins. Certain gene clusters, such as those for two type III secretion systems, are present only in specific sub-clades, suggesting vertical inheritance. Almost all of the genes associated with multitrophic interactions map to genomic regions present in only a subset of the strains or unique to a specific strain. To explore the evolutionary origin of these genes, we mapped their distributions relative to the locations of mobile genetic elements and repetitive extragenic palindromic (REP) elements in each genome. The mobile genetic elements and many strain-specific genes fall into regions devoid of REP elements (i.e., REP deserts) and regions displaying atypical tri-nucleotide composition, possibly indicating relatively recent acquisition of these loci. Collectively, the results of this study highlight the enormous heterogeneity of the P. fluorescens group and the importance of the variable genome in tailoring individual strains to their specific lifestyles and functional repertoire. PMID:22792073
Pangenome evidence for extensive interdomain horizontal transfer affecting lineage core and shell genes in uncultured planktonic thaumarchaeota and euryarchaeota.

PubMed

Deschamps, Philippe; Zivanovic, Yvan; Moreira, David; Rodriguez-Valera, Francisco; López-García, Purificación

2014-06-12

Horizontal gene transfer (HGT) is an important force in evolution, which may lead, among other things, to the adaptation to new environments by the import of new metabolic functions. Recent studies based on phylogenetic analyses of a few genome fragments containing archaeal 16S rRNA genes and fosmid-end sequences from deep-sea metagenomic libraries have suggested that marine planktonic archaea could be affected by high HGT frequency. Likewise, a composite genome of an uncultured marine euryarchaeote showed high levels of gene sequence similarity to bacterial genes. In this work, we ask whether HGT is frequent and widespread in genomes of these marine archaea, and whether HGT is an ancient and/or recurrent phenomenon. To answer these questions, we sequenced 997 fosmid archaeal clones from metagenomic libraries of deep-Mediterranean waters (1,000 and 3,000 m depth) and built comprehensive pangenomes for planktonic Thaumarchaeota (Group I archaea) and Euryarchaeota belonging to the uncultured Groups II and III Euryarchaeota (GII/III-Euryarchaeota). Comparison with available reference genomes of Thaumarchaeota and a composite marine surface euryarchaeote genome allowed us to define sets of core, lineage-specific core, and shell gene ortholog clusters for the two archaeal lineages. Molecular phylogenetic analyses of all gene clusters showed that 23.9% of marine Thaumarchaeota genes and 29.7% of GII/III-Euryarchaeota genes had been horizontally acquired from bacteria. HGT is not only extensive and directional but also ongoing, with high HGT levels in lineage-specific core (ancient transfers) and shell (recent transfers) genes. Many of the acquired genes are related to metabolism and membrane biogenesis, suggesting an adaptive value for life in cold, oligotrophic oceans. We hypothesize that the acquisition of an important amount of foreign genes by the ancestors of these archaeal groups significantly contributed to their divergence and ecological success. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
A Computer Program for Testing Grammars On-Line.

ERIC Educational Resources Information Center

Gross, Louis N.

This paper describes a computer system which is intended to aid the linguist in building a transformational grammar. The program operates as a rule tester, performing three services for the user through sets of functions which allow the user to--specify, change, and print base trees (to which transformations would apply); define transformations…
When Practice Doesn't Make Perfect: Effects of Task Goals on Learning Computing Concepts

ERIC Educational Resources Information Center

Miller, Craig S.; Settle, Amber

2011-01-01

Specifying file references for hypertext links is an elementary competence that nevertheless draws upon core computational thinking concepts such as tree traversal and the distinction between relative and absolute references. In this article we explore the learning effects of different instructional strategies in the context of an introductory…

Some links on this page may take you to non-federal websites. Their policies may differ from this site.