databases phylogenetic analysis: Topics by Science.gov

Sample records for databases phylogenetic analysis

REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

PubMed Central

Leonard, Guy; Stevens, Jamie R.; Richards, Thomas A.

2009-01-01

The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends. PMID:19812722
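
The renaming step described in this abstract can be illustrated with a minimal sketch. This is not the REFGEN code: the header layout (`ACCESSION.version description [Species name]`), the function names, and the label format are all assumptions made for illustration. The sketch builds each new label from the species name and/or accession number, drops repeated accession numbers, and returns a mapping back to the original headers.

```python
import re


def parse_header(header):
    """Return (accession, species) from a header line (without the '>').

    Assumes the hypothetical layout 'ACCESSION.version description [Species name]'.
    """
    accession = header.split()[0].split(".")[0]
    match = re.search(r"\[([^\]]+)\]", header)
    species = match.group(1) if match else "unknown"
    return accession, species


def rename_fasta(text, use_species=True, use_accession=True):
    """Rename every sequence label in FASTA text.

    Returns the renamed FASTA text and a dict mapping new label -> old header.
    Records whose accession number was already seen are skipped.
    """
    if not (use_species or use_accession):
        raise ValueError("choose at least one label component")
    seen, out, mapping = set(), [], {}
    keep = False  # sequence lines are copied only while this is True
    for line in text.splitlines():
        if line.startswith(">"):
            accession, species = parse_header(line[1:])
            keep = accession not in seen
            if not keep:
                continue
            seen.add(accession)
            parts = []
            if use_species:
                parts.append(species.replace(" ", "_"))
            if use_accession:
                parts.append(accession)
            label = "_".join(parts)
            mapping[label] = line[1:]
            out.append(">" + label)
        elif keep:
            out.append(line)
    return "\n".join(out) + "\n", mapping
```

The returned mapping is what would allow labels in a later tree file to be translated back to the full database headers.
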
Improving phylogenetic analyses by incorporating additional information from genetic sequence databases.

PubMed

Liang, Li-Jung; Weiss, Robert E; Redelings, Benjamin; Suchard, Marc A

2009-10-01

Statistical analyses of phylogenetic data culminate in uncertain estimates of underlying model parameters. Lack of additional data hinders the ability to reduce this uncertainty, as the original phylogenetic dataset is often complete, containing the entire gene or genome information available for the given set of taxa. Informative priors in a Bayesian analysis can reduce posterior uncertainty; however, publicly available phylogenetic software specifies vague priors for model parameters by default. We build objective and informative priors using hierarchical random effect models that combine additional datasets whose parameters are not of direct interest but are similar to the analysis of interest. We propose principled statistical methods that permit more precise parameter estimates in phylogenetic analyses by creating informative priors for parameters of interest. Using additional sequence datasets from our lab or public databases, we construct a fully Bayesian semiparametric hierarchical model to combine datasets. A dynamic iteratively reweighted Markov chain Monte Carlo algorithm conveniently recycles posterior samples from the individual analyses. We demonstrate the value of our approach by examining the insertion-deletion (indel) process in the enolase gene across the Tree of Life using the phylogenetic software BALI-PHY; we incorporate prior information about indels from 82 curated alignments downloaded from the BAliBASE database.
Taking the First Steps towards a Standard for Reporting on Phylogenies: Minimal Information about a Phylogenetic Analysis (MIAPA)

PubMed Central

Leebens-Mack, Jim; Vision, Todd; Brenner, Eric; Bowers, John E.; Cannon, Steven; Clement, Mark J.; Cunningham, Clifford W.; dePamphilis, Claude; DeSalle, Rob; Doyle, Jeff J.; Eisen, Jonathan A.; Gu, Xun; Harshman, John; Jansen, Robert K.; Kellogg, Elizabeth A.; Koonin, Eugene V.; Mishler, Brent D.; Philippe, Hervé; Pires, J. Chris; Qiu, Yin-Long; Rhee, Seung Y.; Sjölander, Kimmen; Soltis, Douglas E.; Soltis, Pamela S.; Stevenson, Dennis W.; Wall, Kerr; Warnow, Tandy; Zmasek, Christian

2011-01-01

In the eight years since phylogenomics was introduced as the intersection of genomics and phylogenetics, the field has provided fundamental insights into gene function, genome history and organismal relationships. The utility of phylogenomics is growing with the increase in the number and diversity of taxa for which whole genome and large transcriptome sequence sets are being generated. We assert that the synergy between genomic and phylogenetic perspectives in comparative biology would be enhanced by the development and refinement of minimal reporting standards for phylogenetic analyses. Encouraged by the development of the Minimum Information About a Microarray Experiment (MIAME) standard, we propose a similar roadmap for the development of a Minimal Information About a Phylogenetic Analysis (MIAPA) standard. Key in the successful development and implementation of such a standard will be broad participation by developers of phylogenetic analysis software, phylogenetic database developers, practitioners of phylogenomics, and journal editors. PMID:16901231
Data set for phylogenetic tree and RAMPAGE Ramachandran plot analysis of SODs in Gossypium raimondii and G. arboreum.

PubMed

Wang, Wei; Xia, Minxuan; Chen, Jie; Deng, Fenni; Yuan, Rui; Zhang, Xiaopei; Shen, Fafu

2016-12-01

The data presented in this paper support the research article "Genome-Wide Analysis of Superoxide Dismutase Gene Family in Gossypium raimondii and G. arboreum" [1]. In this data article, we present a phylogenetic tree showing a dichotomy with two different clusters of SODs, inferred by the Bayesian method of MrBayes (version 3.2.4), "Bayesian phylogenetic inference under mixed models" [2]; Ramachandran plots of G. raimondii and G. arboreum SODs; the protein sequence used to generate the 3D structure of proteins and the template accession via the SWISS-MODEL server, "SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information" [3]; and motif sequences of SODs identified by InterProScan (version 4.8) with the Pfam database, "Pfam: the protein families database" [4].
A new version of the RDP (Ribosomal Database Project)

NASA Technical Reports Server (NTRS)

Maidak, B. L.; Cole, J. R.; Parker, C. T. Jr; Garrity, G. M.; Larsen, N.; Li, B.; Lilburn, T. G.; McCaughey, M. J.; Olsen, G. J.; Overbeek, R.;

1999-01-01

The Ribosomal Database Project (RDP-II), previously described by Maidak et al. [Nucleic Acids Res. (1997), 25, 109-111], is now hosted by the Center for Microbial Ecology at Michigan State University. RDP-II is a curated database that offers ribosomal RNA (rRNA) nucleotide sequence data in aligned and unaligned forms, analysis services, and associated computer programs. During the past two years, data alignments have been updated and now include >9700 small subunit rRNA sequences. The recent development of an ObjectStore database will provide more rapid updating of data, better data accuracy and increased user access. RDP-II includes phylogenetically ordered alignments of rRNA sequences, derived phylogenetic trees, rRNA secondary structure diagrams, and various software programs for handling, analyzing and displaying alignments and trees. The data are available via anonymous ftp (ftp.cme.msu.edu) and WWW (http://www.cme.msu.edu/RDP). The WWW server provides ribosomal probe checking, approximate phylogenetic placement of user-submitted sequences, screening for possible chimeric rRNA sequences, automated alignment, and a suggested placement of an unknown sequence on an existing phylogenetic tree. Additional utilities also exist at RDP-II, including distance matrix, T-RFLP, and a Java-based viewer of the phylogenetic trees that can be used to create subtrees.

Prediction and phylogenetic analysis of mammalian short interspersed elements (SINEs).

PubMed

Rogozin, I B; Mayorov, V I; Lavrentieva, M V; Milanesi, L; Adkison, L R

2000-09-01

The presence of repetitive elements can create serious problems for sequence analysis, especially in the case of homology searches in nucleotide sequence databases. Repetitive elements should be treated carefully by using special programs and databases. In this paper, various aspects of SINE (short interspersed repetitive element) identification, analysis and evolution are discussed.
The chordate proteome history database.

PubMed

Levasseur, Anthony; Paganini, Julien; Dainat, Jacques; Thompson, Julie D; Poch, Olivier; Pontarotti, Pierre; Gouret, Philippe

2012-01-01

The chordate proteome history database (http://ioda.univ-provence.fr) comprises some 20,000 evolutionary analyses of proteins from chordate species. Our main objective was to characterize and study the evolutionary histories of the chordate proteome, and in particular to detect genomic events and automatic functional searches. Firstly, phylogenetic analyses based on high quality multiple sequence alignments and a robust phylogenetic pipeline were performed for the whole protein and for each individual domain. Novel approaches were developed to identify orthologs/paralogs, and predict gene duplication/gain/loss events and the occurrence of new protein architectures (domain gains, losses and shuffling). These important genetic events were localized on the phylogenetic trees and on the genomic sequence. Secondly, the phylogenetic trees were enhanced by the creation of phylogroups, whereby groups of orthologous sequences created using OrthoMCL were corrected based on the phylogenetic trees; gene family size and gene gain/loss in a given lineage could be deduced from the phylogroups. For each ortholog group obtained from the phylogenetic or the phylogroup analysis, functional information and expression data can be retrieved. Database searches can be performed easily using biological objects: protein identifier, keyword or domain, but can also be based on events, e.g., domain exchange events can be retrieved. To our knowledge, this is the first database that links group clustering, phylogeny and automatic functional searches along with the detection of important events occurring during genome evolution, such as the appearance of a new domain architecture.
Influenza Virus Database (IVDB): an integrated information resource and analysis platform for influenza virus research.

PubMed

Chang, Suhua; Zhang, Jiajie; Liao, Xiaoyun; Zhu, Xinxing; Wang, Dahai; Zhu, Jiang; Feng, Tao; Zhu, Baoli; Gao, George F; Wang, Jian; Yang, Huanming; Yu, Jun; Wang, Jing

2007-01-01

Frequent outbreaks of highly pathogenic avian influenza and the increasing data available for comparative analysis require a central database specialized in influenza viruses (IVs). We have established the Influenza Virus Database (IVDB) to integrate information and create an analysis platform for genetic, genomic, and phylogenetic studies of the virus. IVDB hosts complete genome sequences of influenza A virus generated by Beijing Institute of Genomics (BIG) and curates all other published IV sequences after expert annotation. Our Q-Filter system classifies and ranks all nucleotide sequences into seven categories according to sequence content and integrity. IVDB provides a series of tools and viewers for comparative analysis of the viral genomes, genes, genetic polymorphisms and phylogenetic relationships. A search system has been developed for users to retrieve a combination of different data types by setting search options. To facilitate analysis of global viral transmission and evolution, the IV Sequence Distribution Tool (IVDT) has been developed to display the worldwide geographic distribution of chosen viral genotypes and to couple genomic data with epidemiological data. The BLAST, multiple sequence alignment and phylogenetic analysis tools were integrated for online data analysis. Furthermore, IVDB offers instant access to pre-computed alignments and polymorphisms of IV genes and proteins, and presents the results as SNP distribution plots and minor allele distributions. IVDB is publicly available at http://influenza.genomics.org.cn.
EvoDB: a database of evolutionary rate profiles, associated protein domains and phylogenetic trees for PFAM-A

PubMed Central

Ndhlovu, Andrew; Durand, Pierre M.; Hazelhurst, Scott

2015-01-01

The evolutionary rate at codon sites across protein-coding nucleotide sequences represents a valuable tier of information for aligning sequences, inferring homology and constructing phylogenetic profiles. However, a comprehensive resource for cataloguing the evolutionary rate at codon sites and their corresponding nucleotide and protein domain sequence alignments has not been developed. To address this gap in knowledge, EvoDB (an Evolutionary rates DataBase) was compiled. Nucleotide sequences and their corresponding protein domain data including the associated seed alignments from the PFAM-A (protein family) database were used to estimate evolutionary rate (ω = dN/dS) profiles at codon sites for each entry. EvoDB contains 98.83% of the gapped nucleotide sequence alignments and 97.1% of the evolutionary rate profiles for the corresponding information in PFAM-A. As the identification of codon sites under positive selection and their position in a sequence profile is usually the most sought after information for molecular evolutionary biologists, evolutionary rate profiles were determined under the M2a model using the CODEML algorithm in the PAML (Phylogenetic Analysis by Maximum Likelihood) suite of software. Validation of nucleotide sequences against amino acid data was implemented to ensure high data quality. EvoDB is a catalogue of the evolutionary rate profiles and provides the corresponding phylogenetic trees, PFAM-A alignments and annotated accession identifier data. In addition, the database can be explored and queried using known evolutionary rate profiles to identify domains under similar evolutionary constraints and pressures. EvoDB is a resource for evolutionary, phylogenetic studies and presents a tier of information untapped by current databases. Database URL: http://www.bioinf.wits.ac.za/software/fire/evodb PMID:26140928
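
The ratio ω = dN/dS used in this abstract is conventionally read as follows: ω > 1 suggests positive selection, ω ≈ 1 neutral evolution, and ω < 1 purifying selection. The sketch below only illustrates that reading for a single site. It is not the CODEML M2a procedure, which estimates site classes by maximum likelihood, and the function names and the tolerance `tol` are assumptions chosen for illustration.

```python
def omega(dn, ds):
    """Return dN/dS, or None when dS is zero or negative (ratio undefined)."""
    if ds <= 0:
        return None
    return dn / ds


def classify_site(dn, ds, tol=0.05):
    """Label a codon site by its omega value.

    tol is an arbitrary illustrative band around omega = 1, not a statistical test.
    """
    w = omega(dn, ds)
    if w is None:
        return "undefined"
    if w > 1 + tol:
        return "positive"
    if w < 1 - tol:
        return "purifying"
    return "neutral"
```

For example, `classify_site(0.3, 0.1)` gives `"positive"`, while a site with no synonymous change (`ds = 0`) is reported as `"undefined"` rather than forced into a category.
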
MicRhoDE: a curated database for the analysis of microbial rhodopsin diversity and evolution

PubMed Central

Boeuf, Dominique; Audic, Stéphane; Brillet-Guéguen, Loraine; Caron, Christophe; Jeanthon, Christian

2015-01-01

Microbial rhodopsins are a diverse group of photoactive transmembrane proteins found in all three domains of life and in viruses. Today, microbial rhodopsin research is a flourishing research field in which new understandings of rhodopsin diversity, function and evolution are contributing to broader microbiological and molecular knowledge. Here, we describe MicRhoDE, a comprehensive, high-quality and freely accessible database that facilitates analysis of the diversity and evolution of microbial rhodopsins. Rhodopsin sequences isolated from a vast array of marine and terrestrial environments were manually collected and curated. To each rhodopsin sequence are associated related metadata, including predicted spectral tuning of the protein, putative activity and function, taxonomy for sequences that can be linked to a 16S rRNA gene, sampling date and location, and supporting literature. The database currently covers 7857 aligned sequences from more than 450 environmental samples or organisms. Based on a robust phylogenetic analysis, we introduce an operational classification system with multiple phylogenetic levels ranging from superclusters to species-level operational taxonomic units. An integrated pipeline for online sequence alignment and phylogenetic tree construction is also provided. With a user-friendly interface and integrated online bioinformatics tools, this unique resource should be highly valuable for upcoming studies of the biogeography, diversity, distribution and evolution of microbial rhodopsins. Database URL: http://micrhode.sb-roscoff.fr. PMID:26286928
Phylogenetics.

PubMed

Sleator, Roy D

2011-04-01

The recent rapid expansion in the DNA and protein databases, arising from large-scale genomic and metagenomic sequence projects, has forced significant development in the field of phylogenetics: the study of the evolutionary relatedness of the planet's inhabitants. Advances in phylogenetic analysis have greatly transformed our view of the landscape of evolutionary biology, transcending the view of the tree of life that has shaped evolutionary theory since Darwinian times. Indeed, modern phylogenetic analysis no longer focuses on the restricted Darwinian-Mendelian model of vertical gene transfer, but must also consider the significant degree of lateral gene transfer, which connects and shapes almost all living things. Herein, I review the major tree-building methods, their strengths, weaknesses and future prospects.
Molecular Tracing of Hepatitis C Virus Genotype 1 Isolates in Iran: A NS5B Phylogenetic Analysis with Systematic Review.

PubMed

Hesamizadeh, Khashayar; Alavian, Seyed Moayed; Najafi Tireh Shabankareh, Azar; Sharafi, Heidar

2016-12-01

Hepatitis C virus (HCV) is characterized by a high degree of genetic heterogeneity and is classified into 7 genotypes and different subtypes. It is heterogeneously distributed across various risk groups and geographical regions. A well-established phylogenetic relationship can simplify the tracing of HCV hierarchical strata into geographical regions. The current study aimed to find the genetic phylogeny of subtypes 1a and 1b of HCV isolates based on NS5B nucleotide sequences in Iran and other members of the Eastern Mediterranean Regional Office of the World Health Organization, as well as other Middle Eastern countries, with a systematic review of available published and unpublished studies. The phylogenetic analyses were performed based on the nucleotide sequences of the NS5B gene of HCV genotype 1 (HCV-1), which were registered in the GenBank database. The literature review was performed in two steps: 1) searching for studies evaluating the NS5B sequences of HCV-1 on PubMed, Scopus, and Web of Science, and 2) searching for sequences of unpublished studies registered in the GenBank database. In this study, 442 sequences from HCV-1a and 232 from HCV-1b underwent phylogenetic analysis. Phylogenetic analysis of all sequences revealed different clusters in the phylogenetic trees. The results showed that the proportion of HCV-1a and -1b isolates from Iranian patients probably originated from domestic sources. Moreover, the HCV-1b isolates from Iranian patients may have similarities with the European ones. In this study, phylogenetic reconstruction of HCV-1 sequences supported molecular tracing and ancestral relationships of the HCV genotypes in Iran, and showed the likelihood of a domestic origin for HCV-1a and various origins for HCV-1b.
Stratification of co-evolving genomic groups using ranked phylogenetic profiles

PubMed Central

Freilich, Shiri; Goldovsky, Leon; Gottlieb, Assaf; Blanc, Eric; Tsoka, Sophia; Ouzounis, Christos A

2009-01-01

Background: Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact on genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database. Results: The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach, a combination of sequence searches, statistical estimation and clustering, analyses the degree of sequence divergence between sets of protein sequences and allows the classification of protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples. Conclusion: Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples. PMID:19860884
ANCAC: amino acid, nucleotide, and codon analysis of COGs--a tool for sequence bias analysis in microbial orthologs.

PubMed

Meiler, Arno; Klinger, Claudia; Kaufmann, Michael

2012-09-08

The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. By definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence, resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC's NUCOCOG dataset as the largest one available for that purpose thus far. Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming skills.
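
The quantities this abstract mentions (codon frequencies and GC-content) are simple to compute for a single coding sequence. The sketch below is an illustration only and says nothing about how ANCAC or its MySQL backend is implemented; the function names are hypothetical, and the input is assumed to be an in-frame coding sequence, with any trailing incomplete codon ignored.

```python
from collections import Counter


def codon_counts(cds):
    """Count codons in an in-frame coding sequence; a trailing partial codon is ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def codon_frequencies(cds):
    """Return the relative frequency of each codon (fractions summing to 1)."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


def gc_content(seq):
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)
```

Applied per genome or per COG and then aggregated over a chosen set of species, functions like these would give the kind of species- or function-specific comparison the abstract describes.
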
ANCAC: amino acid, nucleotide, and codon analysis of COGs – a tool for sequence bias analysis in microbial orthologs

PubMed Central

2012-01-01

Background: The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. By definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence, resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Results: Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC’s NUCOCOG dataset as the largest one available for that purpose thus far. Conclusions: Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming skills. PMID:22958836
Detecting non-orthology in the COGs database and other approaches grouping orthologs using genome-specific best hits.

PubMed

Dessimoz, Christophe; Boeckmann, Brigitte; Roth, Alexander C J; Gonnet, Gaston H

2006-01-01

Correct orthology assignment is a critical prerequisite of numerous comparative genomics procedures, such as function prediction, construction of phylogenetic species trees and genome rearrangement analysis. We present an algorithm for the detection of non-orthologs that arise by mistake in current orthology classification methods based on genome-specific best hits, such as the COGs database. The algorithm works with pairwise distance estimates, rather than computationally expensive and error-prone tree-building methods. The accuracy of the algorithm is evaluated through verification of the distribution of predicted cases, case-by-case phylogenetic analysis and comparisons with predictions from other projects using independent methods. Our results show that a very significant fraction of the COG groups include non-orthologs: using conservative parameters, the algorithm detects non-orthology in a third of all COG groups. Consequently, sequence analysis sensitive to correct orthology assignments will greatly benefit from these findings.
Functional & phylogenetic diversity of copepod communities

NASA Astrophysics Data System (ADS)

Benedetti, F.; Ayata, S. D.; Blanco-Bercial, L.; Cornils, A.; Guilhaumon, F.

2016-02-01

The diversity of natural communities is classically estimated through species identification (taxonomic diversity) but can also be estimated from the ecological functions performed by the species (functional diversity), or from the phylogenetic relationships among them (phylogenetic diversity). Estimating functional diversity requires the definition of specific functional traits, i.e., phenotypic characteristics that impact fitness and are relevant to ecosystem functioning. Estimating phylogenetic diversity requires the description of phylogenetic relationships, for instance by using molecular tools. In the present study, we focused on the functional and phylogenetic diversity of copepod surface communities in the Mediterranean Sea. First, we implemented a specific trait database for the most commonly-sampled and abundant copepod species of the Mediterranean Sea. Our database includes 191 species, described by seven traits encompassing diverse ecological functions: minimal and maximal body length, trophic group, feeding type, spawning strategy, diel vertical migration and vertical habitat. Clustering analysis in the functional trait space revealed that Mediterranean copepods can be gathered into groups that have different ecological roles. Second, we reconstructed a phylogenetic tree using the available sequences of 18S rRNA. Our tree included 154 of the analyzed Mediterranean copepod species. We used these two datasets to describe the functional and phylogenetic diversity of copepod surface communities in the Mediterranean Sea. The replacement component (turn-over) and the species richness difference component (nestedness) of the beta diversity indices were identified. Finally, by comparing various and complementary aspects of plankton diversity (taxonomic, functional, and phylogenetic diversity) we were able to gain a better understanding of the relationships among the zooplankton community, biodiversity, ecosystem function, and environmental forcing.
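
The abstract does not say which beta diversity index was partitioned. As an illustration only, the sketch below assumes a Jaccard-based decomposition for presence/absence data, in which total dissimilarity splits additively into a replacement (turnover) term and a richness difference term. The function name and the species labels in the usage note are hypothetical.

```python
def partition_beta(site1, site2):
    """Split Jaccard dissimilarity between two species lists into two components.

    a = shared species, b and c = species unique to each site.
    jaccard = (b + c) / (a + b + c)
    replacement = 2 * min(b, c) / (a + b + c)
    richness_difference = |b - c| / (a + b + c)
    """
    s1, s2 = set(site1), set(site2)
    a = len(s1 & s2)
    b = len(s1 - s2)
    c = len(s2 - s1)
    total = a + b + c
    if total == 0:  # both sites empty: no dissimilarity to partition
        return {"jaccard": 0.0, "replacement": 0.0, "richness_difference": 0.0}
    return {
        "jaccard": (b + c) / total,
        "replacement": 2 * min(b, c) / total,
        "richness_difference": abs(b - c) / total,
    }
```

For two hypothetical samples `["sp_A", "sp_B", "sp_C", "sp_D"]` and `["sp_C", "sp_D", "sp_E"]`, two species are shared, two are unique to the first sample and one to the second, so the two components add up to the total Jaccard dissimilarity of 3/5.
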
ApiEST-DB: analyzing clustered EST data of the apicomplexan parasites.

PubMed

Li, Li; Crabtree, Jonathan; Fischer, Steve; Pinney, Deborah; Stoeckert, Christian J; Sibley, L David; Roos, David S

2004-01-01

ApiEST-DB (http://www.cbil.upenn.edu/paradbs-servlet/) provides integrated access to publicly available EST data from protozoan parasites in the phylum Apicomplexa. The database currently incorporates a total of nearly 100,000 ESTs from several parasite species of clinical and/or veterinary interest, including Eimeria tenella, Neospora caninum, Plasmodium falciparum, Sarcocystis neurona and Toxoplasma gondii. To facilitate analysis of these data, EST sequences were clustered and assembled to form consensus sequences for each organism, and these assemblies were then subjected to automated annotation via similarity searches against protein and domain databases. The underlying relational database infrastructure, Genomics Unified Schema (GUS), enables complex biologically based queries, facilitating validation of gene models, identification of alternative splicing, detection of single nucleotide polymorphisms, identification of stage-specific genes and recognition of phylogenetically conserved and phylogenetically restricted sequences.

Phylogenetic analysis of β-xylanase SRXL1 of Sporisorium reilianum and its relationship with families (GH10 and GH11) of Ascomycetes and Basidiomycetes

PubMed Central

Álvarez-Cervantes, Jorge; Díaz-Godínez, Gerardo; Mercado-Flores, Yuridia; Gupta, Vijai Kumar; Anducho-Reyes, Miguel Angel

2016-01-01

In this paper, the amino acid sequence of the β-xylanase SRXL1 of Sporisorium reilianum, a pathogenic fungus of maize, was used as a model protein to determine its phylogenetic relationship with other xylanases of Ascomycetes and Basidiomycetes; the information obtained made it possible to establish a hypothesis of monophyly and of biological role. A total of 84 amino acid sequences of β-xylanases obtained from the GenBank database were used. Analysis of higher-level groupings in the Pfam database showed that the proteins under study fall into the GH10 and GH11 families, based on regions of highly conserved amino acids (233–318 and 180–193, respectively) in which glutamate residues are responsible for catalysis. PMID:27040368
Web-Based Phylogenetic Assignment Tool for Analysis of Terminal Restriction Fragment Length Polymorphism Profiles of Microbial Communities

PubMed Central

Kent, Angela D.; Smith, Dan J.; Benson, Barbara J.; Triplett, Eric W.

2003-01-01

Culture-independent DNA fingerprints are commonly used to assess the diversity of a microbial community. However, relating species composition to community profiles produced by community fingerprint methods is not straightforward. Terminal restriction fragment length polymorphism (T-RFLP) is a community fingerprint method in which phylogenetic assignments may be inferred from the terminal restriction fragment (T-RF) sizes through the use of web-based resources that predict T-RF sizes for known bacteria. The process quickly becomes computationally intensive due to the need to analyze profiles produced by multiple restriction digests and the complexity of profiles generated by natural microbial communities. A web-based tool is described here that rapidly generates phylogenetic assignments from submitted community T-RFLP profiles based on a database of fragments produced by known 16S rRNA gene sequences. Users have the option of submitting a customized database generated from unpublished sequences or from a gene other than the 16S rRNA gene. This phylogenetic assignment tool allows users to employ T-RFLP to simultaneously analyze microbial community diversity and species composition. An analysis of the variability of bacterial species composition throughout the water column in a humic lake was carried out to demonstrate the functionality of the phylogenetic assignment tool. This method was validated by comparing the results generated by this program with results from a 16S rRNA gene clone library. PMID:14602639
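The core of a T-RF-based assignment, predicting the terminal fragment size for each known sequence and comparing it with an observed peak, can be sketched as follows. This is a minimal illustration and not the tool described in the abstract: the function names, the toy reference sequences and the 1-base tolerance are assumptions. The example uses HaeIII, which cuts GG^CC.

```python
def terminal_fragment_length(amplicon, site, cut_offset):
    """Length of the 5' terminal fragment after digestion.

    amplicon   : sequence starting at the labelled forward primer
    site       : recognition sequence, e.g. "GGCC"
    cut_offset : number of bases of the site that stay on the 5' fragment
    Returns None when the enzyme does not cut the amplicon.
    """
    pos = amplicon.upper().find(site.upper())
    if pos == -1:
        return None
    return pos + cut_offset


def assign(observed_size, reference, site, cut_offset, tolerance=1):
    """Names of reference sequences whose predicted T-RF matches the peak."""
    hits = []
    for name, seq in reference.items():
        predicted = terminal_fragment_length(seq, site, cut_offset)
        if predicted is not None and abs(predicted - observed_size) <= tolerance:
            hits.append(name)
    return hits


# Toy reference "database" (hypothetical sequences, far shorter than real amplicons).
reference = {
    "ref_A": "ATGCATGGCCTTAA",
    "ref_B": "ATGGCCAAAATTTT",
    "ref_C": "ATATATATAT",
}
print(assign(8, reference, "GGCC", 2))
```

With several restriction digests, one plausible extension is to run `assign` once per enzyme and intersect the hit lists, which narrows the candidates for each community member.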
galaxie--CGI scripts for sequence identification through automated phylogenetic analysis.

PubMed

Nilsson, R Henrik; Larsson, Karl-Henrik; Ursing, Björn M

2004-06-12

The prevalent use of similarity searches like BLAST to identify sequences and species implicitly assumes the reference database to be of extensive sequence sampling. This is often not the case, restraining the correctness of the outcome as a basis for sequence identification. Phylogenetic inference outperforms similarity searches in retrieving correct phylogenies and consequently sequence identities, and a project was initiated to design a freely available script package for sequence identification through automated Web-based phylogenetic analysis. Three CGI scripts were designed to facilitate qualified sequence identification from a Web interface. Query sequences are aligned to pre-made alignments or to alignments made by ClustalW with entries retrieved from a BLAST search. The subsequent phylogenetic analysis is based on the PHYLIP package for inferring neighbor-joining and parsimony trees. The scripts are highly configurable. A service installation and a version for local use are found at http://andromeda.botany.gu.se/galaxiewelcome.html and http://galaxie.cgb.ki.se
The COG database: a tool for genome-scale analysis of protein functions and evolution

PubMed Central

Tatusov, Roman L.; Galperin, Michael Y.; Natale, Darren A.; Koonin, Eugene V.

2000-01-01

Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56–83% of the gene products from each of the complete bacterial and archaeal genomes and ~35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program that is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes. PMID:10592175
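One way to picture the "consistency of genome-specific best hits" criterion is as a search for triples of proteins, one from each of three genomes, that are all each other's best hits. The sketch below is a simplified illustration under that assumption (strictly reciprocal best hits, brute-force enumeration, hypothetical protein names); it is not the published COG construction procedure.

```python
from itertools import combinations


def best_hit_triangles(best_hit, genome_of):
    """Find triples of proteins from three genomes that are mutual best hits.

    best_hit  : dict mapping (protein, target_genome) -> best-scoring protein
                in that genome
    genome_of : dict mapping protein -> genome it is encoded in
    """
    def mutual(p, q):
        # p's best hit in q's genome is q, and vice versa.
        return (best_hit.get((p, genome_of[q])) == q
                and best_hit.get((q, genome_of[p])) == p)

    triangles = []
    for p, q, r in combinations(sorted(genome_of), 3):
        if len({genome_of[p], genome_of[q], genome_of[r]}) < 3:
            continue  # the three proteins must come from three genomes
        if mutual(p, q) and mutual(q, r) and mutual(p, r):
            triangles.append((p, q, r))
    return triangles


# Hypothetical toy input: a2 hits b1 and c1, but neither hits it back.
genome_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
best_hit = {
    ("a1", "B"): "b1", ("a1", "C"): "c1",
    ("b1", "A"): "a1", ("b1", "C"): "c1",
    ("c1", "A"): "a1", ("c1", "B"): "b1",
    ("a2", "B"): "b1", ("a2", "C"): "c1",
}
print(best_hit_triangles(best_hit, genome_of))
```

The cubic enumeration is only workable for toy inputs; a practical version would start from the pairs of reciprocal best hits and extend them.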
Systematic analysis of snake neurotoxins' functional classification using a data warehousing approach.

PubMed

Siew, Joyce Phui Yee; Khan, Asif M; Tan, Paul T J; Koh, Judice L Y; Seah, Seng Hong; Koo, Chuay Yeng; Chai, Siaw Ching; Armugam, Arunmozhiarasi; Brusic, Vladimir; Jeyaseelan, Kandiah

2004-12-12

Sequence annotations, functional and structural data on snake venom neurotoxins (svNTXs) are scattered across multiple databases and literature sources. Sequence annotations and structural data are available in the public molecular databases, while functional data are almost exclusively available in the published articles. There is a need for a specialized svNTXs database that contains NTX entries, which are organized, well annotated and classified in a systematic manner. We have systematically analyzed svNTXs and classified them using structure-function groups based on their structural, functional and phylogenetic properties. Using conserved motifs in each phylogenetic group, we built an intelligent module for the prediction of structural and functional properties of unknown NTXs. We also developed an annotation tool to aid the functional prediction of newly identified NTXs as an additional resource for the venom research community. We created a searchable online database of NTX protein sequences (http://research.i2r.a-star.edu.sg/Templar/DB/snake_neurotoxin). This database can also be found under the Swiss-Prot Toxin Annotation Project website (http://www.expasy.org/sprot/).
Evolutionary tools for phytosanitary risk analysis: phylogenetic signal as a predictor of host range of plant pests and pathogens

PubMed Central

Gilbert, Gregory S; Magarey, Roger; Suiter, Karl; Webb, Campbell O

2012-01-01

Assessing risk from a novel pest or pathogen requires knowing which local plant species are susceptible. Empirical data on the local host range of novel pests are usually lacking, but we know that some pests are more likely to attack closely related plant species than species separated by greater evolutionary distance. We use the Global Pest and Disease Database, an internal database maintained by the United States Department of Agriculture Animal and Plant Health Inspection Service – Plant Protection and Quarantine Division (USDA APHIS-PPQ), to evaluate the strength of the phylogenetic signal in host range for nine major groups of plant pests and pathogens. Eight of nine groups showed significant phylogenetic signal in host range. Additionally, pests and pathogens with more known hosts attacked a phylogenetically broader range of hosts. This suggests that easily obtained data – the number of known hosts and the phylogenetic distance between known hosts and other species of interest – can be used to predict which plant species are likely to be susceptible to a particular pest. This can facilitate rapid assessment of risk from novel pests and pathogens when empirical host range data are not yet available and guide efficient collection of empirical data for risk evaluation. PMID:23346231
Evolutionary tools for phytosanitary risk analysis: phylogenetic signal as a predictor of host range of plant pests and pathogens.

PubMed

Gilbert, Gregory S; Magarey, Roger; Suiter, Karl; Webb, Campbell O

2012-12-01

Assessing risk from a novel pest or pathogen requires knowing which local plant species are susceptible. Empirical data on the local host range of novel pests are usually lacking, but we know that some pests are more likely to attack closely related plant species than species separated by greater evolutionary distance. We use the Global Pest and Disease Database, an internal database maintained by the United States Department of Agriculture Animal and Plant Health Inspection Service - Plant Protection and Quarantine Division (USDA APHIS-PPQ), to evaluate the strength of the phylogenetic signal in host range for nine major groups of plant pests and pathogens. Eight of nine groups showed significant phylogenetic signal in host range. Additionally, pests and pathogens with more known hosts attacked a phylogenetically broader range of hosts. This suggests that easily obtained data - the number of known hosts and the phylogenetic distance between known hosts and other species of interest - can be used to predict which plant species are likely to be susceptible to a particular pest. This can facilitate rapid assessment of risk from novel pests and pathogens when empirical host range data are not yet available and guide efficient collection of empirical data for risk evaluation.
The COG database: new developments in phylogenetic classification of proteins from complete genomes

PubMed Central

Tatusov, Roman L.; Natale, Darren A.; Garkavtsev, Igor V.; Tatusova, Tatiana A.; Shankavaram, Uma T.; Rao, Bachoti S.; Kiryutin, Boris; Galperin, Michael Y.; Fedorova, Natalie D.; Koonin, Eugene V.

2001-01-01

The database of Clusters of Orthologous Groups of proteins (COGs), which represents an attempt at a phylogenetic classification of the proteins encoded in complete genomes, currently consists of 2791 COGs including 45 350 proteins from 30 genomes of bacteria, archaea and the yeast Saccharomyces cerevisiae (http://www.ncbi.nlm.nih.gov/COG). In addition, a supplement to the COGs is available, in which proteins encoded in the genomes of two multicellular eukaryotes, the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster, and shared with bacteria and/or archaea were included. The new features added to the COG database include information pages with structural and functional details on each COG and literature references, improvements of the COGNITOR program that is used to fit new proteins into the COGs, and classification of genomes and COGs constructed by using principal component analysis. PMID:11125040
Bridging meta-analysis and the comparative method: a test of seed size effect on germination after frugivores' gut passage.

PubMed

Verdú, Miguel; Traveset, Anna

2004-02-01

Most studies using meta-analysis try to establish relationships between traits across taxa from interspecific databases and, thus, the phylogenetic relatedness among these taxa should be taken into account to avoid pseudoreplication derived from common ancestry. This paper illustrates, with a representative example of the relationship between seed size and the effect of frugivores' gut passage on seed germination, that meta-analytic procedures can also be phylogenetically corrected by means of the comparative method. The conclusions obtained in the meta-analytical and phylogenetic approaches are very different. The meta-analysis revealed that the positive effects of gut passage on seed germination increased with seed size in the case of gut passage through birds, whereas they decreased in the case of gut passage through non-flying mammals. However, once the phylogenetic relatedness among plant species was taken into account, the effects of gut passage on seed germination did not depend on seed size and were similar between birds and non-flying mammals. Some methodological considerations are given to improve the bridge between meta-analysis and the comparative method.
Genetic analysis of duck circovirus in Pekin ducks from South Korea.

PubMed

Cha, S-Y; Kang, M; Cho, J-G; Jang, H-K

2013-11-01

The genetic organization of the 24 duck circovirus (DuCV) strains detected in commercial Pekin ducks from South Korea between 2011 and 2012 is described in this study. Multiple sequence alignment and phylogenetic analyses were performed on the 24 viral genome sequences as well as on 45 genome sequences available from the GenBank database. Phylogenetic analyses based on the genomic and open reading frame 2/cap sequences demonstrated that all DuCV strains belonged to genotype 1 and were designated in a subcluster under genotype 1. Analysis of the capsid protein amino acid sequences of the 24 Korean DuCV strains showed 10 substitutions compared with that of other genotype 1 strains. Our analysis showed that genotype 1 is predominant and circulating in South Korea. These present results serve as incentive to add more data to the DuCV database and provide insight to conduct further intensive study on the geographic relationships among these virus strains.
Gramene database: navigating plant comparative genomics resources

USDA-ARS's Scientific Manuscript database

Gramene (http://www.gramene.org) is an online, open source, curated resource for plant comparative genomics and pathway analysis designed to support researchers working in plant genomics, breeding, evolutionary biology, system biology, and metabolic engineering. It exploits phylogenetic relationship...
VKCDB: voltage-gated K+ channel database updated and upgraded.

PubMed

Gallin, Warren J; Boutet, Patrick A

2011-01-01

The Voltage-gated K(+) Channel DataBase (VKCDB) (http://vkcdb.biology.ualberta.ca) makes a comprehensive set of sequence data readily available for phylogenetic and comparative analysis. The current update contains 2063 entries for full-length or nearly full-length unique channel sequences from Bacteria (477), Archaea (18) and Eukaryotes (1568), an increase from 346 solely eukaryotic entries in the original release. In addition to protein sequences for channels, corresponding nucleotide sequences of the open reading frames corresponding to the amino acid sequences are now available and can be extracted in parallel with sets of protein sequences. Channels are categorized into subfamilies by phylogenetic analysis and by using hidden Markov model analyses. Although the raw database contains a number of fragmentary, duplicated, obsolete and non-channel sequences that were collected in early steps of data collection, the web interface will only return entries that have been validated as likely K(+) channels. The retrieval function of the web interface allows retrieval of entries that contain a substantial fraction of the core structural elements of VKCs, fragmentary entries, or both. The full database can be downloaded as either a MySQL dump or as an XML dump from the web site. We have now implemented automated updates at quarterly intervals.
geophylobuilder 1.0: an arcgis extension for creating 'geophylogenies'.

PubMed

Kidd, David M; Liu, Xianhua

2008-01-01

Evolution is inherently a spatiotemporal process; however, despite this, phylogenetic and geographical data and models remain largely isolated from one another. Geographical information systems provide a ready-made spatial modelling, analysis and dissemination environment within which phylogenetic models can be explicitly linked with their associated spatial data and subsequently integrated with other georeferenced data sets describing the biotic and abiotic environment. geophylobuilder 1.0 is an extension for the arcgis geographical information system that builds a 'geophylogenetic' data model from a phylogenetic tree and associated geographical data. Geophylogenetic database objects can subsequently be queried, spatially analysed and visualized in both 2D and 3D within a geographical information system. © 2007 The Authors.
The European Classical Swine Fever Virus Database: Blueprint for a Pathogen-Specific Sequence Database with Integrated Sequence Analysis Tools

PubMed Central

Postel, Alexander; Schmeiser, Stefanie; Zimmermann, Bernd; Becher, Paul

2016-01-01

Molecular epidemiology has become an indispensable tool in the diagnosis of diseases and in tracing the infection routes of pathogens. Due to advances in conventional sequencing and the development of high throughput technologies, the field of sequence determination is in the process of being revolutionized. Platforms for sharing sequence information and providing standardized tools for phylogenetic analyses are becoming increasingly important. The database (DB) of the European Union (EU) and World Organisation for Animal Health (OIE) Reference Laboratory for classical swine fever offers one of the world’s largest semi-public virus-specific sequence collections combined with a module for phylogenetic analysis. The classical swine fever (CSF) DB (CSF-DB) became a valuable tool for supporting diagnosis and epidemiological investigations of this highly contagious disease in pigs with high socio-economic impacts worldwide. The DB has been re-designed and now allows for the storage and analysis of traditionally used, well established genomic regions and of larger genomic regions including complete viral genomes. We present an application example for the analysis of highly similar viral sequences obtained in an endemic disease situation and introduce the new geographic “CSF Maps” tool. The concept of this standardized and easy-to-use DB with an integrated genetic typing module is suited to serve as a blueprint for similar platforms for other human or animal viruses. PMID:27827988
Covariant Evolutionary Event Analysis for Base Interaction Prediction Using a Relational Database Management System for RNA.

PubMed

Xu, Weijia; Ozer, Stuart; Gutell, Robin R

2009-01-01

With an increasingly large number of sequences properly aligned, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large-scale alignments and are less effective with sequences from diverse phylogenetic classifications. We propose a new approach that utilizes coevolutional rates among pairs of nucleotide positions using phylogenetic and evolutionary relationships of the organisms of aligned sequences. With a novel data schema to manage relevant information within a relational database, our method, implemented with Microsoft SQL Server 2005, showed 90% sensitivity in identifying base pair interactions among 16S ribosomal RNA sequences from Bacteria, at a scale 40 times larger and with 50% better sensitivity than a previous study. The results also indicated covariation signals for a few sets of cross-strand base stacking pairs in secondary structure helices, and other subtle constraints in the RNA structure.
Covariant Evolutionary Event Analysis for Base Interaction Prediction Using a Relational Database Management System for RNA

PubMed Central

Xu, Weijia; Ozer, Stuart; Gutell, Robin R.

2010-01-01

With an increasingly large number of sequences properly aligned, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large-scale alignments and are less effective with sequences from diverse phylogenetic classifications. We propose a new approach that utilizes coevolutional rates among pairs of nucleotide positions using phylogenetic and evolutionary relationships of the organisms of aligned sequences. With a novel data schema to manage relevant information within a relational database, our method, implemented with Microsoft SQL Server 2005, showed 90% sensitivity in identifying base pair interactions among 16S ribosomal RNA sequences from Bacteria, at a scale 40 times larger and with 50% better sensitivity than a previous study. The results also indicated covariation signals for a few sets of cross-strand base stacking pairs in secondary structure helices, and other subtle constraints in the RNA structure. PMID:20502534
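The idea of detecting base pairs from covarying alignment columns can be illustrated with mutual information, a simpler and widely used covariation score. This is a swapped-in technique for illustration only: it ignores the phylogenetic relationships that the authors' event-based method relies on, and the four-sequence alignment is a hypothetical toy input.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    alignment : list of equal-length aligned sequences
    High values mean the two positions change together, as expected for
    positions that form a base pair.
    """
    n = len(alignment)
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    freq_i, freq_j = Counter(col_i), Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (x, y), count in freq_ij.items():
        p_xy = count / n
        mi += p_xy * log2(p_xy / ((freq_i[x] / n) * (freq_j[y] / n)))
    return mi


# Toy alignment: columns 0 and 2 always form a Watson-Crick pair,
# column 1 is invariant.
alignment = ["GAC", "CAG", "AAU", "UAA"]
print(mutual_information(alignment, 0, 2))
print(mutual_information(alignment, 0, 1))
```

Plain mutual information treats every sequence as an independent observation, so closely related sequences inflate the signal; accounting for shared ancestry is the kind of correction a phylogeny-aware approach aims to provide.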
A curated database of cyanobacterial strains relevant for modern taxonomy and phylogenetic studies.

PubMed

Ramos, Vitor; Morais, João; Vasconcelos, Vitor M

2017-04-25

The dataset herein described lays the groundwork for an online database of relevant cyanobacterial strains, named CyanoType (http://lege.ciimar.up.pt/cyanotype). It is a database that includes categorized cyanobacterial strains useful for taxonomic, phylogenetic or genomic purposes, with associated information obtained by means of a literature-based curation. The dataset lists 371 strains and represents the first version of the database (CyanoType v.1). Information for each strain includes strain synonymy and/or co-identity, strain categorization, habitat, accession numbers for molecular data, taxonomy and nomenclature notes according to three different classification schemes, hierarchical automatic classification, phylogenetic placement according to a selection of relevant studies (including this), and important bibliographic references. The database will be updated periodically, namely by adding new strains meeting the criteria for inclusion and by revising and adding up-to-date metadata for strains already listed. A global 16S rDNA-based phylogeny is provided in order to assist users when choosing the appropriate strains for their studies.
A curated database of cyanobacterial strains relevant for modern taxonomy and phylogenetic studies

PubMed Central

Ramos, Vitor; Morais, João; Vasconcelos, Vitor M.

2017-01-01

The dataset herein described lays the groundwork for an online database of relevant cyanobacterial strains, named CyanoType (http://lege.ciimar.up.pt/cyanotype). It is a database that includes categorized cyanobacterial strains useful for taxonomic, phylogenetic or genomic purposes, with associated information obtained by means of a literature-based curation. The dataset lists 371 strains and represents the first version of the database (CyanoType v.1). Information for each strain includes strain synonymy and/or co-identity, strain categorization, habitat, accession numbers for molecular data, taxonomy and nomenclature notes according to three different classification schemes, hierarchical automatic classification, phylogenetic placement according to a selection of relevant studies (including this), and important bibliographic references. The database will be updated periodically, namely by adding new strains meeting the criteria for inclusion and by revising and adding up-to-date metadata for strains already listed. A global 16S rDNA-based phylogeny is provided in order to assist users when choosing the appropriate strains for their studies. PMID:28440791
An Improved Binary Differential Evolution Algorithm to Infer Tumor Phylogenetic Trees.

PubMed

Liang, Ying; Liao, Bo; Zhu, Wen

2017-01-01

Tumourigenesis is a mutation accumulation process, which is likely to start with a mutated founder cell. The evolutionary nature of tumor development makes phylogenetic models suitable for inferring tumor evolution through genetic variation data. Copy number variation (CNV) is the major genetic marker of the genome with more genes, disease loci, and functional elements involved. Fluorescence in situ hybridization (FISH) accurately measures the copy number of multiple genes in hundreds of single cells. We propose an improved binary differential evolution algorithm, BDEP, to infer tumor phylogenetic trees based on the FISH platform. The topology analysis of the tumor progression tree shows that the pathway of tumor subcell expansion varies greatly during different stages of tumor formation. The classification experiment shows that tree-based features are better than data-based features in distinguishing tumors. The constructed phylogenetic trees perform well in characterizing the tumor development process, and the algorithm outperforms other similar algorithms.
The cytochrome P450 genes of channel catfish: their involvement in disease defense responses as revealed by meta-analysis of RNA-Seq datasets

USDA-ARS's Scientific Manuscript database

Cytochrome P450s (CYPs) encode one of the most diverse enzyme superfamilies in nature. They catalyze oxidative reactions of endogenous molecules and exogenous chemicals. Methods: We identified CYP genes through in silico analysis using EST, RNA-Seq and genome databases of channel catfish. Phylogenetic ...

A Visual Interface for Querying Heterogeneous Phylogenetic Databases.

PubMed

Jamil, Hasan M

2017-01-01

Despite the recent growth in the number of phylogenetic databases, access to this wealth of resources remains largely driven by tool- or form-based interfaces. It is our thesis that the flexibility afforded by declarative query languages may offer the opportunity to access these repositories in a better way, and to use such a language to pose truly powerful queries in unprecedented ways. In this paper, we propose a substantially enhanced closed visual query language, called PhyQL, that can be used to query phylogenetic databases represented in a canonical form. The canonical representation presented helps capture most phylogenetic tree formats in a convenient way, and is used as the storage model for our PhyloBase database for which PhyQL serves as the query language. We have implemented a visual interface for the end users to pose PhyQL queries using visual icons, and drag and drop operations defined over them. Once a query is posed, the interface translates the visual query into a Datalog query for execution over the canonical database. Responses are returned as hyperlinks to phylogenies that can be viewed in several formats using the tree viewers supported by PhyloBase. Results cached in the PhyQL buffer allow secondary querying on the computed results, making it a truly powerful querying architecture.
A comprehensive aligned nifH gene database: a multipurpose tool for studies of nitrogen-fixing bacteria.

PubMed

Gaby, John Christian; Buckley, Daniel H

2014-01-01

We describe a nitrogenase gene sequence database that facilitates analysis of the evolution and ecology of nitrogen-fixing organisms. The database contains 32 954 aligned nitrogenase nifH sequences linked to phylogenetic trees and associated sequence metadata. The database includes 185 linked multigene entries including full-length nifH, nifD, nifK and 16S ribosomal RNA (rRNA) gene sequences. Evolutionary analyses enabled by the multigene entries support an ancient horizontal transfer of nitrogenase genes between Archaea and Bacteria and provide evidence that nifH has a different history of horizontal gene transfer from the nifDK enzyme core. Further analyses show that lineages in nitrogenase cluster I and cluster III have different rates of substitution within nifD, suggesting that nifD is under different selection pressure in these two lineages. Finally, we find that the genetic divergence of nifH and 16S rRNA genes does not correlate well at sequence dissimilarity values used commonly to define microbial species, as strains having <3% sequence dissimilarity in their 16S rRNA genes can have up to 23% dissimilarity in nifH. The nifH database has a number of uses including phylogenetic and evolutionary analyses, the design and assessment of primers/probes and the evaluation of nitrogenase sequence diversity. Database URL: http://www.css.cornell.edu/faculty/buckley/nifh.htm.
A comprehensive aligned nifH gene database: a multipurpose tool for studies of nitrogen-fixing bacteria

PubMed Central

Gaby, John Christian; Buckley, Daniel H.

2014-01-01

We describe a nitrogenase gene sequence database that facilitates analysis of the evolution and ecology of nitrogen-fixing organisms. The database contains 32 954 aligned nitrogenase nifH sequences linked to phylogenetic trees and associated sequence metadata. The database includes 185 linked multigene entries including full-length nifH, nifD, nifK and 16S ribosomal RNA (rRNA) gene sequences. Evolutionary analyses enabled by the multigene entries support an ancient horizontal transfer of nitrogenase genes between Archaea and Bacteria and provide evidence that nifH has a different history of horizontal gene transfer from the nifDK enzyme core. Further analyses show that lineages in nitrogenase cluster I and cluster III have different rates of substitution within nifD, suggesting that nifD is under different selection pressure in these two lineages. Finally, we find that the genetic divergence of nifH and 16S rRNA genes does not correlate well at sequence dissimilarity values used commonly to define microbial species, as strains having <3% sequence dissimilarity in their 16S rRNA genes can have up to 23% dissimilarity in nifH. The nifH database has a number of uses including phylogenetic and evolutionary analyses, the design and assessment of primers/probes and the evaluation of nitrogenase sequence diversity. Database URL: http://www.css.cornell.edu/faculty/buckley/nifh.htm PMID:24501396
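The dissimilarity percentages quoted above compare pairs of aligned sequences. A minimal sketch of such a measure is the uncorrected proportion of mismatched positions, shown below; whether the authors used this exact definition (for example, how gaps were treated) is an assumption, and the sequences are toy inputs.

```python
def percent_dissimilarity(seq1, seq2, gap="-"):
    """Uncorrected pairwise dissimilarity (%) between two aligned sequences.

    Positions where either sequence has a gap are skipped (an assumed
    convention, not taken from the abstract).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


# Toy aligned fragments differing at one of ten positions.
print(percent_dissimilarity("ACGTACGTAC", "ACGTACGTAA"))
```

Applied to the 16S rRNA and nifH alignments of the same pair of strains, two calls to this function would give the pair of values whose weak correlation the abstract reports.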
Classifying the bacterial gut microbiota of termites and cockroaches: A curated phylogenetic reference database (DictDb).

PubMed

Mikaelyan, Aram; Köhler, Tim; Lampert, Niclas; Rohland, Jeffrey; Boga, Hamadi; Meuser, Katja; Brune, Andreas

2015-10-01

Recent developments in sequencing technology have given rise to a large number of studies that assess bacterial diversity and community structure in termite and cockroach guts based on large amplicon libraries of 16S rRNA genes. Although these studies have revealed important ecological and evolutionary patterns in the gut microbiota, classification of the short sequence reads is limited by the taxonomic depth and resolution of the reference databases used in the respective studies. Here, we present a curated reference database for accurate taxonomic analysis of the bacterial gut microbiota of dictyopteran insects. The Dictyopteran gut microbiota reference Database (DictDb) is based on the Silva database but was significantly expanded by the addition of clones from 11 mostly unexplored termite and cockroach groups, which increased the inventory of bacterial sequences from dictyopteran guts by 26%. The taxonomic depth and resolution of DictDb was significantly improved by a general revision of the taxonomic guide tree for all important lineages, including a detailed phylogenetic analysis of the Treponema and Alistipes complexes, the Fibrobacteres, and the TG3 phylum. The performance of this first documented version of DictDb (v. 3.0) using the revised taxonomic guide tree in the classification of short-read libraries obtained from termites and cockroaches was highly superior to that of the current Silva and RDP databases. DictDb uses an informative nomenclature that is consistent with the literature also for clades of uncultured bacteria and provides an invaluable tool for anyone exploring the gut community structure of termites and cockroaches. Copyright © 2015 Elsevier GmbH. All rights reserved.
Internet-accessible DNA sequence database for identifying fusaria from human and animal infections.

PubMed

O'Donnell, Kerry; Sutton, Deanna A; Rinaldi, Michael G; Sarver, Brice A J; Balajee, S Arunmozhi; Schroers, Hans-Josef; Summerbell, Richard C; Robert, Vincent A R G; Crous, Pedro W; Zhang, Ning; Aoki, Takayuki; Jung, Kyongyong; Park, Jongsun; Lee, Yong-Hwan; Kang, Seogchan; Park, Bongsoo; Geiser, David M

2010-10-01

Because less than one-third of clinically relevant fusaria can be accurately identified to species level using phenotypic data (i.e., morphological species recognition), we constructed a three-locus DNA sequence database to facilitate molecular identification of the 69 Fusarium species associated with human or animal mycoses encountered in clinical microbiology laboratories. The database comprises partial sequences from three nuclear genes: translation elongation factor 1α (EF-1α), the largest subunit of RNA polymerase (RPB1), and the second largest subunit of RNA polymerase (RPB2). These three gene fragments can be amplified by PCR and sequenced using primers that are conserved across the phylogenetic breadth of Fusarium. Phylogenetic analyses of the combined data set reveal that, with the exception of two monotypic lineages, all clinically relevant fusaria are nested in one of eight variously sized and strongly supported species complexes. The monophyletic lineages have been named informally to facilitate communication of an isolate's clade membership and genetic diversity. To identify isolates to the species included within the database, partial DNA sequence data from one or more of the three genes can be used as a BLAST query against the database which is Web accessible at FUSARIUM-ID (http://isolate.fusariumdb.org) and the Centraalbureau voor Schimmelcultures (CBS-KNAW) Fungal Biodiversity Center (http://www.cbs.knaw.nl/fusarium). Alternatively, isolates can be identified via phylogenetic analysis by adding sequences of unknowns to the DNA sequence alignment, which can be downloaded from the two aforementioned websites. 
The utility of this database should increase significantly as members of the clinical microbiology community deposit in internationally accessible culture collections (e.g., CBS-KNAW or the Fusarium Research Center) cultures of novel mycosis-associated fusaria, along with associated, corrected sequence chromatograms and data, so that the sequence results can be verified and isolates are made available for future study.
Biological pattern and transcriptomic exploration and phylogenetic analysis in the odd floral architecture tree: Helwingia willd.

PubMed

Sun, Cheng; Yu, Guoliang; Bao, Manzhu; Zheng, Bo; Ning, Guogui

2014-06-27

Unusual traits found in only a few plant species often point to biologically significant events in plant evolution. The genus Helwingia Willd., a dioecious medicinal shrub in the order Aquifoliales, has an unusual floral architecture: an epiphyllous inflorescence. The potential significance and possible evolutionary origin of this species are not well understood because few biological and genetic data are available. In addition, the advent of genomics-based technologies has revolutionized the study of plant species that lack genomic information. Morphological and biological patterns were detailed via anatomical and pollination analyses. An RNA sequencing-based transcriptomic analysis was undertaken, and a high-resolution phylogenetic analysis was conducted based on single-copy genes in more than 80 species of seed plants, including H. japonica. It is verified that a potential fusion of the rachis to the leaf midvein facilitates insect pollination. RNA sequencing yielded a total of 111450 unigenes; half of them had significant similarity with proteins in the public database, and 20281 unigenes were mapped to 119 pathways. The phylogenetic analysis based on single-copy genes places Helwingia closer to Euasterids II rather than Euasterids, congruent with previous reports using plastid sequences. The odd floral architecture adapts Helwingia to insect pollination by letting the leaf host insects larger than the flower, a character shared with few other insect-pollinated plants. Furthermore, the present transcriptome greatly enriches the genomic information available for Helwingia species, and the phylogenetic analysis based on nuclear genes greatly improves the resolution and robustness of phylogenetic reconstruction for H. japonica.
Easy-to-use phylogenetic analysis system for hepatitis B virus infection.

PubMed

Sugiyama, Masaya; Inui, Ayano; Shin-I, Tadasu; Komatsu, Haruki; Mukaide, Motokazu; Masaki, Naohiko; Murata, Kazumoto; Ito, Kiyoaki; Nakanishi, Makoto; Fujisawa, Tomoo; Mizokami, Masashi

2011-10-01

Molecular phylogenetic analysis has been broadly applied in clinical and virological studies. However, choosing appropriate settings and calculation parameters is difficult for non-specialists in molecular genetics. In the present study, a phylogenetic analysis tool was developed for the easy determination of genotypes and transmission routes. A total of 23 patients from 10 families infected with hepatitis B virus (HBV), in whom intrafamilial transmission was suspected, were enrolled. The extracted HBV DNA was amplified and sequenced in a region of the S gene. Software to automatically classify query sequences was constructed and installed on the Hepatitis Virus Database (HVDB). Reference sequences, which covered the major genotypes A to H, were retrieved from HVDB. Multiple alignments using CLUSTAL W were performed before the genetic distance matrix was calculated with the six-parameter method. The phylogenetic tree was output by the neighbor-joining method. A web-browser user interface was also developed for intuitive control. This system was named the easy-to-use phylogenetic analysis system (E-PAS). Twenty-three sera from 10 families were analyzed to evaluate E-PAS. The queries obtained from nine families were genotype C and were located in one cluster per family. However, one patient was classified into a cluster different from that of her family, suggesting that E-PAS detected a sample whose transmission route was distinct from that of her family. E-PAS outputs a phylogenetic tree and requires only sequence data as input. E-PAS could be expanded to determine HBV genotypes as well as transmission routes. © 2011 The Japan Society of Hepatology.
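The distance-matrix step of such a pipeline can be illustrated with a minimal sketch. It is not the E-PAS implementation: it assumes the sequences are already aligned, it swaps in the simple p-distance (proportion of differing sites) for the six-parameter method named in the abstract, and it assigns a genotype by nearest reference rather than by building a neighbor-joining tree. The function names, genotype labels and toy sequences are all hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs: dict) -> dict:
    """Pairwise p-distances keyed by (name, name)."""
    return {(m, n): p_distance(seqs[m], seqs[n]) for m in seqs for n in seqs}


def closest_reference(query: str, references: dict) -> str:
    """Name of the reference sequence with the smallest distance to the query."""
    return min(references, key=lambda name: p_distance(query, references[name]))


# Toy aligned fragments (hypothetical, not real HBV S-gene sequences).
references = {
    "genotype_A": "ACGTACGTAC",
    "genotype_C": "ACGTTCGAAC",
}
query = "ACGTTCGAAT"

matrix = distance_matrix(references)
best = closest_reference(query, references)
```

A real workflow would feed the full distance matrix to a tree-building method such as neighbor joining, so that family members can be checked for clustering rather than only matched to a genotype.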
PHYLOGENETIC AFFILIATION OF WATER DISTRIBUTION SYSTEM BACTERIAL ISOLATES USING 16S RDNA SEQUENCE ANALYSIS

EPA Science Inventory

In a previously described study, only 15% of the bacterial strains isolated from a water distribution system (WDS) grown on R2A agar were identifiable using fatty acid methyl esters (FAME) profiling. The lack of success was attributed to the use of fatty acid databases of bacter...
Source environment feature related phylogenetic distribution pattern of anoxygenic photosynthetic bacteria as revealed by pufM analysis.

PubMed

Zeng, Yonghui; Jiao, Nianzhi

2007-06-01

Anoxygenic photosynthesis, performed primarily by anoxygenic photosynthetic bacteria (APB), is thought to have arisen on Earth more than 3 billion years ago. The long-established APB are distributed in almost every environment that light can reach. However, the relationship between APB phylogeny and source environments has been largely unexplored. Here we retrieved the pufM sequences and related source information of 89 pufM-containing species from the public database. Phylogenetic analysis revealed that horizontal gene transfer (HGT) most likely occurred within 11 of the 21 pufM subgroups, not only among species within the same class but also among species of different phyla or subphyla. A clear phylogenetic distribution pattern related to source environment features was observed, with species from oxic habitats and those from anoxic habitats clustering into separate subgroups. HGT among ancient APB and subsequent long-term evolution and adaptation to separated niches may have contributed to the coupling of environment and pufM phylogeny.
The Gypsy Database (GyDB) of mobile genetic elements: release 2.0

PubMed Central

Llorens, Carlos; Futami, Ricardo; Covelli, Laura; Domínguez-Escribá, Laura; Viu, Jose M.; Tamarit, Daniel; Aguilar-Rodríguez, Jose; Vicente-Ripolles, Miguel; Fuster, Gonzalo; Bernet, Guillermo P.; Maumus, Florian; Munoz-Pomer, Alfonso; Sempere, Jose M.; Latorre, Amparo; Moya, Andres

2011-01-01

This article introduces the second release of the Gypsy Database of Mobile Genetic Elements (GyDB 2.0): a research project devoted to the evolutionary dynamics of viruses and transposable elements based on their phylogenetic classification (per lineage and protein domain). The Gypsy Database (GyDB) is a long-term project that is continuously progressing and that, owing to the high molecular diversity of mobile elements, must be completed in several stages. GyDB 2.0 has been powered with a wiki to allow other researchers to participate in the project. The current database stage and scope are long terminal repeat (LTR) retroelements and relatives. GyDB 2.0 is an update based on the analysis of Ty3/Gypsy, Retroviridae, Ty1/Copia and Bel/Pao LTR retroelements and the Caulimoviridae pararetroviruses of plants. Among other features, in terms of the aforementioned topics, this update adds: (i) a variety of descriptions and reviews distributed in multiple web pages; (ii) protein-based phylogenies, where phylogenetic levels are assigned to distinct classified elements; (iii) a collection of multiple alignments, lineage-specific hidden Markov models and consensus sequences, called GyDB collection; (iv) updated RefSeq databases and BLAST and HMM servers to facilitate sequence characterization of new LTR retroelement and caulimovirus queries; and (v) a bibliographic server. GyDB 2.0 is available at http://gydb.org. PMID:21036865
The Gypsy Database (GyDB) of mobile genetic elements: release 2.0.

PubMed

Llorens, Carlos; Futami, Ricardo; Covelli, Laura; Domínguez-Escribá, Laura; Viu, Jose M; Tamarit, Daniel; Aguilar-Rodríguez, Jose; Vicente-Ripolles, Miguel; Fuster, Gonzalo; Bernet, Guillermo P; Maumus, Florian; Munoz-Pomer, Alfonso; Sempere, Jose M; Latorre, Amparo; Moya, Andres

2011-01-01

This article introduces the second release of the Gypsy Database of Mobile Genetic Elements (GyDB 2.0): a research project devoted to the evolutionary dynamics of viruses and transposable elements based on their phylogenetic classification (per lineage and protein domain). The Gypsy Database (GyDB) is a long-term project that is continuously progressing and that, owing to the high molecular diversity of mobile elements, must be completed in several stages. GyDB 2.0 has been powered with a wiki to allow other researchers to participate in the project. The current database stage and scope are long terminal repeat (LTR) retroelements and relatives. GyDB 2.0 is an update based on the analysis of Ty3/Gypsy, Retroviridae, Ty1/Copia and Bel/Pao LTR retroelements and the Caulimoviridae pararetroviruses of plants. Among other features, in terms of the aforementioned topics, this update adds: (i) a variety of descriptions and reviews distributed in multiple web pages; (ii) protein-based phylogenies, where phylogenetic levels are assigned to distinct classified elements; (iii) a collection of multiple alignments, lineage-specific hidden Markov models and consensus sequences, called GyDB collection; (iv) updated RefSeq databases and BLAST and HMM servers to facilitate sequence characterization of new LTR retroelement and caulimovirus queries; and (v) a bibliographic server. GyDB 2.0 is available at http://gydb.org.
STBase: one million species trees for comparative biology.

PubMed

McMahon, Michelle M; Deepak, Akshay; Fernández-Baca, David; Boss, Darren; Sanderson, Michael J

2015-01-01

Comprehensively sampled phylogenetic trees provide the most compelling foundations for strong inferences in comparative evolutionary biology. Mismatches are common, however, between the taxa for which comparative data are available and the taxa sampled by published phylogenetic analyses. Moreover, many published phylogenies are gene trees, which cannot always be adapted immediately for species level comparisons because of discordance, gene duplication, and other confounding biological processes. A new database, STBase, lets comparative biologists quickly retrieve species level phylogenetic hypotheses in response to a query list of species names. The database consists of 1 million single- and multi-locus data sets, each with a confidence set of 1000 putative species trees, computed from GenBank sequence data for 413,000 eukaryotic taxa. Two bodies of theoretical work are leveraged to aid in the assembly of multi-locus concatenated data sets for species tree construction. First, multiply labeled gene trees are pruned to conflict-free singly-labeled species-level trees that can be combined between loci. Second, impacts of missing data in multi-locus data sets are ameliorated by assembling only decisive data sets. Data sets overlapping with the user's query are ranked using a scheme that depends on user-provided weights for tree quality and for taxonomic overlap of the tree with the query. Retrieval times are independent of the size of the database, typically a few seconds. Tree quality is assessed by a real-time evaluation of bootstrap support on just the overlapping subtree. Associated sequence alignments, tree files and metadata can be downloaded for subsequent analysis. STBase provides a tool for comparative biologists interested in exploiting the most relevant sequence data available for the taxa of interest. 
It may also serve as a prototype for future species tree oriented databases and as a resource for assembly of larger species phylogenies from precomputed trees.
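The ranking idea described in the STBase abstract, where data sets are ordered by user-provided weights for tree quality and taxonomic overlap, can be sketched as follows. The abstract does not give the scoring formula, so the weighted sum below is an assumption made for illustration, as are the `DataSet` class, its fields and the toy values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    name: str
    tree_quality: float  # hypothetical 0..1 score, e.g. mean bootstrap support
    taxa: frozenset      # species names covered by the data set


def rank(datasets, query, w_quality, w_overlap):
    """Order data sets by an assumed weighted sum of quality and query overlap."""
    query = frozenset(query)

    def score(ds):
        overlap = len(ds.taxa & query) / len(query)  # fraction of the query covered
        return w_quality * ds.tree_quality + w_overlap * overlap

    return sorted(datasets, key=score, reverse=True)


# Toy data sets: one well supported but narrow, one broad but weakly supported.
narrow = DataSet("narrow", 0.9, frozenset({"a", "b"}))
broad = DataSet("broad", 0.5, frozenset({"a", "b", "c", "d"}))
query = ["a", "b", "c", "d"]

by_quality = rank([narrow, broad], query, w_quality=1.0, w_overlap=0.0)
by_overlap = rank([narrow, broad], query, w_quality=0.0, w_overlap=1.0)
balanced = rank([narrow, broad], query, w_quality=0.5, w_overlap=0.5)
```

Changing the weights changes which data set comes first, which is the behavior the abstract attributes to the user-controlled ranking scheme.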
The Human Oral Microbiome Database: a web accessible resource for investigating oral microbe taxonomic and genomic information

PubMed Central

Chen, Tsute; Yu, Wen-Han; Izard, Jacques; Baranova, Oxana V.; Lakshmanan, Abirami; Dewhirst, Floyd E.

2010-01-01

The human oral microbiome is the most studied human microflora, but 53% of the species have not yet been validly named and 35% remain uncultivated. The uncultivated taxa are known primarily from 16S rRNA sequence information. Sequence information tied solely to obscure isolate or clone numbers, and usually lacking accurate phylogenetic placement, is a major impediment to working with human oral microbiome data. The goal of creating the Human Oral Microbiome Database (HOMD) is to provide the scientific community with a body site-specific comprehensive database for the more than 600 prokaryote species that are present in the human oral cavity based on a curated 16S rRNA gene-based provisional naming scheme. Currently, two primary types of information are provided in HOMD: taxonomic and genomic. Named oral species and taxa identified from 16S rRNA gene sequence analysis of oral isolates and cloning studies were placed into defined 16S rRNA phylotypes and each given a unique Human Oral Taxon (HOT) number. The HOT interlinks phenotypic, phylogenetic, genomic, clinical and bibliographic information for each taxon. A BLAST search tool is provided to match user 16S rRNA gene sequences to a curated, full length, 16S rRNA gene reference data set. For genomic analysis, HOMD provides a comprehensive set of analysis tools and maintains frequently updated annotations for all the human oral microbial genomes that have been sequenced and publicly released. Oral bacterial genome sequences, determined as part of the Human Microbiome Project, are being added to the HOMD as they become available. We provide HOMD as a conceptual model for the presentation of microbiome data for other human body sites. Database URL: http://www.homd.org PMID:20624719
Hepatitis C infection among intravenous drug users attending therapy programs in Cyprus.

PubMed

Demetriou, Victoria L; van de Vijver, David A M C; Hezka, Johana; Kostrikis, Leondios G

2010-02-01

Intravenous drug users are today the population at highest risk of HCV transmission worldwide. HCV genotypes in the general population in Cyprus demonstrate a polyphyletic infection and include subtypes associated with intravenous drug users. The prevalence of HCV, HBV, and HIV infection, HCV genotypes and risk factors among intravenous drug users in Cyprus were investigated here for the first time. Blood samples and interviews were obtained from 40 consenting users in treatment centers, and were tested for HCV, HBV, and HIV antibodies. On the HCV-positive samples, viral RNA extraction, RT-PCR and sequencing were performed. Phylogenetic analysis determined subtype and any relationships with database sequences, and statistical analysis determined any correlation of risk factors with HCV infection. The prevalence of HCV infection was 50%, but no HBV or HIV infections were found. Of the PCR-positive samples, eight (57%) were genotype 3a, and six (43%) were 1b. No other subtypes, recombinant strains or mixed infections were observed. The phylogenetic analysis of the injecting drug users' strains against database sequences showed no clustering, which does not allow determination of transmission route, possibly due to a limitation of sequences in the database. However, three clusters were discovered among the drug users' sequences, revealing small groups who possibly share injecting equipment. Statistical analysis showed that the risk factor associated with HCV infection is drug use duration. Overall, the polyphyletic nature of HCV infection in Cyprus is confirmed, but the transmission route remains unknown. These findings highlight the need for harm-reduction strategies to reduce HCV transmission. (c) 2009 Wiley-Liss, Inc.
The MIGenAS integrated bioinformatics toolkit for web-based sequence analysis

PubMed Central

Rampp, Markus; Soddemann, Thomas; Lederer, Hermann

2006-01-01

We describe a versatile and extensible integrated bioinformatics toolkit for the analysis of biological sequences over the Internet. The web portal offers convenient interactive access to a growing pool of chainable bioinformatics software tools and databases that are centrally installed and maintained by the RZG. Currently, supported tasks comprise sequence similarity searches in public or user-supplied databases, computation and validation of multiple sequence alignments, phylogenetic analysis and protein–structure prediction. Individual tools can be seamlessly chained into pipelines allowing the user to conveniently process complex workflows without the necessity to take care of any format conversions or tedious parsing of intermediate results. The toolkit is part of the Max-Planck Integrated Gene Analysis System (MIGenAS) of the Max Planck Society available at (click ‘Start Toolkit’). PMID:16844980
Genetic and phylogenetic analysis of multi-continent human influenza A(H1N2) reassortant viruses isolated in 2001 through 2003.

PubMed

Chen, M-J; La, T; Zhao, P; Tam, J S; Rappaport, R; Cheng, S-M

2006-12-01

Genetic analyses were performed on 228 influenza A(H1) viruses derived from clinical subjects participating in an experimental vaccine trial conducted in 20 countries on four continents between 2001 and 2003. HA1 phylogenetic analysis of these viruses showed multiple clades circulated around the world with regional prevalence patterns. Sixty-five of the A(H1) viruses were identified as A(H1N2), 40 of which were isolated from South Africa. The A(H1) sequences of these viruses cluster with published H1N2 viruses phylogenetically and share with them diagnostic signature V169A and A193T changes. The results also showed for the first time that H1N2 viruses were prominent in South Africa during the 2001-2002 influenza season, accounting for over 90% of the A(H1) cases in our study, and infecting both children (29/31) and the elderly (11/13). Phylogenetic analysis of the 65 H1N2 viruses we identified, in conjunction with the 56 recent H1N2 viruses currently available in the database, provided a comprehensive view of the circulation and evolution of distinct clades of H1N2 viruses in a temporal manner between early 2001 and mid-2003, shortly after the appearance of these recent reassortant viruses in or near year 2000.
PhyloExplorer: a web server to validate, explore and query phylogenetic trees

PubMed Central

Ranwez, Vincent; Clairon, Nicolas; Delsuc, Frédéric; Pourali, Saeed; Auberval, Nicolas; Diser, Sorel; Berry, Vincent

2009-01-01

Background Many important problems in evolutionary biology require molecular phylogenies to be reconstructed. Phylogenetic trees must then be manipulated for subsequent inclusion in publications or analyses such as supertree inference and tree comparisons. However, no tool is currently available to facilitate the management of tree collections providing, for instance: standardisation of taxon names among trees with respect to a reference taxonomy; selection of relevant subsets of trees or sub-trees according to a taxonomic query; or simply computation of descriptive statistics on the collection. Moreover, although several databases of phylogenetic trees exist, there is currently no easy way to find trees that are both relevant and complementary to a given collection of trees. Results We propose a tool to facilitate assessment and management of phylogenetic tree collections. Given an input collection of rooted trees, PhyloExplorer provides facilities for obtaining statistics describing the collection, correcting invalid taxon names, extracting taxonomically relevant parts of the collection using a dedicated query language, and identifying related trees in the TreeBASE database. Conclusion PhyloExplorer is a simple and interactive website implemented through underlying Python libraries and MySQL databases. It is available at: and the source code can be downloaded from: . PMID:19450253
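The kind of descriptive statistics on a tree collection mentioned in the PhyloExplorer abstract can be illustrated with a small sketch. This is not PhyloExplorer's code or query language: it assumes trees arrive as plain Newick strings with unquoted taxon labels and at least two leaves each, and the function names and toy trees are hypothetical.

```python
import re
from collections import Counter

# A leaf label is a run of label characters directly after '(' or ','.
# Internal node labels follow ')' and are therefore not matched.
LEAF = re.compile(r"[(,]\s*([^(),:;\s]+)")


def leaf_names(newick: str) -> list:
    """Leaf labels of a Newick tree (unquoted labels only)."""
    return LEAF.findall(newick)


def collection_stats(trees: list) -> dict:
    """Number of trees, number of distinct taxa, mean leaf count, taxon frequency."""
    frequency = Counter()
    sizes = []
    for tree in trees:
        leaves = leaf_names(tree)
        sizes.append(len(leaves))
        frequency.update(set(leaves))  # count each taxon once per tree
    return {
        "n_trees": len(trees),
        "n_taxa": len(frequency),
        "mean_leaves": sum(sizes) / len(sizes),
        "taxon_frequency": frequency,
    }


trees = [
    "((Homo,Pan),Mus);",
    "(Homo:0.1,(Mus:0.2,Rattus:0.3):0.05);",
]
stats = collection_stats(trees)
```

Taxon frequencies of this sort are a natural starting point for spotting misspelled or non-standard names, since a label that occurs in only one tree is a candidate for checking against a reference taxonomy.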
An automated genotyping tool for enteroviruses and noroviruses.

PubMed

Kroneman, A; Vennema, H; Deforche, K; v d Avoort, H; Peñaranda, S; Oberste, M S; Vinjé, J; Koopmans, M

2011-06-01

Molecular techniques are established as routine in virological laboratories and virus typing through (partial) sequence analysis is increasingly common. Quality assurance for the use of typing data requires harmonization of genotype nomenclature, and agreement on target genes, depending on the level of resolution required, and robustness of methods. To develop and validate web-based open-access typing-tools for enteroviruses and noroviruses. An automated web-based typing algorithm was developed, starting with BLAST analysis of the query sequence against a reference set of sequences from viruses in the family Picornaviridae or Caliciviridae. The second step is phylogenetic analysis of the query sequence and a sub-set of the reference sequences, to assign the enterovirus type or norovirus genotype and/or variant, with profile alignment, construction of phylogenetic trees and bootstrap validation. Typing is performed on VP1 sequences of Human enterovirus A to D, and ORF1 and ORF2 sequences of genogroup I and II noroviruses. For validation, we used the tools to automatically type sequences in the RIVM and CDC enterovirus databases and the FBVE norovirus database. Using the typing-tools, 785(99%) of 795 Enterovirus VP1 sequences, and 8154(98.5%) of 8342 norovirus sequences were typed in accordance with previously used methods. Subtyping into variants was achieved for 4439(78.4%) of 5838 NoV GII.4 sequences. The online typing-tools reliably assign genotypes for enteroviruses and noroviruses. The use of phylogenetic methods makes these tools robust to ongoing evolution. This should facilitate standardized genotyping and nomenclature in clinical and public health laboratories, thus supporting inter-laboratory comparisons. Copyright © 2011 Elsevier B.V. All rights reserved.
Phylogenetic study of Class Armophorea (Alveolata, Ciliophora) based on 18S-rDNA data.

PubMed

da Silva Paiva, Thiago; do Nascimento Borges, Bárbara; da Silva-Neto, Inácio Domingos

2013-12-01

The 18S rDNA phylogeny of Class Armophorea, a group of anaerobic ciliates, is proposed based on an analysis of 44 sequences (out of 195) retrieved from the NCBI/GenBank database. Emphasis was placed on the use of two nucleotide alignment criteria that involved variation in the gap-opening and gap-extension parameters and the use of rRNA secondary structure to orientate multiple-alignment. A sensitivity analysis of 76 data sets was run to assess the effect of variations in indel parameters on tree topologies. Bayesian inference, maximum likelihood and maximum parsimony phylogenetic analyses were used to explore how different analytic frameworks influenced the resulting hypotheses. A sensitivity analysis revealed that the relationships among higher taxa of the Intramacronucleata were dependent upon how indels were determined during multiple-alignment of nucleotides. The phylogenetic analyses rejected the monophyly of the Armophorea most of the time and consistently indicated that the Metopidae and Nyctotheridae were related to the Litostomatea. There was no consensus on the placement of the Caenomorphidae, which could be a sister group of the Metopidae + Nyctotheridae, or could have diverged at the base of the Spirotrichea branch or the Intramacronucleata tree.
Phylogenetic study of Class Armophorea (Alveolata, Ciliophora) based on 18S-rDNA data

PubMed Central

da Silva Paiva, Thiago; do Nascimento Borges, Bárbara; da Silva-Neto, Inácio Domingos

2013-01-01

The 18S rDNA phylogeny of Class Armophorea, a group of anaerobic ciliates, is proposed based on an analysis of 44 sequences (out of 195) retrieved from the NCBI/GenBank database. Emphasis was placed on the use of two nucleotide alignment criteria that involved variation in the gap-opening and gap-extension parameters and the use of rRNA secondary structure to orientate multiple-alignment. A sensitivity analysis of 76 data sets was run to assess the effect of variations in indel parameters on tree topologies. Bayesian inference, maximum likelihood and maximum parsimony phylogenetic analyses were used to explore how different analytic frameworks influenced the resulting hypotheses. A sensitivity analysis revealed that the relationships among higher taxa of the Intramacronucleata were dependent upon how indels were determined during multiple-alignment of nucleotides. The phylogenetic analyses rejected the monophyly of the Armophorea most of the time and consistently indicated that the Metopidae and Nyctotheridae were related to the Litostomatea. There was no consensus on the placement of the Caenomorphidae, which could be a sister group of the Metopidae + Nyctotheridae, or could have diverged at the base of the Spirotrichea branch or the Intramacronucleata tree. PMID:24385862
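To make the role of the gap-opening and gap-extension parameters concrete, the sketch below shows one common affine gap convention and a parameter grid of the kind a sensitivity analysis would sweep. The abstract does not state which aligner, convention or values were used, so the cost formula, the function names and the numbers are assumptions for illustration only; the grid does not reproduce the 76 data sets of the study.

```python
from itertools import product


def affine_gap_cost(length: int, gap_open: float, gap_extend: float) -> float:
    """Penalty for a gap of `length` columns under one common affine convention:
    the first gapped column costs `gap_open`, each further column `gap_extend`.
    (Some programs instead charge gap_open + gap_extend * length.)
    """
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * (length - 1)


def parameter_grid(opens, extends):
    """All (gap_open, gap_extend) pairs in which extending costs no more than opening."""
    return [(o, e) for o, e in product(opens, extends) if e <= o]


# Hypothetical values; each pair would drive one realignment and one tree search.
grid = parameter_grid(opens=[5.0, 10.0, 15.0], extends=[0.5, 1.0, 5.0])
cost = affine_gap_cost(3, gap_open=10.0, gap_extend=0.5)
```

In a sensitivity analysis, each pair in the grid yields its own alignment and tree, and the relationships that survive across the grid are the ones least dependent on how indels were scored.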

GWFASTA: server for FASTA search in eukaryotic and microbial genomes.

PubMed

Issac, Biju; Raghava, G P S

2002-09-01

Similarity searches are a powerful method for solving important biological problems such as database scanning, evolutionary studies, gene prediction, and protein structure prediction. FASTA is a widely used sequence comparison tool for rapid database scanning. Here we describe the GWFASTA server that was developed to assist the FASTA user in similarity searches against partially and/or completely sequenced genomes. GWFASTA consists of more than 60 microbial genomes, eight eukaryote genomes, and proteomes of annotated genomes. In fact, it provides the maximum number of databases for similarity searching from a single platform. GWFASTA allows the submission of more than one sequence as a single query for a FASTA search. It also provides integrated post-processing of FASTA output, including compositional analysis of proteins, multiple sequences alignment, and phylogenetic analysis. Furthermore, it summarizes the search results organism-wise for prokaryotes and chromosome-wise for eukaryotes. Thus, the integration of different tools for sequence analyses makes GWFASTA a powerful tool for biologists.
Phylogenetic Network Analysis Revealed the Occurrence of Horizontal Gene Transfer of 16S rRNA in the Genus Enterobacter

PubMed Central

Sato, Mitsuharu; Miyazaki, Kentaro

2017-01-01

Horizontal gene transfer (HGT) is a ubiquitous genetic event in bacterial evolution, but it seldom occurs for genes involved in highly complex supramolecules (or biosystems), which consist of many gene products. The ribosome is one such supramolecule, but several bacteria harbor dissimilar and/or chimeric 16S rRNAs in their genomes, suggesting the occurrence of HGT of this gene. However, we know little about whether the genes actually experience HGT and, if so, the frequency of such a transfer. This is primarily because the methods currently employed for phylogenetic analysis (e.g., neighbor-joining, maximum likelihood, and maximum parsimony) of 16S rRNA genes assume point mutation-driven tree-shape evolution as an evolutionary model, which is intrinsically inappropriate to decipher the evolutionary history for genes driven by recombination. To address this issue, we applied a phylogenetic network analysis, which has been used previously for detection of genetic recombination in homologous alleles, to the 16S rRNA gene. We focused on the genus Enterobacter, whose phylogenetic relationships inferred by multi-locus sequence alignment analysis and 16S rRNA sequences are incompatible. All 10 complete genomic sequences were retrieved from the NCBI database, in which 71 16S rRNA genes were included. Neighbor-joining analysis demonstrated that the genes residing in the same genomes clustered, indicating the occurrence of intragenomic recombination. However, as suggested by the low bootstrap values, evolutionary relationships between the clusters were uncertain. We then applied phylogenetic network analysis to representative sequences from each cluster. We found three ancestral 16S rRNA groups; the others were likely created through recursive recombination between the ancestors and chimeric descendants. Despite the large sequence changes caused by the recombination events, the RNA secondary structures were conserved. 
Successive intergenomic and intragenomic recombination thus shaped the evolution of 16S rRNA genes in the genus Enterobacter. PMID:29180992
Defining objective clusters for rabies virus sequences using affinity propagation clustering

PubMed Central

Fischer, Susanne; Freuling, Conrad M.; Pfaff, Florian; Bodenhofer, Ulrich; Höper, Dirk; Fischer, Mareike; Marston, Denise A.; Fooks, Anthony R.; Mettenleiter, Thomas C.; Conraths, Franz J.; Homeier-Bachmann, Timo

2018-01-01

Rabies is caused by lyssaviruses, and is one of the oldest known zoonoses. In recent years, more than 21,000 nucleotide sequences of rabies viruses (RABV), from the prototype species rabies lyssavirus, have been deposited in public databases. Subsequent phylogenetic analyses in combination with metadata suggest geographic distributions of RABV. However, these analyses have difficulty defining verifiable criteria for cluster allocations in phylogenetic trees, which invites a more rational approach. Therefore, we applied a relatively new mathematical clustering algorithm named ‘affinity propagation clustering’ (AP) to propose a standardized sub-species classification utilizing full-genome RABV sequences. Because AP is computationally fast and works for any meaningful measure of similarity between data samples, it has previously been applied successfully in bioinformatics for the analysis of microarray and gene expression data; cluster analysis of sequences, however, is still in its infancy. Existing (516) and original (46) full genome RABV sequences were used to demonstrate the application of AP for RABV clustering. On a global scale, AP proposed four clusters, i.e. New World cluster, Arctic/Arctic-like, Cosmopolitan, and Asian as previously assigned by phylogenetic studies. By combining AP with established phylogenetic analyses, it is possible to resolve phylogenetic relationships between verifiably determined clusters and sequences. This workflow will be useful in confirming cluster distributions in a uniform transparent manner, not only for RABV, but also for other comparative sequence analyses. PMID:29357361
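Affinity propagation works on a matrix of pairwise similarities and lets the data points exchange "responsibility" and "availability" messages until a set of exemplars emerges. The sketch below is a small pure-Python version of those standard message updates, not the authors' workflow. It assumes a toy similarity (negative squared difference between numbers standing in for sequences) and sets the preference to the median similarity, a frequently used default; the function name, data and parameter values are illustrative.

```python
from statistics import median


def affinity_propagation(S, damping=0.5, iterations=200):
    """Return, for each item, the index of the exemplar it is assigned to.

    S is a square similarity matrix; S[k][k] holds the "preference" of item k.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i, k): how well k suits i compared with the best rival exemplar.
        for i in range(n):
            for k in range(n):
                rival = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # a(i, k): support that k has collected from the other items.
        for k in range(n):
            positive = [max(0.0, R[j][k]) for j in range(n)]
            support = sum(positive) - positive[k]  # sum over j != k
            for i in range(n):
                if i == k:
                    target = support
                else:
                    target = min(0.0, R[k][k] + support - positive[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * target
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Toy data: two well-separated groups of one-dimensional points.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
n = len(points)
S = [[-(a - b) ** 2 for b in points] for a in points]
preference = median(S[i][j] for i in range(n) for j in range(n) if i != j)
for k in range(n):
    S[k][k] = preference

labels = affinity_propagation(S)
```

For sequence data, the similarity matrix would instead come from a pairwise measure such as negative genetic distance; because the method needs only that matrix, it is not tied to any particular tree-building step.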
Saprolegniaceae identified on amphibian eggs throughout the Pacific Northwest, USA, by internal transcribed spacer sequences and phylogenetic analysis

Treesearch

Jill E. Petrisko; Christopher A. Pearl; David S. Pilliod; Peter P. Sheridan; Charles F. Williams; Charles R. Peterson; R. Bruce Bury

2008-01-01

We assessed the diversity and phylogeny of Saprolegniaceae on amphibian eggs from the Pacific Northwest, with particular focus on Saprolegnia ferax, a species implicated in high egg mortality. We identified isolates from eggs of six amphibians with the internal transcribed spacer (ITS) and 5.8S gene regions and BLAST of the GenBank database. We...
Pfarao: a web application for protein family analysis customized for cytoskeletal and motor proteins (CyMoBase).

PubMed

Odronitz, Florian; Kollmar, Martin

2006-11-29

Annotation of protein sequences of eukaryotic organisms is crucial for the understanding of their function in the cell. Manual annotation is still by far the most accurate way to correctly predict genes. The classification of protein sequences, their phylogenetic relation and the assignment of function involves information from various sources. This often leads to a collection of heterogeneous data, which is hard to track. Cytoskeletal and motor proteins consist of large and diverse superfamilies comprising up to several dozen members per organism. To date there is no integrated tool available to assist in the manual large-scale comparative genomic analysis of protein families. Pfarao (Protein Family Application for Retrieval, Analysis and Organisation) is a database driven online working environment for the analysis of manually annotated protein sequences and their relationship. Currently, the system can store and interrelate a wide range of information about protein sequences, species, phylogenetic relations and sequencing projects as well as links to literature and domain predictions. Sequences can be imported from multiple sequence alignments that are generated during the annotation process. A web interface allows users to conveniently browse the database and to compile tabular and graphical summaries of its content. We implemented a protein sequence-centric web application to store, organize, interrelate, and present heterogeneous data that is generated in manual genome annotation and comparative genomics. The application has been developed for the analysis of cytoskeletal and motor proteins (CyMoBase) but can easily be adapted for any protein.
Spatial phylogenetics of the vascular flora of Chile.

PubMed

Scherson, Rosa A; Thornhill, Andrew H; Urbina-Casanova, Rafael; Freyman, William A; Pliscoff, Patricio A; Mishler, Brent D

2017-07-01

Current geographic patterns of biodiversity are a consequence of the evolutionary history of the lineages that comprise them. This study was aimed at exploring how evolutionary features of the vascular flora of Chile are distributed across the landscape. Using a phylogeny at the genus level for 87% of the Chilean vascular flora, and a geographic database of sample localities, we calculated phylogenetic diversity (PD), phylogenetic endemism (PE), relative PD (RPD), and relative PE (RPE). Categorical Analyses of Neo- and Paleo-Endemism (CANAPE) were also performed, using a spatial randomization to assess statistical significance. A cluster analysis using range-weighted phylogenetic turnover was used to compare among grid cells, and with known Chilean bioclimates. PD patterns were concordant with known centers of high taxon richness and the Chilean biodiversity hotspot. In addition, several other interesting areas of concentration of evolutionary history were revealed as potential conservation targets. The south of the country shows areas of significantly high RPD and a concentration of paleo-endemism, and the north shows areas of significantly low PD and RPD, and a concentration of neo-endemism. Range-weighted phylogenetic turnover shows high congruence with the main macrobioclimates of Chile. Even though the study was done at the genus level, the outcome provides an accurate outline of phylogenetic patterns that can be filled in as more fine-scaled information becomes available. Copyright © 2017 Elsevier Inc. All rights reserved.
STBase: One Million Species Trees for Comparative Biology

PubMed Central

McMahon, Michelle M.; Deepak, Akshay; Fernández-Baca, David; Boss, Darren; Sanderson, Michael J.

2015-01-01

Comprehensively sampled phylogenetic trees provide the most compelling foundations for strong inferences in comparative evolutionary biology. Mismatches are common, however, between the taxa for which comparative data are available and the taxa sampled by published phylogenetic analyses. Moreover, many published phylogenies are gene trees, which cannot always be adapted immediately for species level comparisons because of discordance, gene duplication, and other confounding biological processes. A new database, STBase, lets comparative biologists quickly retrieve species level phylogenetic hypotheses in response to a query list of species names. The database consists of 1 million single- and multi-locus data sets, each with a confidence set of 1000 putative species trees, computed from GenBank sequence data for 413,000 eukaryotic taxa. Two bodies of theoretical work are leveraged to aid in the assembly of multi-locus concatenated data sets for species tree construction. First, multiply labeled gene trees are pruned to conflict-free singly-labeled species-level trees that can be combined between loci. Second, impacts of missing data in multi-locus data sets are ameliorated by assembling only decisive data sets. Data sets overlapping with the user’s query are ranked using a scheme that depends on user-provided weights for tree quality and for taxonomic overlap of the tree with the query. Retrieval times are independent of the size of the database, typically a few seconds. Tree quality is assessed by a real-time evaluation of bootstrap support on just the overlapping subtree. Associated sequence alignments, tree files and metadata can be downloaded for subsequent analysis. STBase provides a tool for comparative biologists interested in exploiting the most relevant sequence data available for the taxa of interest. 
It may also serve as a prototype for future species tree oriented databases and as a resource for assembly of larger species phylogenies from precomputed trees. PMID:25679219
[Phylogenetic analysis of genomes of Vibrio cholerae strains isolated on the territory of Rostov region].

PubMed

Kuleshov, K V; Markelov, M L; Dedkov, V G; Vodop'ianov, A S; Kermanov, A V; Pisanov, R V; Kruglikov, V D; Mazrukho, A B; Maleev, V V; Shipulin, G A

2013-01-01

The aim was to determine the origin of 2 Vibrio cholerae strains isolated in the Rostov region using full-genome sequencing data. Toxigenic strain 2011EL-301 V. cholerae O1 El Tor Inaba No. 301 (ctxAB+, tcpA+) and nontoxigenic strain V. cholerae O1 Ogawa P-18785 (ctxAB-, tcpA+) were studied. Sequencing was carried out on the MiSeq platform. Phylogenetic analysis of the genomes obtained was based on comparison of the conserved part of the studied genomes and 54 previously sequenced genomes. The genome of strain 2011EL-301 was represented by 164 contigs with an average coverage of 100 and an N50 of 132 kb; that of strain P-18785 by 159 contigs with a coverage of 69 and an N50 of 83 kb. The contigs obtained for strain 2011EL-301 were deposited in the DDBJ/EMBL/GenBank databases under accession code AJFN02000000, and those for strain P-18785 under ANHS00000000. 716 protein-coding orthologous genes were detected. Based on phylogenetic analysis, strain P-18785 belongs to the PG-1 subgroup (a group of predecessor strains of the 7th pandemic). Strain 2011EL-301 belongs to the strains of the 7th pandemic and falls into the cluster with later isolates that are associated with cases of cholera in South Africa and cases of cholera imported to the USA from Pakistan. The data obtained make it possible to establish phylogenetic connections with V. cholerae strains isolated earlier.
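The N50 values quoted in this abstract summarize assembly contiguity: N50 is the contig length at which contigs of that length or longer contain at least half of the assembled bases. A minimal Python sketch of that calculation follows; the function name `n50` and the contig lengths in the example are illustrative assumptions and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Illustrative contig lengths only (not the V. cholerae assemblies).
example_lengths = [100, 80, 50, 30, 20, 10, 10]
print(n50(example_lengths))
```

With these toy lengths the total is 300 bp, and the two longest contigs (100 and 80 bp) already cover more than half of it, so the sketch reports 80.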
Classification of malignant and benign lung nodules using taxonomic diversity index and phylogenetic distance.

PubMed

de Sousa Costa, Robherson Wector; da Silva, Giovanni Lucca França; de Carvalho Filho, Antonio Oseas; Silva, Aristófanes Corrêa; de Paiva, Anselmo Cardoso; Gattass, Marcelo

2018-05-23

Lung cancer is the leading cause of death among patients around the world and also has one of the lowest survival rates after diagnosis. Therefore, this study proposes a methodology for classifying lung nodules as benign or malignant based on image processing and pattern recognition techniques. Mean phylogenetic distance (MPD) and taxonomic diversity index (Δ) were used as texture descriptors. Finally, a genetic algorithm in conjunction with a support vector machine was applied to select the best training model. The proposed methodology was tested on computed tomography (CT) images from the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI), with the best sensitivity of 93.42%, specificity of 91.21%, accuracy of 91.81%, and area under the ROC curve of 0.94. The results demonstrate the promising performance of texture extraction techniques using mean phylogenetic distance and taxonomic diversity index combined with phylogenetic trees.
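The two descriptors named in this abstract have standard ecological definitions: MPD is the average pairwise distance between the taxa present, and the taxonomic diversity index Δ weights each pairwise distance by the product of the two abundances and divides by the number of pairs of individuals, N(N-1)/2. The Python sketch below computes both from a toy distance table. How the study maps image regions to "species" and "abundances" is not described in the abstract, so the data structures, function names, and numbers here are hypothetical.

```python
from itertools import combinations


def mean_phylogenetic_distance(dist, taxa):
    """Average pairwise distance over all pairs of the taxa present."""
    pairs = list(combinations(taxa, 2))
    if not pairs:
        raise ValueError("at least two taxa are required")
    return sum(dist[frozenset(pair)] for pair in pairs) / len(pairs)


def taxonomic_diversity(dist, abundance):
    """Taxonomic diversity: abundance-weighted distances over N(N-1)/2."""
    n = sum(abundance.values())
    if n < 2:
        raise ValueError("at least two individuals are required")
    weighted = sum(
        dist[frozenset((a, b))] * abundance[a] * abundance[b]
        for a, b in combinations(abundance, 2)
    )
    return weighted / (n * (n - 1) / 2)


# Hypothetical distances between three taxa and their abundances.
toy_dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 4.0,
    frozenset(("B", "C")): 4.0,
}
toy_abundance = {"A": 2, "B": 1, "C": 1}

print(mean_phylogenetic_distance(toy_dist, toy_abundance))
print(taxonomic_diversity(toy_dist, toy_abundance))
```

For the toy data, MPD is (2 + 4 + 4) / 3 and Δ is (2·2·1 + 4·2·1 + 4·1·1) / 6.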
SAM: String-based sequence search algorithm for mitochondrial DNA database queries

PubMed Central

Röck, Alexander; Irwin, Jodi; Dür, Arne; Parsons, Thomas; Parson, Walther

2011-01-01

The analysis of the haploid mitochondrial (mt) genome has numerous applications in forensic and population genetics, as well as in disease studies. Although mtDNA haplotypes are usually determined by sequencing, they are rarely reported as a nucleotide string. Traditionally they are presented in a difference-coded position-based format relative to the corrected version of the first sequenced mtDNA. This convention requires recommendations for standardized sequence alignment that is known to vary between scientific disciplines, even between laboratories. As a consequence, database searches that are vital for the interpretation of mtDNA data can suffer from biased results when query and database haplotypes are annotated differently. In the forensic context that would usually lead to underestimation of the absolute and relative frequencies. To address this issue we introduce SAM, a string-based search algorithm that converts query and database sequences to position-free nucleotide strings and thus eliminates the possibility that identical sequences will be missed in a database query. The mere application of a BLAST algorithm would not be a sufficient remedy as it uses a heuristic approach and does not address properties specific to mtDNA, such as phylogenetically stable but also rapidly evolving insertion and deletion events. The software presented here provides additional flexibility to incorporate phylogenetic data, site-specific mutation rates, and other biologically relevant information that would refine the interpretation of mitochondrial DNA data. The manuscript is accompanied by freeware and example data sets that can be used to evaluate the new software (http://stringvalidation.org). PMID:21056022
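The core idea described here, converting a difference-coded haplotype into a position-free nucleotide string before searching, can be illustrated with a short Python sketch. This is not the SAM implementation: the notation handled (`73G` for a substitution, `315.1C` for an insertion after position 315, `249del` for a deletion), the function name, and the toy reference are simplifying assumptions made for illustration.

```python
import re


def haplotype_to_string(reference, differences):
    """Apply difference-coded variants (1-based positions) to a reference."""
    bases = {pos: base for pos, base in enumerate(reference, start=1)}
    inserts = {}  # position -> {insertion index: inserted base}
    for diff in differences:
        match = re.fullmatch(r"(\d+)(?:\.(\d+))?([ACGT]|del)", diff)
        if match is None:
            raise ValueError(f"unsupported notation: {diff}")
        pos, ins_index, change = int(match.group(1)), match.group(2), match.group(3)
        if pos not in bases:
            raise ValueError(f"position out of range: {diff}")
        if ins_index is not None:
            if change == "del":
                raise ValueError(f"unsupported notation: {diff}")
            inserts.setdefault(pos, {})[int(ins_index)] = change
        elif change == "del":
            bases[pos] = ""  # deleted reference base
        else:
            bases[pos] = change  # substituted base
    out = []
    for pos in range(1, len(reference) + 1):
        out.append(bases[pos])
        # Inserted bases follow the reference position they are attached to.
        out.extend(base for _, base in sorted(inserts.get(pos, {}).items()))
    return "".join(out)


# Toy reference with a homopolymer run: two laboratories may place the same
# deletion at different positions, yet the resulting strings are identical.
toy_reference = "ACCCT"
lab_one = haplotype_to_string(toy_reference, ["2del"])
lab_two = haplotype_to_string(toy_reference, ["4del"])
print(lab_one, lab_two, lab_one == lab_two)
```

The toy case shows why a position-based comparison would miss the match (`2del` differs from `4del`) while a string comparison would not.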
Identification of phylogenetic position in the Chlamydiaceae family for Chlamydia strains released from monkeys and humans with chlamydial pathology.

PubMed

Karaulov, Alexander; Aleshkin, Vladimir; Slobodenyuk, Vladimir; Grechishnikova, Olga; Afanasyev, Stanislav; Lapin, Boris; Dzhikidze, Eteri; Nesvizhsky, Yuriy; Evsegneeva, Irina; Voropayeva, Elena; Afanasyev, Maxim; Aleshkin, Andrei; Metelskaya, Valeria; Yegorova, Ekaterina; Bayrakova, Alexandra

2010-01-01

Based on the results of the comparative analysis concerning relatedness and evolutional difference of the 16S-23S nucleotide sequences of the middle ribosomal cluster and 23S rRNA I domain, and based on identification of phylogenetic position for Chlamydophila pneumoniae and Chlamydia trachomatis strains released from monkeys, relatedness of the above stated isolates with similar strains released from humans and with strains having nucleotide sequences presented in the GenBank electronic database has been detected for the first time ever. Position of these isolates in the Chlamydiaceae family phylogenetic tree has been identified. The evolutional position of the investigated original Chlamydia and Chlamydophila strains close to analogous strains from the GenBank electronic database has been demonstrated. Differences in the 16S-23S nucleotide sequence of the middle ribosomal cluster and 23S rRNA I domain of plasmid and nonplasmid Chlamydia trachomatis strains released from humans and monkeys relative to different genotype groups (group B-B, Ba, D, Da, E, L1, L2, L2a; intermediate group-F, G, Ga) have been revealed for the first time ever. Abnormality in incA chromosomal gene expression resulting in Chlamydia life development cycle disorder, and decrease of Chlamydia virulence can be related to probable changes in the nucleotide sequence of the gene under consideration.
Taxonomic review of Argentine mackerel Scomber japonicus (Houttuyn, 1782) by phylogenetic analysis

PubMed Central

Trucco, María Inés; Buratti, Claudio César

2017-01-01

Taxonomically, Argentine mackerels were first considered as Scomber japonicus marplatensis and later as Scomber japonicus Houttuyn 1782, although, in the last years, different studies have suggested that South Atlantic mackerel species belongs to Scomber colias Gmelin 1789. These latter results, incorporated in the main fish databases (FishBase and Catalog of Fishes), promoted a phylogenetic study using cytochrome c oxidase I (COI) gene sequences taken from the Barcode of Life (FISH-BOL) database. Thus, 76 sequences of S. japonicus, S. colias, S. australasicus and S. scombrus from different regions were used; including 3 from Sarda sarda as outgroup. Among S. japonicus selected sequences are those corresponding to the Argentine mackerels collected in 2007. Phylogenetic trees were obtained by neighbor joining and maximum likelihood methods and a network of haplotypes was reconstructed to analyze the relationship between species. The results showed the clear differentiation of S. australasicus, S. scombrus and S. japonicus from the Pacific while S. japonicus from Argentina was included in the S. colias group, with genetic differences corresponding to conspecific populations (0.1%). Four of the five Argentine specimens shared the same haplotype with S. colias, and none were shared with S. japonicus from the Pacific. These results suggest that the current specific name of Argentine mackerel S. japonicus should be changed to S. colias, in agreement with several genetic studies carried out with species of the genus Scomber. PMID:29071283
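A genetic difference such as the 0.1% reported here is commonly expressed as an uncorrected p-distance, the proportion of aligned sites at which two sequences differ. The abstract does not state which distance measure or substitution model was used, so the Python sketch below is only an illustration of that simplest measure; the function name and the short sequences are made up.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Made-up aligned fragments: one difference in ten comparable sites.
print(p_distance("ACGTACGTAC", "ACGTACGTAT"))
```

On real COI barcodes the same calculation would be run over the full alignment, typically several hundred sites per sequence pair.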
A novel prophage identified in strains from Salmonella enterica serovar Enteritidis is a phylogenetic signature of the lineage ST-1974

PubMed Central

D'Alessandro, Bruno; Pérez Escanda, Victoria; Balestrazzi, Lucía; Iriarte, Andrés; Pickard, Derek; Yim, Lucía; Chabalgoity, José Alejandro; Betancor, Laura

2018-01-01

Salmonella enterica serovar Enteritidis is a major agent of foodborne diseases worldwide. In Uruguay, this serovar was almost negligible until the mid 1990s but since then it has become the most prevalent. Previously, we characterized a collection of strains isolated from 1988 to 2005 and found that the two oldest strains were the most genetically divergent. In order to further characterize these strains, we sequenced and annotated eight genomes including those of the two oldest isolates. We report on the identification and characterization of a novel 44 kbp Salmonella prophage found exclusively in these two genomes. Sequence analysis reveals that the prophage is a mosaic, with homologous regions in different Salmonella prophages. It contains 60 coding sequences, including two genes, gogB and sseK3, involved in virulence and modulation of host immune response. Analysis of serovar Enteritidis genomes available in public databases confirmed that this prophage is absent in most of them, with the exception of a group of 154 genomes. All 154 strains carrying this prophage belong to the same sequence type (ST-1974), suggesting that its acquisition occurred in a common ancestor. We tested this by phylogenetic analysis of 203 genomes representative of the intraserovar diversity. The ST-1974 forms a distinctive monophyletic lineage, and the newly described prophage is a phylogenetic signature of this lineage that could be used as a molecular marker. The phylogenetic analysis also shows that the major ST (ST-11) is polyphyletic and might have given rise to almost all other STs, including ST-1974. PMID:29509137
Phylogenetic Properties of RNA Viruses

PubMed Central

Pompei, Simone; Loreto, Vittorio; Tria, Francesca

2012-01-01

A new word, phylodynamics, was coined to emphasize the interconnection between phylogenetic properties, as observed for instance in a phylogenetic tree, and the epidemic dynamics of viruses, where selection, mediated by the host immune response, and transmission play a crucial role. The challenges faced when investigating the evolution of RNA viruses call for a virtuous loop of data collection, data analysis and modeling. This already resulted both in the collection of massive sequences databases and in the formulation of hypotheses on the main mechanisms driving qualitative differences observed in the (reconstructed) evolutionary patterns of different RNA viruses. Qualitatively, it has been observed that selection driven by the host immune response induces an uneven survival ability among co-existing strains. As a consequence, the imbalance level of the phylogenetic tree is manifestly more pronounced if compared to the case when the interaction with the host immune system does not play a central role in the evolutive dynamics. While many imbalance metrics have been introduced, reliable methods to discriminate in a quantitative way different level of imbalance are still lacking. In our work, we reconstruct and analyze the phylogenetic trees of six RNA viruses, with a special emphasis on the human Influenza A virus, due to its relevance for vaccine preparation as well as for the theoretical challenges it poses due to its peculiar evolutionary dynamics. We focus in particular on topological properties. We point out the limitation featured by standard imbalance metrics, and we introduce a new methodology with which we assign the correct imbalance level of the phylogenetic trees, in agreement with the phylodynamics of the viruses. 
Our thorough quantitative analysis allows for a deeper understanding of the evolutionary dynamics of the considered RNA viruses, which is crucial in order to provide a valuable framework for a quantitative assessment of theoretical predictions. PMID:23028645
Large-scale inference of gene function through phylogenetic annotation of Gene Ontology terms: case study of the apoptosis and autophagy cellular processes.

PubMed

Feuermann, Marc; Gaudet, Pascale; Mi, Huaiyu; Lewis, Suzanna E; Thomas, Paul D

2016-01-01

We previously reported a paradigm for large-scale phylogenomic analysis of gene families that takes advantage of the large corpus of experimentally supported Gene Ontology (GO) annotations. This 'GO Phylogenetic Annotation' approach integrates GO annotations from evolutionarily related genes across ∼100 different organisms in the context of a gene family tree, in which curators build an explicit model of the evolution of gene functions. GO Phylogenetic Annotation models the gain and loss of functions in a gene family tree, which is used to infer the functions of uncharacterized (or incompletely characterized) gene products, even for human proteins that are relatively well studied. Here, we report our results from applying this paradigm to two well-characterized cellular processes, apoptosis and autophagy. This revealed several important observations with respect to GO annotations and how they can be used for function inference. Notably, we applied only a small fraction of the experimentally supported GO annotations to infer function in other family members. The majority of other annotations describe indirect effects, phenotypes or results from high throughput experiments. In addition, we show here how feedback from phylogenetic annotation leads to significant improvements in the PANTHER trees, the GO annotations and GO itself. Thus GO phylogenetic annotation both increases the quantity and improves the accuracy of the GO annotations provided to the research community. We expect these phylogenetically based annotations to be of broad use in gene enrichment analysis as well as other applications of GO annotations. Database URL: http://amigo.geneontology.org/amigo. © The Author(s) 2016. Published by Oxford University Press.
EGenBio: A Data Management System for Evolutionary Genomics and Biodiversity

PubMed Central

Nahum, Laila A; Reynolds, Matthew T; Wang, Zhengyuan O; Faith, Jeremiah J; Jonna, Rahul; Jiang, Zhi J; Meyer, Thomas J; Pollock, David D

2006-01-01

Background: Evolutionary genomics requires management and filtering of large numbers of diverse genomic sequences for accurate analysis and inference on evolutionary processes of genomic and functional change. We developed Evolutionary Genomics and Biodiversity (EGenBio) to begin to address this. Description: EGenBio is a system for manipulation and filtering of large numbers of sequences, integrating curated sequence alignments and phylogenetic trees, managing evolutionary analyses, and visualizing their output. EGenBio is organized into three conceptual divisions, Evolution, Genomics, and Biodiversity. The Genomics division includes tools for selecting pre-aligned sequences from different genes and species, and for modifying and filtering these alignments for further analysis. Species searches are handled through queries that can be modified based on a tree-based navigation system and saved. The Biodiversity division contains tools for analyzing individual sequences or sequence alignments, whereas the Evolution division contains tools involving phylogenetic trees. Alignments are annotated with analytical results and modification history using our PRAED format. A miscellaneous Tools section and Help framework are also available. EGenBio was developed around our comparative genomic research and a prototype database of mtDNA genomes. It utilizes MySQL-relational databases and dynamic page generation, and calls numerous custom programs. Conclusion: EGenBio was designed to serve as a platform for tools and resources to ease combined analysis in evolution, genomics, and biodiversity. PMID:17118150
Yeast species diversity in apple juice for cider production evidenced by culture-based method.

PubMed

Lorenzini, Marilinda; Simonato, Barbara; Zapparoli, Giacomo

2018-05-07

Identification of yeasts isolated from apple juices of two cider houses (one located in a plain area and one in an alpine area) was carried out by culture-based method. Wallerstein Laboratory Nutrient Agar was used as medium for isolation and preliminary yeasts identification. A total of 20 species of yeasts belonging to ten different genera were identified using both BLAST algorithm for pairwise sequence comparison and phylogenetic approaches. A wide variety of non-Saccharomyces species was found. Interestingly, Candida railenensis, Candida cylindracea, Hanseniaspora meyeri, Hanseniaspora pseudoguilliermondii, and Metschnikowia sinensis were recovered for the first time in the yeast community of an apple environment. Phylogenetic analysis revealed a better resolution in identifying Metschnikowia and Moesziomyces isolates than comparative analysis using the GenBank or YeastIP gene databases. This study provides important data on yeast microbiota of apple juice and evidenced differences between two geographical cider production areas in terms of species composition.
Pan-genome and phylogeny of Bacillus cereus sensu lato.

PubMed

Bazinet, Adam L

2017-08-02

Bacillus cereus sensu lato (s. l.) is an ecologically diverse bacterial group of medical and agricultural significance. In this study, I use publicly available genomes and novel bioinformatic workflows to characterize the B. cereus s. l. pan-genome and perform the largest phylogenetic and population genetic analyses of this group to date in terms of the number of genes and taxa included. With these fundamental data in hand, I identify genes associated with particular phenotypic traits (i.e., "pan-GWAS" analysis), and quantify the degree to which taxa sharing common attributes are phylogenetically clustered. A rapid k-mer based approach (Mash) was used to create reduced representations of selected Bacillus genomes, and a fast distance-based phylogenetic analysis of this data (FastME) was performed to determine which species should be included in B. cereus s. l. The complete genomes of eight B. cereus s. l. species were annotated de novo with Prokka, and these annotations were used by Roary to produce the B. cereus s. l. pan-genome. Scoary was used to associate gene presence and absence patterns with various phenotypes. The orthologous protein sequence clusters produced by Roary were filtered and used to build HaMStR databases of gene models that were used in turn to construct phylogenetic data matrices. Phylogenetic analyses used RAxML, DendroPy, ClonalFrameML, PAUP*, and SplitsTree. Bayesian model-based population genetic analysis assigned taxa to clusters using hierBAPS. The genealogical sorting index was used to quantify the phylogenetic clustering of taxa sharing common attributes. The B. cereus s. l. pan-genome currently consists of ≈60,000 genes, ≈600 of which are "core" (common to at least 99% of taxa sampled). Pan-GWAS analysis revealed genes associated with phenotypes such as isolation source, oxygen requirement, and ability to cause diseases such as anthrax or food poisoning. 
Extensive phylogenetic analyses using an unprecedented amount of data produced phylogenies that were largely concordant with each other and with previous studies. Phylogenetic support as measured by bootstrap probabilities increased markedly when all suitable pan-genome data was included in phylogenetic analyses, as opposed to when only core genes were used. Bayesian population genetic analysis recommended subdividing the three major clades of B. cereus s. l. into nine clusters. Taxa sharing common traits and species designations exhibited varying degrees of phylogenetic clustering. All phylogenetic analyses recapitulated two previously used classification systems, and taxa were consistently assigned to the same major clade and group. By including accessory genes from the pan-genome in the phylogenetic analyses, I produced an exceptionally well-supported phylogeny of 114 complete B. cereus s. l. genomes. The best-performing methods were used to produce a phylogeny of all 498 publicly available B. cereus s. l. genomes, which was in turn used to compare three different classification systems and to test the monophyly status of various B. cereus s. l. species. The majority of the methodology used in this study is generic and could be leveraged to produce pan-genome estimates and similarly robust phylogenetic hypotheses for other bacterial groups.
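The abstract defines "core" genes as those common to at least 99% of the taxa sampled. Given a gene presence/absence table, that filter takes only a few lines of Python. The sketch below is not Roary's implementation; the function name, the dictionary layout, and the gene and genome labels are hypothetical.

```python
def core_genes(presence, genomes, threshold=0.99):
    """Return the genes present in at least `threshold` of the genomes.

    `presence` maps each gene name to the set of genomes that carry it;
    `genomes` is the set of all sampled genomes.
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    n = len(genomes)
    core = set()
    for gene, carriers in presence.items():
        # Count only carriers that are among the sampled genomes.
        if len(carriers & genomes) >= threshold * n:
            core.add(gene)
    return core


# Hypothetical presence/absence data for four genomes.
toy_genomes = {"g1", "g2", "g3", "g4"}
toy_presence = {
    "geneA": {"g1", "g2", "g3", "g4"},
    "geneB": {"g1", "g2", "g3"},
    "geneC": {"g4"},
}

print(sorted(core_genes(toy_presence, toy_genomes)))
print(sorted(core_genes(toy_presence, toy_genomes, threshold=0.75)))
```

Lowering the threshold admits genes missing from a few genomes, which mirrors the contrast the abstract draws between core-only and full pan-genome datasets.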
The Impact of Clinical, Demographic and Risk Factors on Rates of HIV Transmission: A Population-based Phylogenetic Analysis in British Columbia, Canada

PubMed Central

Poon, Art F. Y.; Joy, Jeffrey B.; Woods, Conan K.; Shurgold, Susan; Colley, Guillaume; Brumme, Chanson J.; Hogg, Robert S.; Montaner, Julio S. G.; Harrigan, P. Richard

2015-01-01

Background. The diversification of human immunodeficiency virus (HIV) is shaped by its transmission history. We therefore used a population based province wide HIV drug resistance database in British Columbia (BC), Canada, to evaluate the impact of clinical, demographic, and behavioral factors on rates of HIV transmission. Methods. We reconstructed molecular phylogenies from 27 296 anonymized bulk HIV pol sequences representing 7747 individuals in BC—about half the estimated HIV prevalence in BC. Infections were grouped into clusters based on phylogenetic distances, as a proxy for variation in transmission rates. Rates of cluster expansion were reconstructed from estimated dates of HIV seroconversion. Results. Our criteria grouped 4431 individuals into 744 clusters largely separated with respect to risk factors, including large established clusters predominated by injection drug users and more-recently emerging clusters comprising men who have sex with men. The mean log10 viral load of an individual's phylogenetic neighborhood (composed of 5 other individuals with shortest phylogenetic distances) increased their odds of appearing in a cluster by >2-fold per log10 viruses per milliliter. Conclusions. Hotspots of ongoing HIV transmission can be characterized in near real time by the secondary analysis of HIV resistance genotypes, providing an important potential resource for targeting public health initiatives for HIV prevention. PMID:25312037
Molecular Characterization and Analysis of 16S Ribosomal DNA in Some Isolates of Demodex folliculorum

PubMed Central

DANESHPARVAR, Afrooz; MOWLAVI, Gholamreza; MIRJALALI, Hamed; HAJJARAN, Homa; MOBEDI, Iraj; NADDAF, Saeed Reza; SHIDFAR, Mohammadreza; SADAT MAKKI, Mahsa

2017-01-01

Background: Demodicosis is one of the most prevalent skin diseases resulting from infestation by Demodex mites. This parasite usually inhabits the follicular infundibulum or sebaceous duct and is transmitted through close contact with an infested host. Methods: This study was carried out from September 2014 to January 2016 at Tehran University of Medical Sciences, Tehran, Iran. DNA extraction and amplification of 16S ribosomal RNA were performed on four isolates, already obtained from four different patients and identified morphologically through clearing with 10% potassium hydroxide (KOH) and microscopic examination. Amplified fragments from the isolates were compared with the GenBank database, and phylogenetic analysis was carried out using MEGA6 software. Results: A 390 bp fragment of 16S rDNA was obtained in all isolates, and analysis of the generated sequences showed high similarity with those previously submitted to GenBank. Intra-species similarity and distance were 99.983% and 0.017, respectively, for the studied isolates. Multiple alignments of the isolates showed single nucleotide polymorphisms (SNPs) in the 16S rRNA fragment. Phylogenetic analysis revealed that all 4 isolates clustered with other D. folliculorum sequences recovered from the GenBank database. Our accession numbers KF875587 and KF875589 were more similar to each other than to the two other studied isolates. Conclusion: Mitochondrial 16S rDNA is one of the most suitable molecular barcodes for identification of D. folliculorum, and this fragment can be used for intra-species characterization of the most human-infected mites. PMID:28761482

PCR-Internal Transcribed Spacer (ITS) genes sequencing and phylogenetic analysis of clinical and environmental Aspergillus species associated with HIV-TB co infected patients in a hospital in Abeokuta, southwestern Nigeria.

PubMed

Shittu, Olufunke Bolatito; Adelaja, Oluwabunmi Molade; Obuotor, Tolulope Mobolaji; Sam-Wobo, Sam Olufemi; Adenaike, Adeyemi Sunday

2016-03-01

Aspergillosis has been identified as a hospital-acquired infection, but the contribution of water and in-house air as possible sources of Aspergillus infection in immunocompromised individuals such as HIV-TB patients has not been studied in any hospital setting in Nigeria. This study aimed to identify and investigate the genetic relationship between clinical and environmental Aspergillus sp. associated with HIV-TB co-infected patients. DNA extraction, purification, amplification and sequencing of Internal Transcribed Spacer (ITS) genes were performed using standard protocols. Similarity search using BLAST on NCBI was used for species identification and MEGA 5.0 was used for phylogenetic analysis. Analyses of sequenced ITS genes of fourteen (14) selected Aspergillus isolates identified in the GenBank database revealed Aspergillus niger (28.57%), A. tubingensis (7.14%), A. flavus (7.14%) and A. fumigatus (57.14%). Aspergillus species in the sputum of HIV patients were Aspergillus niger, A. fumigatus, A. tubingensis and A. flavus. A. niger and A. fumigatus were also identified from water and open air. Phylogenetic analysis of the sequences indicated genetic relatedness between clinical and environmental isolates. Water and air in health care settings in Nigeria are important sources of Aspergillus sp. for HIV-TB patients.
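The species percentages in the record above can be cross-checked against the stated total of fourteen isolates. The Python sketch below back-calculates the isolate counts that the percentages appear to imply. The counts are an inference from the reported figures, not numbers given in the abstract.

```python
TOTAL_ISOLATES = 14  # stated in the abstract

# Percentages as reported in the abstract.
reported_percent = {
    "A. niger": 28.57,
    "A. tubingensis": 7.14,
    "A. flavus": 7.14,
    "A. fumigatus": 57.14,
}

# Inferred (not reported) whole-number counts behind each percentage.
implied_counts = {
    species: round(pct / 100 * TOTAL_ISOLATES)
    for species, pct in reported_percent.items()
}

# Recompute the percentages from the inferred counts as a consistency check.
recomputed_percent = {
    species: round(100 * count / TOTAL_ISOLATES, 2)
    for species, count in implied_counts.items()
}

for species, count in implied_counts.items():
    print(f"{species}: {count} of {TOTAL_ISOLATES} = {recomputed_percent[species]}%")
print("total isolates accounted for:", sum(implied_counts.values()))
```

Under this reading, the percentages correspond to 4, 1, 1 and 8 isolates, which sum to the 14 isolates reported.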
Verdant: automated annotation, alignment and phylogenetic analysis of whole chloroplast genomes.

PubMed

McKain, Michael R; Hartsock, Ryan H; Wohl, Molly M; Kellogg, Elizabeth A

2017-01-01

Chloroplast genomes are now produced in the hundreds for angiosperm phylogenetics projects, but current methods for annotation, alignment and tree estimation still require some manual intervention, reducing throughput and increasing analysis time for large chloroplast systematics projects. Verdant is a web-based software suite and database built to take advantage of a novel annotation program, annoBTD. Using annoBTD, Verdant provides accurate annotation of chloroplast genomes without manual intervention. Subsequent alignment and tree estimation can incorporate newly annotated and publicly available plastomes and can accommodate a large number of taxa. Verdant sharply reduces the time required for analysis of assembled chloroplast genomes and removes the need for pipelines and software on personal hardware. Verdant is available at: http://verdant.iplantcollaborative.org/plastidDB/ It is implemented in PHP, Perl, MySQL, Javascript, HTML and CSS with all major browsers supported. Contact: mrmckain@gmail.com. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
An integrated genetic data environment (GDE)-based LINUX interface for analysis of HIV-1 and other microbial sequences.

PubMed

De Oliveira, T; Miller, R; Tarin, M; Cassol, S

2003-01-01

Sequence databases encode a wealth of information needed to develop improved vaccination and treatment strategies for the control of HIV and other important pathogens. To facilitate effective utilization of these datasets, we developed a user-friendly GDE-based LINUX interface that reduces input/output file formatting. GDE was adapted to the Linux operating system, bioinformatics tools were integrated with microbe-specific databases, and up-to-date GDE menus were developed for several clinically important viral, bacterial and parasitic genomes. Each microbial interface was designed for local access and contains GenBank, BLAST-formatted and phylogenetic databases. GDE-Linux is available for research purposes by direct application to the corresponding author. Application-specific menus and support files can be downloaded from http://www.bioafrica.net.
Molecular characterization of chikungunya virus from Andhra Pradesh, India & phylogenetic relationship with Central African isolates.

PubMed

M Naresh Kumar, C V; Anthony Johnson, A M; R Sai Gopal, D V

2007-12-01

Chikungunya virus has caused numerous large outbreaks in India. Suspected blood samples from the epidemic were collected from the Rayalaseema region of Andhra Pradesh and characterized to identify the causative agent. RT-PCR was used for screening of suspected blood samples. Primers were designed to amplify the partial E1 gene, and the amplified fragment was cloned and sequenced. The sequence was analyzed and compared with other geographical isolates to determine the phylogenetic relationship. The sequence was submitted to the GenBank DNA database (accession DQ888620). Comparative analysis of the AP Ra-CTR isolate (CHIKAPRa-CTR) with other isolates of Chikungunya virus revealed 94.7+/-3.6 per cent homology at the nucleotide level and 96.8+/-3.2 per cent homology at the amino acid level. The current epidemic was caused by the Central African genotype of CHIKV, which grouped in the Central Africa cluster in phylogenetic trees generated from nucleotide and amino acid sequences.
[Genome-wide identification and expression analysis of auxin-related gene families in grape].

PubMed

Yuan, Hua-zhao; Zhao, Mi-zhen; Wu, Wei-min; Yu, Hong-Mei; Qian, Ya-ming; Wang, Zhuang-wei; Wang, Xi-cheng

2015-07-01

The auxin response gene family adjusts the auxin balance and the growth hormone signaling pathways in plants. Using bioinformatics methods, auxin-response genes were identified from the grape genome database, and their chromosomal locations, gene collinearity and phylogenetic relationships were analyzed. Probable genes include 25 AUX_IAA, 19 ARF, 9 GH3 and 42 LBD genes, which are unevenly distributed across all 19 chromosomes; some of them form distinct tandem duplicate gene clusters. The available grape microarray databases show that all of the auxin-response genes are expressed in fruit and leaf buds, and are significantly overexpressed during the fruit color-changing, bud break and bud dormancy periods. This paper provides a resource for functional studies of auxin-response genes in grape leaf and fruit development.
Taxonomic evaluation of selected Ganoderma species and database sequence validation

PubMed Central

Jargalmaa, Suldbold; Eimes, John A.; Park, Myung Soo; Park, Jae Young; Oh, Seung-Yoon

2017-01-01

Species in the genus Ganoderma include several ecologically important and pathogenic fungal species whose medicinal and economic value is substantial. Due to the highly similar morphological features within the Ganoderma, identification of species has relied heavily on DNA sequencing using BLAST searches, which are only reliable if the GenBank submissions are accurately labeled. In this study, we examined 113 specimens collected from 1969 to 2016 from various regions in Korea using morphological features and multigene analysis (internal transcribed spacer, translation elongation factor 1-α, and the second largest subunit of RNA polymerase II). These specimens were identified as four Ganoderma species: G. sichuanense, G. cf. adspersum, G. cf. applanatum, and G. cf. gibbosum. With the exception of G. sichuanense, these species were difficult to distinguish based solely on morphological features. However, phylogenetic analysis at three different loci yielded concordant phylogenetic information, and supported the four species distinctions with high bootstrap support. A survey of over 600 Ganoderma sequences available on GenBank revealed that 65% of sequences were either misidentified or ambiguously labeled. Here, we suggest corrected annotations for GenBank sequences based on our phylogenetic validation and provide updated global distribution patterns for these Ganoderma species. PMID:28761785
Phylogenetic evidence for cladogenetic polyploidization in land plants.

PubMed

Zhan, Shing H; Drori, Michal; Goldberg, Emma E; Otto, Sarah P; Mayrose, Itay

2016-07-01

Polyploidization is a common and recurring phenomenon in plants and is often thought to be a mechanism of "instant speciation". Whether polyploidization is associated with the formation of new species (cladogenesis) or simply occurs over time within a lineage (anagenesis), however, has never been assessed systematically. We tested this hypothesis using phylogenetic and karyotypic information from 235 plant genera (mostly angiosperms). We first constructed a large database of combined sequence and chromosome number data sets using an automated procedure. We then applied likelihood models (ClaSSE) that estimate the degree of synchronization between polyploidization and speciation events in maximum likelihood and Bayesian frameworks. Our maximum likelihood analysis indicated that 35 genera supported a model that includes cladogenetic transitions over a model with only anagenetic transitions, whereas three genera supported a model that incorporates anagenetic transitions over one with only cladogenetic transitions. Furthermore, the Bayesian analysis supported a preponderance of cladogenetic change in four genera but did not support a preponderance of anagenetic change in any genus. Overall, these phylogenetic analyses provide the first broad confirmation that polyploidization is temporally associated with speciation events, suggesting that it is indeed a major speciation mechanism in plants, at least in some genera. © 2016 Botanical Society of America.
Marine Bacillus spp. associated with the egg capsule of Concholepas concholepas (common name "loco") have an inhibitory activity toward the pathogen Vibrio parahaemolyticus.

PubMed

Leyton, Yanett; Riquelme, Carlos

2010-10-01

The pandemic bacterium Vibrio parahaemolyticus, isolated from seawater, sediment, and marine organisms, is responsible for gastroenteric illnesses in humans and also causes diseases in the aquaculture industry in Chile and other countries around the world. In this study, bacterial flora with inhibitory activity against pathogenic V. parahaemolyticus were collected from egg capsules of Concholepas concholepas and evaluated. The 16S rRNA fragment was sequenced from each isolated strain to determine its identity using the GenBank database. A phylogenetic analysis was made, and tests for the production of antibacterial substances were performed using the double-layer method. Forty-five morphotypes of bacterial colonies were isolated, 8 of which presented an inhibitory effect on the growth of V. parahaemolyticus. 16S rRNA sequence and phylogenetic analysis show that these strains constitute taxa that are phylogenetically related to the Bacillus genus and are probably sister species or strains of the species Bacillus pumilus, Bacillus licheniformis, or Bacillus sp. It is important to determine the nature of the antibacterial substances to evaluate their potential for use against the pathogen species V. parahaemolyticus.
Evaluation of a Phylogenetic Marker Based on Genomic Segment B of Infectious Bursal Disease Virus: Facilitating a Feasible Incorporation of this Segment to the Molecular Epidemiology Studies for this Viral Agent.

PubMed

Alfonso-Morales, Abdulahi; Rios, Liliam; Martínez-Pérez, Orlando; Dolz, Roser; Valle, Rosa; Perera, Carmen L; Bertran, Kateri; Frías, Maria T; Ganges, Llilianne; Díaz de Arce, Heidy; Majó, Natàlia; Núñez, José I; Pérez, Lester J

2015-01-01

Infectious bursal disease (IBD) is a highly contagious and acute viral disease, which has caused high mortality rates in birds and considerable economic losses in different parts of the world for more than two decades and it still represents a considerable threat to poultry. The current study was designed to rigorously measure the reliability of a phylogenetic marker included in segment B. This marker can facilitate molecular epidemiology studies, incorporating this segment of the viral genome, to better explain the links between emergence, spreading and maintenance of the very virulent IBD virus (vvIBDV) strains worldwide. Sequences of the segment B gene from IBDV strains isolated from diverse geographic locations were obtained from the GenBank Database; Cuban sequences were obtained in the current work. A phylogenetic marker named B-marker was assessed by different phylogenetic principles such as saturation of substitution, phylogenetic noise and high consistency. This last parameter is based on the ability of B-marker to reconstruct the same topology as the complete segment B of the viral genome. From the results obtained from B-marker, demographic history for both main lineages of IBDV regarding segment B was performed by Bayesian skyline plot analysis. Phylogenetic analysis for both segments of IBDV genome was also performed, revealing the presence of a natural reassortant strain with segment A from vvIBDV strains and segment B from non-vvIBDV strains within Cuban IBDV population. This study contributes to a better understanding of the emergence of vvIBDV strains, describing molecular epidemiology of IBDV using the state-of-the-art methodology concerning phylogenetic reconstruction. This study also revealed the presence of a novel natural reassorted strain as a possible manifestation of change in the genetic structure and stability of the vvIBDV strains. Therefore, it highlights the need to obtain information about both genome segments of IBDV for molecular epidemiology studies.
MycoDB, a global database of plant response to mycorrhizal fungi.

PubMed

Chaudhary, V Bala; Rúa, Megan A; Antoninka, Anita; Bever, James D; Cannon, Jeffery; Craig, Ashley; Duchicela, Jessica; Frame, Alicia; Gardes, Monique; Gehring, Catherine; Ha, Michelle; Hart, Miranda; Hopkins, Jacob; Ji, Baoming; Johnson, Nancy Collins; Kaonongbua, Wittaya; Karst, Justine; Koide, Roger T; Lamit, Louis J; Meadow, James; Milligan, Brook G; Moore, John C; Pendergast, Thomas H; Piculell, Bridget; Ramsby, Blake; Simard, Suzanne; Shrestha, Shubha; Umbanhowar, James; Viechtbauer, Wolfgang; Walters, Lawrence; Wilson, Gail W T; Zee, Peter C; Hoeksema, Jason D

2016-05-10

Plants form belowground associations with mycorrhizal fungi in one of the most common symbioses on Earth. However, few large-scale generalizations exist for the structure and function of mycorrhizal symbioses, as the nature of this relationship varies from mutualistic to parasitic and is largely context-dependent. We announce the public release of MycoDB, a database of 4,010 studies (from 438 unique publications) to aid in multi-factor meta-analyses elucidating the ecological and evolutionary context in which mycorrhizal fungi alter plant productivity. Over 10 years with nearly 80 collaborators, we compiled data on the response of plant biomass to mycorrhizal fungal inoculation, including meta-analysis metrics and 24 additional explanatory variables that describe the biotic and abiotic context of each study. We also include phylogenetic trees for all plants and fungi in the database. To our knowledge, MycoDB is the largest ecological meta-analysis database. We aim to share these data to highlight significant gaps in mycorrhizal research and encourage synthesis to explore the ecological and evolutionary generalities that govern mycorrhizal functioning in ecosystems.
BGD: a database of bat genomes.

PubMed

Fang, Jianfei; Wang, Xuan; Mu, Shuo; Zhang, Shuyi; Dong, Dong

2015-01-01

Bats account for ~20% of mammalian species, and are the only mammals with true powered flight. Because of their specialized phenotypic traits, much research has been devoted to examining the evolution of bats. To date, some whole genome sequences of bats have been assembled and annotated; however, a uniform resource for the annotated bat genomes is still unavailable. To make the extensive data associated with the bat genomes accessible to the general biological community, we established a Bat Genome Database (BGD). BGD is an open-access, web-available portal that integrates available data of bat genomes and genes. It hosts data from six bat species, including two megabats and four microbats. Users can query the gene annotations using an efficient search engine, and it offers browsable tracks of bat genomes. Furthermore, an easy-to-use phylogenetic analysis tool is provided to facilitate online phylogenetic study of genes. To the best of our knowledge, BGD is the first database of bat genomes. It will extend our understanding of bat evolution and benefit the analysis of bat sequences. BGD is freely available at: http://donglab.ecnu.edu.cn/databases/BatGenome/.
MycoDB, a global database of plant response to mycorrhizal fungi

PubMed Central

Chaudhary, V. Bala; Rúa, Megan A.; Antoninka, Anita; Bever, James D.; Cannon, Jeffery; Craig, Ashley; Duchicela, Jessica; Frame, Alicia; Gardes, Monique; Gehring, Catherine; Ha, Michelle; Hart, Miranda; Hopkins, Jacob; Ji, Baoming; Johnson, Nancy Collins; Kaonongbua, Wittaya; Karst, Justine; Koide, Roger T.; Lamit, Louis J.; Meadow, James; Milligan, Brook G.; Moore, John C.; Pendergast IV, Thomas H.; Piculell, Bridget; Ramsby, Blake; Simard, Suzanne; Shrestha, Shubha; Umbanhowar, James; Viechtbauer, Wolfgang; Walters, Lawrence; Wilson, Gail W. T.; Zee, Peter C.; Hoeksema, Jason D.

2016-01-01

Plants form belowground associations with mycorrhizal fungi in one of the most common symbioses on Earth. However, few large-scale generalizations exist for the structure and function of mycorrhizal symbioses, as the nature of this relationship varies from mutualistic to parasitic and is largely context-dependent. We announce the public release of MycoDB, a database of 4,010 studies (from 438 unique publications) to aid in multi-factor meta-analyses elucidating the ecological and evolutionary context in which mycorrhizal fungi alter plant productivity. Over 10 years with nearly 80 collaborators, we compiled data on the response of plant biomass to mycorrhizal fungal inoculation, including meta-analysis metrics and 24 additional explanatory variables that describe the biotic and abiotic context of each study. We also include phylogenetic trees for all plants and fungi in the database. To our knowledge, MycoDB is the largest ecological meta-analysis database. We aim to share these data to highlight significant gaps in mycorrhizal research and encourage synthesis to explore the ecological and evolutionary generalities that govern mycorrhizal functioning in ecosystems. PMID:27163938
MycoDB, a global database of plant response to mycorrhizal fungi

NASA Astrophysics Data System (ADS)

Chaudhary, V. Bala; Rúa, Megan A.; Antoninka, Anita; Bever, James D.; Cannon, Jeffery; Craig, Ashley; Duchicela, Jessica; Frame, Alicia; Gardes, Monique; Gehring, Catherine; Ha, Michelle; Hart, Miranda; Hopkins, Jacob; Ji, Baoming; Johnson, Nancy Collins; Kaonongbua, Wittaya; Karst, Justine; Koide, Roger T.; Lamit, Louis J.; Meadow, James; Milligan, Brook G.; Moore, John C.; Pendergast, Thomas H., IV; Piculell, Bridget; Ramsby, Blake; Simard, Suzanne; Shrestha, Shubha; Umbanhowar, James; Viechtbauer, Wolfgang; Walters, Lawrence; Wilson, Gail W. T.; Zee, Peter C.; Hoeksema, Jason D.

2016-05-01

Plants form belowground associations with mycorrhizal fungi in one of the most common symbioses on Earth. However, few large-scale generalizations exist for the structure and function of mycorrhizal symbioses, as the nature of this relationship varies from mutualistic to parasitic and is largely context-dependent. We announce the public release of MycoDB, a database of 4,010 studies (from 438 unique publications) to aid in multi-factor meta-analyses elucidating the ecological and evolutionary context in which mycorrhizal fungi alter plant productivity. Over 10 years with nearly 80 collaborators, we compiled data on the response of plant biomass to mycorrhizal fungal inoculation, including meta-analysis metrics and 24 additional explanatory variables that describe the biotic and abiotic context of each study. We also include phylogenetic trees for all plants and fungi in the database. To our knowledge, MycoDB is the largest ecological meta-analysis database. We aim to share these data to highlight significant gaps in mycorrhizal research and encourage synthesis to explore the ecological and evolutionary generalities that govern mycorrhizal functioning in ecosystems.
Phylogenetic characterization of a biogas plant microbial community integrating clone library 16S-rDNA sequences and metagenome sequence data obtained by 454-pyrosequencing.

PubMed

Kröber, Magdalena; Bekel, Thomas; Diaz, Naryttza N; Goesmann, Alexander; Jaenicke, Sebastian; Krause, Lutz; Miller, Dimitri; Runte, Kai J; Viehöver, Prisca; Pühler, Alfred; Schlüter, Andreas

2009-06-01

The phylogenetic structure of the microbial community residing in a fermentation sample from a production-scale biogas plant fed with maize silage, green rye and liquid manure was analysed by an integrated approach using clone library sequences and metagenome sequence data obtained by 454-pyrosequencing. Sequencing of 109 clones from a bacterial and an archaeal 16S-rDNA amplicon library revealed that the obtained nucleotide sequences are similar but not identical to 16S-rDNA database sequences derived from different anaerobic environments including digestors and bioreactors. Most of the bacterial 16S-rDNA sequences could be assigned to the phylum Firmicutes with the most abundant class Clostridia and to the class Bacteroidetes, whereas most archaeal 16S-rDNA sequences cluster close to the methanogen Methanoculleus bourgensis. Further sequences of the archaeal library most probably represent so far non-characterised species within the genus Methanoculleus. A similar result derived from phylogenetic analysis of mcrA clone sequences. The mcrA gene product encodes the alpha-subunit of methyl-coenzyme-M reductase involved in the final step of methanogenesis. BLASTn analysis applying stringent settings resulted in assignment of 16S-rDNA metagenome sequence reads to 62 16S-rDNA amplicon sequences thus enabling frequency of abundance estimations for 16S-rDNA clone library sequences. Ribosomal Database Project (RDP) Classifier processing of metagenome 16S-rDNA reads revealed abundance of the phyla Firmicutes, Bacteroidetes and Euryarchaeota and the orders Clostridiales, Bacteroidales and Methanomicrobiales. Moreover, a large fraction of 16S-rDNA metagenome reads could not be assigned to lower taxonomic ranks, demonstrating that numerous microorganisms in the analysed fermentation sample of the biogas plant are still unclassified or unknown.
Comprehensive analysis of orthologous protein domains using the HOPS database.

PubMed

Storm, Christian E V; Sonnhammer, Erik L L

2003-10-01

One of the most reliable methods for protein function annotation is to transfer experimentally known functions from orthologous proteins in other organisms. Most methods for identifying orthologs operate on a subset of organisms with a completely sequenced genome, and treat proteins as single-domain units. However, it is well known that proteins are often made up of several independent domains, and there is a wealth of protein sequences from genomes that are not completely sequenced. A comprehensive set of protein domain families is found in the Pfam database. We wanted to apply orthology detection to Pfam families, but first some issues needed to be addressed. First, orthology detection becomes impractical and unreliable when too many species are included. Second, shorter domains contain less information. It is therefore important to assess the quality of the orthology assignment and avoid very short domains altogether. We present a database of orthologous protein domains in Pfam called HOPS: Hierarchical grouping of Orthologous and Paralogous Sequences. Orthology is inferred in a hierarchic system of phylogenetic subgroups using ortholog bootstrapping. To avoid the frequent errors stemming from horizontally transferred genes in bacteria, the analysis is presently limited to eukaryotic genes. The results are accessible in the graphical browser NIFAS, a Java tool originally developed for analyzing phylogenetic relations within Pfam families. The method was tested on a set of curated orthologs with experimentally verified function. In comparison to tree reconciliation with a complete species tree, our approach finds significantly more orthologs in the test set. Examples for investigating gene fusions and domain recombination using HOPS are given.
Pfarao: a web application for protein family analysis customized for cytoskeletal and motor proteins (CyMoBase)

PubMed Central

Odronitz, Florian; Kollmar, Martin

2006-01-01

Background: Annotation of protein sequences of eukaryotic organisms is crucial for the understanding of their function in the cell. Manual annotation is still by far the most accurate way to correctly predict genes. The classification of protein sequences, their phylogenetic relation and the assignment of function involves information from various sources. This often leads to a collection of heterogeneous data, which is hard to track. Cytoskeletal and motor proteins consist of large and diverse superfamilies comprising up to several dozen members per organism. To date, there is no integrated tool available to assist in the manual large-scale comparative genomic analysis of protein families. Description: Pfarao (Protein Family Application for Retrieval, Analysis and Organisation) is a database-driven online working environment for the analysis of manually annotated protein sequences and their relationships. Currently, the system can store and interrelate a wide range of information about protein sequences, species, phylogenetic relations and sequencing projects as well as links to literature and domain predictions. Sequences can be imported from multiple sequence alignments that are generated during the annotation process. A web interface allows users to conveniently browse the database and to compile tabular and graphical summaries of its content. Conclusion: We implemented a protein sequence-centric web application to store, organize, interrelate, and present heterogeneous data that is generated in manual genome annotation and comparative genomics. The application has been developed for the analysis of cytoskeletal and motor proteins (CyMoBase) but can easily be adapted for any protein. PMID:17134497
dEMBF: A Comprehensive Database of Enzymes of Microalgal Biofuel Feedstock.

PubMed

Misra, Namrata; Panda, Prasanna Kumar; Parida, Bikram Kumar; Mishra, Barada Kanta

2016-01-01

Microalgae have attracted wide attention as one of the most versatile renewable feedstocks for production of biofuel. To develop genetically engineered high lipid yielding algal strains, a thorough understanding of the lipid biosynthetic pathway and the underpinning enzymes is essential. In this work, we have systematically mined the genomes of fifteen diverse algal species belonging to Chlorophyta, Heterokontophyta, Rhodophyta, and Haptophyta, to identify and annotate the putative enzymes of the lipid metabolic pathway. Consequently, we have also developed a database, dEMBF (Database of Enzymes of Microalgal Biofuel Feedstock), which catalogues the complete list of identified enzymes along with their computed annotation details including length, hydrophobicity, amino acid composition, subcellular location, gene ontology, KEGG pathway, orthologous group, Pfam domain, intron-exon organization, transmembrane topology, and secondary/tertiary structural data. Furthermore, to facilitate functional and evolutionary study of these enzymes, a collection of built-in applications for BLAST search, motif identification, sequence and phylogenetic analysis has been seamlessly integrated into the database. dEMBF is the first database that brings together all enzymes responsible for lipid synthesis from available algal genomes, and provides an integrative platform for enzyme inquiry and analysis. This database will be extremely useful for algal biofuel research. It can be accessed at http://bbprof.immt.res.in/embf.
dEMBF: A Comprehensive Database of Enzymes of Microalgal Biofuel Feedstock

PubMed Central

Misra, Namrata; Panda, Prasanna Kumar; Parida, Bikram Kumar; Mishra, Barada Kanta

2016-01-01

Microalgae have attracted wide attention as one of the most versatile renewable feedstocks for production of biofuel. To develop genetically engineered high lipid yielding algal strains, a thorough understanding of the lipid biosynthetic pathway and the underpinning enzymes is essential. In this work, we have systematically mined the genomes of fifteen diverse algal species belonging to Chlorophyta, Heterokontophyta, Rhodophyta, and Haptophyta, to identify and annotate the putative enzymes of the lipid metabolic pathway. Consequently, we have also developed a database, dEMBF (Database of Enzymes of Microalgal Biofuel Feedstock), which catalogues the complete list of identified enzymes along with their computed annotation details including length, hydrophobicity, amino acid composition, subcellular location, gene ontology, KEGG pathway, orthologous group, Pfam domain, intron-exon organization, transmembrane topology, and secondary/tertiary structural data. Furthermore, to facilitate functional and evolutionary study of these enzymes, a collection of built-in applications for BLAST search, motif identification, sequence and phylogenetic analysis has been seamlessly integrated into the database. dEMBF is the first database that brings together all enzymes responsible for lipid synthesis from available algal genomes, and provides an integrative platform for enzyme inquiry and analysis. This database will be extremely useful for algal biofuel research. It can be accessed at http://bbprof.immt.res.in/embf. PMID:26727469
In silico analysis of protein toxin and bacteriocins from Lactobacillus paracasei SD1 genome and available online databases

PubMed Central

Surachat, Komwit; Sangket, Unitsa; Deachamag, Panchalika; Chotigeat, Wilaiwan

2017-01-01

Lactobacillus paracasei SD1 is a potential probiotic strain due to its ability to survive several conditions in human dental cavities. To ascertain its safety for human use, we therefore performed a comprehensive bioinformatics analysis and characterization of the bacterial protein toxins produced by this strain. We report the complete genome of Lactobacillus paracasei SD1 and its comparison to other Lactobacillus genomes. Additionally, we identify and analyze its protein toxins and antimicrobial proteins using reliable online database resources and establish its phylogenetic relationship with other bacterial genomes. Our investigation suggests that this strain is safe for human use and contains several bacteriocins that confer health benefits to the host. An in silico analysis of protein-protein interactions between the target bacteriocins and the microbial proteins gtfB and luxS of Streptococcus mutans was performed and is discussed here. PMID:28837656

Building Phylogenetic Trees from DNA Sequence Data: Investigating Polar Bear and Giant Panda Ancestry.

ERIC Educational Resources Information Center

Maier, Caroline Alexandra

2001-01-01

Presents an activity in which students seek answers to questions about evolutionary relationships by using genetic databases and bioinformatics software. Students build genetic distance matrices and phylogenetic trees based on molecular sequence data using web-based resources. Provides a flowchart of steps involved in accessing, retrieving, and…
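The record above mentions building genetic distance matrices and phylogenetic trees from sequence data. As a minimal sketch of that general workflow, the Python example below computes pairwise p-distances for a toy alignment and clusters the taxa with UPGMA, printing the topology in Newick format. The abstract does not name the methods or tools the activity uses, so UPGMA and the p-distance are assumptions chosen for brevity. The taxon names, sequences and function names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two equal-length aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs: dict) -> dict:
    """Pairwise distances keyed by an unordered pair of taxon names."""
    return {
        frozenset((m, n)): p_distance(seqs[m], seqs[n])
        for m, n in combinations(seqs, 2)
    }


def upgma_newick(seqs: dict) -> str:
    """Cluster taxa with UPGMA and return the topology (no branch lengths)."""
    dist = distance_matrix(seqs)
    sizes = {name: 1 for name in seqs}  # number of leaves in each cluster
    while len(sizes) > 1:
        a, b = sorted(min(dist, key=dist.get))  # closest pair of clusters
        merged = f"({a},{b})"
        new_size = sizes[a] + sizes[b]
        for other in sizes:
            if other in (a, b):
                continue
            # Size-weighted average keeps distances equal to leaf-pair means.
            dist[frozenset((merged, other))] = (
                dist[frozenset((a, other))] * sizes[a]
                + dist[frozenset((b, other))] * sizes[b]
            ) / new_size
        for key in [k for k in dist if a in k or b in k]:
            del dist[key]  # drop distances involving the merged clusters
        del sizes[a], sizes[b]
        sizes[merged] = new_size
    return next(iter(sizes)) + ";"


# Hypothetical toy alignment; not real sequence data.
toy_alignment = {
    "taxon_A": "ACGTACGTACGT",
    "taxon_B": "ACGTACGTACGA",
    "taxon_C": "ACGTTCGTACCA",
    "taxon_D": "TCGATCGAACCT",
}

matrix = distance_matrix(toy_alignment)
tree = upgma_newick(toy_alignment)
print(tree)  # (((taxon_A,taxon_B),taxon_C),taxon_D);
```

UPGMA assumes a constant rate of change across lineages, so other tree-building methods can give a different topology on real data.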
Phylogenetic placement of two previously described intranuclear bacteria from the ciliate Paramecium bursaria (Protozoa, Ciliophora): 'Holospora acuminata' and 'Holospora curviuscula'.

PubMed

Rautian, Maria S; Wackerow-Kouzova, Natalia D

2013-05-01

'Holospora acuminata' infects micronuclei of Paramecium bursaria (Protozoa, Ciliophora), whereas 'Holospora curviuscula' infects the macronucleus in other clones of the same host species. Because these micro-organisms have not been cultivated, their description has been based only on some morphological properties and host and nuclear specificities. One 16S rRNA gene sequence of 'H. curviuscula' is present in databases. The systematic position of the representative strain of 'H. curviuscula', strain MC-3, was determined in this study. Moreover, for the first time, two strains of 'H. acuminata', KBN10-1 and AC61-10, were investigated. Phylogenetic analysis indicated that all three strains belonged to the genus Holospora, family Holosporaceae, order Rickettsiales within the Alphaproteobacteria.
Tidying Up International Nucleotide Sequence Databases: Ecological, Geographical and Sequence Quality Annotation of ITS Sequences of Mycorrhizal Fungi

PubMed Central

Tedersoo, Leho; Abarenkov, Kessy; Nilsson, R. Henrik; Schüssler, Arthur; Grelet, Gwen-Aëlle; Kohout, Petr; Oja, Jane; Bonito, Gregory M.; Veldre, Vilmar; Jairus, Teele; Ryberg, Martin; Larsson, Karl-Henrik; Kõljalg, Urmas

2011-01-01

Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi. PMID:21949797
The impact of clinical, demographic and risk factors on rates of HIV transmission: a population-based phylogenetic analysis in British Columbia, Canada.

PubMed

Poon, Art F Y; Joy, Jeffrey B; Woods, Conan K; Shurgold, Susan; Colley, Guillaume; Brumme, Chanson J; Hogg, Robert S; Montaner, Julio S G; Harrigan, P Richard

2015-03-15

The diversification of human immunodeficiency virus (HIV) is shaped by its transmission history. We therefore used a population-based, province-wide HIV drug resistance database in British Columbia (BC), Canada, to evaluate the impact of clinical, demographic, and behavioral factors on rates of HIV transmission. We reconstructed molecular phylogenies from 27,296 anonymized bulk HIV pol sequences representing 7747 individuals in BC, about half the estimated HIV prevalence in BC. Infections were grouped into clusters based on phylogenetic distances, as a proxy for variation in transmission rates. Rates of cluster expansion were reconstructed from estimated dates of HIV seroconversion. Our criteria grouped 4431 individuals into 744 clusters largely separated with respect to risk factors, including large established clusters predominated by injection drug users and more-recently emerging clusters comprising men who have sex with men. The mean log10 viral load of an individual's phylogenetic neighborhood (composed of 5 other individuals with shortest phylogenetic distances) increased their odds of appearing in a cluster by >2-fold per log10 viruses per milliliter. Hotspots of ongoing HIV transmission can be characterized in near real time by the secondary analysis of HIV resistance genotypes, providing an important potential resource for targeting public health initiatives for HIV prevention. © The Author 2014. Published by Oxford University Press on behalf of the Infectious Diseases Society of America. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
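Grouping infections into clusters by phylogenetic distance can be sketched as follows. This is an illustration under stated assumptions, not the authors' criteria: it links any two individuals whose pairwise distance falls at or below a hypothetical cutoff and reports the connected groups of two or more. The cutoff value, identifiers, and distances are made up.

```python
def cluster_by_cutoff(distances, cutoff):
    """Single-linkage clusters from pairwise distances.

    distances: dict mapping (id_a, id_b) -> distance.
    cutoff: hypothetical threshold; pairs at or below it are linked.
    Returns a list of sets, one per cluster with at least two members.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        root_a, root_b = find(a), find(b)  # registers both individuals
        if d <= cutoff and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return [g for g in groups.values() if len(g) >= 2]


# Made-up distances between anonymized individuals.
toy_distances = {
    ("p1", "p2"): 0.010,
    ("p2", "p3"): 0.015,
    ("p1", "p3"): 0.030,
    ("p4", "p5"): 0.005,
    ("p1", "p4"): 0.200,
    ("p1", "p6"): 0.300,
}
clusters = cluster_by_cutoff(toy_distances, cutoff=0.02)
```

With these toy values, p1, p2 and p3 form one cluster through chained links, p4 and p5 form another, and p6 stays unclustered.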
Expanding the Halohydrin Dehalogenase Enzyme Family: Identification of Novel Enzymes by Database Mining.

PubMed

Schallmey, Marcus; Koopmeiners, Julia; Wells, Elizabeth; Wardenga, Rainer; Schallmey, Anett

2014-12-01

Halohydrin dehalogenases are very rare enzymes that are naturally involved in the mineralization of halogenated xenobiotics. Due to their catalytic potential and promiscuity, many biocatalytic reactions have been described that have led to several interesting and industrially important applications. Nevertheless, only a few of these enzymes have been made available through recombinant techniques; hence, it is of general interest to expand the repertoire of these enzymes so as to enable novel biocatalytic applications. After the identification of specific sequence motifs, 37 novel enzyme sequences were readily identified in public sequence databases. All enzymes that could be heterologously expressed also catalyzed typical halohydrin dehalogenase reactions. Phylogenetic inference for enzymes of the halohydrin dehalogenase enzyme family confirmed that all enzymes form a distinct monophyletic clade within the short-chain dehydrogenase/reductase superfamily. In addition, the majority of novel enzymes are substantially different from previously known phylogenetic subtypes. Consequently, four additional phylogenetic subtypes were defined, greatly expanding the halohydrin dehalogenase enzyme family. We show that the enormous wealth of environmental and genome sequences present in public databases can be tapped for in silico identification of very rare but biotechnologically important biocatalysts. Our findings help to readily identify halohydrin dehalogenases in ever-growing sequence databases and, as a consequence, make even more members of this interesting enzyme family available to the scientific and industrial community. Copyright © 2014, American Society for Microbiology. All Rights Reserved.
Molecular characterization of feline calicivirus variants from multicat household and public animal shelter in Rio de Janeiro, Brazil.

PubMed

Pereira, Joylson de Jesus; Baumworcel, Natasha; Fioretti, Júlia Monassa; Domingues, Cinthya Fonseca; Moraes, Laís Fernandes de; Marinho, Robson Dos Santos Souza; Vieira, Maria Clara Rodrigues; Pinto, Ana Maria Viana; de Castro, Tatiana Xavier

2018-02-28

The aim of this study was to perform the molecular characterization of conserved and variable regions of the feline calicivirus capsid genome in order to investigate the molecular diversity of variants in the Brazilian cat population. Twenty-six conjunctival samples from cats living in five public short-term animal shelters and three multicat life-long households were analyzed. Fifteen cats had conjunctivitis, three had oral ulceration, eight had respiratory signs (cough, sneeze and nasal discharge) and nine were asymptomatic. Feline caliciviruses were isolated in CRFK cells and characterized by reverse transcription PCR targeting both conserved and variable regions of open reading frame 2. The amplicons obtained were sequenced. A phylogenetic analysis along with most of the prototypes available in the GenBank database and an amino acid analysis were performed. Phylogenetic analysis based on both conserved and variable regions revealed two clusters with aLRT values of 1.00 and 0.98, respectively, and the variants from this study belong to feline calicivirus genogroup I. No association between geographical distribution and/or clinical signs and clustering in the phylogenetic tree was observed. The variants circulating in public short-term animal shelters demonstrated high variability because the relatively rapid turnover of carrier cats constantly introduced multiple viruses into this location over time. Copyright © 2018 Sociedade Brasileira de Microbiologia. Published by Elsevier Editora Ltda. All rights reserved.
Mitochondrial DNA control region sequences from Nairobi (Kenya): inferring phylogenetic parameters for the establishment of a forensic database.

PubMed

Brandstätter, Anita; Peterson, Christine T; Irwin, Jodi A; Mpoke, Solomon; Koech, Davy K; Parson, Walther; Parsons, Thomas J

2004-10-01

Large forensic mtDNA databases that adhere to strict guidelines for generation and maintenance are not available for many populations outside of the United States and western Europe. We have established a high quality mtDNA control region sequence database for urban Nairobi as both a reference database for forensic investigations and a tool to examine the genetic variation of Kenyan sequences in the context of known African variation. The Nairobi sequences exhibited high variation and a low random match probability, indicating utility for forensic testing. Haplogroup identification and frequencies were compared with those reported from other published studies on African or African-origin populations from Mozambique, Sierra Leone, and the United States, and suggest significant differences in the mtDNA compositions of the various populations. The quality of the sequence data in our study was investigated and supported using phylogenetic measures. Our data demonstrate the diversity and distinctiveness of African populations, and underline the importance of establishing additional forensic mtDNA databases of indigenous African populations.
sRNAdb: A small non-coding RNA database for gram-positive bacteria

PubMed Central

2012-01-01

Background The class of small non-coding RNA molecules (sRNA) regulates gene expression by different mechanisms and enables bacteria to mount a physiological response due to adaptation to the environment or infection. Over the last decades the number of sRNAs has been increasing rapidly. Several databases like Rfam or fRNAdb were extended to include sRNAs as a class of its own. Furthermore new specialized databases like sRNAMap (gram-negative bacteria only) and sRNATarBase (target prediction) were established. To the best of the authors’ knowledge no database focusing on sRNAs from gram-positive bacteria is publicly available so far. Description In order to understand sRNA’s functional and phylogenetic relationships we have developed sRNAdb and provide tools for data analysis and visualization. The data compiled in our database is assembled from experiments as well as from bioinformatics analyses. The software enables comparison and visualization of gene loci surrounding the sRNAs of interest. To accomplish this, we use a client–server based approach. Offline versions of the database including analyses and visualization tools can easily be installed locally on the user’s computer. This feature facilitates customized local addition of unpublished sRNA candidates and related information such as promoters or terminators using tab-delimited files. Conclusion sRNAdb allows a user-friendly and comprehensive comparative analysis of sRNAs from available sequenced gram-positive prokaryotic replicons. Offline versions including analysis and visualization tools facilitate complex user specific bioinformatics analyses. PMID:22883983
FunGene: the functional gene pipeline and repository.

PubMed

Fish, Jordan A; Chai, Benli; Wang, Qiong; Sun, Yanni; Brown, C Titus; Tiedje, James M; Cole, James R

2013-01-01

Ribosomal RNA genes have become the standard molecular markers for microbial community analysis for good reasons, including universal occurrence in cellular organisms, availability of large databases, and ease of rRNA gene region amplification and analysis. As markers, however, rRNA genes have some significant limitations. The rRNA genes are often present in multiple copies, unlike most protein-coding genes. The slow rate of change in rRNA genes means that multiple species sometimes share identical 16S rRNA gene sequences, while many more species share identical sequences in the short 16S rRNA regions commonly analyzed. In addition, the genes involved in many important processes are not distributed in a phylogenetically coherent manner, potentially due to gene loss or horizontal gene transfer. While rRNA genes remain the most commonly used markers, key genes in ecologically important pathways, e.g., those involved in carbon and nitrogen cycling, can provide important insights into community composition and function not obtainable through rRNA analysis. However, working with ecofunctional gene data requires some tools beyond those required for rRNA analysis. To address this, our Functional Gene Pipeline and Repository (FunGene; http://fungene.cme.msu.edu/) offers databases of many common ecofunctional genes and proteins, as well as integrated tools that allow researchers to browse these collections and choose subsets for further analysis, build phylogenetic trees, test primers and probes for coverage, and download aligned sequences. Additional FunGene tools are specialized to process coding gene amplicon data. For example, FrameBot produces frameshift-corrected protein and DNA sequences from raw reads while finding the most closely related protein reference sequence. These tools can help provide better insight into microbial communities by directly studying key genes involved in important ecological processes.
bcgTree: automatized phylogenetic tree building from bacterial core genomes.

PubMed

Ankenbrand, Markus J; Keller, Alexander

2016-10-01

The need for multi-gene analyses in scientific fields such as phylogenetics and DNA barcoding has increased in recent years. In particular, these approaches are increasingly important for differentiating bacterial species, where reliance on the standard 16S rDNA marker can result in poor resolution. Additionally, the assembly of bacterial genomes has become a standard task due to advances in next-generation sequencing technologies. We created a bioinformatic pipeline, bcgTree, which uses assembled bacterial genomes, either from databases or from the user's own sequencing results, to reconstruct their phylogenetic history. The pipeline automatically extracts 107 essential single-copy core genes, found in a majority of bacteria, using hidden Markov models and performs a partitioned maximum-likelihood analysis. Here, we describe the workflow of bcgTree and, as a proof-of-concept, its usefulness in resolving the phylogeny of 293 publicly available bacterial strains of the genus Lactobacillus. We also evaluate its performance in both low- and high-level taxonomy test sets. The tool is freely available at github ( https://github.com/iimog/bcgTree ) and our institutional homepage ( http://www.dna-analytics.biozentrum.uni-wuerzburg.de ).
Information categorization approach to literary authorship disputes

NASA Astrophysics Data System (ADS)

Yang, Albert C.-C.; Peng, C.-K.; Yien, H.-W.; Goldberger, Ary L.

2003-11-01

Scientific analysis of the linguistic styles of different authors has generated considerable interest. We present a generic approach to measuring the similarity of two symbolic sequences that requires minimal background knowledge about a given human language. Our analysis is based on word rank order-frequency statistics and phylogenetic tree construction. We demonstrate the applicability of this method to historic authorship questions related to the classic Chinese novel “The Dream of the Red Chamber,” to the plays of William Shakespeare, and to the Federalist papers. This method may also provide a simple approach to other large databases based on their information content.
TreeVector: scalable, interactive, phylogenetic trees for the web.

PubMed

Pethica, Ralph; Barker, Gary; Kovacs, Tim; Gough, Julian

2010-01-28

Phylogenetic trees are complex data forms that need to be graphically displayed to be human-readable. Traditional techniques of plotting phylogenetic trees focus on rendering a single static image, but increases in the production of biological data and large-scale analyses demand scalable, browsable, and interactive trees. We introduce TreeVector, a Scalable Vector Graphics- and Java-based method that allows trees to be integrated and viewed seamlessly in standard web browsers with no extra software required, and can be modified and linked using standard web technologies. There are now many bioinformatics servers and databases with a range of dynamic processes and updates to cope with the increasing volume of data. TreeVector is designed as a framework to integrate with these processes and produce user-customized phylogenies automatically. We also address the strengths of phylogenetic trees as part of a linked-in browsing process rather than an end graphic for print. TreeVector is fast and easy to use and is available to download precompiled, but is also open source. It can also be run from the web server listed below or the user's own web server. It has already been deployed on two recognized and widely used database Web sites.
PAQ: Partition Analysis of Quasispecies.

PubMed

Baccam, P; Thompson, R J; Fedrigo, O; Carpenter, S; Cornette, J L

2001-01-01

The complexities of genetic data may not be accurately described by any single analytical tool. Phylogenetic analysis is often used to study the genetic relationship among different sequences. Evolutionary models and assumptions are invoked to reconstruct trees that describe the phylogenetic relationship among sequences. Genetic databases are rapidly accumulating large amounts of sequences. Newly acquired sequences, which have not yet been characterized, may require preliminary genetic exploration in order to build models describing the evolutionary relationship among sequences. There are clustering techniques that rely less on models of evolution, and thus may provide nice exploratory tools for identifying genetic similarities. Some of the more commonly used clustering methods perform better when data can be grouped into mutually exclusive groups. Genetic data from viral quasispecies, which consist of closely related variants that differ by small changes, however, may best be partitioned by overlapping groups. We have developed an intuitive exploratory program, Partition Analysis of Quasispecies (PAQ), which utilizes a non-hierarchical technique to partition sequences that are genetically similar. PAQ was used to analyze a data set of human immunodeficiency virus type 1 (HIV-1) envelope sequences isolated from different regions of the brain and another data set consisting of the equine infectious anemia virus (EIAV) regulatory gene rev. Analysis of the HIV-1 data set by PAQ was consistent with phylogenetic analysis of the same data, and the EIAV rev variants were partitioned into two overlapping groups. PAQ provides an additional tool which can be used to glean information from genetic data and can be used in conjunction with other tools to study genetic similarities and genetic evolution of viral quasispecies.
Evaluation of a Phylogenetic Marker Based on Genomic Segment B of Infectious Bursal Disease Virus: Facilitating a Feasible Incorporation of this Segment to the Molecular Epidemiology Studies for this Viral Agent

PubMed Central

Martínez-Pérez, Orlando; Dolz, Roser; Valle, Rosa; Perera, Carmen L.; Bertran, Kateri; Frías, Maria T.; Ganges, Llilianne; Díaz de Arce, Heidy; Majó, Natàlia; Núñez, José I.; Pérez, Lester J.

2015-01-01

Background Infectious bursal disease (IBD) is a highly contagious and acute viral disease, which has caused high mortality rates in birds and considerable economic losses in different parts of the world for more than two decades and it still represents a considerable threat to poultry. The current study was designed to rigorously measure the reliability of a phylogenetic marker included in segment B. This marker can facilitate molecular epidemiology studies, incorporating this segment of the viral genome, to better explain the links between emergence, spreading and maintenance of the very virulent IBD virus (vvIBDV) strains worldwide. Methodology/Principal Findings Sequences of the segment B gene from IBDV strains isolated from diverse geographic locations were obtained from the GenBank Database; Cuban sequences were obtained in the current work. A phylogenetic marker named B-marker was assessed by different phylogenetic principles such as saturation of substitution, phylogenetic noise and high consistency. This last parameter is based on the ability of B-marker to reconstruct the same topology as the complete segment B of the viral genome. From the results obtained from B-marker, demographic history for both main lineages of IBDV regarding segment B was performed by Bayesian skyline plot analysis. Phylogenetic analysis for both segments of the IBDV genome was also performed, revealing the presence of a natural reassortant strain with segment A from vvIBDV strains and segment B from non-vvIBDV strains within the Cuban IBDV population. Conclusions/Significance This study contributes to a better understanding of the emergence of vvIBDV strains, describing molecular epidemiology of IBDV using state-of-the-art methodology concerning phylogenetic reconstruction. This study also revealed the presence of a novel natural reassortant strain as a possible manifestation of change in the genetic structure and stability of the vvIBDV strains. Therefore, it highlights the need to obtain information about both genome segments of IBDV for molecular epidemiology studies. PMID:25946336
The Transporter Classification Database: recent advances.

PubMed

Saier, Milton H; Yen, Ming Ren; Noto, Keith; Tamang, Dorjee G; Elkan, Charles

2009-01-01

The Transporter Classification Database (TCDB), freely accessible at http://www.tcdb.org, is a relational database containing sequence, structural, functional and evolutionary information about transport systems from a variety of living organisms, based on the International Union of Biochemistry and Molecular Biology-approved transporter classification (TC) system. It is a curated repository for factual information compiled largely from published references. It uses a functional/phylogenetic system of classification, and currently encompasses about 5000 representative transporters and putative transporters in more than 500 families. We here describe novel software designed to support and extend the usefulness of TCDB. Our recent efforts render it more user friendly, incorporate machine learning to input novel data in a semiautomatic fashion, and allow analyses that are more accurate and less time consuming. The availability of these tools has resulted in recognition of distant phylogenetic relationships and tremendous expansion of the information available to TCDB users.
Diversity and Phylogenetic Distribution of Extracellular Microbial Peptidases

NASA Astrophysics Data System (ADS)

Nguyen, Trang; Mueller, Ryan; Myrold, David

2017-04-01

Depolymerization of proteinaceous compounds by extracellular proteolytic enzymes is a bottleneck in the nitrogen cycle, limiting the rate of the nitrogen turnover in soils. Protein degradation is accomplished by a diverse range of extracellular (secreted) peptidases. Our objective was to better understand the evolution of these enzymes and how their functional diversity corresponds to known phylogenetic diversity. Peptidase subfamilies from 110 archaeal, 1,860 bacterial, and 97 fungal genomes were extracted from the MEROPS database along with corresponding SSU sequences for each genome from the SILVA database, resulting in 43,177 secreted peptidases belonging to 34 microbial phyla and 149 peptidase subfamilies. We compared the distribution of each peptidase subfamily across all taxa to the phylogenetic relationships of these organisms based on their SSU gene sequences. The occurrence and abundance of genes coding for secreted peptidases varied across microbial taxa, distinguishing the peptidase complement of the three microbial kingdoms. Bacteria had the highest frequency of secreted peptidase coding genes per 1,000 genes and contributed from 1% to 6% of the gene content. Fungi had only slightly higher secreted peptidase gene content than archaea, standardized by total genes. The relative abundance profiles of secreted peptidases in each microbial kingdom also varied: the aspartic family was most abundant in fungi (25%), whereas it accounted for only 12% in archaea and 4% in bacteria. Serine, metallo, and cysteine families consistently contributed up to 75% of the secreted peptidase abundance across the three kingdoms. Overall, bacteria had a much wider collection of secreted peptidases, whereas fungi and archaea shared most of their secreted peptidase families. Principal coordinate analysis of the peptidase subfamily-based dissimilarities showed distinguishable clusters for different groups of microorganisms. The distribution of secreted peptidases was found to be significantly correlated with phylogenetic relationships within kingdoms (archaea rMantel=0.364, p=0.001; bacteria rMantel=0.257, p=0.001, and fungi rMantel=0.281, p=0.005), implying an evolutionary relationship where subsets of phylogenetically related organisms share similar types of secreted peptidases. We also tested the phylogenetic signal strength of each peptidase subfamily for each microbial kingdom based on the binary traits of the distribution (presence or absence of secreted peptidase subfamilies in individual species). About one-third of the peptidase subfamilies displayed a strong evolutionary signal; the rest were phylogenetically over-dispersed, suggesting that these subfamilies are randomly distributed across the tree of life or the result of events such as horizontal gene transfer. Study of the diversity and phylogenetic distribution of secreted peptidases offered a mechanistic basis to anticipate the proteolytic potential function of microbial communities.
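The rMantel values reported above come from a Mantel test, which correlates two distance matrices and assesses significance by permuting the taxa of one matrix. A minimal sketch, assuming a Pearson correlation and a one-sided permutation test; the function names, permutation count, seed, and toy matrices are illustrative and not taken from the study:

```python
import math
import random


def _upper(m):
    """Flatten the strict upper triangle of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(d1, d2, permutations=999, seed=0):
    """Return (r, p) for two symmetric distance matrices of equal size.

    Rows and columns of d2 are permuted together; p is the share of
    permutations (plus the observed case) with r at least as large.
    """
    n = len(d1)
    rng = random.Random(seed)
    flat1 = _upper(d1)
    observed = _pearson(flat1, _upper(d2))
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[d2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if _pearson(flat1, _upper(permuted)) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Made-up 4-taxon matrices: phylogenetic distance vs. peptidase-profile dissimilarity.
phylo = [[0, 1, 4, 5], [1, 0, 4, 5], [4, 4, 0, 3], [5, 5, 3, 0]]
trait = [[0, 0.1, 0.8, 0.9], [0.1, 0, 0.7, 0.9], [0.8, 0.7, 0, 0.4], [0.9, 0.9, 0.4, 0]]
r, p = mantel(phylo, trait)
```

With only four taxa there are just 24 possible orderings, so the p-value from such a toy example cannot be small; real analyses use many more taxa.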
Instances of erroneous DNA barcoding of metazoan invertebrates: Are universal cox1 gene primers too "universal"?

PubMed

Mioduchowska, Monika; Czyż, Michał Jan; Gołdyn, Bartłomiej; Kur, Jarosław; Sell, Jerzy

2018-01-01

The cytochrome c oxidase subunit I (cox1) gene is the main mitochondrial molecular marker playing a pivotal role in phylogenetic research and is a crucial barcode sequence. Folmer's "universal" primers designed to amplify this gene in metazoan invertebrates allowed quick and easy barcode and phylogenetic analysis. On the other hand, the increase in the number of studies on barcoding leads to more frequent publishing of incorrect sequences, due to amplification of non-target taxa, and insufficient analysis of the obtained sequences. Consequently, some sequences deposited in genetic databases are incorrectly described as obtained from invertebrates, while being in fact bacterial sequences. In our study, in which we used Folmer's primers to amplify COI sequences of the crustacean fairy shrimp Branchipus schaefferi (Fischer 1834), we also obtained COI sequences of microbial contaminants from Aeromonas sp. However, when we searched the GenBank database for sequences closely matching these contaminations we found entries described as representatives of Gastrotricha and Mollusca. When these entries were compared with other sequences bearing the same names in the database, the genetic distance between the incorrect and correct sequences amplified from the same species was ca. 65%. Although the responsibility for the correct molecular identification of species rests on researchers, the errors found in already published sequence data have not been re-evaluated so far. On the basis of the standard sampling technique we have estimated with 95% probability that the chances of finding incorrectly described metazoan sequences in GenBank depend on the systematic group, and vary from less than 1% (Mollusca and Arthropoda) up to 6.9% (Gastrotricha). Consequently, the increasing popularity of DNA barcoding and metabarcoding analysis may lead to overestimation of species diversity. Finally, the study also discusses the sources of the problems with amplification of non-target sequences.
The origin and evolution of Basigin(BSG) gene: A comparative genomic and phylogenetic analysis.

PubMed

Zhu, Xinyan; Wang, Shenglan; Shao, Mingjie; Yan, Jie; Liu, Fei

2017-07-01

Basigin (BSG), also known as extracellular matrix metalloproteinase inducer (EMMPRIN) or cluster of differentiation 147 (CD147), plays various fundamental roles in the intercellular recognition involved in immunologic phenomena, differentiation, and development. In this study, we aimed to compare the similarities and differences of BSG among organisms and explore possible evolutionary relationships based on the comparison result. We used the extensive BLAST tool to search the metazoan genomes, N-glycosylation sites, the transmembrane region and other functional sites. We then identified BSG homologs from genomic sequences and analyzed their phylogenetic relationships. We identified that BSG genes exist not only in the vertebrate metazoans but also in the invertebrate metazoans such as Amphioxus B. floridae, D. melanogaster, A. mellifera, S. japonicum, C. gigas, and T. patagoniensis. After sequence analysis, we confirmed that only vertebrate metazoans and Cephalochordate (amphioxus B. floridae) have the classic structure (a signal peptide, two Ig-like domains (IgC2 and IgI), a transmembrane region, and an intracellular domain). The invertebrate metazoans (excluding amphioxus B. floridae) lack the N-terminal signal peptides and IgC2 domain. We then generated a phylogenetic tree, genome organization comparison, and chromosomal disposition analysis based on the biological information obtained from the NCBI and Ensembl databases. Finally, we established the possible evolutionary scenario of the BSG gene, which showed the restricted exon rearrangement that has occurred during evolution, forming the present-day BSG gene. Copyright © 2017 Elsevier Ltd. All rights reserved.
DHLAS: A web-based information system for statistical genetic analysis of HLA population data.

PubMed

Thriskos, P; Zintzaras, E; Germenis, A

2007-03-01

DHLAS (database HLA system) is a user-friendly, web-based information system for the analysis of human leukocyte antigens (HLA) data from population studies. DHLAS has been developed using JAVA and the R system; it runs on a Java Virtual Machine and its web-based user interface is powered by the servlet engine TOMCAT. It utilizes STRUTS, a Model-View-Controller framework, and uses several GNU packages to perform several of its tasks. The database engine it relies upon for fast access is MySQL, but others can be used as well. The system estimates metrics, performs statistical testing and produces graphs required for HLA population studies: (i) Hardy-Weinberg equilibrium (calculated using both asymptotic and exact tests), (ii) genetic distances (Euclidean or Nei), (iii) phylogenetic trees using the unweighted pair group method with averages and the neighbor-joining method, (iv) linkage disequilibrium (pairwise and overall, including variance estimations), (v) haplotype frequencies (estimated using the expectation-maximization algorithm) and (vi) discriminant analysis. The main merit of DHLAS is the incorporation of a database; thus, the data can be stored and manipulated along with integrated genetic data analysis procedures. In addition, it has an open architecture allowing the inclusion of other functions and procedures.
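As an illustration of item (ii), the following sketch computes Nei's standard genetic distance, D = -ln(Jxy / sqrt(Jx * Jy)), from per-locus allele frequencies. The abstract does not say which of Nei's distances DHLAS implements, so the choice of the standard (1972) form is an assumption, and the function name, allele labels, and frequencies are made up.

```python
import math


def nei_standard_distance(pop_x, pop_y):
    """Nei's standard genetic distance between two populations.

    pop_x, pop_y: lists with one dict per locus, mapping allele -> frequency.
    Both lists must describe the same loci in the same order.
    """
    if len(pop_x) != len(pop_y):
        raise ValueError("populations must cover the same loci")
    jx = jy = jxy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        alleles = set(fx) | set(fy)
        jx += sum(fx.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(fy.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(fx.get(a, 0.0) * fy.get(a, 0.0) for a in alleles)
    # Sums over loci stand in for means; the common factor cancels in the ratio.
    identity = jxy / math.sqrt(jx * jy)
    return -math.log(identity)


# Made-up single-locus allele frequencies for two hypothetical populations.
pop_1 = [{"A*01": 0.5, "A*02": 0.5}]
pop_2 = [{"A*01": 0.9, "A*02": 0.1}]
d_12 = nei_standard_distance(pop_1, pop_2)
d_11 = nei_standard_distance(pop_1, pop_1)
```

The distance is zero for populations with identical frequencies and grows as they diverge; it is undefined when the two populations share no alleles at any locus, a case this sketch does not handle.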
The Fossil Calibration Database: A New Resource for Divergence Dating.

PubMed

Ksepka, Daniel T; Parham, James F; Allman, James F; Benton, Michael J; Carrano, Matthew T; Cranston, Karen A; Donoghue, Philip C J; Head, Jason J; Hermsen, Elizabeth J; Irmis, Randall B; Joyce, Walter G; Kohli, Manpreet; Lamm, Kristin D; Leehr, Dan; Patané, José S L; Polly, P David; Phillips, Matthew J; Smith, N Adam; Smith, Nathan D; Van Tuinen, Marcel; Ware, Jessica L; Warnock, Rachel C M

2015-09-01

Fossils provide the principal basis for temporal calibrations, which are critical to the accuracy of divergence dating analyses. Translating fossil data into minimum and maximum bounds for calibrations is the most important, and often least appreciated, step of divergence dating. Properly justified calibrations require the synthesis of phylogenetic, paleontological, and geological evidence and can be difficult for nonspecialists to formulate. The dynamic nature of the fossil record (e.g., new discoveries, taxonomic revisions, updates of global or local stratigraphy) requires that calibration data be updated continually lest they become obsolete. Here, we announce the Fossil Calibration Database (http://fossilcalibrations.org), a new open-access resource providing vetted fossil calibrations to the scientific community. Calibrations accessioned into this database are based on individual fossil specimens and follow best practices for phylogenetic justification and geochronological constraint. The associated Fossil Calibration Series, a calibration-themed publication series at Palaeontologia Electronica, will serve as a key pipeline for peer-reviewed calibrations to enter the database. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

Phylogenetic analysis of bacterial and archaeal species in symptomatic and asymptomatic endodontic infections.

PubMed

Vickerman, M M; Brossard, K A; Funk, D B; Jesionowski, A M; Gill, S R

2007-01-01

Phylogenetic analysis of bacterial and archaeal 16S rRNA was used to examine polymicrobial communities within infected root canals of 20 symptomatic and 14 asymptomatic patients. Nucleotide sequences from approximately 750 clones amplified from each patient group with universal bacterial primers were matched to the Ribosomal Database Project II database. Phylotypes from 37 genera representing Actinobacteria, Bacteroidetes, Firmicutes, Fusobacteria and Proteobacteria were identified. Results were compared to those obtained with species-specific primers designed to detect Prevotella intermedia, Porphyromonas gingivalis, Porphyromonas endodontalis, Peptostreptococcus micros, Enterococcus sp., Streptococcus sp., Fusobacterium nucleatum, Tannerella forsythensis and Treponema denticola. Since members of the domain Archaea have been implicated in the severity of periodontal disease, and a recent report confirms that archaea are present in endodontic infections, 16S archaeal primers were also used to detect which patients carried these prokaryotes, to determine if their presence correlated with severity of the clinical symptoms. A Methanobrevibacter oralis-like species was detected in one asymptomatic and one symptomatic patient. DNA from root canals of these two patients was further analysed using species-specific primers to determine bacterial cohabitants. Trep. denticola was detected in the asymptomatic but not the symptomatic patient. Conversely, Porph. endodontalis was found in the symptomatic but not the asymptomatic patient. All other species except enterococci were detected with the species-specific primers in both patients. These results confirm the presence of archaea in root canals and provide additional insights into the polymicrobial communities in endodontic infections associated with clinical symptoms.
MANTIS: a phylogenetic framework for multi-species genome comparisons.

PubMed

Tzika, Athanasia C; Helaers, Raphaël; Van de Peer, Yves; Milinkovitch, Michel C

2008-01-15

Practitioners of comparative genomics face huge analytical challenges as whole genome sequences and functional/expression data accumulate. Furthermore, the field would greatly benefit from a better integration of this wealth of data with evolutionary concepts. Here, we present MANTIS, a relational database for the analysis of (i) gains and losses of genes on specific branches of the metazoan phylogeny, (ii) reconstructed genome content of ancestral species and (iii) over- or under-representation of functions/processes and tissue specificity of gained, duplicated and lost genes. MANTIS estimates the most likely positions of gene losses on the true phylogeny using a maximum-likelihood function. A user-friendly interface and an extensive query system allow users to investigate questions pertaining to gene identity, phylogenetic mapping and function/expression parameters. MANTIS is freely available at http://www.mantisdb.org and constitutes the missing link between multi-species genome comparisons and functional analyses.
Complete mitochondrial genome sequence of Indian medium carp, Labeo gonius (Hamilton, 1822) and its comparison with other related carp species.

PubMed

Behera, Bijay Kumar; Kumari, Kavita; Baisvar, Vishwamitra Singh; Rout, Ajaya Kumar; Pakrashi, Sudip; Paria, Prasenjet; Jena, J K

2017-01-01

In the present study, the complete mitochondrial genome sequence of Labeo gonius is reported using the PGM sequencer (Ion Torrent). The complete mitogenome of L. gonius, 16 614 bp in length, was obtained by de novo assembly of genomic reads using the Torrent Mapping Alignment Program (TMAP). The mitogenome of L. gonius comprises 13 protein-coding genes, 22 tRNAs, 2 rRNA genes, and a D-loop as control region, with gene order and organization similar to those of most other fish mitogenomes in NCBI databases. The mitogenome in the present study has 99% similarity to the complete mitogenome sequence of Labeo fimbriatus, as reported earlier. The phylogenetic analysis of Cypriniformes showed that their mitogenomes are closely related to each other. The complete mitogenome sequence of L. gonius will be helpful in understanding the population genetics, phylogenetics, and evolution of Indian carps.
The aquatic animals' transcriptome resource for comparative functional analysis.

PubMed

Chou, Chih-Hung; Huang, Hsi-Yuan; Huang, Wei-Chih; Hsu, Sheng-Da; Hsiao, Chung-Der; Liu, Chia-Yu; Chen, Yu-Hung; Liu, Yu-Chen; Huang, Wei-Yun; Lee, Meng-Lin; Chen, Yi-Chang; Huang, Hsien-Da

2018-05-09

Aquatic animals have great economic and ecological importance. Among them, non-model organisms have been studied regarding eco-toxicity, stress biology, and environmental adaptation. Due to recent advances in next-generation sequencing techniques, large amounts of RNA-seq data for aquatic animals are publicly available. However, there is currently no comprehensive resource for the analysis, unification, and integration of these datasets. This study utilizes computational approaches to build a new resource of transcriptomic maps for aquatic animals. This aquatic animal transcriptome map database, dbATM, provides de novo transcriptome assembly, gene annotation and comparative analysis for more than twenty aquatic organisms without a draft genome. To improve the assembly quality, three computational tools (Trinity, Oases and SOAPdenovo-Trans) were employed to enhance individual transcriptome assembly, and CAP3 and CD-HIT-EST software were then used to merge these three assembled transcriptomes. In addition, functional annotation analysis provides valuable clues to gene characteristics, including full-length transcript coding regions, conserved domains, gene ontology and KEGG pathways. Furthermore, all aquatic animal genes are essential for comparative genomics tasks such as constructing homologous gene groups and BLAST databases and phylogenetic analysis. In conclusion, we establish a resource for non-model aquatic animals, which are of great economic and ecological importance, and provide transcriptomic information including functional annotation and comparative transcriptome analysis. The database is now publicly accessible through the URL http://dbATM.mbc.nctu.edu.tw/.
Terminal Restriction Fragment Length Polymorphism Analysis Program, a Web-Based Research Tool for Microbial Community Analysis

PubMed Central

Marsh, Terence L.; Saxman, Paul; Cole, James; Tiedje, James

2000-01-01

Rapid analysis of microbial communities has proven to be a difficult task. This is due, in part, to both the tremendous diversity of the microbial world and the high complexity of many microbial communities. Several techniques for community analysis have emerged over the past decade, and most take advantage of the molecular phylogeny derived from 16S rRNA comparative sequence analysis. We describe a web-based research tool located at the Ribosomal Database Project web site (http://www.cme.msu.edu/RDP/html/analyses.html) that facilitates microbial community analysis using terminal restriction fragment length polymorphism of 16S ribosomal DNA. The analysis function (designated TAP T-RFLP) permits the user to perform in silico restriction digestions of the entire 16S sequence database and derive terminal restriction fragment sizes, measured in base pairs, from the 5′ terminus of the user-specified primer to the 3′ terminus of the restriction endonuclease target site. The output can be sorted and viewed either phylogenetically or by size. It is anticipated that the site will guide experimental design as well as provide insight into interpreting results of community analysis with terminal restriction fragment length polymorphisms. PMID:10919828
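The in silico digestion described above can be sketched as follows. This is a minimal illustration, not the TAP T-RFLP implementation: the function name is hypothetical, the sequences are toy strings, and it only handles exact matches on one strand. Following the abstract, the fragment is measured from the 5′ terminus of the primer to the 3′ terminus of the first recognition site downstream of it.

```python
def terminal_fragment_length(sequence, primer, site):
    """Length in bp from the 5' end of the primer to the 3' end of the first
    recognition site downstream of the primer; None if either is not found."""
    seq = sequence.upper()
    start = seq.find(primer.upper())
    if start < 0:
        return None  # primer does not anneal (exact match only in this sketch)
    hit = seq.find(site.upper(), start + len(primer))
    if hit < 0:
        return None  # no recognition site downstream of the primer
    return hit + len(site) - start


# Toy example: 6-nt primer, 4 spacer bases, then a GGCC recognition site.
toy = "TT" + "ACGTAC" + "AAAA" + "GGCC" + "TT"
print(terminal_fragment_length(toy, "ACGTAC", "GGCC"))  # 6 + 4 + 4 = 14
```

Running the same function over every sequence in a 16S collection and sorting by the returned size would give the size-ordered view mentioned in the abstract. A real tool would also need to handle primer mismatches and ambiguous bases.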
Influenza Research Database: an integrated bioinformatics resource for influenza research and surveillance

PubMed Central

Squires, R. Burke; Noronha, Jyothi; Hunt, Victoria; García‐Sastre, Adolfo; Macken, Catherine; Baumgarth, Nicole; Suarez, David; Pickett, Brett E.; Zhang, Yun; Larsen, Christopher N.; Ramsey, Alvin; Zhou, Liwei; Zaremba, Sam; Kumar, Sanjeev; Deitrich, Jon; Klem, Edward; Scheuermann, Richard H.

2012-01-01

Please cite this paper as: Squires et al. (2012) Influenza research database: an integrated bioinformatics resource for influenza research and surveillance. Influenza and Other Respiratory Viruses 6(6), 404–416. Background The recent emergence of the 2009 pandemic influenza A/H1N1 virus has highlighted the value of free and open access to influenza virus genome sequence data integrated with information about other important virus characteristics. Design The Influenza Research Database (IRD, http://www.fludb.org) is a free, open, publicly‐accessible resource funded by the U.S. National Institute of Allergy and Infectious Diseases through the Bioinformatics Resource Centers program. IRD provides a comprehensive, integrated database and analysis resource for influenza sequence, surveillance, and research data, including user‐friendly interfaces for data retrieval, visualization and comparative genomics analysis, together with personal log in‐protected ‘workbench’ spaces for saving data sets and analysis results. IRD integrates genomic, proteomic, immune epitope, and surveillance data from a variety of sources, including public databases, computational algorithms, external research groups, and the scientific literature. Results To demonstrate the utility of the data and analysis tools available in IRD, two scientific use cases are presented. A comparison of hemagglutinin sequence conservation and epitope coverage information revealed highly conserved protein regions that can be recognized by the human adaptive immune system as possible targets for inducing cross‐protective immunity. Phylogenetic and geospatial analysis of sequences from wild bird surveillance samples revealed a possible evolutionary connection between influenza virus from Delaware Bay shorebirds and Alberta ducks. 
Conclusions The IRD provides a wealth of integrated data and information about influenza virus to support research of the genetic determinants dictating virus pathogenicity, host range restriction and transmission, and to facilitate development of vaccines, diagnostics, and therapeutics. PMID:22260278
A network perspective on the topological importance of enzymes and their phylogenetic conservation

PubMed Central

Liu, Wei-chung; Lin, Wen-hsien; Davis, Andrew J; Jordán, Ferenc; Yang, Hsih-te; Hwang, Ming-jing

2007-01-01

Background A metabolic network is the sum of all chemical transformations or reactions in the cell, with the metabolites being interconnected by enzyme-catalyzed reactions. Many enzymes exist in numerous species while others occur only in a few. We ask if there are relationships between the phylogenetic profile of an enzyme, or the number of different bacterial species that contain it, and its topological importance in the metabolic network. Our null hypothesis is that phylogenetic profile is independent of topological importance. To test our null hypothesis we constructed an enzyme network from the KEGG (Kyoto Encyclopedia of Genes and Genomes) database. We calculated three network indices of topological importance: the degree or the number of connections of a network node; closeness centrality, which measures how close a node is to others; and betweenness centrality measuring how frequently a node appears on all shortest paths between two other nodes. Results Enzyme phylogenetic profile correlates best with betweenness centrality and also quite closely with degree, but poorly with closeness centrality. Both betweenness and closeness centralities are non-local measures of topological importance and it is intriguing that they have contrasting power of predicting phylogenetic profile in bacterial species. We speculate that redundancy in an enzyme network may be reflected by betweenness centrality but not by closeness centrality. We also discuss factors influencing the correlation between phylogenetic profile and topological importance. Conclusion Our analysis falsifies the hypothesis that phylogenetic profile of enzymes is independent of enzyme network importance. Our results show that phylogenetic profile correlates better with degree and betweenness centrality, but less so with closeness centrality. Enzymes that occur in many bacterial species tend to be those that have high network importance. 
We speculate that this phenomenon originates in mechanisms driving network evolution. Closeness centrality reflects phylogenetic profile poorly. This is because metabolic networks often consist of distinct functional modules and some are not in the centre of the network. Enzymes in these peripheral parts of a network might be important for cell survival and should therefore occur in many bacterial species. They are, however, distant from other enzymes in the same network. PMID:17425808
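The three indices named in this abstract can be computed on a small undirected, unweighted graph as sketched below. This is an illustrative sketch only: the toy graph is hypothetical, the values are unnormalized, and the authors may have used different normalizations or software. Betweenness is computed with Brandes' algorithm.

```python
from collections import deque


def degree(graph):
    """Number of neighbours of each node."""
    return {v: len(nbrs) for v, nbrs in graph.items()}


def closeness(graph):
    """(reachable nodes) / (sum of shortest-path distances) for each node."""
    result = {}
    for s in graph:
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def betweenness(graph):
    """Brandes' algorithm: summed fraction of shortest paths through each node."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints in an undirected graph.
    return {v: b / 2 for v, b in score.items()}


# Hypothetical four-enzyme chain a - b - c - d.
toy = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(degree(toy), closeness(toy), betweenness(toy))
```

In the toy chain the two inner nodes lie on every path between the ends, so they score highest on betweenness, while the end nodes score zero.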
VitisExpDB: a database resource for grape functional genomics.

PubMed

Doddapaneni, Harshavardhan; Lin, Hong; Walker, M Andrew; Yao, Jiqiang; Civerolo, Edwin L

2008-02-28

The family Vitaceae consists of many different grape species that grow in a range of climatic conditions. In the past few years, several studies have generated functional genomic information on different Vitis species and cultivars, including the European grape vine, Vitis vinifera. Our goal is to develop a comprehensive web data source for Vitaceae. VitisExpDB is an online MySQL-PHP driven relational database that houses annotated EST and gene expression data for V. vinifera and non-vinifera grape species and varieties. Currently, the database stores approximately 320,000 EST sequences derived from 8 species/hybrids, their annotation (BLAST top match) details and Gene Ontology based structured vocabulary. Putative homologs for each EST in other species and varieties, along with information on their percent nucleotide identities, phylogenetic relationship and common primers, can be retrieved. The database also includes information on probe sequence and annotation features of the high density 60-mer gene expression chip consisting of a non-redundant set of approximately 20,000 ESTs. Finally, the database includes 14 processed global microarray expression profile sets. Data from 12 of these expression profile sets have been mapped onto metabolic pathways. A user-friendly web interface with multiple search indices and extensively hyperlinked result features that permit efficient data retrieval has been developed. Several online bioinformatics tools that interact with the database, along with other sequence analysis tools, have been added. In addition, users can submit their ESTs to the database. The developed database provides a genomic resource to the grape community for functional analysis of genes in the collection and for grape genome annotation and gene function identification. The VitisExpDB database is available through our website http://cropdisease.ars.usda.gov/vitis_at/main-page.htm.
VitisExpDB: A database resource for grape functional genomics

PubMed Central

Doddapaneni, Harshavardhan; Lin, Hong; Walker, M Andrew; Yao, Jiqiang; Civerolo, Edwin L

2008-01-01

Background The family Vitaceae consists of many different grape species that grow in a range of climatic conditions. In the past few years, several studies have generated functional genomic information on different Vitis species and cultivars, including the European grape vine, Vitis vinifera. Our goal is to develop a comprehensive web data source for Vitaceae. Description VitisExpDB is an online MySQL-PHP driven relational database that houses annotated EST and gene expression data for V. vinifera and non-vinifera grape species and varieties. Currently, the database stores ~320,000 EST sequences derived from 8 species/hybrids, their annotation (BLAST top match) details and Gene Ontology based structured vocabulary. Putative homologs for each EST in other species and varieties, along with information on their percent nucleotide identities, phylogenetic relationship and common primers, can be retrieved. The database also includes information on probe sequence and annotation features of the high density 60-mer gene expression chip consisting of a non-redundant set of ~20,000 ESTs. Finally, the database includes 14 processed global microarray expression profile sets. Data from 12 of these expression profile sets have been mapped onto metabolic pathways. A user-friendly web interface with multiple search indices and extensively hyperlinked result features that permit efficient data retrieval has been developed. Several online bioinformatics tools that interact with the database, along with other sequence analysis tools, have been added. In addition, users can submit their ESTs to the database. Conclusion The developed database provides a genomic resource to the grape community for functional analysis of genes in the collection and for grape genome annotation and gene function identification. The VitisExpDB database is available through our website. PMID:18307813
Members of the Dof transcription factor family in Triticum aestivum are associated with light-mediated gene regulation.

PubMed

Shaw, Lindsay M; McIntyre, C Lynne; Gresshoff, Peter M; Xue, Gang-Ping

2009-11-01

DNA binding with One Finger (Dof) protein is a plant-specific transcription factor implicated in the regulation of many important plant-specific processes, including photosynthesis and carbohydrate metabolism. This study has identified 31 Dof genes (TaDof) in bread wheat through extensive analysis of current nucleotide databases. Phylogenetic analysis suggests that the TaDof family can be divided into four clades. Expression analysis of the TaDof family across all major organs using quantitative RT-PCR and searches of the wheat genome array database revealed that the majority of TaDof members were predominately expressed in vegetative organs. A large number of TaDof members were down-regulated by drought and/or were responsive to the light and dark cycle. Further expression analysis revealed that light up-regulated TaDof members were highly correlated in expression with a number of genes that are involved in photosynthesis or sucrose transport. These data suggest that the TaDof family may have an important role in light-mediated gene regulation, including involvement in the photosynthetic process.
Use of phylogenetic and phenotypic analyses to identify nonhemolytic streptococci isolated from bacteremic patients.

PubMed

Hoshino, Tomonori; Fujiwara, Taku; Kilian, Mogens

2005-12-01

The aim of this study was to evaluate molecular and phenotypic methods for the identification of nonhemolytic streptococci. A collection of 148 strains consisting of 115 clinical isolates from cases of infective endocarditis, septicemia, and meningitis and 33 reference strains, including type strains of all relevant Streptococcus species, were examined. Identification was performed by phylogenetic analysis of nucleotide sequences of four housekeeping genes, ddl, gdh, rpoB, and sodA; by PCR analysis of the glucosyltransferase (gtf) gene; and by conventional phenotypic characterization and identification using two commercial kits, Rapid ID 32 STREP and STREPTOGRAM and the associated databases. A phylogenetic tree based on concatenated sequences of the four housekeeping genes allowed unequivocal differentiation of recognized species and was used as the reference. Analysis of single gene sequences revealed deviation clustering in eight strains (5.4%) due to homologous recombination with other species. This was particularly evident in S. sanguinis and in members of the anginosus group of streptococci. The rate of correct identification of the strains by both commercial identification kits was below 50% but varied significantly between species. The most significant problems were observed with S. mitis and S. oralis and 11 Streptococcus species described since 1991. Our data indicate that identification based on multilocus sequence analysis is optimal. As a more practical alternative we recommend identification based on sodA sequences with reference to a comprehensive set of sequences that is available for downloading from our server. An analysis of the species distribution of 107 nonhemolytic streptococci from bacteremic patients showed a predominance of S. oralis and S. anginosus with various underlying infections.
Markov model plus k-word distributions: a synergy that produces novel statistical measures for sequence comparison.

PubMed

Dai, Qi; Yang, Yanchun; Wang, Tianming

2008-10-15

Many proposed statistical measures can efficiently compare biological sequences to further infer their structures, functions and evolutionary information. They are related in spirit because all of them use information on the k-word distributions, a Markov model, or both. Motivated by adding k-word distributions to the Markov model directly, we investigated two novel statistical measures for sequence comparison, called wre.k.r and S2.k.r. The proposed measures were tested by similarity search, evaluation on functionally related regulatory sequences and phylogenetic analysis, which offers a systematic and quantitative experimental assessment of our measures. Moreover, we compared our results with those of alignment-based and alignment-free measures. We grouped our experiments into two sets. The first one, performed via ROC (receiver operating characteristic) analysis, aims at assessing the intrinsic ability of our statistical measures to search for similar sequences from a database and discriminate functionally related regulatory sequences from unrelated sequences. The second one aims at assessing how well our statistical measures perform in phylogenetic analysis. The experimental assessment demonstrates that our similarity measures, which incorporate k-word distributions into the Markov model, are more efficient.
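The abstract does not define wre.k.r or S2.k.r, so the sketch below does not implement them. It only illustrates the common building block, a k-word (k-mer) frequency distribution, paired with a plain squared Euclidean distance as a stand-in alignment-free measure. Function names and example sequences are assumptions.

```python
from collections import Counter
from itertools import product


def kword_frequencies(seq, k, alphabet="ACGT"):
    """Relative frequency of every k-word over the alphabet (overlapping windows)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    # Only words over the given alphabet are counted; others (e.g. with N) are ignored.
    total = sum(counts[w] for w in words)
    return {w: (counts[w] / total if total else 0.0) for w in words}


def squared_euclidean(freq_a, freq_b):
    """Stand-in dissimilarity between two k-word frequency vectors."""
    return sum((freq_a[w] - freq_b[w]) ** 2 for w in freq_a)


a = kword_frequencies("ACGTACGTAC", 2)
b = kword_frequencies("ACGTTTTTAC", 2)
print(squared_euclidean(a, b))
```

A Markov-model-based measure would additionally compare each observed k-word count with the count expected under a background model estimated from shorter words, which is the direction the abstract describes.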
Genetic polymorphisms, forensic efficiency and phylogenetic analysis of 15 autosomal STR loci in the Kazak population of Ili Kazak Autonomous Prefecture, northwestern China.

PubMed

Feng, Chunmei; Wang, Xin; Wang, Xiaolong; Yu, Hao; Zhang, Guohua

2018-03-01

We investigated the frequencies of 15 autosomal STR loci in the Kazak population of the Ili Kazak Autonomous Prefecture with the aim of expanding the available population information in human genetic databases and for forensic DNA analysis. Genetic polymorphisms of 15 autosomal short tandem repeat (STR) loci were analysed in 456 individuals of the Kazak population from Ili Kazak Autonomous Prefecture, northwestern China. A total of 173 alleles at 15 autosomal STR loci were found; the allele frequencies ranged from 0.0011 to 0.5022. The combined power of discrimination and the combined power of exclusion for the 15 STR loci were 0.999 999 999 85 and 0.999 998 800 65, respectively. In addition, phylogenetic analysis involving the Ili Uygur population and other relevant populations was carried out. A neighbour-joining tree and multidimensional scaling plot were generated based on Nei's standard genetic distance. Results of the population comparison indicated that the Ili Uygur population was most closely related genetically to the Uygur populations from other regions in China. These findings are consistent with the historical and geographic backgrounds of these populations.
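Two of the quantities in this abstract follow standard formulas that can be sketched briefly. Assuming independent loci, the combined power is 1 − Π(1 − pᵢ) over the per-locus powers, and Nei's (1972) standard genetic distance is D = −ln(J_xy / √(J_x J_y)). The per-locus values and allele frequencies below are hypothetical placeholders, not data from the study.

```python
import math


def combined_power(per_locus):
    """Combine per-locus probabilities as 1 - prod(1 - p_i), assuming independent loci."""
    remaining = 1.0
    for p in per_locus:
        remaining *= (1.0 - p)
    return 1.0 - remaining


def nei_standard_distance(pop_x, pop_y):
    """Nei's (1972) standard genetic distance.

    pop_x, pop_y: lists with one {allele: frequency} dict per locus, same locus order.
    Undefined (math.log raises ValueError) if the populations share no alleles.
    """
    jx = jy = jxy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        jx += sum(f * f for f in fx.values())
        jy += sum(f * f for f in fy.values())
        jxy += sum(f * fy.get(allele, 0.0) for allele, f in fx.items())
    # Sums can replace per-locus means because the locus count cancels in the ratio.
    return -math.log(jxy / math.sqrt(jx * jy))


# Hypothetical two-locus example.
x = [{"10": 0.6, "11": 0.4}, {"7": 0.5, "8": 0.5}]
y = [{"10": 0.3, "11": 0.7}, {"7": 0.5, "8": 0.5}]
print(combined_power([0.95, 0.90]), nei_standard_distance(x, y))
```

A matrix of such pairwise distances is the usual input to neighbour-joining and multidimensional scaling, the two displays the abstract mentions.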
Facilitation can increase the phylogenetic diversity of plant communities.

PubMed

Valiente-Banuet, Alfonso; Verdú, Miguel

2007-11-01

With the advent of molecular phylogenies the assessment of community assembly processes has become a central topic in community ecology. These processes have focused almost exclusively on habitat filtering and competitive exclusion. Recent evidence, however, indicates that facilitation has been important in preserving biodiversity over evolutionary time, with recent lineages conserving the regeneration niches of older, distant lineages. Here we test whether, if facilitation among distantly related species has preserved the regeneration niche of plant lineages, this has increased the phylogenetic diversity of communities. By analyzing a large worldwide database of species, we showed that the regeneration niches were strongly conserved across evolutionary history. Likewise, a phylogenetic supertree of all species of three communities driven by facilitation showed that nurse species facilitated distantly related species and increased phylogenetic diversity.
The Forest behind the Tree: Phylogenetic Exploration of a Dominant Mycobacterium tuberculosis Strain Lineage from a High Tuberculosis Burden Country

PubMed Central

Cardoso Oelemann, Maranibia; Gomes, Harrison M.; Willery, Eve; Possuelo, Lia; Batista Lima, Karla Valéria; Allix-Béguec, Caroline; Locht, Camille; Goguet de la Salmonière, Yves-Olivier L.; Gutierrez, Maria Cristina; Suffys, Philip; Supply, Philip

2011-01-01

Background Genotyping of Mycobacterium tuberculosis isolates is a powerful tool for epidemiological control of tuberculosis (TB) and phylogenetic exploration of the pathogen. Standardized PCR-based typing, based on 15 to 24 mycobacterial interspersed repetitive unit-variable number of tandem repeat (MIRU-VNTR) loci combined with spoligotyping, has been shown to have adequate resolution power for tracing TB transmission and to be useful for predicting diverse strain lineages in European settings. Its informative value needs to be tested in high TB-burden countries, where the use of genotyping is often complicated by dominance of geographically specific, genetically homogeneous strain lineages. Methodology/Principal Findings We tested this genotyping system for molecular epidemiological analysis of 369 M. tuberculosis isolates from 3 regions of Brazil, a high TB-burden country. Deligotyping, targeting 43 large sequence polymorphisms (LSPs), and the MIRU-VNTRplus identification database were used to assess phylogenetic predictions. High congruence between the different typing results consistently revealed the countrywide supremacy of the Latin-American-Mediterranean (LAM) lineage, comprised of three main branches. In addition to an already known RDRio branch, at least one other branch characterized by a phylogenetically informative LAM3 spoligo-signature seems to be globally distributed beyond Brazil. Nevertheless, by distinguishing 321 genotypes in this strain population, combined MIRU-VNTR typing and spoligotyping demonstrated the presence of multiple distinct clones. The use of 15 to 24 loci discriminated 21 to 25% more strains within the LAM lineage, compared to a restricted lineage-specific locus set suggested to be used after SNP analysis. Notably, 23 of the 28 molecular clusters identified were exclusively composed of patient isolates from the same region, consistent with expected patterns of mostly local TB transmission.
Conclusions/Significance Standard MIRU-VNTR typing combined with spoligotyping can reveal epidemiologically meaningful clonal diversity behind a dominant M. tuberculosis strain lineage in a high TB-burden country and is useful to explore international phylogenetical ramifications. PMID:21464915
Integrating metagenomic and amplicon databases to resolve the phylogenetic and ecological diversity of the Chlamydiae

PubMed Central

Lagkouvardos, Ilias; Weinmaier, Thomas; Lauro, Federico M; Cavicchioli, Ricardo; Rattei, Thomas; Horn, Matthias

2014-01-01

In the era of metagenomics and amplicon sequencing, comprehensive analyses of available sequence data remain a challenge. Here we describe an approach exploiting metagenomic and amplicon data sets from public databases to elucidate phylogenetic diversity of defined microbial taxa. We investigated the phylum Chlamydiae whose known members are obligate intracellular bacteria that represent important pathogens of humans and animals, as well as symbionts of protists. Despite their medical relevance, our knowledge about chlamydial diversity is still scarce. Most of the nine known families are represented by only a few isolates, while previous clone library-based surveys suggested the existence of yet uncharacterized members of this phylum. Here we identified more than 22 000 high quality, non-redundant chlamydial 16S rRNA gene sequences in diverse databases, as well as 1900 putative chlamydial protein-encoding genes. Even when applying the most conservative approach, clustering of chlamydial 16S rRNA gene sequences into operational taxonomic units revealed an unexpectedly high species, genus and family-level diversity within the Chlamydiae, including 181 putative families. These in silico findings were verified experimentally in one Antarctic sample, which contained a high diversity of novel Chlamydiae. In our analysis, the Rhabdochlamydiaceae, whose known members infect arthropods, represents the most diverse and species-rich chlamydial family, followed by the protist-associated Parachlamydiaceae, and a putative new family (PCF8) with unknown host specificity. Available information on the origin of metagenomic samples indicated that marine environments contain the majority of the newly discovered chlamydial lineages, highlighting this environment as an important chlamydial reservoir. PMID:23949660
Short branches lead to systematic artifacts when BLAST searches are used as surrogate for phylogenetic reconstruction.

PubMed

Dick, Amanda A; Harlow, Timothy J; Gogarten, J Peter

2017-02-01

Long Branch Attraction (LBA) is a well-known artifact in phylogenetic reconstruction when dealing with branch length heterogeneity. Here we show another phenomenon, Short Branch Attraction (SBA), which occurs when BLAST searches, a phenetic analysis, are used as a surrogate method for phylogenetic analysis. This error also results from branch length heterogeneity, but this time it is the short branches that are attracting. The SBA artifact is reciprocal and can be returned 100% of the time when multiple branches differ in length by a factor of more than two. SBA is an intended feature of BLAST searches, but becomes an issue, when top scoring BLAST hit analyses are used to infer Horizontal Gene Transfers (HGTs), assign taxonomic category with environmental sequence data in phylotyping, or gather homologous sequences for building gene families. SBA can lead researchers to believe that there has been a HGT event when only vertical descent has occurred, cause slowly evolving taxa to be over-represented and quickly evolving taxa to be under-represented in phylotyping, or systematically exclude quickly evolving taxa from analyses. SBA also contributes to the changing results of top scoring BLAST hit analyses as the database grows, because more slowly evolving taxa, or short branches, are added over time, introducing more potential for SBA. SBA can be detected by examining reciprocal best BLAST hits among a larger group of taxa, including the known closest phylogenetic neighbors. Therefore, one should look for this phenomenon when conducting best BLAST hit analyses as a surrogate method to identify HGTs, in phylotyping, or when using BLAST to gather homologous sequences. Copyright © 2016 Elsevier Inc. All rights reserved.
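The reciprocal best hit check mentioned above can be sketched as simple bookkeeping over two score tables. The nested dictionaries stand in for parsed BLAST results and are hypothetical; parsing real BLAST output is not shown. Because the abstract notes that SBA itself can be reciprocal, such a check is only informative when the compared set includes the known closest phylogenetic neighbours.

```python
def best_hits(scores):
    """scores: {query: {subject: bit_score}} -> {query: top-scoring subject}."""
    return {q: max(subj, key=subj.get) for q, subj in scores.items() if subj}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical bit scores between genes of genome A and genome B.
a_vs_b = {"a1": {"b1": 200, "b2": 150}, "a2": {"b1": 180, "b2": 90}}
b_vs_a = {"b1": {"a1": 200, "a2": 180}, "b2": {"a1": 150, "a2": 90}}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # a2's top hit b1 is not reciprocated
```

One-way top hits such as a2 to b1 in this toy example are the cases that warrant a proper phylogenetic reconstruction before any conclusion about gene transfer or taxonomic assignment is drawn.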
Updating the evolutionary history of Carnivora (Mammalia): a new species-level supertree complete with divergence time estimates

PubMed Central

2012-01-01

Background Although it has proven to be an important foundation for investigations of carnivoran ecology, biology and evolution, the complete species-level supertree for Carnivora of Bininda-Emonds et al. is showing its age. Additional, largely molecular sequence data are now available for many species and the advancement of computer technology means that many of the limitations of the original analysis can now be avoided. We therefore sought to provide an updated estimate of the phylogenetic relationships within all extant Carnivora, again using supertree analysis to be able to analyze as much of the global phylogenetic database for the group as possible. Results In total, 188 source trees were combined, representing 114 trees from the literature together with 74 newly constructed gene trees derived from nearly 45,000 bp of sequence data from GenBank. The greater availability of sequence data means that the new supertree is almost completely resolved and also better reflects current phylogenetic opinion (for example, supporting a monophyletic Mephitidae, Eupleridae and Prionodontidae; placing Nandinia binotata as sister to the remaining Feliformia). Following an initial rapid radiation, diversification rate analyses indicate a downturn in the net speciation rate within the past three million years as well as a possible increase some 18.0 million years ago; numerous diversification rate shifts within the order were also identified. Conclusions Together, the two carnivore supertrees remain the only complete phylogenetic estimates for all extant species and the new supertree, like the old one, will form a key tool in helping us to further understand the biology of this charismatic group of carnivores. PMID:22369503
[A phylogenetic analysis of plant communities of Teberda Biosphere Reserve].

PubMed

Shulakov, A A; Egorov, A V; Onipchenko, V G

2016-01-01

Phylogenetic analysis of communities is based on comparing the distances on the phylogenetic tree between the species of a community under study with those distances in random samples drawn from the local flora. It makes it possible to determine to what extent a community is composed of more closely related species (i.e., is "clustered") or, on the contrary, is more even and includes species that are less related to each other. The first case is usually interpreted as a result of the strong influence of abiotic factors, under which species with similar ecology, a priori more closely related, would remain. In the second case, biotic factors, such as competition, may come to the fore and lead to the formation of a community out of distant clades due to divergence of their ecological niches. The aim of this study was to explore the phylogenetic structure of communities of the northwestern Caucasus at two spatial scales: the scale of an area from 4 to 100 m² and the smaller scale within a community. The list of the local flora of the alpine belt was compiled using the database of geobotanic descriptions carried out in Teberda Biosphere Reserve at true altitudes exceeding 1800 m. It includes 585 species of flowering plants belonging to 57 families. Basal groups of flowering plants are not represented in the list. At the community scale, three classes, namely Thlaspietea rotundifolii (communities formed on screes and pebbles), Calluno-Ulicetea (alpine meadows), and Mulgedio-Aconitetea (subalpine meadows), did not demonstrate a significant distinction in phylogenetic structure. At the intra-community level, a larger share of closely related species (a clustered community) was detected for alpine meadows. Communities developing on rocks (class Asplenietea trichomanis) and alpine communities of class Juncetea trifidi were significantly clustered. At the same time, alpine lichen communities proved to have an even phylogenetic structure at the small scale. 
Alpine communities (class Salicetea herbaceae) that develop under conditions of winter snow accumulation were more even at both scales, i.e., they contained more diverse and distantly related plant species compared with random samples. Communities of class Scheuchzerio-Caricetea fuscae, aquatic communities of cold habitats (Montio-Cardaminetea), sedge meadows (Carici rupestris-Kobresietea bellardii), and shrub-dominated communities (juniper and rhododendron elfin woods, class Loiseleurio-Vaccinietea) were studied only at the larger scale and showed significant evenness of species composition, i.e., were phylogenetically more diverse compared with random samples.
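The comparison with random samples from the local flora can be sketched as a standardized effect size of the mean pairwise distance (MPD). This is an assumption made for illustration: the abstract does not name the exact index used. The species names, the two clades, and all distances below are invented, and `mpd` and `ses_mpd` are hypothetical helper names.

```python
import random
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical local flora: eight species in two clades.
CLADE = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 2, "f": 2, "g": 2, "h": 2}
POOL = sorted(CLADE)
# Invented tree distances: 2.0 within a clade, 10.0 between clades.
DIST = {
    frozenset((x, y)): (2.0 if CLADE[x] == CLADE[y] else 10.0)
    for x, y in combinations(POOL, 2)
}


def mpd(species, dist):
    """Mean phylogenetic distance over all species pairs in a community."""
    return mean(dist[frozenset(p)] for p in combinations(species, 2))


def ses_mpd(community, pool, dist, n_random=999, seed=0):
    """Observed MPD relative to random draws of equal size from the pool.

    Negative values suggest clustering (closer relatives than expected),
    positive values suggest evenness.
    """
    rng = random.Random(seed)
    observed = mpd(community, dist)
    null = [mpd(rng.sample(pool, len(community)), dist) for _ in range(n_random)]
    return (observed - mean(null)) / pstdev(null)


if __name__ == "__main__":
    print("one-clade community:", round(ses_mpd(["a", "b", "c"], POOL, DIST), 2))
    print("two-clade community:", round(ses_mpd(["a", "b", "e"], POOL, DIST), 2))
```

A community drawn from a single clade gets a negative score here, and one spanning both clades gets a positive score, which corresponds to the "clustered" and "even" cases in the abstract.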
SALAD database: a motif-based database of protein annotations for plant comparative genomics

PubMed Central

Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

2010-01-01

Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209 529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named ‘SALAD on ARRAYs’ to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis. PMID:19854933

The Papillomavirus Episteme: a major update to the papillomavirus sequence database.

PubMed

Van Doorslaer, Koenraad; Li, Zhiwen; Xirasagar, Sandhya; Maes, Piet; Kaminsky, David; Liou, David; Sun, Qiang; Kaur, Ramandeep; Huyen, Yentram; McBride, Alison A

2017-01-04

The Papillomavirus Episteme (PaVE) is a database of curated papillomavirus genomic sequences, accompanied by web-based sequence analysis tools. This update describes the addition of major new features. The papillomavirus genomes within PaVE have been further annotated, and now include the major spliced mRNA transcripts. Viral genes and transcripts can be visualized on both linear and circular genome browsers. Evolutionary relationships among PaVE reference protein sequences can be analysed using multiple sequence alignments and phylogenetic trees. To assist in viral discovery, PaVE offers a typing tool, a simplified algorithm to determine whether a newly sequenced virus is novel. PaVE also now contains an image library containing gross clinical and histopathological images of papillomavirus-infected lesions. Database URL: https://pave.niaid.nih.gov/. Published by Oxford University Press on behalf of Nucleic Acids Research 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Network portal: a database for storage, analysis and visualization of biological networks

PubMed Central

Turkarslan, Serdar; Wurtmann, Elisabeth J.; Wu, Wei-Ju; Jiang, Ning; Bare, J. Christopher; Foley, Karen; Reiss, David J.; Novichkov, Pavel; Baliga, Nitin S.

2014-01-01

The ease of generating high-throughput data has enabled investigations into organismal complexity at the systems level through the inference of networks of interactions among the various cellular components (genes, RNAs, proteins and metabolites). The wider scientific community, however, currently has limited access to tools for network inference, visualization and analysis because these tasks often require advanced computational knowledge and expensive computing resources. We have designed the network portal (http://networks.systemsbiology.net) to serve as a modular database for the integration of user uploaded and public data, with inference algorithms and tools for the storage, visualization and analysis of biological networks. The portal is fully integrated into the Gaggle framework to seamlessly exchange data with desktop and web applications and to allow the user to create, save and modify workspaces, and it includes social networking capabilities for collaborative projects. While the current release of the database contains networks for 13 prokaryotic organisms from diverse phylogenetic clades (4678 co-regulated gene modules, 3466 regulators and 9291 cis-regulatory motifs), it will be rapidly populated with prokaryotic and eukaryotic organisms as relevant data become available in public repositories and through user input. The modular architecture, simple data formats and open API support community development of the portal. PMID:24271392
Distribution of Bathyarchaeota Communities Across Different Terrestrial Settings and Their Potential Ecological Functions

NASA Astrophysics Data System (ADS)

Xiang, Xing; Wang, Ruicheng; Wang, Hongmei; Gong, Linfeng; Man, Baiying; Xu, Ying

2017-03-01

High abundance and widespread distribution of the archaeal phylum Bathyarchaeota in marine environments have been recognized recently, but knowledge about Bathyarchaeota in terrestrial settings and their correlation with environmental parameters is fairly limited. Here we report the abundance of Bathyarchaeota members across different ecosystems and their correlation with environmental factors by constructing 16S rRNA clone libraries of peat from the Dajiuhu Peatland, coupled with bioinformatics analysis of the 16S rRNA data available to date in the NCBI database. In total, 1456 Bathyarchaeota sequences from 28 sites were subjected to UniFrac analysis based on phylogenetic distance and multivariate regression tree analysis of taxonomy. Both phylogenetic and taxon-based approaches showed that salinity, total organic carbon and temperature significantly influenced the distribution of Bathyarchaeota across different terrestrial habitats. By applying the ecological concept of ‘indicator species’, we identified 9 indicator groups among the 6 habitats, with the most in the estuary sediments. Network analysis showed that members of Bathyarchaeota formed the “backbone” of the archaeal community and often co-occurred with Methanomicrobia. These results suggest that Bathyarchaeota may play an important ecological role within archaeal communities via a potential symbiotic association with Methanomicrobia. Our results shed light on the biogeography and potential functions of Bathyarchaeota and on the environmental conditions that influence Bathyarchaeota distribution in terrestrial settings.
Molecular identification and phylogenetic analysis of human Trichostrongylus species from an endemic area of Iran.

PubMed

Sharifdini, Meysam; Derakhshani, Sedigheh; Alizadeh, Safar Ali; Ghanbarzadeh, Laleh; Mirjalali, Hamed; Mobedi, Iraj; Saraei, Mehrzad

2017-12-01

Human infections with Trichostrongylus species have been reported in most parts of Iran. The aim of this study was the identification, molecular characterization and phylogenetic analysis of human Trichostrongylus species based on the ITS2 region of ribosomal DNA from Guilan Province, northern Iran. Stool samples were collected from rural inhabitants and examined by formalin-ether concentration and agar plate culture techniques. After anthelmintic treatment, male adult worms were collected from five infected cases. Genomic DNA was extracted from one male worm of each species in every treated individual and one filariform larva isolated from each case. PCR amplification of the ITS2-rDNA region was performed and the products were sequenced. Among 1508 individuals, 46 (3.05%) were found infected with Trichostrongylus species using parasitological methods. Male worms of T. colubriformis, T. vitrinus and T. longispicularis were expelled from five patients after treatment. Out of 41 filariform larvae, 40 were T. colubriformis, and the other one was T. axei. Phylogenetic analysis showed that each species was placed together with reference sequences submitted to the GenBank database. Intra-species similarity for all species obtained in the current study was 100%. T. colubriformis was found to be probably the most common species in this region of Iran. For the first time, the authors of the present study report the occurrence of natural human infection by T. longispicularis in the world. Therefore, the number of Trichostrongylus species infecting humans in Iran has now increased to ten. Copyright © 2017. Published by Elsevier B.V.
Phylogenetic relatedness determined between antibiotic resistance and 16S rRNA genes in actinobacteria.

PubMed

Sagova-Mareckova, Marketa; Ulanova, Dana; Sanderova, Petra; Omelka, Marek; Kamenik, Zdenek; Olsovska, Jana; Kopecky, Jan

2015-04-01

Distribution and evolutionary history of resistance genes in environmental actinobacteria provide information on intensity of antibiosis and evolution of specific secondary metabolic pathways at a given site. To this day, actinobacteria producing biologically active compounds were isolated mostly from soil but only a limited range of soil environments were commonly sampled. Consequently, soil remains an unexplored environment in search for novel producers and related evolutionary questions. Ninety actinobacteria strains isolated at contrasting soil sites were characterized phylogenetically by 16S rRNA gene, for presence of erm and ABC transporter resistance genes and antibiotic production. An analogous analysis was performed in silico with 246 and 31 strains from Integrated Microbial Genomes (JGI_IMG) database selected by the presence of ABC transporter genes and erm genes, respectively. In the isolates, distances of erm gene sequences were significantly correlated to phylogenetic distances based on 16S rRNA genes, while ABC transporter gene distances were not. The phylogenetic distance of isolates was significantly correlated to soil pH and organic matter content of isolation sites. In the analysis of JGI_IMG datasets the correlation between phylogeny of resistance genes and the strain phylogeny based on 16S rRNA genes or five housekeeping genes was observed for both the erm genes and ABC transporter genes in both actinobacteria and streptomycetes. However, in the analysis of sequences from genomes where both resistance genes occurred together the correlation was observed for both ABC transporter and erm genes in actinobacteria but in streptomycetes only in the erm gene. The type of erm resistance gene sequences was influenced by linkage to 16S rRNA gene sequences and site characteristics. The phylogeny of ABC transporter gene was correlated to 16S rRNA genes mainly above the genus level. 
The results support the concept of new specific secondary metabolite scaffolds occurring more likely in taxonomically distant producers but suggest that the antibiotic selection of gene pools is also influenced by site conditions.
Methods for determining the genetic affinity of microorganisms and viruses

NASA Technical Reports Server (NTRS)

Fox, George E. (Inventor); Willson, III, Richard C. (Inventor); Zhang, Zhengdong (Inventor)

2012-01-01

A method for selecting which sub-sequences in a database of nucleic acid sequences, such as 16S rRNA, are highly characteristic of particular groupings of bacteria, microorganisms, fungi, etc. on a substantially phylogenetic tree. The method is also applicable to viruses comprising viral genomic RNA or DNA. A catalogue of highly characteristic sequences identified by this method is assembled to establish the genetic identity of an unknown organism. The characteristic sequences are used to design nucleic acid hybridization probes that include the characteristic sequence or its complement, or are derived from one or more characteristic sequences. A plurality of these characteristic sequences is used in hybridization to determine the phylogenetic tree position of the organism(s) in a sample. Target organisms that are represented in the original sequence database by sufficient characteristic sequences can be identified to the species or subspecies level. Oligonucleotide arrays of many probes are especially preferred. A hybridization signal can comprise fluorescence, chemiluminescence, or isotopic labeling, etc.; or sequences in a sample can be detected by direct means, e.g. mass spectrometry. The method's characteristic sequences can also be used to design specific PCR primers. The method uniquely identifies the phylogenetic affinity of an unknown organism without requiring prior knowledge of what is present in the sample. Even if the organism has not been previously encountered, the method still provides useful information about which phylogenetic tree bifurcation nodes encompass the organism.
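One simple reading of "highly characteristic" sub-sequences is a k-mer that occurs in every sequence of a target grouping and in no sequence outside it. The sketch below follows that reading only. It is an assumption for illustration, not the selection procedure claimed in the patent, and the group names, sequences, and the `characteristic_kmers` helper are hypothetical.

```python
# Hypothetical groupings with invented toy sequences.
GROUPS = {
    "cladeA": ["ACGTAC", "TACGTA"],
    "cladeB": ["GGGTTT", "GGTTTA"],
}


def kmers(seq, k):
    """All overlapping sub-sequences of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def characteristic_kmers(groups, target, k):
    """k-mers shared by every target sequence and absent from all others."""
    inside = set.intersection(*(kmers(s, k) for s in groups[target]))
    outside = set()
    for name, seqs in groups.items():
        if name != target:
            for s in seqs:
                outside |= kmers(s, k)
    return inside - outside


if __name__ == "__main__":
    for name in GROUPS:
        print(name, sorted(characteristic_kmers(GROUPS, name, 4)))
```

Applying the same test at each node of a tree, with the sequences below the node as the target, would give a catalogue of node-specific signatures of the kind the abstract describes.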
Genetic characterization of Echinostoma revolutum and Echinoparyphium recurvatum (Trematoda: Echinostomatidae) in Thailand and phylogenetic relationships with other isolates inferred by ITS1 sequence.

PubMed

Saijuntha, Weerachai; Tantrawatpan, Chairat; Sithithaworn, Paiboon; Andrews, Ross H; Petney, Trevor N

2011-03-01

Echinostomatidae are common, widely distributed intestinal parasites causing significant disease in both animals and humans worldwide. In spite of their importance, the taxonomy of these echinostomes is still controversial. The taxonomic status of two species, Echinostoma revolutum and Echinoparyphium recurvatum, which commonly infect poultry and other birds, as well as humans, is problematical. Previous phylogenetic analyses of Southeast Asian strains indicate that these species cluster as sister taxa. In the present study, the first internal transcribed spacer (ITS1) sequence was used for genetic characterization and to examine the phylogenetic relationships between an isolate from Thailand and other isolates available from the GenBank database. Interspecies differences in ITS1 sequence between E. revolutum and E. recurvatum were detected at 6 (3%) of the 203 alignment positions. Of these, nucleotide deletion at positions 25, 26, and 27, pyrimidine transition at 50, 189, and pyrimidine transversion at 118 were observed. Phylogenetic analysis revealed that E. recurvatum from Thailand clustered as a sister taxon with E. revolutum and not with other members of the genus Echinoparyphium. Interestingly, this result confirms a previous report based on allozyme electrophoresis and mitochondrial DNA that E. revolutum and E. recurvatum in Southeast Asia are sister species. Hence, the taxonomic status of E. recurvatum in Thailand, as well as in other Southeast Asian countries, needs to be confirmed and revised using more comprehensive analyses based on morphology and other molecular techniques.
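The tally of alignment differences reported in this abstract (6 of 203 positions, about 3%) can be reproduced in form with a small script. The sketch below uses two invented aligned sequences, not the ITS1 data, and applies the usual definitions: a transition stays within purines (A, G) or within pyrimidines (C, T), and a transversion switches between the two classes. The `classify` and `summarize` helpers are hypothetical names.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(a, b):
    """Label one alignment column; None means the bases are identical."""
    if a == b:
        return None
    if "-" in (a, b):
        return "indel"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def summarize(seq1, seq2):
    """Count difference types, list 1-based positions, give percent differing."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    kinds = Counter()
    positions = []
    for pos, (a, b) in enumerate(zip(seq1, seq2), start=1):
        kind = classify(a, b)
        if kind is not None:
            kinds[kind] += 1
            positions.append(pos)
    return kinds, positions, 100.0 * len(positions) / len(seq1)


if __name__ == "__main__":
    # Invented toy alignment of eight columns.
    kinds, positions, pct = summarize("ACGTTAGC", "AC-TCAGA")
    print(dict(kinds), positions, pct)
    # The proportion quoted in the abstract: 6 differing positions of 203.
    print(round(100 * 6 / 203, 1))
```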
Phylogeography of the tropical planktonic foraminifera lineage Globigerinella reveals isolation inconsistent with passive dispersal by ocean currents.

PubMed

Weiner, Agnes K M; Weinkauf, Manuel F G; Kurasawa, Atsushi; Darling, Kate F; Kucera, Michal; Grimm, Guido W

2014-01-01

Morphologically defined species of marine plankton often harbor a considerable level of cryptic diversity. Since many morphospecies show cosmopolitan distribution, an understanding of biogeographic and evolutionary processes at the level of genetic diversity requires global sampling. We use a database of 387 single-specimen sequences of the SSU rDNA of the planktonic foraminifera Globigerinella as a model to assess the biogeographic and phylogenetic distributions of cryptic diversity in marine microplankton on a global scale. Our data confirm the existence of multiple, well isolated genetic lineages. An analysis of their abundance and distribution indicates that our sampling is likely to approximate the actual total diversity. Unexpectedly, we observe an uneven allocation of cryptic diversity among the phylogenetic lineages. We show that this pattern is neither an artifact of sampling intensity nor a function of lineage age. Instead, we argue that it reflects an ongoing speciation process in one of the three major lineages. Surprisingly, four of the six genetic types in the hyperdiverse lineage are biogeographically restricted to the Indopacific. Their mutual co-occurrence and their hierarchical phylogenetic structure provide no evidence for an origin through sudden habitat fragmentation and their limitation to the Indopacific challenges the view of a global gene flow within the warm-water provinces. This phenomenon shows that passive dispersal is not sufficient to describe the distribution of plankton diversity. Rather, these organisms show differentiated distribution patterns shaped by species interactions and reflecting phylogenetic contingency with unique histories of diversification rates.
DNA barcoding insect–host plant associations

PubMed Central

Jurado-Rivera, José A.; Vogler, Alfried P.; Reid, Chris A.M.; Petitpierre, Eduard; Gómez-Zurita, Jesús

2008-01-01

Short-sequence fragments (‘DNA barcodes’) used widely for plant identification and inventorying remain to be applied to complex biological problems. Host–herbivore interactions are fundamental to coevolutionary relationships of a large proportion of species on the Earth, but their study is frequently hampered by limited or unreliable host records. Here we demonstrate that DNA barcodes can greatly improve this situation as they (i) provide a secure identification of host plant species and (ii) establish the authenticity of the trophic association. Host plants of leaf beetles (subfamily Chrysomelinae) from Australia were identified using the chloroplast trnL(UAA) intron as barcode amplified from beetle DNA extracts. Sequence similarity and phylogenetic analyses provided precise identifications of each host species at tribal, generic and specific levels, depending on the available database coverage in various plant lineages. The 76 species of Chrysomelinae included—more than 10 per cent of the known Australian fauna—feed on 13 plant families, with preference for Australian radiations of Myrtaceae (eucalypts) and Fabaceae (acacias). Phylogenetic analysis of beetles shows general conservation of host association but with rare host shifts between distant plant lineages, including a few cases where barcodes supported two phylogenetically distant host plants. The study demonstrates that plant barcoding is already feasible with the current publicly available data. By sequencing plant barcodes directly from DNA extractions made from herbivorous beetles, strong physical evidence for the host association is provided. Thus, molecular identification using short DNA fragments brings together the detection of species and the analysis of their interactions. PMID:19004756
Biodiversity and phylogenetic analysis of culturable bacteria indigenous to Khewra salt mine of Pakistan and their industrial importance

PubMed Central

Akhtar, Nasrin; Ghauri, Muhammad A.; Iqbal, Aamira; Anwar, Munir A.; Akhtar, Kalsoom

2008-01-01

Culturable bacterial biodiversity and the industrial importance of the isolates indigenous to Khewra salt mine, Pakistan, were assessed. PCR amplification of the 16S rDNA of isolates was carried out using universal primers FD1 and rP1, and products were sequenced commercially. These gene sequences were compared with other gene sequences in the GenBank databases to find the closely related sequences. The alignment of these sequences with sequences available from the GenBank database was carried out to construct a phylogenetic tree for these bacteria. These genes were deposited in GenBank and accession numbers were obtained. Most of the isolates belonged to different species of the genus Bacillus, sharing 92-99% 16S rDNA identity with the respective type strain. Other isolates had close similarities with Escherichia coli, Staphylococcus arlettae and Staphylococcus gallinarum, with 97%, 98% and 99% 16S rDNA similarity respectively. The abilities of isolates to produce industrial enzymes (amylase, carboxymethylcellulase, xylanase, cellulase and protease) were checked. All isolates were tested for starch, carboxymethylcellulose (CMC), xylan, cellulose, and casein degradation in plate assays. BPT-5, 11, 18, 19 and 25 indicated the production of copious amounts of carbohydrate- and protein-degrading enzymes. Based on this study it can be concluded that Khewra salt mine is populated with diverse bacterial groups, which are a potential source of industrial enzymes for commercial applications. PMID:24031194
A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy.

PubMed

Gao, Xiang; Lin, Huaiying; Revanna, Kashi; Dong, Qunfeng

2017-05-10

Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable. The unreliable results are due to the limitations in the existing methods which either lack solid probabilistic-based criteria to evaluate the confidence of their taxonomic assignments, or use nucleotide k-mer frequency as the proxy for sequence similarity measurement. We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum levels based on the lowest common ancestors of multiple database hits for each query sequence, and further classification reliabilities are evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based upon the degree of sequence similarity of the database hit to the query sequence. Our method does not need any training datasets specific for different taxonomic groups. Instead only a reference database is required for aligning to the query sequences, making our method easily applicable for different regions of the 16S rRNA gene or other phylogenetic marker genes. Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools and we provide probabilistic-based confidence scores to evaluate the reliability of our taxonomic classification assignments based on multiple database matches to query sequences. 
Despite its higher computational costs, our method is still suitable for analyzing large-scale microbiome datasets for practical purposes. Furthermore, our method can be applied for taxonomic classification of any phylogenetic marker gene sequences. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA .
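The idea of weighting each database hit before taking a lowest common ancestor can be sketched as follows. This is a deliberately simplified illustration and not the BLCA implementation: the weights here are plain normalized identity scores rather than Bayesian posterior probabilities, there is no bootstrap step, only three ranks are used, and the hit list, the threshold `min_conf`, and the function names are all hypothetical.

```python
from collections import defaultdict

RANKS = ("phylum", "genus", "species")  # shortened rank list for the sketch


def rank_confidences(hits):
    """For each rank, return the best-supported lineage and its weight share.

    hits: list of (lineage, weight), where lineage is a tuple ordered like
    RANKS. Votes are keyed by the lineage prefix so that identical names
    under different parents are not pooled.
    """
    total = sum(weight for _, weight in hits)
    result = {}
    for i, rank in enumerate(RANKS):
        votes = defaultdict(float)
        for lineage, weight in hits:
            votes[lineage[: i + 1]] += weight / total
        best = max(votes, key=votes.get)
        result[rank] = (best[-1], votes[best])
    return result


def assign(hits, min_conf=0.8):
    """Deepest rank whose leading taxon holds at least min_conf of the weight."""
    conf = rank_confidences(hits)
    for rank in reversed(RANKS):
        taxon, share = conf[rank]
        if share >= min_conf:
            return rank, taxon, share
    return None


if __name__ == "__main__":
    # Invented hits: (lineage, fractional identity to the query).
    hits = [
        (("Firmicutes", "Bacillus", "Bacillus subtilis"), 0.99),
        (("Firmicutes", "Bacillus", "Bacillus subtilis"), 0.98),
        (("Firmicutes", "Bacillus", "Bacillus licheniformis"), 0.97),
    ]
    print(rank_confidences(hits))
    print(assign(hits))
```

With these invented hits the species vote is split, so the sketch falls back to the genus, which is the behavior a lowest-common-ancestor scheme is meant to provide when close relatives match almost equally well.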
The phylogenetic analysis of tetraspanins projects the evolution of cell-cell interactions from unicellular to multicellular organisms.

PubMed

Huang, Shengfeng; Yuan, Shaochun; Dong, Meiling; Su, Jing; Yu, Cuiling; Shen, Yang; Xie, Xiaojin; Yu, Yanhong; Yu, Xuesong; Chen, Shangwu; Zhang, Shicui; Pontarotti, Pierre; Xu, Anlong

2005-12-01

In animals, the tetraspanins are a large superfamily of membrane proteins that play important roles in organizing various cell-cell and matrix-cell interactions and signal pathways based on such interactions. However, their origin and evolution largely remain elusive and most of the family's members are functionally unknown or less known due to difficulties of study, such as functional redundancy. In this study, we rebuilt the family's phylogeny with sequences retrieved from online databases and our cDNA library of amphioxus. We reveal that, in addition to in metazoans, various tetraspanins are extensively expressed in protozoan amoebae, fungi, and plants. We also discuss the structural evolution of tetraspanin's major extracellular domain and the relation between tetraspanin's duplication and functional redundancy. Finally, we elucidate the coevolution of tetraspanins and eukaryotes and suggest that tetraspanins play important roles in the unicell-to-multicell transition. In short, the study of tetraspanin in a phylogenetic context helps us understand the evolution of intercellular interactions.
BEASTling: A software tool for linguistic phylogenetics using BEAST 2

PubMed Central

Maurits, Luke; Forkel, Robert; Kaiping, Gereon A.; Atkinson, Quentin D.

2017-01-01

We present a new open source software tool called BEASTling, designed to simplify the preparation of Bayesian phylogenetic analyses of linguistic data using the BEAST 2 platform. BEASTling transforms comparatively short and human-readable configuration files into the XML files used by BEAST to specify analyses. By taking advantage of Creative Commons-licensed data from the Glottolog language catalog, BEASTling allows the user to conveniently filter datasets using names for recognised language families, to impose monophyly constraints so that inferred language trees are backward compatible with Glottolog classifications, or to assign geographic location data to languages for phylogeographic analyses. Support for the emerging cross-linguistic linked data format (CLDF) permits easy incorporation of data published in cross-linguistic linked databases into analyses. BEASTling is intended to make the power of Bayesian analysis more accessible to historical linguists without strong programming backgrounds, in the hopes of encouraging communication and collaboration between those developing computational models of language evolution (who are typically not linguists) and relevant domain experts. PMID:28796784
BEASTling: A software tool for linguistic phylogenetics using BEAST 2.

PubMed

Maurits, Luke; Forkel, Robert; Kaiping, Gereon A; Atkinson, Quentin D

2017-01-01

We present a new open source software tool called BEASTling, designed to simplify the preparation of Bayesian phylogenetic analyses of linguistic data using the BEAST 2 platform. BEASTling transforms comparatively short and human-readable configuration files into the XML files used by BEAST to specify analyses. By taking advantage of Creative Commons-licensed data from the Glottolog language catalog, BEASTling allows the user to conveniently filter datasets using names for recognised language families, to impose monophyly constraints so that inferred language trees are backward compatible with Glottolog classifications, or to assign geographic location data to languages for phylogeographic analyses. Support for the emerging cross-linguistic linked data format (CLDF) permits easy incorporation of data published in cross-linguistic linked databases into analyses. BEASTling is intended to make the power of Bayesian analysis more accessible to historical linguists without strong programming backgrounds, in the hopes of encouraging communication and collaboration between those developing computational models of language evolution (who are typically not linguists) and relevant domain experts.
TreeQ-VISTA: An Interactive Tree Visualization Tool with Functional Annotation Query Capabilities

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gu, Shengyin; Anderson, Iain; Kunin, Victor

2007-05-07

Summary: We describe a general multiplatform exploratory tool called TreeQ-Vista, designed for presenting functional annotations in a phylogenetic context. Traits, such as phenotypic and genomic properties, are interactively queried from a relational database with a user-friendly interface which provides a set of tools for users with or without SQL knowledge. The query results are projected onto a phylogenetic tree and can be displayed in multiple color groups. A rich set of browsing, grouping and query tools are provided to facilitate trait exploration, comparison and analysis. Availability: The program, detailed tutorial and examples are available online at http://genome-test.lbl.gov/vista/TreeQVista.
Towards a formal genealogical classification of the Lezgian languages (North Caucasus): testing various phylogenetic methods on lexical data.

PubMed

Kassian, Alexei

2015-01-01

A lexicostatistical classification is proposed for 20 languages and dialects of the Lezgian group of the North Caucasian family, based on meticulously compiled 110-item wordlists, published as part of the Global Lexicostatistical Database project. The lexical data have been subsequently analyzed with the aid of the principal phylogenetic methods, both distance-based and character-based: Starling neighbor joining (StarlingNJ), Neighbor joining (NJ), Unweighted pair group method with arithmetic mean (UPGMA), Bayesian Markov chain Monte Carlo (MCMC), Unweighted maximum parsimony (UMP). Cognation indexes within the input matrix were marked by two different algorithms: traditional etymological approach and phonetic similarity, i.e., the automatic method of consonant classes (Levenshtein distances). Due to certain reasons (first of all, high lexicographic quality of the wordlists and a consensus about the Lezgian phylogeny among Caucasologists), the Lezgian database is a perfect testing area for appraisal of phylogenetic methods. For the etymology-based input matrix, all the phylogenetic methods, with the possible exception of UMP, have yielded trees that are sufficiently compatible with each other to generate a consensus phylogenetic tree of the Lezgian lects. The obtained consensus tree agrees with the traditional expert classification as well as some of the previously proposed formal classifications of this linguistic group. Contrary to theoretical expectations, the UMP method has suggested the least plausible tree of all. In the case of the phonetic similarity-based input matrix, the distance-based methods (StarlingNJ, NJ, UPGMA) have produced the trees that are rather close to the consensus etymology-based tree and the traditional expert classification, whereas the character-based methods (Bayesian MCMC, UMP) have yielded less likely topologies.
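The phonetic-similarity input described in the abstract above rests on string edit distances between word forms. As a minimal sketch, assuming plain Levenshtein distance on raw strings (not the consonant-class recoding the author applies first) and using placeholder strings rather than real Lezgian data, a pairwise distance between two aligned wordlists could be computed as follows. The function names are illustrative and do not come from the Starling software.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mean_normalized_distance(list_a: list[str], list_b: list[str]) -> float:
    """Average length-normalized edit distance over aligned concept slots."""
    if len(list_a) != len(list_b):
        raise ValueError("wordlists must cover the same concepts in the same order")
    total = 0.0
    for wa, wb in zip(list_a, list_b):
        longest = max(len(wa), len(wb))
        total += levenshtein(wa, wb) / longest if longest else 0.0
    return total / len(list_a)


# Placeholder forms for two hypothetical lects (not real data).
lect_1 = ["abc", "de"]
lect_2 = ["abd", "de"]
print(mean_normalized_distance(lect_1, lect_2))
```

A matrix of such pairwise values is the kind of input that distance-based methods such as NJ and UPGMA take.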
Towards a Formal Genealogical Classification of the Lezgian Languages (North Caucasus): Testing Various Phylogenetic Methods on Lexical Data

PubMed Central

Kassian, Alexei

2015-01-01

A lexicostatistical classification is proposed for 20 languages and dialects of the Lezgian group of the North Caucasian family, based on meticulously compiled 110-item wordlists, published as part of the Global Lexicostatistical Database project. The lexical data have been subsequently analyzed with the aid of the principal phylogenetic methods, both distance-based and character-based: Starling neighbor joining (StarlingNJ), Neighbor joining (NJ), Unweighted pair group method with arithmetic mean (UPGMA), Bayesian Markov chain Monte Carlo (MCMC), Unweighted maximum parsimony (UMP). Cognation indexes within the input matrix were marked by two different algorithms: traditional etymological approach and phonetic similarity, i.e., the automatic method of consonant classes (Levenshtein distances). Due to certain reasons (first of all, high lexicographic quality of the wordlists and a consensus about the Lezgian phylogeny among Caucasologists), the Lezgian database is a perfect testing area for appraisal of phylogenetic methods. For the etymology-based input matrix, all the phylogenetic methods, with the possible exception of UMP, have yielded trees that are sufficiently compatible with each other to generate a consensus phylogenetic tree of the Lezgian lects. The obtained consensus tree agrees with the traditional expert classification as well as some of the previously proposed formal classifications of this linguistic group. Contrary to theoretical expectations, the UMP method has suggested the least plausible tree of all. In the case of the phonetic similarity-based input matrix, the distance-based methods (StarlingNJ, NJ, UPGMA) have produced the trees that are rather close to the consensus etymology-based tree and the traditional expert classification, whereas the character-based methods (Bayesian MCMC, UMP) have yielded less likely topologies. PMID:25719456
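UPGMA, one of the distance-based methods named in the abstract above, repeatedly merges the two closest clusters and averages their distances to all remaining clusters, weighted by cluster size. The sketch below is a textbook version under stated assumptions: the four labels and their distances are invented for illustration, branch lengths are omitted, and it is not the implementation used in the study.

```python
from itertools import combinations


def upgma(labels, dist):
    """Return a nested-tuple tree topology built by UPGMA.

    labels: list of taxon names.
    dist:   dict mapping frozenset({a, b}) to the distance between a and b.
    """
    trees = {i: name for i, name in enumerate(labels)}  # cluster id -> subtree
    sizes = {i: 1 for i in trees}                       # cluster id -> leaf count
    d = {(i, j): dist[frozenset((labels[i], labels[j]))]
         for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(trees) > 1:
        i, j = min(d, key=d.get)  # closest pair of clusters
        for k in trees:
            if k in (i, j):
                continue
            dik = d[tuple(sorted((i, k)))]
            djk = d[tuple(sorted((j, k)))]
            # Size-weighted average distance from the merged cluster to k.
            d[(k, next_id)] = (sizes[i] * dik + sizes[j] * djk) / (sizes[i] + sizes[j])
        trees[next_id] = (trees[i], trees[j])
        sizes[next_id] = sizes[i] + sizes[j]
        for key in [key for key in d if i in key or j in key]:
            del d[key]
        del trees[i], trees[j], sizes[i], sizes[j]
        next_id += 1
    return next(iter(trees.values()))


# Invented distances: A and B are close, C and D are close.
labels = ["A", "B", "C", "D"]
dist = {frozenset(p): 8.0 for p in combinations(labels, 2)}
dist[frozenset(("A", "B"))] = 2.0
dist[frozenset(("C", "D"))] = 2.0
print(upgma(labels, dist))
```

Ties between equally close pairs are broken by insertion order here, which is an arbitrary choice; real implementations may resolve ties differently and can therefore return different topologies.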
The Salmonella In Silico Typing Resource (SISTR): An Open Web-Accessible Tool for Rapidly Typing and Subtyping Draft Salmonella Genome Assemblies.

PubMed

Yoshida, Catherine E; Kruczkiewicz, Peter; Laing, Chad R; Lingohr, Erika J; Gannon, Victor P J; Nash, John H E; Taboada, Eduardo N

2016-01-01

For nearly 100 years serotyping has been the gold standard for the identification of Salmonella serovars. Despite the increasing adoption of DNA-based subtyping approaches, serotype information remains a cornerstone in food safety and public health activities aimed at reducing the burden of salmonellosis. At the same time, recent advances in whole-genome sequencing (WGS) promise to revolutionize our ability to perform advanced pathogen characterization in support of improved source attribution and outbreak analysis. We present the Salmonella In Silico Typing Resource (SISTR), a bioinformatics platform for rapidly performing simultaneous in silico analyses for several leading subtyping methods on draft Salmonella genome assemblies. In addition to performing serovar prediction by genoserotyping, this resource integrates sequence-based typing analyses for: Multi-Locus Sequence Typing (MLST), ribosomal MLST (rMLST), and core genome MLST (cgMLST). We show how phylogenetic context from cgMLST analysis can supplement the genoserotyping analysis and increase the accuracy of in silico serovar prediction to over 94.6% on a dataset comprised of 4,188 finished genomes and WGS draft assemblies. In addition to allowing analysis of user-uploaded whole-genome assemblies, the SISTR platform incorporates a database comprising over 4,000 publicly available genomes, allowing users to place their isolates in a broader phylogenetic and epidemiological context. The resource incorporates several metadata driven visualizations to examine the phylogenetic, geospatial and temporal distribution of genome-sequenced isolates. As sequencing of Salmonella isolates at public health laboratories around the world becomes increasingly common, rapid in silico analysis of minimally processed draft genome assemblies provides a powerful approach for molecular epidemiology in support of public health investigations. 
Moreover, this type of integrated analysis using multiple sequence-based methods of sub-typing allows for continuity with historical serotyping data as we transition towards the increasing adoption of genomic analyses in epidemiology. The SISTR platform is freely available on the web at https://lfz.corefacility.ca/sistr-app/.
Genetic Comparison of B. Anthracis and its Close Relatives Using AFLP and PCR Analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jackson, P.J.; Hill, K.K.; Laker, M.T.

1999-02-01

Amplified fragment length polymorphism (AFLP) analysis allows a rapid, relatively simple analysis of a large portion of a microbial genome, providing information about the species and its phylogenetic relationship to other microbes (Vos, et al., 1995). The method simply surveys the genome for length and sequence polymorphisms. The pattern identified can be used for comparison to the genomes of other species. Unlike other methods, it does not rely on analysis of a single genetic locus that may bias the interpretation of results and it does not require any prior knowledge of the targeted organism. Moreover, a standard set of reagents can be applied to any species without using species-specific information or molecular probes. The authors are using AFLPs to rapidly identify different bacterial species. A comparison of AFLP profiles generated from a large battery of B. anthracis strains shows very little variability among different isolates (Keim, et al., 1997). By contrast, there is a significant difference between AFLP profiles generated for any B. anthracis strain and even the most closely related Bacillus species. Sufficient variability is apparent among all known microbial species to allow phylogenetic analysis based on large numbers of genetically unlinked loci. These striking differences among AFLP profiles allow unambiguous identification of previously identified species and phylogenetic placement of newly characterized isolates relative to known species based on a large number of independent genetic loci. Data generated thus far show that the method provides phylogenetic analyses that are consistent with other widely accepted phylogenetic methods. However, AFLP analysis provides a more detailed analysis of the targets and samples a much larger portion of the genome. Consequently, it provides an inexpensive, rapid means of characterizing microbial isolates to further differentiate among strains and closely related microbial species. Such information cannot be rapidly generated by other means. AFLP sample analysis quickly generates a very large amount of molecular information about microbial genomes. However, this information cannot be analyzed rapidly using manual methods. The authors are developing a large archive of electronic AFLP signatures that is being used to identify isolates collected from medical, veterinary, forensic and environmental samples. They are also developing the computational packages necessary to rapidly and unambiguously analyze the AFLP profiles and conduct a phylogenetic comparison of these data relative to information already in the database. They will use this archive and the associated algorithms to determine the species identity of previously uncharacterized isolates and place them phylogenetically relative to other microbes based on their AFLP signatures. This study provides significant new information about microbes with environmental, veterinary and medical significance. This information can be used in further studies to understand the relationships among these species and the factors that distinguish them from one another. It should also allow identification of unique factors that contribute to important microbial traits including pathogenicity and virulence. They are also using AFLP data to identify, isolate and sequence DNA fragments that are unique to particular microbial species and strains. The fragment patterns and sequence information provide insights into the complexity and organization of bacterial genomes relative to one another. They also provide the information necessary for development of species-specific PCR primers that can be used to interrogate complex samples for the presence of B. anthracis, other microbial pathogens or their remnants.

PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification.

PubMed

Thomas, Paul D; Kejariwal, Anish; Campbell, Michael J; Mi, Huaiyu; Diemer, Karen; Guo, Nan; Ladunga, Istvan; Ulitsky-Lazareva, Betty; Muruganujan, Anushya; Rabkin, Steven; Vandergriff, Jody A; Doremieux, Olivier

2003-01-01

The PANTHER database was designed for high-throughput analysis of protein sequences. One of the key features is a simplified ontology of protein function, which allows browsing of the database by biological functions. Biologist curators have associated the ontology terms with groups of protein sequences rather than individual sequences. Statistical models (Hidden Markov Models, or HMMs) are built from each of these groups. The advantage of this approach is that new sequences can be automatically classified as they become available. To ensure accurate functional classification, HMMs are constructed not only for families, but also for functionally distinct subfamilies. Multiple sequence alignments and phylogenetic trees, including curator-assigned information, are available for each family. The current version of the PANTHER database includes training sequences from all organisms in the GenBank non-redundant protein database, and the HMMs have been used to classify gene products across the entire genomes of human and Drosophila melanogaster. The ontology terms and protein families and subfamilies, as well as Drosophila gene classifications, can be browsed and searched for free. Due to outstanding contractual obligations, access to human gene classifications and to protein family trees and multiple sequence alignments will temporarily require a nominal registration fee. PANTHER is publicly available on the web at http://panther.celera.com.
Molecular Signatures of Microbial Metabolism in an Actively Growing, Silicified, Microbial Structure from Yellowstone National Park

NASA Astrophysics Data System (ADS)

Ferreira, M.; Creveling, J.; Hilburn, I.; Karlsson, E.; Pepe-Ranney, C.; Spear, J.; Dawson, S.; Geobio2008, I.

2008-12-01

Silicified structures that exhibit a putative biologic component in their formation permeate the rock record as stromatolites. We have studied a silicified microbial structure from a hot spring in Yellowstone National Park using phenotypic, phylogenetic, and metagenomic analyses to determine microbial carbon metabolic pathways and the phylogenetic affiliations of microbes present in this unique structure. In this multi-faceted approach, dominant physiologies, specifically with regard to anaerobic and aerobic metabolisms, were inferred from 16S rRNA gene sequences and 454 sequencing data from bulk DNA samples of the structure. Carbon utilization as indicated by ECO Biolog plates showed abundant heterotrophy and heterotrophic diversity throughout the microbial structure. Microbes within the structure are able to utilize all tested sources of carbohydrates, lipids/fatty acids, and protein/amino acids as carbon sources. ECO plate testing of the hot spring water yielded considerably less carbohydrate consumption (only 4 out of 13 tested carbohydrates) and similar lipids/fatty acids and protein/amino acids consumption (2 out of 3 and 5 out of 5 tested sources respectively). Full length 16S rRNA gene sequences and metagenomic 454 pyrosequencing of community DNA showed limited diversity among primary producers. From the 16S data, the majority of the autotrophs are inferred to utilize the Calvin cycle for CO2 fixation, followed by 3-hydroxypropionate/4-hydroxybutyrate CO2 fixation. However, an analysis of the metagenomic data compared to the KEGG database does not show genes directly involved with Calvin cycle carbon fixation. Further BLAST searches of our data failed to find significant matches within our 6514 metagenomic sequences to known RuBisCo sequences taken from the NCBI database. This is likely due to a far under-sampled dataset of metagenomic sequences, and the low number (958) that had matches to the KEGG pathways database. Anaerobic versus aerobic physiology also can be estimated from the 16S clone libraries. Phylogenetic analysis of recovered 16S sequences suggests that 15% of the 16S sequences can be attributed to anaerobic microbes while 42% likely come from aerobes. The remaining 43% of 16S rRNA gene sequences belong to metabolically unassigned phyla both known and novel. This preliminary study demonstrates that the small spatially stratified silicified microbial structure present on the margins of a hot spring contains a rich and complex microbial community with different trophic levels and enzymatic pathways.
MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods

PubMed Central

Tamura, Koichiro; Peterson, Daniel; Peterson, Nicholas; Stecher, Glen; Nei, Masatoshi; Kumar, Sudhir

2011-01-01

Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of Molecular Evolutionary Genetics Analysis version 5 (MEGA5), which is a user-friendly software for mining online databases, building sequence alignments and phylogenetic trees, and using methods of evolutionary bioinformatics in basic biology, biomedicine, and evolution. The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models (nucleotide or amino acid), inferring ancestral states and sequences (along with probabilities), and estimating evolutionary rates site-by-site. In computer simulation analyses, ML tree inference algorithms in MEGA5 compared favorably with other software packages in terms of computational efficiency and the accuracy of the estimates of phylogenetic trees, substitution parameters, and rate variation among sites. The MEGA user interface has now been enhanced to be activity driven to make it easier for the use of both beginners and experienced scientists. This version of MEGA is intended for the Windows platform, and it has been configured for effective use on Mac OS X and Linux desktops. It is available free of charge from http://www.megasoftware.net. PMID:21546353
Ortholog-based screening and identification of genes related to intracellular survival.

PubMed

Yang, Xiaowen; Wang, Jiawei; Bing, Guoxia; Bie, Pengfei; De, Yanyan; Lyu, Yanli; Wu, Qingmin

2018-04-20

Bioinformatics and comparative genomics analysis methods were used to predict unknown pathogen genes based on homology with identified or functionally clustered genes. In this study, the genes of common pathogens were analyzed to screen and identify genes associated with intracellular survival through sequence similarity, phylogenetic tree analysis and the λ-Red recombination system test method. The total 38,952 protein-coding genes of common pathogens were divided into 19,775 clusters. As demonstrated through a COG analysis, information storage and processing genes might play an important role in intracellular survival. Only 19 clusters were present in facultative intracellular pathogens, and not all were present in extracellular pathogens. Construction of a phylogenetic tree selected 18 of these 19 clusters. Comparisons with the DEG database and previous research revealed that seven other clusters are considered essential gene clusters and that seven other clusters are associated with intracellular survival. Moreover, this study confirmed that clusters screened by orthologs with similar function could be replaced with an approved uvrY gene and its orthologs, and the results revealed that the usg gene is associated with intracellular survival. The study improves the current understanding of the characteristics of intracellular pathogens and allows further exploration of the intracellular survival-related gene modules in these pathogens. Copyright © 2018. Published by Elsevier B.V.
Microbial diversity in ikaite tufa columns: an alkaline, cold ecological niche in Greenland.

PubMed

Stougaard, Peter; Jørgensen, Flemming; Johnsen, Mads G; Hansen, Ole C

2002-08-01

Ikaite tufa columns from the Ikka Fjord in south-western Greenland constitute a natural, stable environment at low temperature and with a pH ranging from neutral at the exterior to very alkaline (pH 10.4) at the interior of the column. Phylogenetic analysis of culturable organisms revealed ten different isolates representing three of the major bacterial divisions. Nine of the isolates showed 94-99% similarity to known sequences, whereas one isolate displayed a low degree of similarity (less than 90%) to a Cyclobacterium species. Seven of the isolates were shown to be cold active alkaliphiles, whereas three isolates showed optimal growth at neutral pH. Phylogenetic analysis of DNA isolated directly from the ikaite material demonstrated the presence of a microbial flora more diverse than the culturable isolates. Whereas approximately half of the phylotypes showed 90-99% similarity to known meso- or thermophilic alkaliphiles, the rest of the sequences displayed less than 90% similarity when compared to known 16S rRNA gene sequences in databases. Thus, in the present paper, we demonstrate that ikaite columns that host a specialized macroscopic flora and fauna also contain a unique, cold active, alkaliphilic microflora.
Synthesis of phylogeny and taxonomy into a comprehensive tree of life

PubMed Central

Hinchliff, Cody E.; Smith, Stephen A.; Allman, James F.; Burleigh, J. Gordon; Chaudhary, Ruchi; Coghill, Lyndon M.; Crandall, Keith A.; Deng, Jiabin; Drew, Bryan T.; Gazis, Romina; Gude, Karl; Hibbett, David S.; Katz, Laura A.; Laughinghouse, H. Dail; McTavish, Emily Jane; Midford, Peter E.; Owen, Christopher L.; Ree, Richard H.; Rees, Jonathan A.; Soltis, Douglas E.; Williams, Tiffani; Cranston, Karen A.

2015-01-01

Reconstructing the phylogenetic relationships that unite all lineages (the tree of life) is a grand challenge. The paucity of homologous character data across disparately related lineages currently renders direct phylogenetic inference untenable. To reconstruct a comprehensive tree of life, we therefore synthesized published phylogenies, together with taxonomic classifications for taxa never incorporated into a phylogeny. We present a draft tree containing 2.3 million tips—the Open Tree of Life. Realization of this tree required the assembly of two additional community resources: (i) a comprehensive global reference taxonomy and (ii) a database of published phylogenetic trees mapped to this taxonomy. Our open source framework facilitates community comment and contribution, enabling the tree to be continuously updated when new phylogenetic and taxonomic data become digitally available. Although data coverage and phylogenetic conflict across the Open Tree of Life illuminate gaps in both the underlying data available for phylogenetic reconstruction and the publication of trees as digital objects, the tree provides a compelling starting point for community contribution. This comprehensive tree will fuel fundamental research on the nature of biological diversity, ultimately providing up-to-date phylogenies for downstream applications in comparative biology, ecology, conservation biology, climate change, agriculture, and genomics. PMID:26385966
Synthesis of phylogeny and taxonomy into a comprehensive tree of life.

PubMed

Hinchliff, Cody E; Smith, Stephen A; Allman, James F; Burleigh, J Gordon; Chaudhary, Ruchi; Coghill, Lyndon M; Crandall, Keith A; Deng, Jiabin; Drew, Bryan T; Gazis, Romina; Gude, Karl; Hibbett, David S; Katz, Laura A; Laughinghouse, H Dail; McTavish, Emily Jane; Midford, Peter E; Owen, Christopher L; Ree, Richard H; Rees, Jonathan A; Soltis, Douglas E; Williams, Tiffani; Cranston, Karen A

2015-10-13

Reconstructing the phylogenetic relationships that unite all lineages (the tree of life) is a grand challenge. The paucity of homologous character data across disparately related lineages currently renders direct phylogenetic inference untenable. To reconstruct a comprehensive tree of life, we therefore synthesized published phylogenies, together with taxonomic classifications for taxa never incorporated into a phylogeny. We present a draft tree containing 2.3 million tips-the Open Tree of Life. Realization of this tree required the assembly of two additional community resources: (i) a comprehensive global reference taxonomy and (ii) a database of published phylogenetic trees mapped to this taxonomy. Our open source framework facilitates community comment and contribution, enabling the tree to be continuously updated when new phylogenetic and taxonomic data become digitally available. Although data coverage and phylogenetic conflict across the Open Tree of Life illuminate gaps in both the underlying data available for phylogenetic reconstruction and the publication of trees as digital objects, the tree provides a compelling starting point for community contribution. This comprehensive tree will fuel fundamental research on the nature of biological diversity, ultimately providing up-to-date phylogenies for downstream applications in comparative biology, ecology, conservation biology, climate change, agriculture, and genomics.
Functional phylogenomics analysis of bacteria and archaea using consistent genome annotation with UniFam

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chai, Juanjuan; Kora, Guruprasad; Ahn, Tae-Hyuk

2014-10-09

To supply some background, phylogenetic studies have provided detailed knowledge on the evolutionary mechanisms of genes and species in Bacteria and Archaea. However, the evolution of cellular functions, represented by metabolic pathways and biological processes, has not been systematically characterized. Many clades in the prokaryotic tree of life have now been covered by sequenced genomes in GenBank. This enables a large-scale functional phylogenomics study of many computationally inferred cellular functions across all sequenced prokaryotes. Our results show a total of 14,727 GenBank prokaryotic genomes were re-annotated using a new protein family database, UniFam, to obtain consistent functional annotations for accurate comparison. The functional profile of a genome was represented by the biological process Gene Ontology (GO) terms in its annotation. The GO term enrichment analysis differentiated the functional profiles between selected archaeal taxa. 706 prokaryotic metabolic pathways were inferred from these genomes using Pathway Tools and MetaCyc. The consistency between the distribution of metabolic pathways in the genomes and the phylogenetic tree of the genomes was measured using parsimony scores and retention indices. The ancestral functional profiles at the internal nodes of the phylogenetic tree were reconstructed to track the gains and losses of metabolic pathways in evolutionary history. In conclusion, our functional phylogenomics analysis shows divergent functional profiles of taxa and clades. Such function-phylogeny correlation stems from a set of clade-specific cellular functions with low parsimony scores. On the other hand, many cellular functions are sparsely dispersed across many clades with high parsimony scores. These different types of cellular functions have distinct evolutionary patterns reconstructed from the prokaryotic tree.
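The parsimony score and retention index mentioned in the abstract above can be illustrated on a toy example. The sketch below assumes a rooted binary tree written as nested tuples and a presence/absence (1/0) coding of one pathway per genome. It uses the standard Fitch algorithm for the step count and the usual definition RI = (G - S) / (G - M), where S is the observed number of steps, M the minimum possible and G the maximum possible. The tree, genome names and pathway patterns are invented for illustration, and this is not claimed to be the authors' pipeline.

```python
def fitch_steps(tree, states):
    """Minimum number of state changes for one character on a binary tree."""
    def visit(node):
        if isinstance(node, str):                 # leaf: its observed state
            return {states[node]}, 0
        left_set, left_steps = visit(node[0])
        right_set, right_steps = visit(node[1])
        shared = left_set & right_set
        if shared:                                # children agree: no extra step
            return shared, left_steps + right_steps
        return left_set | right_set, left_steps + right_steps + 1
    return visit(tree)[1]


def leaves(tree):
    """List the leaf names of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def retention_index(tree, states):
    """Retention index for a binary (0/1) character; None if undefined."""
    values = [states[leaf] for leaf in leaves(tree)]
    present = sum(values)
    absent = len(values) - present
    min_steps = 1 if present and absent else 0    # M: one change if both states occur
    max_steps = min(present, absent)              # G: steps on an unresolved (star) tree
    if max_steps == min_steps:
        return None                               # character is uninformative
    observed = fitch_steps(tree, states)          # S
    return (max_steps - observed) / (max_steps - min_steps)


# Hypothetical four-genome tree and two hypothetical pathway distributions.
tree = (("g1", "g2"), ("g3", "g4"))
clade_specific = {"g1": 1, "g2": 1, "g3": 0, "g4": 0}
scattered = {"g1": 1, "g2": 0, "g3": 1, "g4": 0}
print(fitch_steps(tree, clade_specific), retention_index(tree, clade_specific))
print(fitch_steps(tree, scattered), retention_index(tree, scattered))
```

In this toy case the clade-specific pattern needs a single change and the scattered pattern needs two, which mirrors the abstract's contrast between low-score clade-specific functions and high-score dispersed ones.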
Genome-wide identification, phylogenetic analysis, and expression profiles of ATP-binding cassette transporter genes in the oriental fruit fly, Bactrocera dorsalis (Hendel) (Diptera: Tephritidae).

PubMed

Xiao, Lin-Fan; Zhang, Wei; Jing, Tian-Xing; Zhang, Meng-Yi; Miao, Ze-Qing; Wei, Dan-Dan; Yuan, Guo-Rui; Wang, Jin-Jun

2018-03-01

The ATP-binding cassette (ABC) is the largest transporter gene family and the genes play key roles in xenobiotic resistance, metabolism, and development of all phyla. However, the specific functions of ABC gene families in insects are unclear. We report a genome-wide identification, phylogenetic, and transcriptional analysis of the ABC genes in the oriental fruit fly, Bactrocera dorsalis (Hendel). We identified a total of 47 ABC genes (BdABCs) from the transcriptomic and genomic databases of B. dorsalis and classified these genes into eight subfamilies (A-H), including 7 ABCAs, 7 ABCBs, 9 ABCCs, 2 ABCDs, 1 ABCE, 3 ABCFs, 15 ABCGs, and 3 ABCHs. Comparative phylogenetic analysis of the ABCs suggests an orthologous relationship between B. dorsalis and other insect species in which these genes have been related to pesticide resistance and essential biological processes. Comparison of transcriptome and relative expression patterns of BdABCs indicated diverse multifunctions within different B. dorsalis tissues. The expression of 4, 10, and 14 BdABCs from 18 BdABCs was significantly upregulated after exposure to LD50s of malathion, avermectin, and beta-cypermethrin, respectively. The maximum expression level of most BdABCs (including BdABCFs, BdABCGs, and BdABCHs) occurred at 48 h post exposure, whereas BdABCEs peaked at 24 h after treatment. Furthermore, RNA interference-mediated suppression of BdABCB7 resulted in increased toxicity of malathion against B. dorsalis. These data suggest that ABC transporter genes might play key roles in xenobiotic metabolism and biosynthesis in B. dorsalis. Copyright © 2017 Elsevier Inc. All rights reserved.
Molecular Phylogenetic Analysis of Archaeal Intron-Containing Genes Coding for rRNA Obtained from a Deep-Subsurface Geothermal Water Pool

PubMed Central

Takai, Ken; Horikoshi, Koki

1999-01-01

Molecular phylogenetic analysis of a naturally occurring microbial community in a deep-subsurface geothermal environment indicated that the phylogenetic diversity of the microbial population in the environment was extremely limited and that only hyperthermophilic archaeal members closely related to Pyrobaculum were present. All archaeal ribosomal DNA sequences contained intron-like sequences, some of which had open reading frames with repeated homing-endonuclease motifs. The sequence similarity analysis and the phylogenetic analysis of these homing endonucleases suggested the possible phylogenetic relationship among archaeal rRNA-encoded homing endonucleases. PMID:10584021
ATGC database and ATGC-COGs: an updated resource for micro- and macro-evolutionary studies of prokaryotic genomes and protein family annotation

PubMed Central

Kristensen, David M.; Wolf, Yuri I.; Koonin, Eugene V.

2017-01-01

The Alignable Tight Genomic Clusters (ATGCs) database is a collection of closely related bacterial and archaeal genomes that provides several tools to aid research into evolutionary processes in the microbial world. Each ATGC is a taxonomy-independent cluster of 2 or more completely sequenced genomes that meet the objective criteria of a high degree of local gene order (synteny) and a small number of synonymous substitutions in the protein-coding genes. As such, each ATGC is suited for analysis of microevolutionary variations within a cohesive group of organisms (e.g. species), whereas the entire collection of ATGCs is useful for macroevolutionary studies. The ATGC database includes many forms of pre-computed data, in particular ATGC-COGs (Clusters of Orthologous Genes), multiple sequence alignments, a set of ‘index’ orthologs representing the most well-conserved members of each ATGC-COG, the phylogenetic tree of the organisms within each ATGC, etc. Although the ATGC database contains several million proteins from thousands of genomes organized into hundreds of clusters (roughly a 4-fold increase since the last version of the ATGC database), it is now built with completely automated methods and will be regularly updated following new releases of the NCBI RefSeq database. The ATGC database is hosted jointly at the University of Iowa at dmk-brain.ecn.uiowa.edu/ATGC/ and the NCBI at ftp.ncbi.nlm.nih.gov/pub/kristensen/ATGC/atgc_home.html. PMID:28053163
An Ambystoma mexicanum EST sequencing project: analysis of 17,352 expressed sequence tags from embryonic and regenerating blastema cDNA libraries

PubMed Central

Habermann, Bianca; Bebin, Anne-Gaelle; Herklotz, Stephan; Volkmer, Michael; Eckelt, Kay; Pehlke, Kerstin; Epperlein, Hans Henning; Schackert, Hans Konrad; Wiebe, Glenis; Tanaka, Elly M

2004-01-01

Background: The ambystomatid salamander, Ambystoma mexicanum (axolotl), is an important model organism in evolutionary and regeneration research but relatively little sequence information has so far been available. This is a major limitation for molecular studies on caudate development, regeneration and evolution. To address this lack of sequence information we have generated an expressed sequence tag (EST) database for A. mexicanum. Results: Two cDNA libraries, one made from stage 18-22 embryos and the other from day-6 regenerating tail blastemas, generated 17,352 sequences. From the sequenced ESTs, 6,377 contigs were assembled that probably represent 25% of the expressed genes in this organism. Sequence comparison revealed significant homology to entries in the NCBI non-redundant database. Further examination of this gene set revealed the presence of genes involved in important cell and developmental processes, including cell proliferation, cell differentiation and cell-cell communication. On the basis of these data, we have performed phylogenetic analysis of key cell-cycle regulators. Interestingly, while cell-cycle proteins such as the cyclin B family display expected evolutionary relationships, the cyclin-dependent kinase inhibitor 1 gene family shows an unusual evolutionary behavior among the amphibians. Conclusions: Our analysis reveals the importance of a comprehensive sequence set from a representative of the Caudata and illustrates that the EST sequence database is a rich source of molecular, developmental and regeneration studies. To aid in data mining, the ESTs have been organized into an easily searchable database that is freely available online. PMID:15345051
Phylogenetic Diversity and Genotypical Complexity of H9N2 Influenza A Viruses Revealed by Genomic Sequence Analysis

PubMed Central

Dong, Guoying; Luo, Jing; Zhang, Hong; Wang, Chengmin; Duan, Mingxing; Deliberto, Thomas Jude; Nolte, Dale Louis; Ji, Guangju; He, Hongxuan

2011-01-01

H9N2 influenza A viruses have become established worldwide in terrestrial poultry and wild birds, and are occasionally transmitted to mammals including humans and pigs. To comprehensively elucidate the genetic and evolutionary characteristics of H9N2 influenza viruses, we performed a large-scale sequence analysis of 571 viral genomes from the NCBI Influenza Virus Resource Database, representing the spectrum of H9N2 influenza viruses isolated from 1966 to 2009. Our study provides a panoramic framework for better understanding the genesis and evolution of H9N2 influenza viruses, and for describing the history of H9N2 viruses circulating in diverse hosts. Panorama phylogenetic analysis of the eight viral gene segments revealed the complexity and diversity of H9N2 influenza viruses. The 571 H9N2 viral genomes were classified into 74 separate lineages, which had marked host and geographical differences in phylogeny. Panorama genotypical analysis also revealed that H9N2 viruses include at least 98 genotypes, which were further divided according to their HA lineages into seven series (A–G). Phylogenetic analysis of the internal genes showed that H9N2 viruses are closely related to H3, H4, H5, H7, H10, and H14 subtype influenza viruses. Our results indicate that H9N2 viruses have undergone extensive reassortments to generate multiple reassortants and genotypes, suggesting that the continued circulation of multiple genotypical H9N2 viruses throughout the world in diverse hosts has the potential to cause future influenza outbreaks in poultry and epidemics in humans. We propose a nomenclature system for identifying and unifying all lineages and genotypes of H9N2 influenza viruses in order to facilitate international communication on the evolution, ecology and epidemiology of H9N2 influenza viruses. PMID:21386964
Metagenomics and the protein universe

PubMed Central

Godzik, Adam

2011-01-01

Metagenomics sequencing projects have dramatically increased our knowledge of the protein universe and provided over one-half of currently known protein sequences; they have also introduced a much broader phylogenetic diversity into the protein databases. The full analysis of metagenomic datasets is only beginning, but it has already led to the discovery of thousands of new protein families, likely representing novel functions specific to given environments. At the same time, a deeper analysis of such novel families, including experimental structure determination of some representatives, suggests that most of them represent distant homologs of already characterized protein families, and thus most of the protein diversity present in the new environments is due to functional divergence of the known protein families rather than the emergence of new ones. PMID:21497084
Use of a Drosophila Genome-Wide Conserved Sequence Database to Identify Functionally Related cis-Regulatory Enhancers

PubMed Central

Brody, Thomas; Yavatkar, Amarendra S; Kuzin, Alexander; Kundu, Mukta; Tyson, Leonard J; Ross, Jermaine; Lin, Tzu-Yang; Lee, Chi-Hon; Awasaki, Takeshi; Lee, Tzumin; Odenwald, Ward F

2012-01-01

Background: Phylogenetic footprinting has revealed that cis-regulatory enhancers consist of conserved DNA sequence clusters (CSCs). Currently, there is no systematic approach for enhancer discovery and analysis that takes full advantage of the sequence information within enhancer CSCs. Results: We have generated a Drosophila genome-wide database of conserved DNA consisting of >100,000 CSCs derived from EvoPrints spanning over 90% of the genome. cis-Decoder database search and alignment algorithms enable the discovery of functionally related enhancers. The program first identifies conserved repeat elements within an input enhancer and then searches the database for CSCs that score highly against the input CSC. Scoring is based on shared repeats as well as uniquely shared matches, and includes measures of the balance of shared elements, a diagnostic that has proven to be useful in predicting cis-regulatory function. To demonstrate the utility of these tools, a temporally restricted CNS neuroblast enhancer was used to identify other functionally related enhancers and analyze their structural organization. Conclusions: cis-Decoder reveals that co-regulating enhancers consist of combinations of overlapping shared sequence elements, providing insights into the mode of integration of multiple regulating transcription factors. The database and accompanying algorithms should prove useful in the discovery and analysis of enhancers involved in any developmental process. Developmental Dynamics 241:169–189, 2012. © 2011 Wiley Periodicals, Inc. Key findings: A genome-wide catalog of Drosophila conserved DNA sequence clusters. cis-Decoder discovers functionally related enhancers. Functionally related enhancers share balanced sequence element copy numbers. Many enhancers function during multiple phases of development. PMID:22174086
Open Reading Frame Phylogenetic Analysis on the Cloud

PubMed Central

2013-01-01

Phylogenetic analysis has become essential in researching the evolutionary relationships between viruses. These relationships are depicted on phylogenetic trees, in which viruses are grouped based on sequence similarity. Viral evolutionary relationships are identified from open reading frames rather than from complete sequences. Recently, cloud computing has become popular for developing internet-based bioinformatics tools. Biocloud is an efficient, scalable, and robust bioinformatics computing service. In this paper, we propose a cloud-based open reading frame phylogenetic analysis service. The proposed service integrates the Hadoop framework, virtualization technology, and phylogenetic analysis methods to provide a high-availability, large-scale bioservice. In a case study, we analyze the phylogenetic relationships among Norovirus. Evolutionary relationships are elucidated by aligning different open reading frame sequences. The proposed platform correctly identifies the evolutionary relationships between members of Norovirus. PMID:23671843
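As an illustration of the open-reading-frame step mentioned in the record above, here is a minimal sketch of a forward-strand ORF scan. It is not the service's implementation; the function name find_orfs, the min_codons parameter, and the choice of ATG as the only start codon are assumptions made for this example.

```python
# Hypothetical helper, not taken from the cited service: scans the three
# forward reading frames for ATG ... stop-codon stretches.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end, frame) tuples for forward-strand ORFs.

    start/end are 0-based, end-exclusive, and include the stop codon.
    min_codons counts the start and stop codons as well.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and keep scanning
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGGG"))  # [(2, 11, 2)]
```

The extracted ORF sequences, rather than whole genomes, would then be aligned and passed to a tree-building method. A production tool would also scan the reverse strand and handle alternative start codons.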
Acquisition of HIV by African-Born Residents of Victoria, Australia: Insights from Molecular Epidemiology

PubMed Central

Lemoh, Chris; Ryan, Claire E.; Sekawi, Zamberi; Hearps, Anna C.; Aleksic, Eman; Chibo, Doris; Grierson, Jeffrey; Baho, Samia; Street, Alan; Hellard, Margaret; Biggs, Beverley-Ann; Crowe, Suzanne M.

2013-01-01

African-born Australians are a recognised “priority population” in Australia's Sixth National HIV/AIDS Strategy. We compared exposure location and route for African-born people living with HIV (PLHIV) in Victoria, Australia, with HIV-1 pol subtype from drug resistance assays and geographical origin suggested by phylogenetic analysis of the env gene. Twenty adult HIV-positive African-born Victorian residents were recruited via treating doctors. HIV exposure details were obtained from interviews and case notes. Viral RNA was extracted from participants' stored plasma or whole blood. The env V3 region was sequenced and compared to globally representative reference HIV-1 sequences in the Los Alamos National Laboratory HIV Database. Twelve participants reported exposure via heterosexual sex and two via iatrogenic blood exposures; four were men having sex with men (MSM); two were exposed via unknown routes. Eight participants reported exposure in their countries of birth, seven in Australia, three in other countries and two in unknown locations. Genotype results (pol) were available for ten participants. HIV env amplification was successful in eighteen cases. HIV-1 subtype was identified in all participants: eight both pol and env; ten env alone and two pol alone. Twelve were subtype C, four subtype B, three subtype A and one subtype CRF02_AG. Reported exposure location was consistent with the phylogenetic clustering of env sequences. African Australians are members of multiple transnational social and sexual networks influencing their exposure to HIV. Phylogenetic analysis may complement traditional surveillance to discern patterns of HIV exposure, providing focus for HIV prevention programs in mobile populations. PMID:24391866
A Bacillus anthracis Genome Sequence from the Sverdlovsk 1979 Autopsy Specimens

PubMed Central

Sahl, Jason W.; Pearson, Talima; Okinaka, Richard; Schupp, James M.; Gillece, John D.; Heaton, Hannah; Birdsell, Dawn; Hepp, Crystal; Fofanov, Viacheslav; Noseda, Ramón; Fasanella, Antonio; Hoffmaster, Alex; Wagner, David M.

2016-01-01

ABSTRACT Anthrax is a zoonotic disease that occurs naturally in wild and domestic animals but has been used by both state-sponsored programs and terrorists as a biological weapon. A Soviet industrial production facility in Sverdlovsk, USSR, proved deficient in 1979 when a plume of spores was accidentally released and resulted in one of the largest known human anthrax outbreaks. In order to understand this outbreak and others, we generated a Bacillus anthracis population genetic database based upon whole-genome analysis to identify all single-nucleotide polymorphisms (SNPs) across a reference genome. Phylogenetic analysis has defined three major clades (A, B, and C), B and C being relatively rare compared to A. The A clade has numerous subclades, including a major polytomy named the trans-Eurasian (TEA) group. The TEA radiation is a dominant evolutionary feature of B. anthracis, with many contemporary populations having resulted from a large spatial dispersal of spores from a single source. Two autopsy specimens from the Sverdlovsk outbreak were deep sequenced to produce draft B. anthracis genomes. This allowed the phylogenetic placement of the Sverdlovsk strain into a clade with two Asian live vaccine strains, including the Russian Tsiankovskii strain. The genome was examined for evidence of drug resistance manipulation or other genetic engineering, but none was found. The Soviet Sverdlovsk strain genome is consistent with a wild-type strain from Russia that had no evidence of genetic manipulation during its industrial production. This work provides insights into the world’s largest biological weapons program and provides an extensive B. anthracis phylogenetic reference. PMID:27677796
Dasytricha dominance in Surti buffalo rumen revealed by 18S rRNA sequences and real-time PCR assay.

PubMed

Singh, K M; Tripathi, A K; Pandya, P R; Rank, D N; Kothari, R K; Joshi, C G

2011-09-01

The genetic diversity of protozoa in Surti buffalo rumen was studied by amplified ribosomal DNA restriction analysis, 18S rDNA sequence homology, phylogenetic analysis and real-time PCR. Three animals were fed a diet comprising green fodder Napier bajra 21 (Pennisetum purpureum), mature pasture grass (Dicanthium annulatum) and concentrate mixture (20% crude protein, 65% total digestible nutrients). A protozoa-specific primer (P-SSU-342f) and a eukarya-specific primer (Medlin B) were used to amplify a 1,360 bp fragment of DNA encoding protozoal small subunit (SSU) ribosomal RNA from rumen fluid. A total of 91 clones were examined, and 14 different 18S rRNA sequences were identified based on PCR-RFLP patterns. These 14 phylotypes were assigned to four genera based on 18S rDNA database sequences and identified as Dasytricha (57 clones), Isotricha (14 clones), Ostracodinium (11 clones) and Polyplastron (9 clones). Phylogenetic analyses were also used to infer the makeup of protozoa communities in the rumen of Surti buffalo. Out of 14 sequences, 8 sequences (69 clones) clustered with the Dasytricha ruminantium-like clone and 4 sequences (13 clones) were also phylogenetically placed with the Isotricha prostoma-like clone. Moreover, 2 phylotypes (9 clones) were related to the Polyplastron multivesiculatum-like clone. In addition, the number of 18S rDNA gene copies of Dasytricha ruminantium (0.05% to ciliate protozoa) was higher than that of Entodinium sp. (2.0 × 10^5 vs. 1.3 × 10^4 per ml of ruminal fluid).
Genome Sequences and Phylogenetic Analysis of K88- and F18-Positive Porcine Enterotoxigenic Escherichia coli

PubMed Central

Shepard, Sara M.; Danzeisen, Jessica L.; Isaacson, Richard E.; Seemann, Torsten; Achtman, Mark

2012-01-01

Porcine enterotoxigenic Escherichia coli (ETEC) continues to result in major morbidity and mortality in the swine industry via postweaning diarrhea. The key virulence factors of ETEC strains, their serotypes, and their fimbrial components have been well studied. However, most studies to date have focused on plasmid-encoded traits related to colonization and toxin production, and the chromosomal backgrounds of these strains have been largely understudied. Here, we generated the genomic sequences of K88-positive and F18-positive porcine ETEC strains and examined the phylogenetic distribution of clinical porcine ETEC strains and their plasmid-associated genetic content. The genomes of porcine ETEC strains UMNK88 and UMNF18 were both found to contain remarkable plasmid complements containing known virulence factors, potential novel virulence factors, and antimicrobial resistance-associated elements. The chromosomes of these strains also possessed several unique genomic islands containing hypothetical genes with similarity to classical virulence factors, although phage-associated genomic islands dominated the accessory genomes of these strains. Phylogenetic analysis of 78 clinical isolates associated with neonatal and porcine diarrhea revealed that a limited subset of porcine ETEC lineages exist that generally contain common toxin and fimbrial profiles, with many of the isolates belonging to the ST10, ST23, and ST169 multilocus sequencing types. These lineages were generally distinct from existing human ETEC database isolates. Overall, most porcine ETEC strains appear to have emerged from a limited subset of E. coli lineages that either have an increased propensity to carry plasmid-encoded virulence factors or have the appropriate ETEC core genome required for virulence. PMID:22081385

Phylogenetic radiation of the greenbottle flies (Diptera, Calliphoridae, Luciliinae)

PubMed Central

Williams, Kirstin A.; Lamb, Jennifer; Villet, Martin H.

2016-01-01

Abstract The subfamily Luciliinae is diverse and geographically widespread. Its four currently recognised genera (Dyscritomyia Grimshaw, 1901, Hemipyrellia Townsend, 1918, Hypopygiopsis Townsend 1916 and Lucilia Robineau-Desvoidy, 1830) contain species that range from saprophages to obligate parasites, but their pattern of phylogenetic diversification is unclear. The 28S rRNA, COI and Period genes of 14 species of Lucilia and Hemipyrellia were partially sequenced and analysed together with sequences of 11 further species from public databases. The molecular data confirmed molecular paraphyly in three species-pairs in Lucilia that hamper barcode identifications of those six species. Lucilia sericata and Lucilia cuprina were confirmed as mutual sister species. The placements of Dyscritomyia and Hypopygiopsis were ambiguous, since both made Lucilia paraphyletic in some analyses. Recognising Hemipyrellia as a genus consistently left Lucilia s.l. paraphyletic, and the occasionally-recognised (sub)genus Phaenicia was consistently paraphyletic, so these taxa should be synonymised with Lucilia to maintain monophyly. Analysis of a matrix of 14 morphological characters scored for adults of all genera and for most of the species included in the molecular analysis confirmed several of these findings. The different degrees of parasitism were phylogenetically clustered within this genus but did not form a graded series of evolutionary stages, and there was no particular relationship between feeding habits and biogeography. Because of the ubiquity of hybridization, introgression and incomplete lineage sorting in blow flies, we recommend that using a combination of mitochondrial and nuclear markers should be a procedural standard for medico-criminal forensic identifications of insects. PMID:27103874
Saltatory Evolution of the Ectodermal Neural Cortex Gene Family at the Vertebrate Origin

PubMed Central

Feiner, Nathalie; Murakami, Yasunori; Breithut, Lisa; Mazan, Sylvie; Meyer, Axel; Kuraku, Shigehiro

2013-01-01

The ectodermal neural cortex (ENC) gene family, whose members are implicated in neurogenesis, is part of the kelch repeat superfamily. To date, ENC genes have been identified only in osteichthyans, although other kelch repeat-containing genes are prevalent throughout bilaterians. The lack of elaborate molecular phylogenetic analysis with exhaustive taxon sampling has obscured the possible link of the establishment of this gene family with vertebrate novelties. In this study, we identified ENC homologs in diverse vertebrates by means of database mining and polymerase chain reaction screens. Our analysis revealed that the ENC3 ortholog was lost in the basal eutherian lineage through single-gene deletion and that the triplication between ENC1, -2, and -3 occurred early in vertebrate evolution. Including our original data on the catshark and the zebrafish, our comparison revealed high conservation of the pleiotropic expression pattern of ENC1 and shuffling of expression domains between ENC1, -2, and -3. Compared with many other gene families including developmental key regulators, the ENC gene family is unique in that conventional molecular phylogenetic inference could identify no obvious invertebrate ortholog. This suggests a composite nature of the vertebrate-specific gene repertoire, consisting not only of de novo genes introduced at the vertebrate origin but also of long-standing genes with no apparent invertebrate orthologs. Some of the latter, including the ENC gene family, may be too rapidly evolving to provide sufficient phylogenetic signals marking orthology to their invertebrate counterparts. Such gene families that experienced saltatory evolution likely remain to be explored and might also have contributed to phenotypic evolution of vertebrates. PMID:23843192
Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs).

PubMed

Natale, D A; Shankavaram, U T; Galperin, M Y; Wolf, Y I; Aravind, L; Koonin, E V

2000-01-01

Standard archival sequence databases have not been designed as tools for genome annotation and are far from being optimal for this purpose. We used the database of Clusters of Orthologous Groups of proteins (COGs) to reannotate the genomes of two archaea, Aeropyrum pernix, the first member of the Crenarchaea to be sequenced, and Pyrococcus abyssi. A. pernix and P. abyssi proteins were assigned to COGs using the COGNITOR program; the results were verified on a case-by-case basis and augmented by additional database searches using the PSI-BLAST and TBLASTN programs. Functions were predicted for over 300 proteins from A. pernix, which could not be assigned a function using conventional methods with a conservative sequence similarity threshold, an approximately 50% increase compared to the original annotation. A. pernix shares most of the conserved core of proteins that were previously identified in the Euryarchaeota. Cluster analysis or distance matrix tree construction based on the co-occurrence of genomes in COGs showed that A. pernix forms a distinct group within the archaea, although grouping with the two species of Pyrococci, indicative of similar repertoires of conserved genes, was observed. No indication of a specific relationship between Crenarchaeota and eukaryotes was obtained in these analyses. Several proteins that are conserved in Euryarchaeota and most bacteria are unexpectedly missing in A. pernix, including the entire set of de novo purine biosynthesis enzymes, the GTPase FtsZ (a key component of the bacterial and euryarchaeal cell-division machinery), and the tRNA-specific pseudouridine synthase, previously considered universal. A. pernix is represented in 48 COGs that do not contain any euryarchaeal members. Many of these proteins are TCA cycle and electron transport chain enzymes, reflecting the aerobic lifestyle of A. pernix. 
Special-purpose databases organized on the basis of phylogenetic analysis and carefully curated with respect to known and predicted protein functions provide for a significant improvement in genome annotation. A differential genome display approach helps in a systematic investigation of common and distinct features of gene repertoires and in some cases reveals unexpected connections that may be indicative of functional similarities between phylogenetically distant organisms and of lateral gene exchange.
Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs)

PubMed Central

Natale, Darren A; Shankavaram, Uma T; Galperin, Michael Y; Wolf, Yuri I; Aravind, L; Koonin, Eugene V

2000-01-01

Background: Standard archival sequence databases have not been designed as tools for genome annotation and are far from being optimal for this purpose. We used the database of Clusters of Orthologous Groups of proteins (COGs) to reannotate the genomes of two archaea, Aeropyrum pernix, the first member of the Crenarchaea to be sequenced, and Pyrococcus abyssi. Results: A. pernix and P. abyssi proteins were assigned to COGs using the COGNITOR program; the results were verified on a case-by-case basis and augmented by additional database searches using the PSI-BLAST and TBLASTN programs. Functions were predicted for over 300 proteins from A. pernix, which could not be assigned a function using conventional methods with a conservative sequence similarity threshold, an approximately 50% increase compared to the original annotation. A. pernix shares most of the conserved core of proteins that were previously identified in the Euryarchaeota. Cluster analysis or distance matrix tree construction based on the co-occurrence of genomes in COGs showed that A. pernix forms a distinct group within the archaea, although grouping with the two species of Pyrococci, indicative of similar repertoires of conserved genes, was observed. No indication of a specific relationship between Crenarchaeota and eukaryotes was obtained in these analyses. Several proteins that are conserved in Euryarchaeota and most bacteria are unexpectedly missing in A. pernix, including the entire set of de novo purine biosynthesis enzymes, the GTPase FtsZ (a key component of the bacterial and euryarchaeal cell-division machinery), and the tRNA-specific pseudouridine synthase, previously considered universal. A. pernix is represented in 48 COGs that do not contain any euryarchaeal members. Many of these proteins are TCA cycle and electron transport chain enzymes, reflecting the aerobic lifestyle of A. pernix. 
Conclusions: Special-purpose databases organized on the basis of phylogenetic analysis and carefully curated with respect to known and predicted protein functions provide for a significant improvement in genome annotation. A differential genome display approach helps in a systematic investigation of common and distinct features of gene repertoires and in some cases reveals unexpected connections that may be indicative of functional similarities between phylogenetically distant organisms and of lateral gene exchange. PMID:11178258
Host range, host ecology, and distribution of more than 11800 fish parasite species

USGS Publications Warehouse

Strona, Giovanni; Palomares, Maria Lourdes D.; Bailly, Nicholas; Galli, Paolo; Lafferty, Kevin D.

2013-01-01

Our data set includes 38 008 fish parasite records (for Acanthocephala, Cestoda, Monogenea, Nematoda, Trematoda) compiled from the scientific literature, Internet databases, and museum collections paired to the corresponding host ecological, biogeographical, and phylogenetic traits (maximum length, growth rate, life span, age at maturity, trophic level, habitat preference, geographical range size, taxonomy). The data focus on host features, because specific parasite traits are not consistently available across records. For this reason, the data set is intended as a flexible resource able to extend the principles of ecological niche modeling to the host–parasite system, providing researchers with the data to model parasite niches based on their distribution in host species and the associated host features. In this sense, the database offers a framework for testing general ecological, biogeographical, and phylogenetic hypotheses based on the identification of hosts as parasite habitat. Potential applications of the data set are, for example, the investigation of species–area relationships or the taxonomic distribution of host-specificity. The provided host–parasite list is that currently used by Fish Parasite Ecology Software Tool (FishPEST, http://purl.oclc.org/fishpest), which is a website that allows researchers to model several aspects of the relationships between fish parasites and their hosts. The database is intended for researchers who wish to have more freedom to analyze the database than currently possible with FishPEST. However, for readers who have not seen FishPEST, we recommend using this as a starting point for interacting with the database.
Sex Determination, Sex Chromosomes, and Karyotype Evolution in Insects.

PubMed

Blackmon, Heath; Ross, Laura; Bachtrog, Doris

2017-01-01

Insects harbor a tremendous diversity of sex determining mechanisms both within and between groups. For example, in some orders such as Hymenoptera, all members are haplodiploid, whereas Diptera contain species with homomorphic as well as male and female heterogametic sex chromosome systems or paternal genome elimination. We have established a large database on karyotypes and sex chromosomes in insects, containing information on over 13000 species covering 29 orders of insects. This database constitutes a unique starting point to report phylogenetic patterns on the distribution of sex determination mechanisms, sex chromosomes, and karyotypes among insects and allows us to test general theories on the evolutionary dynamics of karyotypes, sex chromosomes, and sex determination systems in a comparative framework. Phylogenetic analysis reveals that male heterogamety is the ancestral mode of sex determination in insects, and transitions to female heterogamety are extremely rare. Many insect orders harbor species with complex sex chromosomes, and gains and losses of the sex-limited chromosome are frequent in some groups. Haplodiploidy originated several times within insects, and parthenogenesis is rare but evolves frequently. Providing a single source to electronically access data previously distributed among more than 500 articles and books will not only accelerate analyses of the assembled data, but also provide a unique resource to guide research on which taxa are likely to be informative to address specific questions, for example, for genome sequencing projects or large-scale comparative studies. © The American Genetic Association 2016.
Molecular Phylogenetics: Concepts for a Newcomer.

PubMed

Ajawatanawong, Pravech

Molecular phylogenetics is the study of evolutionary relationships among organisms using molecular sequence data. The aim of this review is to introduce the important terminology and general concepts of tree reconstruction to biologists who lack a strong background in the field of molecular evolution. Some modern phylogenetic programs are easy to use because of their user-friendly interfaces, but understanding the phylogenetic algorithms and substitution models, which are based on advanced statistics, is still important for the analysis and interpretation without a guide. Briefly, there are five general steps in carrying out a phylogenetic analysis: (1) sequence data preparation, (2) sequence alignment, (3) choosing a phylogenetic reconstruction method, (4) identification of the best tree, and (5) evaluating the tree. Concepts in this review enable biologists to grasp the basic ideas behind phylogenetic analysis and also help provide a sound basis for discussions with expert phylogeneticists.
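To make steps (2) and (3) of the review above more concrete, the following sketch computes pairwise p-distances (the proportion of differing sites) from an already aligned set of sequences, which is the kind of input a distance-based tree method consumes. The review does not prescribe this measure; the function names p_distance and distance_matrix and the toy alignment are assumptions for illustration only.

```python
# Illustrative only: p-distance is one simple choice among many
# substitution models; the names below are hypothetical.
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Map each (name, name) pair to its p-distance."""
    names = sorted(alignment)
    return {(m, n): p_distance(alignment[m], alignment[n])
            for m in names for n in names}


if __name__ == "__main__":
    toy = {"taxonA": "ACGTAC", "taxonB": "ACGTTC", "taxonC": "AC-TTG"}
    for pair, d in sorted(distance_matrix(toy).items()):
        print(pair, round(d, 3))
```

A matrix like this could feed a neighbor-joining or UPGMA implementation. Likelihood-based methods instead work on the alignment directly under an explicit substitution model.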
Identification of a novel astrovirus in domestic sheep in Hungary.

PubMed

Reuter, Gábor; Pankovics, Péter; Delwart, Eric; Boros, Ákos

2012-02-01

The family Astroviridae consists of two genera, Avastrovirus and Mamastrovirus, whose members are associated with gastroenteritis in avian and mammalian hosts, respectively. We serendipitously identified a novel ovine astrovirus in a fecal specimen from a domestic sheep (Ovis aries) in Hungary by viral metagenomic analysis. Sequencing of the fragment indicated that it was an ORF1b/ORF2/3'UTR sequence, and it has been submitted to the GenBank database as ovine astrovirus type 2 (OAstV-2/Hungary/2009) with accession number JN592482. The unique sequence characteristics and the phylogenetic position of OAstV-2 suggest that genetically divergent lineages of astroviruses exist in sheep.
JCoDA: a tool for detecting evolutionary selection.

PubMed

Steinway, Steven N; Dannenfelser, Ruth; Laucius, Christopher D; Hayes, James E; Nayak, Sudhir

2010-05-27

The incorporation of annotated sequence information from multiple related species in commonly used databases (Ensembl, Flybase, Saccharomyces Genome Database, Wormbase, etc.) has increased dramatically over the last few years. This influx of information has provided a considerable amount of raw material for evaluation of evolutionary relationships. To aid in the process, we have developed JCoDA (Java Codon Delimited Alignment) as a simple-to-use visualization tool for the detection of site specific and regional positive/negative evolutionary selection amongst homologous coding sequences. JCoDA accepts user-inputted unaligned or pre-aligned coding sequences, performs a codon-delimited alignment using ClustalW, and determines the dN/dS calculations using PAML (Phylogenetic Analysis Using Maximum Likelihood, yn00 and codeml) in order to identify regions and sites under evolutionary selection. The JCoDA package includes a graphical interface for Phylip (Phylogeny Inference Package) to generate phylogenetic trees, manages formatting of all required file types, and streamlines passage of information between underlying programs. The raw data are output to user configurable graphs with sliding window options for straightforward visualization of pairwise or gene family comparisons. Additionally, codon-delimited alignments are output in a variety of common formats and all dN/dS calculations can be output in comma-separated value (CSV) format for downstream analysis. To illustrate the types of analyses that are facilitated by JCoDA, we have taken advantage of the well studied sex determination pathway in nematodes as well as the extensive sequence information available to identify genes under positive selection, examples of regional positive selection, and differences in selection based on the role of genes in the sex determination pathway. 
JCoDA is a configurable, open source, user-friendly visualization tool for performing evolutionary analysis on homologous coding sequences. JCoDA can be used to rapidly screen for genes and regions of genes under selection using PAML. It can be freely downloaded at http://www.tcnj.edu/~nayaklab/jcoda.
JCoDA: a tool for detecting evolutionary selection

PubMed Central

2010-01-01

Background The incorporation of annotated sequence information from multiple related species in commonly used databases (Ensembl, Flybase, Saccharomyces Genome Database, Wormbase, etc.) has increased dramatically over the last few years. This influx of information has provided a considerable amount of raw material for evaluation of evolutionary relationships. To aid in the process, we have developed JCoDA (Java Codon Delimited Alignment) as a simple-to-use visualization tool for the detection of site specific and regional positive/negative evolutionary selection amongst homologous coding sequences. Results JCoDA accepts user-inputted unaligned or pre-aligned coding sequences, performs a codon-delimited alignment using ClustalW, and determines the dN/dS calculations using PAML (Phylogenetic Analysis Using Maximum Likelihood, yn00 and codeml) in order to identify regions and sites under evolutionary selection. The JCoDA package includes a graphical interface for Phylip (Phylogeny Inference Package) to generate phylogenetic trees, manages formatting of all required file types, and streamlines passage of information between underlying programs. The raw data are output to user configurable graphs with sliding window options for straightforward visualization of pairwise or gene family comparisons. Additionally, codon-delimited alignments are output in a variety of common formats and all dN/dS calculations can be output in comma-separated value (CSV) format for downstream analysis. To illustrate the types of analyses that are facilitated by JCoDA, we have taken advantage of the well studied sex determination pathway in nematodes as well as the extensive sequence information available to identify genes under positive selection, examples of regional positive selection, and differences in selection based on the role of genes in the sex determination pathway. 
Conclusions JCoDA is a configurable, open source, user-friendly visualization tool for performing evolutionary analysis on homologous coding sequences. JCoDA can be used to rapidly screen for genes and regions of genes under selection using PAML. It can be freely downloaded at http://www.tcnj.edu/~nayaklab/jcoda. PMID:20507581
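The sliding-window graphs mentioned in the abstract can be illustrated with a minimal sketch. It assumes per-codon dN/dS estimates are already available as a list of numbers; the function name, the window size, and the values below are hypothetical and are not taken from JCoDA or PAML.

```python
from typing import List


def sliding_window_mean(values: List[float], window: int, step: int = 1) -> List[float]:
    """Return the mean of every full window of `window` consecutive values."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    means = []
    # Only full windows are reported; a trailing partial window is dropped.
    for start in range(0, len(values) - window + 1, step):
        chunk = values[start:start + window]
        means.append(sum(chunk) / window)
    return means


# Hypothetical per-codon dN/dS estimates (illustrative numbers only).
omega = [0.2, 0.4, 1.8, 2.2, 0.3, 0.1]
smoothed = sliding_window_mean(omega, window=3)
```

Windows whose mean exceeds 1 would be candidates for regional positive selection, although any real analysis would also need the statistical tests that PAML provides.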
The complete mitochondrial genome of Sika deer Cervus nippon hortulorum (Artiodactyla: Cervidae) and phylogenetic studies.

PubMed

Liu, Yan-Hua; Liu, Xin-Xin; Zhang, Ming-Hai

2016-07-01

Sika deer (Cervus nippon Temminck 1836) are classified in the order Artiodactyla, family Cervidae, subfamily Cervinae. At present, the phylogenetic studies of C. nippon are problematic. In this study, we first determined and described the complete mitochondrial sequence of the wild C. nippon hortulorum. The complete mitogenome sequence is 16 566 bp in length, including 13 protein-coding genes, two rRNA genes, 22 tRNA genes, a putative control region (CR) and a light-strand replication origin (OL). The overall base composition was 33.4% A, 28.6% T, 24.5% C, 13.5% G, with a 62.0% AT bias. The 13 protein-coding genes encode 3782 amino acids in total. To further validate the newly determined sequences and the phylogeny of Sika deer, phylogenetic trees involving the 15 most closely related species available in the GenBank database were constructed. These results are expected to provide useful molecular data for deer species identification and further phylogenetic studies of Artiodactyla.
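The reported base composition is internally consistent: 33.4% A plus 28.6% T gives the stated 62.0% AT content, and the four percentages sum to 100.0%. The sketch below shows how such figures are typically computed from a sequence. The function name and the ten-base toy sequence are assumptions for illustration and are not the Sika deer mitogenome.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Percentage of A, C, G and T among the unambiguous bases of `seq`."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


# Toy sequence for illustration only (4 A, 3 T, 1 G, 2 C).
toy = "ATGCATTAAC"
comp = base_composition(toy)
at_content = comp["A"] + comp["T"]
```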
Metagenomics of prebiotic and probiotic supplemented broilers gastrointestinal tract microbiome

USDA-ARS's Scientific Manuscript database

Phylogenetic investigation of communities by reconstruction of unobserved states (PICRUSt) is a recently developed computational approach for prediction of functional composition of a microbiome comparing marker gene data with a reference genome database. The procedure established significant link ...
Short Communication Phylogenetic Characterization of HIV Type 1 CRF01_AE V3 Envelope Sequences in Pregnant Women in Northern Vietnam

PubMed Central

Caridha, Rozina; Ha, Tran Thi Thanh; Gaseitsiwe, Simani; Hung, Pham Viet; Anh, Nguyen Mai; Bao, Nguyen Huy; Khang, Dinh Duy; Hien, Nguyen Tran; Cam, Phung Dac; Chiodi, Francesca

2012-01-01

Abstract Characterization of HIV-1 strains is important for surveillance of the HIV-1 epidemic. In Vietnam HIV-1-infected pregnant women often fail to receive the care they are entitled to. Here, we analyzed phylogenetically HIV-1 env sequences from 37 HIV-1-infected pregnant women from Ha Noi (n=22) and Hai Phong (n=15), where they delivered in 2005–2007. All carried CRF01_AE in the gp120 V3 region. In 21 women CRF01_AE was also found in the reverse transcriptase gene. We compared their env gp120 V3 sequences phylogenetically in a maximum likelihood tree to those of 198 other CRF01_AE sequences in Vietnam and 229 from neighboring countries, predominantly Thailand, from the HIV-1 database. Altogether 464 sequences were analyzed. All but one of the maternal sequences colocalized with sequences from northern Vietnam. The maternal sequences had evolved the least when compared to sequences collected in Ha Noi in 2002, as shown by analysis of synonymous and nonsynonymous changes, than to other Vietnamese sequences collected earlier and/or elsewhere. Since the HIV-1 epidemic in women in Vietnam may still be underestimated, characterization of HIV-1 in pregnant women is important to observe how HIV-1 has evolved and follow its molecular epidemiology. PMID:21936713
Isolation and phylogenetic characterization of iron-sulfur-oxidizing heterotrophic bacteria indigenous to nickel laterite ores of Sulawesi, Indonesia: Implications for biohydrometallurgy

NASA Astrophysics Data System (ADS)

Chaerun, Siti Khodijah; Hung, Sutina; Mubarok, Mohammad Zaki; Sanwani, Edy

2015-09-01

The main objective of this study was to isolate and phylogenetically identify the indigenous iron-sulfur-oxidizing heterotrophic bacteria capable of bioleaching nickel from laterite mineral ores. The bacteria were isolated from a nickel laterite mine area in South Sulawesi Province, Indonesia. Seven bacterial strains were successfully isolated from laterite mineral ores (strains SKC/S-1 to SKC/S-7), and they were capable of bioleaching nickel from saprolite and limonite ores. Using the EzTaxon-e database, the 16S rRNA gene sequences of the seven bacterial strains were subjected to phylogenetic analysis, resulting in a complete hierarchical classification system, and they were identified as Pseudomonas taiwanensis BCRC 17751 (98.59% similarity), Bacillus subtilis subsp. inaquosorum BGSC 3A28 (99.14% and 99.32% similarities), Paenibacillus pasadenensis SAFN-007 (98.95% and 99.33% similarities), Bacillus methylotrophicus CBMB 205 (99.37% similarity), and Bacillus altitudinis 41KF2b (99.37% similarity). It is noteworthy that members of the phylum Firmicutes (in particular the genus Bacillus) predominated in this study, which gives them high potential as candidates for the bioleaching of nickel from laterite mineral ores. To our knowledge, this is the first report on the predominance of the phylum Firmicutes in the Sulawesi laterite mineral ores.
Phylogenetic analyses of the genus Aeromonas based on housekeeping gene sequencing and its influence on systematics.

PubMed

Navarro, Aaron; Martínez-Murcia, Antonio

2018-04-19

The phylogenies derived from housekeeping gene sequence alignments, although mere evolutionary hypotheses, have increased our knowledge about the Aeromonas genetic diversity, providing a robust species delineation framework invaluable for reliable, easy and fast species identification. Previous classifications of Aeromonas have been fully surpassed by the recently developed phylogenetic (natural) classification obtained from the analysis of so-called "molecular chronometers". Although ribosomal RNAs cannot split all known Aeromonas species, the conserved nature of 16S rRNA offers reliable alignments containing mosaics of sequence signatures which may serve as targets of genus-specific oligonucleotides for subsequent identification/detection tests in samples without culturing. In contrast, some housekeeping genes coding for proteins show a much better chronometric capacity to discriminate highly related strains. Although species and loci do not all evolve at exactly the same rate, published Aeromonas phylogenies were congruent with each other, indicating that phylogenetic markers are synchronized and that a concatenated multi-gene phylogeny may be "the mirror" of the entire genomic relationships. Thanks to MLPA approaches, the discovery of new Aeromonas species and strains of rarely isolated species is today more frequent and, consequently, these approaches should be extensively promoted for isolate screening and species identification. However, accumulated data should still be carefully catalogued so that a reliable database is inherited. This article is protected by copyright. All rights reserved.
Nucleotide Sequence Database Comparison for Routine Dermatophyte Identification by Internal Transcribed Spacer 2 Genetic Region DNA Barcoding.

PubMed

Normand, A C; Packeu, A; Cassagne, C; Hendrickx, M; Ranque, S; Piarroux, R

2018-05-01

Conventional dermatophyte identification is based on morphological features. However, recent studies have proposed to use the nucleotide sequences of the rRNA internal transcribed spacer (ITS) region as an identification barcode of all fungi, including dermatophytes. Several nucleotide databases are available to compare sequences and thus identify isolates; however, these databases often contain mislabeled sequences that impair sequence-based identification. We evaluated five of these databases on a clinical isolate panel. We selected 292 clinical dermatophyte strains that were prospectively subjected to an ITS2 nucleotide sequence analysis. Sequences were analyzed against the databases, and the results were compared to clusters obtained via DNA alignment of sequence segments. The DNA tree served as the identification standard throughout the study. According to the ITS2 sequence identification, the majority of strains (255/292) belonged to the genus Trichophyton, mainly T. rubrum complex (n = 184), T. interdigitale (n = 40), T. tonsurans (n = 26), and T. benhamiae (n = 5). Other genera included Microsporum (e.g., M. canis [n = 21], M. audouinii [n = 10], Nannizzia gypsea [n = 3], and Epidermophyton [n = 3]). Species-level identification of T. rubrum complex isolates was an issue. Overall, ITS DNA sequencing is a reliable tool to identify dermatophyte species given that a comprehensive and correctly labeled database is consulted. Since many inaccurate identification results exist in the DNA databases used for this study, reference databases must be verified frequently and amended in line with the current revisions of fungal taxonomy. Before describing a new species or adding a new DNA reference to the available databases, its position in the phylogenetic tree must be verified. Copyright © 2018 American Society for Microbiology.
Genetic polymorphisms of pharmacogenomic VIP variants in the Yi population from China.

PubMed

Yan, Mengdan; Li, Dianzhen; Zhao, Guige; Li, Jing; Niu, Fanglin; Li, Bin; Chen, Peng; Jin, Tianbo

2018-03-30

Drug response and target therapeutic dosage differ among individuals. The variability is largely genetically determined. With the development of pharmacogenetics and pharmacogenomics, widespread research has provided us with a wealth of information on drug-related genetic polymorphisms, and the very important pharmacogenetic (VIP) variants have been identified for the major populations around the world, whereas less is known regarding minorities in China, including the Yi ethnic group. Our research aims to screen the potential pharmacogenomic variants in the Yi population and provide a theoretical basis for future medication guidance. In the present study, 80 VIP variants (selected from the PharmGKB database) were genotyped in 100 unrelated and healthy Yi adults recruited for our research. Through statistical analysis, we compared the Yi with the other 11 populations listed in the HapMap database to detect significant SNPs. Two specific SNPs were subsequently included in an observation of global allele distribution, with the frequencies downloaded from the ALlele FREquency Database. Moreover, F-statistics (Fst), genetic structure and phylogenetic tree analyses were conducted to determine the genetic similarity between the 12 ethnic groups. Using χ2 tests, rs1128503 (ABCB1), rs7294 (VKORC1), rs9934438 (VKORC1), rs1540339 (VDR) and rs689466 (PTGS2) were identified as the significantly different loci for further analysis. The global allele distribution revealed that the allele "A" of rs1540339 and rs9934438 was more frequent in Yi people, which was consistent with most populations in East Asia. F-statistics (Fst), genetic structure and phylogenetic tree analyses demonstrated that the Yi and CHD shared the closest relationship in their genetic backgrounds. Additionally, the Yi were considered similar to the Han people from Shaanxi province among the domestic ethnic populations in China.
Our results demonstrated significant differences on several polymorphic SNPs and supplement the pharmacogenomic information for the Yi population, which could provide new strategies for optimizing clinical medication in accordance with the genetic determinants of drug toxicity and efficacy. Copyright © 2018 Elsevier B.V. All rights reserved.
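The abstract does not say which Fst estimator was used. As a rough illustration of what the statistic measures, the sketch below applies the textbook definition Fst = (Ht - Hs) / Ht to a single biallelic SNP in two equally weighted populations. The function names and allele frequencies are hypothetical and are not data from the study.

```python
def heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) for a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def fst_two_pops(p1: float, p2: float) -> float:
    """Fst = (Ht - Hs) / Ht for one SNP, assuming equal population weights."""
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2.0
    h_total = heterozygosity(p_bar)            # Ht: pooled heterozygosity
    if h_total == 0.0:
        return 0.0                             # locus fixed in both populations
    h_within = (heterozygosity(p1) + heterozygosity(p2)) / 2.0  # Hs
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies, for illustration only.
fst_diff = fst_two_pops(0.2, 0.8)   # strongly differentiated populations
fst_same = fst_two_pops(0.5, 0.5)   # identical frequencies
```

Values near 0 indicate similar allele frequencies, which is the sense in which two groups can be said to share a close genetic background.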
Isolation and identification of multidrug-resistant Staphylococcus haemolyticus from a laboratory-breeding mouse.

PubMed

Huang, Fengying; Meng, Qiuping; Tan, Guanghong; Huang, Yonghao; Wang, Hua; Mei, Wenli; Dai, Haofu

2011-06-01

To analyze and identify a bacterial strain isolated from a laboratory-bred mouse kept far away from a hospital. The phenotype of the isolate was investigated by conventional microbiological methods, including Gram staining, colony morphology, tests for haemolysis, catalase, and coagulase, and antimicrobial susceptibility testing. The mecA and 16S rRNA genes were amplified by the polymerase chain reaction (PCR) and sequenced. The base sequence of the PCR product was compared with known 16S rRNA gene sequences in the GenBank database by phylogenetic analysis and multiple sequence alignment. The isolate in this study was a gram-positive, coagulase-negative, and catalase-positive coccus. The isolate was resistant to oxacillin, methicillin, penicillin, ampicillin, cefazolin, ciprofloxacin, erythromycin, and others. PCR results indicated that the isolate was mecA gene positive and its 16S rRNA was 1 465 bp. Phylogenetic analysis of the resultant 16S rRNA indicated that the isolate belonged to the genus Staphylococcus, and multiple sequence alignment showed that the isolate was Staphylococcus haemolyticus, with only one base difference from the corresponding 16S rRNA deposited in GenBank. 16S rRNA gene sequencing is a suitable technique for non-specialist researchers. Laboratory animals are possible sources of lethal pathogens, and researchers must adopt protective measures when they manipulate animals. Copyright © 2011 Hainan Medical College. Published by Elsevier B.V. All rights reserved.
Uncommonly isolated clinical Pseudomonas: identification and phylogenetic assignation.

PubMed

Mulet, M; Gomila, M; Ramírez, A; Cardew, S; Moore, E R B; Lalucat, J; García-Valdés, E

2017-02-01

Fifty-two Pseudomonas strains that were difficult to identify at the species level in the phenotypic routine characterizations employed by clinical microbiology laboratories were selected for genotypic-based analysis. Species-level identifications were done initially by partial sequencing of the DNA-dependent RNA polymerase sub-unit D gene (rpoD). Two other gene sequences, for the small sub-unit ribosomal RNA (16S rRNA) and for DNA gyrase sub-unit B (gyrB), were added in a multilocus sequence analysis (MLSA) study to confirm the species identifications. These sequences were analyzed with a collection of reference sequences from the type strains of 161 Pseudomonas species within an in-house multi-locus sequence analysis database. Whole-cell matrix-assisted laser-desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) analyses of these strains complemented the DNA sequence-based phylogenetic analyses and were observed to be in accordance with the results of the sequence data. Twenty-three out of 52 strains were assigned to 12 recognized species not commonly detected in clinical specimens and 29 (56 %) were considered representatives of at least ten putative new species. Most strains were distributed within the P. fluorescens and P. aeruginosa lineages. The value of rpoD sequences in species-level identifications for Pseudomonas is emphasized. The correct species identification of clinical strains is essential for establishing the intrinsic antibiotic resistance patterns and improved treatment plans.
Phylogenetic analysis at deep timescales: unreliable gene trees, bypassed hidden support, and the coalescence/concatalescence conundrum.

PubMed

Gatesy, John; Springer, Mark S

2014-11-01

Large datasets are required to solve difficult phylogenetic problems that are deep in the Tree of Life. Currently, two divergent systematic methods are commonly applied to such datasets: the traditional supermatrix approach (= concatenation) and "shortcut" coalescence (= coalescence methods wherein gene trees and the species tree are not co-estimated). When applied to ancient clades, these contrasting frameworks often produce congruent results, but in recent phylogenetic analyses of Placentalia (placental mammals), this is not the case. A recent series of papers has alternatively disputed and defended the utility of shortcut coalescence methods at deep phylogenetic scales. Here, we examine this exchange in the context of published phylogenomic data from Mammalia; in particular we explore two critical issues - the delimitation of data partitions ("genes") in coalescence analysis and hidden support that emerges with the combination of such partitions in phylogenetic studies. Hidden support - increased support for a clade in combined analysis of all data partitions relative to the support evident in separate analyses of the various data partitions, is a hallmark of the supermatrix approach and a primary rationale for concatenating all characters into a single matrix. In the most extreme cases of hidden support, relationships that are contradicted by all gene trees are supported when all of the genes are analyzed together. A valid fear is that shortcut coalescence methods might bypass or distort character support that is hidden in individual loci because small gene fragments are analyzed in isolation. Given the extensive systematic database for Mammalia, the assumptions and applicability of shortcut coalescence methods can be assessed with rigor to complement a small but growing body of simulation work that has directly compared these methods to concatenation. 
We document several remarkable cases of hidden support in both supermatrix and coalescence paradigms and argue that in most instances, the emergent support in the shortcut coalescence analyses is an artifact. By referencing rigorous molecular clock studies of Mammalia, we suggest that inaccurate gene trees that imply unrealistically deep coalescences debilitate shortcut coalescence analyses of the placental dataset. We document contradictory coalescence results for Placentalia, and outline a critical conundrum that challenges the general utility of shortcut coalescence methods at deep phylogenetic scales. In particular, the basic unit of analysis in coalescence analysis, the coalescence-gene, is expected to shrink in size as more taxa are analyzed, but as the amount of data for reconstruction of a gene tree ratchets downward, the number of nodes in the gene tree that need to be resolved ratchets upward. Some advocates of shortcut coalescence methods have attempted to address problems with inaccurate gene trees by concatenating multiple coalescence-genes to yield "gene trees" that better match the species tree. However, this hybrid concatenation/coalescence approach, "concatalescence," contradicts the most basic biological rationale for performing a coalescence analysis in the first place. We discuss this reality in the context of recent simulation work that also suggests inaccurate reconstruction of gene trees is more problematic for shortcut coalescence methods than deep coalescence of independently segregating loci is for concatenation methods. Copyright © 2014 Elsevier Inc. All rights reserved.

Preserving the evolutionary potential of floras in biodiversity hotspots.

PubMed

Forest, Félix; Grenyer, Richard; Rouget, Mathieu; Davies, T Jonathan; Cowling, Richard M; Faith, Daniel P; Balmford, Andrew; Manning, John C; Procheş, Serban; van der Bank, Michelle; Reeves, Gail; Hedderson, Terry A J; Savolainen, Vincent

2007-02-15

One of the biggest challenges for conservation biology is to provide conservation planners with ways to prioritize effort. Much attention has been focused on biodiversity hotspots. However, the conservation of evolutionary process is now also acknowledged as a priority in the face of global change. Phylogenetic diversity (PD) is a biodiversity index that measures the length of evolutionary pathways that connect a given set of taxa. PD therefore identifies sets of taxa that maximize the accumulation of 'feature diversity'. Recent studies, however, concluded that taxon richness is a good surrogate for PD. Here we show taxon richness to be decoupled from PD, using a biome-wide phylogenetic analysis of the flora of an undisputed biodiversity hotspot--the Cape of South Africa. We demonstrate that this decoupling has real-world importance for conservation planning. Finally, using a database of medicinal and economic plant use, we demonstrate that PD protection is the best strategy for preserving feature diversity in the Cape. We should be able to use PD to identify those key regions that maximize future options, both for the continuing evolution of life on Earth and for the benefit of society.
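The difference between taxon richness and PD can be made concrete with a small sketch. It assumes the rooted form of Faith's PD (the summed length of all branches on the paths from the chosen taxa to the root) and a toy tree stored as a child-to-parent mapping. The tree, branch lengths, and function name are hypothetical and unrelated to the Cape flora data.

```python
from typing import Dict, Iterable, Tuple


def faith_pd(parent: Dict[str, Tuple[str, float]], taxa: Iterable[str]) -> float:
    """Sum the lengths of all branches linking `taxa` to the root.

    `parent` maps each non-root node to (parent_node, branch_length).
    Each branch is counted once, even when shared by several taxa.
    """
    visited = set()
    total = 0.0
    for taxon in taxa:
        node = taxon
        # Walk towards the root, stopping at branches already counted.
        while node in parent and node not in visited:
            visited.add(node)
            up, length = parent[node]
            total += length
            node = up
    return total


# Toy rooted tree: ((A:1, B:1)n1:2, C:5)root
tree = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("root", 2.0),
    "C": ("root", 5.0),
}
pd_ab = faith_pd(tree, ["A", "B"])  # two close relatives
pd_ac = faith_pd(tree, ["A", "C"])  # two distant relatives
```

Both sets contain two taxa, so their richness is identical, yet the second set captures twice as much branch length as the first. This is the kind of decoupling the abstract describes.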
Feasibility and effectiveness of a brief, intensive phylogenetics workshop in a middle-income country.

PubMed

Pollett, S; Leguia, M; Nelson, M I; Maljkovic Berry, I; Rutherford, G; Bausch, D G; Kasper, M; Jarman, R; Melendrez, M

2016-01-01

There is an increasing role for bioinformatic and phylogenetic analysis in tropical medicine research. However, scientists working in low- and middle-income regions may lack access to training opportunities in these methods. To help address this gap, a 5-day intensive bioinformatics workshop was offered in Lima, Peru. The syllabus is presented here for others who want to develop similar programs. To assess knowledge gained, a 20-point knowledge questionnaire was administered to participants (21 participants) before and after the workshop, covering topics on sequence quality control, alignment/formatting, database retrieval, models of evolution, sequence statistics, tree building, and results interpretation. Evolution/tree-building methods represented the lowest scoring domain at baseline and after the workshop. There was a considerable median gain in total knowledge scores (increase of 30%, p<0.001) with gains as high as 55%. A 5-day workshop model was effective in improving the pathogen-applied bioinformatics knowledge of scientists working in a middle-income country setting. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
The complete mitochondrial genome of the Anabas testudineus (Perciformes, Anabantidae) and its comparison with other related fish species.

PubMed

Behera, Bijay Kumar; Baisvar, Vishwamitra Singh; Kumari, Kavita; Rout, Ajaya Kumar; Pakrashi, Sudip; Paria, Prasenjet; Rao, A R; Rai, Anil

2017-03-01

In the present study, the complete mitochondrial genome sequence of Anabas testudineus is reported using a PGM sequencer (Ion Torrent, Life Technologies, La Jolla, CA). The complete mitogenome of the climbing perch, A. testudineus, is obtained by de novo assembly of genomic reads using the Torrent Mapping Alignment Program (TMAP), and is 16 603 bp in length. The mitogenome of A. testudineus is composed of 13 protein-coding genes, two rRNAs, and 22 tRNAs. Here, 20 tRNA genes showed the typical cloverleaf model, and the D-loop control region, gene order, and organization were closely similar to those of Osphronemidae and most other Perciformes fish mitogenomes in the NCBI databases. The mitogenome in the present study has 99% similarity to the complete mitogenome sequence of the earlier reported A. testudineus. The phylogenetic analysis of Anabantidae showed that their mitogenomes are closely related to each other. The complete mitogenome sequence of A. testudineus would be helpful in understanding the population genetics, phylogenetics, and evolution of Anabantidae.
Using complementary approaches to identify trans-domain nuclear gene transfers in the extremophile Galdieria sulphuraria (Rhodophyta).

PubMed

Pandey, Ravi S; Saxena, Garima; Bhattacharya, Debashish; Qiu, Huan; Azad, Rajeev K

2017-02-01

Identification of horizontal gene transfers (HGTs) has primarily relied on phylogenetic tree based methods, which require a rich sampling of sequenced genomes to ensure a reliable inference. Because the success of phylogenetic approaches depends on the breadth and depth of the database, researchers usually apply stringent filters to detect only the most likely gene transfers in the genomes of interest. One such study focused on a highly conservative estimate of trans-domain gene transfers in the extremophile eukaryote, Galdieria sulphuraria (Galdieri) Merola (Rhodophyta), by applying multiple filters in their phylogenetic pipeline. This led to the identification of 75 inter-domain acquisitions from Bacteria or Archaea. Because of the evolutionary, ecological, and potential biotechnological significance of foreign genes in algae, alternative approaches and pipelines complementing phylogenetics are needed for a more comprehensive assessment of HGT. We present here a novel pipeline that uncovered 17 novel foreign genes of prokaryotic origin in G. sulphuraria, results that are supported by multiple lines of evidence including composition-based, comparative data, and phylogenetics. These genes encode a variety of potentially adaptive functions, from metabolite transport to DNA repair. © 2016 Phycological Society of America.
Applying phylogenetic analysis to viral livestock diseases: moving beyond molecular typing.

PubMed

Olvera, Alex; Busquets, Núria; Cortey, Marti; de Deus, Nilsa; Ganges, Llilianne; Núñez, José Ignacio; Peralta, Bibiana; Toskano, Jennifer; Dolz, Roser

2010-05-01

Changes in livestock production systems in recent years have altered the presentation of many diseases resulting in the need for more sophisticated control measures. At the same time, new molecular assays have been developed to support the diagnosis of animal viral disease. Nucleotide sequences generated by these diagnostic techniques can be used in phylogenetic analysis to infer phenotypes by sequence homology and to perform molecular epidemiology studies. In this review, some key elements of phylogenetic analysis are highlighted, such as the selection of the appropriate neutral phylogenetic marker, the proper phylogenetic method and different techniques to test the reliability of the resulting tree. Examples are given of current and future applications of phylogenetic reconstructions in viral livestock diseases. Copyright 2009 Elsevier Ltd. All rights reserved.
The origins and evolutionary history of human non-coding RNA regulatory networks.

PubMed

Sherafatian, Masih; Mowla, Seyed Javad

2017-04-01

The evolutionary history and origin of the regulatory function of animal non-coding RNAs are not well understood. The lack of conservation of long non-coding RNAs and the small sizes of microRNAs have been major obstacles in their phylogenetic analysis. In this study, we tried to shed more light on the evolution of ncRNA regulatory networks by changing our phylogenetic strategy to focus on the evolutionary pattern of their protein-coding targets. We used available target databases of miRNAs and lncRNAs to find their protein-coding targets in human. We were able to recognize evolutionary hallmarks of ncRNA targets by phylostratigraphic analysis. We found the conventional 3'-UTR and lesser-known 5'-UTR targets of miRNAs to be enriched at three consecutive phylostrata. Firstly, in the eukaryota phylostratum, corresponding to the emergence of miRNAs, our study revealed that miRNA targets function primarily in cell cycle processes. Moreover, the same overrepresentation of the targets observed in the next two consecutive phylostrata, opisthokonta and eumetazoa, corresponded to the expansion periods of miRNAs in animal evolution. Coding sequence targets of miRNAs showed a delayed rise at the opisthokonta phylostratum, compared to the 3' and 5' UTR targets of miRNAs. The lncRNA regulatory network was the latest to evolve, at eumetazoa.
DNA-Based Identification of Forensically Important Blow Flies (Diptera: Calliphoridae) From India.

PubMed

Bharti, Meenakshi; Singh, Baneshwar

2017-09-01

Correct species identification is the first and the most important criterion in entomological evidence-based postmortem interval (PMI) estimation. Although morphological keys are available for species identification of adult blow flies, keys for immature stages are either lacking or incomplete. In this study, cytochrome oxidase subunit 1 (COI) reference data were developed from nine species (belonging to three subfamilies, namely, Calliphorinae, Luciliinae, and Chrysomyinae) of blow flies from India. Seven of the nine species included in this study were found suitable for DNA-based identification using the COI gene, because they showed nonoverlapping intra- (0.0-0.3%) and inter- (1.96-18.14%) specific diversity, and formed well-supported monophyletic clades in phylogenetic analysis. The remaining two species (i.e., Chrysomya megacephala (Fabricius) and Chrysomya chani Kurahashi) cannot be distinguished reliably using our database because they had a very low interspecific diversity (0.11%), and Ch. megacephala was paraphyletic with respect to Ch. chani in the phylogenetic analysis. We conclude that the COI gene is a useful marker for DNA-based identification of blow flies from India. © The Authors 2017. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
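The nonoverlapping intra- and interspecific diversity described above is often called a barcode gap. A minimal sketch of the check follows, assuming pre-aligned sequences of equal length and an uncorrected p-distance. The abstract does not state which distance model the authors used, and the species labels and sequences below are invented for illustration.

```python
from itertools import combinations
from typing import List, Optional, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping gaps and ambiguous bases."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def barcode_gap(records: List[Tuple[str, str]]) -> Tuple[float, Optional[float]]:
    """Return (max intraspecific, min interspecific) p-distance."""
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return (max(intra) if intra else 0.0, min(inter) if inter else None)


# Invented toy alignment: two specimens each of two hypothetical species.
records = [
    ("sp_X", "ACGTACGTAC"),
    ("sp_X", "ACGTACGTAC"),
    ("sp_Y", "ACGTTCGTAA"),
    ("sp_Y", "ACGTTCGTAA"),
]
max_intra, min_inter = barcode_gap(records)
has_gap = min_inter is not None and max_intra < min_inter
```

A species pair such as Ch. megacephala and Ch. chani, with only 0.11% interspecific divergence, would fail this check whenever intraspecific variation reached a comparable level.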
Direct Analysis of Genes Encoding 16S rRNA from Complex Communities Reveals Many Novel Molecular Species within the Human Gut

PubMed Central

Suau, Antonia; Bonnet, Régis; Sutren, Malène; Godon, Jean-Jacques; Gibson, Glenn R.; Collins, Matthew D.; Doré, Joel

1999-01-01

The human intestinal tract harbors a complex microbial ecosystem which plays a key role in nutrition and health. Although this microbiota has been studied in great detail by culture techniques, microscopic counts on human feces suggest that 60 to 80% of the observable bacteria cannot be cultivated. Using comparative analysis of cloned 16S rRNA gene (rDNA) sequences, we have investigated the bacterial diversity (both cultivated and noncultivated bacteria) within an adult-male fecal sample. The 284 clones obtained from 10-cycle PCR were classified into 82 molecular species (at least 98% similarity). Three phylogenetic groups contained 95% of the clones: the Bacteroides group, the Clostridium coccoides group, and the Clostridium leptum subgroup. The remaining clones were distributed among a variety of phylogenetic clusters. Only 24% of the molecular species recovered corresponded to described organisms (those whose sequences were available in public databases), and all of these were established members of the dominant human fecal flora (e.g., Bacteroides thetaiotaomicron, Fusobacterium prausnitzii, and Eubacterium rectale). However, the majority of generated rDNA sequences (76%) did not correspond to known organisms and clearly derived from hitherto unknown species within this human gut microflora. PMID:10543789
Identification of the chitinase genes from the diamondback moth, Plutella xylostella.

PubMed

Liao, Z H; Kuo, T C; Kao, C H; Chou, T M; Kao, Y H; Huang, R N

2016-12-01

Chitinases have an indispensable function in chitin metabolism and are well characterized in numerous insect species. Although the diamondback moth (DBM) Plutella xylostella, which has a high reproductive potential, short generation time, and characteristic adaptation to adverse environments, has become one of the most serious pests of cruciferous plants worldwide, the information on the chitinases of the moth is presently limited. In the present study, using degenerated polymerase chain reaction (PCR) and rapid amplification of cDNA ends-PCR strategies, four chitinase genes of P. xylostella were cloned, and an exhaustive search was conducted for chitinase-like sequences from the P. xylostella genome and transcriptomic database. Based on the domain analysis of the deduced amino acid sequences and the phylogenetic analysis of the catalytic domain sequences, we identified 15 chitinase genes from P. xylostella. Two of the gut-specific chitinases did not cluster with any of the known phylogenetic groups of chitinases and might be in a new group of the chitinase family. Moreover, in our study, group VIII chitinase was not identified. The structures, classifications and expression patterns of the chitinases of P. xylostella were further delineated, and with this information, further investigations on the functions of chitinase genes in DBM could be facilitated.
Saprophytic and Potentially Pathogenic Fusarium Species from Peat Soil in Perak and Pahang

PubMed Central

Karim, Nurul Farah Abdul; Mohd, Masratulhawa; Nor, Nik Mohd Izham Mohd; Zakaria, Latiffah

2016-01-01

Isolates of Fusarium were discovered in peat soil samples collected from peat swamp forest, waterlogged peat soil, and peat soil from oil palm plantations. Morphological characteristics were used to tentatively identify the isolates, and species confirmation was based on the sequence of translation elongation factor-1α (TEF-1α) and phylogenetic analysis. Based on the closest match of Basic Local Alignment Search Tool (BLAST) searches against the GenBank and Fusarium-ID databases, five Fusarium species were identified, namely F. oxysporum (60%), F. solani (23%), F. proliferatum (14%), F. semitectum (1%), and F. verticillioides (1%). From a neighbour-joining tree of combined TEF-1α and β-tubulin sequences, isolates from the same species were clustered in the same clade, though intraspecies variations were observed from the phylogenetic analysis. The Fusarium species isolated in the present study are soil inhabitants and are widely distributed worldwide. These species can act as saprophytes and decomposers as well as plant pathogens. The presence of Fusarium species in peat soils suggested that peat soils could be a reservoir of plant pathogens, as well-known plant pathogenic species such as F. oxysporum, F. solani, F. proliferatum, and F. verticillioides were identified. The results of the present study provide knowledge on the survival and distribution of Fusarium species. PMID:27019679
Saturnia jonasii Butler, 1877 on Jejudo Island, a new saturnid moth of South Korea with DNA data and morphology (Lepidoptera: Saturniidae).

PubMed

Kim, Min Jee; Choi, Sei-Woong; Kim, Iksoo

2015-04-10

Saturnia (Rinaca) jonasii Butler, 1877 is distributed in Japan, including Tsushima Island and Taiwan, whereas S. boisduvalii Eversmann, 1846 is distributed in northern areas, such as China, Russia, and South Korea. In the present study we found that the specimens from Mt. Hallasan on Jejudo, a southern remote offshore island, were S. jonasii, rather than S. boisduvalii based on morphology, DNA barcode, and nuclear elongation factor 1 alpha (EF-1α) sequences. The major morphological differences between the two species included the shape of wing pattern elements of fore- and hindwings and male and female genitalia. A DNA barcode analysis of the sequences of the Jejudo specimens and S. boisduvalii, along with those of Saturnia species obtained from a public database showed a minimum sequence divergence of 4.26% (28 bp). A phylogenetic analysis also showed clustering of the Jejudo specimens with S. jonasii, separating S. boisduvalii (Bayesian posterior probability = 0.99). The EF-1α-based sequence and phylogenetic analyses of the two species from Jejudo Island and the Korean mainland showed the uniqueness of the Jejudo specimens from S. boisduvalii collected on the Korean mainland, indicating distribution of S. jonasii on Jejudo Island in South Korea, instead of S. boisduvalii.
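The reported minimum divergence pairs a percentage with a base count, 4.26% (28 bp). A minimal sketch of how such a figure could arise is an uncorrected pairwise distance (p-distance). The abstract states neither the distance model nor the alignment length, so both the uncorrected model and the 658-bp length used here are assumptions, and the sequences are synthetic.

```python
def p_distance(seq1, seq2):
    """Uncorrected p-distance: differing sites / compared sites.
    Sites where either sequence has a gap ('-') or an 'N' are skipped."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Synthetic 658-bp pair differing at exactly 28 sites (illustrative only).
ref = "ACGT" * 164 + "AC"
alt = list(ref)
for i in range(0, 112, 4):  # 28 positions, each an 'A' in ref
    alt[i] = "G"
alt = "".join(alt)

print(f"{100 * p_distance(ref, alt):.2f}%")
```

Under these assumptions 28 differences over 658 sites gives about 4.26%; a corrected model such as Kimura 2-parameter would give a slightly different value.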
Transcriptome Analysis in Sheepgrass (Leymus chinensis): A Dominant Perennial Grass of the Eurasian Steppe

PubMed Central

Chen, Shuangyan; Huang, Xin; Yan, Xueqing; Liang, Ye; Wang, Yuezhu; Li, Xiaofeng; Peng, Xianjun; Ma, Xingyong; Zhang, Lexin; Cai, Yueyue; Ma, Tian; Cheng, Liqin; Qi, Dongmei; Zheng, Huajun; Yang, Xiaohan; Li, Xiaoxia; Liu, Gongshe

2013-01-01

Background Sheepgrass [Leymus chinensis (Trin.) Tzvel.] is an important perennial forage grass across the Eurasian Steppe and is known for its adaptability to various environmental conditions. However, insufficient data resources in public databases for sheepgrass limited our understanding of the mechanism of environmental adaptations, gene discovery and molecular marker development. Results The transcriptome of sheepgrass was sequenced using Roche 454 pyrosequencing technology. We assembled 952,328 high-quality reads into 87,214 unigenes, including 32,416 contigs and 54,798 singletons. There were 15,450 contigs over 500 bp in length. BLAST searches of our database against Swiss-Prot and NCBI non-redundant protein sequences (nr) databases resulted in the annotation of 54,584 (62.6%) of the unigenes. Gene Ontology (GO) analysis assigned 89,129 GO term annotations for 17,463 unigenes. We identified 11,675 core Poaceae-specific and 12,811 putative sheepgrass-specific unigenes by BLAST searches against all plant genome and transcriptome databases. A total of 2,979 specific freezing-responsive unigenes were found from this RNAseq dataset. We identified 3,818 EST-SSRs in 3,597 unigenes, and some SSRs contained unigenes that were also candidates for freezing-response genes. Characterizations of nucleotide repeats and dominant motifs of SSRs in sheepgrass were also performed. Similarity and phylogenetic analysis indicated that sheepgrass is closely related to barley and wheat. Conclusions This research has greatly enriched sheepgrass transcriptome resources. The identified stress-related genes will help us to decipher the genetic basis of the environmental and ecological adaptations of this species and will be used to improve wheat and barley crops through hybridization or genetic transformation. The EST-SSRs reported here will be a valuable resource for future gene-phenotype studies and for the molecular breeding of sheepgrass and other Poaceae species. PMID:23861841
Transcriptome analysis in sheepgrass (Leymus chinensis): a dominant perennial grass of the Eurasian Steppe.

PubMed

Chen, Shuangyan; Huang, Xin; Yan, Xueqing; Liang, Ye; Wang, Yuezhu; Li, Xiaofeng; Peng, Xianjun; Ma, Xingyong; Zhang, Lexin; Cai, Yueyue; Ma, Tian; Cheng, Liqin; Qi, Dongmei; Zheng, Huajun; Yang, Xiaohan; Li, Xiaoxia; Liu, Gongshe

2013-01-01

Sheepgrass [Leymus chinensis (Trin.) Tzvel.] is an important perennial forage grass across the Eurasian Steppe and is known for its adaptability to various environmental conditions. However, insufficient data resources in public databases for sheepgrass limited our understanding of the mechanism of environmental adaptations, gene discovery and molecular marker development. The transcriptome of sheepgrass was sequenced using Roche 454 pyrosequencing technology. We assembled 952,328 high-quality reads into 87,214 unigenes, including 32,416 contigs and 54,798 singletons. There were 15,450 contigs over 500 bp in length. BLAST searches of our database against Swiss-Prot and NCBI non-redundant protein sequences (nr) databases resulted in the annotation of 54,584 (62.6%) of the unigenes. Gene Ontology (GO) analysis assigned 89,129 GO term annotations for 17,463 unigenes. We identified 11,675 core Poaceae-specific and 12,811 putative sheepgrass-specific unigenes by BLAST searches against all plant genome and transcriptome databases. A total of 2,979 specific freezing-responsive unigenes were found from this RNAseq dataset. We identified 3,818 EST-SSRs in 3,597 unigenes, and some SSRs contained unigenes that were also candidates for freezing-response genes. Characterizations of nucleotide repeats and dominant motifs of SSRs in sheepgrass were also performed. Similarity and phylogenetic analysis indicated that sheepgrass is closely related to barley and wheat. This research has greatly enriched sheepgrass transcriptome resources. The identified stress-related genes will help us to decipher the genetic basis of the environmental and ecological adaptations of this species and will be used to improve wheat and barley crops through hybridization or genetic transformation. The EST-SSRs reported here will be a valuable resource for future gene-phenotype studies and for the molecular breeding of sheepgrass and other Poaceae species.
Community phylogenetics at the biogeographical scale: cold tolerance, niche conservatism and the structure of North American forests.

PubMed

Hawkins, Bradford A; Rueda, Marta; Rangel, Thiago F; Field, Richard; Diniz-Filho, José Alexandre F; Linder, Peter

2014-01-01

Aim The fossil record has led to a historical explanation for forest diversity gradients within the cool parts of the Northern Hemisphere, founded on a limited ability of woody angiosperm clades to adapt to mid-Tertiary cooling. We tested four predictions of how this should be manifested in the phylogenetic structure of 91,340 communities: (1) forests to the north should comprise species from younger clades (families) than forests to the south; (2) average cold tolerance at a local site should be associated with the mean family age (MFA) of species; (3) minimum temperature should account for MFA better than alternative environmental variables; and (4) traits associated with survival in cold climates should evolve under a niche conservatism constraint. Location The contiguous United States. Methods We extracted angiosperms from the US Forest Service's Forest Inventory and Analysis database. MFA was calculated by assigning age of the family to which each species belongs and averaging across the species in each community. We developed a phylogeny to identify phylogenetic signal in five traits: realized cold tolerance, seed size, seed dispersal mode, leaf phenology and height. Phylogenetic signal representation curves and phylogenetic generalized least squares were used to compare patterns of trait evolution against Brownian motion. Eleven predictors structured at broad or local scales were generated to explore relationships between environment and MFA using random forest and general linear models. 
Results Consistent with predictions, (1) southern communities comprise angiosperm species from older families than northern communities, (2) cold tolerance is the trait most strongly associated with local MFA, (3) minimum temperature in the coldest month is the environmental variable that best describes MFA, broad-scale variables being much stronger correlates than local-scale variables, and (4) the phylogenetic structures of cold tolerance and at least one other trait associated with survivorship in cold climates indicate niche conservatism. Main conclusions Tropical niche conservatism in the face of long-term climate change, probably initiated in the Late Cretaceous associated with the rise of the Rocky Mountains, is a strong driver of the phylogenetic structure of the angiosperm component of forest communities across the USA. However, local deterministic and/or stochastic processes account for perhaps a quarter of the variation in the MFA of local communities.
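The Methods describe mean family age (MFA) as the average, over the species in a community, of the age of the family each species belongs to. That calculation can be sketched as follows; the species, family names, and ages are placeholder values invented for illustration, and the function name is hypothetical.

```python
from statistics import mean


def mean_family_age(community, species_family, family_age):
    """Average the age of each species' family across the species in a community."""
    return mean(family_age[species_family[sp]] for sp in community)


# Placeholder lookup tables (not real family ages).
family_age = {"fam_old": 90.0, "fam_mid": 60.0, "fam_young": 30.0}
species_family = {
    "sp1": "fam_old",
    "sp2": "fam_old",
    "sp3": "fam_mid",
    "sp4": "fam_young",
}

southern_plot = ["sp1", "sp2", "sp3"]
northern_plot = ["sp3", "sp4"]

print(mean_family_age(southern_plot, species_family, family_age))  # 80.0
print(mean_family_age(northern_plot, species_family, family_age))  # 45.0
```

Because the average is taken over species, a family contributes once per species present, so species-rich families weigh more heavily in a community's MFA.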
Data Sources for Trait Databases: Comparing the Phenomic Content of Monographs and Evolutionary Matrices.

PubMed

Dececchi, T Alex; Mabee, Paula M; Blackburn, David C

2016-01-01

Databases of organismal traits that aggregate information from one or multiple sources can be leveraged for large-scale analyses in biology. Yet the differences among these data streams and how well they capture trait diversity have never been explored. We present the first analysis of the differences between phenotypes captured in free text of descriptive publications ('monographs') and those used in phylogenetic analyses ('matrices'). We focus our analysis on osteological phenotypes of the limbs of four extinct vertebrate taxa critical to our understanding of the fin-to-limb transition. We find that there is low overlap between the anatomical entities used in these two sources of phenotype data, indicating that phenotypes represented in matrices are not simply a subset of those found in monographic descriptions. Perhaps as expected, compared to characters found in matrices, phenotypes in monographs tend to emphasize descriptive and positional morphology, be somewhat more complex, and relate to fewer additional taxa. While based on a small set of focal taxa, these qualitative and quantitative data suggest that either source of phenotypes alone will result in incomplete knowledge of variation for a given taxon. As a broader community develops to use and expand databases characterizing organismal trait diversity, it is important to recognize the limitations of the data sources and develop strategies to more fully characterize variation both within species and across the tree of life.
Data Sources for Trait Databases: Comparing the Phenomic Content of Monographs and Evolutionary Matrices

PubMed Central

Dececchi, T. Alex; Mabee, Paula M.; Blackburn, David C.

2016-01-01

Databases of organismal traits that aggregate information from one or multiple sources can be leveraged for large-scale analyses in biology. Yet the differences among these data streams and how well they capture trait diversity have never been explored. We present the first analysis of the differences between phenotypes captured in free text of descriptive publications (‘monographs’) and those used in phylogenetic analyses (‘matrices’). We focus our analysis on osteological phenotypes of the limbs of four extinct vertebrate taxa critical to our understanding of the fin-to-limb transition. We find that there is low overlap between the anatomical entities used in these two sources of phenotype data, indicating that phenotypes represented in matrices are not simply a subset of those found in monographic descriptions. Perhaps as expected, compared to characters found in matrices, phenotypes in monographs tend to emphasize descriptive and positional morphology, be somewhat more complex, and relate to fewer additional taxa. While based on a small set of focal taxa, these qualitative and quantitative data suggest that either source of phenotypes alone will result in incomplete knowledge of variation for a given taxon. As a broader community develops to use and expand databases characterizing organismal trait diversity, it is important to recognize the limitations of the data sources and develop strategies to more fully characterize variation both within species and across the tree of life. PMID:27191170
Minimizing the average distance to a closest leaf in a phylogenetic tree.

PubMed

Matsen, Frederick A; Gallagher, Aaron; McCoy, Connor O

2013-11-01

When performing an analysis on a collection of molecular sequences, it can be convenient to reduce the number of sequences under consideration while maintaining some characteristic of a larger collection of sequences. For example, one may wish to select a subset of high-quality sequences that represent the diversity of a larger collection of sequences. One may also wish to specialize a large database of characterized "reference sequences" to a smaller subset that is as close as possible on average to a collection of "query sequences" of interest. Such a representative subset can be useful whenever one wishes to find a set of reference sequences that is appropriate to use for comparative analysis of environmentally derived sequences, such as for selecting "reference tree" sequences for phylogenetic placement of metagenomic reads. In this article, we formalize these problems in terms of the minimization of the Average Distance to the Closest Leaf (ADCL) and investigate algorithms to perform the relevant minimization. We show that the greedy algorithm is not effective, show that a variant of the Partitioning Around Medoids (PAM) heuristic gets stuck in local minima, and develop an exact dynamic programming approach. Using this exact program we note that the performance of PAM appears to be good for simulated trees, and is faster than the exact algorithm for small trees. On the other hand, the exact program gives solutions for all numbers of leaves less than or equal to the given desired number of leaves, whereas PAM only gives a solution for the prespecified number of leaves. Via application to real data, we show that the ADCL criterion chooses chimeric sequences less often than random subsets, whereas the maximization of phylogenetic diversity chooses them more often than random. These algorithms have been implemented in publicly available software.
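To make the ADCL criterion concrete, the sketch below computes it from a pairwise leaf-to-leaf distance table and finds the best subset by exhaustive search. This is not the dynamic program or the PAM variant from the article: the brute-force search only works for tiny inputs, and the function names and the four-leaf toy distances are assumptions.

```python
from itertools import combinations


def adcl(dist, kept):
    """Average, over all leaves, of the distance to the nearest kept leaf."""
    kept = list(kept)
    if not kept:
        raise ValueError("at least one leaf must be kept")
    return sum(min(dist[leaf][k] for k in kept) for leaf in dist) / len(dist)


def best_subset(dist, k):
    """Exhaustive search for a size-k leaf subset with minimal ADCL.
    Ties are broken by the order in which combinations are generated."""
    return min(combinations(sorted(dist), k), key=lambda s: adcl(dist, s))


# Patristic distances for the toy tree ((A:1,B:1):2,(C:1,D:1):2).
dist = {
    "A": {"A": 0, "B": 2, "C": 6, "D": 6},
    "B": {"A": 2, "B": 0, "C": 6, "D": 6},
    "C": {"A": 6, "B": 6, "C": 0, "D": 2},
    "D": {"A": 6, "B": 6, "C": 2, "D": 0},
}

print(adcl(dist, ["A"]))       # 3.5
print(adcl(dist, ["A", "B"]))  # 3.0
print(best_subset(dist, 2))    # ('A', 'C')
```

Keeping one leaf from each cherry gives an ADCL of 1.0, whereas keeping both leaves of the same cherry leaves the other cherry far from any retained sequence.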
Sequence analysis of serum albumins reveals the molecular evolution of ligand recognition properties.

PubMed

Fanali, Gabriella; Ascenzi, Paolo; Bernardi, Giorgio; Fasano, Mauro

2012-01-01

Serum albumin (SA) is a circulating protein providing a depot and carrier for many endogenous and exogenous compounds. At least seven major binding sites have been identified by structural and functional investigations mainly in human SA. SA is conserved in vertebrates, with at least 49 entries in protein sequence databases. The multiple sequence analysis of this set of entries leads to the definition of a cladistic tree for the molecular evolution of SA orthologs in vertebrates, thus showing the clustering of the considered species, with lamprey SAs (Lethenteron japonicum and Petromyzon marinus) in a separate outgroup. Sequence analysis aimed at searching conserved domains revealed that most SA sequences are made up by three repeated domains (about 600 residues), as extensively characterized for human SA. On the contrary, lamprey SAs are giant proteins (about 1400 residues) comprising seven repeated domains. The phylogenetic analysis of the SA family reveals a stringent correlation with the taxonomic classification of the species available in sequence databases. A focused inspection of the sequences of ligand binding sites in SA revealed that in all sites most residues involved in ligand binding are conserved, although the versatility towards different ligands could be peculiar of higher organisms. Moreover, the analysis of molecular links between the different sites suggests that allosteric modulation mechanisms could be restricted to higher vertebrates.
Conformation of phylogenetic relationship of Penaeidae shrimp based on morphometric and molecular investigations.

PubMed

Rajakumaran, P; Vaseeharan, B; Jayakumar, R; Chidambara, R

2014-01-01

An accurate understanding of the phylogenetic relationships among Penaeidae shrimp is important for both academic research and the fisheries industry. Morphometric and randomly amplified polymorphic DNA (RAPD) analyses were used to infer the phylogenetic relationships among 13 Penaeidae shrimp species. For the morphometric analysis, forty variables and the total length of the shrimp were measured for each species, and the effect of size variation was removed. The size-normalized values were subjected to UPGMA (Unweighted Pair-Group Method with Arithmetic Mean) cluster analysis. For the RAPD analysis, the four primers used showed reliable differentiation between species, and the correlation coefficients between the DNA banding patterns of the 13 Penaeidae species were used to construct a UPGMA dendrogram. The phylogenetic relationships obtained from the morphometric and molecular analyses were found to be congruent. Because the morphometric results concur with the molecular ones, we conclude that the phylogenetic relationships obtained for the studied Penaeidae can be considered reliable.
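Both analyses in this record rely on UPGMA clustering. A minimal pure-Python sketch of the method is given below, assuming a symmetric distance matrix as input. The four labels and their distances are hypothetical and do not come from the study, which clustered size-normalized morphometric values and RAPD banding-pattern correlations.

```python
def upgma(labels, matrix):
    """UPGMA clustering. matrix[i][j] is the distance between labels[i] and labels[j].
    Returns (tree, root_height), where tree is made of nested 2-tuples."""
    # Active clusters: id -> (subtree, number of leaves, node height)
    nodes = {i: (label, 1, 0.0) for i, label in enumerate(labels)}
    d = {(i, j): matrix[i][j] for i in nodes for j in nodes if i < j}
    next_id = len(labels)
    while len(nodes) > 1:
        i, j = min(d, key=d.get)  # closest pair of active clusters
        height = d[(i, j)] / 2    # the merged node sits halfway between them
        (ti, ni, _), (tj, nj, _) = nodes.pop(i), nodes.pop(j)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted mean of the two old distances.
        new = {}
        for k in nodes:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            new[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        d = {p: v for p, v in d.items() if i not in p and j not in p}
        d.update(new)
        nodes[next_id] = ((ti, tj), ni + nj, height)
        next_id += 1
    tree, _, height = next(iter(nodes.values()))
    return tree, height


# Hypothetical distances among four taxa.
labels = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(labels, matrix))
```

UPGMA assumes a constant rate of change across lineages, so the resulting dendrogram is ultrametric: every leaf is the same distance from the root.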
Advancing data reuse in phyloinformatics using an ontology-driven Semantic Web approach.

PubMed

Panahiazar, Maryam; Sheth, Amit P; Ranabahu, Ajith; Vos, Rutger A; Leebens-Mack, Jim

2013-01-01

Phylogenetic analyses can resolve historical relationships among genes, organisms or higher taxa. Understanding such relationships can elucidate a wide range of biological phenomena, including, for example, the importance of gene and genome duplications in the evolution of gene function, the role of adaptation as a driver of diversification, or the evolutionary consequences of biogeographic shifts. Phyloinformaticists are developing data standards, databases and communication protocols (e.g. Application Programming Interfaces, APIs) to extend the accessibility of gene trees, species trees, and the metadata necessary to interpret these trees, thus enabling researchers across the life sciences to reuse phylogenetic knowledge. Specifically, Semantic Web technologies are being developed to make phylogenetic knowledge interpretable by web agents, thereby enabling intelligently automated, high-throughput reuse of results generated by phylogenetic research. This manuscript describes an ontology-driven, semantic problem-solving environment for phylogenetic analyses and introduces artefacts that can promote phyloinformatic efforts to promote accessibility of trees and underlying metadata. PhylOnt is an extensible ontology with concepts describing tree types and tree building methodologies including estimation methods, models and programs. In addition we present the PhylAnt platform for annotating scientific articles and NeXML files with PhylOnt concepts. The novelty of this work is the annotation of NeXML files and phylogenetic related documents with PhylOnt Ontology. This approach advances data reuse in phyloinformatics.

The Gypsy Database (GyDB) of mobile genetic elements

PubMed Central

Lloréns, C.; Futami, R.; Bezemer, D.; Moya, A.

2008-01-01

In this article, we introduce the Gypsy Database (GyDB) of mobile genetic elements, an in-progress database devoted to the non-redundant analysis and evolutionary-based classification of mobile genetic elements. In this first version, we contemplate eukaryotic Ty3/Gypsy and Retroviridae long terminal repeats (LTR) retroelements. Phylogenetic analyses based on the gag-pro-pol internal region commonly presented by these two groups strongly support a certain number of previously described Ty3/Gypsy lineages originally reported from reverse-transcriptase (RT) analyses. Vertebrate retroviruses (Retroviridae) are also constituted in several monophyletic groups consistent with genera proposed by the ICTV nomenclature, as well as with the current tendency to classify both endogenous and exogenous retroviruses by three major classes (I, II and III). Our inference indicates that all protein domains codified by the gag-pro-pol internal region of these two groups agree in a collective presentation of a particular evolutionary history, which may be used as a main criterion to differentiate their molecular diversity in a comprehensive collection of phylogenies and non-redundant molecular profiles useful in the identification of new Ty3/Gypsy and Retroviridae species. The GyDB project is available at http://gydb.uv.es. PMID:17895280
Gene function in early mouse embryonic stem cell differentiation

PubMed Central

Sene, Kagnew Hailesellasse; Porter, Christopher J; Palidwor, Gareth; Perez-Iratxeta, Carolina; Muro, Enrique M; Campbell, Pearl A; Rudnicki, Michael A; Andrade-Navarro, Miguel A

2007-01-01

Background Little is known about the genes that drive embryonic stem cell differentiation. However, such knowledge is necessary if we are to exploit the therapeutic potential of stem cells. To uncover the genetic determinants of mouse embryonic stem cell (mESC) differentiation, we have generated and analyzed 11-point time-series of DNA microarray data for three biologically equivalent but genetically distinct mESC lines (R1, J1, and V6.5) undergoing undirected differentiation into embryoid bodies (EBs) over a period of two weeks. Results We identified the initial 12 hour period as reflecting the early stages of mESC differentiation and studied probe sets showing consistent changes of gene expression in that period. Gene function analysis indicated significant up-regulation of genes related to regulation of transcription and mRNA splicing, and down-regulation of genes related to intracellular signaling. Phylogenetic analysis indicated that the genes showing the largest expression changes were more likely to have originated in metazoans. The probe sets with the most consistent gene changes in the three cell lines represented 24 down-regulated and 12 up-regulated genes, all with closely related human homologues. Whereas some of these genes are known to be involved in embryonic developmental processes (e.g. Klf4, Otx2, Smn1, Socs3, Tagln, Tdgf1), our analysis points to others (such as transcription factor Phf21a, extracellular matrix related Lama1 and Cyr61, or endoplasmic reticulum related Sc4mol and Scd2) that have not been previously related to mESC function. The majority of identified functions were related to transcriptional regulation, intracellular signaling, and cytoskeleton. Genes involved in other cellular functions important in ESC differentiation such as chromatin remodeling and transmembrane receptors were not observed in this set. 
Conclusion Our analysis profiles for the first time gene expression at a very early stage of mESC differentiation, and identifies a functional and phylogenetic signature for the genes involved. The data generated constitute a valuable resource for further studies. All DNA microarray data used in this study are available in the StemBase database of stem cell gene expression data [1] and in the NCBI's GEO database. PMID:17394647
BioRuby: bioinformatics software for the Ruby programming language.

PubMed

Goto, Naohisa; Prins, Pjotr; Nakao, Mitsuteru; Bonnal, Raoul; Aerts, Jan; Katayama, Toshiaki

2010-10-15

The BioRuby software toolkit contains a comprehensive set of free development tools and libraries for bioinformatics and molecular biology, written in the Ruby programming language. BioRuby has components for sequence analysis, pathway analysis, protein modelling and phylogenetic analysis; it supports many widely used data formats and provides easy access to databases, external programs and public web services, including BLAST, KEGG, GenBank, MEDLINE and GO. BioRuby comes with a tutorial, documentation and an interactive environment, which can be used in the shell, and in the web browser. BioRuby is free and open source software, made available under the Ruby license. BioRuby runs on all platforms that support Ruby, including Linux, Mac OS X and Windows. And, with JRuby, BioRuby runs on the Java Virtual Machine. The source code is available from http://www.bioruby.org/. Contact: katayama@bioruby.org
A proposed model for the flowering signaling pathway of sugarcane under photoperiodic control.

PubMed

Coelho, C P; Costa Netto, A P; Colasanti, J; Chalfun-Júnior, A

2013-04-25

Molecular analysis of floral induction in Arabidopsis has identified several flowering time genes related to 4 response networks defined by the autonomous, gibberellin, photoperiod, and vernalization pathways. Although grass flowering processes include ancestral functions shared by both mono- and dicots, they have developed their own mechanisms to transmit floral induction signals. Despite its high production capacity and its important role in biofuel production, almost no information is available about the flowering process in sugarcane. We searched the Sugarcane Expressed Sequence Tags database to look for elements of the flowering signaling pathway under photoperiodic control. Sequences showing significant similarity to flowering time genes of other species were clustered, annotated, and analyzed for conserved domains. Multiple alignments comparing the sequences found in the sugarcane database and those from other species were performed and their phylogenetic relationship assessed using the MEGA 4.0 software. Electronic Northerns were run with Cluster and TreeView programs, allowing us to identify putative members of the photoperiod-controlled flowering pathway of sugarcane.
IMGD: an integrated platform supporting comparative genomics and phylogenetics of insect mitochondrial genomes

PubMed Central

Lee, Wonhoon; Park, Jongsun; Choi, Jaeyoung; Jung, Kyongyong; Park, Bongsoo; Kim, Donghan; Lee, Jaeyoung; Ahn, Kyohun; Song, Wonho; Kang, Seogchan; Lee, Yong-Hwan; Lee, Seunghwan

2009-01-01

Background Sequences and organization of the mitochondrial genome have been used as markers to investigate evolutionary history and relationships in many taxonomic groups. The rapidly increasing mitochondrial genome sequences from diverse insects provide ample opportunities to explore various global evolutionary questions in the superclass Hexapoda. To adequately support such questions, it is imperative to establish an informatics platform that facilitates the retrieval and utilization of available mitochondrial genome sequence data. Results The Insect Mitochondrial Genome Database (IMGD) is a new integrated platform that archives the mitochondrial genome sequences from 25,747 hexapod species, including 112 completely sequenced and 20 nearly completed genomes and 113,985 partially sequenced mitochondrial genomes. The Species-driven User Interface (SUI) of IMGD supports data retrieval and diverse analyses at multi-taxon levels. The Phyloviewer implemented in IMGD provides three methods for drawing phylogenetic trees and displays the resulting trees on the web. The SNP database incorporated to IMGD presents the distribution of SNPs and INDELs in the mitochondrial genomes of multiple isolates within eight species. A newly developed comparative SNU Genome Browser supports the graphical presentation and interactive interface for the identified SNPs/INDELs. Conclusion The IMGD provides a solid foundation for the comparative mitochondrial genomics and phylogenetics of insects. All data and functions described here are available at the web site . PMID:19351385
treespace: Statistical exploration of landscapes of phylogenetic trees.

PubMed

Jombart, Thibaut; Kendall, Michelle; Almagro-Garcia, Jacob; Colijn, Caroline

2017-11-01

The increasing availability of large genomic data sets as well as the advent of Bayesian phylogenetics facilitates the investigation of phylogenetic incongruence, which can result in the impossibility of representing phylogenetic relationships using a single tree. While sometimes considered as a nuisance, phylogenetic incongruence can also reflect meaningful biological processes as well as relevant statistical uncertainty, both of which can yield valuable insights in evolutionary studies. We introduce a new tool for investigating phylogenetic incongruence through the exploration of phylogenetic tree landscapes. Our approach, implemented in the R package treespace, combines tree metrics and multivariate analysis to provide low-dimensional representations of the topological variability in a set of trees, which can be used for identifying clusters of similar trees and group-specific consensus phylogenies. treespace also provides a user-friendly web interface for interactive data analysis and is integrated alongside existing standards for phylogenetics. It fills a gap in the current phylogenetics toolbox in R and will facilitate the investigation of phylogenetic results. © 2017 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.
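The first half of the approach described here, turning a set of trees into a pairwise distance matrix that multivariate methods can then reduce to a few dimensions, can be sketched in Python. This does not use the treespace R package. The nested-tuple tree encoding, the function names, and the choice of a rooted Robinson-Foulds-style metric (the number of clades found in one tree but not the other) are assumptions made for illustration.

```python
def clades(tree):
    """Return (leaf set, set of clades) for a rooted tree given as nested tuples.
    A clade is the frozenset of leaf labels below an internal node."""
    if not isinstance(tree, tuple):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found


def rooted_rf(t1, t2):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(t1)[1] ^ clades(t2)[1])


def distance_matrix(trees):
    """Pairwise tree-to-tree distances, ready for ordination or clustering."""
    return [[rooted_rf(a, b) for b in trees] for a in trees]


# Three toy trees on the same four tips; t2 is t1 with children reordered.
t1 = (("A", "B"), ("C", "D"))
t2 = (("D", "C"), ("B", "A"))
t3 = (("A", "C"), ("B", "D"))

for row in distance_matrix([t1, t2, t3]):
    print(row)
```

A matrix like this is what an ordination method such as multidimensional scaling would take as input to place similar trees near each other in a low-dimensional plot.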
SILVA tree viewer: interactive web browsing of the SILVA phylogenetic guide trees.

PubMed

Beccati, Alan; Gerken, Jan; Quast, Christian; Yilmaz, Pelin; Glöckner, Frank Oliver

2017-09-30

Phylogenetic trees are an important tool to study the evolutionary relationships among organisms. The huge amount of available taxa poses difficulties in their interactive visualization. This hampers the interaction with the users to provide feedback for the further improvement of the taxonomic framework. The SILVA Tree Viewer is a web application designed for visualizing large phylogenetic trees without requiring the download of any software tool or data files. The SILVA Tree Viewer is based on Web Geographic Information Systems (Web-GIS) technology with a PostgreSQL backend. It enables zoom and pan functionalities similar to Google Maps. The SILVA Tree Viewer enables access to two phylogenetic (guide) trees provided by the SILVA database: the SSU Ref NR99 inferred from high-quality, full-length small subunit sequences, clustered at 99% sequence identity and the LSU Ref inferred from high-quality, full-length large subunit sequences. The Tree Viewer provides tree navigation, search and browse tools as well as an interactive feedback system to collect any kinds of requests ranging from taxonomy to data curation and improving the tool itself.
Phylogenetic relationships and taxonomic revision of Paranoplocephala Lühe, 1910 sensu lato (Cestoda, Cyclophyllidea, Anoplocephalidae)

USDA-ARS's Scientific Manuscript database

An extensive phylogenetic analysis and genus-level taxonomic revision of Paranoplocephala Lühe, 1910 -like cestodes (Cyclophyllidea, Anoplocephalidae) are presented. The phylogenetic analysis is based on DNA sequences of two partial mitochondrial genes, i.e. cytochrome c oxidase subunit 1 (cox1) and...
Comparative genomics reveals phylogenetic distribution patterns of secondary metabolites in Amycolatopsis species.

PubMed

Adamek, Martina; Alanjary, Mohammad; Sales-Ortells, Helena; Goodfellow, Michael; Bull, Alan T; Winkler, Anika; Wibberg, Daniel; Kalinowski, Jörn; Ziemert, Nadine

2018-06-01

Genome mining tools have enabled us to predict biosynthetic gene clusters that might encode compounds with valuable functions for industrial and medical applications. With the continuously increasing number of genomes sequenced, we are confronted with an overwhelming number of predicted clusters. In order to guide the effective prioritization of biosynthetic gene clusters towards finding the most promising compounds, knowledge about diversity, phylogenetic relationships and distribution patterns of biosynthetic gene clusters is necessary. Here, we provide a comprehensive analysis of the model actinobacterial genus Amycolatopsis and its potential for the production of secondary metabolites. A phylogenetic characterization, together with a pan-genome analysis, showed that within this highly diverse genus, four major lineages could be distinguished which differed in their potential to produce secondary metabolites. Furthermore, we were able to distinguish gene cluster families whose distribution correlated with phylogeny, indicating that vertical gene transfer plays a major role in the evolution of secondary metabolite gene clusters. Still, the vast majority of the diverse biosynthetic gene clusters were derived from clusters unique to the genus, and also unique in comparison to a database of known compounds. Our study on the locations of biosynthetic gene clusters in the genomes of Amycolatopsis strains showed that clusters acquired by horizontal gene transfer tend to be incorporated into non-conserved regions of the genome, thereby allowing us to distinguish core and hypervariable regions in Amycolatopsis genomes. Using a comparative genomics approach, it was possible to determine the potential of the genus Amycolatopsis to produce a huge diversity of secondary metabolites. Furthermore, the analysis demonstrates that horizontal and vertical gene transfer play an important role in the acquisition and maintenance of valuable secondary metabolites.
Our results cast light on the interconnections between secondary metabolite gene clusters and provide a way to prioritize biosynthetic pathways in the search and discovery of novel compounds.
Assessment of phylogenetic relationship of rare plant species collected from Saudi Arabia using internal transcribed spacer sequences of nuclear ribosomal DNA.

PubMed

Al-Qurainy, F; Khan, S; Nadeem, M; Tarroum, M; Alaklabi, A

2013-03-11

The rare and endangered plants of any country are important genetic resources that often require urgent conservation measures. Assessment of phylogenetic relationships and evaluation of genetic diversity are very important prior to implementation of conservation strategies for saving rare and endangered plant species. We used internal transcribed spacer sequences of nuclear ribosomal DNA to evaluate sequence identity with the taxa available in the GenBank database, using the Basic Local Alignment Search Tool (BLAST). Two rare plant species clearly formed clades with congeners: Heliotropium strigosum with H. pilosum (98% branch support) and Pancratium tortuosum with P. tenuifolium (61% branch support). However, some species, viz. Scadoxus multiflorus, Commiphora myrrha and Senecio hadiensis, showed close relationships with more than one species. We conclude that nuclear ribosomal internal transcribed spacer sequences are useful markers for phylogenetic study of these rare plant species in Saudi Arabia.
Cloning and in-silico analysis of beta-1,3-xylanase from psychrophilic yeast, Glaciozyma antarctica PI12

NASA Astrophysics Data System (ADS)

Nor, Nooraisyah Mohamad; Bakar, Farah Diba Abu; Mahadi, Nor Muhammad; Murad, Abdul Munir Abdul

2015-09-01

A beta-1,3-xylanase (EC 3.2.1.32) gene from the psychrophilic yeast Glaciozyma antarctica has been identified via genome data mining. The enzyme was assigned to the GH26 family based on the Carbohydrate-Active Enzymes (CAZy) database. The molecular weight of this protein was predicted to be 42 kDa, and it is expected to be soluble when expressed. The presence of a signal peptide suggests that this enzyme may be released extracellularly into the marine environment of the host's habitat. This supports the theory that such enzymatic activity is required to degrade nutrients of polysaccharide origin into simpler carbohydrates in the external environment before they can be taken up into the cell. The sequence of this protein showed very little conservation (< 30%) with other beta-1,3-xylanases in available databases. Based on the phylogenetic analysis, this protein also showed a distant relationship to other xylanases of eukaryotic origin. The protein may have undergone major substitutions in its gene sequence in order to adapt to the cold climate. This is the first report of a beta-1,3-xylanase gene isolated from a psychrophilic yeast.
Toward unification of taxonomy databases in a distributed computer environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kitakami, Hajime; Tateno, Yoshio; Gojobori, Takashi

1994-12-31

All the taxonomy databases constructed with the DNA databases of the international DNA data banks are powerful electronic dictionaries which aid in biological research by computer. The taxonomy databases are, however, not consistently unified in a relational format. If we can achieve consistent unification of the taxonomy databases, it will be useful in comparing many research results and in investigating future research directions from existing research results. In particular, it will be useful in comparing relationships between phylogenetic trees inferred from molecular data and those constructed from morphological data. The goal of the present study is to unify the existing taxonomy databases and eliminate inconsistencies (errors) that are present in them. Inconsistencies occur particularly in the restructuring of the existing taxonomy databases, since classification rules for constructing the taxonomy have changed rapidly with biological advancements. A repair system is needed to remove inconsistencies in each data bank and mismatches among data banks. This paper describes a new methodology for removing both inconsistencies and mismatches from the databases in a distributed computer environment. The methodology is implemented in a relational database management system, SYBASE.
Equine infectious anemia virus in naturally infected horses from the Brazilian Pantanal.

PubMed

Cursino, Andreia Elisa; Vilela, Ana Paula Pessoa; Franco-Luiz, Ana Paula Moreira; de Oliveira, Jaquelline Germano; Nogueira, Márcia Furlan; Júnior, João Pessoa Araújo; de Aguiar, Daniel Moura; Kroon, Erna Geessien

2018-05-11

Equine infectious anemia (EIA) has a worldwide distribution and is widespread in Brazil. The Brazilian Pantanal has a high prevalence of the disease, compromising equine performance and, indirectly, the livestock industry, since horses are used for cattle management. Although EIA is routinely diagnosed by the agar gel immunodiffusion test (AGID), this serological assay has some limitations; PCR-based detection methods have the potential to overcome these limitations and act as complementary tests to those currently used. Considering the limited number of equine infectious anemia virus (EIAV) sequences available in public databases and the great genome variability, studies of EIAV molecular detection and characterization remain important. In this study we detected EIAV proviral DNA in 23 peripheral blood mononuclear cell (PBMC) samples from naturally infected horses from the Brazilian Pantanal using a semi-nested PCR (sn-PCR). The serological profile of the animals was also evaluated by AGID and ELISA for gp90 and p26. Furthermore, the EIAV PCR-amplified DNA was sequenced and phylogenetically analyzed. Here we describe the first EIAV sequences of the 5' LTR of the tat gene in naturally infected horses from Brazil, which showed 91% similarity to EIAV reference sequences. The Brazilian EIAV sequences also showed variable nucleotide similarities among themselves, ranging from 93.5% to 100%. Phylogenetic analysis showed that Brazilian EIAV sequences grouped in a separate clade relative to other reference sequences. Thus, this molecular detection and characterization may provide information about EIAV circulation in Brazilian territories and improve phylogenetic inferences.
ATGC database and ATGC-COGs: an updated resource for micro- and macro-evolutionary studies of prokaryotic genomes and protein family annotation.

PubMed

Kristensen, David M; Wolf, Yuri I; Koonin, Eugene V

2017-01-04

The Alignable Tight Genomic Clusters (ATGCs) database is a collection of closely related bacterial and archaeal genomes that provides several tools to aid research into evolutionary processes in the microbial world. Each ATGC is a taxonomy-independent cluster of 2 or more completely sequenced genomes that meet the objective criteria of a high degree of local gene order (synteny) and a small number of synonymous substitutions in the protein-coding genes. As such, each ATGC is suited for analysis of microevolutionary variations within a cohesive group of organisms (e.g. species), whereas the entire collection of ATGCs is useful for macroevolutionary studies. The ATGC database includes many forms of pre-computed data, in particular ATGC-COGs (Clusters of Orthologous Genes), multiple sequence alignments, a set of 'index' orthologs representing the most well-conserved members of each ATGC-COG, the phylogenetic tree of the organisms within each ATGC, etc. Although the ATGC database contains several million proteins from thousands of genomes organized into hundreds of clusters (roughly a 4-fold increase since the last version of the ATGC database), it is now built with completely automated methods and will be regularly updated following new releases of the NCBI RefSeq database. The ATGC database is hosted jointly at the University of Iowa at dmk-brain.ecn.uiowa.edu/ATGC/ and the NCBI at ftp.ncbi.nlm.nih.gov/pub/kristensen/ATGC/atgc_home.html. Published by Oxford University Press on behalf of Nucleic Acids Research 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Bioinformatics: A History of Evolution "In Silico"

ERIC Educational Resources Information Center

Ondrej, Vladan; Dvorak, Petr

2012-01-01

Bioinformatics, biological databases, and the worldwide use of computers have accelerated biological research in many fields, such as evolutionary biology. Here, we describe a primer of nucleotide sequence management and the construction of a phylogenetic tree with two examples; the two selected are from completely different groups of organisms:…
Inquiry-Based Learning of Molecular Phylogenetics

ERIC Educational Resources Information Center

Campo, Daniel; Garcia-Vazquez, Eva

2008-01-01

Reconstructing phylogenies from nucleotide sequences is a challenge for students because it strongly depends on evolutionary models and computer tools that are frequently updated. We present here an inquiry-based course aimed at learning how to trace a phylogeny based on sequences existing in public databases. Computer tools are freely available…
First report on the occurrence of Theileria sp. OT3 in China.

PubMed

Tian, Zhancheng; Liu, Guangyuan; Yin, Hong; Xie, Junren; Wang, Suyan; Yuan, Xiaosong; Wang, Fangfang; Luo, Jin

2014-04-01

Theileria sp. OT3 was first detected and identified in clinically healthy sheep in the Xinjiang Uygur Autonomous Region of China (XUAR) by comparing complete 18S rDNA gene sequences with those available in the GenBank database and by assessing its phylogenetic status based on the internal transcribed spacers (ITS1, ITS2) and the intervening 5.8S coding region of the rRNA gene, using a partitioned multi-locus analysis in BEAST and maximum likelihood analysis in PhyML. Moreover, the findings were confirmed by species-specific PCR for Theileria sp. OT3, and the prevalence of Theileria sp. OT3 was 14.9% in the north of XUAR. This study is the first report on the occurrence of Theileria sp. OT3 in China. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
SNPhylo: a pipeline to construct a phylogenetic tree from huge SNP data.

PubMed

Lee, Tae-Ho; Guo, Hui; Wang, Xiyin; Kim, Changsoo; Paterson, Andrew H

2014-02-26

Phylogenetic trees are widely used for genetic and evolutionary studies in various organisms. Advanced sequencing technology has dramatically enriched data available for constructing phylogenetic trees based on single nucleotide polymorphisms (SNPs). However, massive SNP data makes it difficult to perform reliable analysis, and there has been no ready-to-use pipeline to generate phylogenetic trees from these data. We developed a new pipeline, SNPhylo, to construct phylogenetic trees based on large SNP datasets. The pipeline may enable users to construct a phylogenetic tree from three representative SNP data file formats. In addition, in order to increase reliability of a tree, the pipeline has steps such as removing low quality data and considering linkage disequilibrium. A maximum likelihood method for the inference of phylogeny is also adopted in generation of a tree in our pipeline. Using SNPhylo, users can easily produce a reliable phylogenetic tree from a large SNP data file. Thus, this pipeline can help a researcher focus more on interpretation of the results of analysis of voluminous data sets, rather than manipulations necessary to accomplish the analysis.
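The two data-reduction steps mentioned above (removing low-quality data and considering linkage disequilibrium) can be sketched as follows. This is not SNPhylo code: the function names, the 0/1/2 genotype encoding with `None` for missing calls, and all three thresholds are assumptions chosen for illustration, and the squared genotype correlation is used here as a simple stand-in for an LD measure.

```python
def missing_rate(genotypes):
    """Fraction of samples with no genotype call."""
    return sum(g is None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Minor allele frequency from 0/1/2 alternate-allele counts."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def r_squared(first, second):
    """Squared Pearson correlation of two genotype vectors (simple LD proxy)."""
    pairs = [(a, b) for a, b in zip(first, second) if a is not None and b is not None]
    if not pairs:
        return 0.0
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov * cov / (var_a * var_b)


def filter_and_prune(snps, max_missing=0.1, min_maf=0.1, max_r2=0.8):
    """Drop low-quality SNPs, then drop SNPs in strong LD with one already kept."""
    kept = []
    for name, genotypes in snps:
        if missing_rate(genotypes) > max_missing:
            continue
        if minor_allele_frequency(genotypes) < min_maf:
            continue
        if any(r_squared(genotypes, other) > max_r2 for _, other in kept):
            continue
        kept.append((name, genotypes))
    return [name for name, _ in kept]


# Toy genotype table for ten samples (illustrative data only).
snps = [
    ("snp1", [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]),
    ("snp2", [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]),              # perfectly linked to snp1
    ("snp3", [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]),              # rare allele
    ("snp4", [None, None, None, 1, 2, 0, 1, 2, 0, 1]),     # too many missing calls
    ("snp5", [2, 0, 1, 1, 0, 2, 0, 2, 1, 0]),
]
selected = filter_and_prune(snps)
print(selected)
```

The surviving SNPs would then be concatenated per sample and passed to a tree-building method such as maximum likelihood.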
Transcriptome-wide analysis of WRKY transcription factors in wheat and their leaf rust responsive expression profiling.

PubMed

Satapathy, Lopamudra; Singh, Dharmendra; Ranjan, Prashant; Kumar, Dhananjay; Kumar, Manish; Prabhu, Kumble Vinod; Mukhopadhyay, Kunal

2014-12-01

WRKY, a plant-specific transcription factor family, has important roles in pathogen defense, abiotic cues and phytohormone signaling, yet little is known about the roles and molecular mechanisms of WRKY proteins in response to rust diseases in wheat. We identified 100 TaWRKY sequences using the wheat Expressed Sequence Tag database, of which 22 WRKY sequences were novel. Identified proteins were characterized based on their zinc finger motifs, and phylogenetic analysis clustered them into six clades consisting of class IIc and class III WRKY proteins. Functional annotation revealed major functions in metabolic and cellular processes in control plants, but in response to stimuli, signaling and defense in pathogen-inoculated plants, the major molecular function being DNA binding. Tag-based expression analysis of the identified genes revealed differential expression between mock and Puccinia triticina inoculated wheat near isogenic lines. Gene expression analysis was also performed with six rust-related microarray experiments from the Gene Expression Omnibus database. TaWRKY10, 15, 17 and 56 were common to both tag-based and microarray-based differential expression analyses and could represent rust-specific WRKY genes. The results provide insight into the functional characterization of WRKY transcription factors responsive to leaf rust pathogenesis, which can be used as candidate genes in molecular breeding programs to improve biotic stress tolerance in wheat.
HPMCD: the database of human microbial communities from metagenomic datasets and microbial reference genomes.

PubMed

Forster, Samuel C; Browne, Hilary P; Kumar, Nitin; Hunt, Martin; Denise, Hubert; Mitchell, Alex; Finn, Robert D; Lawley, Trevor D

2016-01-04

The Human Pan-Microbe Communities (HPMC) database (http://www.hpmcd.org/) provides a manually curated, searchable, metagenomic resource to facilitate investigation of human gastrointestinal microbiota. Over the past decade, the application of metagenome sequencing to elucidate the microbial composition and functional capacity present in the human microbiome has revolutionized many concepts in our basic biology. When sufficient high quality reference genomes are available, whole genome metagenomic sequencing can provide direct biological insights and high-resolution classification. The HPMC database provides species level, standardized phylogenetic classification of over 1800 human gastrointestinal metagenomic samples. This is achieved by combining a manually curated list of bacterial genomes from human faecal samples with over 21000 additional reference genomes representing bacteria, viruses, archaea and fungi with manually curated species classification and enhanced sample metadata annotation. A user-friendly, web-based interface provides the ability to search for (i) microbial groups associated with health or disease state, (ii) health or disease states and community structure associated with a microbial group, (iii) the enrichment of a microbial gene or sequence and (iv) enrichment of a functional annotation. The HPMC database enables detailed analysis of human microbial communities and supports research from basic microbiology and immunology to therapeutic development in human health and disease. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

Spatiotemporal Phylogenetic Analysis and Molecular Characterisation of Infectious Bursal Disease Viruses Based on the VP2 Hyper-Variable Region

PubMed Central

Dolz, Roser; Valle, Rosa; Perera, Carmen L.; Bertran, Kateri; Frías, Maria T.; Majó, Natàlia; Ganges, Llilianne; Pérez, Lester J.

2013-01-01

Background: Infectious bursal disease is a highly contagious and acute viral disease caused by the infectious bursal disease virus (IBDV); it affects all major poultry producing areas of the world. The current study was designed to rigorously measure the global phylogeographic dynamics of IBDV strains to gain insight into viral population expansion as well as the emergence, spread and pattern of the geographical structure of very virulent IBDV (vvIBDV) strains. Methodology/Principal Findings: Sequences of the hyper-variable region of the VP2 (HVR-VP2) gene from IBDV strains isolated from diverse geographic locations were obtained from the GenBank database; Cuban sequences were obtained in the current work. All sequences were analysed by Bayesian phylogeographic analysis, implemented in the Bayesian Evolutionary Analysis Sampling Trees (BEAST), Bayesian Tip-association Significance testing (BaTS) and Spatial Phylogenetic Reconstruction of Evolutionary Dynamics (SPREAD) software packages. Selection pressure on the HVR-VP2 was also assessed. The phylogeographic association-trait analysis showed that viruses sampled from individual countries tend to cluster together, suggesting a geographic pattern for IBDV strains. Spatial analysis from this study revealed that strains carrying sequences that were linked to increased virulence of IBDV appeared in Iran in 1981 and spread to Western Europe (Belgium) in 1987, Africa (Egypt) around 1990, East Asia (China and Japan) in 1993, the Caribbean Region (Cuba) by 1995 and South America (Brazil) around 2000. Selection pressure analysis showed that several codons in the HVR-VP2 region were under purifying selection. Conclusions/Significance: To our knowledge, this work is the first study applying the Bayesian phylogeographic reconstruction approach to analyse the emergence and spread of vvIBDV strains worldwide. PMID:23805195
Spatiotemporal Phylogenetic Analysis and Molecular Characterisation of Infectious Bursal Disease Viruses Based on the VP2 Hyper-Variable Region.

PubMed

Alfonso-Morales, Abdulahi; Martínez-Pérez, Orlando; Dolz, Roser; Valle, Rosa; Perera, Carmen L; Bertran, Kateri; Frías, Maria T; Majó, Natàlia; Ganges, Llilianne; Pérez, Lester J

2013-01-01

Infectious bursal disease is a highly contagious and acute viral disease caused by the infectious bursal disease virus (IBDV); it affects all major poultry producing areas of the world. The current study was designed to rigorously measure the global phylogeographic dynamics of IBDV strains to gain insight into viral population expansion as well as the emergence, spread and pattern of the geographical structure of very virulent IBDV (vvIBDV) strains. Sequences of the hyper-variable region of the VP2 (HVR-VP2) gene from IBDV strains isolated from diverse geographic locations were obtained from the GenBank database; Cuban sequences were obtained in the current work. All sequences were analysed by Bayesian phylogeographic analysis, implemented in the Bayesian Evolutionary Analysis Sampling Trees (BEAST), Bayesian Tip-association Significance testing (BaTS) and Spatial Phylogenetic Reconstruction of Evolutionary Dynamics (SPREAD) software packages. Selection pressure on the HVR-VP2 was also assessed. The phylogeographic association-trait analysis showed that viruses sampled from individual countries tend to cluster together, suggesting a geographic pattern for IBDV strains. Spatial analysis from this study revealed that strains carrying sequences that were linked to increased virulence of IBDV appeared in Iran in 1981 and spread to Western Europe (Belgium) in 1987, Africa (Egypt) around 1990, East Asia (China and Japan) in 1993, the Caribbean Region (Cuba) by 1995 and South America (Brazil) around 2000. Selection pressure analysis showed that several codons in the HVR-VP2 region were under purifying selection. To our knowledge, this work is the first study applying the Bayesian phylogeographic reconstruction approach to analyse the emergence and spread of vvIBDV strains worldwide.
Updated Rice Kinase Database RKD 2.0: enabling transcriptome and functional analysis of rice kinase genes.

PubMed

Chandran, Anil Kumar Nalini; Yoo, Yo-Han; Cao, Peijian; Sharma, Rita; Sharma, Manoj; Dardick, Christopher; Ronald, Pamela C; Jung, Ki-Hong

2016-12-01

Protein kinases catalyze the transfer of a phosphate moiety from a phosphate donor to the substrate molecule, thus playing critical roles in cell signaling and metabolism. Although plant genomes contain more than 1000 genes that encode kinases, knowledge is limited about the function of each of these kinases. A major obstacle that hinders progress towards kinase characterization is functional redundancy. To address this challenge, we previously developed the rice kinase database (RKD) that integrated omics-scale data within a phylogenetics context. An updated version of the rice kinase database (RKD) that contains metadata derived from NCBI GEO expression datasets has been developed. RKD 2.0 facilitates in-depth transcriptomic analyses of kinase-encoding genes in diverse rice tissues and in response to biotic and abiotic stresses and hormone treatments. We identified 261 kinases specifically expressed in particular tissues, 130 that are significantly up-regulated in response to biotic stress, 296 in response to abiotic stress, and 260 in response to hormones. Based on this update and Pearson correlation coefficient (PCC) analysis, we estimated that 19 out of 26 genes characterized through loss-of-function studies confer dominant functions. These were selected because they either had paralogous members with PCC values of <0.5 or had no paralog. Compared with the previous version of RKD, RKD 2.0 enables more effective estimations of functional redundancy or dominance because it uses comprehensive expression profiles rather than individual profiles. The integrated analysis of RKD with PCC establishes a single platform for researchers to select rice kinases for functional analyses.
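The selection rule stated above (a gene is a candidate for a dominant function if every paralog has a PCC below 0.5, or if it has no paralog) can be written out as a minimal sketch. The gene names, expression values and function names below are hypothetical, and the code is an illustration of the rule rather than the RKD 2.0 implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def is_predicted_dominant(gene, paralogs, expression, threshold=0.5):
    """Apply the rule: no paralog, or all paralogs with PCC below the threshold."""
    partners = paralogs.get(gene, [])
    if not partners:
        return True
    return all(pearson(expression[gene], expression[p]) < threshold for p in partners)


# Hypothetical expression profiles across five tissues or conditions.
expression = {
    "kinaseA": [1, 2, 3, 4, 5],
    "kinaseB": [2, 4, 6, 8, 10],   # co-expressed with kinaseA
    "kinaseC": [5, 1, 4, 2, 3],
    "kinaseD": [1, 2, 3, 4, 5],
    "kinaseE": [3, 3, 4, 1, 2],
}
paralogs = {
    "kinaseA": ["kinaseB"],
    "kinaseC": ["kinaseD"],
    # kinaseE has no paralog
}

for gene in ("kinaseA", "kinaseC", "kinaseE"):
    print(gene, is_predicted_dominant(gene, paralogs, expression))
```

In this toy data, kinaseA is flagged as potentially redundant because its paralog is tightly co-expressed, while kinaseC (divergent expression) and kinaseE (no paralog) pass the rule.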
EDGAR: A software framework for the comparative analysis of prokaryotic genomes

PubMed Central

Blom, Jochen; Albaum, Stefan P; Doppmeier, Daniel; Pühler, Alfred; Vorhölter, Frank-Jörg; Zakrzewski, Martha; Goesmann, Alexander

2009-01-01

Background: The introduction of next generation sequencing approaches has caused a rapid increase in the number of completely sequenced genomes. As one result of this development, it is now feasible to analyze large groups of related genomes in a comparative approach. A main task in comparative genomics is the identification of orthologous genes in different genomes and the classification of genes as core genes or singletons. Results: To support these studies EDGAR – "Efficient Database framework for comparative Genome Analyses using BLAST score Ratios" – was developed. EDGAR is designed to automatically perform genome comparisons in a high throughput approach. Comparative analyses for 582 genomes across 75 genus groups taken from the NCBI genomes database were conducted with the software and the results were integrated into an underlying database. To demonstrate a specific application case, we analyzed ten genomes of the bacterial genus Xanthomonas, for which phylogenetic studies were awkward due to divergent taxonomic systems. The resultant phylogeny EDGAR provided was consistent with outcomes from traditional approaches performed recently and moreover, it was possible to root each strain with unprecedented accuracy. Conclusion: EDGAR provides novel analysis features and significantly simplifies the comparative analysis of related genomes. The software supports a quick survey of evolutionary relationships and simplifies the process of obtaining new biological insights into the differential gene content of kindred genomes. Visualization features, like synteny plots or Venn diagrams, are offered to the scientific community through a web-based and therefore platform independent user interface, where the precomputed data sets can be browsed. PMID:19457249
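The idea of a BLAST score ratio and the core/singleton classification named above can be sketched as follows. This is a simplified illustration, not EDGAR's implementation: it assumes the ratio is the bit score of a gene's best hit in another genome divided by the bit score of the gene against itself, uses an arbitrary cutoff of 0.3, checks hits in one direction only, and introduces a hypothetical "accessory" label for genes that are neither core nor singleton.

```python
def score_ratio(hit_score, self_score):
    """Bit score of a hit normalised by the gene's self-hit score (range 0 to 1)."""
    return hit_score / self_score


def classify_genes(best_hits, self_scores, other_genomes, cutoff=0.3):
    """Label each gene of a reference genome as core, accessory or singleton.

    best_hits maps gene -> {genome: best-hit bit score}; a missing genome
    means no hit was found there.
    """
    labels = {}
    for gene, self_score in self_scores.items():
        hits = best_hits.get(gene, {})
        matched = {
            genome
            for genome in other_genomes
            if score_ratio(hits.get(genome, 0.0), self_score) >= cutoff
        }
        if len(matched) == len(other_genomes):
            labels[gene] = "core"        # counterpart found in every genome
        elif not matched:
            labels[gene] = "singleton"   # no counterpart in any genome
        else:
            labels[gene] = "accessory"   # counterpart in some genomes only
    return labels


# Hypothetical bit scores for three genes of a reference genome.
self_scores = {"geneA": 500, "geneB": 400, "geneC": 300}
best_hits = {
    "geneA": {"genome2": 450, "genome3": 400},
    "geneB": {"genome2": 60},
    "geneC": {"genome2": 200, "genome3": 30},
}
labels = classify_genes(best_hits, self_scores, ["genome2", "genome3"])
print(labels)
```

Normalising by the self-hit score makes the cutoff comparable across genes of different lengths, which a raw bit score or E-value threshold would not be.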
Compilation of small ribosomal subunit RNA structures.

PubMed Central

Neefs, J M; Van de Peer, Y; De Rijk, P; Chapelle, S; De Wachter, R

1993-01-01

The database on small ribosomal subunit RNA structure contained 1804 nucleotide sequences on April 23, 1993. This number comprises 365 eukaryotic, 65 archaeal, 1260 bacterial, 30 plastidial, and 84 mitochondrial sequences. These are stored in the form of an alignment in order to facilitate the use of the database as input for comparative studies on higher-order structure and for reconstruction of phylogenetic trees. The elements of the postulated secondary structure for each molecule are indicated by special symbols. The database is available on-line directly from the authors by ftp and can also be obtained from the EMBL nucleotide sequence library by electronic mail, ftp, and on CD ROM disk. PMID:8332525
DIMA 3.0: Domain Interaction Map.

PubMed

Luo, Qibin; Pagel, Philipp; Vilne, Baiba; Frishman, Dmitrij

2011-01-01

Domain Interaction MAp (DIMA, available at http://webclu.bio.wzw.tum.de/dima) is a database of predicted and known interactions between protein domains. It integrates 5807 structurally known interactions imported from the iPfam and 3did databases and 46,900 domain interactions predicted by four computational methods: domain phylogenetic profiling, domain pair exclusion algorithm, correlated mutations and domain interaction prediction in a discriminative way. Additionally, predictions are filtered to exclude those domain pairs that are reported as non-interacting by the Negatome database. The DIMA Web site allows users to calculate domain interaction networks either for a domain of interest or for entire organisms, and to explore them interactively using the Flash-based Cytoscape Web software.
New high-throughput approach to study ancient microbial phylogenetic diversity in permafrost

NASA Astrophysics Data System (ADS)

Spirina, E.; Cole, J.; Chai, B.; Gilichinksy, D.; Tiedje, J.

2003-04-01

The study of microbial diversity in deep ancient permafrost can help to answer many questions: (1) what kinds of mechanisms keep microbial cells alive, (2) how many phylogenetic groups exist in situ that have never been cultivated, and (3) what is the difference between modern and ancient microorganisms? To address these questions, distinct environments were examined: Arctic and Antarctic modern soil and permafrost. 16S rDNA genes were amplified from genomic DNA extracted from both the original frozen samples and the same samples incubated at 10 °C for 8 weeks under both aerobic and anaerobic conditions to determine which organisms were capable of growth. High-throughput DNA sequencing was performed on the cloned PCR products to obtain partial 16S rDNA gene sequences. A custom script was written to automatically compare over 2,000 partial sequences with the rrn sequences in the Ribosomal Database Project (RDP) release 8.1 using SEQUENCE MATCH. Sequences were grouped into categories from the RDP's phylogenetic hierarchy based on the closest database matches. The investigation revealed significant microbial diversity; two phylogenetic groups were predominant in all samples: Proteobacteria and Gram Positive Bacteria. Microbial community composition within those groups differed from sample to sample. However, similar genera, such as Arthrobacter, Bacillus, Citrobacter, Caulobacter, Comamonas, Flavobacterium, Nocardioides, Pseudomonas, Rhodocyclus, Rhodococcus, Sphingobacterium, Sphingomonas, Streptococcus and Terrabacter, appeared in both polar regions. The greatest microbial diversity was detected in Arctic surface samples. According to the RDP's phylogenetic hierarchy, those organisms are related to Proteobacteria_SD, Gram Positive Bacteria_SD, Leptospirillum-Nitrospira, Nitrospina_SD, Flexibacter-Cytophaga-Bacteroides, and Planctomyces and Relatives. Both the aerobic and anaerobic low-temperature soil incubations yielded some microbes not detected in the original samples.
By comparing the phylogenetic diversity of the same organisms from modern top layers with that from layers several million years old, it should be possible to find out how members of the same species differ as we go back in time. Then, if we compare those mutation rates with geological time, we can speculate on how fast or slow evolution or adaptation takes place for that particular type of organism. This is the beginning of studies of biological clocks extending back over the duration of the permanently frozen state in terrestrial and extraterrestrial soils, i.e. the age of the biota.
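The grouping step described in this record (assigning each partial sequence to a category of the phylogenetic hierarchy according to its closest database match) can be sketched briefly. The authors' script is not available here, so this is only an illustration: it assumes the closest matches have already been retrieved, and the clone identifiers, lineage lists and function name are hypothetical. It does not call SEQUENCE MATCH or any RDP service.

```python
from collections import Counter


def group_by_category(closest_matches, level=0):
    """Count clones per hierarchy category at the chosen depth of the lineage.

    closest_matches maps a clone identifier to the lineage of its closest
    database match, ordered from the broadest group to the genus.
    """
    counts = Counter()
    for lineage in closest_matches.values():
        category = lineage[level] if len(lineage) > level else "unclassified"
        counts[category] += 1
    return counts


# Hypothetical closest matches for five clones from one sample.
closest_matches = {
    "clone_001": ["Proteobacteria", "Pseudomonas"],
    "clone_002": ["Proteobacteria", "Sphingomonas"],
    "clone_003": ["Gram Positive Bacteria", "Arthrobacter"],
    "clone_004": ["Gram Positive Bacteria", "Bacillus"],
    "clone_005": ["Flexibacter-Cytophaga-Bacteroides", "Flavobacterium"],
}

by_group = group_by_category(closest_matches, level=0)
by_genus = group_by_category(closest_matches, level=1)
for category, count in by_group.most_common():
    print(category, count)
```

Running the same tally per sample gives community profiles that can be compared between Arctic and Antarctic material or between original and incubated samples.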
Evaluation of protein spectra cluster analysis for Streptococcus spp. identification from various swine clinical samples.

PubMed

Matajira, Carlos E C; Moreno, Luisa Z; Gomes, Vasco T M; Silva, Ana Paula S; Mesquita, Renan E; Doto, Daniela S; Calderaro, Franco F; de Souza, Fernando N; Christ, Ana Paula G; Sato, Maria Inês Z; Moreno, Andrea M

2017-03-01

Traditional microbiological methods enable genus-level identification of Streptococcus spp. isolates. However, as the species of this genus show broad phenotypic variation, species-level identification or even differentiation within the genus is difficult. Herein we report the evaluation of protein spectra cluster analysis for the identification of Streptococcus species associated with disease in swine by means of matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). A total of 250 S. suis-like isolates obtained from pigs with clinical signs of encephalitis, arthritis, pneumonia, metritis, and urinary or septicemic infection were studied. The isolates came from pigs in different Brazilian states from 2001 to 2014. The MALDI-TOF MS analysis identified 86% (215 of 250) as S. suis and 14% (35 of 250) as S. alactolyticus, S. dysgalactiae, S. gallinaceus, S. gallolyticus, S. gordonii, S. henryi, S. hyointestinalis, S. hyovaginalis, S. mitis, S. oralis, S. pluranimalium, and S. sanguinis. The MALDI-TOF MS identification was confirmed in 99.2% of the isolates by 16S rDNA sequencing, with MALDI-TOF MS misidentifying 2 S. pluranimalium as S. hyovaginalis. Isolates were also tested by a biochemical automated system that correctly identified all isolates of 8 of the 10 species in the database. Neither the isolates of the 3 species not in the database (S. gallinaceus, S. henryi, and S. hyovaginalis) nor the isolates of 2 species that were in the database (S. oralis and S. pluranimalium) could be identified. The topology of the protein spectra cluster analysis appears to sustain the species phylogenetic similarities, further supporting identification by MALDI-TOF MS examination as a rapid and accurate alternative to 16S rDNA sequencing.
[Ultrastructural observation on nymphal Armillifer sp. by scanning electron microscopy and phylogenetic analysis based on 18S rRNA].

PubMed

Li, Jian; Shi, Yun-Liang; Shi, Wei; Fang, Fang; Zhou, Qing-An; Li, Wen-Wen; He, Guo-Sheng; Huang, Wei-Yi

2012-04-30

The aim was to observe the ultrastructure of nymphal Armillifer sp. isolated from Macaca fascicularis by scanning electron microscopy (SEM) and to analyze its phylogenetic relationships based on 18S rRNA gene sequences. Parasite samples stored in 70% alcohol were fixed with glutaraldehyde and osmium peroxide. Ultrastructural characters of the samples were observed under SEM. The 18S rRNA gene was amplified and sequenced following extraction of total genomic DNA. Sequences were aligned using ClustalX 1.83, and phylogenetic analysis was performed by the Neighbor-Joining method using MEGA 4.0. The nymphs were cylindrical, with a slightly claviform body tapering toward the posterior end. The abdominal annuli gradually widened from anterior to posterior, with the 12th and 13th abdominal annuli similar in width. The annuli were closely spaced in the anterior half of the body, whereas in the posterior part there were gaps between them. The circular mouth was located ventrally in the middle of the head. Folds were seen on the inner margin of the mouth, with a pair of curved hooks on both sides above it, arranged practically in a straight line. Two pairs of large sensory papillae were observed symmetrically over the last thoracic annulus of the cephalothorax lying below the outer hook, and the first abdominal annulus was near the median ventral line. The number of abdominal annuli was 29, not including 2 incomplete terminal annuli. Rounded sensory papillae were distributed over the whole body surface, except the dorsal side of the head and the ventral part of the terminal annulus. An agglomerate-like anal opening was observed at the end of the ventral abdominal annuli and was distinctly sub-terminal. These morphological features showed that the nymphs were highly similar to those of Armillifer moniliformis Diesing, 1835.
A fragment of the 18S rRNA gene (1,836 bp) was obtained by PCR and sequencing and was deposited in the GenBank database under accession number HM048870. The phylogenetic tree indicated that A. moniliformis, A. agkistrodon and A. armillatus formed one clade with a bootstrap value of 95%, and that A. moniliformis and A. agkistrodon formed a clade of their own with a bootstrap value of 75%. The nymphs isolated from Macaca fascicularis are provisionally identified as A. moniliformis.
Metabolic Pathway Assignment of Plant Genes based on Phylogenetic Profiling–A Feasibility Study

PubMed Central

Weißenborn, Sandra; Walther, Dirk

2017-01-01

Despite many developed experimental and computational approaches, functional gene annotation remains challenging. With the rapidly growing number of sequenced genomes, the concept of phylogenetic profiling, which predicts functional links between genes that share a common co-occurrence pattern across different genomes, has gained renewed attention as it promises to annotate gene functions based on presence/absence calls alone. We applied phylogenetic profiling to the problem of metabolic pathway assignments of plant genes with a particular focus on secondary metabolism pathways. We determined phylogenetic profiles for 40,960 metabolic pathway enzyme genes with assigned EC numbers from 24 plant species based on sequence and pathway annotation data from KEGG and Ensembl Plants. For gene sequence family assignments, needed to determine the presence or absence of particular gene functions in the given plant species, we included data of all 39 species available at the Ensembl Plants database and established gene families based on pairwise sequence identities and annotation information. Aside from performing profiling comparisons, we used machine learning approaches to predict pathway associations from phylogenetic profiles alone. Selected metabolic pathways were indeed found to be composed of gene families of greater than expected phylogenetic profile similarity. This was particularly evident for primary metabolism pathways, whereas for secondary pathways, both the available annotation in different species as well as the abstraction of functional association via distinct pathways proved limiting. While phylogenetic profile similarity was generally not found to correlate with gene co-expression, direct physical interactions of proteins were reflected by a significantly increased profile similarity suggesting an application of phylogenetic profiling methods as a filtering step in the identification of protein-protein interactions. 
This feasibility study highlights the potential and challenges associated with phylogenetic profiling methods for the detection of functional relationships between genes as well as the need to enlarge the set of plant genes with proven secondary metabolism involvement as well as the limitations of distinct pathways as abstractions of relationships between genes. PMID:29163570
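To make the idea of phylogenetic profile similarity concrete, here is a minimal sketch that compares presence/absence profiles with the Jaccard index. The abstract does not state which similarity measure the authors used, so the Jaccard index is an assumption, and the gene family names and profiles are hypothetical.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard index of two equal-length presence/absence (0/1) profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same set of genomes")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


# Hypothetical gene families scored across six genomes (1 = present).
profiles = {
    "famA": [1, 1, 0, 1, 0, 1],
    "famB": [1, 1, 0, 1, 0, 0],
    "famC": [0, 0, 1, 0, 1, 0],
}

sim_ab = jaccard_similarity(profiles["famA"], profiles["famB"])
sim_ac = jaccard_similarity(profiles["famA"], profiles["famC"])
print(sim_ab, sim_ac)
```

Families that co-occur across genomes (famA and famB here) score high, while families with complementary profiles score zero.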
Missing Data and Influential Sites: Choice of Sites for Phylogenetic Analysis Can Be As Important As Taxon Sampling and Model Choice

PubMed Central

Shavit Grievink, Liat; Penny, David; Holland, Barbara R.

2013-01-01

Phylogenetic studies based on molecular sequence alignments are expected to become more accurate as the number of sites in the alignments increases. With the advent of genomic-scale data, where alignments have very large numbers of sites, bootstrap values close to 100% and posterior probabilities close to 1 are the norm, suggesting that the number of sites is now seldom a limiting factor on phylogenetic accuracy. This provokes the question, should we be fussy about the sites we choose to include in a genomic-scale phylogenetic analysis? If some sites contain missing data, ambiguous character states, or gaps, then why not just throw them away before conducting the phylogenetic analysis? Indeed, this is exactly the approach taken in many phylogenetic studies. Here, we present an example where the decision on how to treat sites with missing data is of equal importance to decisions on taxon sampling and model choice, and we introduce a graphical method for illustrating this. PMID:23471508
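The "throw them away" strategy discussed in this abstract can be sketched as a column filter over an alignment. The set of symbols treated as missing and the toy alignment are assumptions made for illustration; this is not the authors' code.

```python
MISSING = set("-?NnXx")  # assumed symbols for gaps and ambiguous states


def drop_incomplete_sites(alignment):
    """Keep only alignment columns in which no sequence has a missing symbol."""
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if not MISSING.intersection(col)]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


# Toy alignment with one gap and one ambiguous base (hypothetical data).
toy = {
    "taxon1": "ACGT-A",
    "taxon2": "ACNTTA",
    "taxon3": "AGGTTA",
}
filtered = drop_incomplete_sites(toy)
print(filtered)
```

Two of the six columns are discarded here, which illustrates how quickly complete-site filtering can shrink an alignment.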
A composite molecular phylogeny of living lemuroid primates.

PubMed

DelPero, Massimiliano; Pozzi, Luca; Masters, Judith C

2006-01-01

Lemuroid phylogeny is a source of lively debate among primatologists. Reconstructions based on morphological, physiological, behavioural and molecular data have yielded a diverse array of tree topologies with few nodes in common. In the last decade, molecular phylogenetic studies have grown in popularity, and a wide range of sequences has been brought to bear on the problem, but consensus has remained elusive. We present an analysis based on a composite molecular data set of approx. 6,400 bp assembled from the National Center for Biotechnology Information (NCBI) database, including both mitochondrial and nuclear genes, and diverse analytical methods. Our analysis consolidates some of the nodes that were insecure in previous reconstructions, but is still equivocal on the placement of some taxa. We conducted a similar analysis of a composite data set of approx. 3,600 bp to investigate the controversial relationships within the family Lemuridae. Here our analysis was more successful; only the position of Eulemur coronatus remained uncertain. Copyright 2006 S. Karger AG, Basel.
Assessing the diversity of AM fungi in arid gypsophilous plant communities.

PubMed

Alguacil, M M; Roldán, A; Torres, M P

2009-10-01

In the present study, we used PCR-Single-Stranded Conformation Polymorphism (SSCP) techniques to analyse arbuscular mycorrhizal fungi (AMF) communities in four sites within a 10 km² gypsum area in Southern Spain. Four common plant species from these ecosystems were selected. The AM fungal small-subunit (SSU) rRNA genes were subjected to PCR, cloning, SSCP analysis, sequencing and phylogenetic analyses. A total of 1443 SSU rRNA sequences were analysed, for 21 AM fungal types: 19 belonged to the genus Glomus, 1 to the genus Diversispora and 1 to the genus Scutellospora. Four sequence groups were identified, which showed high similarity to sequences of known glomalean species or isolates: Glo G18 to Glomus constrictum, Glo G1 to Glomus intraradices, Glo G16 to Glomus clarum, Scut to Scutellospora dipurpurescens and Div to one new genus in the family Diversisporaceae identified recently as Otospora bareai. There were three sequence groups that received strong support in the phylogenetic analysis, and did not seem to be related to any sequences of AM fungi in culture or previously found in the database; thus, they could be novel taxa within the genus Glomus: Glo G4, Glo G2 and Glo G14. We have detected the presence of both generalist and potential specialist AMF in gypsum ecosystems. The AMF communities differed among the plant species studied, suggesting some degree of preference in the interactions between these symbionts.
Isolation, Phylogenetic Analysis and Anti-infective Activity Screening of Marine Sponge-Associated Actinomycetes

PubMed Central

Abdelmohsen, Usama Ramadan; Pimentel-Elardo, Sheila M.; Hanora, Amro; Radwan, Mona; Abou-El-Ela, Soad H.; Ahmed, Safwat; Hentschel, Ute

2010-01-01

Terrestrial actinomycetes are noteworthy producers of a multitude of antibiotics; however, the marine representatives are much less studied in this regard. In this study, 90 actinomycetes were isolated from 11 different species of marine sponges that had been collected from offshore Ras Mohamed (Egypt) and from Rovinj (Croatia). Phylogenetic characterization of the isolates based on 16S rRNA gene sequencing supported their assignment to 18 different actinomycete genera representing seven different suborders. Fourteen putatively novel species were identified based on sequence similarity values below 98.2% to other strains in the NCBI database. A putative new genus related to Rubrobacter was isolated on M1 agar that had been amended with sponge extract, thus highlighting the need for innovative cultivation protocols. Testing for anti-infective activities was performed against clinically relevant, Gram-positive (Enterococcus faecalis, Staphylococcus aureus) and Gram-negative (Escherichia coli, Pseudomonas aeruginosa) bacteria, fungi (Candida albicans) and human parasites (Leishmania major, Trypanosoma brucei). Bioactivities against these pathogens were documented for 10 actinomycete isolates. These results show a high diversity of actinomycetes associated with marine sponges and highlight their potential to produce anti-infective agents. PMID:20411105
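The similarity cutoff used to flag putatively novel species can be expressed as a simple filter. Only the 98.2% threshold comes from the abstract; the isolate names and best-hit similarity values below are hypothetical.

```python
NOVELTY_THRESHOLD = 98.2  # percent 16S rRNA similarity, from the abstract


def putatively_novel(best_hits, threshold=NOVELTY_THRESHOLD):
    """Return isolate names whose best database hit falls below the threshold."""
    return sorted(name for name, pct in best_hits.items() if pct < threshold)


# Hypothetical isolates and best-hit similarities (not data from the study).
best_hits = {
    "isolate_01": 99.6,
    "isolate_02": 97.4,
    "isolate_03": 98.2,
    "isolate_04": 95.1,
}
novel = putatively_novel(best_hits)
print(novel)
```

An isolate sitting exactly at the threshold is not flagged, matching the "below 98.2%" wording.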
MALDI-TOF mass spectrometry as a tool for differentiation of Bradyrhizobium species: application to the identification of Lupinus nodulating strains.

PubMed

Sánchez-Juanes, Fernando; Ferreira, Laura; Alonso de la Vega, Pablo; Valverde, Angel; Barrios, Milagros León; Rivas, Raúl; Mateos, Pedro F; Martínez-Molina, Eustoquio; González-Buitrago, José Manuel; Trujillo, Martha E; Velázquez, Encarna

2013-12-01

The genus Bradyrhizobium includes slow-growing bacteria able to nodulate different legumes as well as species isolated from plant tumours. The slow growth of the members of this genus and the phylogenetic closeness of most of its species make their identification difficult. In the present work we applied, for the first time, Matrix-Assisted Laser Desorption Ionization-Time-of-Flight Mass Spectrometry (MALDI-TOF MS) to the analysis of Bradyrhizobium species after extending the MALDI Biotyper 2.0 database with the currently valid species of this genus. With this methodology it was possible to identify strains belonging to phylogenetically closely related species of the genus Bradyrhizobium, allowing discrimination among species with rrs gene identities higher than 99%. The application of MALDI-TOF MS to strains isolated from nodules of different Lupinus species in diverse geographical locations allowed their correct identification when compared with the results of rrs gene and ITS analyses. The nodulation of Lupinus gredensis, an endemic species of the west of Spain, by B. canariense supports the European origin of this species. Copyright © 2013. Published by Elsevier GmbH.
Comprehensive evolutionary and phylogenetic analysis of Hepacivirus N (HNV).

PubMed

da Silva, M S; Junqueira, D M; Baumbach, L F; Cibulski, S P; Mósena, A C S; Weber, M N; Silveira, S; de Moraes, G M; Maia, R D; Coimbra, V C S; Canal, C W

2018-05-24

Hepaciviruses (HVs) have been detected in several domestic and wild animals and present high genetic diversity. The current classification divides the genus Hepacivirus into 14 species (A-N), according to their phylogenetic relationships, including the bovine hepacivirus [Hepacivirus N (HNV)]. In this study, we confirmed HNV circulation in Brazil and sequenced the whole genome of two strains. Based on the current classification of HCV, which is divided into genotypes and subtypes, we analysed all available bovine hepacivirus sequences in the GenBank database and proposed an HNV classification. All of the sequences were grouped into a single genotype, putatively named 'genotype 1'. This genotype can be clearly divided into four subtypes: A and D containing sequences from Germany and Brazil, respectively, and B and C containing Ghanaian sequences. In addition, the NS3-coding region was used to estimate the time to the most recent common ancestor (TMRCA) of each subtype, using a Bayesian approach and a relaxed molecular clock model. The analyses indicated a common origin of the virus circulating in Germany and Brazil. Ghanaian sequences seemed to have an older TMRCA, indicating a long time of circulation of these viruses in the African continent.
Single sample resolution of rare microbial dark matter in a marine invertebrate metagenome

DOE Office of Scientific and Technical Information (OSTI.GOV)

Miller, Ian J.; Weyna, Theodore R.; Fong, Stephen S.

Direct, untargeted sequencing of environmental samples (metagenomics) and de novo genome assembly enable the study of uncultured and phylogenetically divergent organisms. However, separating individual genomes from a mixed community has often relied on the differential-coverage analysis of multiple, deeply sequenced samples. In the metagenomic investigation of the marine bryozoan Bugula neritina, we uncovered seven bacterial genomes associated with a single B. neritina individual that appeared to be transient associates, two of which were unique to one individual and undetectable using certain “universal” 16S rRNA primers and probes. We recovered high quality genome assemblies for several rare instances of “microbial dark matter,” or phylogenetically divergent bacteria lacking genomes in reference databases, from a single tissue sample that was not subjected to any physical or chemical pre-treatment. One of these rare, divergent organisms has a small (593 kbp), poorly annotated genome with low GC content (20.9%) and a 16S rRNA gene with just 65% sequence similarity to the closest reference sequence. Lastly, our findings illustrate the importance of sampling strategy and de novo assembly of metagenomic reads to understand the extent and function of bacterial biodiversity.
Genome-wide identification and analysis of the chicken basic helix-loop-helix factors.

PubMed

Liu, Wu-Yi; Zhao, Chun-Jiang

2010-01-01

Members of the basic helix-loop-helix (bHLH) family of transcription factors play important roles in a wide range of developmental processes. In this study, we conducted a genome-wide survey using the chicken (Gallus gallus) genomic database, and identified 104 bHLH sequences belonging to 42 gene families in an effort to characterize the chicken bHLH transcription factor family. Phylogenetic analyses revealed that chicken has 50, 21, 15, 4, 8, and 3 bHLH members in groups A, B, C, D, E, and F, respectively, while three members belonging to none of these groups were classified as "orphans". A comparison between chicken and human bHLH repertoires suggested that both organisms have a number of lineage-specific bHLH members in the proteomes. Chromosome distribution patterns and phylogenetic analyses strongly suggest that the bHLH members should have arisen through gene duplication at an early date. Gene Ontology (GO) enrichment statistics showed 51 top GO annotations of biological processes counted in the frequency. The present study deepens our understanding of the chicken bHLH transcription factor family and provides much useful information for further studies using chicken as a model system.
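The group counts reported above can be reconciled with the overall total of 104 sequences. The sketch uses only numbers from the abstract; the variable names are illustrative.

```python
# Group sizes as reported in the abstract.
group_counts = {"A": 50, "B": 21, "C": 15, "D": 4, "E": 8, "F": 3}
orphans = 3  # members assigned to none of the groups

grouped_total = sum(group_counts.values())
overall_total = grouped_total + orphans

print(grouped_total, overall_total)
```

The six groups account for 101 members, and the three orphans bring the total to the 104 sequences identified.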
Single sample resolution of rare microbial dark matter in a marine invertebrate metagenome

DOE PAGES

Miller, Ian J.; Weyna, Theodore R.; Fong, Stephen S.; ...

2016-09-29

Direct, untargeted sequencing of environmental samples (metagenomics) and de novo genome assembly enable the study of uncultured and phylogenetically divergent organisms. However, separating individual genomes from a mixed community has often relied on the differential-coverage analysis of multiple, deeply sequenced samples. In the metagenomic investigation of the marine bryozoan Bugula neritina, we uncovered seven bacterial genomes associated with a single B. neritina individual that appeared to be transient associates, two of which were unique to one individual and undetectable using certain “universal” 16S rRNA primers and probes. We recovered high quality genome assemblies for several rare instances of “microbial dark matter,” or phylogenetically divergent bacteria lacking genomes in reference databases, from a single tissue sample that was not subjected to any physical or chemical pre-treatment. One of these rare, divergent organisms has a small (593 kbp), poorly annotated genome with low GC content (20.9%) and a 16S rRNA gene with just 65% sequence similarity to the closest reference sequence. Lastly, our findings illustrate the importance of sampling strategy and de novo assembly of metagenomic reads to understand the extent and function of bacterial biodiversity.
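GC content, one of the genome statistics quoted in this abstract, is the percentage of G and C bases among the unambiguous bases of a sequence. A minimal sketch follows; the toy contig is hypothetical and far shorter than the 593 kbp genome described.

```python
def gc_content(sequence):
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        return 0.0
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Hypothetical AT-rich contig, for illustration only.
toy_contig = "ATATTAGCATTAAATTTACG"
toy_gc = gc_content(toy_contig)
print(toy_gc)
```

Ambiguous symbols such as N are excluded from the denominator, which is one common convention but not the only one.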
Pulotu: Database of Austronesian Supernatural Beliefs and Practices

PubMed Central

Watts, Joseph; Sheehan, Oliver; Greenhill, Simon J.; Gomes-Ng, Stephanie; Atkinson, Quentin D.; Bulbulia, Joseph; Gray, Russell D.

2015-01-01

Scholars have debated naturalistic theories of religion for thousands of years, but only recently have scientists begun to test predictions empirically. Existing databases contain few variables on religion, and are subject to Galton’s Problem because they do not sufficiently account for the non-independence of cultures or systematically differentiate the traditional states of cultures from their contemporary states. Here we present Pulotu: the first quantitative cross-cultural database purpose-built to test evolutionary hypotheses of supernatural beliefs and practices. The Pulotu database documents the remarkable diversity of the Austronesian family of cultures, which originated in Taiwan, spread west to Madagascar and east to Easter Island–a region covering over half the world’s longitude. The focus of Austronesian beliefs ranges from localised ancestral spirits to powerful creator gods. A wide range of practices also exists, such as headhunting, elaborate tattooing, and the construction of impressive monuments. Pulotu is freely available, currently contains 116 cultures, and has 80 variables describing supernatural beliefs and practices, as well as social and physical environments. One major advantage of Pulotu is that it has separate sections on the traditional states of cultures, the post-contact history of cultures, and the contemporary states of cultures. A second major advantage is that cultures are linked to a language-based family tree, enabling the use of phylogenetic methods, which can be used to address Galton’s Problem by accounting for common ancestry, to infer deep prehistory, and to model patterns of trait evolution over time. We illustrate the power of phylogenetic methods by performing an ancestral state reconstruction on the Pulotu variable “headhunting”, finding evidence that headhunting was practiced in proto-Austronesian culture. 
Quantitative cross-cultural databases explicitly linking cultures to a phylogeny have the potential to revolutionise the field of comparative religious studies in the same way that genetic databases have revolutionised the field of evolutionary biology. PMID:26398231
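As a rough illustration of ancestral state reconstruction on a tree, the sketch below applies Fitch parsimony to a binary trait. The abstract does not say which reconstruction method was used, so Fitch parsimony is a deliberately simple stand-in, and the tree shape, culture labels, and trait values are all hypothetical.

```python
def fitch(node, tip_states):
    """Fitch parsimony on a nested-tuple binary tree.

    Returns (set of most-parsimonious states at this node, number of changes).
    """
    if isinstance(node, str):  # a tip: its state is observed
        return {tip_states[node]}, 0
    left, right = node
    left_set, left_changes = fitch(left, tip_states)
    right_set, right_changes = fitch(right, tip_states)
    shared = left_set & right_set
    if shared:
        return shared, left_changes + right_changes
    # No shared state: take the union and count one extra change.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical language-based tree of five cultures; 1 = trait present.
tree = ((("A", "B"), "C"), ("D", "E"))
tips = {"A": 1, "B": 1, "C": 1, "D": 0, "E": 1}
root_states, changes = fitch(tree, tips)
print(root_states, changes)
```

In this toy case the trait is inferred as present at the root, with a single loss needed to explain culture D.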

Pulotu: Database of Austronesian Supernatural Beliefs and Practices.

PubMed

Watts, Joseph; Sheehan, Oliver; Greenhill, Simon J; Gomes-Ng, Stephanie; Atkinson, Quentin D; Bulbulia, Joseph; Gray, Russell D

2015-01-01

Scholars have debated naturalistic theories of religion for thousands of years, but only recently have scientists begun to test predictions empirically. Existing databases contain few variables on religion, and are subject to Galton's Problem because they do not sufficiently account for the non-independence of cultures or systematically differentiate the traditional states of cultures from their contemporary states. Here we present Pulotu: the first quantitative cross-cultural database purpose-built to test evolutionary hypotheses of supernatural beliefs and practices. The Pulotu database documents the remarkable diversity of the Austronesian family of cultures, which originated in Taiwan, spread west to Madagascar and east to Easter Island-a region covering over half the world's longitude. The focus of Austronesian beliefs ranges from localised ancestral spirits to powerful creator gods. A wide range of practices also exists, such as headhunting, elaborate tattooing, and the construction of impressive monuments. Pulotu is freely available, currently contains 116 cultures, and has 80 variables describing supernatural beliefs and practices, as well as social and physical environments. One major advantage of Pulotu is that it has separate sections on the traditional states of cultures, the post-contact history of cultures, and the contemporary states of cultures. A second major advantage is that cultures are linked to a language-based family tree, enabling the use of phylogenetic methods, which can be used to address Galton's Problem by accounting for common ancestry, to infer deep prehistory, and to model patterns of trait evolution over time. We illustrate the power of phylogenetic methods by performing an ancestral state reconstruction on the Pulotu variable "headhunting", finding evidence that headhunting was practiced in proto-Austronesian culture. 
Quantitative cross-cultural databases explicitly linking cultures to a phylogeny have the potential to revolutionise the field of comparative religious studies in the same way that genetic databases have revolutionised the field of evolutionary biology.
LRR-RLK family from two Citrus species: genome-wide identification and evolutionary aspects.

PubMed

Magalhães, Diogo M; Scholte, Larissa L S; Silva, Nicholas V; Oliveira, Guilherme C; Zipfel, Cyril; Takita, Marco A; De Souza, Alessandra A

2016-08-12

Leucine-rich repeat receptor-like kinases (LRR-RLKs) represent the largest subfamily of plant RLKs. The functions of most LRR-RLKs have remained undiscovered, and a few that have been experimentally characterized have been shown to have important roles in growth and development as well as in defense responses. Although RLK subfamilies have been previously studied in many plants, no comprehensive study has been performed on this gene family in Citrus species, which have high economic importance and are frequent targets for emerging pathogens. In this study, we performed in silico analysis to identify and classify LRR-RLK homologues in the predicted proteomes of Citrus clementina (clementine) and Citrus sinensis (sweet orange). In addition, we used large-scale phylogenetic approaches to elucidate the evolutionary relationships of the LRR-RLKs and further narrowed the analysis to the LRR-XII group, which contains several previously described cell surface immune receptors. We built integrative protein signature databases for Citrus clementina and Citrus sinensis using all predicted protein sequences obtained from whole genomes. A total of 300 and 297 proteins were identified as LRR-RLKs in C. clementina and C. sinensis, respectively. Maximum-likelihood phylogenetic trees were estimated using Arabidopsis LRR-RLK as a template and they allowed us to classify Citrus LRR-RLKs into 16 groups. The LRR-XII group showed a remarkable expansion, containing approximately 150 paralogs encoded in each Citrus genome. Phylogenetic analysis also demonstrated the existence of two distinct LRR-XII clades, each one constituted mainly by RD and non-RD kinases. We identified 68 orthologous pairs from the C. clementina and C. sinensis LRR-XII genes. In addition, among the paralogs, we identified a subset of 78 and 62 clustered genes probably derived from tandem duplication events in the genomes of C. clementina and C. sinensis, respectively. 
This work provided the first comprehensive evolutionary analysis of the LRR-RLKs in Citrus. A large expansion of LRR-XII in Citrus genomes suggests that it might play a key role in adaptive responses in host-pathogen co-evolution, related to the perennial life cycle and domestication of the citrus crop species.
New Insights into the Diversity of the Genus Faecalibacterium.

PubMed

Benevides, Leandro; Burman, Sriti; Martin, Rebeca; Robert, Véronique; Thomas, Muriel; Miquel, Sylvie; Chain, Florian; Sokol, Harry; Bermudez-Humaran, Luis G; Morrison, Mark; Langella, Philippe; Azevedo, Vasco A; Chatel, Jean-Marc; Soares, Siomar

2017-01-01

Faecalibacterium prausnitzii is a commensal bacterium, ubiquitous in the gastrointestinal tracts of animals and humans. This species is a functionally important member of the microbiota and studies suggest it has an impact on the physiology and health of the host. F. prausnitzii is the only identified species in the genus Faecalibacterium, but a recent study clustered strains of this species in two different phylogroups. Here, we propose the existence of distinct species in this genus through the use of comparative genomics. Briefly, we performed analyses of 16S rRNA gene phylogeny, phylogenomics, whole genome Multi-Locus Sequence Typing (wgMLST), Average Nucleotide Identity (ANI), gene synteny, and pangenome to better elucidate the phylogenetic relationships among strains of Faecalibacterium. For this, we used 12 newly sequenced, assembled, and curated genomes of F. prausnitzii, which were isolated from feces of healthy volunteers from France and Australia, and combined these with published data from 5 strains downloaded from public databases. The phylogenetic analysis of the 16S rRNA sequences, together with the wgMLST profiles and a phylogenomic tree based on comparisons of genome similarity, all supported the clustering of Faecalibacterium strains in different genospecies. Additionally, the global analysis of gene synteny among all strains showed a highly fragmented profile, whereas the intra-cluster analyses revealed larger and more conserved collinear blocks. Finally, ANI analysis substantiated the presence of three distinct clusters (A, B, and C) composed of five, four, and four strains, respectively. The pangenome analysis of each cluster corroborated the classification of these clusters into three distinct species, each containing less variability than that found within the global pangenome of all strains. 
Here, we propose that comparison of pangenome subsets and their associated α values may be used as an alternative approach, together with ANI, in the in silico classification of new species. Altogether, our results provide evidence not only for the reconsideration of the phylogenetic and genomic relatedness among strains currently assigned to F. prausnitzii, but also the need for lineage (strain-based) differentiation of this taxon to better define how specific members might be associated with positive or negative host interactions.
ESTs and EST-linked polymorphisms for genetic mapping and phylogenetic reconstruction in the guppy, Poecilia reticulata

PubMed Central

Dreyer, Christine; Hoffmann, Margarete; Lanz, Christa; Willing, Eva-Maria; Riester, Markus; Warthmann, Norman; Sprecher, Andrea; Tripathi, Namita; Henz, Stefan R; Weigel, Detlef

2007-01-01

Background: The guppy, Poecilia reticulata, is a well-known model organism for studying inheritance and variation of male ornamental traits as well as adaptation to different river habitats. However, genomic resources for studying this important model were not previously widely available. Results: With the aim of generating molecular markers for genetic mapping of the guppy, cDNA libraries were constructed from embryos and different adult organs to generate expressed sequence tags (ESTs). About 18,000 ESTs were annotated according to BLASTN and BLASTX results and the sequence information from the 3' UTRs was exploited to generate PCR primers for re-sequencing of genomic DNA from different wild type strains. By comparison of EST-linked genomic sequences from at least four different ecotypes, about 1,700 polymorphisms were identified, representing about 400 distinct genes. Two interconnected MySQL databases were built to organize the ESTs and markers, respectively. A robust phylogeny of the guppy was reconstructed, based on 10 different nuclear genes. Conclusion: Our EST and marker databases provide useful tools for genetic mapping and phylogenetic studies of the guppy. PMID:17686157
Bioinformatic analysis of the nucleotide binding site-encoding disease-resistance genes in foxtail millet (Setaria italica (L.) Beauv.).

PubMed

Zhu, Y B; Xie, X Q; Li, Z Y; Bai, H; Dong, L; Dong, Z P; Dong, J G

2014-08-28

The nucleotide-binding site (NBS) disease-resistance genes are the largest category of plant disease-resistance gene analogs. The complete set of disease-resistant candidate genes, which encode the NBS sequence, was filtered in the genomes of two varieties of foxtail millet (Yugu1 and 'Zhang gu'). This study investigated a number of characteristics of the putative NBS genes, such as structural diversity and phylogenetic relationships. A total of 269 and 281 NBS-coding sequences were identified in Yugu1 and 'Zhang gu', respectively. When the two databases were compared, 72 genes were found to be identical and 164 genes showed more than 90% similarity. Physical positioning and gene family analysis of the NBS disease-resistance genes in the genome revealed that the number of genes on each chromosome was similar in both varieties. The eighth chromosome contained the largest number of genes and the ninth chromosome contained the lowest number of genes. Exactly 34 gene clusters containing the 161 genes were found in the Yugu1 genome, with each cluster containing 4.7 genes on average. In comparison, the 'Zhang gu' genome possessed 28 gene clusters, which had 151 genes, with an average of 5.4 genes in each cluster. The largest gene cluster, located on the eighth chromosome, contained 12 genes in the Yugu1 database, whereas it contained 16 genes in the 'Zhang gu' database. The classification results showed that the CC-NBS-LRR gene made up the largest part of each chromosome in the two databases. Two TIR-NBS genes were also found in the Yugu1 genome.
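The average cluster sizes quoted in this abstract follow directly from the reported cluster and gene counts. The sketch below recomputes them; only the counts come from the abstract, and the dictionary layout is illustrative.

```python
# (number of gene clusters, number of clustered genes) per variety, from the abstract.
clusters = {"Yugu1": (34, 161), "Zhang gu": (28, 151)}

mean_cluster_size = {
    variety: round(genes / n_clusters, 1)
    for variety, (n_clusters, genes) in clusters.items()
}
print(mean_cluster_size)
```

Both values agree with the averages of 4.7 and 5.4 genes per cluster given in the text.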
Reptilian Transcriptomes v2.0: An Extensive Resource for Sauropsida Genomics and Transcriptomics

PubMed Central

Tzika, Athanasia C.; Ullate-Agote, Asier; Grbic, Djordje; Milinkovitch, Michel C.

2015-01-01

Despite the availability of deep-sequencing techniques, genomic and transcriptomic data remain unevenly distributed across phylogenetic groups. For example, reptiles are poorly represented in sequence databases, hindering functional evolutionary and developmental studies in these lineages substantially more diverse than mammals. In addition, different studies use different assembly and annotation protocols, inhibiting meaningful comparisons. Here, we present the “Reptilian Transcriptomes Database 2.0,” which provides extensive annotation of transcriptomes and genomes from species covering the major reptilian lineages. To this end, we sequenced normalized complementary DNA libraries of multiple adult tissues and various embryonic stages of the leopard gecko and the corn snake and gathered published reptilian sequence data sets from representatives of the four extant orders of reptiles: Squamata (snakes and lizards), the tuatara, crocodiles, and turtles. The LANE runner 2.0 software was implemented to annotate all assemblies within a single integrated pipeline. We show that this approach increases the annotation completeness of the assembled transcriptomes/genomes. We then built large concatenated protein alignments of single-copy genes and inferred phylogenetic trees that support the positions of turtles and the tuatara as sister groups of Archosauria and Squamata, respectively. The Reptilian Transcriptomes Database 2.0 resource will be updated to include selected new data sets as they become available, thus making it a reference for differential expression studies, comparative genomics and transcriptomics, linkage mapping, molecular ecology, and phylogenomic analyses involving reptiles. The database is available at www.reptilian-transcriptomes.org and can be enquired using a wwwblast server installed at the University of Geneva. PMID:26133641
Identification of functional enolase genes of the silkworm Bombyx mori from public databases with a combination of dry and wet bench processes.

PubMed

Kikuchi, Akira; Nakazato, Takeru; Ito, Katsuhiko; Nojima, Yosui; Yokoyama, Takeshi; Iwabuchi, Kikuo; Bono, Hidemasa; Toyoda, Atsushi; Fujiyama, Asao; Sato, Ryoichi; Tabunoki, Hiroko

2017-01-13

Various insect species have been added to genomic databases over the years. Thus, researchers can easily obtain online genomic information on invertebrates and insects. However, many incorrectly annotated genes are included in these databases, which can prevent the correct interpretation of subsequent functional analyses. To address this problem, we used a combination of dry and wet bench processes to select functional genes from public databases. Enolase is an important glycolytic enzyme in all organisms. We used a combination of dry and wet bench processes to identify functional enolases in the silkworm Bombyx mori (BmEno). First, we detected five annotated enolases from public databases using a Hidden Markov Model (HMM) search, and then through cDNA cloning, Northern blotting, and RNA-seq analysis, we revealed three functional enolases in B. mori: BmEno1, BmEno2, and BmEnoC. BmEno1 contained a conserved key amino acid residue for metal binding and substrate binding in other species. However, BmEno2 and BmEnoC showed a change in this key amino acid. Phylogenetic analysis showed that BmEno2 and BmEnoC were distinct from BmEno1 and other enolases, and were distributed only in lepidopteran clusters. BmEno1 was expressed in all of the tissues used in our study. In contrast, BmEno2 was mainly expressed in the testis with some expression in the ovary and suboesophageal ganglion. BmEnoC was weakly expressed in the testis. Quantitative RT-PCR showed that the mRNA expression of BmEno2 and BmEnoC correlated with testis development; thus, BmEno2 and BmEnoC may be related to lepidopteran-specific spermiogenesis. We identified and characterized three functional enolases from public databases with a combination of dry and wet bench processes in the silkworm B. mori. In addition, we determined that BmEno2 and BmEnoC had species-specific functions. Our strategy could be helpful for the detection of minor genes and functional genes in non-model organisms from public databases.
Analyzing Phylogenetic Trees with Timed and Probabilistic Model Checking: The Lactose Persistence Case Study.

PubMed

Requeno, José Ignacio; Colom, José Manuel

2014-12-01

Model checking is a generic verification technique that allows the phylogeneticist to focus on models and specifications instead of on implementation issues. Phylogenetic trees are considered as transition systems over which we interrogate phylogenetic questions written as formulas of temporal logic. Nonetheless, standard logics become insufficient for certain practices of phylogenetic analysis since they do not allow the inclusion of explicit time and probabilities. The aim of this paper is to extend the application of model checking techniques beyond qualitative phylogenetic properties and adapt the existing logical extensions and tools to the field of phylogeny. The introduction of time and probabilities in phylogenetic specifications is motivated by the study of a real example: the analysis of the ratio of lactose intolerance in some populations and the date of appearance of this phenotype.
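To make the idea of a tree as a transition system concrete, here is a minimal sketch of two qualitative, CTL-style checks over a rooted tree. It covers only the untimed, non-probabilistic baseline that the paper sets out to extend; the tree, the node labels and the function names are hypothetical.

```python
def exists_eventually(tree, labels, node, prop):
    """EF-style check: some node reachable from `node` (including itself) has `prop`."""
    if prop in labels.get(node, set()):
        return True
    return any(exists_eventually(tree, labels, child, prop)
               for child in tree.get(node, []))


def always_globally(tree, labels, node, prop):
    """AG-style check: every node reachable from `node` (including itself) has `prop`."""
    if prop not in labels.get(node, set()):
        return False
    return all(always_globally(tree, labels, child, prop)
               for child in tree.get(node, []))


# Hypothetical tree: children lists per node, and a set of properties per node.
tree = {"root": ["a", "b"], "a": ["t1", "t2"], "b": []}
labels = {
    "root": set(),
    "a": {"persistent"},
    "t1": {"persistent"},
    "t2": {"persistent"},
    "b": set(),
}
```

With these labels, `exists_eventually(tree, labels, "root", "persistent")` holds, while `always_globally` holds from node `a` but not from the root.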
Phylogenetic versus functional signals in the evolution of form-function relationships in terrestrial vision.

PubMed

Motani, Ryosuke; Schmitz, Lars

2011-08-01

Phylogeny is deeply pertinent to evolutionary studies. Traits that perform a body function are expected to be strongly influenced by physical "requirements" of the function. We investigated if such traits exhibit phylogenetic signals, and, if so, how phylogenetic noise biases quantification of form-function relationships. A form-function system that is strongly influenced by physics, namely the relationship between eye morphology and visual optics in amniotes, was used. We quantified the correlation between form (i.e., eye morphology) and function (i.e., ocular optics) while varying the level of phylogenetic bias removal through adjusting Pagel's λ. Ocular soft-tissue dimensions exhibited the highest correlation with ocular optics when 1% of phylogenetic bias expected from Brownian motion was removed (i.e., λ = 0.01); the value for hard-tissue data was 8%. A small degree of phylogenetic bias therefore exists in morphology despite the stringent functional constraints. We also devised a phylogenetically informed discriminant analysis and recorded the effects of phylogenetic bias on this method using the same data. Use of proper λ values during phylogenetic bias removal improved misidentification rates in resulting classifications when prior probabilities were assumed to be equal. Even a small degree of phylogenetic bias affected the classification resulting from phylogenetically informed discriminant analysis. © 2011 The Author(s). Evolution © 2011 The Society for the Study of Evolution.
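For readers unfamiliar with Pagel's λ, the sketch below shows the transformation as it is usually formulated: λ multiplies the off-diagonal (shared-history) entries of the trait covariance matrix expected under Brownian motion and leaves the diagonal unchanged. The matrix values and function name are made up for illustration, and restricting λ to [0, 1] is an assumption of this example; how the authors applied λ in their own analysis is not described here.

```python
def pagel_lambda_transform(vcv, lam):
    """Scale off-diagonal covariances by lambda; keep the variances on the diagonal."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("this sketch assumes 0 <= lambda <= 1")
    n = len(vcv)
    return [[vcv[i][j] if i == j else lam * vcv[i][j] for j in range(n)]
            for i in range(n)]


# Hypothetical covariance matrix for three taxa; the first two share history.
vcv = [
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 2.0],
]
weak_signal = pagel_lambda_transform(vcv, 0.01)
```

At λ = 1 the matrix is returned unchanged (full Brownian-motion covariance); at λ = 0 only the diagonal remains, so taxa are treated as independent.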
New family of pectinase genes PGU1b-PGU3b of the pectinolytic yeast Saccharomyces bayanus var. uvarum.

PubMed

Naumov, G I; Shalamitskiy, M Yu; Naumova, E S

2016-03-01

Using yeast genome databases and literature data, we have conducted a phylogenetic analysis of pectinase PGU genes from Saccharomyces strains assigned to the biological species S. arboricola, S. bayanus (var. uvarum), S. cariocanus, S. cerevisiae, S. kudriavzevii, S. mikatae, S. paradoxus, and hybrid taxon S. pastorianus (syn. S. carlsbergensis). Single PGU genes were observed in all Saccharomyces species, except S. bayanus. The superfamily of divergent PGU genes has been documented in S. bayanus var. uvarum for the first time. Chromosomal localization of new PGU1b, PGU2b, and PGU3b genes in the yeast S. bayanus var. uvarum has been determined by molecular karyotyping and Southern hybridization.
Visualizing Phylogenetic Treespace Using Cartographic Projections

NASA Astrophysics Data System (ADS)

Sundberg, Kenneth; Clement, Mark; Snell, Quinn

Phylogenetic analysis is becoming an increasingly important tool for biological research. Applications include epidemiological studies, drug development, and evolutionary analysis. Phylogenetic search is a known NP-Hard problem. The size of the data sets which can be analyzed is limited by the exponential growth in the number of trees that must be considered as the problem size increases. A better understanding of the problem space could lead to better methods, which in turn could lead to the feasible analysis of more data sets. We present a definition of phylogenetic tree space and a visualization of this space that shows significant exploitable structure. This structure can be used to develop search methods capable of handling much larger datasets.
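The growth in the number of candidate trees can be made concrete with the standard count of unrooted, fully bifurcating trees on n labelled taxa, (2n − 5)!! for n ≥ 3. The short sketch below computes it; the function name is chosen for this example and the abstract itself does not state which tree count it has in mind.

```python
def count_unrooted_binary_trees(n_taxa):
    """Number of unrooted binary tree topologies on n labelled taxa: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("defined here for at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


counts = {n: count_unrooted_binary_trees(n) for n in (4, 5, 10)}
# counts == {4: 3, 5: 15, 10: 2027025}
```

Ten taxa already give more than two million topologies, which is why exhaustive search is impractical beyond small data sets.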
BrassiBase: introduction to a novel knowledge database on Brassicaceae evolution.

PubMed

Kiefer, Markus; Schmickl, Roswitha; German, Dmitry A; Mandáková, Terezie; Lysak, Martin A; Al-Shehbaz, Ihsan A; Franzke, Andreas; Mummenhoff, Klaus; Stamatakis, Alexandros; Koch, Marcus A

2014-01-01

The Brassicaceae family (mustards or crucifers) includes Arabidopsis thaliana as one of the most important model species in plant biology and a number of important crop plants such as the various Brassica species (e.g. cabbage, canola and mustard). Moreover, the family comprises an increasing number of species that serve as study systems in many fields of plant science and evolutionary research. However, the systematics and taxonomy of the family are very complex and access to scientifically valuable and reliable information linked to species and genus names and its interpretation are often difficult. BrassiBase is a continuously developing and growing knowledge database (http://brassibase.cos.uni-heidelberg.de) that aims at providing direct access to many different types of information ranging from taxonomy and systematics to phylo- and cytogenetics. Providing critically revised key information, the database intends to optimize comparative evolutionary research in this family and supports the introduction of the Brassicaceae as the model family for evolutionary biology and plant sciences. Some features that should help to accomplish these goals within a comprehensive taxonomic framework have now been implemented in the new version 1.1.9. A 'Phylogenetic Placement Tool' should help to identify critical accessions and germplasm and provide a first visualization of phylogenetic relationships. The 'Cytogenetics Tool' provides in-depth information on genome sizes, chromosome numbers and polyploidy, and sets this information into a Brassicaceae-wide context.
Multi-locus phylogeny of dolphins in the subfamily Lissodelphininae: character synergy improves phylogenetic resolution

PubMed Central

Harlin-Cognato, April D; Honeycutt, Rodney L

2006-01-01

Background: Dolphins of the genus Lagenorhynchus are anti-tropically distributed in temperate to cool waters. Phylogenetic analyses of cytochrome b sequences have suggested that the genus is polyphyletic; however, many relationships were poorly resolved. In this study, we present a combined-analysis phylogenetic hypothesis for Lagenorhynchus and members of the subfamily Lissodelphininae, which is derived from two nuclear and two mitochondrial data sets and the addition of 34 individuals representing 9 species. In addition, we characterize with parsimony and Bayesian analyses the phylogenetic utility and interaction of characters with statistical measures, including the utility of highly consistent (non-homoplasious) characters as a conservative measure of phylogenetic robustness. We also explore the effects of removing sources of character conflict on phylogenetic resolution. Results: Overall, our study provides strong support for the monophyly of the subfamily Lissodelphininae and the polyphyly of the genus Lagenorhynchus. In addition, the simultaneous parsimony analysis resolved and/or improved resolution for 12 nodes including: (1) L. albirostris, L. acutus; (2) L. obscurus and L. obliquidens; and (3) L. cruciger and L. australis. In addition, the Bayesian analysis supported the monophyly of the Cephalorhynchus, and resolved ambiguities regarding the relationship of L. australis/L. cruciger to other members of the genus Lagenorhynchus. The frequency of highly consistent characters varied among data partitions, but the rate of evolution was consistent within data partitions. Although the control region was the greatest source of character conflict, removal of this data partition impeded phylogenetic resolution. Conclusion: The simultaneous analysis approach produced a more robust phylogenetic hypothesis for Lagenorhynchus than previous studies, thus supporting a phylogenetic approach employing multiple data partitions that vary in overall rate of evolution. Even in cases where there was apparent conflict among characters, our data suggest a synergistic interaction in the simultaneous analysis, and speak against a priori exclusion of data because of potential conflicts, primarily because phylogenetic results can be less robust. For example, the removal of the control region, the putative source of character conflict, produced spurious results with inconsistencies among and within topologies from parsimony and Bayesian analyses. PMID:17078887
A methodological investigation of hominoid craniodental morphology and phylogenetics.

PubMed

Bjarnason, Alexander; Chamberlain, Andrew T; Lockwood, Charles A

2011-01-01

The evolutionary relationships of extant great apes and humans have been largely resolved by molecular studies, yet morphology-based phylogenetic analyses continue to provide conflicting results. In order to further investigate this discrepancy we present bootstrap clade support of morphological data based on two quantitative datasets, one dataset consisting of linear measurements of the whole skull from 5 hominoid genera and the second dataset consisting of 3D landmark data from the temporal bone of 5 hominoid genera, including 11 sub-species. Using similar protocols for both datasets, we were able to 1) compare distance-based phylogenetic methods to cladistic parsimony of quantitative data converted into discrete character states, 2) vary outgroup choice to observe its effect on phylogenetic inference, and 3) analyse male and female data separately to observe the effect of sexual dimorphism on phylogenies. Phylogenetic analysis was sensitive to methodological decisions, particularly outgroup selection, where designation of Pongo as an outgroup and removal of Hylobates resulted in greater congruence with the proposed molecular phylogeny. The performance of distance-based methods also justifies their use in phylogenetic analysis of morphological data. It is clear from our analyses that hominoid phylogenetics ought not to be used as an example of conflict between the morphological and molecular, but as an example of how outgroup and methodological choices can affect the outcome of phylogenetic analysis. Copyright © 2010 Elsevier Ltd. All rights reserved.
BrucellaBase: Genome information resource.

PubMed

Sankarasubramanian, Jagadesan; Vishnu, Udayakumar S; Khader, L K M Abdul; Sridhar, Jayavel; Gunasekaran, Paramasamy; Rajendhran, Jeyaprakash

2016-09-01

Brucella sp. causes a major zoonotic disease, brucellosis. Brucella belongs to the family Brucellaceae under the order Rhizobiales of Alphaproteobacteria. We present BrucellaBase, a web-based platform, providing features of a genome database together with unique analysis tools. We have developed a web version of the multilocus sequence typing (MLST) (Whatmore et al., 2007) and phylogenetic analysis of Brucella spp. BrucellaBase currently contains genome data of 510 Brucella strains along with the user interfaces for BLAST, VFDB, CARD, pairwise genome alignment and MLST typing. Availability of these tools will enable the researchers interested in Brucella to get meaningful information from Brucella genome sequences. BrucellaBase will regularly be updated with new genome sequences, new features along with improvements in genome annotations. BrucellaBase is available online at http://www.dbtbrucellosis.in/brucellabase.html or http://59.99.226.203/brucellabase/homepage.html. Copyright © 2016 Elsevier B.V. All rights reserved.
Isolation, sequence identification and tissue expression profiles of 3 novel porcine genes: ASPA, NAGA, and HEXA.

PubMed

Shu, Xianghua; Liu, Yonggang; Yang, Liangyu; Song, Chunlian; Hou, Jiafa

2008-01-01

The complete coding sequences of 3 porcine genes - ASPA, NAGA, and HEXA - were amplified by the reverse transcriptase polymerase chain reaction (RT-PCR) based on the conserved sequence information of the mouse or other mammals and referenced pig ESTs. These 3 novel porcine genes were then deposited in the NCBI database and assigned GeneIDs: 100142661, 100142664 and 100142667. The phylogenetic tree analysis revealed that the porcine ASPA, NAGA, and HEXA all have closer genetic relationships with the ASPA, NAGA, and HEXA of cattle. Tissue expression profile analysis was also carried out and results revealed that swine ASPA, NAGA, and HEXA genes were differentially expressed in various organs, including skeletal muscle, the heart, liver, fat, kidney, lung, and small and large intestines. Our experiment is the first one to establish the foundation for further research on these 3 swine genes.
Bifidobacterium aquikefiri sp. nov., isolated from water kefir.

PubMed

Laureys, David; Cnockaert, Margo; De Vuyst, Luc; Vandamme, Peter

2016-03-01

A novel Bifidobacterium, strain LMG 28769T, was isolated from a household water kefir fermentation process. Cells were Gram-stain-positive, non-motile, non-spore-forming, catalase-negative, oxidase-negative and facultatively anaerobic short rods. Analysis of its 16S rRNA gene sequence revealed Bifidobacterium crudilactis and Bifidobacterium psychraerophilum (97.4 and 97.1% similarity towards the respective type strain sequences) as nearest phylogenetic neighbours. Its assignment to the genus Bifidobacterium was confirmed by the presence of fructose 6-phosphate phosphoketolase activity. Analysis of the hsp60 gene sequence revealed very low similarity with nucleotide sequences in the NCBI nucleotide database. The genotypic and phenotypic analyses allowed the differentiation of strain LMG 28769T from all recognized Bifidobacterium species. Strain LMG 28769T (= CCUG 67145T = R 54638T) therefore represents a novel species, for which the name Bifidobacterium aquikefiri sp. nov. is proposed.
DNA Commission of the International Society for Forensic Genetics: revised and extended guidelines for mitochondrial DNA typing.

PubMed

Parson, W; Gusmão, L; Hares, D R; Irwin, J A; Mayr, W R; Morling, N; Pokorak, E; Prinz, M; Salas, A; Schneider, P M; Parsons, T J

2014-11-01

The DNA Commission of the International Society of Forensic Genetics (ISFG) regularly publishes guidelines and recommendations concerning the application of DNA polymorphisms to the question of human identification. Previous recommendations published in 2000 addressed the analysis and interpretation of mitochondrial DNA (mtDNA) in forensic casework. While the foundations set forth in the earlier recommendations still apply, new approaches to the quality control, alignment and nomenclature of mitochondrial sequences, as well as the establishment of mtDNA reference population databases, have been developed. Here, we describe these developments and discuss their application to both mtDNA casework and mtDNA reference population databasing applications. While the generation of mtDNA for forensic casework has always been guided by specific standards, it is now well-established that data of the same quality are required for the mtDNA reference population data used to assess the statistical weight of the evidence. As a result, we introduce guidelines regarding sequence generation, as well as quality control measures based on the known worldwide mtDNA phylogeny, that can be applied to ensure the highest quality population data possible. For both casework and reference population databasing applications, the alignment and nomenclature of haplotypes is revised here and the phylogenetic alignment proffered as acceptable standard. In addition, the interpretation of heteroplasmy in the forensic context is updated, and the utility of alignment-free database searches for unbiased probability estimates is highlighted. Finally, we discuss statistical issues and define minimal standards for mtDNA database searches. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
TryTransDB: A web-based resource for transport proteins in Trypanosomatidae.

PubMed

Sonar, Krushna; Kabra, Ritika; Singh, Shailza

2018-03-12

TryTransDB is a web-based resource that stores transport protein data which can be retrieved using a standalone BLAST tool. We have attempted to create an integrated database that can be a one-stop shop for researchers working with transport proteins of the Trypanosomatidae family. TryTransDB (Trypanosomatidae Transport Protein Database) is a comprehensive web-based resource that can run a BLAST search against most of the transport protein sequences (protein and nucleotide) from organisms of the Trypanosomatidae family. The resource also allows users to compute a phylogenetic tree by performing multiple sequence alignment (MSA) with the embedded CLUSTALW suite. In addition, cross-links to other databases help users gather more information on a given transport protein from a single website.

The Evolutionary Ecology of Plant Disease: A Phylogenetic Perspective.

PubMed

Gilbert, Gregory S; Parker, Ingrid M

2016-08-04

An explicit phylogenetic perspective provides useful tools for phytopathology and plant disease ecology because the traits of both plants and microbes are shaped by their evolutionary histories. We present brief primers on phylogenetic signal and the analytical tools of phylogenetic ecology. We review the literature and find abundant evidence of phylogenetic signal in pathogens and plants for most traits involved in disease interactions. Plant nonhost resistance mechanisms and pathogen housekeeping functions are conserved at deeper phylogenetic levels, whereas molecular traits associated with rapid coevolutionary dynamics are more labile at branch tips. Horizontal gene transfer disrupts the phylogenetic signal for some microbial traits. Emergent traits, such as host range and disease severity, show clear phylogenetic signals. Therefore pathogen spread and disease impact are influenced by the phylogenetic structure of host assemblages. Phylogenetically rare species escape disease pressure. Phylogenetic tools could be used to develop predictive tools for phytosanitary risk analysis and reduce disease pressure in multispecies cropping systems.
Relationships among genera of the Saccharomycotina from multigene sequence analysis

USDA-ARS's Scientific Manuscript database

Most known species of the subphylum Saccharomycotina (budding ascomycetous yeasts) have now been placed in phylogenetically defined clades following multigene sequence analysis. Terminal clades, which are usually well supported from bootstrap analysis, are viewed as phylogenetically circumscribed ge...
Phylogenetic inference under varying proportions of indel-induced alignment gaps

PubMed Central

Dwivedi, Bhakti; Gadagkar, Sudhindra R

2009-01-01

Background: The effect of alignment gaps on phylogenetic accuracy has been the subject of numerous studies. In this study, we investigated the relationship between the total number of gapped sites and phylogenetic accuracy, when the gaps were introduced (by means of computer simulation) to reflect indel (insertion/deletion) events during the evolution of DNA sequences. The resulting (true) alignments were subjected to commonly used gap treatment and phylogenetic inference methods. Results: (1) In general, there was a strong – almost deterministic – relationship between the amount of gap in the data and the level of phylogenetic accuracy when the alignments were very "gappy", (2) gaps resulting from deletions (as opposed to insertions) contributed more to the inaccuracy of phylogenetic inference, (3) the probabilistic methods (Bayesian, PhyML & "MLε," a method implemented in DNAML in PHYLIP) performed better at most levels of gap percentage when compared to parsimony (MP) and distance (NJ) methods, with Bayesian analysis being clearly the best, (4) methods that treat gapped sites as missing data yielded less accurate trees when compared to those that attribute phylogenetic signal to the gapped sites (by coding them as binary character data – presence/absence, or as in the MLε method), and (5) in general, the accuracy of phylogenetic inference depended upon the amount of available data when the gaps resulted from mainly deletion events, and the amount of missing data when insertion events were equally likely to have caused the alignment gaps. Conclusion: When gaps in an alignment are a consequence of indel events in the evolution of the sequences, the accuracy of phylogenetic analysis is likely to improve if: (1) alignment gaps are categorized as arising from insertion events or deletion events and then treated separately in the analysis, (2) the evolutionary signal provided by indels is harnessed in the phylogenetic analysis, and (3) methods that utilize the phylogenetic signal in indels are developed for distance methods too. When the true homology is known and the amount of gaps is 20 percent of the alignment length or less, the methods used in this study are likely to yield trees with 90–100 percent accuracy. PMID:19698168
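The binary presence/absence coding of gapped sites mentioned in the results can be sketched as follows. This is the simplest per-column variant, written for illustration: every alignment column that contains at least one gap becomes one binary character (1 = residue present, 0 = gap). The function name and data layout are assumptions, and the study's exact coding scheme may differ.

```python
def code_gaps_as_binary(alignment, gap="-"):
    """Return taxon -> binary string, one character per gap-containing column."""
    taxa = list(alignment)
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    # Columns in which at least one taxon has a gap.
    gapped_columns = [i for i in range(length)
                      if any(alignment[t][i] == gap for t in taxa)]
    return {t: "".join("0" if alignment[t][i] == gap else "1"
                       for i in gapped_columns)
            for t in taxa}


# Hypothetical toy alignment with gaps in columns 2 and 4 (0-based).
toy = {"A": "AC-GT", "B": "ACTG-", "C": "ACTGT"}
binary_matrix = code_gaps_as_binary(toy)
# binary_matrix == {"A": "01", "B": "10", "C": "11"}
```

The resulting binary matrix can be appended to the sequence data as an extra partition, so that gapped sites contribute signal instead of being treated as missing data.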
Transcriptome Analysis in Sheepgrass (Leymus chinensis). A Dominant Perennial Grass of the Eurasian Steppe

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Shuangyan; Huang, Xin; Yang, Xiaohan

BACKGROUND: Sheepgrass [Leymus chinensis (Trin.) Tzvel.] is an important perennial forage grass across the Eurasian Steppe and is known for its adaptability to various environmental conditions. However, insufficient data resources in public databases for sheepgrass limited our understanding of the mechanism of environmental adaptations, gene discovery and molecular marker development. RESULTS: The transcriptome of sheepgrass was sequenced using Roche 454 pyrosequencing technology. We assembled 952,328 high-quality reads into 87,214 unigenes, including 32,416 contigs and 54,798 singletons. There were 15,450 contigs over 500 bp in length. BLAST searches of our database against Swiss-Prot and NCBI non-redundant protein sequences (nr) databases resulted in the annotation of 54,584 (62.6%) of the unigenes. Gene Ontology (GO) analysis assigned 89,129 GO term annotations for 17,463 unigenes. We identified 11,675 core Poaceae-specific and 12,811 putative sheepgrass-specific unigenes by BLAST searches against all plant genome and transcriptome databases. A total of 2,979 specific freezing-responsive unigenes were found from this RNAseq dataset. We identified 3,818 EST-SSRs in 3,597 unigenes, and some SSRs contained unigenes that were also candidates for freezing-response genes. Characterizations of nucleotide repeats and dominant motifs of SSRs in sheepgrass were also performed. Similarity and phylogenetic analysis indicated that sheepgrass is closely related to barley and wheat. CONCLUSIONS: This research has greatly enriched sheepgrass transcriptome resources. The identified stress-related genes will help us to decipher the genetic basis of the environmental and ecological adaptations of this species and will be used to improve wheat and barley crops through hybridization or genetic transformation. The EST-SSRs reported here will be a valuable resource for future gene-phenotype studies and for the molecular breeding of sheepgrass and other Poaceae species.
Polymorphisms and resistance mutations in the protease and reverse transcriptase genes of HIV-1 F subtype Romanian strains.

PubMed

Paraschiv, Simona; Otelea, Dan; Dinu, Magdalena; Maxim, Daniela; Tinischi, Mihaela

2007-03-01

To evaluate the prevalence of resistance mutations in the genome of HIV-1 F subtype strains isolated from Romanian antiretroviral (ARV) treatment-naïve patients and to assess the phylogenetic relatedness of these strains with other HIV-1 strains. Twenty-nine HIV-1 strains isolated from treatment-naïve adolescents (n=15) and adults (n=14) were included in this study. Resistance genotyping was performed by using Big Dye Terminator chemistry provided by the ViroSeq Genotyping System. The sequences of the protease and reverse transcriptase genes were aligned (ClustalW) and a phylogenetic tree was built (MEGA 3 software). For subtyping purposes, all the nucleotide sequences were submitted to the Stanford database. All the studied strains were found to harbor accessory mutations in the protease gene. The most frequent mutation was M36I (29 of 29 strains), followed by L63T, K20R, and L10V. The number of polymorphisms associated with protease inhibitor resistance was different for the two age groups. Intraphylogenetic divergence was greater for adults than for adolescents infected in childhood. All the strains were found to belong to the F1 subtype. The phylogenetic analysis revealed that Romanian strains clustered together, but distinctly from F1 HIV-1 strains isolated in other parts of the world (Brazil, Finland, and Belgium). Protease secondary mutations are present with high frequency in the HIV-1 F subtype strains isolated from Romanian ARV treatment-naïve patients, but no major resistance mutations were found.
Classification of Phylogenetic Profiles for Protein Function Prediction: An SVM Approach

NASA Astrophysics Data System (ADS)

Kotaru, Appala Raju; Joshi, Ramesh C.

Predicting the function of an uncharacterized protein is a major challenge in the post-genomic era because of the problem's complexity and scale. Knowledge of protein function is a crucial link in the development of new drugs, better crops, and even biochemicals such as biofuels. Recently, numerous high-throughput experimental procedures have been invented to investigate the mechanisms leading to the accomplishment of a protein’s function, and the phylogenetic profile is one of them. A phylogenetic profile is a way of representing a protein that encodes its evolutionary history. In this paper we propose a method for classifying phylogenetic profiles using a supervised machine learning method, support vector machine (SVM) classification with a radial basis function kernel, to identify functionally linked proteins. We experimentally evaluated the performance of the classifier with the linear and polynomial kernels and compared the results with the existing tree kernel. In our study we used proteins of the budding yeast Saccharomyces cerevisiae genome. We generated the phylogenetic profiles of 2465 yeast genes and used the functional annotations available in the MIPS database. Our experiments show that the radial basis kernel performs similarly to the polynomial kernel in some functional classes, that together they perform better than the linear and tree kernels, and that overall the radial basis kernel outperformed the polynomial, linear and tree kernels. In analyzing these results we show that it is feasible to use an SVM classifier with a radial basis function kernel to predict gene functionality from phylogenetic profiles.
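As a small illustration of the radial basis function kernel named above, the sketch below evaluates K(x, y) = exp(−γ‖x − y‖²) on binary phylogenetic profiles (1 = a homolog is present in a genome, 0 = absent). It shows only the kernel, not SVM training; the profiles, the value of γ and the function name are assumptions made for the example.

```python
import math


def rbf_kernel(profile_a, profile_b, gamma=0.1):
    """Radial basis function kernel between two equal-length numeric profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same set of genomes")
    squared_distance = sum((a - b) ** 2 for a, b in zip(profile_a, profile_b))
    return math.exp(-gamma * squared_distance)


# Hypothetical presence/absence profiles across four genomes.
protein_x = [1, 0, 1, 1]
protein_y = [1, 1, 1, 0]
similarity = rbf_kernel(protein_x, protein_y, gamma=0.5)
```

For binary profiles the squared Euclidean distance equals the number of genomes in which the two proteins differ, so identical profiles give a kernel value of 1 and the value decays as the profiles diverge.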
GreenPhylDB v2.0: comparative and functional genomics in plants.

PubMed

Rouard, Mathieu; Guignon, Valentin; Aluome, Christelle; Laporte, Marie-Angélique; Droc, Gaëtan; Walde, Christian; Zmasek, Christian M; Périn, Christophe; Conte, Matthieu G

2011-01-01

GreenPhylDB is a database designed for comparative and functional genomics based on complete genomes. Version 2 now contains sixteen full genomes of members of the plantae kingdom, ranging from algae to angiosperms, automatically clustered into gene families. Gene families are manually annotated and then analyzed phylogenetically in order to elucidate orthologous and paralogous relationships. The database offers various lists of gene families including plant, phylum and species specific gene families. For each gene cluster or gene family, easy access to gene composition, protein domains, publications, external links and orthologous gene predictions is provided. Web interfaces have been further developed to improve the navigation through information related to gene families. New analysis tools are also available, such as a gene family ontology browser that facilitates exploration. GreenPhylDB is a component of the South Green Bioinformatics Platform (http://southgreen.cirad.fr/) and is accessible at http://greenphyl.cirad.fr. It enables comparative genomics in a broad taxonomy context to enhance the understanding of evolutionary processes and thus tends to speed up gene discovery.
Detection and molecular characterization of infectious bronchitis virus isolated from recent outbreaks in broiler flocks in Thailand.

PubMed

Pohuang, Tawatchai; Chansiripornchai, Niwat; Tawatsin, Achara; Sasipreeyajan, Jiroj

2009-09-01

Thirteen field isolates of infectious bronchitis virus (IBV) were obtained from broiler flocks in Thailand between January and June 2008. An 878-bp fragment of the S1 gene covering a hypervariable region was amplified and sequenced. Phylogenetic analysis based on that region revealed that these viruses were separated into two groups (I and II). IBV isolates in group I were not related to other IBV strains published in the GenBank database. Group I isolates shared nucleotide sequence identities of less than 85% and amino acid sequence identities of less than 84% with IBVs published in the GenBank database. This group likely represents strains indigenous to Thailand. The isolates in group II showed a close relationship with Chinese IBVs. They shared nucleotide sequence identities of 97-98% and amino acid sequence identities of 96-98% with Chinese IBVs (strains A2, SH and QXIBV). This finding indicates that the recent Thai IBVs evolved separately and that at least two groups of viruses are circulating in Thailand.
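The identity percentages quoted above can be illustrated with a minimal sketch of pairwise percent identity over an existing alignment. The abstract does not state how identities were computed; the convention used here (skip columns where either sequence has a gap, divide by the number of compared columns) is an assumption, and the example sequences are invented.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of aligned, ungapped columns in which the two sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are excluded from the count
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

# Invented eight-column alignment with a single mismatch in the last column.
print(percent_identity("ACGTACGT", "ACGTACGA"))
```

Other denominators (alignment length, or length of the shorter sequence) are also in common use and give slightly different values for gapped alignments.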
Harnessing mtDNA variation to resolve ambiguity in ‘Redfish’ sold in Europe

PubMed Central

Moore, Lauren; Pampoulie, Christophe; Di Muri, Cristina; Vandamme, Sara; Mariani, Stefano

2017-01-01

Morphology-based identification of North Atlantic Sebastes has long been controversial and misidentification may produce misleading data, with cascading consequences that negatively affect fisheries management and seafood labelling. North Atlantic Sebastes comprises four species, commonly known as ‘redfish’, but little is known about the number, identity and labelling accuracy of redfish species sold across Europe. We used a molecular approach to identify redfish species from ‘blind’ specimens to evaluate the performance of the Barcode of Life (BOLD) and GenBank databases, and carried out a market product accuracy survey of retailers across Europe. The conventional BOLD approach proved ambiguous, and phylogenetic analysis based on mtDNA control region sequences provided a higher resolution for species identification. By sampling market products from four countries, we found the presence of two species of redfish (S. norvegicus and S. mentella) and one unidentified Pacific rockfish marketed in Europe. Furthermore, public databases revealed the existence of inaccurate reference sequences, likely stemming from species misidentification in previous studies, which currently hinders the efficacy of DNA methods for the identification of Sebastes market samples. PMID:29018597
Evolution of gastropod mitochondrial genome arrangements

PubMed Central

2008-01-01

Background Gastropod mitochondrial genomes exhibit an unusually great variety of gene orders compared to other metazoan mitochondrial genomes such as those of vertebrates. Hence, gastropod mitochondrial genomes constitute a good model system to study patterns, rates, and mechanisms of mitochondrial genome rearrangement. However, this kind of evolutionary comparative analysis requires a robust phylogenetic framework of the group under study, which has been elusive so far for gastropods in spite of the efforts carried out during the last two decades. Here, we report the complete nucleotide sequence of five mitochondrial genomes of gastropods (Pyramidella dolabrata, Ascobulla fragilis, Siphonaria pectinata, Onchidella celtica, and Myosotella myosotis), and we analyze them together with another ten complete mitochondrial genomes of gastropods currently available in molecular databases in order to reconstruct the phylogenetic relationships among the main lineages of gastropods. Results Comparative analyses with other mollusk mitochondrial genomes allowed us to describe molecular features and general trends in the evolution of mitochondrial genome organization in gastropods. Phylogenetic reconstruction with commonly used methods of phylogenetic inference (ME, MP, ML, BI) arrived at a single topology, which was used to reconstruct the evolution of mitochondrial gene rearrangements in the group. Conclusion Four main lineages were identified within gastropods: Caenogastropoda, Vetigastropoda, Patellogastropoda, and Heterobranchia. Caenogastropoda and Vetigastropoda are sister taxa, as are Patellogastropoda and Heterobranchia. This result rejects the validity of the derived clade Apogastropoda (Caenogastropoda + Heterobranchia). The position of Patellogastropoda remains unclear, likely due to long-branch attraction biases. Within Heterobranchia, the most heterogeneous group of gastropods, neither Euthyneura (because of the inclusion of P. dolabrata) nor Pulmonata (polyphyletic) nor Opisthobranchia (because of the inclusion of S. pectinata) were recovered as monophyletic groups. The gene order of the Vetigastropoda might represent the ancestral mitochondrial gene order for Gastropoda and we propose that at least three major rearrangements have taken place in the evolution of gastropods: one in the ancestor of Caenogastropoda, another in the ancestor of Patellogastropoda, and one more in the ancestor of Heterobranchia. PMID:18302768
Evolutionary lineages of marine snails identified using molecular phylogenetics and geometric morphometric analysis of shells.

PubMed

Vaux, Felix; Trewick, Steven A; Crampton, James S; Marshall, Bruce A; Beu, Alan G; Hills, Simon F K; Morgan-Richards, Mary

2018-06-15

The relationship between morphology and inheritance is of perennial interest in evolutionary biology and palaeontology. Using three marine snail genera Penion, Antarctoneptunea and Kelletia, we investigate whether systematics based on shell morphology accurately reflect evolutionary lineages indicated by molecular phylogenetics. Members of these gastropod genera have been a taxonomic challenge due to substantial variation in shell morphology, conservative radular and soft tissue morphology, few known ecological differences, and geographical overlap between numerous species. Sampling all sixteen putative taxa identified across the three genera, we infer mitochondrial and nuclear ribosomal DNA phylogenetic relationships within the group, and compare this to variation in adult shell shape and size. Results of phylogenetic analysis indicate that each genus is monophyletic, although the status of some phylogenetically derived and likely more recently evolved taxa within Penion is uncertain. The recently described species P. lineatus is supported by genetic evidence. Morphology, captured using geometric morphometric analysis, distinguishes the genera and matches the molecular phylogeny, although using the same dataset, species and phylogenetic subclades are not identified with high accuracy. Overall, despite abundant variation, we find that shell morphology accurately reflects genus-level classification and the corresponding deep phylogenetic splits identified in this group of marine snails. Copyright © 2018 Elsevier Inc. All rights reserved.
On the use of cartographic projections in visualizing phylo-genetic tree space

PubMed Central

2010-01-01

Phylogenetic analysis is becoming an increasingly important tool for biological research. Applications include epidemiological studies, drug development, and evolutionary analysis. Phylogenetic search is a known NP-Hard problem. The size of the data sets which can be analyzed is limited by the exponential growth in the number of trees that must be considered as the problem size increases. A better understanding of the problem space could lead to better methods, which in turn could lead to the feasible analysis of more data sets. We present a definition of phylogenetic tree space and a visualization of this space that shows significant exploitable structure. This structure can be used to develop search methods capable of handling much larger data sets. PMID:20529355
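To give a sense of the growth in the number of candidate trees mentioned above, the sketch below counts unrooted, fully bifurcating tree topologies for n labelled taxa using the standard double-factorial formula (2n - 5)!!. The abstract does not say which class of trees its tree space covers, so the choice of unrooted binary trees is an assumption made for illustration.

```python
def count_unrooted_binary_trees(n_taxa):
    """Number of distinct unrooted bifurcating topologies: (2n - 5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("at least three taxa are needed for an unrooted tree")
    count = 1
    # Multiply the odd factors 3, 5, ..., 2n - 5 (the loop is empty for n = 3).
    for factor in range(3, 2 * n_taxa - 4, 2):
        count *= factor
    return count

for n in (4, 5, 10, 20):
    print(n, count_unrooted_binary_trees(n))
```

Already at ten taxa there are more than two million topologies, which is why exhaustive search is impractical and heuristic search over tree space is the norm.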
Confirmation of Two Sibling Species among Anopheles fluviatilis Mosquitoes in South and Southeastern Iran by Analysis of Cytochrome Oxidase I Gene.

PubMed

Naddaf, Saied Reza; Oshaghi, Mohammad Ali; Vatandoost, Hassan

2012-12-01

Anopheles fluviatilis, one of the major malaria vectors in Iran, is assumed to be a complex of sibling species. The aim of this study was to evaluate the cytochrome oxidase I (COI) gene alongside 28S-D3 as a diagnostic tool for identification of An. fluviatilis sibling species in Iran. DNA samples from 24 An. fluviatilis mosquitoes from different geographical areas in south and southeastern Iran were used for amplification of the COI gene followed by sequencing. The 474-475 bp COI sequences obtained in this study were aligned with 59 similar sequences of An. fluviatilis and a sequence of Anopheles minimus, as the outgroup, from the GenBank database. The distances between groups and individual sequences were calculated, and a phylogenetic tree for the obtained sequences was generated using the neighbor-joining method with the Kimura two-parameter (K2P) model. Phylogenetic analysis using the COI gene grouped members of Fars Province (central Iran) in two distinct clades separate from other Iranian members representing Hormozgan, Kerman, and Sistan va Baluchestan Provinces. The mean distance between Iranian and Indian individuals was 1.66%, whereas the value between Fars Province individuals and the group comprising individuals from other areas of Iran was 2.06%. The 2.06% mean distance between individuals from Fars Province and those from other areas of Iran is indicative of at least two sibling species among An. fluviatilis mosquitoes of Iran. This finding confirms earlier results based on RAPD-PCR and 28S-D3 analysis.
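The K2P distances referred to above can be sketched as follows. With P the proportion of sites showing a transition and Q the proportion showing a transversion, the Kimura two-parameter distance is d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)). The handling of gaps and ambiguous bases (such columns are skipped) is an assumption of this sketch, and the example sequences are invented rather than taken from the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: gaps and ambiguity codes are ignored
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term_ts = 1.0 - 2.0 * p - q
    term_tv = 1.0 - 2.0 * q
    if term_ts <= 0.0 or term_tv <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    # Adding 0.0 turns a negative zero into a plain 0.0 for identical sequences.
    return -0.5 * math.log(term_ts * math.sqrt(term_tv)) + 0.0

# Invented ten-base sequences differing by one transition (A -> G).
print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))
```

A matrix of such pairwise distances is the input that a neighbor-joining implementation would use to build the tree.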
Verification of Ribosomal Proteins of Aspergillus fumigatus for Use as Biomarkers in MALDI-TOF MS Identification.

PubMed

Nakamura, Sayaka; Sato, Hiroaki; Tanaka, Reiko; Yaguchi, Takashi

2016-01-01

We have previously proposed a rapid identification method for bacterial strains based on the profiles of their ribosomal subunit proteins (RSPs), observed using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). This method can perform phylogenetic characterization based on the mass of housekeeping RSP biomarkers, ideally calculated from amino acid sequence information registered in public protein databases. With the aim of extending its field of application to medical mycology, this study investigates the actual state of information on RSPs of eukaryotic fungi registered in public protein databases through the characterization of ribosomal protein fractions extracted from genome-sequenced Aspergillus fumigatus strains Af293 and A1163 as a model. In this process, we have found that the public protein databases harbor problems. The RSP names are inconsistent, so we have provisionally unified them using the yeast naming system. The most serious problem is that many incorrect sequences are registered in the public protein databases. Surprisingly, more than half of the sequences are incorrect, due chiefly to mis-annotation of exon/intron structures. These errors could be corrected by a combination of in silico inspection by sequence homology analysis and MALDI-TOF MS measurements. We were also able to confirm conserved post-translational modifications in eleven RSPs. After these verifications, the masses of 31 expressed RSPs under 20,000 Da could be accurately confirmed. These RSPs have the potential to be useful biomarkers for identifying clinical isolates of A. fumigatus.
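The idea of calculating a biomarker mass from a database sequence can be sketched as below: sum the average residue masses and add one water for the free termini. This is only an illustration under simplifying assumptions. It ignores the post-translational modifications the abstract mentions (for example N-terminal methionine removal or acetylation), and the peptide shown is invented rather than an actual A. fumigatus RSP.

```python
# Average (not monoisotopic) residue masses in daltons for the 20 standard amino acids.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one H2O accounts for the free N- and C-termini

def average_mass(sequence):
    """Average molecular mass of an unmodified protein sequence."""
    residues = sequence.upper()
    if not residues:
        raise ValueError("empty sequence")
    unknown = set(residues) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError("unsupported residue codes: " + ", ".join(sorted(unknown)))
    return sum(AVERAGE_RESIDUE_MASS[r] for r in residues) + WATER_MASS

# Invented peptide used only to exercise the function.
print(round(average_mass("MKVRASVKK"), 2))
```

A calculated mass that disagrees with the observed MALDI-TOF peak is the kind of discrepancy that, per the abstract, can point to a mis-annotated sequence or a modification.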
Mitochondrial DNA identification of game and harvested freshwater fish species.

PubMed

Kyle, C J; Wilson, C C

2007-02-14

The use of DNA in forensics has grown rapidly for human applications along with the concomitant development of bioinformatics and demographic databases to help fully realize the potential of this molecular information. Similar techniques are also used routinely in many wildlife cases, such as species identification in food products, poaching and the illegal trade of endangered species. The use of molecular techniques in forensic cases related to wildlife and the development of associated databases has, however, mainly focused on large mammals with the exception of a few high-profile species. There is a need to develop similar databases for aquatic species for fisheries enforcement, given the large number of exploited and endangered fish species, the intensity of exploitation, and challenges in identifying species and their derived products. We sequenced a 500bp fragment of the mitochondrial cytochrome b gene from representative individuals from 26 harvested fish taxa from Ontario, Canada, focusing on species that support major commercial and recreational fisheries. Ontario provides a unique model system for the development of a fish species database, as the province contains an evolutionarily diverse array of freshwater fish families representing more than one third of all freshwater fish in Canada. Inter- and intraspecific sequence comparisons using phylogenetic analysis and a BLAST search algorithm provided rigorous statistical metrics for species identification. This methodology and these data will aid in fisheries enforcement, providing a tool to easily and accurately identify fish species in enforcement investigations that would have otherwise been difficult or impossible to pursue.
Mucosal and Cutaneous Human Papillomaviruses Detected in Raw Sewages

PubMed Central

La Rosa, Giuseppina; Fratini, Marta; Accardi, Luisa; D'Oro, Graziana; Della Libera, Simonetta; Muscillo, Michele; Di Bonito, Paola

2013-01-01

Epitheliotropic viruses can find their way into sewage. The aim of the present study was to investigate the occurrence, distribution, and genetic diversity of Human Papillomaviruses (HPVs) in urban wastewaters. Sewage samples were collected from treatment plants distributed throughout Italy. The DNA extracted from these samples was analyzed by PCR using five PV-specific sets of primers targeting the L1 (GP5/GP6, MY09/MY11, FAP59/64, SKF/SKR) and E1 regions (PM-A/PM-B), according to the protocols previously validated for the detection of mucosal and cutaneous HPV genotypes. PCR products underwent sequencing analysis and the sequences were aligned to reference genomes from the Papillomavirus Episteme database. Phylogenetic analysis was then performed to assess the genetic relationships among the different sequences and between the sequences of the samples and those of the prototype strains. A broad spectrum of sequences related to mucosal and cutaneous HPV types was detected in 81% of the sewage samples analyzed. Surprisingly, sequences related to the anogenital HPV6 and 11 were detected in 19% of the samples, and sequences related to the “high risk” oncogenic HPV16 were identified in two samples. Sequences related to HPV9, HPV20, HPV25, HPV76, HPV80, HPV104, HPV110, HPV111, HPV120 and HPV145 beta Papillomaviruses were detected in 76% of the samples. In addition, similarity searches and phylogenetic analysis of some sequences suggest that they could belong to putative new genotypes of the beta genus. In this study, for the first time, the presence of HPV viruses strongly related to human cancer is reported in sewage samples. Our data increases the knowledge of HPV genomic diversity and suggests that virological analysis of urban sewage can provide key information useful in supporting epidemiological studies. PMID:23341898
Inferring Evolution of Habitat Usage and Body Size in Endangered, Seasonal Cynopoeciline Killifishes from the South American Atlantic Forest through an Integrative Approach (Cyprinodontiformes: Rivulidae)

PubMed Central

Costa, Wilson J. E. M.

2016-01-01

Cynopoecilines comprise a diversified clade of small killifishes occurring in the Atlantic Forest, one of the most endangered biodiversity hotspots in the world. They are found in temporary pools of savannah-like and dense forest habitats, and most of them are highly threatened with extinction if not already extinct. The greatest gap in our knowledge of cynopoecilines stems from the absence of an integrative approach incorporating molecular phylogenetic data of species still found in their habitats with phylogenetic data taken from the rare and possibly extinct species without accessible molecular information. An integrative analysis combining 115 morphological characters with a multigene dataset of 2,108 bp comprising three nuclear loci (GLYT1, ENC1, Rho), provided a robust phylogeny of cynopoeciline killifishes, which was herein used to attain an accurate phylogenetic placement of nearly extinct species. The analysis indicates that the most recent common ancestor of the Cynopoecilini lived in open vegetation habitats of the Atlantic Forest of eastern Brazil and was a miniature species, reaching between 25 and 28 mm of standard length. The rare cases of cynopoecilines specialized in inhabiting pools within dense forests are interpreted as derived from four independent evolutionary events. Shifts in habitat usage and biogeographic patterns are tentatively associated to Cenozoic paleogeographic events, but the evolutionary history of cynopoecilines may be partially lost by a combination of poor past sampling and recent habitat decline. A sharp evolutionary shift directed to increased body size in a clade encompassing the genera Campellolebias and Cynopoecilus may be related to a parallel acquisition of an internally-fertilizing reproductive strategy, unique among aplocheiloid killifishes. 
This study reinforces the importance of adding morphological information to molecular databases as a tool to understand the biological complexity of organisms under intense pressure from loss of habitat. PMID:27428070
Comparative Analysis of Begonia Plastid Genomes and Their Utility for Species-Level Phylogenetics

PubMed Central

Harrison, Nicola; Harrison, Richard J.

2016-01-01

Recent, rapid radiations make species-level phylogenetics difficult to resolve. We used a multiplexed, high-throughput sequencing approach to identify informative genomic regions to resolve phylogenetic relationships at low taxonomic levels in Begonia from a survey of sixteen species. A long-range PCR method was used to generate draft plastid genomes to provide a strong phylogenetic backbone, identify fast evolving regions and provide informative molecular markers for species-level phylogenetic studies in Begonia. PMID:27058864
An attempt to reconstruct phylogenetic relationships within Caribbean nummulitids: simulating relationships and tracing character evolution

NASA Astrophysics Data System (ADS)

Eder, Wolfgang; Ives Torres-Silva, Ana; Hohenegger, Johann

2017-04-01

Phylogenetic analysis and trees based on molecular data are broadly applied and used to infer genetic and biogeographic relationships in recent larger foraminifera. Molecular phylogenetics is intensively used within recent nummulitids; however, for fossil representatives these trees are only of minor informational value. Hence, within paleontological studies a phylogenetic approach through morphometric analysis is of much higher value. To tackle phylogenetic relationships within the nummulitid family, a much higher number of morphological characters must be measured than are commonly used in biometric studies, where mostly parameters describing embryonic size (e.g., proloculus diameter, deuteroloculus diameter) and/or the marginal spiral (e.g., spiral diagrams, spiral indices) are studied. For this purpose 11 growth-independent and/or growth-invariant characters have been used to describe the morphological variability of equatorial thin sections of seven Caribbean nummulitid taxa (Nummulites striatoreticulatus, N. macgillavry, Palaeonummulites willcoxi, P. floridensis, P. soldadensis, P. trinitatensis and P. ocalanus) and one outgroup taxon (Ranikothalia bermudezi). Using these characters, phylogenetic trees were calculated using a restricted maximum likelihood algorithm (REML), and results were cross-checked by ordination and cluster analysis. The squared-change parsimony method was run to reconstruct ancestral states and to simulate the evolution of the chosen characters along the calculated phylogenetic tree, and independent-contrast analysis was used to estimate confidence intervals. Based on these simulations, phylogenetic tendencies of certain characters proposed for nummulitids (e.g., Cope's rule or nepionic acceleration) can be tested to determine whether these tendencies are valid for the whole family or only for certain clades. 
At least within the Caribbean nummulitids, phylogenetic trends along some growth-independent characters of the embryo (e.g., first chamber length and P/D ratio) and some growth-invariant characters of the chamber sequence (e.g., backbend angle, initial chamber base length and chamber length increase) are evident.

Near real-time monitoring of HIV transmission hotspots from routine HIV genotyping: an implementation case study

PubMed Central

Poon, Art F. Y.; Gustafson, Réka; Daly, Patricia; Zerr, Laura; Demlow, S. Ellen; Wong, Jason; Woods, Conan K; Hogg, Robert S.; Krajden, Mel; Moore, David; Kendall, Perry; Montaner, Julio S. G.; Harrigan, P. Richard

2016-01-01

Background Due to the rapid evolution of HIV, infections with similar genetic sequences are likely to be related by recent transmission events. Clusters of related infections can represent subpopulations with high rates of HIV transmission. Here we describe the implementation of an automated “near real-time” system using clustering analysis of routinely collected HIV resistance genotypes to monitor and characterize HIV transmission hotspots in British Columbia (BC). Methods A monitoring system was implemented on the BC Drug Treatment Database, which currently holds over 32000 anonymized HIV genotypes for nearly 9000 residents of BC living with HIV. On average, five to six new HIV genotypes are deposited in the database every day, which triggers an automated re-analysis of the entire database. Clusters of five or more individuals were extracted on the basis of short phylogenetic distances between their respective HIV sequences. Monthly reports on the growth and characteristics of clusters were generated by the system and distributed to public health officers. Findings In June 2014, the monitoring system detected the expansion of a cluster by 11 new cases over three months, including eight cases with transmitted drug resistance. This cluster generally comprised young men who have sex with men. The subsequent report precipitated an enhanced public health follow-up to ensure linkage to care and treatment initiation in the affected subpopulation. Of the nine cases associated with this follow-up, all had already been linked to care and five cases had started treatment. Subsequent to the follow-up, three additional cases started treatment and the majority of cases achieved suppressed viral loads. Over the following 12 months, 12 new cases were detected in this cluster with a marked reduction in the onward transmission of drug resistance. 
Interpretation Our findings demonstrate the first application of an automated phylogenetic system monitoring a clinical database to detect a recent HIV outbreak and support the ensuing public health response. By making secondary use of routinely collected HIV genotypes, this approach is cost-effective, attains near real-time monitoring of new cases, and can be implemented in all settings where HIV genotyping is the standard of care. Funding This work was supported by the BC Centre for Excellence in HIV/AIDS and by grants from the Canadian Institutes for Health Research (CIHR HOP-111406, HOP-107544), the Genome BC, Genome Canada and CIHR Partnership in Genomics and Personalized Health (Large-Scale Applied Research Project HIV142 contract to PRH, JSGM, and AFYP), and by the US National Institute on Drug Abuse (1-R01-DA036307-01, 5-R01-031055-02, R01-DA021525-06, and R01-DA011591). PMID:27126490
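The cluster-extraction step described in the Methods above (groups of five or more individuals linked by short phylogenetic distances) might be sketched as follows. The abstract does not specify the distance measure, the cutoff or the linkage rule; this sketch assumes precomputed pairwise distances, a hypothetical cutoff of 0.02 and simple single-linkage grouping via union-find. The sample identifiers and distances are invented.

```python
def find_clusters(sample_ids, pairwise_distances, max_distance, min_size=5):
    """Group samples connected by distances <= max_distance; keep large groups."""
    parent = {sample: sample for sample in sample_ids}

    def find(sample):
        # Follow parent links to the group representative, compressing the path.
        while parent[sample] != sample:
            parent[sample] = parent[parent[sample]]
            sample = parent[sample]
        return sample

    for (a, b), distance in pairwise_distances.items():
        if distance <= max_distance:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for sample in sample_ids:
        groups.setdefault(find(sample), []).append(sample)
    return sorted(sorted(members) for members in groups.values() if len(members) >= min_size)

# Invented data: s1..s5 form a chain of close sequences, s6 and s7 a separate pair.
samples = ["s1", "s2", "s3", "s4", "s5", "s6", "s7"]
distances = {
    ("s1", "s2"): 0.010, ("s2", "s3"): 0.012, ("s3", "s4"): 0.008,
    ("s4", "s5"): 0.015, ("s5", "s6"): 0.200, ("s6", "s7"): 0.010,
}
print(find_clusters(samples, distances, max_distance=0.02))
```

In a monitoring setting, rerunning such a step whenever new genotypes arrive and comparing cluster membership with the previous run is one way to flag growing clusters.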
A phylogenetic analysis of the megadiverse Chalcidoidea (Hymenoptera)

USDA-ARS's Scientific Manuscript database

Chalcidoidea (Hymenoptera) are extremely diverse with an estimated 500,000 species. We present the first phylogenetic analysis of the superfamily based on a cladistic analysis of both morphological and molecular data. A total of 233 morphological characters were scored for 300 taxa and 265 genera, a...
Phylogenetic Analysis of Ruminant Theileria spp. from China Based on 28S Ribosomal RNA Gene

PubMed Central

Gou, Huitian; Guan, Guiquan; Ma, Miling; Liu, Aihong; Liu, Zhijie; Xu, Zongke; Ren, Qiaoyun; Li, Youquan; Yang, Jifei; Chen, Ze

2013-01-01

Species identification using DNA sequences is the basis for DNA taxonomy. In this study, we sequenced the large-subunit ribosomal RNA genes (3,037-3,061 bp in length) of 13 Chinese Theileria stocks that were infective to cattle and sheep. The complete 28S rRNA gene is relatively difficult to amplify and its conserved region is not important for phylogenetic study. Therefore, we selected the D2-D3 region from the complete 28S rRNA sequences for phylogenetic analysis. Our analyses of 28S rRNA gene sequences showed that the 28S rRNA was useful as a phylogenetic marker for analyzing the relationships among Theileria spp. in ruminants. In addition, the D2-D3 region was a short segment that could be used instead of the whole 28S rRNA sequence during the phylogenetic analysis of Theileria, and it may be an ideal DNA barcode. PMID:24327775
Phylogenetic analysis of ruminant Theileria spp. from China based on 28S ribosomal RNA gene.

PubMed

Gou, Huitian; Guan, Guiquan; Ma, Miling; Liu, Aihong; Liu, Zhijie; Xu, Zongke; Ren, Qiaoyun; Li, Youquan; Yang, Jifei; Chen, Ze; Yin, Hong; Luo, Jianxun

2013-10-01

Species identification using DNA sequences is the basis for DNA taxonomy. In this study, we sequenced the large-subunit ribosomal RNA genes (3,037-3,061 bp in length) of 13 Chinese Theileria stocks that were infective to cattle and sheep. The complete 28S rRNA gene is relatively difficult to amplify and its conserved region is not important for phylogenetic study. Therefore, we selected the D2-D3 region from the complete 28S rRNA sequences for phylogenetic analysis. Our analyses of 28S rRNA gene sequences showed that the 28S rRNA was useful as a phylogenetic marker for analyzing the relationships among Theileria spp. in ruminants. In addition, the D2-D3 region was a short segment that could be used instead of the whole 28S rRNA sequence during the phylogenetic analysis of Theileria, and it may be an ideal DNA barcode.
Phylogeny, host-parasite relationship and zoogeography

PubMed Central

1999-01-01

Phylogeny is the evolutionary history of a group or lineage of organisms and is reconstructed based on morphological, molecular and other characteristics. The genealogical relationship of a group of taxa is often expressed as a phylogenetic tree. The difficulty in reconstructing a phylogeny is mainly due to the existence of frequent homoplasies that deceive observers. At the present time, cladistic analysis is believed to be one of the most effective methods of reconstructing a phylogenetic tree. Excellent software for phylogenetic analysis is available. As an example, cladistic analysis was applied to nematode genera of the family Acuariidae, and the resulting phylogenetic tree was compared with the system used currently. Nematodes in the genera Nippostrongylus and Heligmonoides were also analyzed, and the validity of the reconstructed phylogenetic trees was examined from a zoogeographical point of view. Some of the theories of parasite evolution were briefly reviewed as well. Coevolution of parasites and humans was discussed with special reference to the evolutionary relationship between Enterobius and primates. PMID:10634036
Global Diversity and Phylogeny of the Asteroidea (Echinodermata)

PubMed Central

Mah, Christopher L.; Blake, Daniel B.

2012-01-01

Members of the Asteroidea (phylum Echinodermata), popularly known as starfish or sea stars, are ecologically important and diverse members of marine ecosystems in all of the world's oceans. We present a comprehensive overview of diversity and phylogeny as they have figured into the evolution of the Asteroidea from the Paleozoic to the living fauna. Living post-Paleozoic asteroids, the Neoasteroidea, are morphologically separate from those in the Paleozoic. Early Paleozoic asteroid faunas were diverse and displayed morphology that foreshadowed later living taxa. Preservation presents significant difficulties, but fossil occurrence and current accounts suggest a diverse Paleozoic fauna, which underwent extinction around the Permian-Triassic interval, followed by re-diversification of at least one surviving lineage. Ongoing phylogenetic classification debates include the status of the Paxillosida and the Concentricycloidea. Fossil and molecular evidence has been and continues to be part of the ongoing evolution of asteroid phylogenetic research. The modern lineages of asteroids include the Valvatacea, the Forcipulatacea, the Spinulosida, and the Velatida. We present an overview of diversity in these taxa, as well as brief notes on broader significance, ecology, and functional morphology of each. Although much asteroid taxonomy is stable, many new taxa remain to be discovered, with many new species currently awaiting description. The Goniasteridae is currently one of the most diverse families within the Asteroidea. New data from molecular phylogenetics and the advent of global biodiversity databases, such as the World Asteroidea Database (http://www.marinespecies.org/Asteroidea/), present important new springboards for understanding the global biodiversity and evolution of asteroids. PMID:22563389
Computational-based structural, functional and phylogenetic analysis of Enterobacter phytases.

PubMed

Pramanik, Krishnendu; Kundu, Shreyasi; Banerjee, Sandipan; Ghosh, Pallab Kumar; Maiti, Tushar Kanti

2018-06-01

Myo-inositol hexakisphosphate phosphohydrolases (i.e., phytases) are known to be very important enzymes responsible for solubilization of insoluble phosphates. In the present study, Enterobacter phytases were characterized by different phylogenetic, structural and functional parameters using some standard bio-computational tools. Results showed that the majority of the Enterobacter phytases are acidic in nature, as most of the isoelectric points were under 7.0. The aliphatic indices predicted for the selected proteins were below 40 indicating their thermostable nature. The average molecular weight of the proteins was 48 kDa. The lower values of GRAVY of the said proteins implied that they have better interactions with water. Secondary structure prediction revealed that alpha-helical content was highest among the other forms such as sheets, coils, etc. Moreover, the predicted 3D structure of Enterobacter phytases divulged that the proteins consisted of four monomeric polypeptide chains, i.e., it was a tetrameric protein. The predicted tertiary model of E. aerogenes (A0A0M3HCJ2) was deposited in Protein Model Database (Acc. No.: PM0080561) for further utilization after a thorough quality check from QMEAN and SAVES server. Functional analysis supported their classification as histidine acid phosphatases. Besides, multiple sequence alignment revealed that "DG-DP-LG" were the most highly conserved residues within the Enterobacter phytases. Thus, the present study will be useful in selecting suitable phytase-producing microbes exclusively for use in the animal food industry as a food additive.
Identification of Streptococcus mitis321A vaccine antigens based on reverse vaccinology

PubMed Central

Zhang, Qiao; Lin, Kexiong; Wang, Changzheng; Xu, Zhi; Yang, Li; Ma, Qianli

2018-01-01

Streptococcus mitis (S. mitis) may transform into highly pathogenic bacteria. The aim of the present study was to identify potential antigen targets for designing an effective vaccine against the pathogenic S. mitis321A. The genome of S. mitis321A was sequenced using an Illumina Hiseq2000 instrument. Subsequently, Glimmer 3.02 and Tandem Repeat Finder (TRF) 4.04 were used to predict genes and tandem repeats, respectively, with DNA sequence function analysis using the Basic Local Alignment Search Tool (BLAST) in the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Cluster of Orthologous Groups of proteins (COG) databases. Putative gene antigen candidates were screened with BLAST ahead of phylogenetic tree analysis. The DNA sequence assembly size was 2,110,680 bp with 40.12% GC, 6 scaffolds and 9 contigs. Consequently, 1,944 genes were predicted, and 119 TRF, 56 microsatellite DNA, 10 minisatellite DNA and 154 transposons were acquired. The predicted genes were associated with various pathways and functions concerning membrane transport and energy metabolism. Multiple putative genes encoding surface proteins, secreted proteins and virulence factors, as well as essential genes, were determined. The majority of essential genes belonged to a phylogenetic lineage, while 321AGL000129 and 321AGL000299 were on the same branch. The current study provided useful information regarding the biological function of the S. mitis321A genome and recommends putative antigen candidates for developing a potent vaccine against S. mitis. PMID:29620181
Genetic variations in the Dravidian population of South West coast of India: Implications in designing case-control studies.

PubMed

D'Cunha, Anitha; Pandit, Lekha; Malli, Chaithra

2017-06-01

Indian data have been largely missing from genome-wide databases that provide information on genetic variations in different populations. This hinders association studies for complex disorders in India. This study aimed to determine whether the complex genetic structure and endogamy among Indians could potentially influence the design of case-control studies for autoimmune disorders in the south Indian population. A total of 12 single nucleotide variations (SNVs) related to genes associated with autoimmune disorders were genotyped in 370 healthy individuals belonging to six different caste groups in southern India. Allele frequencies were estimated; genetic divergence and phylogenetic relationship within the various caste groups and other HapMap populations were ascertained. Allele frequencies for all genotyped SNVs did not vary significantly among the different groups studied. Wright's FST was 0.001 per cent among the study population and 0.38 per cent when compared with the Gujarati in Houston (GIH) population on HapMap data. The analysis of molecular variance results showed a 97 per cent variation attributable to differences within the study population and <1 per cent variation due to differences between castes. Phylogenetic analysis showed a separation of the Dravidian population from other HapMap populations and particularly from the GIH population. Despite the complex genetic origins of the Indian population, our study indicated a low level of genetic differentiation among Dravidian language-speaking people of south India. Case-control studies of association among Dravidians of south India may not require stratification based on language and caste.
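As a rough illustration of the statistic reported above, the following Python sketch computes Wright's FST for a single biallelic locus from subpopulation allele frequencies, using the textbook definition FST = (HT - HS) / HT. The abstract does not state which estimator the authors used, so the function name `wright_fst`, the equal weighting of groups, and the example frequencies are all assumptions made for illustration.

```python
def wright_fst(freqs):
    """Wright's FST for one biallelic locus.

    freqs: frequency of the same allele in each subpopulation.
    Assumes equally weighted subpopulations (an illustrative simplification).
    """
    if not freqs:
        raise ValueError("need at least one subpopulation")
    p_bar = sum(freqs) / len(freqs)
    # Expected heterozygosity of the pooled population (HT).
    h_total = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within subpopulations (HS).
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    if h_total == 0:
        return 0.0  # locus is fixed everywhere; nothing to differentiate
    return (h_total - h_within) / h_total


# Hypothetical frequencies for six groups (not data from the study).
example = [0.31, 0.30, 0.32, 0.29, 0.31, 0.30]
print(f"FST = {wright_fst(example):.5f}")
```

Values close to zero, as in the abstract, correspond to subpopulations whose allele frequencies are nearly identical.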
Epidemiological dynamics of norovirus GII.4 variant New Orleans 2009.

PubMed

Medici, Maria Cristina; Tummolo, Fabio; De Grazia, Simona; Calderaro, Adriana; De Conto, Flora; Terio, Valentina; Chironna, Maria; Bonura, Floriana; Pucci, Marzia; Bányai, Kristián; Martella, Vito; Giammanco, Giovanni Maurizio

2015-09-01

Norovirus (NoV) is one of the major causes of diarrhoeal disease with epidemic, outbreak and sporadic patterns in humans of all ages worldwide. NoVs of genotype GII.4 cause nearly 80-90 % of all NoV infections in humans. Periodically, some GII.4 strains become predominant, generating major pandemic variants. Retrospective analysis of the GII.4 NoV strains detected in Italy between 2007 and 2013 indicated that the pandemic variant New Orleans 2009 emerged in Italy in the late 2009, became predominant in 2010-2011 and continued to circulate in a sporadic fashion until April 2013. Upon phylogenetic analysis based on the small diagnostic regions A and C, the late New Orleans 2009 NoVs circulating during 2011-2013 appeared to be genetically different from the early New Orleans 2009 strains that circulated in 2010. For a selection of strains, a 3.2 kb genome portion at the 3' end was sequenced. In the partial ORF1 and in the full-length ORF2 and ORF3, the 2011-2013 New Orleans NoVs comprised at least three distinct genetic subclusters. By comparison with sequences retrieved from the databases, these subclusters were also found to circulate globally, suggesting that the local circulation reflected repeated introductions of different strains, rather than local selection of novel viruses. Phylogenetic subclustering did not correlate with changes in residues located in predicted putative capsid epitopes, although several changes affected the P2 domain in epitopes A, C, D and E.
Geographic distribution of hepatitis C virus genotype 6 subtypes in Thailand.

PubMed

Akkarathamrongsin, Srunthron; Praianantathavorn, Kesmanee; Hacharoen, Nisachol; Theamboonlers, Apiradee; Tangkijvanich, Pisit; Tanaka, Yasuhito; Mizokami, Masashi; Poovorawan, Yong

2010-02-01

The nucleotide sequence of hepatitis C virus (HCV) genotype 6, found mostly in south China and south-east Asia, displays profound genetic diversity. The aim of this study was to determine the genetic variability of HCV genotype 6 (HCV-6) in Thailand and locate the subtype distribution of genotype 6 in various geographic areas. Four hundred nineteen anti-HCV positive serum samples were collected from patients residing in the central part of the country. HCV RNA positive samples based on reverse transcriptase-polymerase chain reaction (RT-PCR) of the 5'UTR were amplified with primers specific for the core and NS5B regions. Nucleotide sequences of both regions were analyzed for the genotype by phylogenetic analysis. To determine geographic distribution of HCV-6 subtypes, a search of the international database on subtype distribution in the respective countries was conducted. Among 375 HCV RNA positive samples, 71 had HCV-6 based on phylogenetic analysis of partial core and NS5B regions. The subtype distribution in order of predominance was 6f (56%), 6n (22%), 6i (11%), 6j (10%), and 6e (1%). Among the 13 countries with different subtypes of HCV-6, most sequences have been reported from Vietnam. Subtype 6f was found exclusively in Thailand where five distinct HCV-6 subtypes are circulating. HCV-6, which is endemic in south China and south-east Asia, displays profound genetic diversity and may have evolved over a considerable period of time. (c) 2009 Wiley-Liss, Inc.
SUNPLIN: Simulation with Uncertainty for Phylogenetic Investigations

PubMed Central

2013-01-01

Background Phylogenetic comparative analyses usually rely on a single consensus phylogenetic tree in order to study evolutionary processes. However, most phylogenetic trees are incomplete with regard to species sampling, which may critically compromise analyses. Some approaches have been proposed to integrate non-molecular phylogenetic information into incomplete molecular phylogenies. An expanded tree approach consists of adding missing species to random locations within their clade. The information contained in the topology of the resulting expanded trees can be captured by the pairwise phylogenetic distance between species and stored in a matrix for further statistical analysis. Thus, the random expansion and processing of multiple phylogenetic trees can be used to estimate the phylogenetic uncertainty through a simulation procedure. Because of the computational burden required, unless this procedure is efficiently implemented, the analyses are of limited applicability. Results In this paper, we present efficient algorithms and implementations for randomly expanding and processing phylogenetic trees so that simulations involved in comparative phylogenetic analysis with uncertainty can be conducted in a reasonable time. We propose algorithms for both randomly expanding trees and calculating distance matrices. We made available the source code, which was written in the C++ language. The code may be used as a standalone program or as a shared object in the R system. The software can also be used as a web service through the link: http://purl.oclc.org/NET/sunplin/. Conclusion We compare our implementations to similar solutions and show that significant performance gains can be obtained. Our results open up the possibility of accounting for phylogenetic uncertainty in evolutionary and ecological analyses of large datasets. PMID:24229408
SUNPLIN: simulation with uncertainty for phylogenetic investigations.

PubMed

Martins, Wellington S; Carmo, Welton C; Longo, Humberto J; Rosa, Thierson C; Rangel, Thiago F

2013-11-15

Phylogenetic comparative analyses usually rely on a single consensus phylogenetic tree in order to study evolutionary processes. However, most phylogenetic trees are incomplete with regard to species sampling, which may critically compromise analyses. Some approaches have been proposed to integrate non-molecular phylogenetic information into incomplete molecular phylogenies. An expanded tree approach consists of adding missing species to random locations within their clade. The information contained in the topology of the resulting expanded trees can be captured by the pairwise phylogenetic distance between species and stored in a matrix for further statistical analysis. Thus, the random expansion and processing of multiple phylogenetic trees can be used to estimate the phylogenetic uncertainty through a simulation procedure. Because of the computational burden required, unless this procedure is efficiently implemented, the analyses are of limited applicability. In this paper, we present efficient algorithms and implementations for randomly expanding and processing phylogenetic trees so that simulations involved in comparative phylogenetic analysis with uncertainty can be conducted in a reasonable time. We propose algorithms for both randomly expanding trees and calculating distance matrices. We made available the source code, which was written in the C++ language. The code may be used as a standalone program or as a shared object in the R system. The software can also be used as a web service through the link: http://purl.oclc.org/NET/sunplin/. We compare our implementations to similar solutions and show that significant performance gains can be obtained. Our results open up the possibility of accounting for phylogenetic uncertainty in evolutionary and ecological analyses of large datasets.
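The expanded-tree idea described in the SUNPLIN abstracts (insert a missing species at a random location within its clade, then record pairwise phylogenetic distances in a matrix) can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions and is not SUNPLIN's C++ implementation: the tree encoding, the function names, the species labels, and the choice to pick a branch uniformly at random rather than in proportion to its length are all hypothetical.

```python
import random

# Toy tree stored as child -> (parent, branch_length); the root has parent None.
def make_tree():
    return {
        "root": (None, 0.0),
        "cladeA": ("root", 1.0),
        "sp1": ("cladeA", 2.0),
        "sp2": ("cladeA", 2.0),
        "sp3": ("root", 3.0),
    }

def path_to_root(tree, node):
    """List of (ancestor, distance from node), starting with the node itself."""
    path, dist = [], 0.0
    while node is not None:
        path.append((node, dist))
        parent, length = tree[node]
        dist += length
        node = parent
    return path

def leaves(tree):
    parents = {parent for parent, _ in tree.values()}
    return [n for n in tree if n not in parents]

def distance(tree, a, b):
    """Patristic distance: summed branch lengths on the path between a and b."""
    up_a = dict(path_to_root(tree, a))
    for ancestor, d_b in path_to_root(tree, b):
        if ancestor in up_a:
            return up_a[ancestor] + d_b
    raise ValueError("nodes are not in the same tree")

def height(tree, node):
    """Distance from node down to its deepest descendant leaf (0 for a leaf)."""
    return max((d for leaf in leaves(tree)
                for anc, d in path_to_root(tree, leaf) if anc == node),
               default=0.0)

def insert_species(tree, clade, name, rng):
    """Attach `name` at a random point on a random branch inside `clade`."""
    inside = sorted(n for n in tree if n != clade
                    and any(anc == clade for anc, _ in path_to_root(tree, n)))
    child = rng.choice(inside)            # branch chosen uniformly (assumption)
    parent, length = tree[child]
    cut = rng.uniform(0.0, length)        # new node sits `cut` above `child`
    tip_length = cut + height(tree, child)  # keeps the tree ultrametric
    joint = f"joint_{name}"
    tree[joint] = (parent, length - cut)
    tree[child] = (joint, cut)
    tree[name] = (joint, tip_length)

def distance_matrix(tree):
    tips = sorted(leaves(tree))
    return tips, [[distance(tree, a, b) for b in tips] for a in tips]

# One simulation: many random expansions, one distance matrix per replicate.
rng = random.Random(0)
replicates = []
for _ in range(100):
    t = make_tree()
    insert_species(t, "cladeA", "sp_missing", rng)
    replicates.append(distance_matrix(t))
```

The spread of a given matrix entry across `replicates` gives a crude picture of the uncertainty introduced by not knowing where the missing species belongs.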
Rapid birth-and-death evolution of the xenobiotic metabolizing NAT gene family in vertebrates with evidence of adaptive selection

PubMed Central

2013-01-01

Background The arylamine N-acetyltransferases (NATs) are a unique family of enzymes widely distributed in nature that play a crucial role in the detoxification of aromatic amine xenobiotics. Considering the temporal changes in the levels and toxicity of environmentally available chemicals, the metabolic function of NATs is likely to be under adaptive evolution to broaden or change substrate specificity over time, making NATs a promising subject for evolutionary analyses. In this study, we trace the molecular evolutionary history of the NAT gene family during the last ~450 million years of vertebrate evolution and define the likely role of gene duplication, gene conversion and positive selection in the evolutionary dynamics of this family. Results A phylogenetic analysis of 77 NAT sequences from 38 vertebrate species retrieved from public genomic databases shows that NATs are phylogenetically unstable genes, characterized by frequent gene duplications and losses even among closely related species, and that concerted evolution only played a minor role in the patterns of sequence divergence. Local signals of positive selection are detected in several lineages, probably reflecting response to changes in xenobiotic exposure. We then put a special emphasis on the study of the last ~85 million years of primate NAT evolution by determining the NAT homologous sequences in 13 additional primate species. Our phylogenetic analysis supports the view that the three human NAT genes emerged from a first duplication event in the common ancestor of Simiiformes, yielding NAT1 and an ancestral NAT gene which in turn, duplicated in the common ancestor of Catarrhini, giving rise to NAT2 and the NATP pseudogene. Our analysis suggests a main role of purifying selection in NAT1 protein evolution, whereas NAT2 was predicted to mostly evolve under positive selection to change its amino acid sequence over time. 
These findings are consistent with a differential role of the two human isoenzymes and support the involvement of NAT1 in endogenous metabolic pathways. Conclusions This study provides unequivocal evidence that the NAT gene family has evolved under a dynamic process of birth-and-death evolution in vertebrates, consistent with previous observations made in fungi. PMID:23497148
Occurrence and Identification of Phytophthora spp. Pathogenic to Pear Fruit in Irrigation Water in the Wenatchee River Valley of Washington State.

PubMed

Yamak, F; Peever, T L; Grove, G G; Boal, R J

2002-11-01

ABSTRACT Seven hundred forty-nine isolates of Phytophthora spp. were obtained from irrigation canals in eastern Washington State during the 1992 to 1995 and 1999 growing seasons. Isolates were retrieved using pear baiting techniques. All isolates were pathogenic to pear and were present in irrigation water beginning early in fruit development. Over the course of the 5 year study, 10 and 5% of isolates were identified as P. cactorum and P. citricola, respectively, using morphological criteria. The remaining isolates could not be identified using morphological criteria. Colony morphology of these isolates was characterized during all years of the study. In 1999, more detailed studies utilizing polymerase chain reaction restriction fragment length polymorphism (PCR-RFLP) analysis of entire internal transcribed spacer (ITS) regions (ITS1, 5.8S, and ITS2) of ribosomal DNA for 180 isolates, and sequence analysis of ITS2 for 50 isolates, were used to investigate genetic variation and phylogenetic relationships among isolates. Isolates were divided into 12 groups based on their growth type on corn meal agar. Restriction digestion of the entire ITS region with three enzymes revealed 11 restriction digestion patterns among 180 isolates. PCR-RFLP and sequence data were obtained for 12 reference Phytophthora spp. (two species in each of Waterhouse's six morphological groups). Phylogenetic analysis of ITS2 regions revealed nine clades, each with strong bootstrap support. Molecular analyses revealed 23 isolates that were in the P. gonapodyides clade, 9 in the P. parasitica clade, 1 in the P. cactorum clade, 7 in the P. citricola/capsici clade, and 4 in the P. cambivora/pseudotsugae clade. The three isolates comprising clade 5 were significantly distinct from all other Phytophthora spp. in the databases and may represent a new Phytophthora sp. 
Colony morphology was not consistently correlated to PCR-RFLP pattern or ITS2 phylogeny, suggesting that the former criterion is insufficient for species identification. The results of this study indicate that at least nine phylogenetically distinct taxa of Phytophthora pathogenic to pear are present in irrigation water in North Central Washington.
OpenFluDB, a database for human and animal influenza virus

PubMed Central

Liechti, Robin; Gleizes, Anne; Kuznetsov, Dmitry; Bougueleret, Lydie; Le Mercier, Philippe; Bairoch, Amos; Xenarios, Ioannis

2010-01-01

Although research on influenza has lasted for more than 100 years, it is still one of the most prominent diseases, causing half a million human deaths every year. With the recent observation of new highly pathogenic H5N1 and H7N7 strains, and the appearance of the influenza pandemic caused by the H1N1 swine-like lineage, a collaborative effort to share observations on the evolution of this virus in both animals and humans has been established. The OpenFlu database (OpenFluDB) is a part of this collaborative effort. It contains genomic and protein sequences, as well as epidemiological data from more than 27 000 isolates. The isolate annotations include virus type, host, geographical location and experimentally tested antiviral resistance. Putative enhanced pathogenicity as well as human adaptation propensity are computed from protein sequences. Each virus isolate can be associated with the laboratories that collected, sequenced and submitted it. Several analysis tools including multiple sequence alignment, phylogenetic analysis and sequence similarity maps enable rapid and efficient mining. The contents of OpenFluDB are supplied by direct user submission, as well as by a daily automatic procedure importing data from public repositories. Additionally, a simple mechanism facilitates the export of OpenFluDB records to GenBank. This resource has been successfully used to rapidly and widely distribute the sequences collected during the recent human swine flu outbreak and also as an exchange platform during the vaccine selection procedure. Database URL: http://openflu.vital-it.ch. PMID:20624713
Evolution & Phylogenetic Analysis: Classroom Activities for Investigating Molecular & Morphological Concepts

ERIC Educational Resources Information Center

Franklin, Wilfred A.

2010-01-01

In a flexible multisession laboratory, students investigate concepts of phylogenetic analysis at both the molecular and the morphological level. Students finish by conducting their own analysis on a collection of skeletons representing the major phyla of vertebrates, a collection of primate skulls, or a collection of hominid skulls.
HV1 Proton Channels in Dinoflagellates: Not Just for Bioluminescence?

PubMed

Kigundu, Gabriel; Cooper, Jennifer L; Smith, Susan M E

2018-04-26

Bioluminescence in dinoflagellates is controlled by HV1 proton channels. Database searches of dinoflagellate transcriptomes and genomes yielded hits with sequence features diagnostic of all confirmed HV1, and show that HV1 is widely distributed in the dinoflagellate phylogeny including the basal species Oxyrrhis marina. Multiple sequence alignments followed by phylogenetic analysis revealed three major subfamilies of HV1 that do not correlate with presence of theca, autotrophy, geographic location, or bioluminescence. These data suggest that most dinoflagellates express a HV1 which has a function separate from bioluminescence. Sequence evidence also suggests that dinoflagellates can contain more than one HV1 gene. This article is protected by copyright. All rights reserved.
SNP mining in Crassostrea gigas EST data: transferability to four other Crassostrea species, phylogenetic inferences and outlier SNPs under selection.

PubMed

Zhong, Xiaoxiao; Li, Qi; Yu, Hong; Kong, Lingfeng

2014-01-01

Oysters, with high levels of phenotypic plasticity and wide geographic distribution, are a challenging group for taxonomists and phylogenetics. Our study is intended to generate new EST-SNP markers and to evaluate their potential for cross-species utilization in phylogenetic study of the genus Crassostrea. In the study, 57 novel SNPs were developed from an EST database of C. gigas by the HRM (high-resolution melting) method. Transferability of 377 SNPs developed for C. gigas was examined on four other Crassostrea species: C. sikamea, C. angulata, C. hongkongensis and C. ariakensis. Among the 377 primer pairs tested, 311 (82.5%) primers showed amplification in C. sikamea, 353 (93.6%) in C. angulata, 254 (67.4%) in C. hongkongensis and 253 (67.1%) in C. ariakensis. A total of 214 SNPs were found to be transferable to all four species. Phylogenetic analyses showed that C. hongkongensis was a sister species of C. ariakensis and that this clade was sister to the clade containing C. sikamea, C. angulata and C. gigas. Within this clade, C. gigas and C. angulata had the closest relationship, with C. sikamea being the sister group. In addition, we detected eight SNPs as potentially being under selection by two outlier tests (fdist and hierarchical methods). The SNPs studied here should be useful for genetic diversity, comparative mapping and phylogenetic studies across species in Crassostrea and the candidate outlier SNPs are worth exploring in more detail regarding association genetics and functional studies.
Symbiosis between hydra and chlorella: molecular phylogenetic analysis and experimental study provide insight into its origin and evolution.

PubMed

Kawaida, Hitomi; Ohba, Kohki; Koutake, Yuhki; Shimizu, Hiroshi; Tachida, Hidenori; Kobayakawa, Yoshitaka

2013-03-01

Although many physiological studies have been reported on the symbiosis between hydra and green algae, very little information from a molecular phylogenetic aspect of symbiosis is available. In order to understand the origin and evolution of symbiosis between the two organisms, we compared the phylogenetic relationships among symbiotic green algae with the phylogenetic relationships among host hydra strains. To do so, we reconstructed molecular phylogenetic trees of several strains of symbiotic chlorella harbored in the endodermal epithelial cells of viridissima group hydra strains and investigated their congruence with the molecular phylogenetic trees of the host hydra strains. To examine the species specificity between the host and the symbiont with respect to the genetic distance, we also tried to introduce chlorella strains into two aposymbiotic strains of viridissima group hydra in which symbiotic chlorella had been eliminated in advance. We discussed the origin and history of symbiosis between hydra and green algae based on the analysis. Copyright © 2012 Elsevier Inc. All rights reserved.

Genome signature analysis of thermal virus metagenomes reveals Archaea and thermophilic signatures

PubMed Central

Pride, David T; Schoenfeld, Thomas

2008-01-01

Background Metagenomic analysis provides a rich source of biological information for otherwise intractable viral communities. However, study of viral metagenomes has been hampered by its nearly complete reliance on BLAST algorithms for identification of DNA sequences. We sought to develop algorithms for examination of viral metagenomes to identify the origin of sequences independent of BLAST algorithms. We chose viral metagenomes obtained from two hot springs, Bear Paw and Octopus, in Yellowstone National Park, as they represent simple microbial populations where comparatively large contigs were obtained. Thermal spring metagenomes have high proportions of sequences without significant Genbank homology, which has hampered identification of viruses and their linkage with hosts. To analyze each metagenome, we developed a method to classify DNA fragments using genome signature-based phylogenetic classification (GSPC), where metagenomic fragments are compared to a database of oligonucleotide signatures for all previously sequenced Bacteria, Archaea, and viruses. Results From both Bear Paw and Octopus hot springs, each assembled contig had more similarity to other metagenome contigs than to any sequenced microbial genome based on GSPC analysis, suggesting a genome signature common to each of these extreme environments. While viral metagenomes from Bear Paw and Octopus share some similarity, the genome signatures from each locale are largely unique. GSPC using a microbial database predicts most of the Octopus metagenome has archaeal signatures, while bacterial signatures predominate in Bear Paw; a finding consistent with those of Genbank BLAST. When using a viral database, the majority of the Octopus metagenome is predicted to belong to archaeal virus Families Globuloviridae and Fuselloviridae, while none of the Bear Paw metagenome is predicted to belong to archaeal viruses. 
As expected, when microbial and viral databases are combined, each of the Octopus and Bear Paw metagenomic contigs are predicted to belong to viruses rather than to any Bacteria or Archaea, consistent with the apparent viral origin of both metagenomes. Conclusion That BLAST searches identify no significant homologs for most metagenome contigs, while GSPC suggests their origin as archaeal viruses or bacteriophages, indicates GSPC provides a complementary approach in viral metagenomic analysis. PMID:18798991
Genome signature analysis of thermal virus metagenomes reveals Archaea and thermophilic signatures.

PubMed

Pride, David T; Schoenfeld, Thomas

2008-09-17

Metagenomic analysis provides a rich source of biological information for otherwise intractable viral communities. However, study of viral metagenomes has been hampered by its nearly complete reliance on BLAST algorithms for identification of DNA sequences. We sought to develop algorithms for examination of viral metagenomes to identify the origin of sequences independent of BLAST algorithms. We chose viral metagenomes obtained from two hot springs, Bear Paw and Octopus, in Yellowstone National Park, as they represent simple microbial populations where comparatively large contigs were obtained. Thermal spring metagenomes have high proportions of sequences without significant Genbank homology, which has hampered identification of viruses and their linkage with hosts. To analyze each metagenome, we developed a method to classify DNA fragments using genome signature-based phylogenetic classification (GSPC), where metagenomic fragments are compared to a database of oligonucleotide signatures for all previously sequenced Bacteria, Archaea, and viruses. From both Bear Paw and Octopus hot springs, each assembled contig had more similarity to other metagenome contigs than to any sequenced microbial genome based on GSPC analysis, suggesting a genome signature common to each of these extreme environments. While viral metagenomes from Bear Paw and Octopus share some similarity, the genome signatures from each locale are largely unique. GSPC using a microbial database predicts most of the Octopus metagenome has archaeal signatures, while bacterial signatures predominate in Bear Paw; a finding consistent with those of Genbank BLAST. When using a viral database, the majority of the Octopus metagenome is predicted to belong to archaeal virus Families Globuloviridae and Fuselloviridae, while none of the Bear Paw metagenome is predicted to belong to archaeal viruses. 
As expected, when microbial and viral databases are combined, each of the Octopus and Bear Paw metagenomic contigs are predicted to belong to viruses rather than to any Bacteria or Archaea, consistent with the apparent viral origin of both metagenomes. That BLAST searches identify no significant homologs for most metagenome contigs, while GSPC suggests their origin as archaeal viruses or bacteriophages, indicates GSPC provides a complementary approach in viral metagenomic analysis.
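The general idea behind signature-based classification, comparing a fragment's oligonucleotide usage with that of reference genomes instead of aligning it, can be sketched as below. This is a generic k-mer frequency illustration in Python and should not be read as the GSPC algorithm itself: the abstract does not give the signature definition or distance measure, so the tetranucleotide frequencies, the Manhattan distance, the function names, and the toy reference sequences are all assumptions.

```python
from collections import Counter
from itertools import product

def signature(seq, k=4):
    """Relative frequency of every k-mer over A/C/G/T, in a fixed order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)  # k-mers containing N etc. are ignored
    if total == 0:
        raise ValueError("sequence too short or contains no valid k-mers")
    return [counts[m] / total for m in kmers]

def signature_distance(a, b):
    """Manhattan distance between two signatures (0 = identical usage)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def classify(fragment, references, k=4):
    """Name of the reference whose signature is closest to the fragment's."""
    frag = signature(fragment, k)
    return min(references,
               key=lambda name: signature_distance(frag, signature(references[name], k)))

# Toy references with deliberately extreme composition (not real genomes).
references = {
    "at_rich_reference": "ATATATTTAAATATAT" * 10,
    "gc_rich_reference": "GCGCGGCCGCGGGCCC" * 10,
}
print(classify("ATTTATATAAATTA" * 5, references))
```

Because no alignment is involved, a fragment with no detectable homolog can still be assigned to the reference with the most similar composition, which is the complementary role the abstract describes.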
Canis mtDNA HV1 database: a web-based tool for collecting and surveying Canis mtDNA HV1 haplotype in public database.

PubMed

Thai, Quan Ke; Chung, Dung Anh; Tran, Hoang-Dung

2017-06-26

Canine and wolf mitochondrial DNA haplotypes, which can be used for forensic or phylogenetic analyses, have been defined in various schemes depending on the region analyzed. In recent studies, the 582 bp fragment of the HV1 region is most commonly used. 317 different canine HV1 haplotypes have been reported in the rapidly growing public database GenBank. These reported haplotypes contain several inconsistencies in their haplotype information. To overcome this issue, we have developed a Canis mtDNA HV1 database. This database collects data on the HV1 582 bp region in dog mitochondrial DNA from GenBank to screen and correct the inconsistencies. It also supports users in detection of novel mutation profiles and assignment of new haplotypes. The Canis mtDNA HV1 database (CHD) contains 5567 nucleotide entries originating from 15 subspecies in the species Canis lupus. Of these entries, 3646 were haplotypes and grouped into 804 distinct sequences. 319 sequences were recognized as previously assigned haplotypes, while the remaining 485 sequences had new mutation profiles and were marked as new haplotype candidates awaiting further analysis for haplotype assignment. Of the 3646 nucleotide entries, only 414 were annotated with correct haplotype information, while 3232 had insufficient or lacked haplotype information and were corrected or modified before storing in the CHD. The CHD can be accessed at http://chd.vnbiology.com . It provides sequences, haplotype information, and a web-based tool for mtDNA HV1 haplotyping. The CHD is updated monthly and supplies all data for download. The Canis mtDNA HV1 database contains information about canine mitochondrial DNA HV1 sequences with reconciled annotation. It serves as a tool for detection of inconsistencies in GenBank and helps identify new HV1 haplotypes. Thus, it supports the scientific community in naming new HV1 haplotypes and reconciling existing annotation of HV1 582 bp sequences.
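The grouping step described above (collapsing entries into distinct sequences, then separating previously assigned haplotypes from new candidates) can be sketched as follows. This is a minimal Python illustration and not the CHD's actual pipeline; the accession strings, the haplotype label, and the short sequences are invented placeholders, and real entries would first need trimming to the same 582 bp region.

```python
from collections import defaultdict

def group_haplotypes(entries, known):
    """Collapse identical sequences and split them into assigned/candidate sets.

    entries: accession -> sequence (already trimmed to the same region).
    known:   sequence -> previously assigned haplotype name.
    """
    groups = defaultdict(list)
    for accession, seq in entries.items():
        groups[seq.upper()].append(accession)

    assigned, candidates = {}, {}
    for seq, accessions in groups.items():
        if seq in known:
            assigned[known[seq]] = sorted(accessions)
        else:
            candidates[seq] = sorted(accessions)  # awaiting a haplotype name
    return assigned, candidates


# Placeholder data for illustration only.
entries = {"acc1": "ACGT", "acc2": "acgt", "acc3": "ACGA"}
known = {"ACGT": "hapX"}
assigned, candidates = group_haplotypes(entries, known)
print(assigned)
print(candidates)
```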
Phylogenetic Characterization of Transport Protein Superfamilies: Superiority of SuperfamilyTree Programs over Those Based on Multiple Alignments

PubMed Central

Chen, Jonathan S.; Reddy, Vamsee; Chen, Joshua H.; Shlykov, Maksim A.; Zheng, Wei Hao; Cho, Jaehoon; Yen, Ming Ren; Saier, Milton H.

2012-01-01

Transport proteins function in the translocation of ions, solutes and macromolecules across cellular and organellar membranes. These integral membrane proteins fall into >600 families as tabulated in the Transporter Classification Database (www.tcdb.org). Recent studies, some of which are reported here, define distant phylogenetic relationships between families with the creation of superfamilies. Several of these are analyzed using a novel set of programs designed to allow reliable prediction of phylogenetic trees when sequence divergence is too great to allow the use of multiple alignments. These new programs, called SuperfamilyTree1 and 2 (SFT1 and 2), allow display of protein and family relationships, respectively, based on thousands of comparative BLAST scores rather than multiple alignments. Superfamilies analyzed include: (1) Aerolysins, (2) RTX Toxins, (3) Defensins, (4) Ion Transporters, (5) Bile/Arsenite/Riboflavin Transporters, (6) Cation: Proton Antiporters, and (7) the Glucose/Fructose/Lactose superfamily within the prokaryotic phosphoenol pyruvate-dependent Phosphotransferase System. In addition to defining the phylogenetic relationships of the proteins and families within these seven superfamilies, evidence is provided showing that the SFT programs outperform programs that are based on multiple alignments whenever sequence divergence of superfamily members is extensive. The SFT programs should be applicable to virtually any superfamily of proteins or nucleic acids. PMID:22286036
Soft-tissue anatomy of the extant hominoids: a review and phylogenetic analysis.

PubMed

Gibbs, S; Collard, M; Wood, B

2002-01-01

This paper reports the results of a literature search for information about the soft-tissue anatomy of the extant non-human hominoid genera, Pan, Gorilla, Pongo and Hylobates, together with the results of a phylogenetic analysis of these data plus comparable data for Homo. Information on the four extant non-human hominoid genera was located for 240 out of the 1783 soft-tissue structures listed in the Nomina Anatomica. Numerically these data are biased so that information about some systems (e.g. muscles) and some regions (e.g. the forelimb) are over-represented, whereas other systems and regions (e.g. the veins and the lymphatics of the vascular system, the head region) are either under-represented or not represented at all. Screening to ensure that the data were suitable for use in a phylogenetic analysis reduced the number of eligible soft-tissue structures to 171. These data, together with comparable data for modern humans, were converted into discontinuous character states suitable for phylogenetic analysis and then used to construct a taxon-by-character matrix. This matrix was used in two tests of the hypothesis that soft-tissue characters can be relied upon to reconstruct hominoid phylogenetic relationships. In the first, parsimony analysis was used to identify cladograms requiring the smallest number of character state changes. In the second, the phylogenetic bootstrap was used to determine the confidence intervals of the most parsimonious clades. The parsimony analysis yielded a single most parsimonious cladogram that matched the molecular cladogram. Similarly the bootstrap analysis yielded clades that were compatible with the molecular cladogram; a (Homo, Pan) clade was supported by 95% of the replicates, and a (Gorilla, Pan, Homo) clade by 96%. These are the first hominoid morphological data to provide statistically significant support for the clades favoured by the molecular evidence.
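The bootstrap step mentioned in this abstract resamples the characters of the taxon-by-character matrix with replacement. The Python sketch below shows only that resampling step. The character states are invented for illustration, `bootstrap_matrix` is a hypothetical name, and the parsimony search that would be run on each replicate to obtain clade support is not shown.

```python
import random


def bootstrap_matrix(matrix, rng):
    """Resample the characters (columns) of a taxon-by-character matrix with replacement."""
    n_chars = len(next(iter(matrix.values())))
    # Draw as many column indices as there are characters, allowing repeats.
    columns = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {taxon: [states[c] for c in columns] for taxon, states in matrix.items()}


# Made-up character states for illustration only.
matrix = {
    "Homo": [1, 0, 1, 1],
    "Pan": [1, 0, 1, 0],
    "Gorilla": [0, 0, 1, 0],
    "Pongo": [0, 1, 0, 0],
}
rng = random.Random(0)  # fixed seed so the replicate is reproducible
replicate = bootstrap_matrix(matrix, rng)
```

Because the same column indices are applied to every taxon, each column of the replicate is an intact column of the original matrix. Support for a clade is then the fraction of replicates whose most parsimonious trees contain it.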
Molecular identification of poisonous mushrooms using nuclear ITS region and peptide toxins: a retrospective study on fatal cases in Thailand.

PubMed

Parnmen, Sittiporn; Sikaphan, Sujitra; Leudang, Siriwan; Boonpratuang, Thitiya; Rangsiruji, Achariya; Naksuwankul, Khwanruan

2016-02-01

Cases of mushroom poisoning in Thailand have increased annually. During 2008 to 2014, the cases reported to the National Institute of Health included 57 deaths; at least 15 died after ingestion of amanitas, the most common lethal wild mushrooms inhabited. Hence, the aims of this study were to identify mushroom samples from nine clinically reported cases during the 7-year study period based on nuclear ITS sequence data and diagnose lethal peptide toxins using a reversed phase LC-MS method. Nucleotide similarity was identified using BLAST search of the NCBI database and the Barcode of Life Database (BOLD). Clade characterization was performed by maximum likelihood and Bayesian phylogenetic approaches. Based on BLAST and BOLD reference databases our results yielded high nucleotide similarities of poisonous mushroom samples to A. exitialis and A. fuliginea. Detailed phylogenetic analyses showed that all mushroom samples fall into their current classification. Detection of the peptide toxins revealed the presence of amatoxins and phallotoxins in A. exitialis and A. fuliginea. In addition, toxic α-amanitin was identified in a new provisional species, Amanita sp.1, with the highest toxin quantity. Molecular identification confirmed that the mushrooms ingested by the patients were members of the lethal amanitas in the sections Amanita and Phalloideae. In Thailand, the presence of A. exitialis was reported here for the first time and all three poisonous mushroom species provided new and informative data for clinical studies.
cDNA identification, comparison and phylogenetic aspects of lombricine kinase from two oligochaete species.

PubMed

Doumen, Chris

2010-06-01

Creatine kinase and arginine kinase are the typical representatives of an eight-member phosphagen kinase family, which play important roles in the cellular energy metabolism of animals. The phylum Annelida underwent a series of evolutionary processes that resulted in rapid divergence and radiation of these enzymes, producing the greatest diversity of the phosphagen kinases within this phylum. Lombricine kinase (EC 2.7.3.5) is one of such enzymes and sequence information is rather limited compared to other phosphagen kinases. This study presents data on the cDNA sequences of lombricine kinase from two oligochaete species, the California blackworm (Lumbriculus variegatus) and the sludge worm (Tubifex tubifex). The deduced amino acid sequences are analyzed and compared with other selected phosphagen kinases, including two additional lombricine kinase sequences extracted from DNA databases and provide further insights in the evolution and position of these enzymes within the phosphagen kinase family. The data confirms the presence of a deleted region within the flexible loop (the GS region) of all six examined lombricine kinases. A phylogenetic analysis of these six lombricine kinases clearly positions the enzymes together in a small subcluster within the larger creatine kinase (EC 2.7.3.2) clade. 2010. Published by Elsevier Inc.
Description of new genera and species of marine cyanobacteria from the Portuguese Atlantic coast.

PubMed

Brito, Ângela; Ramos, Vitor; Mota, Rita; Lima, Steeve; Santos, Arlete; Vieira, Jorge; Vieira, Cristina P; Kaštovský, Jan; Vasconcelos, Vitor M; Tamagnini, Paula

2017-06-01

Aiming at increasing the knowledge on marine cyanobacteria from temperate regions, we previously isolated and characterized 60 strains from the Portuguese foreshore and evaluated their potential to produce secondary metabolites. About 15% of the obtained 16S rRNA gene sequences showed less than 97% similarity to sequences in the databases revealing novel biodiversity. Herein, seven of these strains were extensively characterized and their classification was re-evaluated. The present study led to the proposal of five new taxa, three genera (Geminobacterium, Lusitaniella, and Calenema) and two species (Hyella patelloides and Jaaginema litorale). Geminobacterium atlanticum LEGE 07459 is a chroococcalean that shares morphological characteristics with other unicellular cyanobacterial genera but has a distinct phylogenetic position and particular ultrastructural features. The description of the Pleurocapsales Hyella patelloides LEGE 07179 includes novel molecular data for members of this genus. The filamentous isolates of Lusitaniella coriacea - LEGE 07167, 07157 and 06111 - constitute a very distinct lineage, and seem to be ubiquitous on the Portuguese coast. Jaaginema litorale LEGE 07176 has distinct characteristics compared to its marine counterparts, and our analysis indicates that this genus is polyphyletic. The Synechococcales Calenema singularis possesses wider trichomes than Leptolyngbya, and its phylogenetic position reinforces the establishment of this new genus. Copyright © 2017 Elsevier Inc. All rights reserved.
The Trichoptera barcode initiative: a strategy for generating a species-level Tree of Life.

PubMed

Zhou, Xin; Frandsen, Paul B; Holzenthal, Ralph W; Beet, Clare R; Bennett, Kristi R; Blahnik, Roger J; Bonada, Núria; Cartwright, David; Chuluunbat, Suvdtsetseg; Cocks, Graeme V; Collins, Gemma E; deWaard, Jeremy; Dean, John; Flint, Oliver S; Hausmann, Axel; Hendrich, Lars; Hess, Monika; Hogg, Ian D; Kondratieff, Boris C; Malicky, Hans; Milton, Megan A; Morinière, Jérôme; Morse, John C; Mwangi, François Ngera; Pauls, Steffen U; Gonzalez, María Razo; Rinne, Aki; Robinson, Jason L; Salokannel, Juha; Shackleton, Michael; Smith, Brian; Stamatakis, Alexandros; StClair, Ros; Thomas, Jessica A; Zamora-Muñoz, Carmen; Ziesmann, Tanja; Kjer, Karl M

2016-09-05

DNA barcoding was intended as a means to provide species-level identifications through associating DNA sequences from unknown specimens to those from curated reference specimens. Although barcodes were not designed for phylogenetics, they can be beneficial to the completion of the Tree of Life. The barcode database for Trichoptera is relatively comprehensive, with data from every family, approximately two-thirds of the genera, and one-third of the described species. Most Trichoptera, as with most of life's species, have never been subjected to any formal phylogenetic analysis. Here, we present a phylogeny with over 16 000 unique haplotypes as a working hypothesis that can be updated as our estimates improve. We suggest a strategy of implementing constrained tree searches, which allow larger datasets to dictate the backbone phylogeny, while the barcode data fill out the tips of the tree. We also discuss how this phylogeny could be used to focus taxonomic attention on ambiguous species boundaries and hidden biodiversity. We suggest that systematists continue to differentiate between 'Barcode Index Numbers' (BINs) and 'species' that have been formally described. Each has utility, but they are not synonyms. We highlight examples of integrative taxonomy, using both barcodes and morphology for species description. This article is part of the themed issue 'From DNA barcodes to biomes'. © 2016 The Authors.
Venom Gland Transcriptomic and Proteomic Analyses of the Enigmatic Scorpion Superstitionia donensis (Scorpiones: Superstitioniidae), with Insights on the Evolution of Its Venom Components.

PubMed

Santibáñez-López, Carlos E; Cid-Uribe, Jimena I; Batista, Cesar V F; Ortiz, Ernesto; Possani, Lourival D

2016-12-09

Venom gland transcriptomic and proteomic analyses have improved our knowledge on the diversity of the heterogeneous components present in scorpion venoms. However, most of these studies have focused on species from the family Buthidae. To gain insights into the molecular diversity of the venom components of scorpions belonging to the family Superstitioniidae, one of the neglected scorpion families, we performed transcriptomic and proteomic analyses for the species Superstitionia donensis. The total mRNA extracted from the venom glands of two specimens was subjected to massive sequencing by the Illumina protocol, and a total of 219,073 transcripts were generated. We annotated 135 transcripts putatively coding for peptides with identity to known venom components available from different protein databases. Fresh venom collected by electrostimulation was analyzed by LC-MS/MS allowing the identification of 26 distinct components with sequences matching counterparts from the transcriptomic analysis. In addition, the phylogenetic affinities of the found putative calcins, scorpines, La1-like peptides and potassium channel κ toxins were analyzed. The first three components are often reported as ubiquitous in the venom of different families of scorpions. Our results suggest that at least calcins and scorpines could be used as molecular markers in phylogenetic studies of scorpion venoms.
Venom Gland Transcriptomic and Proteomic Analyses of the Enigmatic Scorpion Superstitionia donensis (Scorpiones: Superstitioniidae), with Insights on the Evolution of Its Venom Components

PubMed Central

Santibáñez-López, Carlos E.; Cid-Uribe, Jimena I.; Batista, Cesar V. F.; Ortiz, Ernesto; Possani, Lourival D.

2016-01-01

Venom gland transcriptomic and proteomic analyses have improved our knowledge on the diversity of the heterogeneous components present in scorpion venoms. However, most of these studies have focused on species from the family Buthidae. To gain insights into the molecular diversity of the venom components of scorpions belonging to the family Superstitioniidae, one of the neglected scorpion families, we performed transcriptomic and proteomic analyses for the species Superstitionia donensis. The total mRNA extracted from the venom glands of two specimens was subjected to massive sequencing by the Illumina protocol, and a total of 219,073 transcripts were generated. We annotated 135 transcripts putatively coding for peptides with identity to known venom components available from different protein databases. Fresh venom collected by electrostimulation was analyzed by LC-MS/MS allowing the identification of 26 distinct components with sequences matching counterparts from the transcriptomic analysis. In addition, the phylogenetic affinities of the found putative calcins, scorpines, La1-like peptides and potassium channel κ toxins were analyzed. The first three components are often reported as ubiquitous in the venom of different families of scorpions. Our results suggest that at least calcins and scorpines could be used as molecular markers in phylogenetic studies of scorpion venoms. PMID:27941686
Comparison of cluster-based and source-attribution methods for estimating transmission risk using large HIV sequence databases.

PubMed

Le Vu, Stéphane; Ratmann, Oliver; Delpech, Valerie; Brown, Alison E; Gill, O Noel; Tostevin, Anna; Fraser, Christophe; Volz, Erik M

2018-06-01

Phylogenetic clustering of HIV sequences from a random sample of patients can reveal epidemiological transmission patterns, but interpretation is hampered by limited theoretical support and statistical properties of clustering analysis remain poorly understood. Alternatively, source attribution methods allow fitting of HIV transmission models and thereby quantify aspects of disease transmission. A simulation study was conducted to assess error rates of clustering methods for detecting transmission risk factors. We modeled HIV epidemics among men having sex with men and generated phylogenies comparable to those that can be obtained from HIV surveillance data in the UK. Clustering and source attribution approaches were applied to evaluate their ability to identify patient attributes as transmission risk factors. We find that commonly used methods show a misleading association between cluster size or odds of clustering and covariates that are correlated with time since infection, regardless of their influence on transmission. Clustering methods usually have higher error rates and lower sensitivity than the source attribution method for identifying transmission risk factors. But neither method provides robust estimates of transmission risk ratios. The source attribution method can alleviate drawbacks from phylogenetic clustering but formal population genetic modeling may be required to estimate quantitative transmission risk factors. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Tracking the Origin and Deciphering the Phylogenetic Relationship of Porcine Epidemic Diarrhea Virus in Ecuador

PubMed Central

Barrera, Maritza; Garrido-Haro, Ana; Vaca, María S.; Granda, Danilo; Acosta-Batallas, Alfredo

2017-01-01

In 2010, new Chinese strains of porcine epidemic diarrhea virus (PEDV), clinically more severe than the classical strains, emerged. These strains were spread to United States in 2013 through an intercontinental transmission from China with further spreading across the world, evidencing the emergent nature of these strains. In the present study, an analysis of PEDV field sequences from Ecuador was conducted by comparing all the PEDV S gene sequences available in the GenBank database. Phylogenetic comparisons and Bayesian phylogeographic inference based on complete S gene sequences were also conducted to track the origin and putative route of PEDV. The sequence from the PED-outbreak in Ecuador was grouped into the clade II of PEDV genogroup 2a together with other sequences of isolates from Mexico, Canada, and United States. The phylogeographic study revealed the emergence of the Chinese PEDV strains, followed by spreading to US in 2013, from US to Korea, and later the introduction of PEDV to Canada, Mexico, and Ecuador directly from the US. The sources of imports of live swine in Ecuador in 2014 were mainly from Chile and US. Thus, this movement of pigs is suggested as the main way for introducing PEDV to Ecuador. PMID:29379796
A taxonomic and phylogenetic re-appraisal of the genus Curvularia

USDA-ARS's Scientific Manuscript database

Species of Curvularia are important plant and human pathogens worldwide. In this study, the genus Curvularia is re-assessed based on molecular phylogenetic analysis and morphological observations of available isolates and specimens. A multi-gene phylogenetic tree inferred from ITS, TEF and GPDH gene...
Proteinortho: detection of (co-)orthologs in large-scale analysis.

PubMed

Lechner, Marcus; Findeiss, Sven; Steiner, Lydia; Marz, Manja; Stadler, Peter F; Prohaska, Sonja J

2011-04-28

Orthology analysis is an important part of data analysis in many areas of bioinformatics such as comparative genomics and molecular phylogenetics. The ever-increasing flood of sequence data, and hence the rapidly increasing number of genomes that can be compared simultaneously, calls for efficient software tools as brute-force approaches with quadratic memory requirements become infeasible in practise. The rapid pace at which new data become available, furthermore, makes it desirable to compute genome-wide orthology relations for a given dataset rather than relying on relations listed in databases. The program Proteinortho described here is a stand-alone tool that is geared towards large datasets and makes use of distributed computing techniques when run on multi-core hardware. It implements an extended version of the reciprocal best alignment heuristic. We apply Proteinortho to compute orthologous proteins in the complete set of all 717 eubacterial genomes available at NCBI at the beginning of 2009. We identified thirty proteins present in 99% of all bacterial proteomes. Proteinortho significantly reduces the required amount of memory for orthology analysis compared to existing tools, allowing such computations to be performed on off-the-shelf hardware.
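The reciprocal best alignment heuristic named in this abstract can be sketched in a few lines of Python. This is the plain form of the heuristic, not Proteinortho's extended version. The function names, protein identifiers and scores are hypothetical, and the sketch assumes that pairwise alignment scores between two proteomes have already been computed.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict mapping (query, subject) to an alignment score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical scores between proteins of genome A and genome B.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0, ("a2", "b2"): 180.0, ("a3", "b1"): 120.0}
b_vs_a = {("b1", "a1"): 245.0, ("b1", "a3"): 118.0, ("b2", "a2"): 175.0, ("b2", "a1"): 88.0}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy input `a3` finds `b1` as its best hit, but `b1` prefers `a1`, so `a3` is left without a putative ortholog. Handling such near-best cases is one area where an extended heuristic could differ from this strict form.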
Design and implementation of a database for Brucella melitensis genome annotation.

PubMed

De Hertogh, Benoît; Lahlimi, Leïla; Lambert, Christophe; Letesson, Jean-Jacques; Depiereux, Eric

2008-03-18

The genome sequences of three Brucella biovars and of some species close to Brucella sp. have become available, leading to new relationship analysis. Moreover, the automatic genome annotation of the pathogenic bacteria Brucella melitensis has been manually corrected by a consortium of experts, leading to 899 modifications of start sites predictions among the 3198 open reading frames (ORFs) examined. This new annotation, coupled with the results of automatic annotation tools of the complete genome sequences of the B. melitensis genome (including BLASTs to 9 genomes close to Brucella), provides numerous data sets related to predicted functions, biochemical properties and phylogenic comparisons. To make these results available, alphaPAGe, a functional auto-updatable database of the corrected sequence genome of B. melitensis, has been built, using the entity-relationship (ER) approach and a multi-purpose database structure. A friendly graphical user interface has been designed, and users can retrieve different kinds of information through three levels of queries: (1) the basic search uses the classical keywords or sequence identifiers; (2) the original advanced search engine allows combining (by using logical operators) numerous criteria: (a) keywords (textual comparison) related to the pCDS's function, family domains and cellular localization; (b) physico-chemical characteristics (numerical comparison) such as isoelectric point or molecular weight and structural criteria such as the nucleic length or the number of transmembrane helix (TMH); (c) similarity scores with Escherichia coli and 10 species phylogenetically close to B. melitensis; (3) complex queries can be performed by using a SQL field, which allows all queries respecting the database's structure. The database is publicly available through a Web server at the following url: http://www.fundp.ac.be/urbm/bioinfo/aPAGe.
Automated subtyping of HIV-1 genetic sequences for clinical and surveillance purposes: performance evaluation of the new REGA version 3 and seven other tools.

PubMed

Pineda-Peña, Andrea-Clemencia; Faria, Nuno Rodrigues; Imbrechts, Stijn; Libin, Pieter; Abecasis, Ana Barroso; Deforche, Koen; Gómez-López, Arley; Camacho, Ricardo J; de Oliveira, Tulio; Vandamme, Anne-Mieke

2013-10-01

To investigate differences in pathogenesis, diagnosis and resistance pathways between HIV-1 subtypes, an accurate subtyping tool for large datasets is needed. We aimed to evaluate the performance of automated subtyping tools to classify the different subtypes and circulating recombinant forms using pol, the most sequenced region in clinical practice. We also present the upgraded version 3 of the Rega HIV subtyping tool (REGAv3). HIV-1 pol sequences (PR+RT) for 4674 patients retrieved from the Portuguese HIV Drug Resistance Database, and 1872 pol sequences trimmed from full-length genomes retrieved from the Los Alamos database were classified with statistical-based tools such as COMET, jpHMM and STAR; similarity-based tools such as NCBI and Stanford; and phylogenetic-based tools such as REGA version 2 (REGAv2), REGAv3, and SCUEAL. The performance of these tools, for pol, and for PR and RT separately, was compared in terms of reproducibility, sensitivity and specificity with respect to the gold standard which was manual phylogenetic analysis of the pol region. The sensitivity and specificity for subtypes B and C was more than 96% for seven tools, but was variable for other subtypes such as A, D, F and G. With regard to the most common circulating recombinant forms (CRFs), the sensitivity and specificity for CRF01_AE was ~99% with statistical-based tools, with phylogenetic-based tools and with Stanford, one of the similarity based tools. CRF02_AG was correctly identified for more than 96% by COMET, REGAv3, Stanford and STAR. All the tools reached a specificity of more than 97% for most of the subtypes and the two main CRFs (CRF01_AE and CRF02_AG). Other CRFs were identified only by COMET, REGAv2, REGAv3, and SCUEAL and with variable sensitivity. When analyzing sequences for PR and RT separately, the performance for PR was generally lower and variable between the tools. 
Similarity and statistical-based tools were 100% reproducible, but this was lower for phylogenetic-based tools such as REGA (~99%) and SCUEAL (~96%). REGAv3 had an improved performance for subtype B and CRF02_AG compared to REGAv2 and is now able to also identify all epidemiologically relevant CRFs. In general the best performing tools, in alphabetical order, were COMET, jpHMM, REGAv3, and SCUEAL when analyzing pure subtypes in the pol region, and COMET and REGAv3 when analyzing most of the CRFs. Based on this study, we recommend to confirm subtyping with 2 well performing tools, and be cautious with the interpretation of short sequences. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.
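Sensitivity and specificity for a single subtype, as used in this evaluation, can be computed by comparing a tool's calls against the gold-standard assignments one sequence at a time. The Python sketch below is only an illustration: the function name and the toy label lists are hypothetical and do not reproduce any figure from the study.

```python
def sensitivity_specificity(gold, predicted, subtype):
    """Return (sensitivity, specificity) of the calls for one subtype."""
    tp = fn = fp = tn = 0
    for g, p in zip(gold, predicted):
        if g == subtype and p == subtype:
            tp += 1  # correctly called as the subtype
        elif g == subtype:
            fn += 1  # subtype missed by the tool
        elif p == subtype:
            fp += 1  # another subtype wrongly called as this one
        else:
            tn += 1  # correctly not called as the subtype
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical gold-standard and tool assignments for six sequences.
gold = ["B", "B", "C", "B", "G", "C"]
predicted = ["B", "B", "C", "C", "G", "C"]
sens_b, spec_b = sensitivity_specificity(gold, predicted, "B")
sens_c, spec_c = sensitivity_specificity(gold, predicted, "C")
```

A single miscall affects two subtypes at once: here the B sequence called as C lowers the sensitivity for B and the specificity for C.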
Phylogenetic analysis of different breeds of domestic chickens in selected area of Peninsular Malaysia inferred from partial cytochrome b gene information and RAPD markers.

PubMed

Yap, Fook Choy; Yan, Yap Jin; Loon, Kiung Teh; Zhen, Justina Lee Ning; Kamau, Nelly Warau; Kumaran, Jayaraj Vijaya

2010-10-01

The present investigation was carried out in an attempt to study the phylogenetic analysis of different breeds of domestic chickens in Peninsular Malaysia inferred from partial cytochrome b gene information and random amplified polymorphic DNA (RAPD) markers. Phylogenetic analysis using both neighbor-joining (NJ) and maximum parsimony (MP) methods produced three clusters that encompassed Type-I village chickens, the red jungle fowl subspecies and the Japanese Chunky broilers. The phylogenetic analysis also revealed that majority of the Malaysian commercial chickens were randomly assembled with the Type-II village chickens. In RAPD assay, phylogenetic analysis using neighbor-joining produced six clusters that were completely distinguished based on the locality of chickens. High levels of genetic variations were observed among the village chickens, the commercial broilers, and between the commercial broilers and layer chickens. In this study, it was found that Type-I village chickens could be distinguished from the commercial chickens and Type-II village chickens at the position of the 27th nucleotide of the 351 bp cytochrome b gene. This study also revealed that RAPD markers were unable to differentiate the type of chickens, but it showed the effectiveness of RAPD in evaluating the genetic variation and the genetic relationships between chicken lines and populations.
Phylogenetic analysis of human immunodeficiency virus type 2 isolated from Cuban individuals.

PubMed

Machado, Liuber Y; Díaz, Héctor M; Noa, Enrique; Martín, Dayamí; Blanco, Madeline; Díaz, Dervel F; Sánchez, Yordank R; Nibot, Carmen; Sánchez, Lourdes; Dubed, Marta

2014-08-01

The presence of infection by human immunodeficiency virus type 2 (HIV-2) in Cuba has been previously documented. However, genetic information on the strains that circulate in the Cuban people is still unknown. The present work constitutes the first study concerning the phylogenetic relationship of HIV-2 Cuban isolates conducted on 13 Cuban patients who were diagnosed with HIV-2. The env sequences were analyzed for the construction of a phylogenetic tree with reference sequences of HIV-2. Phylogenetic analysis of the env gene showed that all the Cuban sequences clustered in group A of HIV-2. The analysis indicated several independent introductions of HIV-2 into Cuba. The results of the study will reinforce the program on the epidemiological surveillance of the infection in Cuba and make possible further molecular evolutionary studies.
A sequence database allowing automated genotyping of Classical swine fever virus isolates.

PubMed

Dreier, Sabrina; Zimmermann, Bernd; Moennig, Volker; Greiser-Wilke, Irene

2007-03-01

Classical swine fever (CSF) is a highly contagious viral disease of pigs. According to the OIE classification of diseases it is classified as a notifiable (previously List A) disease, thus having the potential for causing severe socio-economic problems and affecting severely the international trade of pigs and pig products. Effective control measures are compulsory, and to expose weaknesses a reliable tracing of the spread of the virus is necessary. Genetic typing has proved to be the method of choice. However, genotyping involves the use of multiple software applications, which is laborious and complex. The implementation of a sequence database, which is accessible by the World Wide Web with the option to type automatically new CSF virus isolates once the sequence is available is described. The sequence to be typed is tested for correct orientation and, if necessary, adjusted to the right length. The alignment and the neighbor-joining phylogenetic analysis with a standard set of sequences can then be calculated. The results are displayed as a graph. As an example, the determination is shown of the genetic subgroup of the isolate obtained from the outbreaks registered in Russia, in 2005. After registration (Irene.greiser-wilke@tiho-hannover.de) the database including the module for genotyping is accessible under http://viro08.tiho-hannover.de/eg/eurl_virus_db.htm.
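The abstract does not say how the orientation test is performed. One possible approach, shown in the Python sketch below purely as an assumption, is to compare the k-mers shared with a reference for the query and for its reverse complement, and to keep whichever orientation shares more. All names and sequences here are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return the set of all substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient_like(query, reference, k=4):
    """Return the query in the orientation that shares more k-mers with the reference."""
    ref_kmers = kmers(reference.upper(), k)
    forward = query.upper()
    reverse = reverse_complement(forward)
    fwd_shared = len(kmers(forward, k) & ref_kmers)
    rev_shared = len(kmers(reverse, k) & ref_kmers)
    return forward if fwd_shared >= rev_shared else reverse


# Hypothetical reference and a query submitted in the opposite orientation.
reference = "ATGGCGTACGTTAGCCTGAA"
query = reverse_complement("GCGTACGTTAGCC")
oriented = orient_like(query, reference)
print(oriented)
```

A production pipeline would more likely rely on an alignment score than on k-mer counts, and the small k used here only suits toy sequences.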

Estimating Bacterial Diversity for Ecological Studies: Methods, Metrics, and Assumptions

PubMed Central

Birtel, Julia; Walser, Jean-Claude; Pichon, Samuel; Bürgmann, Helmut; Matthews, Blake

2015-01-01

Methods to estimate microbial diversity have developed rapidly in an effort to understand the distribution and diversity of microorganisms in natural environments. For bacterial communities, the 16S rRNA gene is the phylogenetic marker gene of choice, but most studies select only a specific region of the 16S rRNA to estimate bacterial diversity. Whereas biases derived from DNA extraction, primer choice and PCR amplification are well documented, we here address how the choice of variable region can influence a wide range of standard ecological metrics, such as species richness, phylogenetic diversity, β-diversity and rank-abundance distributions. We have used Illumina paired-end sequencing to estimate the bacterial diversity of 20 natural lakes across Switzerland derived from three trimmed variable 16S rRNA regions (V3, V4, V5). Species richness, phylogenetic diversity, community composition, β-diversity, and rank-abundance distributions differed significantly between 16S rRNA regions. Overall, patterns of diversity quantified by the V3 and V5 regions were more similar to one another than those assessed by the V4 region. Similar results were obtained when analyzing the datasets with different sequence similarity thresholds used during sequences clustering and when the same analysis was used on a reference dataset of sequences from the Greengenes database. In addition we also measured species richness from the same lake samples using ARISA Fingerprinting, but did not find a strong relationship between species richness estimated by Illumina and ARISA. We conclude that the selection of 16S rRNA region significantly influences the estimation of bacterial diversity and species distributions and that caution is warranted when comparing data from different variable regions as well as when using different sequencing techniques. PMID:25915756
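Two of the metrics named in this abstract, species richness and β-diversity, are easy to compute from a table of taxon counts. The Python sketch below uses Bray-Curtis dissimilarity as one common β-diversity measure. The abstract does not state which measure the authors used, and the taxon names and counts are invented.

```python
def richness(counts):
    """Number of taxa observed at least once in a sample."""
    return sum(1 for n in counts.values() if n > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples given as taxon -> count dicts."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


# Hypothetical counts for the same lake profiled with two 16S rRNA regions.
lake_v3 = {"otu1": 10, "otu2": 5, "otu3": 0}
lake_v4 = {"otu1": 6, "otu3": 4, "otu4": 5}
dissimilarity = bray_curtis(lake_v3, lake_v4)
print(richness(lake_v3), richness(lake_v4), dissimilarity)
```

A value of 0 means identical composition and 1 means no shared taxa, so a large dissimilarity between profiles of the same sample would reflect the marker region rather than the community.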
GPSit: An automated method for evolutionary analysis of nonculturable ciliated microeukaryotes.

PubMed

Chen, Xiao; Wang, Yurui; Sheng, Yalan; Warren, Alan; Gao, Shan

2018-05-01

Microeukaryotes are among the most important components of the microbial food web in almost all aquatic and terrestrial ecosystems worldwide. In order to gain a better understanding of their roles and functions in ecosystems, sequencing coupled with phylogenomic analyses of entire genomes or transcriptomes is increasingly used to reconstruct the evolutionary history and classification of these microeukaryotes and thus provide a more robust framework for determining their systematics and diversity. More importantly, phylogenomic research usually requires high levels of hands-on bioinformatics experience. Here, we propose an efficient automated method, "Guided Phylogenomic Search in trees" (GPSit), which starts from predicted protein sequences of newly sequenced species and a well-defined customized orthologous database. Compared with previous protocols, our method streamlines the entire workflow by integrating all essential and other optional operations. In so doing, the manual operation time for reconstructing phylogenetic relationships is reduced from days to several hours, compared to other methods. Furthermore, GPSit supports user-defined parameters in most steps and thus allows users to adapt it to their studies. The effectiveness of GPSit is demonstrated by incorporating available online data and new single-cell data of three nonculturable marine ciliates (Anteholosticha monilata, Deviata sp. and Diophrys scutum) under moderate sequencing coverage (~5×). Our results indicate that the former could reconstruct robust "deep" phylogenetic relationships while the latter reveals the presence of intermediate taxa in shallow relationships. Based on empirical phylogenomic data, we also used GPSit to evaluate the impact of different levels of missing data on two commonly used methods of phylogenetic analyses, maximum likelihood (ML) and Bayesian inference (BI) methods. We found that BI is less sensitive to missing data when fast-evolving sites are removed. 
© 2018 John Wiley & Sons Ltd.
Phylogenetic and experimental characterization of an acyl-ACP thioesterase family reveals significant diversity in enzymatic specificity and activity.

PubMed

Jing, Fuyuan; Cantu, David C; Tvaruzkova, Jarmila; Chipman, Jay P; Nikolau, Basil J; Yandeau-Nelson, Marna D; Reilly, Peter J

2011-08-10

Acyl-acyl carrier protein thioesterases (acyl-ACP TEs) catalyze the hydrolysis of the thioester bond that links the acyl chain to the sulfhydryl group of the phosphopantetheine prosthetic group of ACP. This reaction terminates acyl chain elongation of fatty acid biosynthesis, and in plant seeds it is the biochemical determinant of the fatty acid compositions of storage lipids. To explore acyl-ACP TE diversity and to identify novel acyl-ACP TEs, 31 acyl-ACP TEs from wide-ranging phylogenetic sources were characterized to ascertain their in vivo activities and substrate specificities. These acyl-ACP TEs were chosen by two different approaches: 1) 24 TEs were selected from public databases on the basis of phylogenetic analysis and fatty acid profile knowledge of their source organisms; and 2) seven TEs were molecularly cloned from oil palm (Elaeis guineensis), coconut (Cocos nucifera) and Cuphea viscosissima, organisms that produce medium-chain and short-chain fatty acids in their seeds. The in vivo substrate specificities of the acyl-ACP TEs were determined in E. coli. Based on their specificities, these enzymes were clustered into three classes: 1) Class I acyl-ACP TEs act primarily on 14- and 16-carbon acyl-ACP substrates; 2) Class II acyl-ACP TEs have broad substrate specificities, with major activities toward 8- and 14-carbon acyl-ACP substrates; and 3) Class III acyl-ACP TEs act predominantly on 8-carbon acyl-ACPs. Several novel acyl-ACP TEs act on short-chain and unsaturated acyl-ACP or 3-ketoacyl-ACP substrates, indicating the diversity of enzymatic specificity in this enzyme family. These acyl-ACP TEs can potentially be used to diversify the fatty acid biosynthesis pathway to produce novel fatty acids.
Phylogenetic and experimental characterization of an acyl-ACP thioesterase family reveals significant diversity in enzymatic specificity and activity

PubMed Central

2011-01-01

Background: Acyl-acyl carrier protein thioesterases (acyl-ACP TEs) catalyze the hydrolysis of the thioester bond that links the acyl chain to the sulfhydryl group of the phosphopantetheine prosthetic group of ACP. This reaction terminates acyl chain elongation of fatty acid biosynthesis, and in plant seeds it is the biochemical determinant of the fatty acid compositions of storage lipids. Results: To explore acyl-ACP TE diversity and to identify novel acyl-ACP TEs, 31 acyl-ACP TEs from wide-ranging phylogenetic sources were characterized to ascertain their in vivo activities and substrate specificities. These acyl-ACP TEs were chosen by two different approaches: 1) 24 TEs were selected from public databases on the basis of phylogenetic analysis and fatty acid profile knowledge of their source organisms; and 2) seven TEs were molecularly cloned from oil palm (Elaeis guineensis), coconut (Cocos nucifera) and Cuphea viscosissima, organisms that produce medium-chain and short-chain fatty acids in their seeds. The in vivo substrate specificities of the acyl-ACP TEs were determined in E. coli. Based on their specificities, these enzymes were clustered into three classes: 1) Class I acyl-ACP TEs act primarily on 14- and 16-carbon acyl-ACP substrates; 2) Class II acyl-ACP TEs have broad substrate specificities, with major activities toward 8- and 14-carbon acyl-ACP substrates; and 3) Class III acyl-ACP TEs act predominantly on 8-carbon acyl-ACPs. Several novel acyl-ACP TEs act on short-chain and unsaturated acyl-ACP or 3-ketoacyl-ACP substrates, indicating the diversity of enzymatic specificity in this enzyme family. Conclusion: These acyl-ACP TEs can potentially be used to diversify the fatty acid biosynthesis pathway to produce novel fatty acids. PMID:21831316
A Gateway for Phylogenetic Analysis Powered by Grid Computing Featuring GARLI 2.0

PubMed Central

Bazinet, Adam L.; Zwickl, Derrick J.; Cummings, Michael P.

2014-01-01

We introduce molecularevolution.org, a publicly available gateway for high-throughput, maximum-likelihood phylogenetic analysis powered by grid computing. The gateway features a garli 2.0 web service that enables a user to quickly and easily submit thousands of maximum likelihood tree searches or bootstrap searches that are executed in parallel on distributed computing resources. The garli web service allows one to easily specify partitioned substitution models using a graphical interface, and it performs sophisticated post-processing of phylogenetic results. Although the garli web service has been used by the research community for over three years, here we formally announce the availability of the service, describe its capabilities, highlight new features and recent improvements, and provide details about how the grid system efficiently delivers high-quality phylogenetic results. [garli, gateway, grid computing, maximum likelihood, molecular evolution portal, phylogenetics, web service.] PMID:24789072
Phylogenetic inertia and Darwin's higher law.

PubMed

Shanahan, Timothy

2011-03-01

The concept of 'phylogenetic inertia' is routinely deployed in evolutionary biology as an alternative to natural selection for explaining the persistence of characteristics that appear sub-optimal from an adaptationist perspective. However, in many of these contexts the precise meaning of 'phylogenetic inertia' and its relationship to selection are far from clear. After tracing the history of the concept of 'inertia' in evolutionary biology, I argue that treating phylogenetic inertia and natural selection as alternative explanations is mistaken because phylogenetic inertia is, from a Darwinian point of view, simply an expected effect of selection. Although Darwin did not discuss 'phylogenetic inertia,' he did assert the explanatory priority of selection over descent. An analysis of 'phylogenetic inertia' provides a perspective from which to assess Darwin's view. Copyright © 2010 Elsevier Ltd. All rights reserved.
Phylogenomic and MALDI-TOF MS Analysis of Streptococcus sinensis HKU4T Reveals a Distinct Phylogenetic Clade in the Genus Streptococcus

PubMed Central

Tse, Herman; Chen, Jonathan H.K.; Tang, Ying; Lau, Susanna K.P.; Woo, Patrick C.Y.

2014-01-01

Streptococcus sinensis is a recently discovered human pathogen isolated from blood cultures of patients with infective endocarditis. Its phylogenetic position, as well as those of its closely related species, remains inconclusive when single genes are used for phylogenetic analysis. For example, S. sinensis branched out from members of the anginosus, mitis, and sanguinis groups in the 16S ribosomal RNA gene phylogenetic tree, but it was clustered with members of the anginosus and sanguinis groups when groEL gene sequences were used for analysis. In this study, we sequenced the draft genome of S. sinensis and used a polyphasic approach, including concatenated genes, whole genomes, and matrix-assisted laser desorption ionization-time of flight mass spectrometry to analyze the phylogeny of S. sinensis. The size of the S. sinensis draft genome is 2.06 Mb, with GC content of 42.2%. Phylogenetic analysis using 50 concatenated genes or whole genomes revealed that S. sinensis formed a distinct cluster with Streptococcus oligofermentans and Streptococcus cristatus, and these three streptococci were clustered with the “sanguinis group.” As for phylogenetic analysis using hierarchical cluster analysis of the mass spectra of streptococci, S. sinensis also formed a distinct cluster with S. oligofermentans and S. cristatus, but these three streptococci were clustered with the “mitis group.” On the basis of the findings, we propose a novel group, named “sinensis group,” to include S. sinensis, S. oligofermentans, and S. cristatus, in the Streptococcus genus. Our study also illustrates the power of phylogenomic analyses for resolving ambiguities in bacterial taxonomy. PMID:25331233
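The GC content quoted above (42.2%) is the share of G and C bases among all called bases. A minimal Python sketch of that calculation follows; the toy sequence is invented for illustration and is not S. sinensis data, and skipping ambiguous symbols such as N is an assumption of this sketch rather than a detail taken from the study.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage of a nucleotide sequence.

    Symbols other than A, C, G and T (for example N) are ignored,
    which is an assumption of this sketch.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / total


toy_sequence = "ATGCGCATTANNGC"  # hypothetical sequence, not real genome data
print(f"GC content: {gc_content(toy_sequence):.1f}%")
```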
Phylogenomic and MALDI-TOF MS analysis of Streptococcus sinensis HKU4T reveals a distinct phylogenetic clade in the genus Streptococcus.

PubMed

Teng, Jade L L; Huang, Yi; Tse, Herman; Chen, Jonathan H K; Tang, Ying; Lau, Susanna K P; Woo, Patrick C Y

2014-10-20

Streptococcus sinensis is a recently discovered human pathogen isolated from blood cultures of patients with infective endocarditis. Its phylogenetic position, as well as those of its closely related species, remains inconclusive when single genes are used for phylogenetic analysis. For example, S. sinensis branched out from members of the anginosus, mitis, and sanguinis groups in the 16S ribosomal RNA gene phylogenetic tree, but it was clustered with members of the anginosus and sanguinis groups when groEL gene sequences were used for analysis. In this study, we sequenced the draft genome of S. sinensis and used a polyphasic approach, including concatenated genes, whole genomes, and matrix-assisted laser desorption ionization-time of flight mass spectrometry to analyze the phylogeny of S. sinensis. The size of the S. sinensis draft genome is 2.06 Mb, with GC content of 42.2%. Phylogenetic analysis using 50 concatenated genes or whole genomes revealed that S. sinensis formed a distinct cluster with Streptococcus oligofermentans and Streptococcus cristatus, and these three streptococci were clustered with the "sanguinis group." As for phylogenetic analysis using hierarchical cluster analysis of the mass spectra of streptococci, S. sinensis also formed a distinct cluster with S. oligofermentans and S. cristatus, but these three streptococci were clustered with the "mitis group." On the basis of the findings, we propose a novel group, named "sinensis group," to include S. sinensis, S. oligofermentans, and S. cristatus, in the Streptococcus genus. Our study also illustrates the power of phylogenomic analyses for resolving ambiguities in bacterial taxonomy. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Transcriptome Analysis of the Octopus vulgaris Central Nervous System

PubMed Central

Zhang, Xiang; Mao, Yong; Huang, Zixia; Qu, Meng; Chen, Jun; Ding, Shaoxiong; Hong, Jingni; Sun, Tiantian

2012-01-01

Background: Cephalopoda are a class of Mollusca species found in all the world's oceans. They are an important model organism in neurobiology. Unfortunately, the lack of neuronal molecular sequences, such as ESTs, transcriptomic or genomic information, has limited the development of molecular neurobiology research in this unique model organism. Results: With high-throughput Illumina Solexa sequencing technology, we have generated 59,859 high quality sequences from 12,918,391 paired-end reads. Using BLASTx/BLASTn, 12,227 contigs have blast hits in the Swissprot, NR protein database and NT nucleotide database with E-value cutoff 1e−5. The comparison between the Octopus vulgaris central nervous system (CNS) library and the Aplysia californica/Lymnaea stagnalis CNS ESTs library yielded 5.93%/13.45% of O. vulgaris sequences with significant matches (1e−5) using BLASTn/tBLASTx. Meanwhile the hit percentage of the recently published Schistocerca gregaria, Tilapia or Hirudo medicinalis CNS library to the O. vulgaris CNS library is 21.03%–46.19%. We constructed a phylogenetic tree using two genes related to CNS function, Synaptotagmin-7 and Synaptophysin. Lastly, we demonstrated that O. vulgaris may have a vertebrate-like blood-brain barrier based on bioinformatic analysis. Conclusion: This study provides a mass of molecular information that will contribute to further molecular biology research on O. vulgaris. In our presentation of the first CNS transcriptome analysis of O. vulgaris, we hope to accelerate the study of functional molecular neurobiology and comparative evolutionary biology. PMID:22768275
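The abstract above counts sequences with hits below an E-value cutoff of 1e−5 and reports them as percentages. The following Python sketch shows one way such a filter could look, assuming the hits are available in the standard 12-column BLAST tabular layout (`-outfmt 6`), where the E-value is the 11th column. The rows, the contig names and the total of four contigs are invented for illustration; the abstract does not describe the authors' actual scripts.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # threshold quoted in the abstract


def queries_with_hits(tabular_text: str, cutoff: float = E_VALUE_CUTOFF) -> set[str]:
    """Return the query IDs that have at least one hit with E-value <= cutoff.

    Assumes the 12-column BLAST tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if float(row[10]) <= cutoff:
            hits.add(row[0])
    return hits


# Hypothetical rows for illustration only (not data from the study).
example = "\n".join([
    "contig1\tsp|P1\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t180",
    "contig2\tsp|P2\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t25",
    "contig3\tsp|P3\t70.0\t120\t36\t0\t1\t120\t1\t120\t2e-06\t60",
])
total_contigs = 4  # hypothetical: a fourth contig produced no hits at all

significant = queries_with_hits(example)
print(sorted(significant))
print(f"{100 * len(significant) / total_contigs:.1f}% of contigs have a significant hit")
```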
Phylogeny and Haplotype Analysis of Fungi Within the Fusarium incarnatum-equiseti Species Complex.

PubMed

Ramdial, H; Latchoo, R K; Hosein, F N; Rampersad, S N

2017-01-01

Fusarium spp. are ranked among the top 10 most economically and scientifically important plant-pathogenic fungi in the world and are associated with plant diseases that include fruit decay of a number of crops. Fusarium isolates infecting bell pepper in Trinidad were identified based on sequence comparisons of the translation elongation factor gene (EF-1a) with sequences of Fusarium incarnatum-equiseti species complex (FIESC) verified in the FUSARIUM-ID database. Eighty-two isolates were identified as belonging to one of four phylogenetic species within the subclades FIESC-1, FIESC-15, FIESC-16, and FIESC-26, with the majority of isolates belonging to FIESC-15. A comparison of the level of DNA polymorphism and phylogenetic inference for sequences of the internal transcribed spacer region (ITS1-5.8S-ITS2) and EF-1a sequences for Trinidad and FUSARIUM-ID type species was carried out. The ITS sequences were less informative, had lower haplotype diversity and restricted haplotype distribution, and resulted in poor resolution and taxa placement in the consensus maximum-likelihood tree. EF-1a sequences enabled strongly supported phylogenetic inference with highly resolved branching patterns of the 30 phylogenetic species within the FIESC and placement of representative Trinidad isolates. Therefore, global phylogeny was inferred from EF-1a sequences representing 11 countries, and separation into distinct Incarnatum and Equiseti clades was again evident. In total, 42 haplotypes were identified: 12 were shared and the remaining were unique haplotypes. The most diverse haplotype was represented by sequences from China, Indonesia, Malaysia, and Trinidad and consisted exclusively of F. incarnatum isolates. Spain had the highest haplotype diversity, perhaps because both F. equiseti and F. incarnatum sequences were represented; followed by the United States, which contributed both F. equiseti and F. incarnatum sequences to the data set; then by countries representing Southeast Asia (China, Indonesia, Malaysia, Thailand, and Philippines) and Trinidad; both of these regions were represented by only F. incarnatum sequences. Trinidad shared two haplotypes with China and one haplotype with the United States for only F. incarnatum isolates. The findings of this study are important for devising disease management strategies and for understanding the phylogenetic relationships among members of the FIESC.
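The haplotype counts and haplotype diversity mentioned above can be illustrated with a short Python sketch. It uses Nei's estimator, Hd = n/(n−1) · (1 − Σ p_i²), where p_i is the frequency of haplotype i among n sequences. Two points are assumptions of this sketch: the abstract does not say which estimator or software the authors used, and "shared" is read here as a haplotype carried by more than one sequence (the abstract does not define the term). The aligned toy sequences are invented.

```python
from collections import Counter


def haplotype_counts(sequences: list[str]) -> Counter:
    """Count how many aligned sequences carry each distinct haplotype."""
    return Counter(sequences)


def haplotype_diversity(sequences: list[str]) -> float:
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = haplotype_counts(sequences)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def shared_and_unique(sequences: list[str]) -> tuple[set[str], set[str]]:
    """Split haplotypes into those seen more than once and those seen exactly once."""
    counts = haplotype_counts(sequences)
    shared = {h for h, c in counts.items() if c > 1}
    unique = {h for h, c in counts.items() if c == 1}
    return shared, unique


# Hypothetical aligned toy sequences (not EF-1a data from the study).
toy = ["AAT", "AAT", "ACT", "GCT"]
shared, unique = shared_and_unique(toy)
print(len(haplotype_counts(toy)), "haplotypes;", len(shared), "shared;", len(unique), "unique")
print(f"Hd = {haplotype_diversity(toy):.4f}")
```

Under the same reading, the abstract's figures imply 42 − 12 = 30 unique haplotypes.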
Draft genome and sequence variant data of the oomycete Pythium insidiosum strain Pi45 from the phylogenetically-distinct Clade-III.

PubMed

Kittichotirat, Weerayuth; Patumcharoenpol, Preecha; Rujirawat, Thidarat; Lohnoo, Tassanee; Yingyong, Wanta; Krajaejun, Theerapong

2017-12-01

Pythium insidiosum is a unique oomycete microorganism, capable of infecting humans and animals. The organism can be phylogenetically categorized into three distinct clades: Clade-I (strains from the Americas); Clade-II (strains from Asia and Australia), and Clade-III (strains from Thailand and the United States). Two draft genomes of the P. insidiosum Clade-I strain CDC-B5653 and Clade-II strain Pi-S are available in the public domain. The genome of P. insidiosum from the distinct Clade-III, which is distantly-related to the other two clades, is lacking. Here, we report the draft genome sequence of the P. insidiosum strain Pi45 (also known as MCC13; isolated from a Thai patient with pythiosis; accession numbers BCFM01000001-BCFM01017277) as a representative strain of the phylogenetically-distinct Clade-III. We also report a genome-scale data set of sequence variants (i.e., SNPs and INDELs) found in P. insidiosum (accessible online at the Mendeley database: http://dx.doi.org/10.17632/r75799jy6c.1).
TREE2FASTA: a flexible Perl script for batch extraction of FASTA sequences from exploratory phylogenetic trees.

PubMed

Sauvage, Thomas; Plouviez, Sophie; Schmidt, William E; Fredericq, Suzanne

2018-03-05

The body of DNA sequence data lacking taxonomically informative sequence headers is rapidly growing in user and public databases (e.g. sequences lacking identification and contaminants). In the context of systematics studies, sorting such sequence data for taxonomic curation and/or molecular diversity characterization (e.g. crypticism) often requires the building of exploratory phylogenetic trees with reference taxa. The subsequent step of segregating DNA sequences of interest based on observed topological relationships can represent a challenging task, especially for large datasets. We have written TREE2FASTA, a Perl script that enables and expedites the sorting of FASTA-formatted sequence data from exploratory phylogenetic trees. TREE2FASTA takes advantage of the interactive, rapid point-and-click color selection and/or annotations of tree leaves in the popular Java tree-viewer FigTree to segregate groups of FASTA sequences of interest to separate files. TREE2FASTA allows for both simple and nested segregation designs to facilitate the simultaneous preparation of multiple data sets that may overlap in sequence content.
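The idea described above, using leaf colors assigned in a tree viewer to sort FASTA records into groups, can be sketched in Python. This is not TREE2FASTA itself (which is a Perl script), and the abstract does not give the annotation syntax it reads. The sketch assumes leaf annotations of the form `name[&!color=#rrggbb]` embedded in the tree text; the function names, the tree and the sequences are hypothetical.

```python
import re
from collections import defaultdict

# Assumed annotation form: leafname[&!color=#rrggbb], optionally with quoted names.
ANNOTATION = re.compile(r"'?([^'\[\s,;()]+)'?\[&!color=(#[0-9A-Fa-f]{6})\]")


def leaf_colors(tree_text: str) -> dict[str, str]:
    """Map each colored leaf name to its color code (lower-cased)."""
    return {name: color.lower() for name, color in ANNOTATION.findall(tree_text)}


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {record name: sequence}; the name is the first header token."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def split_by_color(tree_text: str, fasta_text: str) -> dict[str, dict[str, str]]:
    """Group FASTA records by the color of the matching tree leaf."""
    colors = leaf_colors(tree_text)
    groups: dict[str, dict[str, str]] = defaultdict(dict)
    for name, seq in parse_fasta(fasta_text).items():
        groups[colors.get(name, "uncolored")][name] = seq
    return dict(groups)


# Hypothetical inputs for illustration.
toy_tree = "(seqA[&!color=#ff0000],(seqB[&!color=#0000ff],seqC[&!color=#FF0000]),seqD);"
toy_fasta = ">seqA\nACGT\n>seqB\nGGCC\n>seqC\nTTAA\n>seqD\nCCGG\n"

for color, records in split_by_color(toy_tree, toy_fasta).items():
    print(color, sorted(records))
```

Each group could then be written to its own FASTA file; leaves without a color fall into a separate "uncolored" group so that no record is silently dropped.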
Phylesystem: a git-based data store for community-curated phylogenetic estimates.

PubMed

McTavish, Emily Jane; Hinchliff, Cody E; Allman, James F; Brown, Joseph W; Cranston, Karen A; Holder, Mark T; Rees, Jonathan A; Smith, Stephen A

2015-09-01

Phylogenetic estimates from published studies can be archived using general platforms like Dryad (Vision, 2010) or TreeBASE (Sanderson et al., 1994). Such services fulfill a crucial role in ensuring transparency and reproducibility in phylogenetic research. However, digital tree data files often require some editing (e.g. rerooting) to improve the accuracy and reusability of the phylogenetic statements. Furthermore, establishing the mapping between tip labels used in a tree and taxa in a single common taxonomy dramatically improves the ability of other researchers to reuse phylogenetic estimates. As the process of curating a published phylogenetic estimate is not error-free, retaining a full record of the provenance of edits to a tree is crucial for openness, allowing editors to receive credit for their work and making errors introduced during curation easier to correct. Here, we report the development of software infrastructure to support the open curation of phylogenetic data by the community of biologists. The backend of the system provides an interface for the standard database operations of creating, reading, updating and deleting records by making commits to a git repository. The record of the history of edits to a tree is preserved by git's version control features. Hosting this data store on GitHub (http://github.com/) provides open access to the data store using tools familiar to many developers. We have deployed a server running the 'phylesystem-api', which wraps the interactions with git and GitHub. The Open Tree of Life project has also developed and deployed a JavaScript application that uses the phylesystem-api and other web services to enable input and curation of published phylogenetic statements. Source code for the web service layer is available at https://github.com/OpenTreeOfLife/phylesystem-api. The data store can be cloned from: https://github.com/OpenTreeOfLife/phylesystem. A web application that uses the phylesystem web services is deployed at http://tree.opentreeoflife.org/curator. Code for that tool is available from https://github.com/OpenTreeOfLife/opentree. © The Author 2015. Published by Oxford University Press.
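The design described above, in which create, read, update and delete operations are carried out as commits so that the full edit history is kept, can be illustrated with a toy in-memory store. This Python sketch does not use git and is not the phylesystem-api; the class names, the record key `study_1` and the curator names are all hypothetical. It only shows why recording every change as a commit preserves provenance and lets earlier states be read back.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    author: str
    message: str
    snapshot: dict  # copy of every record as of this commit


@dataclass
class ToyTreeStore:
    history: list = field(default_factory=list)

    def _head(self) -> dict:
        # Start each change from a copy of the latest snapshot.
        return dict(self.history[-1].snapshot) if self.history else {}

    def _commit(self, snapshot: dict, author: str, message: str) -> None:
        self.history.append(Commit(author, message, snapshot))

    def create(self, key: str, newick: str, author: str) -> None:
        snapshot = self._head()
        if key in snapshot:
            raise KeyError(f"{key} already exists")
        snapshot[key] = newick
        self._commit(snapshot, author, f"create {key}")

    def read(self, key: str, revision: int = -1) -> str:
        # revision indexes the commit history; -1 is the latest state.
        return self.history[revision].snapshot[key]

    def update(self, key: str, newick: str, author: str, message: str) -> None:
        snapshot = self._head()
        if key not in snapshot:
            raise KeyError(key)
        snapshot[key] = newick
        self._commit(snapshot, author, message)

    def delete(self, key: str, author: str) -> None:
        snapshot = self._head()
        del snapshot[key]
        self._commit(snapshot, author, f"delete {key}")


store = ToyTreeStore()
store.create("study_1", "(A,B,C);", author="curator_a")
store.update("study_1", "((A,B),C);", author="curator_b", message="curation edit")
print(store.read("study_1"))              # latest version
print(store.read("study_1", revision=0))  # version as first committed
print([f"{c.author}: {c.message}" for c in store.history])
```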
Molecular analysis of the bacterial microbiome in the forestomach fluid from the dromedary camel (Camelus dromedarius).

PubMed

Bhatt, Vaibhav D; Dande, Suchitra S; Patil, Nitin V; Joshi, Chaitanya G

2013-04-01

Rumen microorganisms play an important role in ruminant digestion and absorption of nutrients and have great potential applications in rumen adjustment, food fermentation and biomass utilization. To investigate the composition of microorganisms in the rumen of the camel (Camelus dromedarius), this study examines microbial diversity using a culture-independent approach. It includes a comparison of the rumen samples investigated in the present study with other currently available metagenomes to reveal potential differences in rumen microbial systems. Pyrosequencing-based metagenomics was applied to analyze phylogenetic and metabolic profiles with MG-RAST, a web-based tool. Pyrosequencing of the camel rumen sample yielded 8,979,755 nucleotides in 41,905 sequence reads with an average read length of 214 nucleotides. Taxonomic analysis of metagenomic reads indicated Bacteroidetes (55.5 %), Firmicutes (22.7 %) and Proteobacteria (9.2 %) phyla as predominant camel rumen taxa. At a finer phylogenetic resolution, Bacteroides species dominated the camel rumen metagenome. Functional analysis revealed that clustering-based subsystems and carbohydrate metabolism were the most abundant SEED subsystems, representing 17 % and 13 % of the camel metagenome, respectively. The camel rumen showed high taxonomic and functional similarity to the cow metagenome, which is not surprising given that both are mammalian herbivores with similar digestive tract structures and functions. The combined pyrosequencing approach and the subsystems-based annotations available in the SEED database allowed us to assess the metabolic potential of these microbiomes. Altogether, these data suggest that agricultural and animal husbandry practices can impose significant selective pressures on the rumen microbiota regardless of rumen type. The present study provides a baseline for understanding the complexity of camel rumen microbial ecology while also highlighting striking similarities and differences when compared to other animal gastrointestinal environments.
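The sequencing figures quoted above are internally consistent, which a few lines of Python can confirm. The sketch uses only numbers stated in the abstract; the variable names are illustrative.

```python
total_nucleotides = 8_979_755
total_reads = 41_905

# Mean read length is total bases divided by number of reads.
mean_read_length = total_nucleotides / total_reads
print(f"mean read length: {mean_read_length:.2f} nt")

# Share of reads assigned to the three predominant phyla, and the remainder.
phyla_percent = {"Bacteroidetes": 55.5, "Firmicutes": 22.7, "Proteobacteria": 9.2}
top_three = round(sum(phyla_percent.values()), 1)
print(f"top three phyla: {top_three}%; all other taxa: {round(100 - top_three, 1)}%")
```

The mean works out to about 214.29 nucleotides, matching the reported average of 214, and the three named phyla together account for 87.4 % of the taxonomic assignments.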
Prevalence and Identity of Taenia multiceps cysts "Coenurus cerebralis" in Sheep in Egypt.

PubMed

Amer, Said; ElKhatam, Ahmed; Fukuda, Yasuhiro; Bakr, Lamia I; Zidan, Shereif; Elsify, Ahmed; Mohamed, Mostafa A; Tada, Chika; Nakai, Yutaka

2017-12-01

Coenurosis is a parasitic disease caused by the larval stage (Coenurus cerebralis) of the canid cestode Taenia multiceps. C. cerebralis particularly infects sheep and goats, and poses a public health concern. The present study aimed to determine the occurrence and molecular identity of C. cerebralis infecting sheep in Egypt. Infection rate was determined by postmortem inspection of heads of the cases that showed neurological manifestations. Species identification and genetic diversity were analyzed based on PCR-sequence analysis of nuclear ITS1 and mitochondrial cytochrome oxidase (COI) and nicotinamide adenine dinucleotide dehydrogenase (ND1) gene markers. Out of 3668 animals distributed in 50 herds at localities of Ashmoun and El Sadat cities, El Menoufia Province, Egypt, 420 (11.45%) sheep showed neurological disorders. Postmortem examination of these animals after slaughter at local abattoirs indicated the occurrence of C. cerebralis cysts in the brain of 111 out of 420 (26.4%), with an overall infection rate of 3.03% of the involved sheep population. Molecular analysis of representative samples of coenuri at the ITS1 gene marker showed extensive intra- and inter-sequence diversity due to deletions/insertions in the microsatellite regions. In contrast to the nuclear gene marker, considerably low genetic diversity was seen in the analyzed mitochondrial gene markers. Phylogenetic analysis based on COI and ND1 gene sequences indicated that the generated sequences in the present study and the reference sequences in the database clustered in 4 haplogroups, with more or less similar topologies. The clustering pattern of the phylogenetic tree showed no effect of geographic location or host species. Copyright © 2017 Elsevier B.V. All rights reserved.
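The three percentages in the abstract above use two different denominators, which is easy to miss on a first read. A small Python check, using only the counts given in the abstract (the variable and function names are illustrative):

```python
animals_examined = 3668    # sheep in the 50 herds
neurological_cases = 420   # sheep showing neurological disorders
cyst_positive = 111        # sheep with C. cerebralis cysts found postmortem


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


print(f"neurological cases among all sheep: {percent(neurological_cases, animals_examined):.2f}%")
print(f"cyst-positive among neurological cases: {percent(cyst_positive, neurological_cases):.1f}%")
print(f"cyst-positive among all sheep: {percent(cyst_positive, animals_examined):.2f}%")
```

The 26.4% figure is relative to the 420 symptomatic animals, while the 11.45% and 3.03% figures are relative to all 3668 animals.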
Characterization of Vibrio parahaemolyticus clinical strains from Maryland (2012-2013) and comparisons to a locally and globally diverse V. parahaemolyticus strains by whole-genome sequence analysis.

PubMed

Haendiges, Julie; Timme, Ruth; Allard, Marc W; Myers, Robert A; Brown, Eric W; Gonzalez-Escalona, Narjol

2015-01-01

Vibrio parahaemolyticus is the leading cause of foodborne illnesses in the US associated with the consumption of raw shellfish. Previous population studies of V. parahaemolyticus have used Multi-Locus Sequence Typing (MLST) or Pulsed Field Gel Electrophoresis (PFGE). Whole genome sequencing (WGS) provides a much higher level of resolution, but has been used to characterize only a few United States (US) clinical isolates. Here we report the WGS characterization of 34 genomes of V. parahaemolyticus strains that were isolated from clinical cases in the state of Maryland (MD) during 2 years (2012-2013). These 2 years saw an increase of V. parahaemolyticus cases compared to previous years. Among these MD isolates, 28% were negative for tdh and trh, 8% were tdh positive only, 11% were trh positive only, and 53% contained both genes. We compared this set of V. parahaemolyticus genomes to those of a collection of 17 archival strains from the US (10 previously sequenced strains and 7 from NCBI, collected between 1988 and 2004) and 15 international strains, isolated from geographically-diverse environmental and clinical sources (collected between 1980 and 2010). A WGS phylogenetic analysis of these strains revealed the regional outbreak strains from MD are highly diverse and yet genetically distinct from the international strains. Some MD strains caused outbreaks 2 years in a row, indicating a local source of contamination (e.g., ST631). Advances in WGS will enable this type of analysis to become routine, providing an excellent tool for improved surveillance. Databases built with phylogenetic data will help pinpoint sources of contamination in future outbreaks and contribute to faster outbreak control.
Characterization of Vibrio parahaemolyticus clinical strains from Maryland (2012–2013) and comparisons to a locally and globally diverse V. parahaemolyticus strains by whole-genome sequence analysis

PubMed Central

Haendiges, Julie; Timme, Ruth; Allard, Marc W.; Myers, Robert A.; Brown, Eric W.; Gonzalez-Escalona, Narjol

2015-01-01

Vibrio parahaemolyticus is the leading cause of foodborne illnesses in the US associated with the consumption of raw shellfish. Previous population studies of V. parahaemolyticus have used Multi-Locus Sequence Typing (MLST) or Pulsed Field Gel Electrophoresis (PFGE). Whole genome sequencing (WGS) provides a much higher level of resolution, but has been used to characterize only a few United States (US) clinical isolates. Here we report the WGS characterization of 34 genomes of V. parahaemolyticus strains that were isolated from clinical cases in the state of Maryland (MD) during 2 years (2012–2013). These 2 years saw an increase of V. parahaemolyticus cases compared to previous years. Among these MD isolates, 28% were negative for tdh and trh, 8% were tdh positive only, 11% were trh positive only, and 53% contained both genes. We compared this set of V. parahaemolyticus genomes to those of a collection of 17 archival strains from the US (10 previously sequenced strains and 7 from NCBI, collected between 1988 and 2004) and 15 international strains, isolated from geographically-diverse environmental and clinical sources (collected between 1980 and 2010). A WGS phylogenetic analysis of these strains revealed the regional outbreak strains from MD are highly diverse and yet genetically distinct from the international strains. Some MD strains caused outbreaks 2 years in a row, indicating a local source of contamination (e.g., ST631). Advances in WGS will enable this type of analysis to become routine, providing an excellent tool for improved surveillance. Databases built with phylogenetic data will help pinpoint sources of contamination in future outbreaks and contribute to faster outbreak control. PMID:25745421
Molecular variation and horizontal gene transfer of the homocysteine methyltransferase gene mmuM and its distribution in clinical pathogens.

PubMed

Ying, Jianchao; Wang, Huifeng; Bao, Bokan; Zhang, Ying; Zhang, Jinfang; Zhang, Cheng; Li, Aifang; Lu, Junwan; Li, Peizhen; Ying, Jun; Liu, Qi; Xu, Teng; Yi, Huiguang; Li, Jinsong; Zhou, Li; Zhou, Tieli; Xu, Zuyuan; Ni, Liyan; Bao, Qiyu

2015-01-01

The homocysteine methyltransferase encoded by mmuM is widely distributed among microbial organisms. It is the key enzyme that catalyzes the last step in methionine biosynthesis and plays an important role in the metabolism process. It also enables the microbial organisms to tolerate high concentrations of selenium in the environment. In this research, 533 mmuM gene sequences covering 70 bacterial genera were selected from the GenBank database. The distribution frequency of mmuM differs among the investigated genera of bacteria. The mapping results of 160 mmuM reference sequences showed that the mmuM genes were found in 7 species of pathogen genomes sequenced in this work. The polymerase chain reaction products of one mmuM genotype (NC_013951 as the reference) were sequenced and the sequencing results confirmed the mapping results. Furthermore, 144 representative sequences were chosen for phylogenetic analysis, and some mmuM genes from totally different genera (such as the genes between Escherichia and Klebsiella and between Enterobacter and Kosakonia) shared a closer phylogenetic relationship than those from the same genus. Comparative genomic analysis of the mmuM encoding regions on plasmids and bacterial chromosomes showed that the pKF3-140 and pIP1206 plasmids shared a 21 kb homology region and that a 4.9 kb fragment in this region in fact originated from the Escherichia coli chromosome. These results further suggest that the mmuM gene has undergone horizontal gene transfer among different species or genera of bacteria. High-throughput sequencing combined with comparative genomic analysis can be used to explore the distribution and dissemination of the mmuM gene among bacteria and its evolution at the molecular level.
New Subtypes and Genetic Recombination in HIV Type 1-Infecting Patients with Highly Active Antiretroviral Therapy in Peru (2008–2010)

PubMed Central

Acuña, Maribel; Gazzo, Cecilia; Salinas, Gabriela; Cárdenas, Fanny; Valverde, Ada; Romero, Soledad

2012-01-01

HIV-1 subtype B is the most frequent strain in Peru. However, there are no available data about the genetic diversity of HIV-infected patients receiving highly active antiretroviral therapy (HAART) here. A group of 267 patients in the Peruvian National Treatment Program with virologic failure were tested for genotypic evidence of HIV drug resistance at the Instituto Nacional de Salud (INS) of Peru between March 2008 and December 2010. Viral RNA was extracted from plasma and the segments of the protease (PR) and reverse transcriptase (RT) genes were amplified by reverse transcriptase polymerase chain reaction (RT-PCR), purified, and fully sequenced. Consensus sequences were submitted to the HIVdb Genotypic Resistance Interpretation Algorithm Database from Stanford University, and then aligned using Clustal X v.2.0 to generate a phylogenetic tree using the maximum likelihood method. Intrasubtype and intersubtype recombination analyses were performed using the SCUEAL program (Subtype Classification by Evolutionary ALgorithms). A total of 245 samples (91%) were successfully genotyped. The analysis obtained from the HIVdb program showed 81.5% resistance cases (n=198). The phylogenetic analysis revealed that subtype B was predominant in the population (98.8%), except for new cases of A, C, and H subtypes (n=4). Of these cases, only subtype C was imported. Likewise, recombination analysis revealed nine intersubtype and 20 intrasubtype recombinant cases. This is the first report of the presence of HIV-1 subtypes C and H in Peru. The introduction of new subtypes and circulating recombinant forms can make it difficult to distinguish resistance profiles in patients and consequently affect future treatment strategies against HIV in this country. PMID:22559065
Genetic Characterization and Comparison of Clostridium botulinum Isolates from Botulism Cases in Japan between 2006 and 2011

PubMed Central

Sekizuka, Tsuyoshi; Yamamoto, Akihiko; Iwaki, Masaaki; Komiya, Takako; Hatakeyama, Takashi; Nakajima, Hiroshi; Takahashi, Motohide; Kuroda, Makoto; Shibayama, Keigo

2014-01-01

Genetic characterization was performed for 10 group I Clostridium botulinum strains isolated from botulism cases in Japan between 2006 and 2011. Of these, 1 was type A, 2 were type B, and 7 were type A(B) {carrying a silent bont/B [bont/(B)] gene} serotype strains, based on botulinum neurotoxin (BoNT) production. The type A strain harbored the subtype A1 BoNT gene (bont/A1), which is associated with the ha gene cluster. The type B strains carried bont/B5 or bont/B6 subtype genes. The type A(B) strains carried bont/A1 identical to that of type A(B) strain NCTC2916. However, bont/(B) genes in these strains showed single-nucleotide polymorphisms (SNPs) among strains. SNPs at 2 nucleotide positions of bont/(B) enabled classification of the type A(B) strains into 3 groups. Pulsed-field gel electrophoresis (PFGE) and multiple-locus variable-number tandem-repeat analysis (MLVA) also provided consistent separation results. In addition, the type A(B) strains were separated into 2 lineages based on their plasmid profiles. One lineage carried a small plasmid (5.9 kb), and another harbored 21-kb plasmids. To obtain more detailed genetic information about the 10 strains, we sequenced their genomes and compared them with 13 group I C. botulinum genomes in a database using whole-genome SNP analysis. This analysis provided high-resolution strain discrimination and enabled us to generate a refined phylogenetic tree that provides effective traceability of botulism cases, as well as bioterrorism materials. In the phylogenetic tree, the subtype B6 strains, Okayama2011 and Osaka05, were distantly separated from the other strains, indicating genomic divergence of subtype B6 strains among group I strains. PMID:25192986

Carbohydrate-active enzymes in Trichoderma harzianum: a bioinformatic analysis bioprospecting for key enzymes for the biofuels industry.

PubMed

Ferreira Filho, Jaire Alves; Horta, Maria Augusta Crivelente; Beloti, Lilian Luzia; Dos Santos, Clelton Aparecido; de Souza, Anete Pereira

2017-10-12

Trichoderma harzianum is used in biotechnology applications due to its ability to produce powerful enzymes for the conversion of lignocellulosic substrates into soluble sugars. Active enzymes involved in carbohydrate metabolism are defined as carbohydrate-active enzymes (CAZymes), and the most abundant family in the CAZy database is the glycoside hydrolases. The enzymes of this family play a fundamental role in the decomposition of plant biomass. In this study, the CAZymes of T. harzianum were identified and classified using bioinformatic approaches after which the expression profiles of all annotated CAZymes were assessed via RNA-Seq, and a phylogenetic analysis was performed. A total of 430 CAZymes (3.7% of the total proteins for this organism) were annotated in T. harzianum, including 259 glycoside hydrolases (GHs), 101 glycosyl transferases (GTs), 6 polysaccharide lyases (PLs), 22 carbohydrate esterases (CEs), 42 auxiliary activities (AAs) and 46 carbohydrate-binding modules (CBMs). Among the identified T. harzianum CAZymes, 47% were predicted to harbor a signal peptide sequence and were therefore classified as secreted proteins. The GH families were the CAZyme class with the greatest number of expressed genes, including GH18 (23 genes), GH3 (17 genes), GH16 (16 genes), GH2 (13 genes) and GH5 (12 genes). A phylogenetic analysis of the proteins in the AA9/GH61, CE5 and GH55 families showed high functional variation among the proteins. Identifying the main proteins used by T. harzianum for biomass degradation can ensure new advances in the biofuel production field. Herein, we annotated and characterized the expression levels of all of the CAZymes from T. harzianum, which may contribute to future studies focusing on the functional and structural characterization of the identified proteins.
New subtypes and genetic recombination in HIV type 1-infecting patients with highly active antiretroviral therapy in Peru (2008-2010).

PubMed

Yabar, Carlos Augusto; Acuña, Maribel; Gazzo, Cecilia; Salinas, Gabriela; Cárdenas, Fanny; Valverde, Ada; Romero, Soledad

2012-12-01

HIV-1 subtype B is the most frequent strain in Peru. However, there is no available data about the genetic diversity of HIV-infected patients receiving highly active antiretroviral therapy (HAART) here. A group of 267 patients in the Peruvian National Treatment Program with virologic failure were tested for genotypic evidence of HIV drug resistance at the Instituto Nacional de Salud (INS) of Peru between March 2008 and December 2010. Viral RNA was extracted from plasma and the segments of the protease (PR) and reverse transcriptase (RT) genes were amplified by reverse transcriptase polymerase chain reaction (RT-PCR), purified, and fully sequenced. Consensus sequences were submitted to the HIVdb Genotypic Resistance Interpretation Algorithm Database from Stanford University, and then aligned using Clustal X v.2.0 to generate a phylogenetic tree using the maximum likelihood method. Intrasubtype and intersubtype recombination analyses were performed using the SCUEAL program (Subtype Classification by Evolutionary ALgorithms). A total of 245 samples (91%) were successfully genotyped. The analysis obtained from the HIVdb program showed 81.5% resistance cases (n=198). The phylogenetic analysis revealed that subtype B was predominant in the population (98.8%), except for new cases of A, C, and H subtypes (n=4). Of these cases, only subtype C was imported. Likewise, recombination analysis revealed nine intersubtype and 20 intrasubtype recombinant cases. This is the first report of the presence of HIV-1 subtypes C and H in Peru. The introduction of new subtypes and circulating recombinant forms can make it difficult to distinguish resistance profiles in patients and consequently affect future treatment strategies against HIV in this country.
Endosymbiotic Microbiota of the Bamboo Pseudococcid Antonina crawii (Insecta, Homoptera)

PubMed Central

Fukatsu, Takema; Nikoh, Naruo

2000-01-01

We characterized the intracellular symbiotic microbiota of the bamboo pseudococcid Antonina crawii by performing a molecular phylogenetic analysis in combination with in situ hybridization. Almost the entire length of the bacterial 16S rRNA gene was amplified and cloned from A. crawii whole DNA. Restriction fragment length polymorphism analysis revealed that the clones obtained included three distinct types of sequences. Nucleotide sequences of the three types were determined and subjected to a molecular phylogenetic analysis. The first sequence was a member of the γ subdivision of the division Proteobacteria (γ-Proteobacteria) to which no sequences in the database were closely related, although the sequences of endosymbionts of other homopterans, such as psyllids and aphids, were distantly related. The second sequence was a β-Proteobacteria sequence and formed a monophyletic group with the sequences of endosymbionts from other pseudococcids. The third sequence exhibited a high level of similarity to sequences of Spiroplasma spp. from ladybird beetles and a tick. Localization of the endosymbionts was determined by using tissue sections of A. crawii and in situ hybridization with specific oligonucleotide probes. The γ- and β-Proteobacteria symbionts were packed in the cytoplasm of the same mycetocytes (or bacteriocytes) and formed a large mycetome (or bacteriome) in the abdomen. The spiroplasma symbionts were also present intracellularly in various tissues at a low density. We observed that the anterior poles of developing eggs in the ovaries were infected by the γ- and β-Proteobacteria symbionts in a systematic way, which ensured vertical transmission. Five representative pseudococcids were examined by performing diagnostic PCR experiments with specific primers; the β-Proteobacteria symbiont was detected in all five pseudococcids, the γ-Proteobacteria symbiont was found in three, and the spiroplasma symbiont was detected only in A. crawii. PMID:10653730
Verification of Ribosomal Proteins of Aspergillus fumigatus for Use as Biomarkers in MALDI-TOF MS Identification

PubMed Central

Nakamura, Sayaka; Sato, Hiroaki; Tanaka, Reiko; Yaguchi, Takashi

2016-01-01

We have previously proposed a rapid identification method for bacterial strains based on the profiles of their ribosomal subunit proteins (RSPs), observed using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). This method can perform phylogenetic characterization based on the mass of housekeeping RSP biomarkers, ideally calculated from amino acid sequence information registered in public protein databases. With the aim of extending its field of application to medical mycology, this study investigates the actual state of information of RSPs of eukaryotic fungi registered in public protein databases through the characterization of ribosomal protein fractions extracted from genome-sequenced Aspergillus fumigatus strains Af293 and A1163 as a model. In this process, we have found that the public protein databases harbor problems. The RSP names are in confusion, so we have provisionally unified them using the yeast naming system. The most serious problem is that many incorrect sequences are registered in the public protein databases. Surprisingly, more than half of the sequences are incorrect, due chiefly to mis-annotation of exon/intron structures. These errors could be corrected by a combination of in silico inspection by sequence homology analysis and MALDI-TOF MS measurements. We were also able to confirm conserved post-translational modifications in eleven RSPs. After these verifications, the masses of 31 expressed RSPs under 20,000 Da could be accurately confirmed. These RSPs have a potential to be useful biomarkers for identifying clinical isolates of A. fumigatus. PMID:27843740
PHYLOGENETIC RELATIONSHIP OF ALEXANDRIUM MONILATUM (DINOPHYCEAE) TO OTHER ALEXANDRIUM SPECIES BASED ON 18S RIBOSOMAL RNA GENE SEQUENCES

EPA Science Inventory

The phylogenetic relationship of Alexandrium monilatum to other Alexandrium spp. was explored using 18S rDNA sequences. Maximum likelihood phylogenetic analysis of the combined rDNA sequences established that A. monilatum paired with Alexandrium taylori and that the pair was the...
PHYLOGENETIC RELATIONSHIP OF ALEXANDRIUM MONILATUM (DINOPHYCEAE) TO OTHER ALEXANDRIUM SPECIES BASED ON 18S RIBOSOMAL RNA GENE SEQUENCES

EPA Science Inventory

The phylogenetic relationship of Alexandrium monilatum to other Alexandrium spp. was explored using 18S rDNA sequences. Maximum likelihood phylogenetic analysis of the combined rDNA sequences established that A. monilatum paired with Alexandrium taylori and that the pair was the ...
Extensive characterization of Tupaia belangeri neuropeptidome using an integrated mass spectrometric approach.

PubMed

Petruzziello, Filomena; Fouillen, Laetitia; Wadensten, Henrik; Kretz, Robert; Andren, Per E; Rainer, Gregor; Zhang, Xiaozhe

2012-02-03

Neuropeptidomics is used to characterize endogenous peptides in the brain of tree shrews (Tupaia belangeri). Tree shrews are small animals similar to rodents in size but close relatives of primates, and are excellent models for brain research. Currently, tree shrews have no complete proteome information available on which direct database search can be allowed for neuropeptide identification. To increase the capability in the identification of neuropeptides in tree shrews, we developed an integrated mass spectrometry (MS)-based approach that combines methods including data-dependent, directed, and targeted liquid chromatography (LC)-Fourier transform (FT)-tandem MS (MS/MS) analysis, database construction, de novo sequencing, precursor protein search, and homology analysis. Using this integrated approach, we identified 107 endogenous peptides that have sequences identical or similar to those from other mammalian species. High accuracy MS and tandem MS information, with BLAST analysis and chromatographic characteristics were used to confirm the sequences of all the identified peptides. Interestingly, further sequence homology analysis demonstrated that tree shrew peptides have a significantly higher degree of homology to equivalent sequences in humans than those in mice or rats, consistent with the close phylogenetic relationship between tree shrews and primates. Our results provide the first extensive characterization of the peptidome in tree shrews, which now permits characterization of their function in nervous and endocrine system. As the approach developed fully used the conservative properties of neuropeptides in evolution and the advantage of high accuracy MS, it can be portable for identification of neuropeptides in other species for which the fully sequenced genomes or proteomes are not available.
PAMLX: a graphical user interface for PAML.

PubMed

Xu, Bo; Yang, Ziheng

2013-12-01

This note announces pamlX, a graphical user interface/front end for the paml (for Phylogenetic Analysis by Maximum Likelihood) program package (Yang Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 13:555-556; Yang Z. 2007. PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol. 24:1586-1591). pamlX is written in C++ using the Qt library and communicates with paml programs through files. It can be used to create, edit, and print control files for paml programs and to launch paml runs. The interface is available for free download at http://abacus.gene.ucl.ac.uk/software/paml.html.
The Coral Trait Database, a curated database of trait information for coral species from the global oceans

NASA Astrophysics Data System (ADS)

Madin, Joshua S.; Anderson, Kristen D.; Andreasen, Magnus Heide; Bridge, Tom C. L.; Cairns, Stephen D.; Connolly, Sean R.; Darling, Emily S.; Diaz, Marcela; Falster, Daniel S.; Franklin, Erik C.; Gates, Ruth D.; Hoogenboom, Mia O.; Huang, Danwei; Keith, Sally A.; Kosnik, Matthew A.; Kuo, Chao-Yang; Lough, Janice M.; Lovelock, Catherine E.; Luiz, Osmar; Martinelli, Julieta; Mizerek, Toni; Pandolfi, John M.; Pochon, Xavier; Pratchett, Morgan S.; Putnam, Hollie M.; Roberts, T. Edward; Stat, Michael; Wallace, Carden C.; Widman, Elizabeth; Baird, Andrew H.

2016-03-01

Trait-based approaches advance ecological and evolutionary research because traits provide a strong link to an organism’s function and fitness. Trait-based research might lead to a deeper understanding of the functions of, and services provided by, ecosystems, thereby improving management, which is vital in the current era of rapid environmental change. Coral reef scientists have long collected trait data for corals; however, these are difficult to access and often under-utilized in addressing large-scale questions. We present the Coral Trait Database initiative that aims to bring together physiological, morphological, ecological, phylogenetic and biogeographic trait information into a single repository. The database houses species- and individual-level data from published field and experimental studies alongside contextual data that provide important framing for analyses. In this data descriptor, we release data for 56 traits for 1547 species, and present a collaborative platform on which other trait data are being actively federated. Our overall goal is for the Coral Trait Database to become an open-source, community-led data clearinghouse that accelerates coral reef research.
The Coral Trait Database, a curated database of trait information for coral species from the global oceans

PubMed Central

Madin, Joshua S.; Anderson, Kristen D.; Andreasen, Magnus Heide; Bridge, Tom C.L.; Cairns, Stephen D.; Connolly, Sean R.; Darling, Emily S.; Diaz, Marcela; Falster, Daniel S.; Franklin, Erik C.; Gates, Ruth D.; Hoogenboom, Mia O.; Huang, Danwei; Keith, Sally A.; Kosnik, Matthew A.; Kuo, Chao-Yang; Lough, Janice M.; Lovelock, Catherine E.; Luiz, Osmar; Martinelli, Julieta; Mizerek, Toni; Pandolfi, John M.; Pochon, Xavier; Pratchett, Morgan S.; Putnam, Hollie M.; Roberts, T. Edward; Stat, Michael; Wallace, Carden C.; Widman, Elizabeth; Baird, Andrew H.

2016-01-01

Trait-based approaches advance ecological and evolutionary research because traits provide a strong link to an organism’s function and fitness. Trait-based research might lead to a deeper understanding of the functions of, and services provided by, ecosystems, thereby improving management, which is vital in the current era of rapid environmental change. Coral reef scientists have long collected trait data for corals; however, these are difficult to access and often under-utilized in addressing large-scale questions. We present the Coral Trait Database initiative that aims to bring together physiological, morphological, ecological, phylogenetic and biogeographic trait information into a single repository. The database houses species- and individual-level data from published field and experimental studies alongside contextual data that provide important framing for analyses. In this data descriptor, we release data for 56 traits for 1547 species, and present a collaborative platform on which other trait data are being actively federated. Our overall goal is for the Coral Trait Database to become an open-source, community-led data clearinghouse that accelerates coral reef research. PMID:27023900
The Coral Trait Database, a curated database of trait information for coral species from the global oceans.

PubMed

Madin, Joshua S; Anderson, Kristen D; Andreasen, Magnus Heide; Bridge, Tom C L; Cairns, Stephen D; Connolly, Sean R; Darling, Emily S; Diaz, Marcela; Falster, Daniel S; Franklin, Erik C; Gates, Ruth D; Harmer, Aaron; Hoogenboom, Mia O; Huang, Danwei; Keith, Sally A; Kosnik, Matthew A; Kuo, Chao-Yang; Lough, Janice M; Lovelock, Catherine E; Luiz, Osmar; Martinelli, Julieta; Mizerek, Toni; Pandolfi, John M; Pochon, Xavier; Pratchett, Morgan S; Putnam, Hollie M; Roberts, T Edward; Stat, Michael; Wallace, Carden C; Widman, Elizabeth; Baird, Andrew H

2016-03-29

Trait-based approaches advance ecological and evolutionary research because traits provide a strong link to an organism's function and fitness. Trait-based research might lead to a deeper understanding of the functions of, and services provided by, ecosystems, thereby improving management, which is vital in the current era of rapid environmental change. Coral reef scientists have long collected trait data for corals; however, these are difficult to access and often under-utilized in addressing large-scale questions. We present the Coral Trait Database initiative that aims to bring together physiological, morphological, ecological, phylogenetic and biogeographic trait information into a single repository. The database houses species- and individual-level data from published field and experimental studies alongside contextual data that provide important framing for analyses. In this data descriptor, we release data for 56 traits for 1547 species, and present a collaborative platform on which other trait data are being actively federated. Our overall goal is for the Coral Trait Database to become an open-source, community-led data clearinghouse that accelerates coral reef research.
Cyanobacterial diversity held in microbial biological resource centers as a biotechnological asset: the case study of the newly established LEGE culture collection.

PubMed

Ramos, Vitor; Morais, João; Castelo-Branco, Raquel; Pinheiro, Ângela; Martins, Joana; Regueiras, Ana; Pereira, Ana L; Lopes, Viviana R; Frazão, Bárbara; Gomes, Dina; Moreira, Cristiana; Costa, Maria Sofia; Brûle, Sébastien; Faustino, Silvia; Martins, Rosário; Saker, Martin; Osswald, Joana; Leão, Pedro N; Vasconcelos, Vitor M

2018-01-01

Cyanobacteria are a well-known source of bioproducts, which renders culturable strains a valuable resource for biotechnology purposes. We describe here the establishment of a cyanobacterial culture collection (CC) and present the first version of the strain catalog and its online database (http://lege.ciimar.up.pt/). The LEGE CC holds 386 strains, mainly collected in coastal (48%), estuarine (11%), and fresh (34%) water bodies, for the most part from Portugal (84%). By following the most recent taxonomic classification, LEGE CC strains were classified into at least 46 genera from six orders (41% belong to the Synechococcales), several of which are unique among the phylogenetic diversity of the cyanobacteria. For all strains, primary data were obtained and secondary data were surveyed and reviewed, which can be reached through the strain sheets either in the catalog or in the online database. An overview of the notable biodiversity of LEGE CC strains is showcased, including a searchable phylogenetic tree and images for all strains. With this work, 80% of the LEGE CC strains now have their 16S rRNA gene sequences deposited in GenBank. Also, based on primary data, it is demonstrated that several LEGE CC strains are a promising source of extracellular polymeric substances (EPS). A review of previously published data shows that LEGE CC strains have the potential or actual capacity to produce a variety of biotechnologically interesting compounds, including common cyanotoxins or unprecedented bioactive molecules. Phylogenetic diversity of LEGE CC strains does not entirely reflect chemodiversity. Further bioprospecting should, therefore, account for strain specificity of the valuable cyanobacterial holdings of LEGE CC.
Phylogenetic and environmental diversity of DsrAB-type dissimilatory (bi)sulfite reductases

PubMed Central

Müller, Albert Leopold; Kjeldsen, Kasper Urup; Rattei, Thomas; Pester, Michael; Loy, Alexander

2015-01-01

The energy metabolism of essential microbial guilds in the biogeochemical sulfur cycle is based on a DsrAB-type dissimilatory (bi)sulfite reductase that either catalyzes the reduction of sulfite to sulfide during anaerobic respiration of sulfate, sulfite and organosulfonates, or acts in reverse during sulfur oxidation. Common use of dsrAB as a functional marker showed that dsrAB richness in many environments is dominated by novel sequence variants and collectively represents an extensive, largely uncharted sequence assemblage. Here, we established a comprehensive, manually curated dsrAB/DsrAB database and used it to categorize the known dsrAB diversity, reanalyze the evolutionary history of dsrAB and evaluate the coverage of published dsrAB-targeted primers. Based on a DsrAB consensus phylogeny, we introduce an operational classification system for environmental dsrAB sequences that integrates established taxonomic groups with operational taxonomic units (OTUs) at multiple phylogenetic levels, ranging from DsrAB enzyme families that reflect reductive or oxidative DsrAB types of bacterial or archaeal origin, superclusters, uncultured family-level lineages to species-level OTUs. Environmental dsrAB sequences constituted at least 13 stable family-level lineages without any cultivated representatives, suggesting that major taxa of sulfite/sulfate-reducing microorganisms have not yet been identified. Three of these uncultured lineages occur mainly in marine environments, while specific habitat preferences are not evident for members of the other 10 uncultured lineages. In summary, our publicly available dsrAB/DsrAB database, the phylogenetic framework, the multilevel classification system and a set of recommended primers provide a necessary foundation for large-scale dsrAB ecology studies with next-generation sequencing methods. PMID:25343514
Phylomemetics—Evolutionary Analysis beyond the Gene

PubMed Central

Howe, Christopher J.; Windram, Heather F.

2011-01-01

Genes are propagated by error-prone copying, and the resulting variation provides the basis for phylogenetic reconstruction of evolutionary relationships. Horizontal gene transfer may be superimposed on a tree-like evolutionary pattern, with some relationships better depicted as networks. The copying of manuscripts by scribes is very similar to the replication of genes, and phylogenetic inference programs can be used directly for reconstructing the copying history of different versions of a manuscript text. Phylogenetic methods have also been used for some time to analyse the evolution of languages and the development of physical cultural artefacts. These studies can help to answer a range of anthropological questions. We propose the adoption of the term “phylomemetics” for phylogenetic analysis of reproducing non-genetic elements. PMID:21655311
Short Tree, Long Tree, Right Tree, Wrong Tree: New Acquisition Bias Corrections for Inferring SNP Phylogenies

PubMed Central

Leaché, Adam D.; Banbury, Barbara L.; Felsenstein, Joseph; Nieto-Montes de Oca, Adrián; Stamatakis, Alexandros

2015-01-01

Single nucleotide polymorphisms (SNPs) are useful markers for phylogenetic studies owing in part to their ubiquity throughout the genome and ease of collection. Restriction site associated DNA sequencing (RADseq) methods are becoming increasingly popular for SNP data collection, but an assessment of the best practises for using these data in phylogenetics is lacking. We use computer simulations, and new double digest RADseq (ddRADseq) data for the lizard family Phrynosomatidae, to investigate the accuracy of RAD loci for phylogenetic inference. We compare the two primary ways RAD loci are used during phylogenetic analysis, including the analysis of full sequences (i.e., SNPs together with invariant sites), or the analysis of SNPs on their own after excluding invariant sites. We find that using full sequences rather than just SNPs is preferable from the perspectives of branch length and topological accuracy, but not of computational time. We introduce two new acquisition bias corrections for dealing with alignments composed exclusively of SNPs, a conditional likelihood method and a reconstituted DNA approach. The conditional likelihood method conditions on the presence of variable characters only (the number of invariant sites that are unsampled but known to exist is not considered), while the reconstituted DNA approach requires the user to specify the exact number of unsampled invariant sites prior to the analysis. Under simulation, branch length biases increase with the amount of missing data for both acquisition bias correction methods, but branch length accuracy is much improved in the reconstituted DNA approach compared to the conditional likelihood approach. Phylogenetic analyses of the empirical data using concatenation or a coalescent-based species tree approach provide strong support for many of the accepted relationships among phrynosomatid lizards, suggesting that RAD loci contain useful phylogenetic signal across a range of divergence times despite the presence of missing data. Phylogenetic analysis of RAD loci requires careful attention to model assumptions, especially if downstream analyses depend on branch lengths. PMID:26227865
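The distinction drawn in this abstract between a full alignment and a SNP-only alignment can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names and the toy alignment are hypothetical, and treating `-`, `N` and `?` as missing data is an assumption made only for this example. The sketch removes invariant columns and reports how many were dropped, which is the kind of count a correction based on the number of invariant sites would need.

```python
# Illustrative sketch only; names and toy data are hypothetical.
MISSING = {"-", "N", "?"}  # assumed symbols for gaps / missing data


def variable_columns(alignment):
    """Return the indices of columns that contain more than one observed state."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    columns = []
    for i in range(length):
        # Missing data is ignored when deciding whether a column varies.
        states = {s[i].upper() for s in seqs} - MISSING
        if len(states) > 1:
            columns.append(i)
    return columns


def snp_alignment(alignment):
    """Return (SNP-only alignment, number of invariant columns removed)."""
    cols = variable_columns(alignment)
    length = len(next(iter(alignment.values())))
    snps = {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}
    return snps, length - len(cols)


toy = {
    "taxonA": "ACGTACGT",
    "taxonB": "ACGTACGA",
    "taxonC": "ACCTACGA",
}
snps, n_invariant = snp_alignment(toy)
print(snps, n_invariant)
```

In the toy alignment only the third and last columns vary, so each taxon is reduced to two characters and six invariant columns are discarded.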
A Deliberate Practice Approach to Teaching Phylogenetic Analysis

ERIC Educational Resources Information Center

Hobbs, F. Collin; Johnson, Daniel J.; Kearns, Katherine D.

2013-01-01

One goal of postsecondary education is to assist students in developing expert-level understanding. Previous attempts to encourage expert-level understanding of phylogenetic analysis in college science classrooms have largely focused on isolated, or "one-shot," in-class activities. Using a deliberate practice instructional approach, we…
Extracting patterns of database and software usage from the bioinformatics literature

PubMed Central

Duck, Geraint; Nenadic, Goran; Brass, Andy; Robertson, David L.; Stevens, Robert

2014-01-01

Motivation: As a natural consequence of being a computer-based discipline, bioinformatics has a strong focus on database and software development, but the volume and variety of resources are growing at unprecedented rates. An audit of database and software usage patterns could help provide an overview of developments in bioinformatics and community common practice, and comparing the links between resources through time could demonstrate both the persistence of existing software and the emergence of new tools. Results: We study the connections between bioinformatics resources and construct networks of database and software usage patterns, based on resource co-occurrence, that correspond to snapshots of common practice in the bioinformatics community. We apply our approach to pairings of phylogenetics software reported in the literature and argue that these could provide a stepping stone into the identification of scientific best practice. Availability and implementation: The extracted resource data, the scripts used for network generation and the resulting networks are available at http://bionerds.sourceforge.net/networks/ Contact: robert.stevens@manchester.ac.uk PMID:25161253
DDRprot: a database of DNA damage response-related proteins.

PubMed

Andrés-León, Eduardo; Cases, Ildefonso; Arcas, Aida; Rojas, Ana M

2016-01-01

The DNA Damage Response (DDR) signalling network is an essential system that protects the genome's integrity. The DDRprot database presented here is a resource that integrates manually curated information on the human DDR network and its sub-pathways. For each particular DDR protein, we present detailed information about its function. If involved in post-translational modifications (PTMs) with each other, we depict the position of the modified residue/s in the three-dimensional structures, when resolved structures are available for the proteins. All this information is linked to the original publication from where it was obtained. Phylogenetic information is also shown, including time of emergence and conservation across 47 selected species, family trees and sequence alignments of homologues. The DDRprot database can be queried by different criteria: pathways, species, evolutionary age or involvement in PTMs. Sequence searches using hidden Markov models can also be used. Database URL: http://ddr.cbbio.es. © The Author(s) 2016. Published by Oxford University Press.
Genomics dataset on unclassified published organism (patent US 7547531).

PubMed

Khan Shawan, Mohammad Mahfuz Ali; Hasan, Md Ashraful; Hossain, Md Mozammel; Hasan, Md Mahmudul; Parvin, Afroza; Akter, Salina; Uddin, Kazi Rasel; Banik, Subrata; Morshed, Mahbubul; Rahman, Md Nazibur; Rahman, S M Badier

2016-12-01

Nucleotide (DNA) sequence analysis provides important clues regarding the characteristics and taxonomic position of an organism. DNA sequence analysis is therefore crucial for learning about the hierarchical classification of a particular organism. This dataset (patent US 7547531) was chosen to simplify the complex raw data buried in undisclosed DNA sequences, which helps to open doors for new collaborations. In this data, a total of 48 unidentified DNA sequences from patent US 7547531 were selected and their complete sequences were retrieved from the NCBI BioSample database. Quick response (QR) codes for those DNA sequences were constructed with the DNA BarID tool. QR codes are useful for the identification and comparison of isolates with other organisms. The AT/GC content of the DNA sequences was determined using the ENDMEMO GC Content Calculator, which indicates their stability at different temperatures. The highest GC content was observed in GP445188 (62.5%), followed by GP445198 (61.8%) and GP445189 (59.44%), while the lowest was in GP445178 (24.39%). In addition, the New England BioLabs (NEB) database was used to identify the cleavage code indicating the 5′, 3′ and blunt ends, and the enzyme code indicating the methylation site of the DNA sequences is also shown. These data will be helpful for the construction of the organisms' hierarchical classification, determination of their phylogenetic and taxonomic position and revelation of their molecular characteristics.
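The GC-content figures quoted in this abstract follow the usual definition, the percentage of G and C among the unambiguous bases of a sequence. The sketch below shows that arithmetic under stated assumptions: `gc_content` is a hypothetical helper, not the ENDMEMO calculator, and the example sequence is invented rather than taken from the patent dataset.

```python
# Illustrative sketch only; not the tool used in the study.
def gc_content(seq):
    """Return the percentage of G and C among the A/C/G/T bases of seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]  # skip ambiguity codes and gaps
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(b in "GC" for b in bases)
    return 100.0 * gc / len(bases)


# Invented 8-base example: 5 of 8 bases are G or C.
print(gc_content("GCGCGAAT"))
```

Ambiguity codes are excluded from both numerator and denominator here; a tool that counts them differently would report slightly different percentages.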
Phylogenetically informed logic relationships improve detection of biological network organization

PubMed Central

2011-01-01

Background A "phylogenetic profile" refers to the presence or absence of a gene across a set of organisms, and it has proven valuable for understanding gene functional relationships and network organization. Despite this success, few studies have attempted to search beyond just pairwise relationships among genes. Here we search for logic relationships involving three genes, and explore their potential application in gene network analyses. Results Taking advantage of a phylogenetic matrix constructed from the large orthologs database Roundup, we invented a method to create balanced profiles for individual triplets of genes that guarantee equal weight on the different phylogenetic scenarios of coevolution between genes. When we applied this idea to LAPP, the method to search for logic triplets of genes, the balanced profiles resulted in significant performance improvement and the discovery of hundreds of thousands more putative triplets than unadjusted profiles. We found that logic triplets detected biological network organization and identified key proteins and their functions, ranging from neighbouring proteins in local pathways, to well separated proteins in the whole pathway, and to the interactions among different pathways at the system level. Finally, our case study suggested that the directionality in a logic relationship and the profile of a triplet could disclose the connectivity between the triplet and surrounding networks. Conclusion Balanced profiles are superior to the raw profiles employed by traditional methods of phylogenetic profiling in searching for high order gene sets. Gene triplets can provide valuable information in detection of biological network organization and identification of key genes at different levels of cellular interaction. PMID:22172058
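To make the idea of a logic relationship among three phylogenetic profiles concrete, here is a minimal Python sketch. It scores how often a few Boolean functions of genes A and B reproduce the presence/absence pattern of gene C. It does not implement LAPP or the balanced profiles described in the abstract; the set of logic functions, the scoring by simple agreement, and the six-organism profiles are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Profile = List[int]  # 1 = gene present in an organism, 0 = absent

# A small, illustrative set of logic functions combining genes A and B.
LOGIC: Dict[str, Callable[[int, int], int]] = {
    "A AND B": lambda a, b: a & b,
    "A OR B": lambda a, b: a | b,
    "A XOR B": lambda a, b: a ^ b,
    "A AND NOT B": lambda a, b: a & (1 - b),
}


def agreement(a: Profile, b: Profile, c: Profile) -> Dict[str, float]:
    """Fraction of organisms in which each logic function of A and B equals C."""
    if not a or not (len(a) == len(b) == len(c)):
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(a)
    return {
        name: sum(f(x, y) == z for x, y, z in zip(a, b, c)) / n
        for name, f in LOGIC.items()
    }


# Hypothetical profiles across six organisms.
gene_a = [1, 1, 0, 0, 1, 0]
gene_b = [1, 0, 1, 0, 1, 0]
gene_c = [1, 0, 0, 0, 1, 0]

scores = agreement(gene_a, gene_b, gene_c)
best = max(scores, key=scores.get)
print(best, scores[best])
```

In this toy case gene C is present exactly where both A and B are present, so the AND relationship agrees in every organism, whereas neither A nor B alone predicts C perfectly. That is the kind of signal a pairwise comparison would miss.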

Active Site Characterization of Proteases Sequences from Different Species of Aspergillus.

PubMed

Morya, V K; Yadav, Virendra K; Yadav, Sangeeta; Yadav, Dinesh

2016-09-01

A total of 129 protease sequences comprising 43 serine proteases, 36 aspartic proteases, 24 cysteine proteases, 21 metalloproteases, and 5 neutral proteases from different Aspergillus species were analyzed for the catalytically active site residues using the MEROPS database and various bioinformatics tools. Different proteases have a predominance of variable active site residues. In the case of the 24 cysteine proteases of Aspergilli, the predominant active site residues observed were Gln193, Cys199, His364, Asn384, while for the 43 serine proteases, the active site residues Asp164, His193, Asn284, Ser349 and Asp325, His357, Asn454, Ser519 were frequently observed. The analysis of the 21 metalloproteases of Aspergilli revealed Glu298 and Glu388, Tyr476 as predominant active site residues. In general, Aspergilli species-specific active site residues were observed for the different types of protease sequences analyzed. The phylogenetic analysis of these 129 protease sequences revealed 14 different clans representing different types of proteases with diverse active site residues.
Despotism, democracy, and the evolutionary dynamics of leadership and followership.

PubMed

Van Vugt, Mark

2009-01-01

Responds to comments made by George B. Graen and Stephen J. Guastello on the current author's article Leadership, followership, and evolution: Some lessons from the past by Van Vugt, Hogan, and Kaiser. In the original article my co-authors and I proposed a new way of thinking about leadership, informed by evolutionary (neo-Darwinian) theory. In the first commentary, Graen noted that we ignored a number of recently developed psychological theories of leadership that take into account the leader-follower relationship, most notably LMX theory. LMX theory asserts that leadership effectiveness and team performance are affected by the quality of working relationships between superior and subordinates. Because the original article primarily dealt with questions about the origins of leadership--the phylogenetic and evolutionary causes--we had to be concise in our review of proximate psychological theories of leadership. In the second commentary, Guastello concurred with the importance of an evolutionary game analysis for studying leadership but disagreed with certain details of our analysis. (PsycINFO Database Record (c) 2009 APA, all rights reserved).
Deeper insight into the structure of the anaerobic digestion microbial community; the biogas microbiome database is expanded with 157 new genomes.

PubMed

Treu, Laura; Kougias, Panagiotis G; Campanaro, Stefano; Bassani, Ilaria; Angelidaki, Irini

2016-09-01

This research aimed to better characterize the biogas microbiome by means of high throughput metagenomic sequencing and to elucidate the core microbial consortium existing in biogas reactors independently from the operational conditions. Assembly of shotgun reads followed by an established binning strategy resulted in the largest extraction to date of microbial genomes involved in biogas-producing systems. Of the 236 extracted genome bins, the vast majority could only be characterized at high taxonomic levels. This result confirms that the biogas microbiome is composed of a consortium of unknown species. A comparative analysis between the genome bins of the current study and those extracted from a previous metagenomic assembly demonstrated a similar phylogenetic distribution of the main taxa. Finally, this analysis led to the identification of a subset of common microbes that could be considered as the core essential group in biogas production. Copyright © 2016 Elsevier Ltd. All rights reserved.
Genome size estimates for crustaceans using Feulgen image analysis densitometry of ethanol-preserved tissues.

PubMed

Jeffery, Nicholas W; Gregory, T Ryan

2014-10-01

Crustaceans are enormously diverse both phylogenetically and ecologically, but they remain substantially underrepresented in the existing genome size database. An expansion of this dataset could be facilitated if it were possible to obtain genome size estimates from ethanol-preserved specimens. In this study, two tests were performed in order to assess the reliability of genome size data generated using preserved material. First, the results of estimates based on flash-frozen versus ethanol-preserved material were compared across 37 species of crustaceans that differ widely in genome size. Second, a comparison was made of specimens from a single species that had been stored in ethanol for 1-14 years. In both cases, the use of gill tissue in Feulgen image analysis densitometry proved to be a very viable approach. This finding is of direct relevance to both new studies of field-collected crustaceans as well as potential studies based on existing collections. © 2014 International Society for Advancement of Cytometry.
Genetic characterization of influenza A virus subtype H12N1 isolated from a watercock and lesser whistling ducks in Thailand.

PubMed

Wongphatcharachai, Manoosak; Wisedchanwet, Trong; Lapkuntod, Jiradej; Nonthabenjawan, Nutthawan; Jairak, Waleemas; Amonsin, Alongkorn

2012-06-01

Monitoring of influenza A virus (IAV) was conducted in wild bird species in central Thailand. Four IAV subtype H12N1 strains were isolated from a watercock (order Gruiformes, family Rallidae) (n = 1) and lesser whistling ducks (order Anseriformes, family Anatidae) (n = 3). All H12N1 viruses were characterized by whole-genome sequencing. Phylogenetic analysis of all eight genes of the Thai H12N1 viruses indicated that they are most closely related to the Eurasian strains. Analysis of the HA gene revealed the strains to be of low pathogenicity. This study is the first to report the circulation of IAV subtype H12N1 in Thailand and to describe the genetic characteristics of H12N1 in Eurasia. Moreover, the genetic information obtained on H12N1 has contributed a new Eurasian strain of H12N1 to the GenBank database.
Phylum- and Class-Specific PCR Primers for General Microbial Community Analysis

PubMed Central

Blackwood, Christopher B.; Oaks, Adam; Buyer, Jeffrey S.

2005-01-01

Amplification of a particular DNA fragment from a mixture of organisms by PCR is a common first step in methods of examining microbial community structure. The use of group-specific primers in community DNA profiling applications can provide enhanced sensitivity and phylogenetic detail compared to domain-specific primers. Other uses for group-specific primers include quantitative PCR and library screening. The purpose of the present study was to develop several primer sets targeting commonly occurring and important groups. Primers specific for the 16S ribosomal sequences of Alphaproteobacteria, Betaproteobacteria, Bacilli, Actinobacteria, and Planctomycetes and for parts of both the 18S ribosomal sequence and the internal transcribed spacer region of Basidiomycota were examined. Primers were tested by comparison to sequences in the ARB 2003 database, and chosen primers were further tested by cloning and sequencing from soil community DNA. Eighty-five to 100% of the sequences obtained from clone libraries were found to be placed with the groups intended as targets, demonstrating the specificity of the primers under field conditions. It will be important to reevaluate primers over time because of the continual growth of sequence databases and revision of microbial taxonomy. PMID:16204538
Mycofier: a new machine learning-based classifier for fungal ITS sequences.

PubMed

Delgado-Serrano, Luisa; Restrepo, Silvia; Bustos, Jose Ricardo; Zambrano, Maria Mercedes; Anzola, Juan Manuel

2016-08-11

The taxonomic and phylogenetic classification based on sequence analysis of the ITS1 genomic region has become a crucial component of fungal ecology and diversity studies. Currently, there is no accurate alignment-free classification tool for fungal ITS1 sequences for large environmental surveys. This study describes the development of a machine learning-based classifier for the taxonomical assignment of fungal ITS1 sequences at the genus level. A fungal ITS1 sequence database was built using curated data. Training and test sets were generated from it. A Naïve Bayesian classifier was built using features from the primary sequence with an accuracy of 87% in the classification at the genus level. The final model was based on a Naïve Bayes algorithm using ITS1 sequences from 510 fungal genera. This classifier, denoted as Mycofier, provides similar classification accuracy compared to BLASTN, but the database used for the classification contains curated data and the tool, independent of alignment, is more efficient and contributes to the field, given the lack of an accurate classification tool for large data from fungal ITS1 sequences. The software and source code for Mycofier are freely available at https://github.com/ldelgado-serrano/mycofier.git.
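The alignment-free approach described above can be sketched with a small multinomial Naïve Bayes classifier over k-mer counts. This is not Mycofier's implementation: the abstract does not specify the sequence features used, so the choice of 3-mers, Laplace smoothing, the class name KmerNaiveBayes, and the toy sequences and genus labels are all assumptions for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Iterable, List


def kmers(seq: str, k: int) -> List[str]:
    """Return all overlapping k-mers of a sequence (upper-cased)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerNaiveBayes:
    """Multinomial Naive Bayes over k-mer counts with Laplace smoothing."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.counts = defaultdict(Counter)  # label -> k-mer -> count
        self.totals = Counter()             # label -> total k-mers seen
        self.docs = Counter()               # label -> number of sequences
        self.vocab = set()                  # all k-mers seen in training

    def fit(self, seqs: Iterable[str], labels: Iterable[str]) -> "KmerNaiveBayes":
        for seq, label in zip(seqs, labels):
            self.docs[label] += 1
            for km in kmers(seq, self.k):
                self.counts[label][km] += 1
                self.totals[label] += 1
                self.vocab.add(km)
        return self

    def predict(self, seq: str) -> str:
        n_docs = sum(self.docs.values())
        v = len(self.vocab)
        best_label, best_lp = None, -math.inf
        for label in self.docs:
            # Log prior plus smoothed log likelihood of each query k-mer.
            lp = math.log(self.docs[label] / n_docs)
            for km in kmers(seq, self.k):
                lp += math.log((self.counts[label][km] + 1) / (self.totals[label] + v))
            if lp > best_lp:
                best_label, best_lp = label, lp
        return best_label


# Hypothetical training data: two made-up genera, two sequences each.
train_seqs = ["ACGTACGTACGT", "ACGTACGAACGT", "TTGGCCTTGGCC", "TTGGCCTAGGCC"]
train_labels = ["GenusA", "GenusA", "GenusB", "GenusB"]

model = KmerNaiveBayes(k=3).fit(train_seqs, train_labels)
print(model.predict("ACGTACGTAC"))
print(model.predict("TTGGCCTTGG"))
```

Because classification only requires counting k-mers and summing log probabilities, no alignment against a reference database is needed, which is the efficiency argument made in the abstract.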
HIV forensics: pitfalls and acceptable standards in the use of phylogenetic analysis as evidence in criminal investigations of HIV transmission.

PubMed

Bernard, E J; Azad, Y; Vandamme, A M; Weait, M; Geretti, A M

2007-09-01

Phylogenetic analysis - the study of the genetic relatedness between HIV strains - has recently been used in criminal prosecutions as evidence of responsibility for HIV transmission. In these trials, the expert opinion of virologists has been of critical importance. Phylogenetic analysis of HIV gene sequences is complex and its findings do not achieve the levels of certainty obtained with the forensic analysis of human DNA. Although two individuals may carry HIV strains that are closely related, these will not necessarily be unique to the two parties and could extend to other persons within the same transmission network. For forensic purposes, phylogenetic analysis should be conducted under strictly controlled conditions by laboratories with relevant expertise applying rigorous methods. It is vitally important to include the right controls, which should be epidemiologically and temporally relevant to the parties under investigation. Use of inappropriate controls can exaggerate any relatedness between the virus strains of the complainant and defendant as being strikingly unique. It will be often difficult to obtain the relevant controls. If convenient but less appropriate controls are used, interpretation of the findings should be tempered accordingly. Phylogenetic analysis cannot prove that HIV transmission occurred directly between two individuals. However, it can exonerate individuals by demonstrating that the defendant carries a virus strain unrelated to that of the complainant. Expert witnesses should acknowledge the limitations of the inferences that might be made and choose the correct language in both written and verbal testimony.
Phylogenetic Status of an Unrecorded Species of Curvularia, C. spicifera, Based on Current Classification System of Curvularia and Bipolaris Group Using Multi Loci.

PubMed

Jeon, Sun Jeong; Nguyen, Thi Thuong Thuong; Lee, Hyang Burm

2015-09-01

A seed-borne fungus, Curvularia sp. EML-KWD01, was isolated from an indigenous wheat seed by standard blotter method. This fungus was characterized based on the morphological characteristics and molecular phylogenetic analysis. Phylogenetic status of the fungus was determined using sequences of three loci: rDNA internal transcribed spacer, large ribosomal subunit, and glyceraldehyde 3-phosphate dehydrogenase gene. Multi loci sequencing analysis revealed that this fungus was Curvularia spicifera within Curvularia group 2 of family Pleosporaceae.
Phylogenetic relationship of Ornithobacterium rhinotracheale strains.

PubMed

DE Oca-Jimenez, Roberto Montes; Vega-Sanchez, Vicente; Morales-Erasto, Vladimir; Salgado-Miranda, Celene; Blackall, Patrick J; Soriano-Vargas, Edgardo

2018-04-10

The bacterium Ornithobacterium rhinotracheale is associated with respiratory disease in wild birds and poultry. In this study, the phylogenetic analysis of nine reference strains of O. rhinotracheale belonging to serovars A to I, and eight Mexican isolates belonging to serovar A, was performed. The analysis was extended to include available sequences from another 23 strains available in the public domain. The analysis showed that the 40 sequences formed six clusters, I to VI. All eight Mexican field isolates were placed in cluster I. One of the reference strains appears to present genetic diversity not previously recognized and was placed in a new genetic cluster. In conclusion, the phylogenetic analysis of O. rhinotracheale strains, based on the 16S rRNA gene, is a suitable tool for epidemiologic studies.
Detection and Phylogenetic Analysis of Group 1 Coronaviruses in South American Bats

PubMed Central

Foster, Jerome E.; Zhu, Hua Chen; Zhang, Jin Xia; Smith, Gavin J.D.; Thompson, Nadin; Auguste, Albert J.; Ramkissoon, Vernie; Adesiyun, Abiodun A.; Guan, Yi

2008-01-01

Bat coronaviruses (Bt-CoVs) are thought to be the precursors of severe acute respiratory syndrome coronavirus. We detected Bt-CoVs in 2 bat species from Trinidad. Phylogenetic analysis of the RNA-dependent RNA polymerase gene and helicase confirmed them as group 1 coronaviruses. PMID:19046513
Phylogenetic analysis of Sarcocystis nesbitti (Coccidia: Sarcocystidae) suggests a snake as its probable definitive host

USDA-ARS's Scientific Manuscript database

Sarcocystis nesbitti was first described by Mandour in 1969 from rhesus monkey muscle. Its definitive host remains unknown. 18SrRNA gene of Sarcocystis nesbitti was amplified, sequenced, and subjected to phylogenetic analysis. Among those congeners available for comparison, it shares closest affinit...
Using Phylogenetic Analysis to Detect Market Substitution of Atlantic Salmon for Pacific Salmon: An Introductory Biology Laboratory Experiment

ERIC Educational Resources Information Center

Cline, Erica; Gogarten, Jennifer

2012-01-01

We describe a laboratory exercise developed for the cell and molecular biology quarter of a year-long majors' undergraduate introductory biology sequence. In an analysis of salmon samples collected by students in their local stores and restaurants, DNA sequencing and phylogenetic analysis were used to detect market substitution of Atlantic salmon…
Phylogenetic comparative methods complement discriminant function analysis in ecomorphology.

PubMed

Barr, W Andrew; Scott, Robert S

2014-04-01

In ecomorphology, Discriminant Function Analysis (DFA) has been used as evidence for the presence of functional links between morphometric variables and ecological categories. Here we conduct simulations of characters containing phylogenetic signal to explore the performance of DFA under a variety of conditions. Characters were simulated using a phylogeny of extant antelope species from known habitats. Characters were modeled with no biomechanical relationship to the habitat category; the only sources of variation were body mass, phylogenetic signal, or random "noise." DFA on the discriminability of habitat categories was performed using subsets of the simulated characters, and Phylogenetic Generalized Least Squares (PGLS) was performed for each character. Analyses were repeated with randomized habitat assignments. When simulated characters lacked phylogenetic signal and/or habitat assignments were random, <5.6% of DFAs and <8.26% of PGLS analyses were significant. When characters contained phylogenetic signal and actual habitats were used, 33.27 to 45.07% of DFAs and <13.09% of PGLS analyses were significant. False Discovery Rate (FDR) corrections for multiple PGLS analyses reduced the rate of significance to <4.64%. In all cases using actual habitats and characters with phylogenetic signal, correct classification rates of DFAs exceeded random chance. In simulations involving phylogenetic signal in both predictor variables and predicted categories, PGLS with FDR was rarely significant, while DFA often was. In short, DFA offered no indication that differences between categories might be explained by phylogenetic signal, while PGLS did. As such, PGLS provides a valuable tool for testing the functional hypotheses at the heart of ecomorphology. Copyright © 2013 Wiley Periodicals, Inc.
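The abstract does not say which False Discovery Rate procedure was applied to the multiple PGLS analyses. As a sketch under that caveat, the following Python function implements the widely used Benjamini-Hochberg step-up procedure; the function name and the example p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each p-value, whether it is rejected at FDR level alpha.

    Step-up rule: find the largest rank r (1-based, ascending p-values) with
    p_(r) <= alpha * r / m, then reject the r smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from eight separate tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(pvals))
```

With these values, five tests fall below an uncorrected 0.05 threshold but only the two smallest survive the correction, which mirrors the drop in significant PGLS results reported after FDR correction.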
Patterns of forest phylogenetic community structure across the United States and their possible forest health implications

Treesearch

Kevin M. Potter; Frank H. Koch

2014-01-01

The analysis of phylogenetic relationships among co-occurring tree species offers insights into the ecological organization of forest communities from an evolutionary perspective and, when employed regionally across thousands of plots, can assist in forest health assessment. Phylogenetic clustering of species, when species are more closely related than expected by...
Bacterial diversity in permanently cold and alkaline ikaite columns from Greenland.

PubMed

Schmidt, Mariane; Priemé, Anders; Stougaard, Peter

2006-12-01

Bacterial diversity in alkaline (pH 10.4) and permanently cold (4 °C) ikaite tufa columns from the Ikka Fjord, SW Greenland, was investigated using growth characterization of cultured bacterial isolates with Terminal-restriction fragment length polymorphism (T-RFLP) and sequence analysis of bacterial 16S rRNA gene fragments. More than 200 bacterial isolates were characterized with respect to pH and temperature tolerance, and it was shown that the majority were cold-active alkaliphiles. T-RFLP analysis revealed distinct bacterial communities in different fractions of three ikaite columns, and, along with sequence analysis, it showed the presence of rich and diverse bacterial communities. Rarefaction analysis showed that the 109 sequenced clones in the 16S rRNA gene library represented between 25 and 65% of the predicted species richness in the three ikaite columns investigated. Phylogenetic analysis of the 16S rRNA gene sequences revealed many sequences with similarity to alkaliphilic or psychrophilic bacteria, and showed that 33% of the cloned sequences and 33% of the cultured bacteria showed less than 97% sequence identity to known sequences in databases, and may therefore represent yet unknown species.
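The 97% identity threshold mentioned above can be illustrated with a short Python sketch that computes percent identity between two already-aligned sequences. Identity can be defined in several ways (for example, with different treatment of gap columns), and the abstract does not state which definition was used; the definition below, the function name and the example sequences are assumptions.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns where both sequences have a gap ('-') are ignored,
    and a gap opposite a base counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(x, y) for x, y in zip(a.upper(), b.upper()) if not (x == "-" and y == "-")]
    if not cols:
        raise ValueError("no comparable alignment columns")
    matches = sum(x == y for x, y in cols)
    return 100.0 * matches / len(cols)


# Hypothetical aligned 16S rRNA gene fragments differing at one of ten positions.
query = "ACGTACGTAC"
reference = "ACGTACGTTC"
identity = percent_identity(query, reference)
print(identity, identity < 97.0)
```

Under this definition, a clone whose best database match scores below 97% would be flagged as a candidate for a previously undescribed species, as in the abstract.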
Specimen-level phylogenetics in paleontology using the Fossilized Birth-Death model with sampled ancestors.

PubMed

Cau, Andrea

2017-01-01

Bayesian phylogenetic methods integrating simultaneously morphological and stratigraphic information have been applied increasingly among paleontologists. Most of these studies have used Bayesian methods as an alternative to the widely-used parsimony analysis, to infer macroevolutionary patterns and relationships among species-level or higher taxa. Among recently introduced Bayesian methodologies, the Fossilized Birth-Death (FBD) model allows incorporation of hypotheses on ancestor-descendant relationships in phylogenetic analyses including fossil taxa. Here, the FBD model is used to infer the relationships among an ingroup formed exclusively by fossil individuals, i.e., dipnoan tooth plates from four localities in the Ain el Guettar Formation of Tunisia. Previous analyses of this sample compared the results of phylogenetic analysis using parsimony with stratigraphic methods, inferred a high diversity (five or more genera) in the Ain el Guettar Formation, and interpreted it as an artifact inflated by depositional factors. In the analysis performed here, the uncertainty on the chronostratigraphic relationships among the specimens was included among the prior settings. The results of the analysis confirm the referral of most of the specimens to the taxa Asiatoceratodus, Equinoxiodus, Lavocatodus and Neoceratodus, but reject those to Ceratodus and Ferganoceratodus. The resulting phylogeny constrained the evolution of the Tunisian sample exclusively in the Early Cretaceous, contrasting with the previous scenario inferred by the stratigraphically-calibrated topology resulting from parsimony analysis. The phylogenetic framework also suggests that (1) the sampled localities are laterally equivalent, but (2) three localities are restricted to the youngest part of the section; both results are in agreement with previous stratigraphic analyses of these localities. The FBD model of specimen-level units provides a novel tool for phylogenetic inference among fossils but also for independent tests of stratigraphic scenarios.
Soft-tissue anatomy of the extant hominoids: a review and phylogenetic analysis

PubMed Central

Gibbs, S; Collard, M; Wood, B

2002-01-01

This paper reports the results of a literature search for information about the soft-tissue anatomy of the extant non-human hominoid genera, Pan, Gorilla, Pongo and Hylobates, together with the results of a phylogenetic analysis of these data plus comparable data for Homo. Information on the four extant non-human hominoid genera was located for 240 out of the 1783 soft-tissue structures listed in the Nomina Anatomica. Numerically these data are biased so that information about some systems (e.g. muscles) and some regions (e.g. the forelimb) are over-represented, whereas other systems and regions (e.g. the veins and the lymphatics of the vascular system, the head region) are either under-represented or not represented at all. Screening to ensure that the data were suitable for use in a phylogenetic analysis reduced the number of eligible soft-tissue structures to 171. These data, together with comparable data for modern humans, were converted into discontinuous character states suitable for phylogenetic analysis and then used to construct a taxon-by-character matrix. This matrix was used in two tests of the hypothesis that soft-tissue characters can be relied upon to reconstruct hominoid phylogenetic relationships. In the first, parsimony analysis was used to identify cladograms requiring the smallest number of character state changes. In the second, the phylogenetic bootstrap was used to determine the confidence intervals of the most parsimonious clades. The parsimony analysis yielded a single most parsimonious cladogram that matched the molecular cladogram. Similarly the bootstrap analysis yielded clades that were compatible with the molecular cladogram; a (Homo, Pan) clade was supported by 95% of the replicates, and a (Gorilla, Pan, Homo) clade by 96%. These are the first hominoid morphological data to provide statistically significant support for the clades favoured by the molecular evidence. PMID:11833653
Evolutionary origins of the endocannabinoid system.

PubMed

McPartland, John M; Matias, Isabel; Di Marzo, Vincenzo; Glass, Michelle

2006-03-29

Endocannabinoid system evolution was estimated by searching for functional orthologs in the genomes of twelve phylogenetically diverse organisms: Homo sapiens, Mus musculus, Takifugu rubripes, Ciona intestinalis, Caenorhabditis elegans, Drosophila melanogaster, Saccharomyces cerevisiae, Arabidopsis thaliana, Plasmodium falciparum, Tetrahymena thermophila, Archaeoglobus fulgidus, and Mycobacterium tuberculosis. Sequences similar to human endocannabinoid exon sequences were derived from filtered BLAST searches, and subjected to phylogenetic testing with ClustalX and tree building programs. Monophyletic clades that agreed with broader phylogenetic evidence (i.e., gene trees displaying topographical congruence with species trees) were considered orthologs. The capacity of orthologs to function as endocannabinoid proteins was predicted with pattern profilers (Pfam, Prosite, TMHMM, and pSORT), and by examining queried sequences for amino acid motifs known to serve critical roles in endocannabinoid protein function (obtained from a database of site-directed mutagenesis studies). This novel transfer of functional information onto gene trees enabled us to better predict the functional origins of the endocannabinoid system. Within this limited number of twelve organisms, the endocannabinoid genes exhibited heterogeneous evolutionary trajectories, with functional orthologs limited to mammals (TRPV1 and GPR55), or vertebrates (CB2 and DAGLbeta), or chordates (MAGL and COX2), or animals (DAGLalpha and CB1-like receptors), or opisthokonta (animals and fungi, NAPE-PLD), or eukaryotes (FAAH). Our methods identified fewer orthologs than did automated annotation systems, such as HomoloGene. Phylogenetic profiles, nonorthologous gene displacement, functional convergence, and coevolution are discussed.
A RAD-based phylogenetics for Orestias fishes from Lake Titicaca.

PubMed

Takahashi, Tetsumi; Moreno, Edmundo

2015-12-01

The fish genus Orestias is endemic to the Andes highlands, and Lake Titicaca is the centre of the species diversity of the genus. Previous phylogenetic studies based on a single locus of mitochondrial and nuclear DNA strongly support the monophyly of a group composed of many of the species endemic to the Lake Titicaca basin (the Lake Titicaca radiation), but the relationships among the species in the radiation remain unclear. Recently, restriction site-associated DNA (RAD) sequencing, which can produce a vast number of short sequences from various loci of nuclear DNA, has emerged as a useful way to resolve complex phylogenetic problems. To propose a new phylogenetic hypothesis of Orestias fishes of the Lake Titicaca radiation, we conducted a cluster analysis based on morphological similarities among fish samples and a molecular phylogenetic analysis based on RAD sequencing. From a morphological cluster analysis, we recognised four species groups in the radiation, and three of the four groups were resolved as monophyletic groups in maximum-likelihood trees based on RAD sequencing data. The other morphology-based group was not resolved as a monophyletic group in molecular phylogenies, and some members of the group diverged from its sister group close to the root of the Lake Titicaca radiation. The evolution of these fishes is discussed from the phylogenetic relationships. Copyright © 2015 Elsevier Inc. All rights reserved.

The two-component signal system in rice (Oryza sativa L.): a genome-wide study of cytokinin signal perception and transduction.

PubMed

Du, Liming; Jiao, Fangchan; Chu, Jun; Jin, Gulei; Chen, Ming; Wu, Ping

2007-06-01

In this report we define the genes of two-component regulatory systems in rice through a comprehensive computational analysis of rice (Oryza sativa L.) genome sequence databases. Thirty-seven genes were identified, including 5 HKs (cytokinin-response histidine protein kinase) (OsHK1-4, OsHKL1), 5 HPs (histidine phosphotransfer proteins) (OsHP1-5), 15 type-A RRs (response regulators) (OsRR1-15), 7 type B RR genes (OsRR16-22), and 5 predicted pseudo-response regulators (OsPRR1-5). Protein motif organization, gene structure, phylogenetic analysis, chromosomal location, and comparative analysis between rice, maize, and Arabidopsis are described. Full-length cDNA clones of each gene were isolated from rice. Heterologous expression of each of the OsHKs in yeast mutants conferred histidine kinase function in a cytokinin-dependent manner. Nonconserved regions of individual cDNAs were used as probes in expression profiling experiments. This work provides a foundation for future functional dissection of the rice cytokinin two-component signaling pathway.
Analysis of the ergosterol biosynthesis pathway cloning, molecular characterization and phylogeny of lanosterol 14 α-demethylase (ERG11) gene of Moniliophthora perniciosa.

PubMed

de Oliveira Ceita, Geruza; Vilas-Boas, Laurival Antônio; Castilho, Marcelo Santos; Carazzolle, Marcelo Falsarella; Pirovani, Carlos Priminho; Selbach-Schnadelbach, Alessandra; Gramacho, Karina Peres; Ramos, Pablo Ivan Pereira; Barbosa, Luciana Veiga; Pereira, Gonçalo Amarante Guimarães; Góes-Neto, Aristóteles

2014-10-01

The phytopathogenic fungus Moniliophthora perniciosa (Stahel) Aime & Philips-Mora, causal agent of witches' broom disease of cocoa, causes countless damage to cocoa production in Brazil. Molecular studies have attempted to identify genes that play important roles in fungal survival and virulence. In this study, sequences deposited in the M. perniciosa Genome Sequencing Project database were analyzed to identify potential biological targets. For the first time, the ergosterol biosynthetic pathway in M. perniciosa was studied and the lanosterol 14α-demethylase gene (ERG11) that encodes the main enzyme of this pathway and is a target for fungicides was cloned, characterized molecularly and its phylogeny analyzed. ERG11 genomic DNA and cDNA were characterized and sequence analysis of the ERG11 protein identified highly conserved domains typical of this enzyme, such as SRS1, SRS4, EXXR and the heme-binding region (HBR). Comparison of the protein sequences and phylogenetic analysis revealed that the M. perniciosa enzyme was most closely related to that of Coprinopsis cinerea.
Analysis of the ergosterol biosynthesis pathway cloning, molecular characterization and phylogeny of lanosterol 14 α-demethylase (ERG11) gene of Moniliophthora perniciosa

PubMed Central

de Oliveira Ceita, Geruza; Vilas-Boas, Laurival Antônio; Castilho, Marcelo Santos; Carazzolle, Marcelo Falsarella; Pirovani, Carlos Priminho; Selbach-Schnadelbach, Alessandra; Gramacho, Karina Peres; Ramos, Pablo Ivan Pereira; Barbosa, Luciana Veiga; Pereira, Gonçalo Amarante Guimarães; Góes-Neto, Aristóteles

2014-01-01

The phytopathogenic fungus Moniliophthora perniciosa (Stahel) Aime & Philips-Mora, causal agent of witches’ broom disease of cocoa, causes countless damage to cocoa production in Brazil. Molecular studies have attempted to identify genes that play important roles in fungal survival and virulence. In this study, sequences deposited in the M. perniciosa Genome Sequencing Project database were analyzed to identify potential biological targets. For the first time, the ergosterol biosynthetic pathway in M. perniciosa was studied and the lanosterol 14α-demethylase gene (ERG11) that encodes the main enzyme of this pathway and is a target for fungicides was cloned, characterized molecularly and its phylogeny analyzed. ERG11 genomic DNA and cDNA were characterized and sequence analysis of the ERG11 protein identified highly conserved domains typical of this enzyme, such as SRS1, SRS4, EXXR and the heme-binding region (HBR). Comparison of the protein sequences and phylogenetic analysis revealed that the M. perniciosa enzyme was most closely related to that of Coprinopsis cinerea. PMID:25505843
Genome-Wide Identification, Evolution and Expression Analysis of mTERF Gene Family in Maize

PubMed Central

Zhao, Yanxin; Cai, Manjun; Zhang, Xiaobo; Li, Yurong; Zhang, Jianhua; Zhao, Hailiang; Kong, Fei; Zheng, Yonglian; Qiu, Fazhan

2014-01-01

Plant mitochondrial transcription termination factor (mTERF) genes comprise a large family with important roles in regulating organelle gene expression. In this study, a comprehensive database search yielded 31 potential mTERF genes in maize (Zea mays L.) and most of them were targeted to mitochondria or chloroplasts. Maize mTERF were divided into nine main groups based on phylogenetic analysis, and group IX represented the mitochondria and species-specific clade that diverged from other groups. Tandem and segmental duplication both contributed to the expansion of the mTERF gene family in the maize genome. Comprehensive expression analysis of these genes, using microarray data and RNA-seq data, revealed that these genes exhibit a variety of expression patterns. Environmental stimulus experiments revealed differential up or down-regulation expression of maize mTERF genes in seedlings exposed to light/dark, salts and plant hormones, respectively, suggesting various important roles of maize mTERF genes in light acclimation and stress-related responses. These results will be useful for elucidating the roles of mTERF genes in the growth, development and stress response of maize. PMID:24718683
PHYLOViZ: phylogenetic inference and data visualization for sequence based typing methods

PubMed Central

2012-01-01

Background: With the decrease of DNA sequencing costs, sequence-based typing methods are rapidly becoming the gold standard for epidemiological surveillance. These methods provide reproducible and comparable results needed for a global scale bacterial population analysis, while retaining their usefulness for local epidemiological surveys. Online databases that collect the generated allelic profiles and associated epidemiological data are available, but this wealth of data remains underused and is frequently poorly annotated since no user-friendly tool exists to analyze and explore it. Results: PHYLOViZ is platform-independent Java software that allows the integrated analysis of sequence-based typing methods, including SNP data generated from whole genome sequence approaches, and associated epidemiological data. goeBURST and its Minimum Spanning Tree expansion are used for visualizing the possible evolutionary relationships between isolates. The results can be displayed as an annotated graph overlaying the query results of any other epidemiological data available. Conclusions: PHYLOViZ is user-friendly software that allows the combined analysis of multiple data sources for microbial epidemiological and population studies. It is freely available at http://www.phyloviz.net. PMID:22568821
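The minimum spanning tree idea mentioned in this record can be illustrated with a small sketch. It is not goeBURST and does not reproduce PHYLOViZ's tie-break rules; it assumes only that each isolate is an allelic profile (a tuple of allele numbers) and that the distance between two profiles is the number of loci at which they differ. The profile names and values are made up for illustration.

```python
def hamming(a, b):
    """Number of loci at which two allelic profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles):
    """Prim's algorithm over pairwise Hamming distances.

    Returns a list of (distance, from_name, to_name) edges.
    Ties are broken by name order, which is an arbitrary choice made here.
    """
    names = sorted(profiles)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        best = min(
            (hamming(profiles[u], profiles[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append(best)
        in_tree.add(best[2])
    return edges


# Hypothetical seven-locus profiles (illustrative values only).
profiles = {
    "ST1": (1, 1, 1, 1, 1, 1, 1),
    "ST2": (1, 1, 1, 1, 1, 1, 2),
    "ST3": (1, 1, 1, 1, 1, 2, 2),
    "ST4": (3, 3, 3, 1, 1, 1, 1),
}
edges = minimum_spanning_tree(profiles)
for distance, u, v in edges:
    print(u, "-", v, "differs at", distance, "loci")
```

Linking each profile to its closest already-connected neighbour yields a graph on which epidemiological metadata could then be overlaid, which is the kind of display the abstract describes.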
Evolution of trace amine associated receptor (TAAR) gene family in vertebrates: lineage-specific expansions and degradations of a second class of vertebrate chemosensory receptors expressed in the olfactory epithelium.

PubMed

Hashiguchi, Yasuyuki; Nishida, Mutsumi

2007-09-01

The trace amine-associated receptors (TAARs) form a specific family of G protein-coupled receptors in vertebrates. TAARs were initially considered neurotransmitter receptors, but a recent study showed that mouse TAARs function as chemosensory receptors in the olfactory epithelium. To clarify the evolutionary dynamics of the TAAR gene family in vertebrates, near-complete repertoires of TAAR genes and pseudogenes were identified from the genomic assemblies of 4 teleost fishes (zebrafish, fugu, stickleback, and medaka), western clawed frogs, chickens, 3 mammals (humans, mice, and opossum), and sea lampreys. Database searches revealed that fishes had many putatively functional TAAR genes (13-109 genes), whereas relatively small numbers of TAAR genes (3-22 genes) were identified in tetrapods. Phylogenetic analysis of these genes indicated that the TAAR gene family was subdivided into 5 subfamilies that diverged before the divergence of ray-finned fishes and tetrapods. In tetrapods, virtually all TAAR genes were located in 1 specific region of their genomes as a gene cluster; however, in fishes, TAAR genes were scattered throughout more than 2 genomic locations. This possibly reflects a whole-genome duplication that occurred in the common ancestor of ray-finned fishes. Expression analysis of zebrafish and stickleback TAAR genes revealed that many TAARs in these fishes were expressed in the olfactory organ, suggesting the relatively high importance of TAARs as chemosensory receptors in fishes. A possible evolutionary history of the vertebrate TAAR gene family was inferred from the phylogenetic and comparative genomic analyses.
Comparing Mycobacterium tuberculosis genomes using genome topology networks.

PubMed

Jiang, Jianping; Gu, Jianlei; Zhang, Liang; Zhang, Chenyi; Deng, Xiao; Dou, Tonghai; Zhao, Guoping; Zhou, Yan

2015-02-14

Over the last decade, emerging research methods, such as comparative genomic analysis and phylogenetic study, have yielded new insights into genotypes and phenotypes of closely related bacterial strains. Several findings have revealed that genomic structural variations (SVs), including gene gain/loss, gene duplication and genome rearrangement, can lead to different phenotypes among strains, and an investigation of genes affected by SVs may extend our knowledge of the relationships between SVs and phenotypes in microbes, especially in pathogenic bacteria. In this work, we introduce a 'Genome Topology Network' (GTN) method based on gene homology and gene locations to analyze genomic SVs and perform phylogenetic analysis. Furthermore, the concept of 'unfixed ortholog' has been proposed, whose members are affected by SVs in genome topology among close species. To improve the precision of 'unfixed ortholog' recognition, a strategy to detect annotation differences and complete gene annotation was applied. To assess the GTN method, a set of thirteen complete M. tuberculosis genomes was analyzed as a case study. GTNs with two different gene homology-assigning methods were built, the Clusters of Orthologous Groups (COG) method and the orthoMCL clustering method, and two phylogenetic trees were constructed accordingly, which may provide additional insights into whole genome-based phylogenetic analysis. We obtained 24 unfixable COG groups, of which most members were related to immunogenicity and drug resistance, such as PPE-repeat proteins (COG5651) and transcriptional regulator TetR gene family members (COG1309). The GTN method has been implemented in PERL and released on our website. The tool can be downloaded from http://homepage.fudan.edu.cn/zhouyan/gtn/ , and allows re-annotating the 'lost' genes among closely related genomes, analyzing genes affected by SVs, and performing phylogenetic analysis. 
With this tool, many immunogenic-related and drug resistance-related genes were found to be affected by SVs in M. tuberculosis genomes. We believe that the GTN method will be suitable for the exploration of genomic SVs in connection with biological features of bacterial strains, and that GTN-based phylogenetic analysis will provide additional insights into whole genome-based phylogenetic analysis.
Estimating phylogenetic trees from genome-scale data.

PubMed

Liu, Liang; Xi, Zhenxiang; Wu, Shaoyuan; Davis, Charles C; Edwards, Scott V

2015-12-01

The heterogeneity of signals in the genomes of diverse organisms poses challenges for traditional phylogenetic analysis. Phylogenetic methods known as "species tree" methods have been proposed to directly address one important source of gene tree heterogeneity, namely the incomplete lineage sorting that occurs when evolving lineages radiate rapidly, resulting in a diversity of gene trees from a single underlying species tree. Here we review theory and empirical examples that help clarify conflicts between species tree and concatenation methods, and misconceptions in the literature about the performance of species tree methods. Considering concatenation as a special case of the multispecies coalescent model helps explain differences in the behavior of the two methods on phylogenomic data sets. Recent work suggests that species tree methods are more robust than concatenation approaches to some of the classic challenges of phylogenetic analysis, including rapidly evolving sites in DNA sequences and long-branch attraction. We show that approaches, such as binning, designed to augment the signal in species tree analyses can distort the distribution of gene trees and are inconsistent. Computationally efficient species tree methods incorporating biological realism are a key to phylogenetic analysis of whole-genome data. © 2015 New York Academy of Sciences.
CDAO-Store: Ontology-driven Data Integration for Phylogenetic Analysis

PubMed Central

2011-01-01

Background: The Comparative Data Analysis Ontology (CDAO) is an ontology developed, as part of the EvoInfo and EvoIO groups supported by the National Evolutionary Synthesis Center, to provide semantic descriptions of data and transformations commonly found in the domain of phylogenetic analysis. The core concepts of the ontology enable the description of phylogenetic trees and associated character data matrices. Results: Using CDAO as the semantic back-end, we developed a triple-store, named CDAO-Store. CDAO-Store is an RDF-based store of phylogenetic data, including a complete import of TreeBASE. CDAO-Store provides a programmatic interface, in the form of web services, and a web-based front-end, to perform both user-defined and domain-specific queries; domain-specific queries include searches for nearest common ancestors and minimum spanning clades, and filtering of multiple trees in the store by size, author, taxa, tree identifier, algorithm or method. In addition, CDAO-Store provides a visualization front-end, called CDAO-Explorer, which can be used to view both character data matrices and trees extracted from the CDAO-Store. CDAO-Store provides import capabilities, enabling the addition of new data to the triple-store; files in PHYLIP, MEGA, nexml, and NEXUS formats can be imported and their CDAO representations added to the triple-store. Conclusions: CDAO-Store is made up of a versatile and integrated set of tools to support phylogenetic analysis. To the best of our knowledge, CDAO-Store is the first semantically-aware repository of phylogenetic data with domain-specific querying capabilities. The portal to CDAO-Store is available at http://www.cs.nmsu.edu/~cdaostore. PMID:21496247
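One of the domain-specific queries named in this record, the nearest common ancestor of two nodes, can be sketched on a plain rooted tree. The sketch assumes a child-to-parent dictionary as the tree representation. It does not use RDF, CDAO terms, or the CDAO-Store web services, whose interfaces are not described in the abstract. All node names are hypothetical.

```python
def ancestors(parent, node):
    """Path from a node up to the root, starting with the node itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def nearest_common_ancestor(parent, a, b):
    """First node on b's path to the root that also lies on a's path."""
    seen = set(ancestors(parent, a))
    for node in ancestors(parent, b):
        if node in seen:
            return node
    return None  # the two nodes are not in the same tree


# Hypothetical tree ((A,B)n1,C)root stored as child -> parent links.
parent = {"A": "n1", "B": "n1", "n1": "root", "C": "root"}

print(nearest_common_ancestor(parent, "A", "B"))
print(nearest_common_ancestor(parent, "A", "C"))
```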
CDAO-store: ontology-driven data integration for phylogenetic analysis.

PubMed

Chisham, Brandon; Wright, Ben; Le, Trung; Son, Tran Cao; Pontelli, Enrico

2011-04-15

The Comparative Data Analysis Ontology (CDAO) is an ontology developed, as part of the EvoInfo and EvoIO groups supported by the National Evolutionary Synthesis Center, to provide semantic descriptions of data and transformations commonly found in the domain of phylogenetic analysis. The core concepts of the ontology enable the description of phylogenetic trees and associated character data matrices. Using CDAO as the semantic back-end, we developed a triple-store, named CDAO-Store. CDAO-Store is an RDF-based store of phylogenetic data, including a complete import of TreeBASE. CDAO-Store provides a programmatic interface, in the form of web services, and a web-based front-end, to perform both user-defined and domain-specific queries; domain-specific queries include searches for nearest common ancestors and minimum spanning clades, and filtering of multiple trees in the store by size, author, taxa, tree identifier, algorithm or method. In addition, CDAO-Store provides a visualization front-end, called CDAO-Explorer, which can be used to view both character data matrices and trees extracted from the CDAO-Store. CDAO-Store provides import capabilities, enabling the addition of new data to the triple-store; files in PHYLIP, MEGA, nexml, and NEXUS formats can be imported and their CDAO representations added to the triple-store. CDAO-Store is made up of a versatile and integrated set of tools to support phylogenetic analysis. To the best of our knowledge, CDAO-Store is the first semantically-aware repository of phylogenetic data with domain-specific querying capabilities. The portal to CDAO-Store is available at http://www.cs.nmsu.edu/~cdaostore.
Characterization of a novel orthoreovirus isolated from fruit bat, China.

PubMed

Hu, Tingsong; Qiu, Wei; He, Biao; Zhang, Yan; Yu, Jing; Liang, Xiu; Zhang, Wendong; Chen, Gang; Zhang, Yingguo; Wang, Yiyin; Zheng, Ying; Feng, Ziliang; Hu, Yonghe; Zhou, Weiguo; Tu, Changchun; Fan, Quanshui; Zhang, Fuqiang

2014-11-30

In recent years novel human respiratory disease agents have been described for Southeast Asia and Australia. The causative pathogens were classified as pteropine orthoreoviruses with a strong phylogenetic relationship to orthoreoviruses of bat origin. In this report, we isolated a novel Melaka-like reovirus (named "Cangyuan virus") from intestinal content samples of one fruit bat residing in China's Yunnan province. Phylogenetic analysis of the whole Cangyuan virus genome sequences of segments L, M and S demonstrated the genetic diversity of the Cangyuan virus. In contrast to the L and M segments, the phylogenetic trees for the S segments of Cangyuan virus demonstrated a greater degree of heterogeneity. Phylogenetic analysis indicated that the Cangyuan virus was a novel orthoreovirus and substantially different from currently known members of Pteropine orthoreovirus (PRV) species group.
Population structure of clinical Pseudomonas aeruginosa from West and Central African countries.

PubMed

Cholley, Pascal; Ka, Roughyatou; Guyeux, Christophe; Thouverez, Michelle; Guessennd, Nathalie; Ghebremedhin, Beniam; Frank, Thierry; Bertrand, Xavier; Hocquet, Didier

2014-01-01

Pseudomonas aeruginosa (PA) has a non-clonal, epidemic population with a few widely distributed and frequently encountered sequence types (STs) called 'high-risk clusters'. Clinical P. aeruginosa (clinPA) has been studied in all inhabited continents except Africa, where very few isolates have been analyzed. Here, we characterized a collection of clinPA isolates from four countries of West and Central Africa. A total of 184 non-redundant isolates of clinPA from hospitals of Senegal, Ivory Coast, Nigeria, and Central African Republic were genotyped by MLST. We assessed their resistance level to antibiotics by agar diffusion and identified the extended-spectrum β-lactamases (ESBLs) and metallo-β-lactamases (MBLs) by sequencing. The population structure of the species was determined by a nucleotide-based analysis of the entire PA MLST database, and we further localized on the phylogenetic tree (i) the sequence types (STs) of the present collection, (ii) the STs by continents, and (iii) ESBL- and MBL-producing STs from the MLST database. We found 80 distinct STs, of which 24 had no relationship with any known STs. 'High-risk' international clonal complexes (CC155, CC244, CC235) were frequently found in West and Central Africa. The five VIM-2-producing isolates belonged to CC233 and CC244. GES-1 and GES-9 enzymes were produced by one CC235 and one ST1469 isolate, respectively. We showed the spread of 'high-risk' international clonal complexes, often described as multidrug-resistant on other continents, with a fully susceptible phenotype. The MBL- and ESBL-producing STs were scattered throughout the phylogenetic tree and our data suggest a poor association between a continent and a specific phylogroup. ESBL- and MBL-encoding genes are borne by both successful international clonal complexes and distinct local STs in clinPA of West and Central Africa.
Furthermore, our data suggest that the spread of a ST could be either due to its antibiotic resistance or to features independent from the resistance to antibiotics.
A Format for Phylogenetic Placements

PubMed Central

Matsen, Frederick A.; Hoffman, Noah G.; Gallagher, Aaron; Stamatakis, Alexandros

2012-01-01

We have developed a unified format for phylogenetic placements, that is, mappings of environmental sequence data (e.g., short reads) into a phylogenetic tree. We are motivated to do so by the growing number of tools for computing and post-processing phylogenetic placements, and the lack of an established standard for storing them. The format is lightweight, versatile, extensible, and is based on the JSON format, which can be parsed by most modern programming languages. Our format is already implemented in several tools for computing and post-processing parsimony- and likelihood-based phylogenetic placements and has worked well in practice. We believe that establishing a standard format for analyzing read placements at this early stage will lead to a more efficient development of powerful and portable post-analysis tools for the growing applications of phylogenetic placement. PMID:22383988
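Because the format described in this record is based on JSON, a placement file can be read with any standard JSON parser. The sketch below is illustrative only: the key names ("tree", "fields", "placements", "n", "p") and the column names are assumptions about the layout, the document itself is made up, and the published specification should be consulted for the authoritative definition. It picks, for each read, the edge with the highest like_weight_ratio.

```python
import json

# Hypothetical placement document; keys and values are assumptions for illustration.
DOC = """
{
  "version": 3,
  "tree": "((A:0.2{0},B:0.1{1}):0.7{2},C:0.5{3}){4};",
  "fields": ["edge_num", "likelihood", "like_weight_ratio"],
  "placements": [
    {"n": ["read_1"], "p": [[1, -2578.2, 0.78], [0, -2580.1, 0.22]]},
    {"n": ["read_2", "read_3"], "p": [[3, -1200.5, 1.0]]}
  ]
}
"""


def best_placements(text):
    """Map each read name to the edge number of its highest-weight placement."""
    doc = json.loads(text)
    fields = doc["fields"]
    result = {}
    for entry in doc["placements"]:
        # Turn each positional row into a dict keyed by the declared field names.
        rows = [dict(zip(fields, row)) for row in entry["p"]]
        best = max(rows, key=lambda r: r["like_weight_ratio"])
        for name in entry["n"]:
            result[name] = best["edge_num"]
    return result


best = best_placements(DOC)
print(best)
```

Declaring the column names once in a "fields" list and storing placements as positional rows keeps the file compact, which fits the "lightweight" goal stated in the abstract.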
A format for phylogenetic placements.

PubMed

Matsen, Frederick A; Hoffman, Noah G; Gallagher, Aaron; Stamatakis, Alexandros

2012-01-01

We have developed a unified format for phylogenetic placements, that is, mappings of environmental sequence data (e.g., short reads) into a phylogenetic tree. We are motivated to do so by the growing number of tools for computing and post-processing phylogenetic placements, and the lack of an established standard for storing them. The format is lightweight, versatile, extensible, and is based on the JSON format, which can be parsed by most modern programming languages. Our format is already implemented in several tools for computing and post-processing parsimony- and likelihood-based phylogenetic placements and has worked well in practice. We believe that establishing a standard format for analyzing read placements at this early stage will lead to a more efficient development of powerful and portable post-analysis tools for the growing applications of phylogenetic placement.
A method of alignment masking for refining the phylogenetic signal of multiple sequence alignments.

PubMed

Rajan, Vaibhav

2013-03-01

Inaccurate inference of positional homologies in multiple sequence alignments and systematic errors introduced by alignment heuristics obfuscate phylogenetic inference. Alignment masking, the elimination of phylogenetically uninformative or misleading sites from an alignment before phylogenetic analysis, is a common practice in phylogenetic analysis. Although masking is often done manually, automated methods are necessary to handle the much larger data sets being prepared today. In this study, we introduce the concept of subsplits and demonstrate their use in extracting phylogenetic signal from alignments. We design a clustering approach for alignment masking where each cluster contains similar columns-similarity being defined on the basis of compatible subsplits; our approach then identifies noisy clusters and eliminates them. Trees inferred from the columns in the retained clusters are found to be topologically closer to the reference trees. We test our method on numerous standard benchmarks (both synthetic and biological data sets) and compare its performance with other methods of alignment masking. We find that our method can eliminate sites more accurately than other methods, particularly on divergent data, and can improve the topologies of the inferred trees in likelihood-based analyses. Software available upon request from the author.
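The general idea of alignment masking, removing columns before tree inference, can be shown with a much simpler criterion than the one in this record. The sketch below drops columns whose gap fraction exceeds a threshold; it is not the subsplit-based clustering method the abstract describes, and the threshold and the toy alignment are assumptions made for illustration.

```python
def mask_columns(alignment, max_gap_fraction=0.5):
    """Return the alignment with gap-rich columns removed.

    alignment: list of equal-length aligned sequences, '-' marking a gap.
    A column is kept when its gap fraction is <= max_gap_fraction.
    """
    n_rows = len(alignment)
    keep = [
        i
        for i in range(len(alignment[0]))
        if sum(row[i] == "-" for row in alignment) / n_rows <= max_gap_fraction
    ]
    return ["".join(row[i] for i in keep) for row in alignment]


# Toy alignment (made up): the third column is 75% gaps and is dropped.
alignment = ["AC-GT", "A--GT", "ACTG-", "A--GA"]
masked = mask_columns(alignment)
for row in masked:
    print(row)
```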
Relationships among North American and Japanese Laetiporus isolates inferred from molecular phylogenetics and single-spore incompatibility reactions

Treesearch

Mark T. Banik; Daniel L. Lindner; Yuko Ota; Tsutomu Hattori

2010-01-01

Relationships were investigated among North American and Japanese isolates of Laetiporus using phylogenetic analysis of ITS sequences and single-spore isolate incompatibility. Single-spore isolate pairings revealed no significant compatibility between North American and Japanese isolates. ITS analysis revealed 12 clades within the core ...
Spatial and phylogenetic analysis of the vesicular stomatitis virus epidemic in the southwestern United States in 2004-2006

USDA-ARS's Scientific Manuscript database

The southwestern United States has been incidentally affected by vesicular stomatitis virus (VSV) epidemics during the last 100 years. By the time this manuscript was written, the last episodes were reported in 2004-2006. Results of space clustering and phylogenetic analysis techniques used here sug...
Phylogenetic analysis of West Nile virus, Nuevo Leon State, Mexico.

PubMed

Blitvich, Bradley J; Fernández-Salas, Ildefonso; Contreras-Cordero, Juan F; Loroño-Pino, María A; Marlenee, Nicole L; Díaz, Francisco J; González-Rojas, José I; Obregón-Martínez, Nelson; Chiu-García, Jorge A; Black, William C; Beaty, Barry J

2004-07-01

West Nile virus RNA was detected in brain tissue from a horse that died in June 2003 in Nuevo Leon State, Mexico. Nucleotide sequencing and phylogenetic analysis of the premembrane and envelope genes showed that the virus was most closely related to West Nile virus isolates collected in Texas in 2002.
Phylogenetic Analysis of West Nile Virus, Nuevo Leon State, Mexico

PubMed Central

Blitvich, Bradley J.; Fernández-Salas, Ildefonso; Contreras-Cordero, Juan F.; Loroño-Pino, María A.; Marlenee, Nicole L.; Díaz, Francisco J.; González-Rojas, José I.; Obregón-Martínez, Nelson; Chiu-García, Jorge A.; Black, William C.

2004-01-01

West Nile virus RNA was detected in brain tissue from a horse that died in June 2003 in Nuevo Leon State, Mexico. Nucleotide sequencing and phylogenetic analysis of the premembrane and envelope genes showed that the virus was most closely related to West Nile virus isolates collected in Texas in 2002. PMID:15324558
PHYLOGENETIC ANALYSIS OF 16S RRNA GENE SEQUENCES REVEALS THE PREVALENCE OF MYCOBACTERIA SP., ALPHA-PROTEOBACTERIA, AND UNCULTURED BACTERIA IN DRINKING WATER MICROBIAL COMMUNITIES

EPA Science Inventory

Previous studies have shown that culture-based methods tend to underestimate the densities and diversity of bacterial populations inhabiting water distribution systems (WDS). In this study, the phylogenetic diversity of drinking water bacteria was assessed using sequence analysis...

A revision and phylogenetic analysis of the spider genus Oxysoma Nicolet (Araneae: Anyphaenidae, Amaurobioidinae).

PubMed

Aisen, Santiago; Ramírez, Martín J

2015-08-06

We review the spider genus Oxysoma Nicolet, with most of its species endemic to the southern temperate forests in Chile and Argentina, and present a phylogenetic analysis including seven species, of which three are newly described in this study (O. macrocuspis new species, O. kuni new species, and O. losruiles new species, all from Chile), together with 107 other representatives of Anyphaenidae. New geographical records and distribution maps are provided for all species, with illustrations and reviewed diagnoses for the genus and the four previously known species (O. punctatum Nicolet, O. saccatum (Tullgren), O. longiventre (Nicolet) and O. itambezinho Ramírez). The phylogenetic analysis using cladistic methods is based on 264 previously defined characters plus one character that arises from this study. The three new species are closely related to Oxysoma longiventre, and these four species compose what we define as the Oxysoma longiventre species group. The phylogenetic analysis did not retrieve the monophyly of Oxysoma, which should be reevaluated in the future, together with the genus Tasata.
Utility of COX1 phylogenetics to differentiate between locally acquired and imported Plasmodium knowlesi infections in Singapore

PubMed Central

Loh, Jin Phang; Gao, Qiu Han Christine; Lee, Vernon J; Tetteh, Kevin; Drakeley, Chris

2016-01-01

INTRODUCTION: Although there have been several phylogenetic studies on Plasmodium knowlesi (P. knowlesi), only cytochrome c oxidase subunit 1 (COX1) gene analysis has shown some geographical differentiation between the isolates of different countries. METHODS: Phylogenetic analysis of locally acquired P. knowlesi infections, based on circumsporozoite, small subunit ribosomal ribonucleic acid (SSU rRNA), merozoite surface protein 1 and COX1 gene targets, was performed. The results were compared with the published sequences of regional isolates from Malaysia and Thailand. RESULTS: Phylogenetic analysis of the circumsporozoite, SSU rRNA and merozoite surface protein 1 gene sequences for regional P. knowlesi isolates showed no obvious differentiation that could be attributed to their geographical origin. However, COX1 gene analysis showed that it was possible to differentiate between Singapore-acquired P. knowlesi infections and P. knowlesi infections from Peninsular Malaysia and Sarawak, Borneo, Malaysia. CONCLUSION: The ability to differentiate between locally acquired P. knowlesi infections and imported P. knowlesi infections has important utility for the monitoring of P. knowlesi malaria control programmes in Singapore. PMID:26805667
Utility of COX1 phylogenetics to differentiate between locally acquired and imported Plasmodium knowlesi infections in Singapore.

PubMed

Loh, Jin Phang; Gao, Qiu Han Christine; Lee, Vernon J; Tetteh, Kevin; Drakeley, Chris

2016-12-01

Although there have been several phylogenetic studies on Plasmodium knowlesi (P. knowlesi), only cytochrome c oxidase subunit 1 (COX1) gene analysis has shown some geographical differentiation between the isolates of different countries. Phylogenetic analysis of locally acquired P. knowlesi infections, based on circumsporozoite, small subunit ribosomal ribonucleic acid (SSU rRNA), merozoite surface protein 1 and COX1 gene targets, was performed. The results were compared with the published sequences of regional isolates from Malaysia and Thailand. Phylogenetic analysis of the circumsporozoite, SSU rRNA and merozoite surface protein 1 gene sequences for regional P. knowlesi isolates showed no obvious differentiation that could be attributed to their geographical origin. However, COX1 gene analysis showed that it was possible to differentiate between Singapore-acquired P. knowlesi infections and P. knowlesi infections from Peninsular Malaysia and Sarawak, Borneo, Malaysia. The ability to differentiate between locally acquired P. knowlesi infections and imported P. knowlesi infections has important utility for the monitoring of P. knowlesi malaria control programmes in Singapore. Copyright: © Singapore Medical Association
Phylogenetic Analysis of Aedes aegypti Based on Mitochondrial ND4 Gene Sequences in Almadinah, Saudi Arabia.

PubMed

Ali, Khalil H Al; El-Badry, Ayman A; Ali, Mouhanad Al; El-Sayed, Wael S M; El-Beshbishy, Hesham A

2016-06-01

Aedes aegypti is the main vector of the yellow fever and dengue viruses. This mosquito has become the major indirect cause of human morbidity and mortality worldwide. Dengue virus activity has been reported recently in the western areas of Saudi Arabia. To date, there is no vaccine for dengue virus, and the control of the disease depends on the control of the vector. The present study aimed to perform phylogenetic analysis of Aedes aegypti based on the mitochondrial NADH dehydrogenase subunit 4 (ND4) gene at Almadinah, Saudi Arabia, in order to get further insight into the epidemiology and transmission of this vector. The mitochondrial ND4 gene was sequenced in the eight isolated Aedes aegypti mosquitoes from Almadinah, Saudi Arabia; sequences were aligned, and phylogenetic analysis was performed and compared with 54 sequences of Aedes reported in previous studies from Mexico, Thailand, Brazil, and Africa. Our results suggest that increased gene flow among Aedes aegypti populations occurs between Africa and Saudi Arabia. Phylogenetic relationship analysis showed two genetically distinct Aedes aegypti in Saudi Arabia derived from dual African ancestry.
Babes in the wood – a unique window into sea scorpion ontogeny

PubMed Central

2013-01-01

Background: Few studies on eurypterids have taken into account morphological changes that occur throughout postembryonic development. Here two species of eurypterid are described from the Pragian Beartooth Butte Formation of Cottonwood Canyon in Wyoming and included in a phylogenetic analysis. Both species comprise individuals from a number of instars, and this allows for changes that occur throughout their ontogeny to be documented, and how ontogenetically variable characters can influence phylogenetic analysis to be tested. Results: The two species of eurypterid are described as Jaekelopterus howelli (Kjellesvig-Waering and Størmer, 1952) and Strobilopterus proteus sp. nov. Phylogenetic analysis places them within the Pterygotidae and Strobilopteridae respectively, both families within the Eurypterina. Jaekelopterus howelli shows positive allometry of the cheliceral denticles throughout ontogeny, while a number of characteristics including prosomal appendage length, carapace shape, lateral eye position, and relative breadth all vary during the growth of Strobilopterus proteus. Conclusions: The ontogeny of Strobilopterus proteus shares much in common with that of modern xiphosurans; however, certain characteristics, including apparent true direct development, suggest a closer affinity to arachnids. The ontogenetic development of the genital appendage also supports the hypothesis that the structure is homologous to the endopods of the trunk limbs of other arthropods. Including earlier instars in the phylogenetic analysis is shown to destabilise the retrieved topology. Therefore, coding juveniles as individual taxa in an analysis is shown to be actively detrimental and alternative ways of coding ontogenetic data into phylogenetic analyses should be explored. PMID:23663507
Transcriptome resources for the frogs Lithobates clamitans and Pseudacris regilla, emphasizing antimicrobial peptides and conserved loci for phylogenetics

USGS Publications Warehouse

Robertson, Laura S.; Cornman, Robert S.

2014-01-01

We developed genetic resources for two North American frogs, Lithobates clamitans and Pseudacris regilla, widespread native amphibians that are potential indicator species of environmental health. For both species, mRNA from multiple tissues was sequenced using 454 technology. De novo assemblies with Mira3 resulted in 50 238 contigs (N50 = 687 bp) and 48 213 contigs (N50 = 686 bp) for L. clamitans and P. regilla, respectively, after clustering with CD-Hit-EST and purging contigs below 200 bp. We performed BLASTX similarity searches against the Xenopus tropicalis proteome and, for predicted ORFs, HMMER similarity searches against the Pfam-A database. Because there is broad interest in amphibian immune factors, we manually annotated putative antimicrobial peptides. To identify conserved regions suitable for amplicon resequencing across a broad taxonomic range, we performed an additional assembly of public short-read transcriptome data derived from two species of the genus Rana and identified reciprocal best TBLASTX matches among all assemblies. Although P. regilla, a hylid frog, is substantially more diverged from the ranid species, we identified 56 genes that were sufficiently conserved to allow nondegenerate primer design with Primer3. In addition to providing a foundation for comparative genomics and quantitative gene expression analysis, our results enable quick development of nuclear sequence-based markers for phylogenetics or population genetics.
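The N50 statistic quoted in this record can be computed as follows. This is a minimal sketch using the common definition: N50 is the length of the contig at which, going from longest to shortest, the running total first reaches half of the assembly's total length. The 200 bp cutoff mirrors the purging step mentioned in the abstract; the contig lengths are made up and are not taken from the study.

```python
def filter_short(lengths, min_len=200):
    """Discard contigs shorter than min_len base pairs."""
    return [n for n in lengths if n >= min_len]


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    if not lengths:
        raise ValueError("no contigs")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Hypothetical contig lengths in bp (illustrative only).
contigs = [150, 200, 300, 450, 687, 900]
kept = filter_short(contigs)
print(len(kept), "contigs kept, N50 =", n50(kept), "bp")
```

Filtering before computing N50 matters: removing short contigs lowers the total length and can therefore raise the reported N50.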
The Trichoptera barcode initiative: a strategy for generating a species-level Tree of Life

PubMed Central

Frandsen, Paul B.; Holzenthal, Ralph W.; Beet, Clare R.; Bennett, Kristi R.; Blahnik, Roger J.; Bonada, Núria; Cartwright, David; Chuluunbat, Suvdtsetseg; Cocks, Graeme V.; Collins, Gemma E.; deWaard, Jeremy; Dean, John; Flint, Oliver S.; Hausmann, Axel; Hendrich, Lars; Hess, Monika; Hogg, Ian D.; Kondratieff, Boris C.; Malicky, Hans; Milton, Megan A.; Morinière, Jérôme; Morse, John C.; Mwangi, François Ngera; Pauls, Steffen U.; Gonzalez, María Razo; Rinne, Aki; Robinson, Jason L.; Salokannel, Juha; Shackleton, Michael; Smith, Brian; Stamatakis, Alexandros; StClair, Ros; Thomas, Jessica A.; Zamora-Muñoz, Carmen; Ziesmann, Tanja

2016-01-01

DNA barcoding was intended as a means to provide species-level identifications through associating DNA sequences from unknown specimens to those from curated reference specimens. Although barcodes were not designed for phylogenetics, they can be beneficial to the completion of the Tree of Life. The barcode database for Trichoptera is relatively comprehensive, with data from every family, approximately two-thirds of the genera, and one-third of the described species. Most Trichoptera, as with most of life's species, have never been subjected to any formal phylogenetic analysis. Here, we present a phylogeny with over 16 000 unique haplotypes as a working hypothesis that can be updated as our estimates improve. We suggest a strategy of implementing constrained tree searches, which allow larger datasets to dictate the backbone phylogeny, while the barcode data fill out the tips of the tree. We also discuss how this phylogeny could be used to focus taxonomic attention on ambiguous species boundaries and hidden biodiversity. We suggest that systematists continue to differentiate between ‘Barcode Index Numbers’ (BINs) and ‘species’ that have been formally described. Each has utility, but they are not synonyms. We highlight examples of integrative taxonomy, using both barcodes and morphology for species description. This article is part of the themed issue ‘From DNA barcodes to biomes’. PMID:27481793
Detecting Coevolution in and among Protein Domains

PubMed Central

Yeang, Chen-Hsiang; Haussler, David

2007-01-01

Correlated changes of nucleic or amino acids have provided strong information about the structures and interactions of molecules. Despite the rich literature in coevolutionary sequence analysis, previous methods often have to trade off between generality, simplicity, phylogenetic information, and specific knowledge about interactions. Furthermore, despite the evidence of coevolution in selected protein families, a comprehensive screening of coevolution among all protein domains is still lacking. We propose an augmented continuous-time Markov process model for sequence coevolution. The model handles different types of interactions, incorporates phylogenetic information and sequence substitution, has only one extra free parameter, and requires no knowledge about interaction rules. We apply this model in large-scale screenings of the entire protein domain database (Pfam). Strikingly, with 0.1 trillion tests executed, the majority of the inferred coevolving protein domains are functionally related, and the coevolving amino acid residues are spatially coupled. Moreover, many of the coevolving positions are located at functionally important sites of proteins/protein complexes, such as the subunit linkers of superoxide dismutase, the tRNA binding sites of ribosomes, the DNA binding region of RNA polymerase, and the active and ligand binding sites of various enzymes. The results suggest sequence coevolution manifests structural and functional constraints of proteins. The intricate relations between sequence coevolution and various selective constraints are worth pursuing at a deeper level. PMID:17983264
Single-cell genomics reveals complex carbohydrate degradation patterns in poribacterial symbionts of marine sponges

PubMed Central

Kamke, Janine; Sczyrba, Alexander; Ivanova, Natalia; Schwientek, Patrick; Rinke, Christian; Mavromatis, Kostas; Woyke, Tanja; Hentschel, Ute

2013-01-01

Many marine sponges are hosts to dense and phylogenetically diverse microbial communities that are located in the extracellular matrix of the animal. The candidate phylum Poribacteria is a predominant member of the sponge microbiome and its representatives are nearly exclusively found in sponges. Here we used single-cell genomics to obtain comprehensive insights into the metabolic potential of individual poribacterial cells representing three distinct phylogenetic groups within Poribacteria. Genome sizes were up to 5.4 Mbp and genome coverage was as high as 98.5%. Common features of the poribacterial genomes indicated that heterotrophy is likely to be of importance for this bacterial candidate phylum. Carbohydrate-active enzyme database screening and further detailed analysis of carbohydrate metabolism suggested the ability to degrade diverse carbohydrate sources likely originating from seawater and from the host itself. The presence of uronic acid degradation pathways as well as several specific sulfatases provides strong support that Poribacteria degrade glycosaminoglycan chains of proteoglycans, which are important components of the sponge host matrix. Dominant glycoside hydrolase families further suggest degradation of other glycoproteins in the host matrix. We therefore propose that Poribacteria are well adapted to an existence in the sponge extracellular matrix. Poribacteria may be viewed as efficient scavengers and recyclers of a particular suite of carbon compounds that are unique to sponges as microbial ecosystems. PMID:23842652
Actinomyces gaoshouyii sp. nov., isolated from plateau pika (Ochotona curzoniae).

PubMed

Meng, Xiangli; Wang, Yiting; Lu, Shan; Lai, Xin-He; Jin, Dong; Yang, Jing; Xu, Jianguo

2017-09-01

Two strains (pika_113T and pika_114) of a previously undescribed Actinomyces-like bacterium were recovered from the intestinal contents of plateau pika (Ochotona curzoniae) on the Tibet-Qinghai Plateau, China. Results from biochemical characterization indicated that the two strains were phenotypically homogeneous and distinct from other previously described species of the genus Actinomyces. Based on the comparison of 16S rRNA gene sequences and genome analysis, the bacteria were determined to be a hitherto unknown subline within the genus Actinomyces, being most closely related to type strains of Actinomyces denticolens and Actinomyces timonensis with a respective 97.2 and 97.1 % similarity in their 16S rRNA gene sequences. Phylogenetic analyses confirmed that pika_113T was well separated from any other recognized species of the genus Actinomyces and within the cluster with A. denticolens and A. timonensis. The genome of strain pika_113T displayed less than 42 % relatedness in DNA-DNA hybridization with all the available genomes of existing species of the genus Actinomyces in the NCBI database. Collectively, based on the phenotypic characteristics and phylogenetic analyses results, we propose the novel isolates as representatives of Actinomyces gaoshouyii sp. nov. The type strain of Actinomyces gaoshouyii is pika_113T (=CGMCC 4.7372T=DSM 104049T), with a genomic DNA G+C content of 71 mol%.
Phylogenetic evidence for multiple intertypic recombinations in enterovirus B81 strains isolated in Tibet, China

PubMed Central

Hu, Lan; Zhang, Yong; Hong, Mei; Zhu, Shuangli; Yan, Dongmei; Wang, Dongyan; Li, Xiaolei; Zhu, Zhen; Tsewang; Xu, Wenbo

2014-01-01

Enterovirus B81 (EV-B81) is a newly identified serotype within the species enterovirus B (EV-B). To date, only eight nucleotide sequences of EV-B81 have been published and only one full-length genome sequence (the prototype strain) has been made available in the GenBank database. Here, we report the full-length genome sequences of two EV-B81 strains isolated in the Tibet Autonomous Region of China during acute flaccid paralysis surveillance activities, and we also conducted an antibody seroprevalence study in two prefectures of Tibet. The sequence comparison and phylogenetic dendrogram analysis revealed high variability among the global EV-B81 strains and frequent intertypic recombination in the non-structural protein region of EV-B serotypes, suggesting high genetic diversity of EV-B81. However, low positive rates and low titers of neutralizing antibodies against EV-B81 were detected. Nearly 68% of children under the age of five had no neutralizing antibodies against EV-B81. Hence, the extent of transmission and the exposure of the population to this EV type are very limited. Although little is known about the biological and pathogenic properties of EV-B81, because the limited number of isolates has permitted little research in this field, our study provides basic information for further studies of EV-B81. PMID:25112835
Evolutionary profiles from the QR factorization of multiple sequence alignments

PubMed Central

Sethi, Anurag; O'Donoghue, Patrick; Luthey-Schulten, Zaida

2005-01-01

We present an algorithm to generate complete evolutionary profiles that represent the topology of the molecular phylogenetic tree of the homologous group. The method, based on the multidimensional QR factorization of numerically encoded multiple sequence alignments, removes redundancy from the alignments and orders the protein sequences by increasing linear dependence, resulting in the identification of a minimal basis set of sequences that spans the evolutionary space of the homologous group of proteins. We observe a general trend that these smaller, more evolutionarily balanced profiles have comparable and, in many cases, better performance in database searches than conventional profiles containing hundreds of sequences, constructed in an iterative and computationally intensive procedure. For more diverse families or superfamilies, with sequence identity <30%, structural alignments, based purely on the geometry of the protein structures, provide better alignments than pure sequence-based methods. Merging the structure and sequence information allows the construction of accurate profiles for distantly related groups. These structure-based profiles outperformed other sequence-based methods for finding distant homologs and were used to identify a putative class II cysteinyl-tRNA synthetase (CysRS) in several archaea that eluded previous annotation studies. Phylogenetic analysis showed the putative class II CysRSs to be a monophyletic group and homology modeling revealed a constellation of active site residues similar to that in the known class I CysRS. PMID:15741270
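The idea of ordering sequences by increasing linear dependence can be illustrated with a column-pivoted orthogonalisation, which is the core of a pivoted QR factorization. The sketch below is not the authors' multidimensional QR implementation: the one-hot encoding, the `ALPHABET` constant and the function names are assumptions made only for illustration, and the greedy Gram-Schmidt loop stands in for a library QR routine.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # assumed encoding alphabet


def one_hot(seq):
    """Encode an aligned sequence as a flat 0/1 vector (assumed encoding)."""
    vec = []
    for ch in seq:
        vec.extend(1.0 if ch == a else 0.0 for a in ALPHABET)
    return vec


def pivot_order(seqs, tol=1e-9):
    """Greedy column-pivoted Gram-Schmidt over encoded sequences.

    Returns (index, residual_norm) pairs in pivot order. Sequences chosen
    early are the least linearly dependent on those already selected; a
    residual norm near zero marks a redundant sequence.
    """
    resid = [one_hot(s) for s in seqs]
    remaining = list(range(len(seqs)))
    order = []
    while remaining:
        norms = {i: math.sqrt(sum(x * x for x in resid[i])) for i in remaining}
        best = max(remaining, key=lambda i: norms[i])
        order.append((best, norms[best]))
        remaining.remove(best)
        if norms[best] <= tol:
            continue  # nothing left to project out
        q = [x / norms[best] for x in resid[best]]
        for i in remaining:
            dot = sum(a * b for a, b in zip(resid[i], q))
            resid[i] = [a - dot * b for a, b in zip(resid[i], q)]
    return order


# Toy alignment: the second sequence duplicates the first.
for index, norm in pivot_order(["ACDE", "ACDE", "WYWY"]):
    print(index, round(norm, 3))
```

Truncating the returned order at a residual-norm threshold would give a reduced, non-redundant set of sequences, which is the role the abstract assigns to the minimal basis set.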
Phylogeny of Morella rubra and Its Relatives (Myricaceae) and Genetic Resources of Chinese Bayberry Using RAD Sequencing

PubMed Central

Liu, Luxian; Jin, Xinjie; Chen, Nan; Li, Xian; Li, Pan; Fu, Chengxin

2015-01-01

Phylogenetic relationships among Chinese species of Morella (Myricaceae) are unresolved. Here, we use restriction site-associated DNA sequencing (RAD-seq) to identify candidate loci that will help in determining phylogenetic relationships among Morella rubra, M. adenophora, M. nana and M. esculenta. Three methods for inferring phylogeny, maximum parsimony (MP), maximum likelihood (ML) and Bayesian concordance, were applied to data sets including as many as 4253 RAD loci with 8360 parsimony informative variable sites. All three methods significantly favored the topology of (((M. rubra, M. adenophora), M. nana), M. esculenta). Two species from North America (M. cerifera and M. pensylvanica) were placed as sister to the four Chinese species. According to BEAST analysis, we deduced speciation of M. rubra to be at about the Miocene-Pliocene boundary (5.28 Ma). Intraspecific divergence in M. rubra occurred in the late Pliocene (3.39 Ma). From pooled data, we assembled 29378, 21902 and 23552 de novo contigs with an average length of 229, 234 and 234 bp for M. rubra, M. nana and M. esculenta respectively. The contigs were used to investigate functional classification of RAD tags in a BLASTX search. Additionally, we identified 3808 unlinked SNP sites across the four populations of M. rubra and discovered genes associated with fruit ripening and senescence, fruit quality and disease/defense metabolism based on KEGG database. PMID:26431030
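The count of parsimony informative sites reported above follows a standard definition: a site is informative when at least two different character states each occur in at least two sequences. A minimal sketch of that count is given below; the function name, the set of ignored symbols and the toy alignment are assumptions for illustration and do not reproduce the RAD data set.

```python
from collections import Counter


def parsimony_informative_sites(alignment, ignore="-N?"):
    """Return 0-based positions of parsimony-informative columns.

    A column qualifies when at least two states are each present in at
    least two sequences; gaps and ambiguity symbols in `ignore` are skipped.
    """
    sites = []
    for pos, column in enumerate(zip(*alignment)):
        counts = Counter(c.upper() for c in column if c.upper() not in ignore)
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            sites.append(pos)
    return sites


# Toy alignment of four sequences (made up).
aln = ["ACGTA", "ACGTA", "ATGCA", "ATGCT"]
print(parsimony_informative_sites(aln))
```

The last column of the toy alignment is variable but not informative, because the state T occurs in only one sequence.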
Phylogenetic analysis of porcine reproductive and respiratory syndrome virus isolates from Northern Ireland.

PubMed

Smith, Natalie; Power, Ultan F; McKillen, John

2018-05-29

To investigate the genetic diversity of porcine reproductive and respiratory syndrome virus (PRRSV) in Northern Ireland, the ORF5 gene from nine field isolates was sequenced and phylogenetically analysed. The results revealed relatively high diversity amongst isolates, with 87.6-92.2% identity between farms at the nucleotide level and 84.1-93.5% identity at the protein level. Phylogenetic analysis confirmed that all nine isolates belonged to the European (type 1) genotype and formed a cluster within the subtype 1 subgroup. This study provides the first report on PRRSV isolate diversity in Northern Ireland.
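Identity values such as those quoted above are computed from pairwise aligned sequences. The sketch below shows one common convention (matches divided by the number of columns where neither sequence has a gap); the abstract does not state which convention was used, so the gap handling here is an assumption, as are the function name and the toy sequences.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity of two aligned sequences of equal length.

    Columns containing a gap in either sequence are excluded (assumed
    convention; other definitions count gaps as mismatches).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Made-up aligned fragments differing at one of ten positions.
print(percent_identity("ATGCATGCAT", "ATGCTTGCAT"))
```

The same function works for nucleotide and protein alignments, since it only compares characters column by column.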
Characterization of the complete mitochondrial genome of the hybrid Epinephelus moara♀ × Epinephelus lanceolatus♂, and phylogenetic analysis in subfamily Epinephelinae

NASA Astrophysics Data System (ADS)

Gao, Fengtao; Wei, Min; Zhu, Ying; Guo, Hua; Chen, Songlin; Yang, Guanpin

2017-06-01

This study presents the complete mitochondrial genome of the hybrid Epinephelus moara♀ × Epinephelus lanceolatus♂. The genome is 16886 bp in length, and contains 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes, a light-strand replication origin and a control region. Additionally, phylogenetic analysis based on the nucleotide sequences of 13 conserved protein-coding genes using the maximum likelihood method indicated that the mitochondrial genome is maternally inherited. This study presents genomic data for studying phylogenetic relationships and breeding of hybrid Epinephelinae.
Investigation of the protein osteocalcin of Camelops hesternus: Sequence, structure and phylogenetic implications

NASA Astrophysics Data System (ADS)

Humpula, James F.; Ostrom, Peggy H.; Gandhi, Hasand; Strahler, John R.; Walker, Angela K.; Stafford, Thomas W.; Smith, James J.; Voorhies, Michael R.; George Corner, R.; Andrews, Phillip C.

2007-12-01

Ancient DNA sequences offer an extraordinary opportunity to unravel the evolutionary history of ancient organisms. Protein sequences offer another reservoir of genetic information that has recently become tractable through the application of mass spectrometric techniques. The extent to which ancient protein sequences resolve phylogenetic relationships, however, has not been explored. We determined the osteocalcin amino acid sequence from the bone of an extinct camelid (21 ka, Camelops hesternus) excavated from Isleta Cave, New Mexico and three bones of extant camelids: bactrian camel (Camelus bactrianus); dromedary camel (Camelus dromedarius) and guanaco (Llama guanacoe) for a diagenetic and phylogenetic assessment. There was no difference in sequence among the four taxa. Structural attributes observed in both modern and ancient osteocalcin include a post-translation modification, Hyp 9, deamidation of Gln 35 and Gln 39, and oxidation of Met 36. Carbamylation of the N-terminus in ancient osteocalcin may result in blockage and explain previous difficulties in sequencing ancient proteins via Edman degradation. A phylogenetic analysis using osteocalcin sequences of 25 vertebrate taxa was conducted to explore osteocalcin protein evolution and the utility of osteocalcin sequences for delineating phylogenetic relationships. The maximum likelihood tree closely reflected generally recognized taxonomic relationships. For example, maximum likelihood analysis recovered rodents, birds and, within hominins, the Homo-Pan-Gorilla trichotomy. Within Artiodactyla, character state analysis showed that a substitution of Pro 4 for His 4 defines the Capra-Ovis clade within Artiodactyla. Homoplasy in our analysis indicated that osteocalcin evolution is not a perfect indicator of species evolution. Limited sequence availability prevented assigning functional significance to sequence changes. Our preliminary analysis of osteocalcin evolution represents an initial step towards a complete character analysis aimed at determining the evolutionary history of this functionally significant protein. We emphasize that ancient protein sequencing and phylogenetic analyses using amino acid sequences must pay close attention to post-translational modifications, amino acid substitutions due to diagenetic alteration and the impacts of isobaric amino acids on mass shifts and sequence alignments.
Basic Helix-Loop-Helix Transcription Factor Gene Family Phylogenetics and Nomenclature

PubMed Central

Skinner, Michael K.; Rawls, Alan; Wilson-Rawls, Jeanne; Roalson, Eric H.

2010-01-01

A phylogenetic analysis of the basic helix-loop-helix (bHLH) gene superfamily was performed using seven different species (human, mouse, rat, worm, fly, yeast, and plant Arabidopsis) and involving over 600 bHLH genes [1]. All bHLH genes were identified in the genomes of the various species, including expressed sequence tags, and the entire coding sequence was used in the analysis. Nearly 15% of the gene family has been updated or added since the original publication. A super-tree involving six clades and all structural relationships was established and is now presented for four of the species. The wealth of functional data available for members of the bHLH gene superfamily provides us with the opportunity to use this exhaustive phylogenetic tree to predict potential functions of uncharacterized members of the family. This phylogenetic and genomic analysis of the bHLH gene family has revealed unique elements of the evolution and functional relationships of the different genes in the bHLH gene family. PMID:20219281
Plunging hands into the mushroom jar: a phylogenetic framework for Lyophyllaceae (Agaricales, Basidiomycota).

PubMed

Bellanger, J-M; Moreau, P-A; Corriol, G; Bidaud, A; Chalange, R; Dudova, Z; Richard, F

2015-04-01

During the last two decades, the unprecedented development of molecular phylogenetic tools has created an opportunity to revisit the fungal kingdom from an evolutionary perspective. Mycology has been profoundly changed but a sustained effort to elucidate large sections of the astonishing fungal diversity is still needed. Here we fill this gap in the case of Lyophyllaceae, a species-rich and ecologically diversified family of mushrooms. Assembly and genealogical concordance multigene phylogenetic analysis of a large dataset that includes original, vouchered material from expert field mycologists reveal the phylogenetic topology of the family, from higher (generic) to lower (species) levels. A comparative analysis of the most widely used phylogenetic markers in Fungi indicates that the nuc rDNA region encompassing the internal transcribed spacers 1 and 2, along with the 5.8S rDNA (ITS) and portions of the genes for RNA polymerase II second largest subunit (RPB2) is the best-performing combination to resolve the broadest range of taxa within Lyophyllaceae. Eleven distinct evolutionary lineages are identified, which display partial overlap with traditional genera as well as with the phylogenetic framework previously proposed for the family. Eighty phylogenetic species are delineated, which shed light on a large number of morphological concepts, including rare and poorly documented ones. Testing these novel phylogenetic species against the barcoding method of species limit delineation indicates that the latter method fully resolves Lyophyllaceae species, except in one clade. This case study provides the first comprehensive phylogenetic overview of Lyophyllaceae, a necessary step towards a taxonomical, ecological and nomenclatural revision of this family of mushrooms. It also proposes a set of methodological guidelines that may be of relevance for future taxonomic works in other groups of Fungi.
Phylogenetic analysis of the envelope protein (domain III) of dengue 4 viruses

PubMed Central

Mota, Javier; Ramos-Castañeda, José; Rico-Hesse, Rebeca; Ramos, Celso

2011-01-01

Objective: To evaluate the genetic variability of domain III of envelope (E) protein and to estimate phylogenetic relationships of dengue 4 (Den-4) viruses isolated in Mexico and from other endemic areas of the world. Material and Methods: A phylogenetic study of domain III of envelope (E) protein of Den-4 viruses was conducted in 1998 using virus strains from Mexico and other parts of the world, isolated in different years. Specific primers were used to amplify domain III by RT-PCR and to obtain the nucleotide sequence. Based on nucleotide and deduced amino acid sequences, genetic variability was estimated and a phylogenetic tree was generated. To simplify genetic analysis of the domain III region, a Restriction Fragment Length Polymorphism (RFLP) assay was performed, using six restriction enzymes. Results: Nucleotide and amino acid sequence analyses of domain III gave results similar to those reported for the complete E protein gene. Based on the RFLP analysis of domain III using the restriction enzymes Nla III, Dde I and Cfo I, Den-4 viruses included in this study were clustered into the previously reported genotypes 1 and 2. Conclusions: Study results suggest that domain III may be used as a genetic marker for phylogenetic and molecular epidemiology studies of dengue viruses. An English version of this paper is also available at: http://www.insp.mx/salud/index.html PMID:12132320
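An RFLP pattern of the kind described above can be previewed in silico by locating recognition sites in an amplicon and listing the resulting fragment lengths. The sketch below is only an illustration: the recognition sequences and cut offsets in `ENZYMES` are the values commonly listed for these enzymes and should be verified against a supplier catalogue before use, the molecule is assumed to be linear, and the test sequence is made up rather than taken from a Den-4 genome.

```python
import re

# Recognition pattern and top-strand cut offset (assumed from common
# catalogue listings; verify before relying on them).
ENZYMES = {
    "NlaIII": ("CATG", 4),       # CATG^
    "DdeI": ("CT[ACGT]AG", 1),   # C^TNAG
    "CfoI": ("GCGC", 3),         # GCG^C
}


def digest(seq, enzyme):
    """Return fragment lengths, in 5'->3' order, for a linear sequence."""
    pattern, offset = ENZYMES[enzyme]
    seq = seq.upper()
    # A lookahead is used so that overlapping sites are all reported.
    cuts = sorted({m.start() + offset for m in re.finditer(f"(?={pattern})", seq)})
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


amplicon = "AAACATGTTTGCGCAA"  # hypothetical 16 bp sequence
for name in ENZYMES:
    print(name, digest(amplicon, name))
```

Two isolates whose fragment-length lists differ for any enzyme would show distinct banding patterns, which is the basis for grouping strains into genotypes in such an assay.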
GENOME-WIDE COMPARATIVE ANALYSIS OF PHYLOGENETIC TREES: THE PROKARYOTIC FOREST OF LIFE

PubMed Central

Puigbò, Pere; Wolf, Yuri I.; Koonin, Eugene V.

2013-01-01

Genome-wide comparison of phylogenetic trees is becoming an increasingly common approach in evolutionary genomics, and a variety of approaches for such comparison have been developed. In this article we present several methods for comparative analysis of large numbers of phylogenetic trees. To compare phylogenetic trees taking into account the bootstrap support for each internal branch, the Boot-Split Distance (BSD) method is introduced as an extension of the previously developed Split Distance (SD) method for tree comparison. The BSD method implements the straightforward idea that comparison of phylogenetic trees can be made more robust by treating tree splits differentially depending on the bootstrap support. Approaches are also introduced for detecting tree-like and net-like evolutionary trends in the phylogenetic Forest of Life (FOL), i.e., the entirety of the phylogenetic trees for conserved genes of prokaryotes. The principal method employed for this purpose includes mapping quartets of species onto trees to calculate the support of each quartet topology and so to quantify the tree and net contributions to the distances between species. We describe the application of these methods to analyze the FOL and the results obtained with these methods. These results support the concept of the Tree of Life (TOL) as a central evolutionary trend in the FOL as opposed to the traditional view of the TOL as a ‘species tree’. PMID:22399455
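The split-based comparison described above can be sketched by reducing each tree to its set of non-trivial bipartitions and measuring the fraction that differ. The code below is an illustration under stated assumptions: trees are given as nested tuples over the same taxon set, `split_distance` uses one common normalisation that may differ from the published SD definition, and `weighted_split_distance` only illustrates the idea of weighting splits by bootstrap support; it is not the published BSD formula.

```python
def leaves(node):
    """Leaf names under a node; a leaf is a string, an internal node a tuple."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def nontrivial_splits(tree):
    """Bipartitions from internal branches, stored as the side lacking the anchor taxon."""
    taxa = leaves(tree)
    anchor = min(taxa)
    found = set()
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue
        stack.extend(node)
        side = leaves(node)
        if anchor in side:
            side = taxa - side
        if 2 <= len(side) <= len(taxa) - 2:
            found.add(side)
    return found


def split_distance(tree1, tree2):
    """Fraction of splits not shared by the two trees (assumed normalisation)."""
    s1, s2 = nontrivial_splits(tree1), nontrivial_splits(tree2)
    total = len(s1) + len(s2)
    return len(s1 ^ s2) / total if total else 0.0


def weighted_split_distance(support1, support2):
    """Illustrative bootstrap weighting: each split counts by its support in [0, 1]."""
    total = sum(support1.values()) + sum(support2.values())
    differing = sum(v for s, v in support1.items() if s not in support2)
    differing += sum(v for s, v in support2.items() if s not in support1)
    return differing / total if total else 0.0


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(split_distance(t1, t2))

# Hypothetical supports: the conflicting splits are only weakly supported.
sup1 = {frozenset("DE"): 1.0, frozenset("CDE"): 0.5}
sup2 = {frozenset("DE"): 1.0, frozenset("BDE"): 0.5}
print(weighted_split_distance(sup1, sup2))
```

In the toy example the weighted distance is smaller than the unweighted one because the two conflicting splits carry low support, which is the kind of robustness the abstract attributes to bootstrap-aware comparison.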

Genome-wide comparative analysis of phylogenetic trees: the prokaryotic forest of life.

PubMed

Puigbò, Pere; Wolf, Yuri I; Koonin, Eugene V

2012-01-01

Genome-wide comparison of phylogenetic trees is becoming an increasingly common approach in evolutionary genomics, and a variety of approaches for such comparison have been developed. In this article, we present several methods for comparative analysis of large numbers of phylogenetic trees. To compare phylogenetic trees taking into account the bootstrap support for each internal branch, the Boot-Split Distance (BSD) method is introduced as an extension of the previously developed Split Distance method for tree comparison. The BSD method implements the straightforward idea that comparison of phylogenetic trees can be made more robust by treating tree splits differentially depending on the bootstrap support. Approaches are also introduced for detecting tree-like and net-like evolutionary trends in the phylogenetic Forest of Life (FOL), i.e., the entirety of the phylogenetic trees for conserved genes of prokaryotes. The principal method employed for this purpose includes mapping quartets of species onto trees to calculate the support of each quartet topology and so to quantify the tree and net contributions to the distances between species. We describe the application of these methods to analyze the FOL and the results obtained with these methods. These results support the concept of the Tree of Life (TOL) as a central evolutionary trend in the FOL as opposed to the traditional view of the TOL as a "species tree."
Phylogenetic position of the genus Perkinsus (Protista, Apicomplexa) based on small subunit ribosomal RNA.

PubMed

Goggin, C L; Barker, S C

1993-07-01

Parasites of the genus Perkinsus destroy marine molluscs worldwide. Their phylogenetic position within the kingdom Protista is controversial. Nucleotide sequence data (1792 bp) from the small subunit rRNA gene of Perkinsus sp. from Anadara trapezia (Mollusca: Bivalvia) from Moreton Bay, Queensland, was used to examine the phylogenetic affinities of this enigmatic genus. These data were aligned with nucleotide sequences from 6 apicomplexans, 3 ciliates, 3 flagellates, a dinoflagellate, 3 fungi, maize and human. Phylogenetic trees were constructed after analysis with maximum parsimony and distance matrix methods. Our analyses indicate that Perkinsus is phylogenetically closer to dinoflagellates and to coccidean and piroplasm apicomplexans than to fungi or flagellates.
Genomic analysis of carboxyl/cholinesterase genes in the silkworm Bombyx mori

PubMed Central

2010-01-01

Background Carboxyl/cholinesterases (CCEs) have pivotal roles in dietary detoxification, pheromone or hormone degradation and neurodevelopment. The recent completion of genome projects in various insect species has led to the identification of multiple CCEs with unknown functions. Here, we analyzed the phylogeny, expression and genomic distribution of 69 putative CCEs in the silkworm, Bombyx mori (Lepidoptera: Bombycidae). Results A phylogenetic tree of CCEs in B. mori and other lepidopteran species was constructed. The expression pattern of each B. mori CCE was also investigated by a search of an expressed sequence tag (EST) database, and the relationship between phylogeny and expression was analyzed. A large number of B. mori CCEs were identified from a midgut EST library. CCEs expressed in the midgut formed a cluster in the phylogenetic tree that included not only B. mori genes but also those of other lepidopteran species. The silkworm, and possibly also other lepidopteran species, has a large number of CCEs, and this might be a consequence of the large cluster of midgut CCEs. Investigation of intron-exon organization in B. mori CCEs revealed that their positions and splicing site phases were strongly conserved. Several B. mori CCEs, including juvenile hormone esterase, not only showed clustering in the phylogenetic tree but were also closely located on silkworm chromosomes. We investigated the phylogeny and microsynteny of neuroligins in detail, among many CCEs. Interestingly, we found the evolution of this gene appeared not to be conserved between B. mori and other insect orders. Conclusions We analyzed 69 putative CCEs from B. mori. Comparison of these CCEs with other lepidopteran CCEs indicated that they had conserved expression and function in this insect order. The analyses showed that CCEs were unevenly distributed across the genome of B. mori and suggested that neuroligins may have an evolutionary history distinct from that in other insect orders. It is possible that such an uneven genomic distribution and a unique neuroligin evolution are shared with other lepidopteran insects. Our genomic analysis has provided novel information on the CCEs of the silkworm, which will be of value to understanding the biology, physiology and evolution of insect CCEs. PMID:20546589
Phylogenetic resolution and habitat specificity of members of the Photobacterium phosphoreum species group.

PubMed

Ast, Jennifer C; Dunlap, Paul V

2005-10-01

Substantial ambiguity exists regarding the phylogenetic status of facultatively psychrophilic luminous bacteria identified as Photobacterium phosphoreum, a species thought to be widely distributed in the world's oceans and believed to be the specific bioluminescent light-organ symbiont of several deep-sea fishes. Members of the P. phosphoreum species group include luminous and non-luminous strains identified phenotypically from a variety of different habitats as well as phylogenetically defined lineages that appear to be evolutionarily distinct. To resolve this ambiguity and to begin developing a meaningful knowledge of the geographic distributions, habitats and symbiotic relationships of bacteria in the P. phosphoreum species group, we carried out a multilocus, fine-scale phylogenetic analysis based on sequences of the 16S rRNA, gyrB and luxABFE genes of many newly isolated luminous strains from symbiotic and saprophytic habitats, together with previously isolated luminous and non-luminous strains identified as P. phosphoreum from these and other habitats. Parsimony analysis unambiguously resolved three evolutionarily distinct clades, phosphoreum, iliopiscarium and kishitanii. The tight phylogenetic clustering within these clades and the distinct separation between them indicates they are different species, P. phosphoreum, Photobacterium iliopiscarium and the newly recognized 'Photobacterium kishitanii'. Previously reported non-luminous strains, which had been identified phenotypically as P. phosphoreum, resolved unambiguously as P. iliopiscarium, and all examined deep-sea fishes (specimens of families Chlorophthalmidae, Macrouridae, Moridae, Trachichthyidae and Acropomatidae) were found to harbour 'P. kishitanii', not P. phosphoreum, in their light organs. This resolution revealed also that 'P. kishitanii' is cosmopolitan in its geographic distribution. Furthermore, the lack of phylogenetic variation within 'P. kishitanii' indicates that this facultatively symbiotic bacterium is not cospeciating with its phylogenetically divergent host fishes. The results of this fine-scale phylogenetic analysis support the emerging view that bacterial species names should designate singular historical entities, i.e. discrete lineages diagnosed by a significant divergence of shared derived nucleotide characters.
Proteinortho: Detection of (Co-)orthologs in large-scale analysis

PubMed Central

2011-01-01

Background Orthology analysis is an important part of data analysis in many areas of bioinformatics such as comparative genomics and molecular phylogenetics. The ever-increasing flood of sequence data, and hence the rapidly increasing number of genomes that can be compared simultaneously, calls for efficient software tools as brute-force approaches with quadratic memory requirements become infeasible in practise. The rapid pace at which new data become available, furthermore, makes it desirable to compute genome-wide orthology relations for a given dataset rather than relying on relations listed in databases. Results The program Proteinortho described here is a stand-alone tool that is geared towards large datasets and makes use of distributed computing techniques when run on multi-core hardware. It implements an extended version of the reciprocal best alignment heuristic. We apply Proteinortho to compute orthologous proteins in the complete set of all 717 eubacterial genomes available at NCBI at the beginning of 2009. We identified thirty proteins present in 99% of all bacterial proteomes. Conclusions Proteinortho significantly reduces the required amount of memory for orthology analysis compared to existing tools, allowing such computations to be performed on off-the-shelf hardware. PMID:21526987
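The reciprocal best alignment heuristic mentioned above can be illustrated in its plain form: two proteins are paired when each is the other's highest-scoring hit. The sketch below shows only this basic heuristic, not the extended version implemented in Proteinortho; the function names, the score dictionaries and the protein identifiers are hypothetical.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    `scores` maps (query, subject) pairs to an alignment score such as a
    bit score; ties keep the first pair encountered.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward, reverse = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores between proteomes A and B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b2"): 90, ("a3", "b1"): 120}
b_vs_a = {("b1", "a1"): 195, ("b2", "a1"): 160, ("b2", "a2"): 80}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Here a2 and b2 are not paired because b2's best hit is a1, which shows why strict reciprocity can miss co-orthologs and why extensions of the heuristic are of interest for large datasets.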
Lactobacillus crustorum sp. nov., isolated from two traditional Belgian wheat sourdoughs.

PubMed

Scheirlinck, Ilse; Van der Meulen, Roel; Van Schoor, Ann; Huys, Geert; Vandamme, Peter; De Vuyst, Luc; Vancanneyt, Marc

2007-07-01

A polyphasic taxonomic study of the lactic acid bacteria (LAB) population in three traditional Belgian sourdoughs, sampled between 2002 and 2004, revealed a group of isolates that could not be assigned to any recognized LAB species. Initially, sourdough isolates were screened by means of (GTG)5-PCR fingerprinting. Four isolates displaying unique (GTG)5-PCR patterns were further investigated by means of phenylalanyl-tRNA synthase (pheS) gene sequence analysis and represented a bifurcated branch that could not be allocated to any LAB species present in the in-house pheS database. Their phylogenetic affiliation was determined using 16S rRNA gene sequence analysis and showed that the four sourdough isolates belong to the Lactobacillus plantarum group with Lactobacillus mindensis, Lactobacillus farciminis and Lactobacillus nantensis as closest relatives. Further genotypic and phenotypic studies, including whole-cell protein analysis (SDS-PAGE), amplified fragment length polymorphism (AFLP) fingerprinting, DNA-DNA hybridization, DNA G+C content analysis, growth characteristics and biochemical features, demonstrated that the new sourdough isolates represent a novel Lactobacillus species for which the name Lactobacillus crustorum sp. nov. is proposed. The type strain of the new species is LMG 23699T (=CCUG 53174T).
Evidence for a close phylogenetic relationship between Melissococcus pluton, the causative agent of European foulbrood disease, and the genus Enterococcus.

PubMed

Cai, J; Collins, M D

1994-04-01

The 16S rRNA gene sequence of Melissococcus pluton, the causative agent of European foulbrood disease, was determined in order to investigate the phylogenetic relationships between this organism and other low-G + C-content gram-positive bacteria. A comparative sequence analysis revealed that M. pluton is a close phylogenetic relative of the genus Enterococcus.
Aujeszky's disease in red fox (Vulpes vulpes): phylogenetic analysis unravels an unexpected epidemiologic link.

PubMed

Caruso, Claudio; Dondo, Alessandro; Cerutti, Francesco; Masoero, Loretta; Rosamilia, Alfonso; Zoppi, Simona; D'Errico, Valeria; Grattarola, Carla; Acutis, Pier Luigi; Peletto, Simone

2014-07-01

We describe Aujeszky's disease in a female red fox (Vulpes vulpes). Although wild boar (Sus scrofa) would be the expected source of infection, phylogenetic analysis suggested a domestic rather than a wild source of virus, underscoring the importance of biosecurity measures in pig farms to prevent contact with wild animals.
Isolation and Phylogenetic Analysis of Sindbis Viruses from Mosquitoes in Germany

PubMed Central

Jöst, Hanna; Bialonski, Alexandra; Storch, Volker; Günther, Stephan; Becker, Norbert; Schmidt-Chanasit, Jonas

2010-01-01

A molecular survey of 16,057 mosquitoes captured in Southwest Germany during the summer of 2009 demonstrated the presence of Sindbis virus (SINV) in Culex spp. and Anopheles maculipennis sensu lato. Phylogenetic analysis of the German SINV strains linked them with Swedish SINV strains, the causative agent of Ockelbo disease in humans. PMID:20335414
rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison.

PubMed

Hahn, Lars; Leimeister, Chris-André; Ounit, Rachid; Lonardi, Stefano; Morgenstern, Burkhard

2016-10-01

Many algorithms for sequence analysis rely on word matching or word statistics. Often, these approaches can be improved if binary patterns representing match and don't-care positions are used as a filter, such that only those positions of words are considered that correspond to the match positions of the patterns. The performance of these approaches, however, depends on the underlying patterns. Herein, we show that the overlap complexity of a pattern set that was introduced by Ilie and Ilie is closely related to the variance of the number of matches between two evolutionarily related sequences with respect to this pattern set. We propose a modified hill-climbing algorithm to optimize pattern sets for database searching, read mapping and alignment-free sequence comparison of nucleic-acid sequences; our implementation of this algorithm is called rasbhari. Depending on the application at hand, rasbhari can either minimize the overlap complexity of pattern sets, maximize their sensitivity in database searching or minimize the variance of the number of pattern-based matches in alignment-free sequence comparison. We show that, for database searching, rasbhari generates pattern sets with slightly higher sensitivity than existing approaches. In our Spaced Words approach to alignment-free sequence comparison, pattern sets calculated with rasbhari led to more accurate estimates of phylogenetic distances than the randomly generated pattern sets that we previously used. Finally, we used rasbhari to generate patterns for short read classification with CLARK-S. Here too, the sensitivity of the results could be improved, compared to the default patterns of the program. We integrated rasbhari into Spaced Words; the source code of rasbhari is freely available at http://rasbhari.gobics.de/.
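The idea of a binary pattern with match and don't-care positions, as described in the abstract above, can be illustrated with a small sketch. It is not rasbhari or Spaced Words code: the function names are hypothetical, and the convention that "1" marks a match position and "0" a don't-care position is an assumption made for this example.

```python
from collections import Counter


def spaced_words(seq, pattern):
    """Count the spaced words of `seq` under a binary `pattern`.

    Only characters at match positions ("1") are kept; characters at
    don't-care positions ("0") are ignored.
    """
    care = [i for i, flag in enumerate(pattern) if flag == "1"]
    span = len(pattern)
    return Counter(
        "".join(seq[start + i] for i in care)
        for start in range(len(seq) - span + 1)
    )


def count_matches(seq_a, seq_b, pattern):
    """Number of window pairs (one per sequence) that agree at all match positions."""
    words_a = spaced_words(seq_a, pattern)
    words_b = spaced_words(seq_b, pattern)
    return sum(count * words_b[word] for word, count in words_a.items())


# Two toy sequences that differ by a single substitution at the third position.
seq_a = "ACGTAC"
seq_b = "ACCTAC"

spaced = count_matches(seq_a, seq_b, "1101")      # third position is don't-care
contiguous = count_matches(seq_a, seq_b, "1111")  # ordinary 4-mers
print(spaced, contiguous)
```

With the don't-care position the first windows of the two sequences still match despite the substitution, whereas no contiguous 4-mer is shared.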
Characterization of Microbial Communities Associated With Deep-Sea Hydrothermal Vent Animals of the East Pacific Rise and the Galápagos Rift

NASA Astrophysics Data System (ADS)

Ward, N.; Page, S.; Heidelberg, J.; Eisen, J. A.; Fraser, C. M.

2002-12-01

The composition of microbial communities associated with deep-sea hydrothermal vent animals is of interest because of the key role of bacterial symbionts in driving the chemosynthetic food chain of the vent system, and also because bacterial biofilms attached to animal exterior surfaces may play a part in settlement of larval forms. Sequence analysis of 16S ribosomal RNA (rRNA) genes from such communities provides a snapshot of community structure, as this gene is present in all Bacteria and Archaea, and a useful phylogenetic marker for both cultivated microbial species, and uncultivated species such as many of those found in the deep-sea environment. Specimens of giant tube worms (Riftia pachyptila), mussels (Bathymodiolus thermophilus), and clams (Calyptogena magnifica) were collected during the 2002 R/V Atlantis research cruises to the East Pacific Rise (9N) and Galápagos Rift. Microbial biofilms attached to the exterior surfaces of individual animals were sampled, as were tissues known to harbor chemosynthetic bacterial endosymbionts. Genomic DNA was extracted from the samples using a commercially available kit, and 16S rRNA genes amplified from the mixed bacterial communities using the polymerase chain reaction (PCR) and oligonucleotide primers targeting conserved terminal regions of the 16S rRNA gene. The PCR products obtained were cloned into a plasmid vector and the recombinant plasmids transformed into cells of Escherichia coli. Individual cloned 16S rRNA genes were sequenced at the 5' end of the gene (the most phylogenetically informative region in most taxa) and the sequence data compared to publicly available gene sequence databases, to allow a preliminary assignment of clones to taxonomic groups within the Bacteria and Archaea, and to determine the overall composition and phylogenetic diversity of the animal-associated microbial communities. 
Analysis of Riftia pachyptila exterior biofilm samples revealed the presence of members of the delta and epsilon proteobacteria, low GC Gram positive bacteria (firmicutes), spirochetes, CFB (Cytophaga-Flavobacterium-Bacteroides) group, green nonsulfur bacteria, acidobacteria, verrucomicrobia, and planctomycetes. The presence of the latter three taxonomic groups is of special interest, as they represent phylogenetically distinct groups within the Bacteria for which specific ecological functions have not yet been identified, but which have been found to be widely distributed and often numerically significant in diverse terrestrial and aquatic habitats. Although further sequencing is required to demonstrate the presence of a Riftia-associated microbial population distinct from that of the surrounding seawater, results available from three Riftia individuals from the East Pacific Rise suggest this to be the case. Analysis of microbial communities associated with the gill tissue of the mussel Bathymodiolus thermophilus shows a population dominated by gamma-Proteobacterial chemoautotrophic symbionts, although lower frequency novel phylotypes have been detected. Representatives of specific taxonomic groups have been selected for sequencing of the complete 16S rRNA gene, and the sequences used to reconstruct phylogenetic trees to more accurately determine the evolutionary relationships between the novel sequences, and available sequences for both cultured and non-cultured bacteria.
Comparison of multilocus sequence typing and pulsed-field gel electrophoresis for Salmonella spp. identification in surface water

NASA Astrophysics Data System (ADS)

Kuo, Chun Wei; Hao Huang, Kuan; Hsu, Bing Mu; Tsai, Hsien Lung; Tseng, Shao Feng; Kao, Po Min; Shen, Shu Min; Chou Chiu, Yi; Chen, Jung Sheng

2013-04-01

Salmonella is one of the most important waterborne pathogens, with outbreaks from contaminated water reported worldwide. In addition, Salmonella spp. can survive for long periods in aquatic environments. To characterize the genotypes and serovars of Salmonella in aquatic environments, we isolated Salmonella strains on selective culture plates, identified their serovars by serological assay, and identified their genotypes by multilocus sequence typing (MLST) based on the sequence data from University College Cork (UCC). Salmonella was confirmed in 36 stream water samples (30.1%) and 18 drinking water samples (23.3%) using the culture method combined with PCR amplification of the specific invA gene. In this study, 24 cultured isolates of Salmonella from water samples were classified into fifteen Salmonella enterica serovars. In addition, we performed phylogenetic analysis using a phylogenetic tree and the minimum spanning tree (MST) method to analyze the relationship among clinical, environmental, and geographical data. The phylogenetic tree showed four main clusters, and our strains were distributed across all of them. The genotypes of isolates from stream water were more diverse than those of Salmonella strains from drinking water sources. According to the MST data, there was a positive correlation between the serovars and genotypes of Salmonella. Previous studies revealed that the results of pulsed-field gel electrophoresis (PFGE) can predict the serovars of Salmonella strains. Hence, we used the MLST data combined with phylogenetic analysis to identify the serovars of Salmonella strains, and this approach proved effective. When the geographical data were combined with phylogenetic analysis, the results showed that the dominant strains were present throughout the stream area in the rainy season. Keywords: Salmonella spp., MLST, phylogenetic analysis, PFGE
Study of Clinical Survival and Gene Expression in a Sample of Pancreatic Ductal Adenocarcinoma by Parsimony Phylogenetic Analysis.

PubMed

Nalbantoglu, Sinem; Abu-Asab, Mones; Tan, Ming; Zhang, Xuemin; Cai, Ling; Amri, Hakima

2016-07-01

Pancreatic ductal adenocarcinoma (PDAC) is one of the rapidly growing forms of pancreatic cancer, with a poor prognosis and a 5-year survival rate of less than 5%. In this study, we characterized the genetic signatures and signaling pathways related to survival from PDAC, using a parsimony phylogenetic algorithm. We applied the parsimony phylogenetic algorithm to analyze the publicly available whole-genome in silico array analysis of a gene expression data set in 25 early-stage human PDAC specimens. We explain here that parsimony phylogenetics is an evolutionary analytical method that offers important promise to uncover clonal (driver) and nonclonal (passenger) aberrations in complex diseases. In our analysis, parsimony and statistical analyses did not identify significant correlations between survival times and gene expression values. Thus, the survival rankings did not appear to be significantly different between patients for any specific gene (p > 0.05). Also, we did not find a correlation between gene expression data and tumor stage in the present data set. While the present analysis was unable to identify in this relatively small sample of patients a molecular signature associated with pancreatic cancer prognosis, we suggest that future research and analyses with the parsimony phylogenetic algorithm in larger patient samples are worthwhile, given the devastating nature of pancreatic cancer and its early diagnosis, and the need for novel data analytic approaches. Future research practices might want to place greater emphasis on phylogenetics as one of the analytical paradigms, as our findings presented here are on the cusp of this shift, especially in the current era of Big Data and innovation policies advocating for greater data sharing and reanalysis.
A Deliberate Practice Approach to Teaching Phylogenetic Analysis

PubMed Central

Hobbs, F. Collin; Johnson, Daniel J.; Kearns, Katherine D.

2013-01-01

One goal of postsecondary education is to assist students in developing expert-level understanding. Previous attempts to encourage expert-level understanding of phylogenetic analysis in college science classrooms have largely focused on isolated, or “one-shot,” in-class activities. Using a deliberate practice instructional approach, we designed a set of five assignments for a 300-level plant systematics course that incrementally introduces the concepts and skills used in phylogenetic analysis. In our assignments, students learned the process of constructing phylogenetic trees through a series of increasingly difficult tasks; thus, skill development served as a framework for building content knowledge. We present results from 5 yr of final exam scores, pre- and postconcept assessments, and student surveys to assess the impact of our new pedagogical materials on student performance related to constructing and interpreting phylogenetic trees. Students improved in their ability to interpret relationships within trees and improved in several aspects related to between-tree comparisons and tree construction skills. Student feedback indicated that most students believed our approach prepared them to engage in tree construction and gave them confidence in their abilities. Overall, our data confirm that instructional approaches implementing deliberate practice address student misconceptions, improve student experiences, and foster deeper understanding of difficult scientific concepts. PMID:24297294
DendroBLAST: approximate phylogenetic trees in the absence of multiple sequence alignments.

PubMed

Kelly, Steven; Maini, Philip K

2013-01-01

The rapidly growing availability of genome information has created considerable demand for both fast and accurate phylogenetic inference algorithms. We present a novel method called DendroBLAST for reconstructing phylogenetic dendrograms/trees from protein sequences using BLAST. This method differs from other methods by incorporating a simple model of sequence evolution to test the effect of introducing sequence changes on the reliability of the bipartitions in the inferred tree. Using realistic simulated sequence data we demonstrate that this method produces phylogenetic trees that are more accurate than other commonly-used distance based methods though not as accurate as maximum likelihood methods from good quality multiple sequence alignments. In addition to tests on simulated data, we use DendroBLAST to generate input trees for a supertree reconstruction of the phylogeny of the Archaea. This independent analysis produces an approximate phylogeny of the Archaea that has both high precision and recall when compared to previously published analysis of the same dataset using conventional methods. Taken together these results demonstrate that approximate phylogenetic trees can be produced in the absence of multiple sequence alignments, and we propose that these trees will provide a platform for improving and informing downstream bioinformatic analysis. A web implementation of the DendroBLAST method is freely available for use at http://www.dendroblast.com/.
Phylogenetic Information Content of Copepoda Ribosomal DNA Repeat Units: ITS1 and ITS2 Impact

PubMed Central

Zagoskin, Maxim V.; Lazareva, Valentina I.; Grishanin, Andrey K.; Mukha, Dmitry V.

2014-01-01

The utility of various regions of the ribosomal repeat unit for phylogenetic analysis was examined in 16 species representing four families, nine genera, and two orders of the subclass Copepoda (Crustacea). Fragments approximately 2000 bp in length containing the ribosomal DNA (rDNA) 18S and 28S gene fragments, the 5.8S gene, and the internal transcribed spacer regions I and II (ITS1 and ITS2) were amplified and analyzed. The DAMBE (Data Analysis in Molecular Biology and Evolution) software was used to analyze the saturation of nucleotide substitutions; this test revealed the suitability of both the 28S gene fragment and the ITS1/ITS2 rDNA regions for the reconstruction of phylogenetic trees. Distance (minimum evolution) and probabilistic (maximum likelihood, Bayesian) analyses of the data revealed that the 28S rDNA and the ITS1 and ITS2 regions are informative markers for inferring phylogenetic relationships among families of copepods and within the Cyclopidae family and associated genera. Split-graph analysis of concatenated ITS1/ITS2 rDNA regions of cyclopoid copepods suggested that the Mesocyclops, Thermocyclops, and Macrocyclops genera share complex evolutionary relationships. This study revealed that the ITS1 and ITS2 regions potentially represent different phylogenetic signals. PMID:25215300
Viral taxonomy needs a spring clean; its exploration era is over.

PubMed

Gibbs, Adrian J

2013-08-09

The International Committee on Taxonomy of Viruses has recently changed its approved definition of a viral species, and also discontinued work on its database of virus descriptions. These events indicate that the exploration era of viral taxonomy has ended; over the past century the principles of viral taxonomy have been established, the tools for phylogenetic inference invented, and the ultimate discriminatory data required for taxonomy, namely gene sequences, are now readily available. Further changes would make viral taxonomy more informative. First, the status of a 'taxonomic species' with an italicized name should only be given to viruses that are specifically linked with a single 'type genomic sequence' like those in the NCBI Reference Sequence Database. Secondly all approved taxa should be predominately monophyletic, and uninformative higher taxa disendorsed. These are 'quality assurance' measures and would improve the value of viral nomenclature to its users. The ICTV should also promote the use of a public database, such as Wikipedia, to replace the ICTV database as a store of the primary metadata of individual viruses, and should publish abstracts of the ICTV Reports in that database, so that they are 'Open Access'.
RECOVIR Software for Identifying Viruses

NASA Technical Reports Server (NTRS)

Chakravarty, Sugoto; Fox, George E.; Zhu, Dianhui

2013-01-01

Most single-stranded RNA (ssRNA) viruses mutate rapidly to generate a large number of strains with highly divergent capsid sequences. Determining the capsid residues or nucleotides that uniquely characterize these strains is critical in understanding the strain diversity of these viruses. RECOVIR (an acronym for "recognize viruses") software predicts the strains of some ssRNA viruses from their limited sequence data. Novel phylogenetic-tree-based databases of protein or nucleic acid residues that uniquely characterize these virus strains are created. Strains of input virus sequences (partial or complete) are predicted through residue-wise comparisons with the databases. RECOVIR uses unique characterizing residues to identify automatically strains of partial or complete capsid sequences of picorna and caliciviruses, two of the most highly diverse ssRNA virus families. Partition-wise comparisons of the database residues with the corresponding residues of more than 300 complete and partial sequences of these viruses resulted in correct strain identification for all of these sequences. This study shows the feasibility of creating databases of hitherto unknown residues uniquely characterizing the capsid sequences of two of the most highly divergent ssRNA virus families. These databases enable automated strain identification from partial or complete capsid sequences of these human and animal pathogens.
GPCRdb: an information system for G protein-coupled receptors

PubMed Central

Isberg, Vignir; Mordalski, Stefan; Munk, Christian; Rataj, Krzysztof; Harpsøe, Kasper; Hauser, Alexander S.; Vroling, Bas; Bojarski, Andrzej J.; Vriend, Gert; Gloriam, David E.

2016-01-01

Recent developments in G protein-coupled receptor (GPCR) structural biology and pharmacology have greatly enhanced our knowledge of receptor structure-function relations, and have helped improve the scientific foundation for drug design studies. The GPCR database, GPCRdb, serves a dual role in disseminating and enabling new scientific developments by providing reference data, analysis tools and interactive diagrams. This paper highlights new features in the fifth major GPCRdb release: (i) GPCR crystal structure browsing, superposition and display of ligand interactions; (ii) direct deposition by users of point mutations and their effects on ligand binding; (iii) refined snake and helix box residue diagram looks; and (iv) phylogenetic trees with receptor classification colour schemes. Under the hood, the entire GPCRdb front- and back-ends have been re-coded within one infrastructure, ensuring a smooth browsing experience and development. GPCRdb is available at http://www.gpcrdb.org/ and its open source code at https://bitbucket.org/gpcr/protwis. PMID:26582914
Proteome Studies of Filamentous Fungi

DOE Office of Scientific and Technical Information (OSTI.GOV)

Baker, Scott E.; Panisko, Ellen A.

2011-04-20

The continued fast pace of fungal genome sequence generation has enabled proteomic analysis of a wide variety of organisms that span the breadth of the Kingdom Fungi. There is some phylogenetic bias to the current catalog of fungi with reasonable DNA sequence databases (genomic or EST) that could be analyzed at a global proteomic level. However, the rapid development of next generation sequencing platforms has lowered the cost of genome sequencing such that in the near future, having a genome sequence will no longer be a time or cost bottleneck for downstream proteomic (and transcriptomic) analyses. High throughput, non-gel based proteomics offers a snapshot of proteins present in a given sample at a single point in time. There are a number of different variations on the general method and technologies for identifying peptides in a given sample. We present a method that can serve as a “baseline” for proteomic studies of fungi.

Comparative methods for the analysis of gene-expression evolution: an example using yeast functional genomic data.

PubMed

Oakley, Todd H; Gu, Zhenglong; Abouheif, Ehab; Patel, Nipam H; Li, Wen-Hsiung

2005-01-01

Understanding the evolution of gene function is a primary challenge of modern evolutionary biology. Despite an expanding database from genomic and developmental studies, we are lacking quantitative methods for analyzing the evolution of some important measures of gene function, such as gene-expression patterns. Here, we introduce phylogenetic comparative methods to compare different models of gene-expression evolution in a maximum-likelihood framework. We find that expression of duplicated genes has evolved according to a nonphylogenetic model, where closely related genes are no more likely than more distantly related genes to share common expression patterns. These results are consistent with previous studies that found rapid evolution of gene expression during the history of yeast. The comparative methods presented here are general enough to test a wide range of evolutionary hypotheses using genomic-scale data from any organism.
Liberomyces gen. nov. with two new species of endophytic coelomycetes from broadleaf trees.

PubMed

Pazoutová, Sylvie; Srutka, Petr; Holusa, Jaroslav; Chudícková, Milada; Kubátová, Alena; Kolarík, Miroslav

2012-01-01

During a study of endophytic and saprotrophic fungi in the sapwood and phloem of broadleaf trees (Salix alba, Quercus robur, Ulmus laevis, Alnus glutinosa, Betula pendula) fungi belonging to an anamorphic coelomycetous genus not attributable to a described taxon were detected and isolated in pure culture. The new genus, Liberomyces, with two species, L. saliciphilus and L. macrosporus, is described. Both species have subglobose conidiomata containing holoblastic sympodial conidiogenous cells. The conidiomata dehisce irregularly or by ostiole and secrete a slimy suspension of conidia. The conidia are hyaline, narrowly allantoid with a typically curved distal end. In L. macrosporus simultaneous production of synanamorph with thin filamentous conidia was observed occasionally. The genus has no known teleomorph. Related sequences in the public databases belong to endophytes of angiosperms. Phylogenetic analysis revealed a position close to the Xylariales (Sordariomycetes), but family and order affiliation remained unclear.
Opportunities and challenges for digital morphology

PubMed Central

2010-01-01

Advances in digital data acquisition, analysis, and storage have revolutionized the work in many biological disciplines such as genomics, molecular phylogenetics, and structural biology, but have not yet found satisfactory acceptance in morphology. Improvements in non-invasive imaging and three-dimensional visualization techniques, however, permit high-throughput analyses also of whole biological specimens, including museum material. These developments pave the way towards a digital era in morphology. Using sea urchins (Echinodermata: Echinoidea), we provide examples illustrating the power of these techniques. However, remote visualization, the creation of a specialized database, and the implementation of standardized, world-wide accepted data deposition practices prior to publication are essential to cope with the foreseeable exponential increase in digital morphological data. Reviewers This article was reviewed by Marc D. Sutton (nominated by Stephan Beck), Gonzalo Giribet (nominated by Lutz Walter), and Lennart Olsson (nominated by Purificación López-García). PMID:20604956
Prevalence of group A genotype human rotavirus among children with diarrhea in Thailand, 2009-2011.

PubMed

Maiklang, Ornwalan; Vutithanachot, Viboonsak; Vutithanachot, Chanpim; Hacharoen, Pitchaya; Chieochansin, Thaweesak; Poovorawan, Yong

2012-07-01

Rotavirus is the most common cause of severe diarrhea in infants and young children world-wide, with the highest fatality rate in developing countries. We investigated the presence and seasonal distribution of group A rotavirus infection among Thai children. The data will be used for vaccine development. Samples were collected from infants and children with acute gastroenteritis or diarrhea admitted to two hospitals between June 2009 and May 2011. Group A rotaviruses were detected in 250 (44.5%) of 562 specimens by RT-PCR. The most prevalent genotype was G3P[8] (60.4%) followed by G1P[8] (39.2%) and G2P[4] (0.4%). The specimens were subjected to phylogenetic analysis based on the VP7 and VP4 genes. We examined the rotavirus genotypes and compared them with data from the GenBank database.
Proteome studies of filamentous fungi.

PubMed

Baker, Scott E; Panisko, Ellen A

2011-01-01

The continued fast pace of fungal genome sequence generation has enabled proteomic analysis of a wide variety of organisms that span the breadth of the Kingdom Fungi. There is some phylogenetic bias to the current catalog of fungi with reasonable DNA sequence databases (genomic or EST) that could be analyzed at a global proteomic level. However, the rapid development of next generation sequencing platforms has lowered the cost of genome sequencing such that in the near future, having a genome sequence will no longer be a time or cost bottleneck for downstream proteomic (and transcriptomic) analyses. High throughput, nongel-based proteomics offers a snapshot of proteins present in a given sample at a single point in time. There are a number of variations on the general methods and technologies for identifying peptides in a given sample. We present a method that can serve as a "baseline" for proteomic studies of fungi.
Synthesizing and databasing fossil calibrations: divergence dating and beyond

PubMed Central

Ksepka, Daniel T.; Benton, Michael J.; Carrano, Matthew T.; Gandolfo, Maria A.; Head, Jason J.; Hermsen, Elizabeth J.; Joyce, Walter G.; Lamm, Kristin S.; Patané, José S. L.; Phillips, Matthew J.; Polly, P. David; Van Tuinen, Marcel; Ware, Jessica L.; Warnock, Rachel C. M.; Parham, James F.

2011-01-01

Divergence dating studies, which combine temporal data from the fossil record with branch length data from molecular phylogenetic trees, represent a rapidly expanding approach to understanding the history of life. National Evolutionary Synthesis Center hosted the first Fossil Calibrations Working Group (3–6 March, 2011, Durham, NC, USA), bringing together palaeontologists, molecular evolutionists and bioinformatics experts to present perspectives from disciplines that generate, model and use fossil calibration data. Presentations and discussions focused on channels for interdisciplinary collaboration, best practices for justifying, reporting and using fossil calibrations and roadblocks to synthesis of palaeontological and molecular data. Bioinformatics solutions were proposed, with the primary objective being a new database for vetted fossil calibrations with linkages to existing resources, targeted for a 2012 launch. PMID:21525049
YBYRÁ facilitates comparison of large phylogenetic trees.

PubMed

Machado, Denis Jacob

2015-07-01

The number and size of tree topologies that are being compared by phylogenetic systematists is increasing due to technological advancements in high-throughput DNA sequencing. However, we still lack tools to facilitate comparison among phylogenetic trees with a large number of terminals. The "YBYRÁ" project integrates software solutions for data analysis in phylogenetics. It comprises tools for (1) topological distance calculation based on the number of shared splits or clades, (2) sensitivity analysis and automatic generation of sensitivity plots and (3) clade diagnoses based on different categories of synapomorphies. YBYRÁ also provides (4) an original framework to facilitate the search for potential rogue taxa based on how much they affect average matching split distances (using MSdist). YBYRÁ facilitates comparison of large phylogenetic trees and outperforms competing software in terms of usability and time efficiency, especially for large data sets. The programs that comprise this toolkit are written in Python, hence they do not require installation and have minimum dependencies. The entire project is available under an open-source licence at http://www.ib.usp.br/grant/anfibios/researchSoftware.html .
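Item (1) of the abstract above, a topological distance based on the number of shared clades, can be illustrated with a short sketch. This is not YBYRÁ's code or interface: the nested-tuple tree encoding, the function names, and the choice to report the symmetric difference of clade sets are all assumptions made for this example.

```python
def clades(tree):
    """Return (all leaves, set of non-root clades) for a rooted nested-tuple tree.

    Leaves are strings; internal nodes are tuples of children.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is common to every tree on these taxa
    return all_leaves, found


def clade_distance(tree_a, tree_b):
    """Return (number of shared clades, number of clades found in only one tree)."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must have the same terminals")
    return len(clades_a & clades_b), len(clades_a ^ clades_b)


# Two hypothetical rooted trees on the same five terminals.
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))

shared, differing = clade_distance(tree_1, tree_2)
print(shared, differing)
```

Here the trees agree on the clades {A, B, C} and {D, E} and disagree only on which pair is nested inside {A, B, C}.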
SICLE: a high-throughput tool for extracting evolutionary relationships from phylogenetic trees.

PubMed

DeBlasio, Dan F; Wisecaver, Jennifer H

2016-01-01

We present the phylogeny analysis software SICLE (Sister Clade Extractor), an easy-to-use, high-throughput tool to describe the nearest neighbors to a node of interest in a phylogenetic tree as well as the support value for the relationship. The application is a command line utility that can be embedded into a phylogenetic analysis pipeline or can be used as a subroutine within another C++ program. As a test case, we applied this new tool to the published phylome of Salinibacter ruber, a species of halophilic Bacteroidetes, identifying 13 unique sister relationships to S. ruber across the 4,589 gene phylogenies. S. ruber grouped with bacteria, most often other Bacteroidetes, in the majority of phylogenies, but 91 phylogenies showed a branch-supported sister association between S. ruber and Archaea, an evolutionarily intriguing relationship indicative of horizontal gene transfer. This test case demonstrates how SICLE makes it possible to summarize the phylogenetic information produced by automated phylogenetic pipelines to rapidly identify and quantify the possible evolutionary relationships that merit further investigation. SICLE is available for free for noncommercial use at http://eebweb.arizona.edu/sicle/.
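The notion of a sister clade used in the abstract above can be illustrated with a small sketch. It does not reproduce SICLE, which is a C++ command line tool that also reports support values; the nested-tuple tree encoding, the function names, and the taxon labels below are hypothetical, and support values are omitted.

```python
def leaf_names(node):
    """Return the leaf labels below `node` (a string leaf or a tuple of children)."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaf_names(child)]


def sister_leaves(tree, target):
    """Return the sorted leaves of the sister group of leaf `target`, or None if absent.

    The sister group is everything else descending from the parent of `target`.
    """
    if isinstance(tree, str):
        return None
    for index, child in enumerate(tree):
        if child == target:
            siblings = tree[:index] + tree[index + 1:]
            return sorted(name for sib in siblings for name in leaf_names(sib))
    for child in tree:
        result = sister_leaves(child, target)
        if result is not None:
            return result
    return None


# Hypothetical gene tree; labels are placeholders, not data from the study.
gene_tree = ((("S_ruber", "Bact1"), "Bact2"), ("Arch1", "Arch2"))

print(sister_leaves(gene_tree, "S_ruber"))
print(sister_leaves(gene_tree, "Bact2"))
```

Running such a lookup over many gene trees and tallying the taxonomic group of each sister set gives the kind of summary the abstract describes.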
A review of criticisms of phylogenetic nomenclature: is taxonomic freedom the fundamental issue?

PubMed

Bryant, Harold N; Cantino, Philip D

2002-02-01

The proposal to implement a phylogenetic nomenclatural system (governed by the PhyloCode), in which taxon names are defined by explicit reference to common descent, has met with strong criticism from some proponents of phylogenetic taxonomy (taxonomy based on the principle of common descent in which only clades and species are recognized). We examine these criticisms and find that some of the perceived problems with phylogenetic nomenclature are based on misconceptions, some are equally true of the current rank-based nomenclatural system, and some will be eliminated by implementation of the PhyloCode. Most of the criticisms are related to an overriding concern that, because the meanings of names are associated with phylogenetic pattern which is subject to change, the adoption of phylogenetic nomenclature will lead to increased instability in the content of taxa. This concern is associated with the fact that, despite the widespread adoption of the view that taxa are historical entities that are conceptualized based on ancestry, many taxonomists also conceptualize taxa based on their content. As a result, critics of phylogenetic nomenclature have argued that taxonomists should be free to emend the content of taxa without constraints imposed by nomenclatural decisions. However, in phylogenetic nomenclature the contents of taxa are determined, not by the taxonomist, but by the combination of the phylogenetic definition of the name and a phylogenetic hypothesis. Because the contents of taxa, once their names are defined, can no longer be freely modified by taxonomists, phylogenetic nomenclature is perceived as limiting taxonomic freedom. We argue that the form of taxonomic freedom inherent to phylogenetic nomenclature is appropriate to phylogenetic taxonomy in which taxa are considered historical entities that are discovered through phylogenetic analysis and are not human constructs.
Phylogenomic Reconstruction of the Oomycete Phylogeny Derived from 37 Genomes

PubMed Central

McCarthy, Charley G. P.

2017-01-01

ABSTRACT The oomycetes are a class of microscopic, filamentous eukaryotes within the Stramenopiles-Alveolata-Rhizaria (SAR) supergroup which includes ecologically significant animal and plant pathogens, most infamously the causative agent of potato blight Phytophthora infestans. Single-gene and concatenated phylogenetic studies both of individual oomycete genera and of members of the larger class have resulted in conflicting conclusions concerning species phylogenies within the oomycetes, particularly for the large Phytophthora genus. Genome-scale phylogenetic studies have successfully resolved many eukaryotic relationships by using supertree methods, which combine large numbers of potentially disparate trees to determine evolutionary relationships that cannot be inferred from individual phylogenies alone. With a sufficient amount of genomic data now available, we have undertaken the first whole-genome phylogenetic analysis of the oomycetes using data from 37 oomycete species and 6 SAR species. In our analysis, we used established supertree methods to generate phylogenies from 8,355 homologous oomycete and SAR gene families and have complemented those analyses with both phylogenomic network and concatenated supermatrix analyses. Our results show that a genome-scale approach to oomycete phylogeny resolves oomycete classes and individual clades within the problematic Phytophthora genus. Support for the resolution of the inferred relationships between individual Phytophthora clades varies depending on the methodology used. Our analysis represents an important first step in large-scale phylogenomic analysis of the oomycetes.
IMPORTANCE The oomycetes are a class of eukaryotes and include ecologically significant animal and plant pathogens. Single-gene and multigene phylogenetic studies of individual oomycete genera and of members of the larger classes have resulted in conflicting conclusions concerning interspecies relationships among these species, particularly for the Phytophthora genus. The onset of next-generation sequencing techniques now means that a wealth of oomycete genomic data is available. For the first time, we have used genome-scale phylogenetic methods to resolve oomycete phylogenetic relationships. We used supertree methods to generate single-gene and multigene species phylogenies. Overall, our supertree analyses utilized phylogenetic data from 8,355 oomycete gene families. We have also complemented our analyses with superalignment phylogenies derived from 131 single-copy ubiquitous gene families. Our results show that a genome-scale approach to oomycete phylogeny resolves oomycete classes and clades. Our analysis represents an important first step in large-scale phylogenomic analysis of the oomycetes. PMID:28435885
Development and application of a phylogenomic toolkit: Resolving the evolutionary history of Madagascar’s lemurs

PubMed Central

Horvath, Julie E.; Weisrock, David W.; Embry, Stephanie L.; Fiorentino, Isabella; Balhoff, James P.; Kappeler, Peter; Wray, Gregory A.; Willard, Huntington F.; Yoder, Anne D.

2008-01-01

Lemurs and the other strepsirrhine primates are of great interest to the primate genomics community due to their phylogenetic placement as the sister lineage to all other primates. Previous attempts to resolve the phylogeny of lemurs employed limited mitochondrial or small nuclear data sets, with many relationships poorly supported or entirely unresolved. We used genomic resources to develop 11 novel markers from nine chromosomes, representing ∼9 kb of nuclear sequence data. In combination with previously published nuclear and mitochondrial loci, this yields a data set of more than 16 kb and adds ∼275 kb of DNA sequence to current databases. Our phylogenetic analyses confirm hypotheses of lemuriform monophyly and provide robust resolution of the phylogenetic relationships among the five lemuriform families. We verify that the genus Daubentonia is the sister lineage to all other lemurs. The Cheirogaleidae and Lepilemuridae are sister taxa and together form the sister lineage to the Indriidae; this clade is the sister lineage to the Lemuridae. Divergence time estimates indicate that lemurs are an ancient group, with their initial diversification occurring around the Cretaceous-Tertiary boundary. Given the power of this data set to resolve branches in a notoriously problematic area of primate phylogeny, we anticipate that our phylogenomic toolkit will be of value to other studies of primate phylogeny and diversification. Moreover, the methods applied will be broadly applicable to other taxonomic groups where phylogenetic relationships have been notoriously difficult to resolve. PMID:18245770
PhyLIS: a simple GNU/Linux distribution for phylogenetics and phyloinformatics.

PubMed

Thomson, Robert C

2009-07-30

PhyLIS is a free GNU/Linux distribution that is designed to provide a simple, standardized platform for phylogenetic and phyloinformatic analysis. The operating system incorporates most commonly used phylogenetic software, which has been pre-compiled and pre-configured, allowing for straightforward application of phylogenetic methods and development of phyloinformatic pipelines in a stable Linux environment. The software is distributed as a live CD and can be installed directly or run from the CD without making changes to the computer. PhyLIS is available for free at http://www.eve.ucdavis.edu/rcthomson/phylis/.
PhyLIS: A Simple GNU/Linux Distribution for Phylogenetics and Phyloinformatics

PubMed Central

Thomson, Robert C.

2009-01-01

PhyLIS is a free GNU/Linux distribution that is designed to provide a simple, standardized platform for phylogenetic and phyloinformatic analysis. The operating system incorporates most commonly used phylogenetic software, which has been pre-compiled and pre-configured, allowing for straightforward application of phylogenetic methods and development of phyloinformatic pipelines in a stable Linux environment. The software is distributed as a live CD and can be installed directly or run from the CD without making changes to the computer. PhyLIS is available for free at http://www.eve.ucdavis.edu/rcthomson/phylis/. PMID:19812729
A study on the characterization of Propionibacterium acnes isolated from ocular clinical specimens.

PubMed

Sowmiya, Murali; Malathi, Jambulingam; Swarnali, Sen; Priya, Jeyavel Padma; Therese, Kulandai Lily; Madhavan, Hajib N

2015-10-01

There are only a few reports available on characterization of Propionibacterium acnes isolated from various ocular clinical specimens. We undertook this study to evaluate the role of P. acnes in ocular infections and biofilm production, and to perform phylogenetic analysis of the bacilli. One hundred isolates of P. acnes collected prospectively from ocular clinical specimens at a tertiary care eye hospital between January 2010 and December 2011 were studied for their association with various ocular disease conditions. The isolates were also subjected to genotyping and phylogenetic analysis, and were tested for their ability to produce biofilms. Among preoperative conjunctival swabs, P. acnes was a probably significant pathogen in one case and a possibly significant pathogen in two cases. In other clinical conditions, 13 per cent of isolates were probably significant pathogens and 38 per cent were possibly significant pathogens. The analysis of the 16S rRNA gene revealed four different phylogenies whereas analysis of the recA gene showed two phylogenies, confirming that the recA gene was more reliable than 16S rRNA, with less sequence variation. Results of polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) had 100 per cent concordance with phylogenetic results. No association was seen between P. acnes subtypes and biofilm production. RecA gene phylogenetic studies revealed two different phylogenies. The RFLP technique was found to be cost-effective with high sensitivity and specificity in phylogenetic analysis. No association between P. acnes subtypes and pathogenetic ability was observed. Biofilm-producing isolates showed increased antibiotic resistance compared with non-biofilm-producing isolates.
PCOGR: phylogenetic COG ranking as an online tool to judge the specificity of COGs with respect to freely definable groups of organisms.

PubMed

Meereis, Florian; Kaufmann, Michael

2004-10-15

The rapidly increasing number of completely sequenced genomes led to the establishment of the COG-database which, based on sequence homologies, assigns similar proteins from different organisms to clusters of orthologous groups (COGs). There are several bioinformatic studies that made use of this database to determine (hyper)thermophile-specific proteins by searching for COGs containing (almost) exclusively proteins from (hyper)thermophilic genomes. However, public software to perform individually definable group-specific searches is not available. The tool described here fills exactly this gap. The software is accessible at http://www.uni-wh.de/pcogr and is linked to the COG-database. The user can freely define two groups of organisms by assigning each of the (currently) 66 organisms to groupA, to the reference groupB, or to be ignored by the algorithm. Then, for all COGs a specificity index is calculated with respect to groupA, i.e., high-scoring COGs contain proteins from most groupA organisms while proteins from most organisms assigned to groupB are absent. In addition to ranking all COGs according to the user-defined specificity criteria, a graphical visualization shows the distribution of all COGs by displaying their abundance as a function of their specificity indexes. This software allows detecting COGs specific to a predefined group of organisms. All COGs are ranked in the order of their specificity and a graphical visualization allows recognizing (i) the presence and abundance of such COGs and (ii) the phylogenetic relationship between groupA- and groupB-organisms. The software also allows detecting putative protein-protein interactions, novel enzymes involved in only partially known biochemical pathways, and alternate enzymes originated by convergent evolution.
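The abstract above does not give the formula for the specificity index, so the following Python sketch only illustrates the general idea of ranking COGs by group specificity. It assumes a hypothetical index equal to the fraction of groupA organisms represented in a COG minus the fraction of groupB organisms represented; the function names, organism labels and COG labels are all invented for illustration and are not taken from PCOGR.

```python
from typing import Dict, List, Set, Tuple


def specificity_index(cog_members: Set[str], group_a: Set[str], group_b: Set[str]) -> float:
    """Hypothetical index in [-1, 1]: share of groupA present minus share of groupB present.

    This is an assumed stand-in, not the formula used by PCOGR.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one organism")
    share_a = len(cog_members & group_a) / len(group_a)
    share_b = len(cog_members & group_b) / len(group_b)
    return share_a - share_b


def rank_cogs(cogs: Dict[str, Set[str]], group_a: Set[str], group_b: Set[str]) -> List[Tuple[str, float]]:
    """Return (COG name, index) pairs sorted from most to least groupA-specific."""
    scored = [(name, specificity_index(members, group_a, group_b)) for name, members in cogs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: organisms not listed in either group are simply ignored.
group_a = {"thermophile_1", "thermophile_2"}
group_b = {"mesophile_1", "mesophile_2"}
cogs = {
    "COG_x": {"thermophile_1", "thermophile_2"},
    "COG_y": {"thermophile_1", "mesophile_1", "mesophile_2"},
    "COG_z": {"thermophile_1", "thermophile_2", "mesophile_1", "mesophile_2"},
}

ranking = rank_cogs(cogs, group_a, group_b)
for name, score in ranking:
    print(f"{name}\t{score:+.2f}")
```

Under this assumed definition, a COG found in every groupA organism and in no groupB organism scores 1, while a COG present in all organisms of both groups scores 0.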
Different evolutionary trajectories of vaccine-controlled and non-controlled avian infectious bronchitis viruses in commercial poultry

PubMed Central

Lee, Dong-Hun

2017-01-01

To determine the genetic and epidemiological relationship of infectious bronchitis virus (IBV) isolates from commercial poultry to attenuated live IBV vaccines we conducted a phylogenetic network analysis on the full-length S1 sequence for Arkansas (Ark), Massachusetts (Mass) and Delmarva/1639 (DMV/1639) type viruses isolated in 2015 from clinical cases by 3 different diagnostic laboratories. Phylogenetic network analysis of Ark isolates showed two predominant groups linked by 2 mutations, consistent with subpopulations found in commercial vaccines for this IBV type. In addition, a number of satellite groups surrounding the two predominant populations were observed for the Ark type virus, which is likely due to mutations associated with the nature of this vaccine to persist in flocks. The phylogenetic network analysis of Mass-type viruses shows two groupings corresponding to different manufacturers' vaccine sequences. No satellite groups were observed for Mass-type viruses, which is consistent with no persistence of this vaccine type in the field. At the time of collection, no vaccine was being used for the DMV/1639 type viruses and phylogenetic network analysis showed a dispersed network suggesting no clear change in genetic distribution. Selection pressure analysis showed that the DMV/1639 and Mass-type strains were evolving under negative selection, whereas the Ark type viruses had evolved under positive selection. These data support the hypothesis that live attenuated vaccine usage does play a role in the genetic profile of similar IB viruses in the field and that phylogenetic network analysis can be used to identify vaccine and vaccine origin isolates, which is important for our understanding of the role live vaccines play in the evolutionary trajectory of those viruses. PMID:28472110
MultitaskProtDB: a database of multitasking proteins.

PubMed

Hernández, Sergio; Ferragut, Gabriela; Amela, Isaac; Perez-Pons, JosepAntoni; Piñol, Jaume; Mozo-Villarias, Angel; Cedano, Juan; Querol, Enrique

2014-01-01

We have compiled MultitaskProtDB, available online at http://wallace.uab.es/multitask, to provide a repository where the many multitasking proteins found in the literature can be stored. Multitasking or moonlighting is the capability of some proteins to execute two or more biological functions. Usually, multitasking proteins are experimentally revealed by serendipity. This ability of proteins to perform multitasking functions helps us to understand one of the ways used by cells to perform many complex functions with a limited number of genes. Even so, the study of this phenomenon is complex because, among other things, there is no database of moonlighting proteins. The existence of such a tool facilitates the collection and dissemination of these important data. This work reports the database, MultitaskProtDB, which is designed as a user-friendly web page containing >288 multitasking proteins with their NCBI and UniProt accession numbers, canonical and additional biological functions, monomeric/oligomeric states, PDB codes when available and bibliographic references. This database also serves to gain insight into some characteristics of multitasking proteins such as frequencies of the different pairs of functions, phylogenetic conservation and so forth.
rrndb: the Ribosomal RNA Operon Copy Number Database

PubMed Central

Klappenbach, Joel A.; Saxman, Paul R.; Cole, James R.; Schmidt, Thomas M.

2001-01-01

The Ribosomal RNA Operon Copy Number Database (rrndb) is an Internet-accessible database containing annotated information on rRNA operon copy number among prokaryotes. Gene redundancy is uncommon in prokaryotic genomes, yet the rRNA genes can vary from one to as many as 15 copies. Despite the widespread use of 16S rRNA gene sequences for identification of prokaryotes, information on the number and sequence of individual rRNA genes in a genome is not readily accessible. In an attempt to understand the evolutionary implications of rRNA operon redundancy, we have created a phylogenetically arranged report on rRNA gene copy number for a diverse collection of prokaryotic microorganisms. Each entry (organism) in the rrndb contains detailed information linked directly to external websites including the Ribosomal Database Project, GenBank, PubMed and several culture collections. Data contained in the rrndb will be valuable to researchers investigating microbial ecology and evolution using 16S rRNA gene sequences. The rrndb web site is directly accessible on the WWW at http://rrndb.cme.msu.edu. PMID:11125085
A case of methicillin-resistant Staphylococcus aureus wound infection: phylogenetic analysis to establish if nosocomial or community acquired.

PubMed

Cancilleri, Francesco; Ciccozzi, Massimo; Fogolari, Marta; Cella, Eleonora; De Florio, Lucia; Berton, Alessandra; Salvatore, Giuseppe; Dicuonzo, Giordano; Spoto, Silvia; Denaro, Vincenzo; Angeletti, Silvia

2018-05-01

Methicillin-resistant Staphylococcus aureus (MRSA) infection is rapidly increasing in both hospital and community settings. A 71-year-old man admitted to the Department of Orthopaedics and Trauma Surgery, University Campus Bio-Medico of Rome, with MRSA wound infection consequent to orthopedic surgery was studied and the MRSA transmission evaluated by phylogenetic analysis.
Complete mitochondrial genome of Cuora trifasciata (Chinese three-striped box turtle), and a comparative analysis with other box turtles.

PubMed

Li, Wei; Zhang, Xin-Cheng; Zhao, Jian; Shi, Yan; Zhu, Xin-Ping

2015-01-25

Cuora trifasciata has become one of the most critically endangered species in the world. The complete mitochondrial genome of C. trifasciata (Chinese three-striped box turtle) was determined in this study. Its mitochondrial genome is a 16,575-bp-long circular molecule that consists of 37 genes that are typically found in other vertebrates. The basic characteristics of the C. trifasciata mitochondrial genome were also determined. Moreover, a comparison of C. trifasciata with Cuora cyclornata, Cuora pani and Cuora aurocapitata indicated that the four mitogenomes differed in length, codons, overlaps, 13 protein-coding genes (PCGs), ND3, rRNA genes, control region, and other aspects. Phylogenetic analysis with Bayesian inference and maximum likelihood based on 12 protein-coding genes of the genus Cuora indicated the phylogenetic position of C. trifasciata within Cuora. The phylogenetic analysis also showed that C. trifasciata from Vietnam and China formed separate monophyletic clades with different Cuora species. The results of nucleotide base compositions, protein-coding genes and phylogenetic analysis showed that C. trifasciata from these two countries may represent different Cuora species. Copyright © 2014 Elsevier B.V. All rights reserved.

A Phylogenetic and Phenotypic Analysis of Salmonella enterica Serovar Weltevreden, an Emerging Agent of Diarrheal Disease in Tropical Regions

PubMed Central

Makendi, Carine; Page, Andrew J.; Wren, Brendan W.; Le Thi Phuong, Tu; Clare, Simon; Hale, Christine; Goulding, David; Klemm, Elizabeth J.; Pickard, Derek; Okoro, Chinyere; Hunt, Martin; Thompson, Corinne N.; Phu Huong Lan, Nguyen; Tran Do Hoang, Nhu; Thwaites, Guy E.; Le Hello, Simon; Brisabois, Anne; Weill, François-Xavier; Baker, Stephen; Dougan, Gordon

2016-01-01

Salmonella enterica serovar Weltevreden (S. Weltevreden) is an emerging cause of diarrheal and invasive disease in humans residing in tropical regions. Despite the regional and international emergence of this Salmonella serovar, relatively little is known about its genetic diversity, genomics or virulence potential in model systems. Here we used whole genome sequencing and bioinformatics analyses to define the phylogenetic structure of a diverse global selection of S. Weltevreden. Phylogenetic analysis of more than 100 isolates demonstrated that the population of S. Weltevreden can be segregated into two main phylogenetic clusters, one associated predominantly with continental Southeast Asia and the other more internationally dispersed. Subcluster analysis suggested the local evolution of S. Weltevreden within specific geographical regions. Four of the isolates were sequenced using long read sequencing to produce high quality reference genomes. Phenotypic analysis in Hep-2 cells and in a murine infection model indicated that S. Weltevreden were significantly attenuated in these models compared to the classical S. Typhimurium reference strain SL1344. Our work outlines novel insights into this important emerging pathogen and provides a baseline understanding for future research studies. PMID:26867150
A gateway for phylogenetic analysis powered by grid computing featuring GARLI 2.0.

PubMed

Bazinet, Adam L; Zwickl, Derrick J; Cummings, Michael P

2014-09-01

We introduce molecularevolution.org, a publicly available gateway for high-throughput, maximum-likelihood phylogenetic analysis powered by grid computing. The gateway features a garli 2.0 web service that enables a user to quickly and easily submit thousands of maximum likelihood tree searches or bootstrap searches that are executed in parallel on distributed computing resources. The garli web service allows one to easily specify partitioned substitution models using a graphical interface, and it performs sophisticated post-processing of phylogenetic results. Although the garli web service has been used by the research community for over three years, here we formally announce the availability of the service, describe its capabilities, highlight new features and recent improvements, and provide details about how the grid system efficiently delivers high-quality phylogenetic results. © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
On the phylogenetic placement of human T cell leukemia virus type 1 sequences associated with an Andean mummy.

PubMed

Coulthart, Michael B; Posada, David; Crandall, Keith A; Dekaban, Gregory A

2006-03-01

Recently, the putative finding of ancient human T cell leukemia virus type 1 (HTLV-1) long terminal repeat (LTR) DNA sequences in association with a 1500-year-old Chilean mummy has stirred vigorous debate. The debate is based partly on the inherent uncertainties associated with phylogenetic reconstruction when only short sequences of closely related genotypes are available. However, a full analysis of what phylogenetic information is present in the mummy data has not previously been published, leaving open the question of what precisely is the range of admissible interpretation. To fulfill this need, we re-analyzed the mummy data in a new way. We first performed phylogenetic analysis of 188 published LTR DNA sequences from extant strains belonging to the HTLV-1 Cosmopolitan clade, using the method of statistical parsimony which is designed both to optimize phylogenetic resolution among sequences with little evolutionary divergence, and to permit precise mapping of individual sequence mutations onto branches of a divergence network. We then deduced possible phylogenetic positions for the two main categories of published Chilean mummy sequences, based on their published 157-nucleotide LTR sequences. The possible phylogenetic placements for one of the mummy sequence categories are consistent with a modern origin. However, one of these placements for the other mummy sequence category falls very close to the root of the Cosmopolitan clade, consistent with an ancient origin for both this mummy sequence and the Cosmopolitan clade.
Reticulate evolutionary history and extensive introgression in mosquito species revealed by phylogenetic network analysis

PubMed Central

Wen, Dingqiao; Yu, Yun; Hahn, Matthew W.; Nakhleh, Luay

2016-01-01

The role of hybridization and subsequent introgression has been demonstrated in an increasing number of species. Recently, Fontaine et al. (Science, 347, 2015, 1258524) conducted a phylogenomic analysis of six members of the Anopheles gambiae species complex. Their analysis revealed a reticulate evolutionary history and pointed to extensive introgression on all four autosomal arms. The study further highlighted the complex evolutionary signals that the co-occurrence of incomplete lineage sorting (ILS) and introgression can give rise to in phylogenomic analyses. While tree-based methodologies were used in the study, phylogenetic networks provide a more natural model to capture reticulate evolutionary histories. In this work, we reanalyse the Anopheles data using a recently devised framework that combines the multispecies coalescent with phylogenetic networks. This framework allows us to capture ILS and introgression simultaneously, and forms the basis for statistical methods for inferring reticulate evolutionary histories. The new analysis reveals a phylogenetic network with multiple hybridization events, some of which differ from those reported in the original study. To elucidate the extent and patterns of introgression across the genome, we devise a new method that quantifies the use of reticulation branches in the phylogenetic network by each genomic region. Applying the method to the mosquito data set reveals the evolutionary history of all the chromosomes. This study highlights the utility of ‘network thinking’ and the new insights it can uncover, in particular in phylogenomic analyses of large data sets with extensive gene tree incongruence. PMID:26808290
Horizontal gene transfer of acetyltransferases, invertases and chorismate mutases from different bacteria to diverse recipients.

PubMed

Noon, Jason B; Baum, Thomas J

2016-04-12

Hoplolaimina plant-parasitic nematodes (PPN) are a lineage of animals with many documented cases of horizontal gene transfer (HGT). In a recent study, we reported on three likely HGT candidate genes in the soybean cyst nematode Heterodera glycines, all of which encode secreted candidate effectors with putative functions in the host plant. Hg-GLAND1 is a putative GCN5-related N-acetyltransferase (GNAT), Hg-GLAND13 is a putative invertase (INV), and Hg-GLAND16 is a putative chorismate mutase (CM), and blastp searches of the non-redundant database resulted in highest similarity to bacterial sequences. Here, we searched nematode and non-nematode sequence databases to identify all the nematodes possible that contain these three genes, and to formulate hypotheses about when they most likely appeared in the phylum Nematoda. We then performed phylogenetic analyses combined with model selection tests of alternative models of sequence evolution to determine whether these genes were horizontally acquired from bacteria. Mining of nematode sequence databases determined that GNATs appeared in Hoplolaimina PPN late in evolution, while both INVs and CMs appeared before the radiation of the Hoplolaimina suborder. Also, Hoplolaimina GNATs, INVs and CMs formed well-supported clusters with different rhizosphere bacteria in the phylogenetic trees, and the model selection tests greatly supported models of HGT over descent via common ancestry. Surprisingly, the phylogenetic trees also revealed additional, well-supported clusters of bacterial GNATs, INVs and CMs with diverse eukaryotes and archaea. There were at least eleven and eight well-supported clusters of GNATs and INVs, respectively, from different bacteria with diverse eukaryotes and archaea. Though less frequent, CMs from different bacteria formed supported clusters with multiple different eukaryotes. 
Moreover, almost all individual clusters containing bacteria and eukaryotes or archaea contained species that inhabit very similar niches. GNATs were horizontally acquired late in Hoplolaimina PPN evolution from bacteria most similar to the saprophytic and plant-pathogenic actinomycetes. INVs and CMs were horizontally acquired from bacteria most similar to rhizobacteria and Burkholderia soil bacteria, respectively, before the radiation of Hoplolaimina. Also, these three gene groups appear to have been frequent subjects of HGT from different bacteria to numerous, diverse lineages of eukaryotes and archaea, which suggests that these genes may confer important evolutionary advantages to many taxa. In the case of Hoplolaimina PPN, this advantage likely was an improved ability to parasitize plants.
[Genome-wide identification and expression analysis of the WRKY gene family in peach].

PubMed

Gu, Yan-bing; Ji, Zhi-rui; Chi, Fu-mei; Qiao, Zhuang; Xu, Cheng-nan; Zhang, Jun-xiang; Zhou, Zong-shan; Dong, Qing-long

2016-03-01

The WRKY transcription factors are one of the largest families of transcriptional regulators and play diverse regulatory roles in biotic and abiotic stresses, plant growth and development processes. In this study, the WRKY DNA-binding domain (Pfam Database number: PF03106) downloaded from the Pfam protein families database was used to identify WRKY genes from the peach (Prunus persica 'Lovell') genome using HMMER 3.0. The obtained amino acid sequences were analyzed with the DNAMAN 5.0, WebLogo 3, MEGA 5.1, MapInspect and MEME bioinformatics software. A total of 61 WRKY genes were found in the peach genome. Our phylogenetic analysis revealed that peach WRKY genes were classified into three Groups: Ⅰ, Ⅱ and Ⅲ. The WRKY N-terminal and C-terminal domains of Group Ⅰ (group Ⅰ-N and group Ⅰ-C) were monophyletic. Group Ⅱ was sub-divided into five distinct clades (group Ⅱ-a, Ⅱ-b, Ⅱ-c, Ⅱ-d and Ⅱ-e). Our domain analysis indicated that the WRKY regions contained a highly conserved heptapeptide stretch WRKYGQK at the N-terminus followed by a zinc-finger motif. The chromosome mapping analysis showed that peach WRKY genes were distributed with different densities over 8 chromosomes. The intron-exon structure analysis revealed that structures of the WRKY genes were highly conserved in the peach. The conserved motif analysis showed that the conserved motifs 1, 2 and 3, which specify the WRKY domain, were observed in all peach WRKY proteins; motif 5, an unknown domain, was observed in group Ⅱ-d; and two WRKY domains were assigned to Group Ⅰ. SqRT-PCR and qRT-PCR results indicated that 16 PpWRKY genes were expressed in roots, stems, leaves, flowers and fruits at various expression levels. Our analysis thus identified the PpWRKY gene family, and future functional studies are needed to reveal the specific roles of its members.
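The study above identified WRKY genes with an HMMER profile search. As a much simpler illustration of the conserved WRKYGQK heptapeptide it describes, the Python sketch below scans protein sequences for exact copies of that motif and counts them per protein. The function names and the two toy sequences are hypothetical, and exact string matching will miss the motif variants that a profile search would detect.

```python
import re
from typing import Dict, List

# Exact match for the conserved heptapeptide named in the abstract.
WRKY_HEPTAPEPTIDE = re.compile(r"WRKYGQK")


def find_wrky_motifs(protein: str) -> List[int]:
    """Return the 0-based start positions of every WRKYGQK occurrence."""
    return [match.start() for match in WRKY_HEPTAPEPTIDE.finditer(protein.upper())]


def count_wrky_motifs(proteins: Dict[str, str]) -> Dict[str, int]:
    """Map each protein name to the number of WRKYGQK motifs found in it."""
    return {name: len(find_wrky_motifs(seq)) for name, seq in proteins.items()}


# Invented toy sequences, not real peach proteins.
toy_proteins = {
    "toy_single_domain": "MAAWRKYGQKPIKGSP",
    "toy_double_domain": "MSDGYNWRKYGQKQVKAAAAADDGYRWRKYGQKVVK",
    "toy_no_domain": "MSTPLLKKAAGGE",
}

motif_counts = count_wrky_motifs(toy_proteins)
for name, count in motif_counts.items():
    print(f"{name}\t{count}")
```

A protein with two hits would be a candidate for Group Ⅰ, which the abstract describes as carrying two WRKY domains, although confirming that would still require checking the zinc-finger motif that follows each heptapeptide.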
SigTree: A Microbial Community Analysis Tool to Identify and Visualize Significantly Responsive Branches in a Phylogenetic Tree.

PubMed

Stevens, John R; Jones, Todd R; Lefevre, Michael; Ganesan, Balasubramanian; Weimer, Bart C

2017-01-01

Microbial community analysis experiments to assess the effect of a treatment intervention (or environmental change) on the relative abundance levels of multiple related microbial species (or operational taxonomic units) simultaneously using high throughput genomics are becoming increasingly common. Within the framework of the evolutionary phylogeny of all species considered in the experiment, this translates to a statistical need to identify the phylogenetic branches that exhibit a significant consensus response (in terms of operational taxonomic unit abundance) to the intervention. We present the R software package SigTree, a collection of flexible tools that make use of meta-analysis methods and regular expressions to identify and visualize significantly responsive branches in a phylogenetic tree, while appropriately adjusting for multiple comparisons.
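The abstract does not say which meta-analysis method is applied, so the Python sketch below uses Stouffer's method purely as one well-known way to combine the p-values of the tips beneath a branch into a single branch-level p-value. It assumes one-sided p-values strictly between 0 and 1 and independent tests. The function name is hypothetical, this is not the SigTree API (SigTree is an R package), and the multiple-comparison adjustment is not shown.

```python
from math import sqrt
from statistics import NormalDist
from typing import Sequence


def stouffer_combined_p(p_values: Sequence[float]) -> float:
    """Combine one-sided p-values with Stouffer's unweighted Z method.

    Each p-value is converted to a standard normal score, the scores are
    summed and scaled by sqrt(k), and the result is converted back to a p-value.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    standard_normal = NormalDist()
    z_scores = [standard_normal.inv_cdf(1.0 - p) for p in p_values]
    z_combined = sum(z_scores) / sqrt(len(z_scores))
    return 1.0 - standard_normal.cdf(z_combined)


# Hypothetical one-sided p-values for three OTUs under the same branch.
tip_p_values = [0.04, 0.10, 0.03]
branch_p = stouffer_combined_p(tip_p_values)
print(f"combined branch p-value: {branch_p:.4f}")
```

Several tips that each show a modest response in the same direction combine into a smaller branch-level p-value, which is the consensus response the abstract refers to.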
The utility of DNA sequences of an intron from the beta-fibrinogen gene in phylogenetic analysis of woodpeckers (Aves: Picidae).

PubMed

Prychitko, T M; Moore, W S

1997-10-01

Estimating phylogenies from DNA sequence data has become the major methodology of molecular phylogenetics. To date, molecular phylogenetics of the vertebrates has been very dependent on mtDNA, but studies involving mtDNA are limited because the several genes comprising the mt-genome are inherited as a single linkage group. The only apparent solution to this problem is to sequence additional genes, each representing a distinct linkage group, so that the resultant gene trees provide independent estimates of the species tree. There is a need to find novel gene sequences that contain enough phylogenetic information to resolve relationships between closely related species. A possible source is the nuclear-encoded introns, because they evolve more rapidly than exons. We designed primers to amplify and sequence intron 7 from the beta-fibrinogen gene for a recently evolved group, the woodpeckers. We sequenced the entire intron for 10 specimens representing five species. Nucleotide substitutions are randomly distributed along the length of the intron, suggesting selective neutrality. A preliminary analysis indicates that the phylogenetic signal in the intron is as strong as that in the mitochondrial encoded cytochrome b (cyt b) gene. The topology of the beta-fibrinogen tree is identical to that of the cyt b tree. This analysis demonstrates the ability of intron 7 of beta-fibrinogen to provide well resolved, independent gene trees for recently evolved groups and establishes it as a source of sequences to be used in other phylogenetic studies. Copyright 1997 Academic Press
Dermatophilus congolensis infection in sheep and goats in Delta region of Tamil Nadu.

PubMed

Chitra, M Ananda; Jayalakshmi, K; Ponnusamy, P; Manickam, R; Ronald, B S M

2017-11-01

The study was conducted to isolate and identify Dermatophilus congolensis (DC) using conventional and molecular diagnostic techniques in scab materials collected from skin infections of sheep and goats in the Delta region of Tamil Nadu. A total of 20 scab samples were collected from 18 goats and 2 sheep from Nagapattinam, Thanjavur, and Tiruvarur districts of Tamil Nadu. Smears were made from softened scab materials and stained by either Gram's or Giemsa staining. Isolation was attempted on blood agar plates, and colonies were stained by Gram's staining for morphological identification. Identification was also done by biochemical tests and confirmed by 16S rRNA polymerase chain reaction (PCR), followed by sequencing and phylogenetic analysis of the amplified product. The peculiar laddering arrangement of coccoid forms in stained smears prepared from scab materials revealed the presence of DC. Isolated colonies from scab materials of sheep and goats on bovine blood agar plates were small, hemolytic, rough, adherent, and bright orange-yellow in color, but some colonies were white to cream in color. Gram staining of cultured organisms revealed Gram-positive branching filaments at various stages of disintegration. 16S rRNA PCR yielded a 500 bp amplicon specific for DC. Sequence analysis of a sheep DC isolate showed 99-100% sequence homology with other DC isolates available in the NCBI database, and the phylogenetic tree showed a close cluster with DC isolates from Congo, Nigeria, and Angola in Africa. Genes for virulence factors such as serine protease and alkaline ceramidase could not be detected by PCR in any of the DC strains isolated in this study. The presence of dermatophilosis in Tamil Nadu was established by this study.
Dermatophilus congolensis infection in sheep and goats in Delta region of Tamil Nadu

PubMed Central

Chitra, M. Ananda; Jayalakshmi, K.; Ponnusamy, P.; Manickam, R.; Ronald, B. S. M.

2017-01-01

Aim: The study was conducted to isolate and identify Dermatophilus congolensis (DC) using conventional and molecular diagnostic techniques in scab materials collected from skin infections of sheep and goats in the Delta region of Tamil Nadu. Materials and Methods: A total of 20 scab samples were collected from 18 goats and 2 sheep from Nagapattinam, Thanjavur, and Tiruvarur districts of Tamil Nadu. Smears were made from softened scab materials and stained by either Gram’s or Giemsa staining. Isolation was attempted on blood agar plates, and colonies were stained by Gram’s staining for morphological identification. Identification was also done by biochemical tests and confirmed by 16S rRNA polymerase chain reaction (PCR), followed by sequencing and phylogenetic analysis of the amplified product. Results: The peculiar laddering arrangement of coccoid forms in stained smears prepared from scab materials revealed the presence of DC. Isolated colonies from scab materials of sheep and goats on bovine blood agar plates were small, hemolytic, rough, adherent, and bright orange-yellow in color, but some colonies were white to cream in color. Gram staining of cultured organisms revealed Gram-positive branching filaments at various stages of disintegration. 16S rRNA PCR yielded a 500 bp amplicon specific for DC. Sequence analysis of a sheep DC isolate showed 99-100% sequence homology with other DC isolates available in the NCBI database, and the phylogenetic tree showed a close cluster with DC isolates from Congo, Nigeria, and Angola in Africa. Genes for virulence factors such as serine protease and alkaline ceramidase could not be detected by PCR in any of the DC strains isolated in this study. Conclusion: The presence of dermatophilosis in Tamil Nadu was established by this study. PMID:29263591
Molecular characterization of an α-N-acetylgalactosaminidase from Clonorchis sinensis.

PubMed

Lee, Myoung-Ro; Yoo, Won Gi; Kim, Yu-Jung; Kim, Dae-Won; Cho, Shin-Hyeong; Hwang, Kwang Yeon; Ju, Jung-Won; Lee, Won-Ja

2012-11-01

The α-N-acetylgalactosaminidase (α-NAGAL) is an exoglycosidase that selectively cleaves terminal α-linked N-acetylgalactosamines from a variety of sugar chains. A complementary DNA (cDNA) clone encoding a novel Clonorchis sinensis α-NAGAL (Cs-α-NAGAL) was identified in the expressed sequence tags database of the adult C. sinensis liver fluke. The complete coding sequence was 1,308 bp long and encoded a 436-residue protein. The selected glycosidase was manually curated as α-NAGAL (EC 3.2.1.49) based on a composite bioinformatics analysis including a search for orthologues, comparative structure modeling, and the generation of a phylogenetic tree. One orthologue of Cs-α-NAGAL was the Rattus norvegicus α-NAGAL (accession number: NP_001012120) that does not exist in C. sinensis. Cs-α-NAGAL belongs to the GH27 family and the GH-D clan. A phylogenetic analysis revealed that the GH27 family of Cs-α-NAGAL was distinct from GH31 and GH36 within the GH-D clan. The putative 3D structure of Cs-α-NAGAL was built using SWISS-MODEL with a Gallus gallus α-NAGAL template (PDB code 1ktb chain A); this model demonstrated the superimposition of a TIM barrel fold (α/β) structure and substrate binding pocket. Cs-α-NAGAL transcripts were detected in the adult worm and egg cDNA libraries of C. sinensis but not in the metacercaria. Recombinant Cs-α-NAGAL (rCs-α-NAGAL) was expressed in Escherichia coli, and the purified rCs-α-NAGAL was recognized specifically by the C. sinensis-infected human sera. This is the first report of an α-NAGAL protein in the Trematode class, suggesting that it is a potential diagnostic or vaccine candidate with strong antigenicity.
Partial characterization of normal and Haemophilus influenzae-infected mucosal complementary DNA libraries in chinchilla middle ear mucosa.

PubMed

Kerschner, Joseph E; Erdos, Geza; Hu, Fen Ze; Burrows, Amy; Cioffi, Joseph; Khampang, Pawjai; Dahlgren, Margaret; Hayes, Jay; Keefe, Randy; Janto, Benjamin; Post, J Christopher; Ehrlich, Garth D

2010-04-01

We sought to construct and partially characterize complementary DNA (cDNA) libraries prepared from the middle ear mucosa (MEM) of chinchillas to better understand pathogenic aspects of infection and inflammation, particularly with respect to leukotriene biogenesis and response. Chinchilla MEM was harvested from controls and after middle ear inoculation with nontypeable Haemophilus influenzae. RNA was extracted to generate cDNA libraries. Randomly selected clones were subjected to sequence analysis to characterize the libraries and to provide DNA sequence for phylogenetic analyses. Reverse transcription-polymerase chain reaction of the RNA pools was used to generate cDNA sequences corresponding to genes associated with leukotriene biosynthesis and metabolism. Sequence analysis of 921 randomly selected clones from the uninfected MEM cDNA library produced approximately 250,000 nucleotides of almost entirely novel sequence data. Searches of the GenBank database with the Basic Local Alignment Search Tool provided for identification of 515 unique genes expressed in the MEM and not previously described in chinchillas. In almost all cases, the chinchilla cDNA sequences displayed much greater homology to human or other primate genes than with rodent species. Genes associated with leukotriene metabolism were present in both normal and infected MEM. Based on both phylogenetic comparisons and gene expression similarities with humans, chinchilla MEM appears to be an excellent model for the study of middle ear inflammation and infection. The higher degree of sequence similarity between chinchillas and humans compared to chinchillas and rodents was unexpected. The cDNA libraries from normal and infected chinchilla MEM will serve as useful molecular tools in the study of otitis media and should yield important information with respect to middle ear pathogenesis.
Partial Characterization of Normal and Haemophilus influenzae–Infected Mucosal Complementary DNA Libraries in Chinchilla Middle Ear Mucosa

PubMed Central

Kerschner, Joseph E.; Erdos, Geza; Hu, Fen Ze; Burrows, Amy; Cioffi, Joseph; Khampang, Pawjai; Dahlgren, Margaret; Hayes, Jay; Keefe, Randy; Janto, Benjamin; Post, J. Christopher; Ehrlich, Garth D.

2010-01-01

Objectives: We sought to construct and partially characterize complementary DNA (cDNA) libraries prepared from the middle ear mucosa (MEM) of chinchillas to better understand pathogenic aspects of infection and inflammation, particularly with respect to leukotriene biogenesis and response. Methods: Chinchilla MEM was harvested from controls and after middle ear inoculation with nontypeable Haemophilus influenzae. RNA was extracted to generate cDNA libraries. Randomly selected clones were subjected to sequence analysis to characterize the libraries and to provide DNA sequence for phylogenetic analyses. Reverse transcription–polymerase chain reaction of the RNA pools was used to generate cDNA sequences corresponding to genes associated with leukotriene biosynthesis and metabolism. Results: Sequence analysis of 921 randomly selected clones from the uninfected MEM cDNA library produced approximately 250,000 nucleotides of almost entirely novel sequence data. Searches of the GenBank database with the Basic Local Alignment Search Tool provided for identification of 515 unique genes expressed in the MEM and not previously described in chinchillas. In almost all cases, the chinchilla cDNA sequences displayed much greater homology to human or other primate genes than with rodent species. Genes associated with leukotriene metabolism were present in both normal and infected MEM. Conclusions: Based on both phylogenetic comparisons and gene expression similarities with humans, chinchilla MEM appears to be an excellent model for the study of middle ear inflammation and infection. The higher degree of sequence similarity between chinchillas and humans compared to chinchillas and rodents was unexpected. The cDNA libraries from normal and infected chinchilla MEM will serve as useful molecular tools in the study of otitis media and should yield important information with respect to middle ear pathogenesis. PMID:20433028
Transcriptome Analysis of the Chrysanthemum Foliar Nematode, Aphelenchoides ritzemabosi (Aphelenchida: Aphelenchoididae)

PubMed Central

Li, Jun-Yi; Xie, Hui; Xu, Chun-Ling; Li, Yu

2016-01-01

The chrysanthemum foliar nematode (CFN), Aphelenchoides ritzemabosi, is a plant parasitic nematode that attacks many plants. In this study, the transcriptome of a mixed-stage population of CFN was sequenced on the Illumina HiSeq 2000 platform. A total of 68.10 million high-quality Illumina paired-end reads were obtained, which generated 26,817 transcripts with a mean length of 1,032 bp and an N50 of 1,672 bp, of which 16,467 transcripts were annotated against six databases. In total, 20,311 coding region sequences (CDS), 495 simple sequence repeats (SSRs) and 8,353 single-nucleotide polymorphisms (SNPs) were predicted. The species sharing the most sequences with CFN was B. xylophilus, with 16,846 (62.82%) common transcripts, and 10,543 (39.31%) CFN transcripts matched sequences of all four plant parasitic nematodes compared. A total of 111 CFN transcripts were predicted as homologues of 7 types of carbohydrate-active enzymes (CAZymes) with plant/fungal cell wall-degrading activities; fewer transcripts were predicted as homologues of plant cell wall-degrading enzymes than of fungal cell wall-degrading enzymes. The phylogenetic analysis of GH5, GH16, GH43 and GH45 proteins from CFN and other organisms showed that CFN and other nematodes have a closer phylogenetic relationship. In the CFN transcriptome, orthologues of sixteen types of genes, belonging to seven classes of protein families involved in the RNAi pathway in C. elegans, were predicted. This research provides comprehensive gene expression information at the transcriptional level, which will facilitate the elucidation of the molecular mechanisms of CFN and the distribution of gene functions at the macro level, potentially revealing improved methods for controlling CFN. PMID:27875578
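The assembly statistics quoted in this abstract (mean transcript length and N50) follow standard definitions, sketched below in Python. The abstract does not say which tool computed them, so this only illustrates the usual definition of N50: the length L such that transcripts of length at least L contain at least half of all assembled bases. The length list is hypothetical toy data, not the CFN assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (standard definition)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def mean_length(lengths):
    """Return the arithmetic mean of the sequence lengths."""
    return sum(lengths) / len(lengths)


# Hypothetical toy transcript lengths in bp.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print("N50:", n50(toy_lengths))
print("mean length:", mean_length(toy_lengths))
```

Because N50 is weighted by sequence length, it is normally larger than the mean for a skewed length distribution, which is consistent with the reported 1,672 bp N50 against a 1,032 bp mean.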
Metatranscriptomics of the marine sponge Geodia barretti: tackling phylogeny and function of its microbial community.

PubMed

Radax, Regina; Rattei, Thomas; Lanzen, Anders; Bayer, Christoph; Rapp, Hans Tore; Urich, Tim; Schleper, Christa

2012-05-01

Geodia barretti is a marine cold-water sponge harbouring high numbers of microorganisms. Significant rates of nitrification have been observed in this sponge, indicating a substantial contribution to nitrogen turnover in marine environments with high sponge cover. In order to gain closer insights into the phylogeny and function of the active microbial community and the interaction with its host G. barretti, a metatranscriptomic approach was employed, using the simultaneous analysis of rRNA and mRNA. Of the 262 298 RNA-tags obtained by pyrosequencing, 92% were assigned to ribosomal RNA (ribo-tags). A total of 109 325 SSU rRNA ribo-tags revealed a detailed picture of the community, dominated by group SAR202 of Chloroflexi, candidate phylum Poribacteria and Acidobacteria, which was different in its composition from that obtained in clone libraries prepared from the same samples. Optimized assembly strategies allowed the reconstruction of full-length rRNA sequences from the short ribo-tags for more detailed phylogenetic studies of the dominant taxa. Cells of several phyla were visualized by FISH analyses for confirmation. Of the remaining 21 325 RNA-tags, 10 023 were assigned to mRNA-tags, based on similarities to genes in the databases. A wide range of putative functional gene transcripts from over 10 different phyla were identified among the bacterial mRNA-tags. The most abundant mRNAs were those encoding key metabolic enzymes of nitrification from ammonia-oxidizing archaea as well as candidate genes involved in related processes. Our analysis demonstrates the potential and limits of using a combined rRNA and mRNA approach to explore the microbial community profile, phylogenetic assignments and metabolic activities of a complex, but little explored microbial community. © 2012 Society for Applied Microbiology and Blackwell Publishing Ltd.
A MLVA Genotyping Scheme for Global Surveillance of the Citrus Pathogen Xanthomonas citri pv. citri Suggests a Worldwide Geographical Expansion of a Single Genetic Lineage

PubMed Central

Boyer, Karine; Leduc, Alice; Tourterel, Christophe; Drevet, Christine; Ravigné, Virginie; Gagnevin, Lionel; Guérin, Fabien; Chiroleu, Frédéric; Koebnik, Ralf; Verdier, Valérie; Vernière, Christian

2014-01-01

MultiLocus Variable number of tandem repeat Analysis (MLVA) has been extensively used to examine epidemiological and evolutionary issues in monomorphic human pathogenic bacteria, but not in bacterial plant pathogens of agricultural importance, although such tools would improve our understanding of their epidemiology, as well as of the history of epidemics on a global scale. Xanthomonas citri pv. citri is a quarantine organism in several countries and a major threat to the citrus industry worldwide. We screened the genomes of Xanthomonas citri pv. citri strain IAPAR 306 and of phylogenetically related xanthomonads for tandem repeats. From these in silico data, an optimized MLVA scheme was developed to assess the global diversity of this monomorphic bacterium. Thirty-one minisatellite loci (MLVA-31) were selected to assess the genetic structure of 129 strains representative of the worldwide pathological and genetic diversity of X. citri pv. citri. Based on Discriminant Analysis of Principal Components (DAPC), four pathotype-specific clusters were defined. DAPC cluster 1 comprised strains that were implicated in the major geographical expansion of X. citri pv. citri during the 20th century. A subset of 12 loci (MLVA-12) resolved 89% of the total diversity and matched the genetic structure revealed by MLVA-31. MLVA-12 is proposed for routine epidemiological identification of X. citri pv. citri, whereas MLVA-31 is proposed for phylogenetic and population genetics studies. MLVA-31 represents an opportunity for international X. citri pv. citri genotyping and data sharing. The MLVA-31 data generated in this study were deposited in the Xanthomonas citri genotyping database (http://www.biopred.net/MLVA/). PMID:24897119
Molecular characterization and phylogenetic inferences of Dermanyssus gallinae isolates in Italy within an European framework.

PubMed

Marangi, M; Cantacessi, C; Sparagano, O A E; Camarda, A; Giangaspero, A

2014-12-01

In order to investigate the genetic relationships between Dermanyssus gallinae (Mesostigmata: Dermanyssidae) (de Geer) isolates from poultry farms in Italy and other European countries, phylogenetic analysis was performed using a portion of the cytochrome c oxidase subunit 1 (cox1) gene of the mitochondrial DNA and the internal transcribed spacers (ITS1+5.8S+ITS2) of the ribosomal DNA. A total of 360 cox1 sequences and 360 ITS+ sequences were obtained from mites collected on 24 different poultry farms in 10 different regions of Northern and Southern Italy. Phylogenetic analysis of the cox1 sequences resulted in the clustering of two groups (A and B), whereas phylogenetic analysis of the ITS+ resulted in largely unresolved clusters. Knowledge of the genetic make-up of mite populations within countries, together with comparative analyses of D. gallinae isolates from different countries, will provide better understanding of the population dynamics of D. gallinae. This will also allow the identification of genetic markers of emerging acaricide resistance and the development of alternative strategies for the prevention and treatment of infestations. © 2014 The Royal Entomological Society.
SinEx DB: a database for single exon coding sequences in mammalian genomes.

PubMed

Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S

2016-01-01

Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33, and the complete dataset of SEG sequences and their functional predictions is available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches against the in-house SinEx DB of SEGs and (iii) an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl. © The Author(s) 2016. Published by Oxford University Press.
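The SEG definition given in this abstract (a nuclear, protein-coding gene with no introns in its CDS) can be written as a simple predicate. The sketch below is only an illustration of that definition: the `Gene` record, its field names and the toy genes are hypothetical and do not reflect the SinEx DB schema or prediction pipeline. Introns in untranslated regions are ignored, since the definition concerns the CDS only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gene:
    """Hypothetical gene record; not the SinEx DB schema."""
    gene_id: str
    is_nuclear: bool
    is_protein_coding: bool
    cds_segments: List[Tuple[int, int]]  # 1-based inclusive genomic intervals


def merge_adjacent(segments):
    """Merge CDS intervals that touch or overlap, so only real gaps (introns) remain."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def is_single_exon_gene(gene):
    """Apply the abstract's definition: nuclear, protein-coding, no intron in the CDS."""
    return (
        gene.is_nuclear
        and gene.is_protein_coding
        and len(merge_adjacent(gene.cds_segments)) == 1
    )


# Hypothetical toy genes.
genes = [
    Gene("toy_seg", True, True, [(100, 1407)]),
    Gene("toy_multi_exon", True, True, [(100, 250), (400, 900)]),
    Gene("toy_mito", False, True, [(1, 900)]),
]
single_exon_ids = [g.gene_id for g in genes if is_single_exon_gene(g)]
print(single_exon_ids)
```

Merging adjacent intervals first avoids misclassifying a gene whose annotation splits one continuous CDS into abutting pieces.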
An evolutionary analysis of the GH57 amylopullulanases based on the DOMON_glucodextranase_like domains.

PubMed

Jiao, Yu-Liang; Wang, Shu-Jun; Lv, Ming-Sheng; Fang, Yao-Wei; Liu, Shu

2013-03-01

Thermostable amylopullulanase (TAPU) is valuable in the starch saccharification industry for its ability to catalyze the hydrolysis of both α-1,4 and α-1,6 glucosidic bonds under industrial starch liquefaction conditions. The majority of TAPUs belong to glycoside hydrolase family 57 (GH57). In this study, we performed a phylogenetic analysis of GH57 amylopullulanase (APU) based on the highly conserved DOMON_glucodextranase_like (DDL) domain and classified APUs according to their multidomain architectures, phylogenetic analysis and enzymatic characters. This study revealed that amylopullulanase, pullulanase, and α-amylase had passed through a long joint evolution process, in which DDL played an important role. The phylogenetic analysis of the DDL domain showed that the GH57 APU directly shares a common ancestor with pullulanase, and that the DDL domains in some species have undergone evolutionary events such as domain duplication and recombination. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
NAC transcription factor genes: genome-wide identification, phylogenetic, motif and cis-regulatory element analysis in pigeonpea (Cajanus cajan (L.) Millsp.).

PubMed

Satheesh, Viswanathan; Jagannadham, P Tej Kumar; Chidambaranathan, Parameswaran; Jain, P K; Srinivasan, R

2014-12-01

The NAC (NAM, ATAF and CUC) proteins are plant-specific transcription factors implicated in development and stress responses. In the present study, 88 pigeonpea NAC genes were identified from the recently published draft genome of pigeonpea by using homology-based and de novo prediction programmes. These sequences were further subjected to phylogenetic, motif and promoter analyses. In motif analysis, highly conserved motifs were identified in the NAC domain and also in the C-terminal region of the NAC proteins. A phylogenetic reconstruction using pigeonpea, Arabidopsis and soybean NAC genes revealed 33 putative stress-responsive pigeonpea NAC genes. Several stress-responsive cis-elements were identified through in silico analysis of the promoters of these putative stress-responsive genes. This analysis is the first report of the NAC gene family in pigeonpea and will be useful for the identification and selection of candidate genes associated with stress tolerance.
