Sample records for cluster sequence analysis

  1. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

    PubMed

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

    2015-05-01

    To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.
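
    The sliding-window setup and the counts of variable and informative sites described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the step size, the toy alignment, and the function names `sliding_windows` and `count_sites` are assumptions, and "informative" is taken to mean parsimony-informative (at least two nucleotides, each present in at least two sequences).

```python
from collections import Counter


def sliding_windows(seq_len, window, step):
    """Return 0-based, half-open (start, end) bounds of every full window."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [(s, s + window) for s in range(0, seq_len - window + 1, step)]


def count_sites(alignment):
    """Count variable and parsimony-informative columns in aligned sequences."""
    variable = informative = 0
    for column in zip(*alignment):
        # Ignore gaps and ambiguity codes when tallying nucleotides.
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) > 1:  # more than one nucleotide observed
            variable += 1
            # Informative: at least two nucleotides each seen in >= 2 sequences.
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return variable, informative


# Toy alignment (hypothetical data); window and step are illustrative only.
alignment = ["ACGTACGT", "ACGTACGA", "ATGAACGA", "ATGCACGT"]
for start, end in sliding_windows(len(alignment[0]), window=4, step=2):
    print(start, end, count_sites([seq[start:end] for seq in alignment]))
```

    Longer windows contain more columns and therefore tend to contain more variable and informative sites, which is the relationship the abstract evaluates against the extent of clustering.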

  2. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering

    PubMed Central

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor

    2015-01-01

    Abstract To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice. PMID:25560745

  3. A generalized analysis of hydrophobic and loop clusters within globular protein sequences

    PubMed Central

    Eudes, Richard; Le Tuan, Khanh; Delettré, Jean; Mornon, Jean-Paul; Callebaut, Isabelle

    2007-01-01

    Background Hydrophobic Cluster Analysis (HCA) is an efficient way to compare highly divergent sequences through the implicit secondary structure information directly derived from hydrophobic clusters. However, its efficiency and application are currently limited by the need for user expertise. In order to help the analysis of HCA plots, we report here the structural preferences of hydrophobic cluster species, which are frequently encountered in globular domains of proteins. These species are characterized only by their hydrophobic/non-hydrophobic dichotomy. This analysis has been extended to loop-forming clusters, using an appropriate loop alphabet. Results The structural behavior of hydrophobic cluster species, which are typical of protein globular domains, was investigated within banks of experimental structures, considered at different levels of sequence redundancy. The 294 most frequent hydrophobic cluster species were analyzed with regard to their association with the different secondary structures (frequencies of association with secondary structures and secondary structure propensities). Hydrophobic cluster species are predominantly associated with regular secondary structures, and a large part (60 %) reveals preferences for α-helices or β-strands. Moreover, the analysis of the hydrophobic cluster amino acid composition generally allows for finer prediction of the regular secondary structure associated with the considered cluster within a cluster species. We also investigated the behavior of loop forming clusters, using a "PGDNS" alphabet. These loop clusters do not overlap with hydrophobic clusters and are highly associated with coils. Finally, the structural information contained in the hydrophobic structural words, as deduced from experimental structures, was compared to the PSI-PRED predictions, revealing that β-strands and especially α-helices are generally over-predicted within the limits of typical β and α hydrophobic clusters. Conclusion The dictionary of hydrophobic clusters described here can help the HCA user to interpret and compare the HCA plots of globular protein sequences, and provides an original fundamental insight into the structural bricks of protein folds. Moreover, the novel loop cluster analysis brings additional information for secondary structure prediction on the whole sequence through a generalized cluster analysis (GCA), and not only on regular secondary structures. Such information lays the foundations for developing a new and original tool for secondary structure prediction. PMID:17210072
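
    The hydrophobic/non-hydrophobic dichotomy and the "PGDNS" loop alphabet mentioned in this abstract can be illustrated with a simple binary encoding. This is a sketch only: the hydrophobic alphabet `VILFMYW` is an assumption (the abstract does not list it), the function names are hypothetical, and the step that delimits individual clusters is not reproduced here.

```python
# Assumed hydrophobic alphabet; not stated in the abstract.
HYDROPHOBIC = set("VILFMYW")
# Loop alphabet quoted in the abstract.
LOOP = set("PGDNS")


def binary_pattern(sequence, alphabet=HYDROPHOBIC):
    """Encode a protein sequence as '1' (in alphabet) or '0' (not in alphabet)."""
    return "".join("1" if residue in alphabet else "0" for residue in sequence.upper())


peptide = "MKVLAGPDS"  # hypothetical example sequence
print(binary_pattern(peptide))        # hydrophobic positions
print(binary_pattern(peptide, LOOP))  # loop-forming positions
```

    Because the two alphabets share no residues, no position can be marked in both patterns.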

  4. Defining objective clusters for rabies virus sequences using affinity propagation clustering

    PubMed Central

    Fischer, Susanne; Freuling, Conrad M.; Pfaff, Florian; Bodenhofer, Ulrich; Höper, Dirk; Fischer, Mareike; Marston, Denise A.; Fooks, Anthony R.; Mettenleiter, Thomas C.; Conraths, Franz J.; Homeier-Bachmann, Timo

    2018-01-01

    Rabies is caused by lyssaviruses, and is one of the oldest known zoonoses. In recent years, more than 21,000 nucleotide sequences of rabies viruses (RABV), from the prototype species rabies lyssavirus, have been deposited in public databases. Subsequent phylogenetic analyses in combination with metadata suggest geographic distributions of RABV. However, these analyses have some technical difficulty in defining verifiable criteria for cluster allocation in phylogenetic trees, which invites a more rational approach. Therefore, we applied a relatively new mathematical clustering algorithm named ‘affinity propagation clustering’ (AP) to propose a standardized sub-species classification utilizing full-genome RABV sequences. Because AP has the advantage that it is computationally fast and works for any meaningful measure of similarity between data samples, it has previously been applied successfully in bioinformatics, for analysis of microarray and gene expression data; however, cluster analysis of sequences is still in its infancy. Existing (516) and original (46) full genome RABV sequences were used to demonstrate the application of AP for RABV clustering. On a global scale, AP proposed four clusters, i.e. New World cluster, Arctic/Arctic-like, Cosmopolitan, and Asian as previously assigned by phylogenetic studies. By combining AP with established phylogenetic analyses, it is possible to resolve phylogenetic relationships between verifiably determined clusters and sequences. This workflow will be useful in confirming cluster distributions in a uniform transparent manner, not only for RABV, but also for other comparative sequence analyses. PMID:29357361

  5. Impact of Sampling Density on the Extent of HIV Clustering

    PubMed Central

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor

    2014-01-01

    Abstract Identifying and monitoring HIV clusters could be useful in tracking the leading edge of HIV transmission in epidemics. Currently, greater specificity in the definition of HIV clusters is needed to reduce confusion in the interpretation of HIV clustering results. We address sampling density as one of the key aspects of HIV cluster analysis. The proportion of viral sequences in clusters was estimated at sampling densities from 1.0% to 70%. A set of 1,248 HIV-1C env gp120 V1C5 sequences from a single community in Botswana was utilized in simulation studies. Matching numbers of HIV-1C V1C5 sequences from the LANL HIV Database were used as comparators. HIV clusters were identified by phylogenetic inference under bootstrapped maximum likelihood and pairwise distance cut-offs. Sampling density below 10% was associated with stochastic HIV clustering with broad confidence intervals. HIV clustering increased linearly at sampling density >10%, and was accompanied by narrowing confidence intervals. Patterns of HIV clustering were similar at bootstrap thresholds 0.7 to 1.0, but the extent of HIV clustering decreased with higher bootstrap thresholds. The origin of sampling (local concentrated vs. scattered global) had a substantial impact on HIV clustering at sampling densities ≥10%. Pairwise distances at 10% were estimated as a threshold for cluster analysis of HIV-1 V1C5 sequences. The node bootstrap support distribution provided additional evidence for 10% sampling density as the threshold for HIV cluster analysis. The detectability of HIV clusters is substantially affected by sampling density. A minimal genotyping density of 10% and sampling density of 50–70% are suggested for HIV-1 V1C5 cluster analysis. PMID:25275430

  6. RSAT 2015: Regulatory Sequence Analysis Tools

    PubMed Central

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-01-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632

  7. Self-Organizing Hidden Markov Model Map (SOHMMM): Biological Sequence Clustering and Cluster Visualization.

    PubMed

    Ferles, Christos; Beaufort, William-Scott; Ferle, Vanessa

    2017-01-01

    The present study devises mapping methodologies and projection techniques that visualize and demonstrate biological sequence data clustering results. The Sequence Data Density Display (SDDD) and Sequence Likelihood Projection (SLP) visualizations represent the input symbolic sequences in a lower-dimensional space in such a way that the clusters and relations of data elements are depicted graphically. Both operate in combination with the Self-Organizing Hidden Markov Model Map (SOHMMM). The resulting unified framework can analyze raw sequence data automatically and directly, with little or even no prior information or domain knowledge.

  8. Characterization of HIV Transmission in South-East Austria

    PubMed Central

    Kessler, Harald H.; Haas, Bernhard; Stelzl, Evelyn; Weninger, Karin; Little, Susan J.; Mehta, Sanjay R.

    2016-01-01

    To gain deeper insight into the epidemiology of HIV-1 transmission in South-East Austria we performed a retrospective analysis of 259 HIV-1 partial pol sequences obtained from unique individuals newly diagnosed with HIV infection in South-East Austria from 2008 through 2014. After quality filtering, putative transmission linkages were inferred when two sequences were ≤1.5% genetically different. Multiple linkages were resolved into putative transmission clusters. Further phylogenetic analyses were performed using BEAST v1.8.1. Finally, we investigated putative links between the 259 sequences from South-East Austria and all publicly available HIV polymerase sequences in the Los Alamos National Laboratory HIV sequence database. We found that 45.6% (118/259) of the sampled sequences were genetically linked with at least one other sequence from South-East Austria forming putative transmission clusters. Clustering individuals were more likely to be men who have sex with men (MSM; p<0.001), infected with subtype B (p<0.001) or subtype F (p = 0.02). Among clustered males who reported only heterosexual (HSX) sex as an HIV risk, 47% clustered closely with MSM (either as pairs or within larger MSM clusters). One hundred and seven of the 259 sequences (41.3%) from South-East Austria had at least one putative inferred linkage with sequences from a total of 69 other countries. In conclusion, analysis of HIV-1 sequences from newly diagnosed individuals residing in South-East Austria revealed a high degree of national and international clustering mainly within MSM. Interestingly, we found that a high number of heterosexual males clustered within MSM networks, suggesting either linkage between risk groups or misrepresentation of sexual risk behaviors by subjects. PMID:26967154
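
    The linkage rule in this abstract (two sequences are linked when they are ≤1.5% genetically different, and multiple linkages are resolved into clusters) can be sketched as a connected-components computation. This is a minimal sketch under stated assumptions: the distance used here is the plain proportion of differing aligned positions, which may not be the genetic distance the authors used, and `p_distance` and `link_clusters` are hypothetical names.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing positions among aligned, unambiguous sites."""
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


def link_clusters(seqs, threshold=0.015):
    """Group sequences (dict: name -> aligned sequence) joined by chains of links.

    Returns a list of sets; singletons (unlinked sequences) are omitted.
    """
    parent = {name: name for name in seqs}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical toy data: s1 and s2 differ at 1 of 100 sites, s3 is distant.
s1 = "ACGT" * 25
s2 = "T" + s1[1:]
s3 = "GGCC" * 25
print(link_clusters({"s1": s1, "s2": s2, "s3": s3}))
```

    Note that a cluster built this way can contain pairs that exceed the threshold, as long as a chain of linked sequences connects them.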

  9. Characterization of HIV Transmission in South-East Austria.

    PubMed

    Hoenigl, Martin; Chaillon, Antoine; Kessler, Harald H; Haas, Bernhard; Stelzl, Evelyn; Weninger, Karin; Little, Susan J; Mehta, Sanjay R

    2016-01-01

    To gain deeper insight into the epidemiology of HIV-1 transmission in South-East Austria we performed a retrospective analysis of 259 HIV-1 partial pol sequences obtained from unique individuals newly diagnosed with HIV infection in South-East Austria from 2008 through 2014. After quality filtering, putative transmission linkages were inferred when two sequences were ≤1.5% genetically different. Multiple linkages were resolved into putative transmission clusters. Further phylogenetic analyses were performed using BEAST v1.8.1. Finally, we investigated putative links between the 259 sequences from South-East Austria and all publicly available HIV polymerase sequences in the Los Alamos National Laboratory HIV sequence database. We found that 45.6% (118/259) of the sampled sequences were genetically linked with at least one other sequence from South-East Austria forming putative transmission clusters. Clustering individuals were more likely to be men who have sex with men (MSM; p<0.001), infected with subtype B (p<0.001) or subtype F (p = 0.02). Among clustered males who reported only heterosexual (HSX) sex as an HIV risk, 47% clustered closely with MSM (either as pairs or within larger MSM clusters). One hundred and seven of the 259 sequences (41.3%) from South-East Austria had at least one putative inferred linkage with sequences from a total of 69 other countries. In conclusion, analysis of HIV-1 sequences from newly diagnosed individuals residing in South-East Austria revealed a high degree of national and international clustering mainly within MSM. Interestingly, we found that a high number of heterosexual males clustered within MSM networks, suggesting either linkage between risk groups or misrepresentation of sexual risk behaviors by subjects.

  10. The Gap Procedure: for the identification of phylogenetic clusters in HIV-1 sequence data.

    PubMed

    Vrbik, Irene; Stephens, David A; Roger, Michel; Brenner, Bluma G

    2015-11-04

    In the context of infectious disease, sequence clustering can be used to provide important insights into the dynamics of transmission. Cluster analysis is usually performed using a phylogenetic approach whereby clusters are assigned on the basis of sufficiently small genetic distances and high bootstrap support (or posterior probabilities). The computational burden involved in this phylogenetic threshold approach is a major drawback, especially when a large number of sequences are being considered. In addition, this method requires a skilled user to specify the appropriate threshold values, which may vary widely depending on the application. This paper presents the Gap Procedure, a distance-based clustering algorithm for the classification of DNA sequences sampled from individuals infected with the human immunodeficiency virus type 1 (HIV-1). Our heuristic algorithm bypasses the need for phylogenetic reconstruction, thereby supporting the quick analysis of large genetic data sets. Moreover, this fully automated procedure relies on data-driven gaps in sorted pairwise distances to infer clusters, so no user-specified threshold values are required. The clustering results obtained by the Gap Procedure on both real and simulated data closely agree with those found using the threshold approach, while only requiring a fraction of the time to complete the analysis. Apart from the dramatic gains in computational time, the Gap Procedure is highly effective in finding distinct groups of genetically similar sequences and obviates the need for subjective user-specified values. The clusters of genetically similar sequences returned by this procedure can be used to detect patterns in HIV-1 transmission and thereby aid in the prevention, treatment and containment of the disease.
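
    The idea of a data-driven gap in sorted pairwise distances can be illustrated as below. This is a deliberately simplified sketch and not the published Gap Procedure: it looks for a single widest gap across all distances pooled together, and the function name `largest_gap_threshold` and the example distances are assumptions.

```python
def largest_gap_threshold(distances):
    """Return the midpoint of the widest gap between consecutive sorted distances."""
    ordered = sorted(distances)
    if len(ordered) < 2:
        raise ValueError("need at least two distances")
    # Each entry is (gap width, lower value, upper value).
    gaps = [(hi - lo, lo, hi) for lo, hi in zip(ordered, ordered[1:])]
    _, lo, hi = max(gaps)
    return (lo + hi) / 2


# Hypothetical pairwise distances: a tight group and a distant group.
pairwise = [0.010, 0.012, 0.020, 0.090, 0.100, 0.110]
cutoff = largest_gap_threshold(pairwise)
close_pairs = [d for d in pairwise if d < cutoff]
print(cutoff, close_pairs)
```

    The cutoff comes from the data rather than from a user-supplied value, which is the property the abstract emphasizes.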

  11. Open-Source Sequence Clustering Methods Improve the State Of the Art.

    PubMed

    Kopylova, Evguenia; Navas-Molina, Jose A; Mercier, Céline; Xu, Zhenjiang Zech; Mahé, Frédéric; He, Yan; Zhou, Hong-Wei; Rognes, Torbjørn; Caporaso, J Gregory; Knight, Rob

    2016-01-01

    Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against current principal options (UCLUST and USEARCH) in QIIME, hierarchical clustering methods in mothur, and USEARCH's most recent clustering algorithm, UPARSE. All the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of these derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release. IMPORTANCE Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. Knight, BMC Biol 12:69, 2014, http://dx.doi.org/10.1186/s12915-014-0069-1).

  12. RSAT 2015: Regulatory Sequence Analysis Tools.

    PubMed

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-07-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berman, Benjamin P.; Pfeiffer, Barret D.; Laverty, Todd R.

    2004-08-06

    The identification of sequences that control transcription in metazoans is a major goal of genome analysis. In a previous study, we demonstrated that searching for clusters of predicted transcription factor binding sites could discover active regulatory sequences, and identified 37 regions of the Drosophila melanogaster genome with high densities of predicted binding sites for five transcription factors involved in anterior-posterior embryonic patterning. Nine of these clusters overlapped known enhancers. Here, we report the results of in vivo functional analysis of 27 remaining clusters. We generated transgenic flies carrying each cluster attached to a basal promoter and reporter gene, and assayed embryos for reporter gene expression. Six clusters are enhancers of adjacent genes: giant, fushi tarazu, odd-skipped, nubbin, squeeze and pdm2; three drive expression in patterns unrelated to those of neighboring genes; the remaining 18 do not appear to have enhancer activity. We used the Drosophila pseudoobscura genome to compare patterns of evolution in and around the 15 positive and 18 false-positive predictions. Although conservation of primary sequence cannot distinguish true from false positives, conservation of binding-site clustering accurately discriminates functional binding-site clusters from those with no function. We incorporated conservation of binding-site clustering into a new genome-wide enhancer screen, and predict several hundred new regulatory sequences, including 85 adjacent to genes with embryonic patterns. Measuring conservation of sequence features closely linked to function--such as binding-site clustering--makes better use of comparative sequence data than commonly used methods that examine only sequence identity.

  14. Complete Genome Sequence and Comparative Analysis of the Fish Pathogen Lactococcus garvieae

    PubMed Central

    Oshima, Kenshiro; Yoshizaki, Mariko; Kawanishi, Michiko; Nakaya, Kohei; Suzuki, Takehito; Miyauchi, Eiji; Ishii, Yasuo; Tanabe, Soichi; Murakami, Masaru; Hattori, Masahira

    2011-01-01

    Lactococcus garvieae causes fatal haemorrhagic septicaemia in fish such as yellowtail. The comparative analysis of genomes of a virulent strain Lg2 and a non-virulent strain ATCC 49156 of L. garvieae revealed that the two strains shared a high degree of sequence identity, but Lg2 had a 16.5-kb capsule gene cluster that is absent in ATCC 49156. The capsule gene cluster was composed of 15 genes, of which eight genes are highly conserved with those in exopolysaccharide biosynthesis gene cluster often found in Lactococcus lactis strains. Sequence analysis of the capsule gene cluster in the less virulent strain L. garvieae Lg2-S, Lg2-derived strain, showed that two conserved genes were disrupted by a single base pair deletion, respectively. These results strongly suggest that the capsule is crucial for virulence of Lg2. The capsule gene cluster of Lg2 may be a genomic island from several features such as the presence of insertion sequences flanked on both ends, different GC content from the chromosomal average, integration into the locus syntenic to other lactococcal genome sequences, and distribution in human gut microbiomes. The analysis also predicted other potential virulence factors such as haemolysin. The present study provides new insights into understanding of the virulence mechanisms of L. garvieae in fish. PMID:21829716

  15. Analysis of correlated mutations in HIV-1 protease using spectral clustering.

    PubMed

    Liu, Ying; Eyal, Eran; Bahar, Ivet

    2008-05-15

    The ability of human immunodeficiency virus-1 (HIV-1) protease to develop mutations that confer multi-drug resistance (MDR) has been a major obstacle in designing rational therapies against HIV. Resistance is usually imparted by a cooperative mechanism that can be elucidated by a covariance analysis of sequence data. Identification of such correlated substitutions of amino acids may be obscured by evolutionary noise. HIV-1 protease sequences from patients subjected to different specific treatments (set 1), and from untreated patients (set 2) were subjected to sequence covariance analysis by evaluating the mutual information (MI) between all residue pairs. Spectral clustering of the resulting covariance matrices disclosed two distinctive clusters of correlated residues: the first, observed in set 1 but absent in set 2, contained residues involved in MDR acquisition; and the second, included those residues differentiated in the various HIV-1 protease subtypes, shortly referred to as the phylogenetic cluster. The MDR cluster occupies sites close to the central symmetry axis of the enzyme, which overlap with the global hinge region identified from coarse-grained normal-mode analysis of the enzyme structure. The phylogenetic cluster, on the other hand, occupies solvent-exposed and highly mobile regions. This study demonstrates (i) the possibility of distinguishing between the correlated substitutions resulting from neutral mutations and those induced by MDR upon appropriate clustering analysis of sequence covariance data and (ii) a connection between global dynamics and functional substitution of amino acids.
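
    The covariance step in this abstract, mutual information (MI) between all residue pairs, can be sketched from alignment columns as follows. This is a minimal illustration in bits (log base 2) without any correction for evolutionary noise or small sample size; the function names and the toy alignment are assumptions, and the spectral clustering of the resulting matrix is not reproduced here.

```python
from collections import Counter
from math import log2


def mutual_information(col_i, col_j):
    """MI in bits between two alignment columns of equal length."""
    n = len(col_i)
    if n == 0 or n != len(col_j):
        raise ValueError("columns must be non-empty and of equal length")
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    # sum over observed pairs of p(a,b) * log2( p(a,b) / (p(a) * p(b)) )
    return sum(
        (count / n) * log2(count * n / (freq_i[a] * freq_j[b]))
        for (a, b), count in freq_ij.items()
    )


def mi_matrix(alignment):
    """Symmetric matrix of MI values for every pair of alignment columns."""
    columns = list(zip(*alignment))
    size = len(columns)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i, size):
            matrix[i][j] = matrix[j][i] = mutual_information(columns[i], columns[j])
    return matrix


# Hypothetical toy alignment: columns 0 and 1 co-vary, column 2 does not.
toy = ["ALK", "ALR", "VMK", "VMR"]
for row in mi_matrix(toy):
    print([round(value, 3) for value in row])
```

    Pairs of columns that change together score high, while columns that vary independently score near zero; a clustering method applied to this matrix would then group the co-varying positions.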

  16. GibbsCluster: unsupervised clustering and alignment of peptide sequences.

    PubMed

    Andreatta, Massimo; Alvarez, Bruno; Nielsen, Morten

    2017-07-03

    Receptor interactions with short linear peptide fragments (ligands) are at the base of many biological signaling processes. Conserved and information-rich amino acid patterns, commonly called sequence motifs, shape and regulate these interactions. Because of the properties of a receptor-ligand system or of the assay used to interrogate it, experimental data often contain multiple sequence motifs. GibbsCluster is a powerful tool for unsupervised motif discovery because it can simultaneously cluster and align peptide data. The GibbsCluster 2.0 presented here is an improved version incorporating insertion and deletions accounting for variations in motif length in the peptide input. In basic terms, the program takes as input a set of peptide sequences and clusters them into meaningful groups. It returns the optimal number of clusters it identified, together with the sequence alignment and sequence motif characterizing each cluster. Several parameters are available to customize cluster analysis, including adjustable penalties for small clusters and overlapping groups and a trash cluster to remove outliers. As an example application, we used the server to deconvolute multiple specificities in large-scale peptidome data generated by mass spectrometry. The server is available at http://www.cbs.dtu.dk/services/GibbsCluster-2.0. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  17. Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berman, Benjamin P.; Pfeiffer, Barret D.; Laverty, Todd R.

    2004-08-06

    Background The identification of sequences that control transcription in metazoans is a major goal of genome analysis. In a previous study, we demonstrated that searching for clusters of predicted transcription factor binding sites could discover active regulatory sequences, and identified 37 regions of the Drosophila melanogaster genome with high densities of predicted binding sites for five transcription factors involved in anterior-posterior embryonic patterning. Nine of these clusters overlapped known enhancers. Here, we report the results of in vivo functional analysis of 27 remaining clusters. Results We generated transgenic flies carrying each cluster attached to a basal promoter and reporter gene, and assayed embryos for reporter gene expression. Six clusters are enhancers of adjacent genes: giant, fushi tarazu, odd-skipped, nubbin, squeeze and pdm2; three drive expression in patterns unrelated to those of neighboring genes; the remaining 18 do not appear to have enhancer activity. We used the Drosophila pseudoobscura genome to compare patterns of evolution in and around the 15 positive and 18 false-positive predictions. Although conservation of primary sequence cannot distinguish true from false positives, conservation of binding-site clustering accurately discriminates functional binding-site clusters from those with no function. We incorporated conservation of binding-site clustering into a new genome-wide enhancer screen, and predict several hundred new regulatory sequences, including 85 adjacent to genes with embryonic patterns. Conclusions Measuring conservation of sequence features closely linked to function - such as binding-site clustering - makes better use of comparative sequence data than commonly used methods that examine only sequence identity.

  18. EventThread: Visual Summarization and Stage Analysis of Event Sequence Data.

    PubMed

    Guo, Shunan; Xu, Ke; Zhao, Rongwen; Gotz, David; Zha, Hongyuan; Cao, Nan

    2018-01-01

    Event sequence data such as electronic health records, a person's academic records, or car service records, are ordered series of events which have occurred over a period of time. Analyzing collections of event sequences can reveal common or semantically important sequential patterns. For example, event sequence analysis might reveal frequently used care plans for treating a disease, typical publishing patterns of professors, and the patterns of service that result in a well-maintained car. It is challenging, however, to visually explore large numbers of event sequences, or sequences with large numbers of event types. Existing methods focus on extracting explicitly matching patterns of events using statistical analysis to create stages of event progression over time. However, these methods fail to capture latent clusters of similar but not identical evolutions of event sequences. In this paper, we introduce a novel visualization system named EventThread which clusters event sequences into threads based on tensor analysis and visualizes the latent stage categories and evolution patterns by interactively grouping the threads by similarity into time-specific clusters. We demonstrate the effectiveness of EventThread through usage scenarios in three different application domains and via interviews with an expert user.

  19. Cloning and Characterization of the Pyrrolomycin Biosynthetic Gene Clusters from Actinosporangium vitaminophilum ATCC 31673 and Streptomyces sp. Strain UC 11065

    PubMed Central

    Zhang, Xiujun; Parry, Ronald J.

    2007-01-01

    The pyrrolomycins are a family of polyketide antibiotics, some of which contain a nitro group. To gain insight into the nitration mechanism associated with the formation of these antibiotics, the pyrrolomycin biosynthetic gene cluster from Actinosporangium vitaminophilum was cloned. Sequencing of ca. 56 kb of A. vitaminophilum DNA revealed 35 open reading frames (ORFs). Sequence analysis revealed a clear relationship between some of these ORFs and the biosynthetic gene cluster for pyoluteorin, a structurally related antibiotic. Since a gene transfer system could not be devised for A. vitaminophilum, additional proof for the identity of the cloned gene cluster was sought by cloning the pyrrolomycin gene cluster from Streptomyces sp. strain UC 11065, a transformable pyrrolomycin producer. Sequencing of ca. 26 kb of UC 11065 DNA revealed the presence of 17 ORFs, 15 of which exhibit strong similarity to ORFs in the A. vitaminophilum cluster as well as a nearly identical organization. Single-crossover disruption of two genes in the UC 11065 cluster abolished pyrrolomycin production in both cases. These results confirm that the genetic locus cloned from UC 11065 is essential for pyrrolomycin production, and they also confirm that the highly similar locus in A. vitaminophilum encodes pyrrolomycin biosynthetic genes. Sequence analysis revealed that both clusters contain genes encoding the two components of an assimilatory nitrate reductase. This finding suggests that nitrite is required for the formation of the nitrated pyrrolomycins. However, sequence analysis did not provide additional insights into the nitration process, suggesting the operation of a novel nitration mechanism. PMID:17158935

  20. Transmission clustering among newly diagnosed HIV patients in Chicago, 2008 to 2011: using phylogenetics to expand knowledge of regional HIV transmission patterns

    PubMed Central

    Lubelchek, Ronald J.; Hoehnen, Sarah C.; Hotton, Anna L.; Kincaid, Stacey L.; Barker, David E.; French, Audrey L.

    2014-01-01

    Introduction HIV transmission cluster analyses can inform HIV prevention efforts. We describe the first such assessment for transmission clustering among HIV patients in Chicago. Methods We performed transmission cluster analyses using HIV pol sequences from newly diagnosed patients presenting to Chicago’s largest HIV clinic between 2008 and 2011. We compared sequences via progressive pairwise alignment, using neighbor joining to construct an un-rooted phylogenetic tree. We defined clusters as >2 sequences among which each sequence had at least one partner within a genetic distance of ≤ 1.5%. We used multivariable regression to examine factors associated with clustering and used geospatial analysis to assess geographic proximity of phylogenetically clustered patients. Results We compared sequences from 920 patients; median age 35 years; 75% male; 67% Black, 23% Hispanic; 8% had a Rapid Plasma Reagin (RPR) titer ≥ 1:16 concurrent with their HIV diagnosis. We had HIV transmission risk data for 54%; 43% identified as men who have sex with men (MSM). Phylogenetic analysis demonstrated 123 patients (13%) grouped into 26 clusters, the largest having 20 members. In multivariable regression, age < 25, Black race, MSM status, male gender, higher HIV viral load, and RPR ≥ 1:16 associated with clustering. We did not observe geographic grouping of genetically clustered patients. Discussion Our results demonstrate high rates of HIV transmission clustering, without local geographic foci, among young Black MSM in Chicago. Applied prospectively, phylogenetic analyses could guide prevention efforts and help break the cycle of transmission. PMID:25321182
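    The cluster definition in this abstract (more than two sequences, each with at least one partner within a genetic distance of 1.5%) can be sketched as connected components over a thresholded distance graph. The sketch below is a simplification under stated assumptions: it uses a raw proportion of mismatched sites on pre-aligned, equal-length sequences, whereas the study may have used a model-corrected distance, and the names `p_distance` and `distance_clusters` are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_clusters(seqs, threshold=0.015, min_size=3):
    """Group sequence ids into components linked by distance <= threshold.

    `seqs` maps an id to an aligned sequence. Components smaller than
    `min_size` are dropped (min_size=3 mirrors the ">2 sequences" rule).
    """
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return [g for g in groups.values() if len(g) >= min_size]


if __name__ == "__main__":
    toy = {
        "s1": "A" * 200,
        "s2": "A" * 198 + "CC",
        "s3": "A" * 196 + "CCCC",
        "s4": "G" * 200,
        "s5": "G" * 199 + "T",
    }
    print(distance_clusters(toy))
```

    In the toy data, s1 and s3 differ by 2% but are still grouped because each is within 1% of s2; s4 and s5 form a pair only and are therefore excluded.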

  1. Bacterial community comparisons by taxonomy-supervised analysis independent of sequence alignment and clustering

    PubMed Central

    Sul, Woo Jun; Cole, James R.; Jesus, Ederson da C.; Wang, Qiong; Farris, Ryan J.; Fish, Jordan A.; Tiedje, James M.

    2011-01-01

    High-throughput sequencing of 16S rRNA genes has increased our understanding of microbial community structure, but now even higher-throughput methods to the Illumina scale allow the creation of much larger datasets with more samples and orders-of-magnitude more sequences that swamp current analytic methods. We developed a method capable of handling these larger datasets on the basis of assignment of sequences into an existing taxonomy using a supervised learning approach (taxonomy-supervised analysis). We compared this method with a commonly used clustering approach based on sequence similarity (taxonomy-unsupervised analysis). We sampled 211 different bacterial communities from various habitats and obtained ∼1.3 million 16S rRNA sequences spanning the V4 hypervariable region by pyrosequencing. Both methodologies gave similar ecological conclusions in that β-diversity measures calculated by using these two types of matrices were significantly correlated to each other, as were the ordination configurations and hierarchical clustering dendrograms. In addition, our taxonomy-supervised analyses were also highly correlated with phylogenetic methods, such as UniFrac. The taxonomy-supervised analysis has the advantages that it is not limited by the exhaustive computation required for the alignment and clustering necessary for the taxonomy-unsupervised analysis, is more tolerant of sequencing errors, and allows comparisons when sequences are from different regions of the 16S rRNA gene. With the tremendous expansion in 16S rRNA data acquisition underway, the taxonomy-supervised approach offers the potential to provide more rapid and extensive community comparisons across habitats and samples. PMID:21873204

  2. Evolutionary interpretations of mycobacteriophage biodiversity and host-range through the analysis of codon usage bias.

    PubMed

    Esposito, Lauren A; Gupta, Swati; Streiter, Fraida; Prasad, Ashley; Dennehy, John J

    2016-10-01

    In a genomics course sponsored by the Howard Hughes Medical Institute (HHMI), undergraduate students have isolated and sequenced the genomes of more than 1,150 mycobacteriophages, creating the largest database of sequenced bacteriophages able to infect a single host, Mycobacterium smegmatis, a soil bacterium. Genomic analysis indicates that these mycobacteriophages can be grouped into 26 clusters based on genetic similarity. These clusters span a continuum of genetic diversity, with extensive genomic mosaicism among phages in different clusters. However, little is known regarding the primary hosts of these mycobacteriophages in their natural habitats, nor of their broader host ranges. As such, it is possible that the primary host of many newly isolated mycobacteriophages is not M. smegmatis, but instead a range of closely related bacterial species. However, determining mycobacteriophage host range presents difficulties associated with mycobacterial cultivability, pathogenicity and growth. Another way to gain insight into mycobacteriophage host range and ecology is through bioinformatic analysis of their genomic sequences. To this end, we examined the correlations between the codon usage biases of 199 different mycobacteriophages and those of several fully sequenced mycobacterial species in order to gain insight into the natural host range of these mycobacteriophages. We find that UPGMA clustering tends to match, but not consistently, clustering by shared nucleotide sequence identity. In addition, analysis of GC content, tRNA usage and correlations between mycobacteriophage and mycobacterial codon usage bias suggests that the preferred host of many clustered mycobacteriophages is not M. smegmatis but other, as yet unknown, members of the mycobacteria complex or closely allied bacterial species.

  3. Evolutionary interpretations of mycobacteriophage biodiversity and host-range through the analysis of codon usage bias

    PubMed Central

    Esposito, Lauren A.; Gupta, Swati; Streiter, Fraida; Prasad, Ashley

    2016-01-01

    In a genomics course sponsored by the Howard Hughes Medical Institute (HHMI), undergraduate students have isolated and sequenced the genomes of more than 1,150 mycobacteriophages, creating the largest database of sequenced bacteriophages able to infect a single host, Mycobacterium smegmatis, a soil bacterium. Genomic analysis indicates that these mycobacteriophages can be grouped into 26 clusters based on genetic similarity. These clusters span a continuum of genetic diversity, with extensive genomic mosaicism among phages in different clusters. However, little is known regarding the primary hosts of these mycobacteriophages in their natural habitats, nor of their broader host ranges. As such, it is possible that the primary host of many newly isolated mycobacteriophages is not M. smegmatis, but instead a range of closely related bacterial species. However, determining mycobacteriophage host range presents difficulties associated with mycobacterial cultivability, pathogenicity and growth. Another way to gain insight into mycobacteriophage host range and ecology is through bioinformatic analysis of their genomic sequences. To this end, we examined the correlations between the codon usage biases of 199 different mycobacteriophages and those of several fully sequenced mycobacterial species in order to gain insight into the natural host range of these mycobacteriophages. We find that UPGMA clustering tends to match, but not consistently, clustering by shared nucleotide sequence identity. In addition, analysis of GC content, tRNA usage and correlations between mycobacteriophage and mycobacterial codon usage bias suggests that the preferred host of many clustered mycobacteriophages is not M. smegmatis but other, as yet unknown, members of the mycobacteria complex or closely allied bacterial species. PMID:28348827

  4. Portuguese Lexical Clusters and CVC Sequences in Speech Perception and Production.

    PubMed

    Cunha, Conceição

    2015-01-01

    This paper investigates similarities between lexical consonant clusters and CVC sequences differing in the presence or absence of a lexical vowel in speech perception and production in two Portuguese varieties. The frequent high vowel deletion in the European variety (EP) and the realization of intervening vocalic elements between lexical clusters in Brazilian Portuguese (BP) may minimize the contrast between lexical clusters and CVC sequences in the two Portuguese varieties. In order to test this hypothesis we present a perception experiment with 72 participants and a physiological analysis of 3-dimensional movement data from 5 EP and 4 BP speakers. The perceptual results confirmed a gradual confusion of lexical clusters and CVC sequences in EP, which corresponded roughly to the gradient consonantal overlap found in production. © 2015 S. Karger AG, Basel.

  5. Multilocus sequence analysis and rpoB sequencing of Mycobacterium abscessus (sensu lato) strains.

    PubMed

    Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate

    2011-02-01

    Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that currently are identified by the sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. We studied here a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536(T), M. massiliense CIP 108297(T), and M. bolletii CIP 108541(T)) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters all were below 3% and between the M. abscessus and M. massiliense clusters were from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster having a fuzzy aspect. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the clustering of strains. We found 10/120 (8.3%) isolates for which the concatenated MLSA gene sequence and rpoB sequence were discordant (e.g., M. massiliense MLSA sequence and M. abscessus rpoB sequence), suggesting the intergroup lateral transfers of rpoB. In conclusion, our study strongly supports the recent proposal that M. abscessus, M. massiliense, and M. bolletii should constitute a single species. Our findings also indicate that there has been a horizontal transfer of rpoB sequences between these subgroups, precluding the use of rpoB sequencing alone for the accurate identification of the two proposed M. abscessus subspecies.

  6. Multilocus Sequence Analysis and rpoB Sequencing of Mycobacterium abscessus (Sensu Lato) Strains

    PubMed Central

    Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate

    2011-01-01

    Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that currently are identified by the sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. We studied here a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536T, M. massiliense CIP 108297T, and M. bolletii CIP 108541T) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters all were below 3% and between the M. abscessus and M. massiliense clusters were from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster having a fuzzy aspect. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the clustering of strains. We found 10/120 (8.3%) isolates for which the concatenated MLSA gene sequence and rpoB sequence were discordant (e.g., M. massiliense MLSA sequence and M. abscessus rpoB sequence), suggesting the intergroup lateral transfers of rpoB. In conclusion, our study strongly supports the recent proposal that M. abscessus, M. massiliense, and M. bolletii should constitute a single species. Our findings also indicate that there has been a horizontal transfer of rpoB sequences between these subgroups, precluding the use of rpoB sequencing alone for the accurate identification of the two proposed M. abscessus subspecies. PMID:21106786

  7. SSAW: A new sequence similarity analysis method based on the stationary discrete wavelet transform.

    PubMed

    Lin, Jie; Wei, Jing; Adjeroh, Donald; Jiang, Bing-Hua; Jiang, Yue

    2018-05-02

    Alignment-free sequence similarity analysis methods often lead to significant savings in computational time over alignment-based counterparts. A new alignment-free sequence similarity analysis method, called SSAW, is proposed. SSAW stands for Sequence Similarity Analysis using the Stationary Discrete Wavelet Transform (SDWT). It extracts k-mers from a sequence, then maps each k-mer to a complex number field. Then, the series of complex numbers formed are transformed into feature vectors using the stationary discrete wavelet transform. After these steps, the original sequence is turned into a feature vector with numeric values, which can then be used for clustering and/or classification. Using two different types of applications, namely, clustering and classification, we compared SSAW against the state-of-the-art alignment-free sequence analysis methods. SSAW demonstrates competitive or superior performance in terms of standard indicators, such as accuracy, F-score, precision, and recall. The running time was significantly better in most cases. These make SSAW a suitable method for sequence analysis, especially given the rapidly increasing volumes of sequence data required by most modern applications.

  8. Phylogenetic diversity in the genus Bacillus as seen by 16S rRNA sequencing studies

    NASA Technical Reports Server (NTRS)

    Rossler, D.; Ludwig, W.; Schleifer, K. H.; Lin, C.; McGill, T. J.; Wisotzkey, J. D.; Jurtshuk, P. Jr; Fox, G. E.

    1991-01-01

    Comparative sequence analysis of 16S ribosomal (r)RNAs or DNAs of Bacillus alvei, B. laterosporus, B. macerans, B. macquariensis, B. polymyxa and B. stearothermophilus revealed the phylogenetic diversity of the genus Bacillus. Based on the presently available data set of 16S rRNA sequences from bacilli and relatives at least four major "Bacillus clusters" can be defined: a "Bacillus subtilis cluster" including B. stearothermophilus, a "B. brevis cluster" including B. laterosporus, a "B. alvei cluster" including B. macerans, B. macquariensis and B. polymyxa and a "B. cycloheptanicus branch".

  9. Statistical Features of the 2010 Beni-Ilmane, Algeria, Aftershock Sequence

    NASA Astrophysics Data System (ADS)

    Hamdache, M.; Peláez, J. A.; Gospodinov, D.; Henares, J.

    2018-03-01

    The aftershock sequence of the 2010 Beni-Ilmane (M_W 5.5) earthquake is studied in depth to analyze the spatial and temporal variability of seismicity parameters of the relationships modeling the sequence. The b value of the frequency-magnitude distribution is examined rigorously. A threshold magnitude of completeness equal to 2.1, using the maximum curvature procedure or the changing point algorithm, and a b value equal to 0.96 ± 0.03 have been obtained for the entire sequence. Two clusters have been identified and characterized by their faulting type, exhibiting b values equal to 0.99 ± 0.05 and 1.04 ± 0.05. Additionally, the temporal decay of the aftershock sequence was examined using a stochastic point process. The analysis was done through the restricted epidemic-type aftershock sequence (RETAS) stochastic model, which allows the possibility to recognize the prevailing clustering pattern of the relaxation process in the examined area. The analysis selected the epidemic-type aftershock sequence (ETAS) model to offer the most appropriate description of the temporal distribution, which presumes that all events in the sequence can cause secondary aftershocks. Finally, the fractal dimensions are estimated using the integral correlation. The obtained D_2 values are 2.15 ± 0.01, 2.23 ± 0.01 and 2.17 ± 0.02 for the entire sequence, and for the first and second cluster, respectively. An analysis of the temporal evolution of the fractal dimensions D_-2, D_0, D_2 and the spectral slope has been also performed to derive and characterize the different clusters included in the sequence.
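    For readers unfamiliar with the b value, one widely used way to estimate it from a catalogue is the Aki-Utsu maximum-likelihood formula, b = log10(e) / (mean(M) - (Mc - dM/2)), applied to events at or above the completeness magnitude Mc. The sketch below assumes that estimator and a magnitude bin width of 0.1; the abstract does not state which estimator or binning the authors used, and the function name `b_value_mle` is hypothetical.

```python
import math


def b_value_mle(magnitudes, mc, bin_width=0.1):
    """Aki-Utsu maximum-likelihood b value for events with M >= mc.

    Returns (b, standard_error, n), where the standard error is the
    first-order approximation b / sqrt(n). `bin_width` is the magnitude
    rounding interval of the catalogue (assumed 0.1 here).
    """
    mags = [m for m in magnitudes if m >= mc]
    if len(mags) < 2:
        raise ValueError("need at least two events at or above mc")
    mean_mag = sum(mags) / len(mags)
    denom = mean_mag - (mc - bin_width / 2.0)
    if denom <= 0:
        raise ValueError("mean magnitude must exceed mc - bin_width / 2")
    b = math.log10(math.e) / denom
    return b, b / math.sqrt(len(mags)), len(mags)


if __name__ == "__main__":
    # Toy magnitudes only; not data from the Beni-Ilmane sequence.
    catalogue = [1.5, 2.1, 2.1, 2.6, 3.1]
    b, err, n = b_value_mle(catalogue, mc=2.1)
    print(f"b = {b:.2f} +/- {err:.2f} from {n} events")
```

    The event below Mc in the toy catalogue is ignored, which is why choosing Mc (2.1 in the abstract) matters: an Mc set too low biases the estimate downward.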

  10. Functional analysis of the upstream regulatory region of chicken miR-17-92 cluster.

    PubMed

    Cheng, Min; Zhang, Wen-jian; Xing, Tian-yu; Yan, Xiao-hong; Li, Yu-mao; Li, Hui; Wang, Ning

    2016-08-01

    The miR-17-92 cluster plays important roles in cell proliferation, differentiation, apoptosis, animal development and tumorigenesis. The transcriptional regulation of the miR-17-92 cluster has been extensively studied in mammals, but not in birds. To date, the avian miR-17-92 cluster genomic structure has not been fully determined. The promoter location and sequence of the miR-17-92 cluster have not been determined, due to the existence of a genomic gap sequence upstream of the miR-17-92 cluster in all the birds whose genomes have been sequenced. In this study, genome walking was used to close the genomic gap upstream of the chicken miR-17-92 cluster. In addition, bioinformatics analysis, reporter gene assay and truncation mutagenesis were used to investigate the functional role of the genomic gap sequence. Genome walking analysis showed that the gap region was 1704 bp long, and its GC content was 80.11%. Bioinformatics analysis showed that in the gap region, there was a 200 bp conserved sequence among the tested 10 species (Gallus gallus, Homo sapiens, Pan troglodytes, Bos taurus, Sus scrofa, Rattus norvegicus, Mus musculus, Possum, Danio rerio, Rana nigromaculata), which is the core promoter region of the mammalian miR-17-92 host gene (MIR17HG). A promoter luciferase reporter gene vector of the gap region was constructed and a reporter assay was performed. The result showed that the promoter activity of pGL3-cMIR17HG (-4228/-2506) was 417 times that of the negative control (empty pGL3 basic vector), suggesting that the chicken miR-17-92 cluster promoter exists in the gap region. To further gain insight into the promoter structure, two different truncations for the cloned gap sequence were generated by PCR. One had a truncation of 448 bp at the 5'-end and the other had a truncation of 894 bp at the 3'-end. Further reporter analysis showed that compared with the promoter activity of pGL3-cMIR17HG (-4228/-2506), the reporter activities of the 5'-end truncation and the 3'-end truncation were reduced by 19.82% and 60.14%, respectively. These data demonstrated that the important promoter region of the chicken miR-17-92 cluster is located in the -3400/-2506 bp region. Our results lay the foundation for revealing the transcriptional regulatory mechanisms of the chicken miR-17-92 cluster.

  11. The Nature of Red-Sequence Cluster Spiral Galaxies

    NASA Astrophysics Data System (ADS)

    Kashur, Lane; Barkhouse, Wayne; Sultanova, Madina; Kalawila Vithanage, Sandanuwa; Archer, Haylee; Foote, Gregory; Mathew, Elijah; Rude, Cody; Lopez-Cruz, Omar

    2017-01-01

    Preliminary analysis of the red-sequence galaxy population from a sample of 57 low-redshift galaxy clusters observed using the KPNO 0.9m telescope and 74 clusters from the WINGS dataset indicates that a small fraction of red-sequence galaxies have a morphology consistent with spiral systems. For spiral galaxies to acquire the color of elliptical/S0s at a similar luminosity, they must either have been stripped of their star-forming gas at an earlier epoch, or contain a larger than normal fraction of dust. To test these ideas we have compiled a sample of red-sequence spiral galaxies and examined their infrared properties as measured by 2MASS, WISE, Spitzer, and Herschel. These IR data allow us to estimate the amount of dust in each of our red-sequence spiral galaxies. We compare the estimated dust mass in each of these red-sequence late-type galaxies with spiral galaxies located in the same cluster field but having colors inconsistent with the red-sequence. We thus provide a statistical measure to discriminate between purely passive spiral galaxy evolution and dusty spirals to explain the presence of these late-type systems in cluster red-sequences.

  12. A Cluster of Legionella-Associated Pneumonia Cases in a Population of Military Recruits

    DTIC Science & Technology

    2007-06-01

    this cluster may suggest a previously unrecognized suscep- FIG. 1. Phylogenic analysis of the training center strain (represented by the MCRD consensus...military recruits during population-based surveillance for pneumonia pathogens. Results were confirmed by sequence analysis. Cases cluster tightly...17 April 2007 A Legionella cluster was identified through retrospective PCR analysis of 240 throat swab samples from X-ray-confirmed pneumonia cases

  13. Joint Sequence Analysis: Association and Clustering

    ERIC Educational Resources Information Center

    Piccarreta, Raffaella

    2017-01-01

    In its standard formulation, sequence analysis aims at finding typical patterns in a set of life courses represented as sequences. Recently, some proposals have been introduced to jointly analyze sequences defined on different domains (e.g., work career, partnership, and parental histories). We introduce measures to evaluate whether a set of…

  14. Oligonucleotide fingerprinting of rRNA genes for analysis of fungal community composition.

    PubMed

    Valinsky, Lea; Della Vedova, Gianluca; Jiang, Tao; Borneman, James

    2002-12-01

    Thorough assessments of fungal diversity are currently hindered by technological limitations. Here we describe a new method for identifying fungi, oligonucleotide fingerprinting of rRNA genes (OFRG). OFRG sorts arrayed rRNA gene (ribosomal DNA [rDNA]) clones into taxonomic clusters through a series of hybridization experiments, each using a single oligonucleotide probe. A simulated annealing algorithm was used to design an OFRG probe set for fungal rDNA. Analysis of 1,536 fungal rDNA clones derived from soil generated 455 clusters. A pairwise sequence analysis showed that clones with average sequence identities of 99.2% were grouped into the same cluster. To examine the accuracy of the taxonomic identities produced by this OFRG experiment, we determined the nucleotide sequences for 117 clones distributed throughout the tree. For all but two of these clones, the taxonomic identities generated by this OFRG experiment were consistent with those generated by a nucleotide sequence analysis. Eighty-eight percent of the clones were affiliated with Ascomycota, while 12% belonged to Basidiomycota. A large fraction of the clones were affiliated with the genera Fusarium (404 clones) and Raciborskiomyces (176 clones). Smaller assemblages of clones had high sequence identities to the Alternaria, Ascobolus, Chaetomium, Cryptococcus, and Rhizoctonia clades.

  15. Determinants of HIV Phylogenetic Clustering in Chicago Among Young Black Men Who Have Sex With Men From the uConnect Cohort.

    PubMed

    Morgan, Ethan; Nyaku, Amesika N; DʼAquila, Richard T; Schneider, John A

    2017-07-01

    Phylogenetic analysis determines similarities among HIV genetic sequences from persons infected with HIV, identifying clusters of transmission. We determined characteristics associated with both membership in an HIV transmission cluster and the number of clustered sequences among a cohort of young black men who have sex with men (YBMSM) in Chicago. Pairwise genetic distances of HIV-1 pol sequences were collected during 2013-2016. Potential transmission ties were identified among HIV-infected persons whose sequences were ≤1.5% genetically distant. Putative transmission pairs were defined as ≥1 tie to another sequence. We then determined demographic and risk attributes associated with both membership in an HIV transmission cluster and the number of ties to the sequences from other persons in the cluster. Of 86 available sequences, 31 (36.0%) were tied to ≥1 other sequence. Through multivariable analyses, we determined that those who reported symptoms of depression and those who had a higher number of confidants in their network had significantly decreased odds of membership in transmission clusters. We found that those who had unstable housing and who reported heavy marijuana use had significantly more ties to other individuals within transmission clusters, whereas those identifying as bisexual, those participating in group sex, and those with higher numbers of sexual partners had significantly fewer ties. This study demonstrates the potential for combining phylogenetic and individual and network attributes to target HIV control efforts to persons with potentially higher transmission risk, as well as suggesting some unappreciated specific predictors of transmission risk among YBMSM in Chicago for future study.

  16. Nuclear counterparts of the cytoplasmic mitochondrial 12S rRNA gene: a problem of ancient DNA and molecular phylogenies.

    PubMed

    van der Kuyl, A C; Kuiken, C L; Dekker, J T; Perizonius, W R; Goudsmit, J

    1995-06-01

    Monkey mummy bones and teeth originating from the North Saqqara Baboon Galleries (Egypt), soft tissue from a mummified baboon in a museum collection, and nineteenth/twentieth-century skin fragments from mangabeys were used for DNA extraction and PCR amplification of part of the mitochondrial 12S rRNA gene. Sequences aligning with the 12S rRNA gene were recovered but were only distantly related to contemporary monkey mitochondrial 12S rRNA sequences. However, many of these sequences were identical or closely related to human nuclear DNA sequences resembling mitochondrial 12S rRNA (isolated from a cell line depleted in mitochondria) and therefore have to be considered contamination. Subsequently in a separate study we were able to recover genuine mitochondrial 12S rRNA sequences from many extant species of nonhuman Old World primates and sequences closely resembling the human nuclear integrations. Analysis of all sequences by the neighbor-joining (NJ) method indicated that mitochondrial DNA sequences and their nuclear counterparts can be divided into two distinct clusters. One cluster contained all contemporary cytoplasmic mitochondrial DNA sequences and approximately half of the monkey nuclear mitochondrial-like sequences. A second cluster contained most human nuclear sequences and the other half of monkey nuclear sequences with a separate branch leading to human and gorilla mitochondrial and nuclear sequences. Sequences recovered from ancient materials were equally divided between the two clusters. These results constitute a warning for when working with ancient DNA or performing phylogenetic analysis using mitochondrial DNA as a target sequence: Nuclear counterparts of mitochondrial genes may lead to faulty interpretation of results.

  17. Stability of operational taxonomic units: an important but neglected property for analyzing microbial diversity.

    PubMed

    He, Yan; Caporaso, J Gregory; Jiang, Xiao-Tao; Sheng, Hua-Fang; Huse, Susan M; Rideout, Jai Ram; Edgar, Robert C; Kopylova, Evguenia; Walters, William A; Knight, Rob; Zhou, Hong-Wei

    2015-01-01

    The operational taxonomic unit (OTU) is widely used in microbial ecology. Reproducibility in microbial ecology research depends on the reliability of OTU-based 16S ribosomal subunit RNA (rRNA) analyses. Here, we report that many hierarchical and greedy clustering methods produce unstable OTUs, with membership that depends on the number of sequences clustered. If OTUs are regenerated with additional sequences or samples, sequences originally assigned to a given OTU can be split into different OTUs. Alternatively, sequences assigned to different OTUs can be merged into a single OTU. This OTU instability affects alpha-diversity analyses such as rarefaction curves, beta-diversity analyses such as distance-based ordination (for example, Principal Coordinate Analysis (PCoA)), and the identification of differentially represented OTUs. Our results show that the proportion of unstable OTUs varies for different clustering methods. We found that the closed-reference method is the only one that produces completely stable OTUs, with the caveat that sequences that do not match a pre-existing reference sequence collection are discarded. As a compromise to the factors listed above, we propose using an open-reference method to enhance OTU stability. This type of method clusters sequences against a database and includes unmatched sequences by clustering them via a relatively stable de novo clustering method. OTU stability is an important consideration when analyzing microbial diversity and is a feature that should be taken into account during the development of novel OTU clustering methods.
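
The instability described in this abstract can be illustrated with a toy example. The following is a minimal sketch, not the authors' pipeline: it assumes a simple greedy centroid scheme, equal-length sequences, a position-wise identity measure, and a hypothetical 0.75 threshold. The function names `identity` and `greedy_otus` are illustrative only.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.75):
    """Greedy centroid clustering (illustrative assumption, not a real tool).

    Sequences are processed in the given order. Each one joins the first
    existing centroid it matches at >= threshold identity; otherwise it
    becomes the centroid of a new OTU. Returns {sequence: centroid}.
    """
    centroids = []
    assignment = {}
    for s in seqs:
        for c in centroids:
            if identity(s, c) >= threshold:
                assignment[s] = c
                break
        else:
            # No centroid was close enough, so this sequence founds an OTU.
            centroids.append(s)
            assignment[s] = s
    return assignment


# Run 1: two sequences that differ at 2 of 4 positions land in separate OTUs.
run1 = greedy_otus(["AAAA", "AATT"])

# Run 2: one extra sequence, processed first, is within threshold of both,
# so the two original sequences are now merged into a single OTU.
run2 = greedy_otus(["AAAT", "AAAA", "AATT"])
```

In this sketch the membership of "AAAA" and "AATT" changes only because a third sequence was added, which is the kind of merge the abstract attributes to de novo methods. Real tools differ in ordering rules and distance measures, so the example shows the mechanism rather than any specific program's behavior.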

  18. Diversity and evolution analysis of glycoprotein GP85 from avian leukosis virus subgroup J isolates from chickens of different genetic backgrounds during 1989-2016: Coexistence of five extremely different clusters.

    PubMed

    Wang, Peikun; Lin, Lulu; Li, Haijuan; Yang, Yongli; Huang, Teng; Wei, Ping

    2018-02-01

    ALV-J has caused the most serious losses to the poultry industry in China. The gp85-coding sequence of ALV-J is known to be prone to mutation, but any association between the gp85 gene and breed of chicken remains unclear. A comprehensive and systematic study of the evolutionary process of ALV-J in China is needed. In this study, we compared and analyzed gp85 gene sequences from 198 ALV-J isolates, originating from China, USA, UK and France during 1989-2016. These were sorted into five clusters. Cluster 1, 2, 3, 4 and 5 included isolates from chicken types of different genetic backgrounds, e.g. white-feather broiler, Guangxi indigenous chicken breeds, Yellow chickens and layer chickens respectively. A correlation comparison of amino acid sequence similarities in the gp85 protein among the five clusters showed significant differences (P < 0.01) with the exception being when the third and fifth cluster were compared (P > 0.05). Results of entropy analysis of the gp85 sequences revealed that cluster 3 had the largest variation and cluster 1 had the least variation. The N-glycosylation sites in the majority of isolates numbered 14, 16, 17, 16 and 16, respectively, with regards to clusters 1-5. In addition, 5 isolates from cluster 3 had one more glycosylation site than the other isolates from cluster 3. Our study provides evidence that there were five extremely different ALV-J clusters during 1989-2016 and that the gp85 genes isolated from indigenous chicken breed isolates had the largest variation.
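
The abstract does not state how the entropy analysis was computed. As a hedged illustration, per-site Shannon entropy over an alignment is one common way to quantify sequence variation. The sketch below assumes equal-length aligned sequences and base-2 logarithms, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues observed in one alignment column."""
    counts = Counter(column)
    n = len(column)
    # p * log2(1/p) summed over observed residues; 0.0 for an invariant column.
    return sum((k / n) * math.log2(n / k) for k in counts.values())


def alignment_entropy(seqs):
    """Per-site entropy for a list of aligned, equal-length sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    return [column_entropy(col) for col in zip(*seqs)]


# Toy alignment: the first site is invariant, the second is split evenly.
profile = alignment_entropy(["AC", "AG", "AC", "AG"])
```

Averaging such a profile within each cluster would give one rough way to compare variation between clusters, although whether the study used this exact measure is an assumption.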

  19. SCPS: a fast implementation of a spectral method for detecting protein families on a genome-wide scale.

    PubMed

    Nepusz, Tamás; Sasidharan, Rajkumar; Paccanaro, Alberto

    2010-03-09

    An important problem in genomics is the automatic inference of groups of homologous proteins from pairwise sequence similarities. Several approaches have been proposed for this task which are "local" in the sense that they assign a protein to a cluster based only on the distances between that protein and the other proteins in the set. It was shown recently that global methods such as spectral clustering have better performance on a wide variety of datasets. However, currently available implementations of spectral clustering methods mostly consist of a few loosely coupled Matlab scripts that assume a fair amount of familiarity with Matlab programming and hence they are inaccessible for large parts of the research community. SCPS (Spectral Clustering of Protein Sequences) is an efficient and user-friendly implementation of a spectral method for inferring protein families. The method uses only pairwise sequence similarities, and is therefore practical when only sequence information is available. SCPS was tested on difficult sets of proteins whose relationships were extracted from the SCOP database, and its results were extensively compared with those obtained using other popular protein clustering algorithms such as TribeMCL, hierarchical clustering and connected component analysis. We show that SCPS is able to identify many of the family/superfamily relationships correctly and that the quality of the obtained clusters as indicated by their F-scores is consistently better than all the other methods we compared it with. We also demonstrate the scalability of SCPS by clustering the entire SCOP database (14,183 sequences) and the complete genome of the yeast Saccharomyces cerevisiae (6,690 sequences). 
Besides the spectral method, SCPS also implements connected component analysis and hierarchical clustering, it integrates TribeMCL, it provides different cluster quality tools, it can extract human-readable protein descriptions using GI numbers from NCBI, it interfaces with external tools such as BLAST and Cytoscape, and it can produce publication-quality graphical representations of the clusters obtained, thus constituting a comprehensive and effective tool for practical research in computational biology. Source code and precompiled executables for Windows, Linux and Mac OS X are freely available at http://www.paccanarolab.org/software/scps.
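
Connected component analysis, one of the baseline methods named in this abstract, can be sketched in a few lines. This is not SCPS code: the similarity scores, the cutoff value, and the function name `cluster_by_similarity` are assumptions made for illustration.

```python
def cluster_by_similarity(names, sims, cutoff):
    """Group proteins into connected components of a thresholded similarity graph.

    names  : list of protein identifiers
    sims   : dict mapping (name_a, name_b) -> pairwise similarity score
    cutoff : pairs scoring >= cutoff are treated as linked (hypothetical value)
    """
    parent = {name: name for name in names}

    def find(x):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in sims.items():
        if score >= cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


families = cluster_by_similarity(
    ["p1", "p2", "p3", "p4"],
    {("p1", "p2"): 0.9, ("p2", "p3"): 0.8, ("p3", "p4"): 0.1},
    cutoff=0.5,
)
```

Here p1 and p3 end up in the same family only because both link to p2. A single spurious link can chain unrelated proteins together in the same way, which is one reason global methods such as the spectral approach described above may perform better.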

  20. HIV Transmission Networks in the San Diego–Tijuana Border Region

    PubMed Central

    Mehta, Sanjay R.; Wertheim, Joel O.; Brouwer, Kimberly C.; Wagner, Karla D.; Chaillon, Antoine; Strathdee, Steffanie; Patterson, Thomas L.; Rangel, Maria G.; Vargas, Mlenka; Murrell, Ben; Garfein, Richard; Little, Susan J.; Smith, Davey M.

    2015-01-01

    Background HIV sequence data can be used to reconstruct local transmission networks. Along international borders, like the San Diego–Tijuana region, understanding the dynamics of HIV transmission across reported risks, racial/ethnic groups, and geography can help direct effective prevention efforts on both sides of the border. Methods We gathered sociodemographic, geographic, clinical, and viral sequence data from HIV infected individuals participating in ten studies in the San Diego–Tijuana border region. Phylogenetic and network analysis was performed to infer putative relationships between HIV sequences. Correlates of identified clusters were evaluated and spatiotemporal relationships were explored using Bayesian phylogeographic analysis. Findings After quality filtering, 843 HIV sequences with associated demographic data and 263 background sequences from the region were analyzed, and 138 clusters were inferred (2–23 individuals). Overall, the rate of clustering did not differ by ethnicity, residence, or sex, but bisexuals were less likely to cluster than heterosexuals or men who have sex with men (p = 0.043), and individuals identifying as white (p ≤ 0.01) were more likely to cluster than other races. Clustering individuals were also 3.5 years younger than non-clustering individuals (p < 0.001). Although the sampled San Diego and Tijuana epidemics were phylogenetically compartmentalized, five clusters contained individuals residing on both sides of the border. Interpretation This study sampled ~ 7% of HIV infected individuals in the border region, and although the sampled networks on each side of the border were largely separate, there was evidence of persistent bidirectional cross-border transmissions that linked risk groups, thus highlighting the importance of the border region as a “melting pot” of risk groups. Funding NIH, VA, and Pendleton Foundation. PMID:26629540

  1. HIV Transmission Networks in the San Diego-Tijuana Border Region.

    PubMed

    Mehta, Sanjay R; Wertheim, Joel O; Brouwer, Kimberly C; Wagner, Karla D; Chaillon, Antoine; Strathdee, Steffanie; Patterson, Thomas L; Rangel, Maria G; Vargas, Mlenka; Murrell, Ben; Garfein, Richard; Little, Susan J; Smith, Davey M

    2015-10-01

    HIV sequence data can be used to reconstruct local transmission networks. Along international borders, like the San Diego-Tijuana region, understanding the dynamics of HIV transmission across reported risks, racial/ethnic groups, and geography can help direct effective prevention efforts on both sides of the border. We gathered sociodemographic, geographic, clinical, and viral sequence data from HIV infected individuals participating in ten studies in the San Diego-Tijuana border region. Phylogenetic and network analysis was performed to infer putative relationships between HIV sequences. Correlates of identified clusters were evaluated and spatiotemporal relationships were explored using Bayesian phylogeographic analysis. After quality filtering, 843 HIV sequences with associated demographic data and 263 background sequences from the region were analyzed, and 138 clusters were inferred (2-23 individuals). Overall, the rate of clustering did not differ by ethnicity, residence, or sex, but bisexuals were less likely to cluster than heterosexuals or men who have sex with men (p = 0.043), and individuals identifying as white (p ≤ 0.01) were more likely to cluster than other races. Clustering individuals were also 3.5 years younger than non-clustering individuals (p < 0.001). Although the sampled San Diego and Tijuana epidemics were phylogenetically compartmentalized, five clusters contained individuals residing on both sides of the border. This study sampled ~ 7% of HIV infected individuals in the border region, and although the sampled networks on each side of the border were largely separate, there was evidence of persistent bidirectional cross-border transmissions that linked risk groups, thus highlighting the importance of the border region as a "melting pot" of risk groups. NIH, VA, and Pendleton Foundation.

  2. Estudio de la población estelar de varios cúmulos en Carina

    NASA Astrophysics Data System (ADS)

    Molina-Lera, J. A.; Baume, G. L.; Carraro, G.; Costa, E.

    2015-08-01

    Based on deep photometric data in the bands, complemented with infrared 2MASS data, we conducted an analysis of the fundamental parameters of six open clusters located in the Carina region. To perform a systematic study we developed a specialized code. In particular, we investigated the behavior of the respective lower main sequences. Our analysis indicated the presence of a significant population of pre-main-sequence stars in several of the clusters. We therefore obtained estimated values of contraction ages. Furthermore, we have determined the slopes of the initial mass functions of the studied clusters.

  3. Phylogenetic relationship of Ornithobacterium rhinotracheale strains.

    PubMed

    DE Oca-Jimenez, Roberto Montes; Vega-Sanchez, Vicente; Morales-Erasto, Vladimir; Salgado-Miranda, Celene; Blackall, Patrick J; Soriano-Vargas, Edgardo

    2018-04-10

    The bacterium Ornithobacterium rhinotracheale is associated with respiratory disease in wild birds and poultry. In this study, the phylogenetic analysis of nine reference strains of O. rhinotracheale belonging to serovars A to I, and eight Mexican isolates belonging to serovar A, was performed. The analysis was extended to include available sequences from another 23 strains available in the public domain. The analysis showed that the 40 sequences formed six clusters, I to VI. All eight Mexican field isolates were placed in cluster I. One of the reference strains appears to present genetic diversity not previously recognized and was placed in a new genetic cluster. In conclusion, the phylogenetic analysis of O. rhinotracheale strains, based on the 16S rRNA gene, is a suitable tool for epidemiologic studies.

  4. A proteome view of structural, functional, and taxonomic characteristics of major protein domain clusters.

    PubMed

    Sun, Chia-Tsen; Chiang, Austin W T; Hwang, Ming-Jing

    2017-10-27

    Proteome-scale bioinformatics research is increasingly conducted as the number of completely sequenced genomes increases, but analysis of protein domains (PDs) usually relies on similarity in their amino acid sequences and/or three-dimensional structures. Here, we present results from a bi-clustering analysis on presence/absence data for 6,580 unique PDs in 2,134 species with a sequenced genome, thus covering a complete set of proteins, for the three superkingdoms of life, Bacteria, Archaea, and Eukarya. Our analysis revealed eight distinctive PD clusters, which, following an analysis of enrichment of Gene Ontology functions and CATH classification of protein structures, were shown to exhibit structural and functional properties that are taxa-characteristic. For example, the largest cluster is ubiquitous in all three superkingdoms, constituting a set of 1,472 persistent domains created early in evolution and retained in living organisms and characterized by basic cellular functions and ancient structural architectures, while an Archaea and Eukarya bi-superkingdom cluster suggests its PDs may have existed in the ancestor of the two superkingdoms, and others are single superkingdom- or taxa (e.g. Fungi)-specific. These results increase our appreciation of PD diversity and our knowledge of how PDs are used in species, with implications for species evolution.

  5. Interspecific and intraspecific gene variability in a 1-Mb region containing the highest density of NBS-LRR genes found in the melon genome.

    PubMed

    González, Víctor M; Aventín, Núria; Centeno, Emilio; Puigdomènech, Pere

    2014-12-17

    Plant NBS-LRR resistance genes tend to be found in clusters, which have been shown to be hot spots of genome variability. In melon, half of the 81 predicted NBS-LRR genes group in nine clusters, and a 1 Mb region on linkage group V contains the highest density of R-genes and presence/absence gene polymorphisms found in the melon genome. This region is known to contain the locus of Vat, an agronomically important gene that confers resistance to aphids. However, the presence of duplications makes the sequencing and annotation of R-gene clusters difficult, usually resulting in multi-gapped sequences with higher than average errors. A 1-Mb sequence that contains the largest NBS-LRR gene cluster found in melon was improved using a strategy that combines Illumina paired-end mapping and PCR-based gap closing. Unknown sequence was decreased by 70% while about 3,000 SNPs and small indels were corrected. As a result, the annotations of 18 of a total of 23 NBS-LRR genes found in this region were modified, including additional coding sequences, amino acid changes, correction of splicing boundaries, or fusion of ORFs in common transcription units. A phylogeny analysis of the R-genes and their comparison with syntenic sequences in other cucurbits point to a pattern of local gene amplifications since the diversification of cucurbits from other families, and through speciation within the family. A candidate Vat gene is proposed based on the sequence similarity between a reported Vat gene from a Korean melon cultivar and a sequence fragment previously absent in the unrefined sequence. A sequence refinement strategy allowed substantial improvement of a 1 Mb fragment of the melon genome and the re-annotation of the largest cluster of NBS-LRR gene homologues found in melon.
Analysis of the cluster revealed that resistance genes have been produced by sequence duplication in adjacent genome locations since the divergence of cucurbits from other close families, and through the process of speciation within the family. A candidate Vat gene was also identified using sequence previously unavailable, which demonstrates the advantages of genome assembly refinements when analyzing complex regions such as those containing clusters of highly similar genes.

  6. Identifying Likely Transmission Pathways within a 10-Year Community Outbreak of Tuberculosis by High-Depth Whole Genome Sequencing

    PubMed Central

    Sadsad, Rosemarie; Martinez, Elena; Jelfs, Peter; Hill-Cawthorne, Grant A.; Gilbert, Gwendolyn L.; Marais, Ben J.; Sintchenko, Vitali

    2016-01-01

    Background Improved tuberculosis control and the need to contain the spread of drug-resistant strains provide a strong rationale for exploring tuberculosis transmission dynamics at the population level. Whole-genome sequencing provides optimal strain resolution, facilitating detailed mapping of potential transmission pathways. Methods We sequenced 22 isolates from a Mycobacterium tuberculosis cluster in New South Wales, Australia, identified during routine 24-locus mycobacterial interspersed repetitive unit typing. Following high-depth paired-end sequencing using the Illumina HiSeq 2000 platform, two independent pipelines were employed for analysis, both employing read mapping onto reference genomes as well as de novo assembly, to control biases in variant detection. In addition to single-nucleotide polymorphisms, the analyses also sought to identify insertions, deletions and structural variants. Results Isolates were highly similar, with a distance of 13 variants between the most distant members of the cluster. The most sensitive analysis classified the 22 isolates into 18 groups. Four of the isolates did not appear to share a recent common ancestor with the largest clade; another four isolates had an uncertain ancestral relationship with the largest clade. Conclusion Whole genome sequencing, with analysis of single-nucleotide polymorphisms, insertions, deletions, structural variants and subpopulations, enabled the highest possible level of discrimination between cluster members, clarifying likely transmission pathways and exposing the complexity of strain origin. The analysis provides a basis for targeted public health intervention and enhanced classification of future isolates linked to the cluster. PMID:26938641

  7. Large-Scale Genomic Analysis of Codon Usage in Dengue Virus and Evaluation of Its Phylogenetic Dependence

    PubMed Central

    Lara-Ramírez, Edgar E.; Salazar, Ma Isabel; López-López, María de Jesús; Salas-Benito, Juan Santiago; Sánchez-Varela, Alejandro

    2014-01-01

    The increasing number of dengue virus (DENV) genome sequences available allows identifying the contributing factors to DENV evolution. In the present study, the codon usage in serotypes 1–4 (DENV1–4) has been explored for 3047 sequenced genomes using different statistical methods. The correlation analysis of total GC content (GC) with GC content at the three nucleotide positions of codons (GC1, GC2, and GC3) as well as the effective number of codons (ENC, ENCp) versus GC3 plots revealed mutational bias and purifying selection pressures as the major forces influencing the codon usage, but with distinct pressure on specific nucleotide positions in the codon. The correspondence analysis (CA) and clustering analysis on relative synonymous codon usage (RSCU) within each serotype showed similar clustering patterns to the phylogenetic analysis of nucleotide sequences for DENV1–4. These clustering patterns are strongly related to the virus geographic origin. The phylogenetic dependence analysis also suggests that stabilizing selection acts on the codon usage bias. Our large-scale analysis reveals new features of DENV genomic evolution. PMID:25136631
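
Two of the quantities named in this abstract, GC content by codon position (GC1, GC2, GC3) and relative synonymous codon usage (RSCU), follow standard definitions and can be sketched as below. This is an illustrative implementation, not the authors' code. It assumes an in-frame coding sequence and the standard genetic code, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds):
    """Split an in-frame coding sequence into codons, dropping a trailing partial codon."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def gc_by_position(cds):
    """Return (GC1, GC2, GC3): the G+C fraction at each codon position."""
    codons = split_codons(cds)
    if not codons:
        raise ValueError("sequence contains no complete codon")
    return tuple(sum(c[p] in "GC" for c in codons) / len(codons) for p in range(3))


def rscu(cds):
    """RSCU = observed codon count / mean count over its synonymous family."""
    counts = Counter(c for c in split_codons(cds) if c in CODON_TABLE)
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not observed; RSCU is undefined for its codons
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


gc1, gc2, gc3 = gc_by_position("GCTGCCGCAGCG")  # four alanine codons
usage = rscu("GCCGCCGCCGCT")                    # GCC used 3 times, GCT once
```

An RSCU of 1.0 means a codon is used as often as expected under equal usage within its synonymous family, and values above 1.0 indicate preference. A vector of RSCU values per genome is the kind of input a clustering or correspondence analysis could take.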

  8. Phylodynamic Analysis Revealed That Epidemic of CRF07_BC Strain in Men Who Have Sex with Men Drove Its Second Spreading Wave in China.

    PubMed

    Zhang, Min; Jia, Dijing; Li, Hanping; Gui, Tao; Jia, Lei; Wang, Xiaolin; Li, Tianyi; Liu, Yongjian; Bao, Zuoyi; Liu, Siyang; Zhuang, Daomin; Li, Jingyun; Li, Lin

    2017-10-01

    CRF07_BC was originally formed in Yunnan province of China in the 1980s and spread quickly among injecting drug users (IDUs). In recent years, it has been introduced into men who have sex with men (MSM) and become the most dominant strain in China. In this study, we performed a comprehensive phylodynamic analysis of CRF07_BC sequences from China. All CRF07_BC sequences identified in China were retrieved from database. More sequences obtained in our laboratory were added to make the dataset more representative. A maximum-likelihood (ML) tree was constructed with PhyML3.0. A maximum clade credibility (MCC) tree and effective population size were predicted using the Markov chain Monte Carlo sampling method with BEAST software. A total of 610 CRF07_BC sequences covering 1,473 bp of the gag gene (from 817 to 2,289 according to HXB2 calculator) were included in the dataset. Three epidemic clusters were identified; two clusters comprised sequences from IDUs, while one cluster mainly contained sequences from MSMs. The time of the most recent common ancestor of the cluster composed of sequences from MSMs was estimated to be in 2000. Two rapid spreading waves of effective population size of CRF07_BC infections were identified in the skyline plot. The second wave coincided with the expansion of the MSM cluster. The results indicated that the control of CRF07_BC infections in MSMs would help to decrease its epidemic in China.

  9. Inferring Phylogenetic Relationships of Indian Citron (Citrus medica L.) based on rbcL and matK Sequences of Chloroplast DNA.

    PubMed

    Uchoi, Ajit; Malik, Surendra Kumar; Choudhary, Ravish; Kumar, Susheel; Rohini, M R; Pal, Digvender; Ercisli, Sezai; Chaudhury, Rekha

    2016-06-01

    Phylogenetic relationships of Indian Citron (Citrus medica L.) with other important Citrus species have been inferred through sequence analyses of rbcL and matK gene region of chloroplast DNA. The study was based on 23 accessions of Citrus genotypes representing 15 taxa of Indian Citrus, collected from wild, semi-wild, and domesticated stocks. The phylogeny was inferred using the maximum parsimony (MP) and neighbor-joining (NJ) methods. Both MP and NJ trees separated all the 23 accessions of Citrus into five distinct clusters. The chloroplast DNA (cpDNA) analysis based on rbcL and matK sequence data carried out in Indian taxa of Citrus was useful in differentiating all the true species and species/varieties of probable hybrid origin in distinct clusters or groups. Sequence analysis based on rbcL and matK gene provided unambiguous identification and disposition of true species like C. maxima, C. medica, C. reticulata, and related hybrids/cultivars. The separation of C. maxima, C. medica, and C. reticulata in distinct clusters or sub-clusters supports their distinctiveness as the basic species of edible Citrus. However, the cpDNA sequence analysis of rbcL and matK gene could not find any clear cut differentiation between subgenera Citrus and Papeda as proposed in Swingle's system of classification.

  10. Strain-Level Diversity of Secondary Metabolism in Streptomyces albus

    PubMed Central

    Seipke, Ryan F.

    2015-01-01

    Streptomyces spp. are robust producers of medicinally-, industrially- and agriculturally-important small molecules. Increased resistance to antibacterial agents and the lack of new antibiotics in the pipeline have led to a renaissance in natural product discovery. This endeavor has benefited from inexpensive high quality DNA sequencing technology, which has generated more than 140 genome sequences for taxonomic type strains and environmental Streptomyces spp. isolates. Many of the sequenced streptomycetes belong to the same species. For instance, Streptomyces albus has been isolated from diverse environmental niches and seven strains have been sequenced, consequently this species has been sequenced more than any other streptomycete, allowing valuable analyses of strain-level diversity in secondary metabolism. Bioinformatics analyses identified a total of 48 unique biosynthetic gene clusters harboured by Streptomyces albus strains. Eighteen of these gene clusters specify the core secondary metabolome of the species. Fourteen of the gene clusters are contained by one or more strain and are considered auxiliary, while 16 of the gene clusters encode the production of putative strain-specific secondary metabolites. Analysis of Streptomyces albus strains suggests that each strain of a Streptomyces species likely harbours at least one strain-specific biosynthetic gene cluster. Importantly, this implies that deep sequencing of a species will not exhaust gene cluster diversity and will continue to yield novelty. PMID:25635820

  11. Wheat EST resources for functional genomics of abiotic stress

    PubMed Central

    Houde, Mario; Belcaid, Mahdi; Ouellet, François; Danyluk, Jean; Monroy, Antonio F; Dryanova, Ani; Gulick, Patrick; Bergeron, Anne; Laroche, André; Links, Matthew G; MacCarthy, Luke; Crosby, William L; Sarhan, Fathey

    2006-01-01

    Background Wheat is an excellent species to study freezing tolerance and other abiotic stresses. However, the sequence of the wheat genome has not been completely characterized due to its complexity and large size. To circumvent this obstacle and identify genes involved in cold acclimation and associated stresses, a large scale EST sequencing approach was undertaken by the Functional Genomics of Abiotic Stress (FGAS) project. Results We generated 73,521 quality-filtered ESTs from eleven cDNA libraries constructed from wheat plants exposed to various abiotic stresses and at different developmental stages. In addition, 196,041 ESTs for which tracefiles were available from the National Science Foundation wheat EST sequencing program and DuPont were also quality-filtered and used in the analysis. Clustering of the combined ESTs with d2_cluster and TGICL yielded a few large clusters containing several thousand ESTs that were refractory to routine clustering techniques. To resolve this problem, the sequence proximity and "bridges" were identified by an e-value distance graph to manually break clusters into smaller groups. Assembly of the resolved ESTs generated a 75,488 unique sequence set (31,580 contigs and 43,908 singletons/singlets). Digital expression analyses indicated that the FGAS dataset is enriched in stress-regulated genes compared to the other public datasets. Over 43% of the unique sequence set was annotated and classified into functional categories according to Gene Ontology. Conclusion We have annotated 29,556 different sequences, an almost 5-fold increase in annotated sequences compared to the available wheat public databases. Digital expression analysis combined with gene annotation helped in the identification of several pathways associated with abiotic stress. The genomic resources and knowledge developed by this project will contribute to a better understanding of the different mechanisms that govern stress tolerance in wheat and other cereals. 
PMID:16772040

  12. An Atlas of Peroxiredoxins Created Using an Active Site Profile-Based Approach to Functionally Relevant Clustering of Proteins.

    PubMed

    Harper, Angela F; Leuthaeuser, Janelle B; Babbitt, Patricia C; Morris, John H; Ferrin, Thomas E; Poole, Leslie B; Fetrow, Jacquelyn S

    2017-02-01

    Peroxiredoxins (Prxs or Prdxs) are a large protein superfamily of antioxidant enzymes that rapidly detoxify damaging peroxides and/or affect signal transduction and, thus, have roles in proliferation, differentiation, and apoptosis. Prx superfamily members are widespread across phylogeny and multiple methods have been developed to classify them. Here we present an updated atlas of the Prx superfamily identified using a novel method called MISST (Multi-level Iterative Sequence Searching Technique). MISST is an iterative search process developed to be both agglomerative, to add sequences containing similar functional site features, and divisive, to split groups when functional site features suggest distinct functionally-relevant clusters. Superfamily members need not be identified initially: MISST begins with a minimal representative set of known structures and searches GenBank iteratively. Further, the method's novelty lies in the manner in which isofunctional groups are selected; rather than use a single or shifting threshold to identify clusters, the groups are deemed isofunctional when they pass a self-identification criterion, such that the group identifies itself and nothing else in a search of GenBank. The method was preliminarily validated on the Prxs, as the Prxs presented challenges of both agglomeration and division. For example, previous sequence analysis clustered the Prx functional families Prx1 and Prx6 into one group. Subsequent expert analysis clearly identified Prx6 as a distinct functionally relevant group. The MISST process distinguishes these two closely related, though functionally distinct, families. Through MISST search iterations, over 38,000 Prx sequences were identified, which the method divided into six isofunctional clusters, consistent with previous expert analysis. The results represent the most complete computational functional analysis of proteins comprising the Prx superfamily.
The feasibility of this novel method is demonstrated by the Prx superfamily results, laying the foundation for potential functionally relevant clustering of the universe of protein sequences.

  13. An Atlas of Peroxiredoxins Created Using an Active Site Profile-Based Approach to Functionally Relevant Clustering of Proteins

    PubMed Central

    Babbitt, Patricia C.; Ferrin, Thomas E.

    2017-01-01

    Peroxiredoxins (Prxs or Prdxs) are a large protein superfamily of antioxidant enzymes that rapidly detoxify damaging peroxides and/or affect signal transduction and, thus, have roles in proliferation, differentiation, and apoptosis. Prx superfamily members are widespread across phylogeny and multiple methods have been developed to classify them. Here we present an updated atlas of the Prx superfamily identified using a novel method called MISST (Multi-level Iterative Sequence Searching Technique). MISST is an iterative search process developed to be both agglomerative, to add sequences containing similar functional site features, and divisive, to split groups when functional site features suggest distinct functionally-relevant clusters. Superfamily members need not be identified initially—MISST begins with a minimal representative set of known structures and searches GenBank iteratively. Further, the method’s novelty lies in the manner in which isofunctional groups are selected; rather than use a single or shifting threshold to identify clusters, the groups are deemed isofunctional when they pass a self-identification criterion, such that the group identifies itself and nothing else in a search of GenBank. The method was preliminarily validated on the Prxs, as the Prxs presented challenges of both agglomeration and division. For example, previous sequence analysis clustered the Prx functional families Prx1 and Prx6 into one group. Subsequent expert analysis clearly identified Prx6 as a distinct functionally relevant group. The MISST process distinguishes these two closely related, though functionally distinct, families. Through MISST search iterations, over 38,000 Prx sequences were identified, which the method divided into six isofunctional clusters, consistent with previous expert analysis. The results represent the most complete computational functional analysis of proteins comprising the Prx superfamily. 
    The feasibility of this novel method is demonstrated by the Prx superfamily results, laying the foundation for potential functionally relevant clustering of the universe of protein sequences. PMID:28187133

  14. Genetic diversity of Rhizobia isolates from Amazon soils using cowpea (Vigna unguiculata) as trap plant

    PubMed Central

    Silva, F.V.; Simões-Araújo, J.L.; Silva Júnior, J.P.; Xavier, G.R.; Rumjanek, N.G.

    2012-01-01

    The aim of this work was to characterize rhizobia isolated from the root nodules of cowpea (Vigna unguiculata) plants cultivated in Amazon soil samples by means of ARDRA (Amplified rDNA Restriction Analysis) and sequencing analysis, in order to determine their phylogenetic relationships. The 16S rRNA gene of rhizobia was amplified by PCR (polymerase chain reaction) using universal primers Y1 and Y3. The amplification products were analyzed by the restriction enzymes HinfI, MspI and DdeI and also sequenced with Y1, Y3 and six intermediate primers. The clustering analysis based on ARDRA profiles separated the Amazon isolates into three subgroups, which formed a group apart from the reference isolates of Bradyrhizobium japonicum and Bradyrhizobium elkanii. The clustering analysis of 16S rRNA gene sequences showed that the fast-growing isolates had similarity with Enterobacter, Rhizobium, Klebsiella and Bradyrhizobium and all the slow-growing isolates clustered close to Bradyrhizobium. PMID:24031880

  15. Investigating the long-term course of schizophrenia by sequence analysis.

    PubMed

    An der Heiden, Wolfram; Häfner, Heinz

    2015-08-30

    In the present study we set out to explore the long-term clinical course of schizophrenia in a holistic manner by adopting sequence analysis. Our aim was to identify course types of illness by means of cluster analysis. The study was based on course and outcome data for 107 patients followed up over 134 months after first admission in the ABC Schizophrenia Study. Focusing on the main syndromes (positive, negative, depressive and unspecific symptoms) and their combinations we looked for similarities in individual illness courses using the 'optimal matching' method. A cluster analysis performed on the resulting similarity matrix yielded two main groups (an 'improving' and a 'chronic' group), which comprised a total of six different types of illness course. The course types differed in both quantitative (frequency of syndromes and syndrome combinations) and qualitative terms (clinical presentation, sequence of syndromes). Cluster membership was only rarely, but clearly, associated with sociodemographic characteristics, treatment data and other illness variables. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  16. Whole Genome Sequence and Phylogenetic Analysis Show Helicobacter pylori Strains from Latin America Have Followed a Unique Evolution Pathway

    PubMed Central

    Muñoz-Ramírez, Zilia Y.; Mendez-Tenorio, Alfonso; Kato, Ikuko; Bravo, Maria M.; Rizzato, Cosmeri; Thorell, Kaisa; Torres, Roberto; Aviles-Jimenez, Francisco; Camorlinga, Margarita; Canzian, Federico; Torres, Javier

    2017-01-01

    Helicobacter pylori (HP) genetics may determine its clinical outcomes. Despite high prevalence of HP infection in Latin America (LA), there have been no phylogenetic studies in the region. We aimed to understand the structure of HP populations in LA mestizo individuals, where gastric cancer incidence remains high. The genomes of 107 HP strains from Mexico, Nicaragua and Colombia were analyzed together with 59 publicly available worldwide genomes. To study bacterial relationships at the whole-genome level we propose a virtual hybridization technique using thousands of high-entropy 13 bp DNA probes to generate fingerprints. Phylogenetic virtual genome fingerprint (VGF) was compared with Multi Locus Sequence Analysis (MLST) and with phylogenetic analyses of cagPAI virulence island sequences. With MLST some Nicaraguan and Mexican strains clustered close to African isolates, whereas European isolates were spread without clustering and intermingled with LA isolates. VGF analysis resulted in increased resolution of populations, separating European from LA strains. Furthermore, clusters with exclusively Colombian, Mexican, or Nicaraguan strains were observed, where the Colombian cluster separated from Europe, Asia, and Africa, while Nicaraguan and Mexican clades grouped close to Africa. In addition, a mixed large LA cluster including Mexican, Colombian, Nicaraguan, Peruvian, and Salvadorian strains was observed; all LA clusters separated from the Amerind clade. With cagPAI sequence analyses LA clades clearly separated from Europe, Asia and Amerind, and Colombian strains formed a single cluster. A NeighborNet analysis suggested frequent and recent recombination events, particularly among LA strains. The results suggest that in the New World, H. pylori has evolved to fit mestizo LA populations, already 500 years after the Spanish colonization. This co-adaptation may account for regional variability in gastric cancer risk. PMID:28293542

  17. CLUSTOM-CLOUD: In-Memory Data Grid-Based Software for Clustering 16S rRNA Sequence Data in the Cloud Environment.

    PubMed

    Oh, Jeongsu; Choi, Chi-Hwan; Park, Min-Kyu; Kim, Byung Kwon; Hwang, Kyuin; Lee, Sang-Heon; Hong, Soon Gyu; Nasir, Arshan; Cho, Wan-Sup; Kim, Kyung Mo

    2016-01-01

    High-throughput sequencing can produce hundreds of thousands of 16S rRNA sequence reads corresponding to different organisms present in the environmental samples. Typically, analysis of microbial diversity in bioinformatics starts from pre-processing followed by clustering 16S rRNA reads into relatively fewer operational taxonomic units (OTUs). The OTUs are reliable indicators of microbial diversity and greatly accelerate the downstream analysis time. However, existing hierarchical clustering algorithms that are generally more accurate than greedy heuristic algorithms struggle with large sequence datasets. To keep pace with the rapid rise in sequencing data, we present CLUSTOM-CLOUD, which is the first distributed sequence clustering program based on In-Memory Data Grid (IMDG) technology-a distributed data structure to store all data in the main memory of multiple computing nodes. The IMDG technology helps CLUSTOM-CLOUD to enhance both its capability of handling larger datasets and its computational scalability better than its ancestor, CLUSTOM, while maintaining high accuracy. Clustering speed of CLUSTOM-CLOUD was evaluated on published 16S rRNA human microbiome sequence datasets using the small laboratory cluster (10 nodes) and under the Amazon EC2 cloud-computing environments. Under the laboratory environment, it required only ~3 hours to process dataset of size 200 K reads regardless of the complexity of the human microbiome data. In turn, one million reads were processed in approximately 20, 14, and 11 hours when utilizing 20, 30, and 40 nodes on the Amazon EC2 cloud-computing environment. The running time evaluation indicates that CLUSTOM-CLOUD can handle much larger sequence datasets than CLUSTOM and is also a scalable distributed processing system. The comparative accuracy test using 16S rRNA pyrosequences of a mock community shows that CLUSTOM-CLOUD achieves higher accuracy than DOTUR, mothur, ESPRIT-Tree, UCLUST and Swarm. 
    CLUSTOM-CLOUD is written in JAVA and is freely available at http://clustomcloud.kopri.re.kr.

  18. CLUSTOM-CLOUD: In-Memory Data Grid-Based Software for Clustering 16S rRNA Sequence Data in the Cloud Environment

    PubMed Central

    Park, Min-Kyu; Kim, Byung Kwon; Hwang, Kyuin; Lee, Sang-Heon; Hong, Soon Gyu; Nasir, Arshan; Cho, Wan-Sup; Kim, Kyung Mo

    2016-01-01

    High-throughput sequencing can produce hundreds of thousands of 16S rRNA sequence reads corresponding to different organisms present in the environmental samples. Typically, analysis of microbial diversity in bioinformatics starts from pre-processing followed by clustering 16S rRNA reads into relatively fewer operational taxonomic units (OTUs). The OTUs are reliable indicators of microbial diversity and greatly accelerate the downstream analysis time. However, existing hierarchical clustering algorithms that are generally more accurate than greedy heuristic algorithms struggle with large sequence datasets. To keep pace with the rapid rise in sequencing data, we present CLUSTOM-CLOUD, which is the first distributed sequence clustering program based on In-Memory Data Grid (IMDG) technology–a distributed data structure to store all data in the main memory of multiple computing nodes. The IMDG technology helps CLUSTOM-CLOUD to enhance both its capability of handling larger datasets and its computational scalability better than its ancestor, CLUSTOM, while maintaining high accuracy. Clustering speed of CLUSTOM-CLOUD was evaluated on published 16S rRNA human microbiome sequence datasets using the small laboratory cluster (10 nodes) and under the Amazon EC2 cloud-computing environments. Under the laboratory environment, it required only ~3 hours to process dataset of size 200 K reads regardless of the complexity of the human microbiome data. In turn, one million reads were processed in approximately 20, 14, and 11 hours when utilizing 20, 30, and 40 nodes on the Amazon EC2 cloud-computing environment. The running time evaluation indicates that CLUSTOM-CLOUD can handle much larger sequence datasets than CLUSTOM and is also a scalable distributed processing system. The comparative accuracy test using 16S rRNA pyrosequences of a mock community shows that CLUSTOM-CLOUD achieves higher accuracy than DOTUR, mothur, ESPRIT-Tree, UCLUST and Swarm. 
    CLUSTOM-CLOUD is written in JAVA and is freely available at http://clustomcloud.kopri.re.kr. PMID:26954507

  19. High diversity and rapid diversification in the head louse, Pediculus humanus (Pediculidae: Phthiraptera)

    PubMed Central

    Ashfaq, Muhammad; Prosser, Sean; Nasir, Saima; Masood, Mariyam; Ratnasingham, Sujeevan; Hebert, Paul D. N.

    2015-01-01

    The study analyzes sequence variation of two mitochondrial genes (COI, cytb) in Pediculus humanus from three countries (Egypt, Pakistan, South Africa) that have received little prior attention, and integrates these results with prior data. Analysis indicates a maximum K2P distance of 10.3% among 960 COI sequences and 13.8% among 479 cytb sequences. Three analytical methods (BIN, PTP, ABGD) reveal five concordant OTUs for COI and cytb. Neighbor-Joining analysis of the COI sequences confirms five clusters; three corresponding to previously recognized mitochondrial clades A, B, C and two new clades, “D” and “E”, showing 2.3% and 2.8% divergence from their nearest neighbors (NN). Cytb data corroborate five clusters showing that clades “D” and “E” are both 4.6% divergent from their respective NN clades. Phylogenetic analysis supports the monophyly of all clusters recovered by NJ analysis. Divergence time estimates suggest that the earliest split of P. humanus clades occurred slightly more than one million years ago (MYa) and the latest about 0.3 MYa. Sequence divergences in COI and cytb among the five clades of P. humanus are 10X those in their human host, a difference that likely reflects both rate acceleration and the acquisition of lice clades from several archaic hominid lineages. PMID:26373806
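
    The K2P (Kimura two-parameter) distances quoted in this abstract correct the raw proportion of differing sites for multiple substitutions, counting transitions and transversions separately. A minimal sketch of the standard formula, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitional and transversional differences, might look as follows. The function name and the toy sequences are illustrative assumptions and are not taken from the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # ignore gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError when the sequences are too divergent
    # for the correction to be defined (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # One transition in ten sites (hypothetical toy data).
    print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

    For closely related sequences the corrected distance stays near the raw proportion of differences; the correction grows as divergence increases.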

  20. Determination and analysis of the complete genome sequence of Paralichthys olivaceus rhabdovirus (PORV).

    PubMed

    Zhu, Ruo-Lin; Zhang, Qi-Ya

    2014-04-01

    Paralichthys olivaceus rhabdovirus (PORV), which is associated with high mortality rates in flounder, was isolated in China in 2005. Here, we provide an annotated sequence record of PORV, the genome of which comprises 11,182 nucleotides and contains six genes in the order 3'-N-P-M-G-NV-L-5'. Phylogenetic analysis based on glycoprotein sequences of PORV and other rhabdoviruses showed that PORV clusters with viral haemorrhagic septicemia virus (VHSV), genus Novirhabdovirus, family Rhabdoviridae. Further phylogenetic analysis of the combined amino acid sequences of six proteins of PORV and VHSV strains showed that PORV clusters with Korean strains and is closely related to Asian strains, all of which were isolated from flounder. In a comparison in which the sequences of the six proteins were combined, PORV shared the highest identity (98.3 %) with VHSV strain KJ2008 from Korea.

  1. Sequence determination and analysis of S-adenosyl-L-homocysteine hydrolase from yellow lupine (Lupinus luteus).

    PubMed

    Brzeziński, K; Janowski, R; Podkowiński, J; Jaskólski, M

    2001-01-01

    The coding sequences of two S-adenosyl-L-homocysteine hydrolases (SAHases) were identified in yellow lupine by screening of a cDNA library. One of them, corresponding to the complete protein, was sequenced and compared with 52 other SAHase sequences. Phylogenetic analysis of these proteins identified three groups of the enzymes. Group A comprises only bacterial sequences. Group B is subdivided into two subgroups, one of which (B1) is formed by animal sequences. Subgroup B2 consists of two distinct clusters, B2a and B2b. Cluster B2b comprises all known plant sequences, including the yellow lupine enzyme, which are distinguished by a 50-residue insert. Group C is heterogeneous and contains SAHases from Archaea as well as a new class of animal enzymes, distinctly different from those in group B1.

  2. The ergot alkaloid gene cluster in Claviceps purpurea: extension of the cluster sequence and intra species evolution.

    PubMed

    Haarmann, Thomas; Machado, Caroline; Lübbe, Yvonne; Correia, Telmo; Schardl, Christopher L; Panaccione, Daniel G; Tudzynski, Paul

    2005-06-01

    The genomic region of Claviceps purpurea strain P1 containing the ergot alkaloid gene cluster [Tudzynski, P., Hölter, K., Correia, T., Arntz, C., Grammel, N., Keller, U., 1999. Evidence for an ergot alkaloid gene cluster in Claviceps purpurea. Mol. Gen. Genet. 261, 133-141] was explored by chromosome walking, and additional genes probably involved in the ergot alkaloid biosynthesis have been identified. The putative cluster sequence (extending over 68.5kb) contains 4 different nonribosomal peptide synthetase (NRPS) genes and several putative oxidases. Northern analysis showed that most of the genes were co-regulated (repressed by high phosphate), and identified probable flanking genes by lack of co-regulation. Comparison of the cluster sequences of strain P1, an ergotamine producer, with that of strain ECC93, an ergocristine producer, showed high conservation of most of the cluster genes, but significant variation in the NRPS modules, strongly suggesting that evolution of these chemical races of C. purpurea is determined by evolution of NRPS module specificity.

  3. Epidemiological analysis of Salmonella clusters identified by whole genome sequencing, England and Wales 2014.

    PubMed

    Waldram, Alison; Dolan, Gayle; Ashton, Philip M; Jenkins, Claire; Dallman, Timothy J

    2018-05-01

    The unprecedented level of bacterial strain discrimination provided by whole genome sequencing (WGS) presents new challenges with respect to the utility and interpretation of the data. Whole genome sequences from 1445 isolates of Salmonella belonging to the most commonly identified serotypes in England and Wales isolated between April and August 2014 were analysed. Single linkage single nucleotide polymorphism thresholds at the 10, 5 and 0 level were explored for evidence of epidemiological links between clustered cases. Analysis of the WGS data organised 566 of the 1445 isolates into 32 clusters of five or more. A statistically significant epidemiological link was identified for 17 clusters. The clusters were associated with foreign travel (n = 8), consumption of Chinese takeaways (n = 4), chicken eaten at home (n = 2), and one each of the following; eating out, contact with another case in the home and contact with reptiles. In the same time frame, one cluster was detected using traditional outbreak detection methods. WGS can be used for the highly specific and highly sensitive detection of biologically related isolates when epidemiological links are obscured. Improvements in the collection of detailed, standardised exposure information would enhance cluster investigations. Copyright © 2017 Elsevier Ltd. All rights reserved.
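
    The single-linkage SNP thresholds mentioned in this abstract can be illustrated with a small sketch: two isolates fall in the same cluster if they are connected by a chain of isolates in which each consecutive pair differs by no more than the threshold number of SNPs. The code below is only an illustration under stated assumptions (toy SNP profiles of equal length, hypothetical function and isolate names) and is not the pipeline used in the study.

```python
from itertools import combinations


def snp_distance(profile_a, profile_b):
    """Number of positions at which two aligned SNP profiles differ."""
    return sum(1 for x, y in zip(profile_a, profile_b) if x != y)


def single_linkage_clusters(profiles, threshold):
    """Group isolates by single linkage at a given SNP threshold.

    profiles: dict mapping isolate name -> aligned SNP string.
    Returns a list of sorted name lists, largest cluster first.
    """
    parent = {name: name for name in profiles}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if snp_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


if __name__ == "__main__":
    # Hypothetical toy profiles, not real Salmonella data.
    toy = {
        "iso1": "AAAAAAAAAA",
        "iso2": "AAAAAAAAAT",
        "iso3": "AAAAAAAATT",
        "iso4": "CCCCCCCCCC",
    }
    for t in (0, 1):
        print(t, single_linkage_clusters(toy, t))
```

    Note that under single linkage iso1 and iso3 end up together at a threshold of 1 even though they differ by 2 SNPs, because iso2 links them; this chaining is what makes the choice of threshold matter.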

  4. Cluster analysis of S. Cerevisiae nucleosome binding sites

    NASA Astrophysics Data System (ADS)

    Suvorova, Y.; Korotkov, E.

    2017-12-01

    It is well known that a major part of a eukaryotic genome is wrapped around histone proteins forming nucleosomes. It was also demonstrated that the DNA sequence itself plays an important role in the nucleosome positioning process. In this work, a cluster analysis of 67 517 nucleosome binding sites from the S. cerevisiae genome was carried out. The classification method is based on the self-adjusting dinucleotides position weight matrix. As a result, 135 significant clusters were discovered that contain 43225 sequences (which constitutes 64% of the initial set). The meaning of the found classes is discussed, as well as the possibility of further use.

  5. Necessary Sequencing Depth and Clustering Method to Obtain Relatively Stable Diversity Patterns in Studying Fish Gut Microbiota.

    PubMed

    Xiao, Fanshu; Yu, Yuhe; Li, Jinjin; Juneau, Philippe; Yan, Qingyun

    2018-05-25

    The 16S rRNA gene has been one of the most commonly used molecular markers for estimating bacterial diversity during the past decades. However, there is no consistency about the sequencing depth (from thousands to millions of sequences per sample), and the clustering methods used to generate OTUs may also differ among studies. These inconsistent premises make effective comparisons among studies difficult or unreliable. This study aims to examine the sequencing depth and clustering method needed to ensure stable diversity patterns when studying fish gut microbiota. A dataset of 42 samples of Siniperca chuatsi (carnivorous fish) gut microbiota was used to test how the sequencing depth and clustering may affect the alpha and beta diversity patterns of fish intestinal microbiota. Interestingly, we found that the sequencing depth (resampling 1000-11,000 per sample) and the clustering methods (UPARSE and UCLUST) did not bias the estimates of the diversity patterns during the fish development from larva to adult. Although we should acknowledge that a suitable sequencing depth may differ case by case, our finding indicates that shallow sequencing such as 1000 sequences per sample may also be enough to reflect the general diversity patterns of fish gut microbiota. However, we have shown in the present study that strict pre-processing of the original sequences is required to ensure reliable results. This study provides evidence to help make a sound scientific choice of the sequencing depth and clustering method for future studies on fish gut microbiota patterns, while reducing as much as possible the costs related to the analysis.

  6. Faster sequence homology searches by clustering subsequences.

    PubMed

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2015-04-15

    Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic analysis. We developed a fast homology search method based on database subsequence clustering, and implemented it as GHOSTZ. This method clusters similar subsequences from a database to perform an efficient seed search and ungapped extension by reducing alignment candidates based on triangle inequality. The database subsequence clustering technique achieved an ∼2-fold increase in speed without a large decrease in search sensitivity. When measured with metagenomic data, GHOSTZ was ∼2.2-2.8 times faster than RAPSearch and ∼185-261 times faster than BLASTX. The source code is freely available for download at http://www.bi.cs.titech.ac.jp/ghostz/. Contact: akiyama@cs.titech.ac.jp. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
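
    The idea of reducing alignment candidates by the triangle inequality can be sketched in a few lines. If each cluster member stores its distance to the cluster representative, then for any query the true distance to a member is at least |d(query, representative) - d(representative, member)|, so members whose lower bound already exceeds the cutoff can be skipped without being compared. The sketch below uses Hamming distance on equal-length subsequences as a stand-in metric; the function names, data layout and cutoff are assumptions for illustration and do not reproduce the GHOSTZ implementation.

```python
def hamming(a, b):
    """Hamming distance between two equal-length strings (a true metric)."""
    return sum(x != y for x, y in zip(a, b))


def build_cluster(representative, sequences):
    """Precompute each member's distance to the cluster representative."""
    return [(seq, hamming(representative, seq)) for seq in sequences]


def search_cluster(query, representative, members, max_dist):
    """Return members within max_dist of the query, plus the number of
    members pruned by the triangle-inequality lower bound."""
    d_query_rep = hamming(query, representative)
    hits, pruned = [], 0
    for member, d_rep_member in members:
        # Lower bound on d(query, member) from the triangle inequality.
        if abs(d_query_rep - d_rep_member) > max_dist:
            pruned += 1  # cannot be a hit; skip the full comparison
            continue
        if hamming(query, member) <= max_dist:
            hits.append(member)
    return hits, pruned


if __name__ == "__main__":
    # Hypothetical toy subsequences.
    rep = "ACGTACGT"
    members = build_cluster(rep, ["ACGTACGA", "TTTTTTTT"])
    print(search_cluster("ACGTACGA", rep, members, max_dist=1))
```

    The pruning never discards a true hit, because the bound is a guaranteed underestimate of the real distance; the saving comes from avoiding full comparisons against distant members.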

  7. KinFin: Software for Taxon-Aware Analysis of Clustered Protein Sequences.

    PubMed

    Laetsch, Dominik R; Blaxter, Mark L

    2017-10-05

    The field of comparative genomics is concerned with the study of similarities and differences between the information encoded in the genomes of organisms. A common approach is to define gene families by clustering protein sequences based on sequence similarity, and analyze protein cluster presence and absence in different species groups as a guide to biology. Due to the high dimensionality of these data, downstream analysis of protein clusters inferred from large numbers of species, or species with many genes, is nontrivial, and few solutions exist for transparent, reproducible, and customizable analyses. We present KinFin, a streamlined software solution capable of integrating data from common file formats and delivering aggregative annotation of protein clusters. KinFin delivers analyses based on systematic taxonomy of the species analyzed, or on user-defined groupings of taxa, for example, sets based on attributes such as life history traits, organismal phenotypes, or competing phylogenetic hypotheses. Results are reported through graphical and detailed text output files. We illustrate the utility of the KinFin pipeline by addressing questions regarding the biology of filarial nematodes, which include parasites of veterinary and medical importance. We resolve the phylogenetic relationships between the species and explore functional annotation of proteins in clusters in key lineages and between custom taxon sets, identifying gene families of interest. KinFin can easily be integrated into existing comparative genomic workflows, and promotes transparent and reproducible analysis of clustered protein data. Copyright © 2017 Laetsch and Blaxter.

  8. Novel gastric helicobacters and oral campylobacters are present in captive and wild cetaceans

    PubMed Central

    Goldman, Cinthia G.; Matteo, Mario J.; Loureiro, Julio D.; Almuzara, Marisa; Barberis, Claudia; Vay, Carlos; Catalano, Mariana; Heredia, Sergio Rodríguez; Mantero, Paula; Boccio, Jose R.; Zubillaga, Marcela B.; Cremaschi, Graciela A.; Solnick, Jay V.; Perez-Perez, Guillermo I.; Blaser, Martin J.

    2011-01-01

    The mammalian gastric and oral mucosa may be colonized by mixed Helicobacter and Campylobacter species, respectively, in individual animals. To better characterize the presence and distribution of Helicobacter and Campylobacter among marine mammals, we used PCR and 16S rDNA sequence analysis to examine gastric and oral samples from ten dolphins (Tursiops gephyreus), one killer whale (Orcinus orca), one false killer whale (Pseudorca crassidens), and three wild La Plata river dolphins (Pontoporia blainvillei). Helicobacter spp. DNA was widely distributed in gastric and oral samples from both captive and wild cetaceans. Phylogenetic analysis demonstrated two Helicobacter sequence clusters, one closely related to H. cetorum, a species isolated from dolphins and whales in North America. The second cluster was related to sequences obtained from dolphins in Australia and to gastric non-Helicobacter pylori helicobacters, and may represent a novel taxonomic group. Dental plaque sequences from four dolphins formed a third cluster within the Campylobacter genus that likely represents a novel species isolated from marine mammals. Identification of identical Helicobacter spp. DNA sequences from dental plaque, saliva and gastric fluids from the same hosts suggests that the oral cavity may be involved in transmission. These results demonstrate that Helicobacter and Campylobacter species are commonly distributed in marine mammals, and identify taxonomic clusters that may represent novel species. PMID:21592686

  9. DMRT gene cluster analysis in the platypus: new insights into genomic organization and regulatory regions.

    PubMed

    El-Mogharbel, Nisrine; Wakefield, Matthew; Deakin, Janine E; Tsend-Ayush, Enkhjargal; Grützner, Frank; Alsop, Amber; Ezaz, Tariq; Marshall Graves, Jennifer A

    2007-01-01

    We isolated and characterized a cluster of platypus DMRT genes and compared their arrangement, location, and sequence across vertebrates. The DMRT gene cluster on human 9p24.3 harbors, in order, DMRT1, DMRT3, and DMRT2, which share a DM domain. DMRT1 is highly conserved and involved in sexual development in vertebrates, and deletions in this region cause sex reversal in humans. Sequence comparisons of DMRT genes between species have been valuable in identifying exons, control regions, and conserved nongenic regions (CNGs). The addition of platypus sequences is expected to be particularly valuable, since monotremes fill a gap in the vertebrate genome coverage. We therefore isolated and fully sequenced platypus BAC clones containing DMRT3 and DMRT2 as well as DMRT1 and then generated multispecies alignments and ran prediction programs followed by experimental verification to annotate this gene cluster. We found that the three genes have 58-66% identity to their human orthologues, lie in the same order as in other vertebrates, and colocate on 1 of the 10 platypus sex chromosomes, X5. We also predict that optimal annotation of the newly sequenced platypus genome will be challenging. The analysis of platypus sequence revealed differences in structure and sequence of the DMRT gene cluster. Multispecies comparison was particularly effective for detecting CNGs, revealing several novel potential regulatory regions within DMRT3 and DMRT2 as well as DMRT1. RT-PCR indicated that platypus DMRT1 and DMRT3 are expressed specifically in the adult testis (and not ovary), but DMRT2 has a wider expression profile, as it does for other mammals. The platypus DMRT1 expression pattern, and its location on an X chromosome, suggests an involvement in monotreme sexual development.

  10. fluff: exploratory analysis and visualization of high-throughput sequencing data

    PubMed Central

    Georgiou, Georgios

    2016-01-01

    Summary. In this article we describe fluff, a software package that allows for simple exploration, clustering and visualization of high-throughput sequencing data mapped to a reference genome. The package contains three command-line tools to generate publication-quality figures in an uncomplicated manner using sensible defaults. Genome-wide data can be aggregated, clustered and visualized in a heatmap, according to different clustering methods. This includes a predefined setting to identify dynamic clusters between different conditions or developmental stages. Alternatively, clustered data can be visualized in a bandplot. Finally, fluff includes a tool to generate genomic profiles. As command-line tools, the fluff programs can easily be integrated into standard analysis pipelines. The installation is straightforward and documentation is available at http://fluff.readthedocs.org. Availability. fluff is implemented in Python and runs on Linux. The source code is freely available for download at https://github.com/simonvh/fluff. PMID:27547532

  11. Distribution and Genetic Diversity of Bacteriocin Gene Clusters in Rumen Microbial Genomes.

    PubMed

    Azevedo, Analice C; Bento, Cláudia B P; Ruiz, Jeronimo C; Queiroz, Marisa V; Mantovani, Hilário C

    2015-10-01

    Some species of ruminal bacteria are known to produce antimicrobial peptides, but the screening procedures have mostly been based on in vitro assays using standardized methods. Recent sequencing efforts have made available the genome sequences of hundreds of ruminal microorganisms. In this work, we performed genome mining of the complete and partial genome sequences of 224 ruminal bacteria and 5 ruminal archaea to determine the distribution and diversity of bacteriocin gene clusters. A total of 46 bacteriocin gene clusters were identified in 33 strains of ruminal bacteria. Twenty gene clusters were related to lanthipeptide biosynthesis, while 11 gene clusters were associated with sactipeptide production, 7 gene clusters were associated with class II bacteriocin production, and 8 gene clusters were associated with class III bacteriocin production. The frequency of strains whose genomes encode putative antimicrobial peptide precursors was 14.4%. Clusters related to the production of sactipeptides were identified for the first time among ruminal bacteria. BLAST analysis indicated that the majority of the gene clusters (88%) encoding putative lanthipeptides contained all the essential genes required for lanthipeptide biosynthesis. Most strains of Streptococcus (66.6%) harbored complete lanthipeptide gene clusters, in addition to an open reading frame encoding a putative class II bacteriocin. Albusin B-like proteins were found in 100% of the Ruminococcus albus strains screened in this study. The in silico analysis provided evidence of novel biosynthetic gene clusters in bacterial species not previously related to bacteriocin production, suggesting that the rumen microbiota represents an underexplored source of antimicrobial peptides. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  12. Assembly and features of secondary metabolite biosynthetic gene clusters in Streptomyces ansochromogenes.

    PubMed

    Zhong, Xingyu; Tian, Yuqing; Niu, Guoqing; Tan, Huarong

    2013-07-01

    A draft genome sequence of Streptomyces ansochromogenes 7100 was generated using 454 sequencing technology. In combination with local BLAST searches and gap filling techniques, a comprehensive antiSMASH-based method was adopted to assemble the secondary metabolite biosynthetic gene clusters in the draft genome of S. ansochromogenes. A total of at least 35 putative gene clusters were identified and assembled. Transcriptional analysis showed that 20 of the 35 gene clusters were expressed in either or all of the three different media tested, whereas the other 15 gene clusters were silent in all three different media. This study provides a comprehensive method to identify and assemble secondary metabolite biosynthetic gene clusters in draft genomes of Streptomyces, and will significantly promote functional studies of these secondary metabolite biosynthetic gene clusters.

  13. An approach to functionally relevant clustering of the protein universe: Active site profile‐based clustering of protein structures and sequences

    PubMed Central

    Knutson, Stacy T.; Westwood, Brian M.; Leuthaeuser, Janelle B.; Turner, Brandon E.; Nguyendac, Don; Shea, Gabrielle; Kumar, Kiran; Hayden, Julia D.; Harper, Angela F.; Brown, Shoshana D.; Morris, John H.; Ferrin, Thomas E.; Babbitt, Patricia C.

    2017-01-01

    Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow mechanistic determinant identification (amino acids that distinguish details between functional families within a superfamily). Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two‐Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure‐Function Linkage Database, SFLD) self‐identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self‐identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well‐curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP‐identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F‐measure and performance analysis on the enolase search results and comparison to GEMMA and SCI‐PHY demonstrate that TuLIP avoids the over‐division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results. PMID:28054422

  14. An approach to functionally relevant clustering of the protein universe: Active site profile-based clustering of protein structures and sequences.

    PubMed

    Knutson, Stacy T; Westwood, Brian M; Leuthaeuser, Janelle B; Turner, Brandon E; Nguyendac, Don; Shea, Gabrielle; Kumar, Kiran; Hayden, Julia D; Harper, Angela F; Brown, Shoshana D; Morris, John H; Ferrin, Thomas E; Babbitt, Patricia C; Fetrow, Jacquelyn S

    2017-04-01

    Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow identification of mechanistic determinants: the amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two-Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure-Function Linkage Database, SFLD) self-identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self-identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well-curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP-identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F-measure and performance analysis on the enolase search results and comparison to GEMMA and SCI-PHY demonstrate that TuLIP avoids the over-division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results. © 2017 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
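
    The divisive loop described in this abstract can be illustrated with a minimal sketch. It is not the TuLIP implementation: the names divisive_clustering, split_at_largest_gap and toy_self_identifies are hypothetical, the FEATURE values are invented toy numbers, and the self-identification test is a stand-in for a DASP2 search, which is not reproduced here. The sketch only shows the control flow: split a candidate cluster, accept it if it self-identifies, and keep dividing until every member is in an accepted group or is a singlet.

```python
from typing import Callable, List, Sequence, Tuple

Cluster = Tuple[str, ...]

# Hypothetical 1-D feature per structure (toy values, not real data).
FEATURE = {"A1": 0.10, "A2": 0.12, "A3": 0.15, "B1": 0.50, "B2": 0.55, "C1": 0.95}


def split_at_largest_gap(cluster: Cluster) -> List[Cluster]:
    """Toy splitter: cut the cluster in two at the widest gap in FEATURE."""
    ordered = sorted(cluster, key=FEATURE.__getitem__)
    gaps = [FEATURE[b] - FEATURE[a] for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    return [tuple(ordered[:cut]), tuple(ordered[cut:])]


def toy_self_identifies(cluster: Cluster) -> bool:
    """Stand-in for a DASP2 self-identification check (assumption):
    here a cluster 'self-identifies' if all names share a family letter."""
    return len({name[0] for name in cluster}) == 1


def divisive_clustering(
    members: Sequence[str],
    split: Callable[[Cluster], List[Cluster]],
    self_identifies: Callable[[Cluster], bool],
) -> Tuple[List[Cluster], List[str]]:
    """Divide until each member is in an accepted group or is a singlet."""
    accepted: List[Cluster] = []
    singlets: List[str] = []
    pending: List[Cluster] = [tuple(members)]
    while pending:
        cluster = pending.pop()
        if len(cluster) == 1:
            singlets.append(cluster[0])       # cannot be divided further
        elif self_identifies(cluster):
            accepted.append(cluster)          # functionally relevant group
        else:
            pending.extend(split(cluster))    # divide and re-evaluate
    return accepted, singlets


groups, singlets = divisive_clustering(
    list(FEATURE), split_at_largest_gap, toy_self_identifies
)
```

    In this toy run the mixed starting cluster is divided twice, leaving two accepted groups and one singlet. Passing the splitter and the acceptance test as functions keeps the loop independent of how candidate clusters are actually produced and evaluated.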

  15. Genome Sequences of Three Cluster AU Arthrobacter Phages, Caterpillar, Nightmare, and Teacup

    PubMed Central

    Adair, Tamarah L.; Stowe, Emily; Pizzorno, Marie C.; Krukonis, Gregory; Harrison, Melinda; Garlena, Rebecca A.; Russell, Daniel A.; Jacobs-Sera, Deborah

    2017-01-01

    Caterpillar, Nightmare, and Teacup are cluster AU siphoviral phages isolated from enriched soil on Arthrobacter sp. strain ATCC 21022. These genomes are 58 kbp long with an average G+C content of 50%. Sequence analysis predicts 86 to 92 protein-coding genes, including a large number of small proteins with predicted transmembrane domains. PMID:29122860

  16. Identification of nitrogen-fixing genes and gene clusters from metagenomic library of acid mine drainage.

    PubMed

    Dai, Zhimin; Guo, Xue; Yin, Huaqun; Liang, Yili; Cong, Jing; Liu, Xueduan

    2014-01-01

    Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms and little is known about the diversity of nitrogen-fixing genes and structure of nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses revealed that 742 sequences were identified as nif genes including structural subunit genes nifH, nifD, nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms is much higher than previously thought in the AMD community. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster had significant differences from H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. nifQ gene falls into the same transcription unit with fixABCX genes, which have not been reported in other diazotrophs before. All of these results indicated that more novel diazotrophs survive in the AMD community.

  17. Identification of Nitrogen-Fixing Genes and Gene Clusters from Metagenomic Library of Acid Mine Drainage

    PubMed Central

    Yin, Huaqun; Liang, Yili; Cong, Jing; Liu, Xueduan

    2014-01-01

    Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms and little is known about the diversity of nitrogen-fixing genes and structure of nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses revealed that 742 sequences were identified as nif genes including structural subunit genes nifH, nifD, nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms is much higher than previously thought in the AMD community. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster had significant differences from H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. nifQ gene falls into the same transcription unit with fixABCX genes, which have not been reported in other diazotrophs before. All of these results indicated that more novel diazotrophs survive in the AMD community. PMID:24498417

  18. antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences.

    PubMed

    Medema, Marnix H; Blin, Kai; Cimermancic, Peter; de Jager, Victor; Zakrzewski, Piotr; Fischbach, Michael A; Weber, Tilmann; Takano, Eriko; Breitling, Rainer

    2011-07-01

    Bacterial and fungal secondary metabolism is a rich source of novel bioactive compounds with potential pharmaceutical applications as antibiotics, anti-tumor drugs or cholesterol-lowering drugs. To find new drug candidates, microbiologists are increasingly relying on sequencing genomes of a wide variety of microbes. However, rapidly and reliably pinpointing all the potential gene clusters for secondary metabolites in dozens of newly sequenced genomes has been extremely challenging, due to their biochemical heterogeneity, the presence of unknown enzymes and the dispersed nature of the necessary specialized bioinformatics tools and resources. Here, we present antiSMASH (antibiotics & Secondary Metabolite Analysis Shell), the first comprehensive pipeline capable of identifying biosynthetic loci covering the whole range of known secondary metabolite compound classes (polyketides, non-ribosomal peptides, terpenes, aminoglycosides, aminocoumarins, indolocarbazoles, lantibiotics, bacteriocins, nucleosides, beta-lactams, butyrolactones, siderophores, melanins and others). It aligns the identified regions at the gene cluster level to their nearest relatives from a database containing all other known gene clusters, and integrates or cross-links all previously available secondary-metabolite specific gene analysis methods in one interactive view. antiSMASH is available at http://antismash.secondarymetabolites.org.

  19. T cell receptor repertoires of mice and humans are clustered in similarity networks around conserved public CDR3 sequences

    PubMed Central

    Madi, Asaf; Poran, Asaf; Shifrut, Eric; Reich-Zeliger, Shlomit; Greenstein, Erez; Zaretsky, Irena; Arnon, Tomer; Laethem, Francois Van; Singer, Alfred; Lu, Jinghua; Sun, Peter D; Cohen, Irun R; Friedman, Nir

    2017-01-01

    Diversity of T cell receptor (TCR) repertoires, generated by somatic DNA rearrangements, is central to immune system function. However, the level of sequence similarity of TCR repertoires within and between species has not been characterized. Using network analysis of high-throughput TCR sequencing data, we found that abundant CDR3-TCRβ sequences were clustered within networks generated by sequence similarity. We discovered a substantial number of public CDR3-TCRβ segments that were identical in mice and humans. These conserved public sequences were central within TCR sequence-similarity networks. Annotated TCR sequences, previously associated with self-specificities such as autoimmunity and cancer, were linked to network clusters. Mechanistically, CDR3 networks were promoted by MHC-mediated selection, and were reduced following immunization, immune checkpoint blockade or aging. Our findings provide a new view of T cell repertoire organization and physiology, and suggest that the immune system distributes its TCR sequences unevenly, attending to specific foci of reactivity. DOI: http://dx.doi.org/10.7554/eLife.22057.001 PMID:28731407

  20. Whole genome sequence phylogenetic analysis of four Mexican rabies viruses isolated from cattle.

    PubMed

    Bárcenas-Reyes, I; Loza-Rubio, E; Cantó-Alarcón, G J; Luna-Cozar, J; Enríquez-Vázquez, A; Barrón-Rodríguez, R J; Milián-Suazo, F

    2017-08-01

    Phylogenetic analysis of the rabies virus in molecular epidemiology has been traditionally performed on partial sequences of the genome, such as the N, G, and P genes; however, that approach raises concerns about the discriminatory power compared to whole genome sequencing. In this study we characterized four strains of the rabies virus isolated from cattle in Querétaro, Mexico by comparing the whole genome sequence to that of strains from the American, European and Asian continents. Four cattle brain samples positive to rabies and characterized as AgV11, genotype 1, were used in the study. A cDNA sequence was generated by reverse transcription PCR (RT-PCR) using oligo dT. cDNA samples were sequenced in an Illumina NextSeq 500 platform. The phylogenetic analysis was performed with MEGA 6.0. Minimum evolution phylogenetic trees were constructed with the Neighbor-Joining method and bootstrapped with 1000 replicates. Three large and seven small clusters were formed with the 26 sequences used. The largest cluster grouped strains from different species in South America: Brazil, and the French Guyana. The second cluster grouped five strains from Mexico. A Mexican strain reported in a different study was highly related to our four strains, suggesting common source of infection. The phylogenetic analysis shows that the type of host is different for the different regions in the American Continent; rabies is more related to bats. It was concluded that the rabies virus in central Mexico is genetically stable and that it is transmitted by the vampire bat Desmodus rotundus. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. A Cyber-Attack Detection Model Based on Multivariate Analyses

    NASA Astrophysics Data System (ADS)

    Sakai, Yuto; Rinsaka, Koichiro; Dohi, Tadashi

    In the present paper, we propose a novel cyber-attack detection model that applies two multivariate-analysis methods to the audit data observed on a host machine. The statistical techniques used here are the well-known Hayashi's quantification method IV and the cluster analysis method. We quantify the observed qualitative audit event sequences via quantification method IV, and collect similar audit event sequences into the same groups based on the cluster analysis. Simulation experiments show that our model can improve cyber-attack detection accuracy in some realistic cases where normal and attack activities are intermingled.

  2. OrthoVenn: a web server for genome wide comparison and annotation of orthologous clusters across multiple species.

    PubMed

    Wang, Yi; Coleman-Derr, Devin; Chen, Guoping; Gu, Yong Q

    2015-07-01

    Genome wide analysis of orthologous clusters is an important component of comparative genomics studies. Identifying the overlap among orthologous clusters can enable us to elucidate the function and evolution of proteins across multiple species. Here, we report a web platform named OrthoVenn that is useful for genome wide comparisons and visualization of orthologous clusters. OrthoVenn provides coverage of vertebrates, metazoa, protists, fungi, plants and bacteria for the comparison of orthologous clusters and also supports uploading of customized protein sequences from user-defined species. An interactive Venn diagram, summary counts, and functional summaries of the disjunction and intersection of clusters shared between species are displayed as part of the OrthoVenn result. OrthoVenn also includes in-depth views of the clusters using various sequence analysis tools. Furthermore, OrthoVenn identifies orthologous clusters of single copy genes and allows for a customized search of clusters of specific genes through key words or BLAST. OrthoVenn is an efficient and user-friendly web server freely accessible at http://probes.pw.usda.gov/OrthoVenn or http://aegilops.wheat.ucdavis.edu/OrthoVenn. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. Population and genomic analysis of the genus Halorubrum

    PubMed Central

    Fullmer, Matthew S.; Soucy, Shannon M.; Swithers, Kristen S.; Makkay, Andrea M.; Wheeler, Ryan; Ventosa, Antonio; Gogarten, J. Peter; Papke, R. Thane

    2014-01-01

    The Halobacteria are known to engage in frequent gene transfer and homologous recombination. For stably diverged lineages to persist some checks on the rate of between lineage recombination must exist. We surveyed a group of isolates from the Aran-Bidgol endorheic lake in Iran and sequenced a selection of them. Multilocus Sequence Analysis (MLSA) and Average Nucleotide Identity (ANI) revealed multiple clusters (phylogroups) of organisms present in the lake. Patterns of intein and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) presence/absence and their sequence similarity, GC usage along with the ANI and the identities of the genes used in the MLSA revealed that two of these clusters share an exchange bias toward others in their phylogroup while showing reduced rates of exchange with other organisms in the environment. However, a third cluster, composed in part of named species from other areas of central Asia, displayed many indications of variability in exchange partners, from within the lake as well as outside the lake. We conclude that barriers to gene exchange exist between the two purely Aran-Bidgol phylogroups, and that the third cluster with members from other regions is not a single population and likely reflects an amalgamation of several populations. PMID:24782836

  4. CRAWview: for viewing splicing variation, gene families, and polymorphism in clusters of ESTs and full-length sequences.

    PubMed

    Chou, A; Burke, J

    1999-05-01

    DNA sequence clustering has become a valuable method in support of gene discovery and gene expression analysis. Our interest lies in leveraging the sequence diversity within clusters of expressed sequence tags (ESTs) to model gene structure for the study of gene variants that arise from, among other things, alternative mRNA splicing, polymorphism, and divergence after gene duplication, fusion, and translocation events. In previous work, CRAW was developed to discover gene variants from assembled clusters of ESTs. Most importantly, novel gene features (the differing units between gene variants, for example alternative exons, polymorphisms, transposable elements, etc.) that are specialized to tissue, disease, population, or developmental states can be identified when these tools collate DNA source information with gene variant discrimination. While the goal is complete automation of novel feature and gene variant detection, current methods are far from perfect, and hence the development of effective tools for visualization and exploratory data analysis is of paramount importance in the process of sifting through candidate genes and validating targets. We present CRAWview, a Java-based visualization extension to CRAW. Features that vary between gene forms are displayed using an automatically generated color-coded index. The reporting format of CRAWview gives a brief, high-level summary report to display overlap and divergence within clusters of sequences as well as the ability to 'drill down' and see detailed information concerning regions of interest. Additionally, the alignment viewing and editing capabilities of CRAWview make it possible to interactively correct frame-shifts and otherwise edit cluster assemblies. We have implemented CRAWview as a Java application across Windows NT/95 and UNIX platforms. A beta version of CRAWview will be freely available to academic users from Pangea Systems (http://www.pangeasystems.com).

  5. Time fluctuation analysis of forest fire sequences

    NASA Astrophysics Data System (ADS)

    Vega Orozco, Carmen D.; Kanevski, Mikhaïl; Tonini, Marj; Golay, Jean; Pereira, Mário J. G.

    2013-04-01

    Forest fires are complex events involving both space and time fluctuations. Understanding their dynamics and pattern distribution is of great importance in order to improve resource allocation and support fire management actions at local and global levels. This study aims at characterizing the temporal fluctuations of forest fire sequences observed in Portugal, which is the country that holds the largest wildfire land dataset in Europe. This research applies several exploratory data analysis measures to 302,000 forest fires that occurred from 1980 to 2007. The applied clustering measures are: Morisita clustering index, fractal and multifractal dimensions (box-counting), Ripley's K-function, Allan Factor, and variography. These algorithms enable a global time structural analysis describing the degree of clustering of a point pattern and defining whether the observed events occur randomly, in clusters or in a regular pattern. The considered methods are of general importance and can be used for other spatio-temporal events (e.g. crime, epidemiology, biodiversity, geomarketing, etc.). An important contribution of this research deals with the analysis and estimation of local measures of clustering, which help in understanding their temporal structure. Each measure is described and executed for the raw data (forest fires geo-database) and results are compared to reference patterns generated under the null hypothesis of randomness (Poisson processes) embedded in the same time period as the raw data. This comparison enables estimating the degree of deviation of the real data from a Poisson process. Generalizations to functional measures of these clustering methods, taking into account the phenomena, were also applied and adapted to detect time dependences in a measured variable (i.e. burned area). The time clustering of the raw data is compared several times with the Poisson processes at different thresholds of the measured function. The clustering measure value then depends on the threshold, which helps to understand the time pattern of the studied events. Our findings detected the presence of overdensity of events in particular time periods and showed that the forest fire sequences in Portugal can be considered as a multifractal process with a degree of time-clustering of the events.

    Key words: time sequences, Morisita index, fractals, multifractals, box-counting, Ripley's K-function, Allan Factor, variography, forest fires, point process.

    Acknowledgements: This work was partly supported by the SNFS Project No. 200021-140658, "Analysis and Modelling of Space-Time Patterns in Complex Regions".

    References:
    - Kanevski M. (Editor). 2008. Advanced Mapping of Environmental Data: Geostatistics, Machine Learning and Bayesian Maximum Entropy. London / Hoboken: iSTE / Wiley.
    - Telesca L. and Pereira M.G. 2010. Time-clustering investigation of fire temporal fluctuations in Portugal, Nat. Hazards Earth Syst. Sci., vol. 10(4): 661-666.
    - Vega Orozco C., Tonini M., Conedera M., Kanevski M. 2012. Cluster recognition in spatial-temporal sequences: the case of forest fires, Geoinformatica, vol. 16(4): 653-673.
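
    One of the measures named in this abstract, the Allan Factor, can be sketched in a few lines. The example below is illustrative only and is not the authors' code. It uses the standard definition AF(T) = E[(N_{k+1} - N_k)^2] / (2 E[N_k]), where N_k is the number of events in the k-th consecutive window of length T. The function names, the toy event times and the simulated reference are assumptions made for the example. For a homogeneous Poisson process AF is close to 1, values well above 1 indicate time-clustering, and values near 0 indicate a regular pattern.

```python
import random
from typing import List, Sequence


def allan_factor(event_times: Sequence[float], window: float,
                 t_start: float, t_end: float) -> float:
    """Allan Factor for counts in consecutive windows of length `window`."""
    n_windows = int((t_end - t_start) // window)
    if n_windows < 2:
        raise ValueError("need at least two complete windows")
    counts = [0] * n_windows
    for t in event_times:
        k = int((t - t_start) // window)
        if 0 <= k < n_windows:            # ignore events outside the windows
            counts[k] += 1
    mean_count = sum(counts) / n_windows
    if mean_count == 0:
        raise ValueError("no events inside the observation period")
    sq_diffs = [(b - a) ** 2 for a, b in zip(counts, counts[1:])]
    return (sum(sq_diffs) / len(sq_diffs)) / (2 * mean_count)


def poisson_reference(n_events: int, t_start: float, t_end: float,
                      seed: int = 0) -> List[float]:
    """Reference pattern under the null hypothesis of randomness:
    n_events independent uniform times on the same period."""
    rng = random.Random(seed)
    return sorted(rng.uniform(t_start, t_end) for _ in range(n_events))


# Toy sequences on the period [0, 100): one event per unit time (regular),
# and the same daily rate confined to every other 10-unit window (bursty).
regular = [i + 0.5 for i in range(100)]
bursty = [10 * k + 0.5 + j for k in range(0, 10, 2) for j in range(10)]

af_regular = allan_factor(regular, 10.0, 0.0, 100.0)
af_bursty = allan_factor(bursty, 10.0, 0.0, 100.0)
af_random = allan_factor(poisson_reference(len(bursty), 0.0, 100.0),
                         10.0, 0.0, 100.0)
```

    Comparing af_bursty with af_random mirrors the comparison against Poisson reference patterns described in the abstract. Repeating the calculation over a range of window lengths would show how the degree of clustering changes with time scale.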

  6. Characterization of a Major Cluster of nif, fix, and Associated Genes in a Sugarcane Endophyte, Acetobacter diazotrophicus

    PubMed Central

    Lee, Sunhee; Reth, Alexander; Meletzus, Dietmar; Sevilla, Myrna; Kennedy, Christina

    2000-01-01

    A major 30.5-kb cluster of nif and associated genes of Acetobacter diazotrophicus (syn. Gluconacetobacter diazotrophicus), a nitrogen-fixing endophyte of sugarcane, was sequenced and analyzed. This cluster represents the largest assembly of contiguous nif-fix and associated genes so far characterized in any diazotrophic bacterial species. Northern blots and promoter sequence analysis indicated that the genes are organized into eight transcriptional units. The overall arrangement of genes is most like that of the nif-fix cluster in Azospirillum brasilense, while the individual gene products are more similar to those in species of Rhizobiaceae or in Rhodobacter capsulatus. PMID:11092875

  7. Genomic analysis of coxsackieviruses A1, A19, A22, enteroviruses 113 and 104: viruses representing two clades with distinct tropism within enterovirus C

    PubMed Central

    Haq, Saddef; Sameroff, Stephen; Howie, Stephen R. C.; Lipkin, W. Ian

    2013-01-01

    Coxsackieviruses (CV) A1, CV-A19 and CV-A22 have historically comprised a distinct phylogenetic clade within Enterovirus (EV) C. Several novel serotypes that are genetically similar to these three viruses have been recently discovered and characterized. Here, we report the coding sequence analysis of two genotypes of a previously uncharacterized serotype EV-C113 from Bangladesh and demonstrate that it is most similar to CV-A22 and EV-C116 within the capsid region. We sequenced novel genotypes of CV-A1, CV-A19 and CV-A22 from Bangladesh and observed a high rate of recombination within this group. We also report genomic analysis of the rarely reported EV-C104 circulating in the Gambia in 2009. All available EV-C104 sequences displayed a high degree of similarity within the structural genes but formed two clusters within the non-structural genes. One cluster included the recently reported EV-C117, suggesting an ancestral recombination between these two serotypes. Phylogenetic analysis of all available complete genome sequences indicated the existence of two subgroups within this distinct Enterovirus C clade: one has been exclusively recovered from gastrointestinal samples, while the other cluster has been implicated in respiratory disease. PMID:23761409

  8. [Study of human immunodeficiency virus transmission chains in Andalusia: analysis from baseline antiretroviral resistance sequences].

    PubMed

    Pérez-Parra, Santiago; Chueca-Porcuna, Natalia; Álvarez-Estevez, Marta; Pasquau, Juan; Omar, Mohamed; Collado, Antonio; Vinuesa, David; Lozano, Ana Belen; García-García, Federico

    2015-11-01

    Protease and reverse transcriptase HIV-1 sequences provide useful information for patient clinical management, as well as information on resistance to antiretrovirals. The aim of this study is to evaluate transmission events, transmitted drug resistance, and to georeference subtypes among newly diagnosed patients referred to our center. A study was conducted on 693 patients diagnosed between 2005 and 2012 in Southern Spain. Protease and reverse transcriptase sequences were obtained for resistance to cART analysis with Trugene(®) HIV Genotyping Kit (Siemens, NAD). MEGA 5.2, Neighbor-Joining, ArcGIS and REGA were used for subsequent analysis. The results showed 298 patients clustered into 77 different transmission events. Most of the clusters were formed by pairs (n=49), of men having sex with men (n=26), Spanish (n=37), and below 45 years of age (73.5%). Urban areas from Granada, and the coastal areas of Almeria and Granada showed the greatest subtype heterogeneity. Five clusters were formed by more than 10 patients, and 15 clusters had transmitted drug resistance. The study data demonstrate how the phylogenetic characterization of transmission clusters is a powerful tool to monitor the spread of HIV, and may contribute to design correct preventive measures to minimize it. Copyright © 2015 Elsevier España, S.L.U. y Sociedad Española de Enfermedades Infecciosas y Microbiología Clínica. All rights reserved.

  9. An efficient and scalable graph modeling approach for capturing information at different levels in next generation sequencing reads

    PubMed Central

    2013-01-01

    Background: Next generation sequencing technologies have greatly advanced many research areas of the biomedical sciences through their capability to generate massive amounts of genetic information at unprecedented rates. The advent of next generation sequencing has led to the development of numerous computational tools to analyze and assemble the millions to billions of short sequencing reads produced by these technologies. While these tools filled an important gap, current approaches for storing, processing, and analyzing short read datasets generally have remained simple and lack the complexity needed to efficiently model the produced reads and assemble them correctly. Results: Previously, we presented an overlap graph coarsening scheme for modeling read overlap relationships on multiple levels. Most current read assembly and analysis approaches use a single graph or set of clusters to represent the relationships among a read dataset. Instead, we use a series of graphs to represent the reads and their overlap relationships across a spectrum of information granularity. At each information level our algorithm is capable of generating clusters of reads from the reduced graph, forming an integrated graph modeling and clustering approach for read analysis and assembly. Previously we applied our algorithm to simulated and real 454 datasets to assess its ability to efficiently model and cluster next generation sequencing data. In this paper we extend our algorithm to large simulated and real Illumina datasets to demonstrate that our algorithm is practical for both sequencing technologies. Conclusions: Our overlap graph theoretic algorithm is able to model next generation sequencing reads at various levels of granularity through the process of graph coarsening. Additionally, our model allows for efficient representation of the read overlap relationships, is scalable for large datasets, and is practical for both Illumina and 454 sequencing technologies. PMID:24564333

  10. Glycoprotein-G-gene-based molecular and phylogenetic analysis of rabies viruses associated with a large outbreak of bovine rabies in southern Brazil.

    PubMed

    Cargnelutti, Juliana F; de Quadros, João M; Martins, Mathias; Batista, Helena B C R; Weiblen, Rudi; Flores, Eduardo F

    2017-12-01

    A large outbreak of hematophagous-bat-associated bovine rabies has been occurring in Rio Grande do Sul (RS), the southernmost Brazilian state, since 2011, with official estimates exceeding 50,000 cattle deaths. The present article describes a genetic characterization of rabies virus (RABV) recovered from 59 affected cattle and two sheep, from 56 herds in 16 municipalities (2012-2016). Molecular analysis was performed using the nucleotide (nt) and predicted amino acid (aa) sequences of RABV glycoprotein G (G). A high level of nt and aa sequence identity was observed among the examined G sequences, ranging from 98.4 to 100%, and from 97.3 to 100%, respectively. Likewise, high levels of nt and aa sequence identity were observed with bovine (nt, 99.8%; aa, 99.8%) and hematophagous bat (nt, 99.5%; aa, 99.4%) RABV sequences from GenBank, and lower levels were observed with carnivore RABV sequences (nt, 92.8%; aa, 88.1%). Some random mutations were observed in the analyzed sequences, and a few consistent mutations were observed in some sequences belonging to cluster 2, subcluster 2b. The clustering of the sequences was observed in a phylogenetic tree, where two distinct clusters were evident. Cluster 1 comprised RABV sequences covering the entire study period (2012 to 2016), but subclusters corresponding to different years could be identified, indicating virus evolution and/or introduction of new viruses into the population. In some cases, viruses from the same location obtained within a short period grouped into different subclusters, suggesting co-circulation of viruses of different origins. Subcluster segregation was also observed in sequences obtained in the same region during different periods, indicating the involvement of different viruses in the cases at different times. 
    In summary, our results indicate that the outbreaks occurring in RS (2012 to 2016) probably involved RABV of different origins, in addition to a possible evolution of RABV isolates within this period.
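
    The identity ranges quoted in this abstract (for example, 98.4 to 100% at the nucleotide level) are summaries over pairwise comparisons. The sketch below shows one common way to compute such a range from already-aligned sequences. It is not the authors' procedure: the function names and the toy sequences are invented for illustration, and the choice to skip alignment columns containing a gap is an assumption, since the abstract does not say how gaps were treated.

```python
from itertools import combinations
from typing import Dict, Tuple


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent of identical positions between two aligned sequences.
    Columns where either sequence has a gap are skipped (assumption)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def identity_range(aligned: Dict[str, str]) -> Tuple[float, float]:
    """Minimum and maximum pairwise identity over all sequence pairs."""
    values = [percent_identity(a, b)
              for a, b in combinations(aligned.values(), 2)]
    return min(values), max(values)


# Toy aligned fragments (invented; not RABV glycoprotein G data).
toy_alignment = {
    "isolate_1": "ATGGTTCCTC",
    "isolate_2": "ATGGTTCCTC",
    "isolate_3": "ATGGTACCTC",
}

lowest, highest = identity_range(toy_alignment)
```

    With real data the input would be a multiple sequence alignment of the G gene, and the same two functions would apply to predicted amino acid sequences without change.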

  11. Analysis of β-Subgroup Proteobacterial Ammonia Oxidizer Populations in Soil by Denaturing Gradient Gel Electrophoresis Analysis and Hierarchical Phylogenetic Probing

    PubMed Central

    Stephen, John R.; Kowalchuk, George A.; Bruns, Mary-Ann V.; McCaig, Allison E.; Phillips, Carol J.; Embley, T. Martin; Prosser, James I.

    1998-01-01

    A combination of denaturing gradient gel electrophoresis (DGGE) and oligonucleotide probing was used to investigate the influence of soil pH on the compositions of natural populations of autotrophic β-subgroup proteobacterial ammonia oxidizers. PCR primers specific to this group were used to amplify 16S ribosomal DNA (rDNA) from soils maintained for 36 years at a range of pH values, and PCR products were analyzed by DGGE. Genus- and cluster-specific probes were designed to bind to sequences within the region amplified by these primers. A sequence specific to all β-subgroup ammonia oxidizers could not be identified, but probes specific for Nitrosospira clusters 1 to 4 and Nitrosomonas clusters 6 and 7 (J. R. Stephen, A. E. McCaig, Z. Smith, J. I. Prosser, and T. M. Embley, Appl. Environ. Microbiol. 62:4147–4154, 1996) were designed. Elution profiles of probes against target sequences and closely related nontarget sequences indicated a requirement for high-stringency hybridization conditions to distinguish between different clusters. DGGE banding patterns suggested the presence of Nitrosomonas cluster 6a and Nitrosospira clusters 2, 3, and 4 in all soil plots, but results were ambiguous because of overlapping banding patterns. Unambiguous band identification of the same clusters was achieved by combined DGGE and probing of blots with the cluster-specific radiolabelled probes. The relative intensities of hybridization signals provided information on the apparent selection of different Nitrosospira genotypes in samples of soil of different pHs. The signal from the Nitrosospira cluster 3 probe decreased significantly, relative to an internal control probe, with decreasing soil pH in the range of 6.6 to 3.9, while Nitrosospira cluster 2 hybridization signals increased with increasing soil acidity. 
Signals from Nitrosospira cluster 4 were greatest at pH 5.5, decreasing at lower and higher values, while Nitrosomonas cluster 6a signals did not vary significantly with pH. These findings are in agreement with a previous molecular study (J. R. Stephen, A. E. McCaig, Z. Smith, J. I. Prosser, and T. M. Embley, Appl. Environ. Microbiol. 62:4147–4154, 1996) of the same sites, which demonstrated the presence of the same four clusters of ammonia oxidizers and indicated that selection might be occurring for clusters 2 and 3 at acid and neutral pHs, respectively. The two studies used different sets of PCR primers for amplification of 16S rDNA sequences from soil, and the similar findings suggest that PCR bias was unlikely to be a significant factor. The present study demonstrates the value of DGGE and probing for rapid analysis of natural soil communities of β-subgroup proteobacterial ammonia oxidizers, indicates significant pH-associated differences in Nitrosospira populations, and suggests that Nitrosospira cluster 2 may be of significance for ammonia-oxidizing activity in acid soils. PMID:9687457

  12. Hepatitis A virus genotypes and strains from an endemic area of Europe, Bulgaria 2012-2014.

    PubMed

    Bruni, Roberto; Taffon, Stefania; Equestre, Michele; Cella, Eleonora; Lo Presti, Alessandra; Costantino, Angela; Chionne, Paola; Madonna, Elisabetta; Golkocheva-Markova, Elitsa; Bankova, Diljana; Ciccozzi, Massimo; Teoharov, Pavel; Ciccaglione, Anna Rita

    2017-07-14

    Hepatitis A virus (HAV) infection is endemic in Eastern European and Balkan region countries. In 2012, Bulgaria showed the highest rate (67.13 cases per 100,000) in Europe. Nevertheless, HAV genotypes and strains circulating in this country have never been described. The present study reports the molecular characterization of HAV from 105 patients from Bulgaria. Anti-HAV IgM positive serum samples collected in 2012-2014 from different towns and villages in Bulgaria were analysed by nested RT-PCR, sequencing of the VP1/2A region and phylogenetic analysis; the results were analysed together with patient and geographical data. Phylogenetic analysis revealed two main sequence groups corresponding to the IA (78/105, 74%) and IB (27/105, 26%) sub-genotypes. In the IA group, a major and a minor cluster were observed (62 and 16 sequences, respectively). Most sequences from the major cluster (44/62, 71%) belonged to either of two strains, termed "strain 1" and "strain 2", differing only by a single specific nucleotide; the remaining sequences (18/62, 29%) showed few (1 to 4) nucleotide variations with respect to strains 1 and 2. Strain 2 is identical to the strain previously responsible for an outbreak in the Czech Republic in 2008 and a large multi-country European outbreak caused by contaminated mixed frozen berries in 2013. Most sequences of the IA minor cluster and the IB group were detected in large/medium centers (LMCs). Overall, sequences from the IA major cluster were more frequent in small centers (SCs), but strain 1 and strain 2 showed an opposite relative frequency in SCs and LMCs (strain 1 more frequent in SCs, strain 2 in LMCs). Genotype IA predominated in Bulgaria in 2012-2014 and phylogenetic analysis identified a major cluster of highly related or identical IA sequences, representing 59% of the analysed cases; these isolates were mostly detected in SCs, in which HAV shows higher endemicity than in LMCs. 
The distribution of viral sequences suggests the existence of some differences between the transmission routes in SCs and LMCs. Molecular characterization of an increased number of isolates from Bulgaria, regularly collected over time, will be useful to explore specific transmission routes and plan appropriate preventive measures.
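
The proportions quoted in the abstract above can be re-derived from the reported counts. The following is a minimal sketch of that arithmetic; the variable names and the `pct` helper are illustrative and not part of the original record.

```python
# Counts as reported in the abstract above.
total = 105
counts = {"IA": 78, "IB": 27}       # sub-genotype groups
major, minor = 62, 16               # clusters within the IA group
strains_1_2, other_major = 44, 18   # split of the IA major cluster


def pct(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Share of each sub-genotype among all 105 patients.
shares = {name: pct(n, total) for name, n in counts.items()}

# Share of the IA major cluster among all analysed cases.
major_share_of_all = pct(major, total)

# Split of the major cluster into strains 1/2 and the remaining variants.
strain_share = pct(strains_1_2, major)
other_share = pct(other_major, major)
```

Under these counts the major and minor clusters sum to the 78 IA sequences, and the major cluster alone accounts for the 59% of analysed cases mentioned at the end of the abstract.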

  13. Cluster and principal component analysis based on SSR markers of Amomum tsao-ko in Jinping County of Yunnan Province

    NASA Astrophysics Data System (ADS)

    Ma, Mengli; Lei, En; Meng, Hengling; Wang, Tiantao; Xie, Linyan; Shen, Dong; Xianwang, Zhou; Lu, Bingyue

    2017-08-01

    Amomum tsao-ko is a commercial plant that is used for various purposes in the medicinal and food industries. For the present investigation, 44 germplasm samples were collected from Jinping County of Yunnan Province. Cluster analysis and 2-dimensional principal component analysis (PCA) were used to represent the genetic relationships among Amomum tsao-ko samples using simple sequence repeat (SSR) markers. Cluster analysis clearly distinguished the sample groups. Two major clusters were formed: the first (Cluster I) consisted of 34 individuals and the second (Cluster II) consisted of 10 individuals; Cluster I, as the main group, contained multiple sub-clusters. PCA also showed 2 groups: PCA Group 1 included 29 individuals and PCA Group 2 included 12 individuals, consistent with the results of cluster analysis. The purpose of the present investigation was to provide information on the genetic relationships of Amomum tsao-ko germplasm resources in the main producing areas and to provide a theoretical basis for the protection and utilization of Amomum tsao-ko resources.

  14. Characteristics of HIV-infected U.S. Army soldiers linked in molecular transmission clusters, 2001-2012

    PubMed Central

    Jagodzinski, Linda L.; Liu, Ying; Pham, Peter T.; Kijak, Gustavo H.; Tovanabutra, Sodsai; McCutchan, Francine E.; Scoville, Stephanie L.; Cersovsky, Steven B.; Michael, Nelson L.; Scott, Paul T.; Peel, Sheila A.

    2017-01-01

    Objective Recent surveillance data suggests the United States (U.S.) Army HIV epidemic is concentrated among men who have sex with men. To identify potential targets for HIV prevention strategies, the relationship between demographic and clinical factors and membership within transmission clusters based on baseline pol sequences of HIV-infected Soldiers from 2001 through 2012 were analyzed. Methods We conducted a retrospective analysis of baseline partial pol sequences, demographic and clinical characteristics available for all Soldiers in active service and newly-diagnosed with HIV-1 infection from January 1, 2001 through December 31, 2012. HIV-1 subtype designations and transmission clusters were identified from phylogenetic analysis of sequences. Univariate and multivariate logistic regression models were used to evaluate and adjust for the association between characteristics and cluster membership. Results Among 518 of 995 HIV-infected Soldiers with available partial pol sequences, 29% were members of a transmission cluster. Assignment to a southern U.S. region at diagnosis and year of diagnosis were independently associated with cluster membership after adjustment for other significant characteristics (p<0.10) of age, race, year of diagnosis, region of duty assignment, sexually transmitted infections, last negative HIV test, antiretroviral therapy, and transmitted drug resistance. Subtyping of the pol fragment indicated HIV-1 subtype B infection predominated (94%) among HIV-infected Soldiers. Conclusion These findings identify areas to explore as HIV prevention targets in the U.S. Army. An increased frequency of current force testing may be justified, especially among Soldiers assigned to duty in installations with high local HIV prevalence such as southern U.S. states. PMID:28759645

  15. Organization of nif gene cluster in Frankia sp. EuIK1 strain, a symbiont of Elaeagnus umbellata.

    PubMed

    Oh, Chang Jae; Kim, Ho Bang; Kim, Jitae; Kim, Won Jin; Lee, Hyoungseok; An, Chung Sun

    2012-01-01

    The nucleotide sequence of a 20.5-kb genomic region harboring nif genes was determined and analyzed. The fragment was obtained from Frankia sp. EuIK1 strain, an indigenous symbiont of Elaeagnus umbellata. A total of 20 ORFs including 12 nif genes were identified and subjected to comparative analysis with the genome sequences of 3 Frankia strains representing diverse host plant specificities. The nucleotide and deduced amino acid sequences showed highest levels of identity with orthologous genes from an Elaeagnus-infecting strain. The gene organization patterns around the nif gene clusters were well conserved among all 4 Frankia strains. However, characteristic features appeared in the location of the nifV gene for each Frankia strain, depending on the type of host plant. Sequence analysis was performed to determine the transcription units and suggested that there could be an independent operon starting from the nifW gene in the EuIK strain. Considering the organization patterns and their total extensions on the genome, we propose that the nif gene clusters remained stable despite genetic variations occurring in the Frankia genomes.

  16. The smart cluster method. Adaptive earthquake cluster identification and analysis in strong seismic regions

    NASA Astrophysics Data System (ADS)

    Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann

    2017-07-01

    Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with M_min = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics led to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.

  17. A Reference Viral Database (RVDB) To Enhance Bioinformatics Analysis of High-Throughput Sequencing for Novel Virus Detection

    PubMed Central

    Goodacre, Norman; Aljanahi, Aisha; Nandakumar, Subhiksha; Mikailov, Mike

    2018-01-01

    ABSTRACT Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because of the lack of a public database containing all viral sequences, without abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were used to select viral sequences from GenBank, resulting in a first-generation viral database (VDB). This database was manually and computationally reviewed, resulting in refined, semantic selection criteria (SEM-R), which were applied to a new download of updated GenBank sequences to create a second-generation VDB. Viral entries in the latter were clustered at 98% by CD-HIT-EST to reduce redundancy while retaining high viral sequence diversity. The viral identity of the clustered representative sequences (creps) was confirmed by BLAST searches in NCBI databases and HMMER searches in PFAM and DFAM databases. The resulting RVDB contained a broad representation of viral families, sequence diversity, and a reduced cellular content; it includes full-length and partial sequences and endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Testing of RVDBv10.2 with an in-house HTS transcriptomic data set indicated a significantly faster run for virus detection than interrogating the entirety of the NCBI nonredundant nucleotide database, which contains all viral sequences but also nonviral sequences. RVDB is publicly available for facilitating HTS analysis, particularly for novel virus detection. It is meant to be updated on a regular basis to include new viral sequences added to GenBank. 
IMPORTANCE To facilitate bioinformatics analysis of high-throughput sequencing (HTS) data for the detection of both known and novel viruses, we have developed a new reference viral database (RVDB) that provides a broad representation of different virus species from eukaryotes by including all viral, virus-like, and virus-related sequences (excluding bacteriophages), regardless of their size. In particular, RVDB contains endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Sequences were clustered to reduce redundancy while retaining high viral sequence diversity. A particularly useful feature of RVDB is the reduction of cellular sequences, which can enhance the run efficiency of large transcriptomic and genomic data analysis and increase the specificity of virus detection. PMID:29564396
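
The redundancy-reduction step described above (clustering at 98% identity and keeping one representative per cluster) can be illustrated with a toy greedy clustering. This is only a sketch of the general idea, not the CD-HIT-EST implementation: the `identity` function is a hypothetical, ungapped, left-aligned comparison chosen for brevity, whereas the real tool uses word filtering and alignment.

```python
def identity(a: str, b: str) -> float:
    """Toy identity: matching positions over the shorter length.

    Assumption for illustration only: sequences are compared ungapped
    and left-aligned, which is much cruder than a real alignment.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def greedy_cluster(seqs, threshold=0.98):
    """Longest-first greedy clustering.

    Each sequence joins the first representative it matches at or above
    the threshold; otherwise it becomes a new representative.
    Returns {representative: [members, including the representative]}.
    """
    clusters = {}
    for seq in sorted(seqs, key=len, reverse=True):
        for rep in clusters:
            if identity(rep, seq) >= threshold:
                clusters[rep].append(seq)
                break
        else:
            clusters[seq] = [seq]
    return clusters


# Three hypothetical 100-nt sequences: the second differs from the first
# at one position (99% identity), the third at fifty positions.
seqs = ["A" * 100, "A" * 99 + "C", "A" * 50 + "C" * 50]
clusters = greedy_cluster(seqs)
representatives = list(clusters)
```

In this example the first two sequences collapse into one cluster and the third becomes its own representative, so the non-redundant set holds two sequences.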

  18. A Reference Viral Database (RVDB) To Enhance Bioinformatics Analysis of High-Throughput Sequencing for Novel Virus Detection.

    PubMed

    Goodacre, Norman; Aljanahi, Aisha; Nandakumar, Subhiksha; Mikailov, Mike; Khan, Arifa S

    2018-01-01

    Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because of the lack of a public database containing all viral sequences, without abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were used to select viral sequences from GenBank, resulting in a first-generation viral database (VDB). This database was manually and computationally reviewed, resulting in refined, semantic selection criteria (SEM-R), which were applied to a new download of updated GenBank sequences to create a second-generation VDB. Viral entries in the latter were clustered at 98% by CD-HIT-EST to reduce redundancy while retaining high viral sequence diversity. The viral identity of the clustered representative sequences (creps) was confirmed by BLAST searches in NCBI databases and HMMER searches in PFAM and DFAM databases. The resulting RVDB contained a broad representation of viral families, sequence diversity, and a reduced cellular content; it includes full-length and partial sequences and endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Testing of RVDBv10.2 with an in-house HTS transcriptomic data set indicated a significantly faster run for virus detection than interrogating the entirety of the NCBI nonredundant nucleotide database, which contains all viral sequences but also nonviral sequences. RVDB is publicly available for facilitating HTS analysis, particularly for novel virus detection. It is meant to be updated on a regular basis to include new viral sequences added to GenBank. 
IMPORTANCE To facilitate bioinformatics analysis of high-throughput sequencing (HTS) data for the detection of both known and novel viruses, we have developed a new reference viral database (RVDB) that provides a broad representation of different virus species from eukaryotes by including all viral, virus-like, and virus-related sequences (excluding bacteriophages), regardless of their size. In particular, RVDB contains endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Sequences were clustered to reduce redundancy while retaining high viral sequence diversity. A particularly useful feature of RVDB is the reduction of cellular sequences, which can enhance the run efficiency of large transcriptomic and genomic data analysis and increase the specificity of virus detection.

  19. The complete sequence of Cymbidium mosaic virus from Vanilla fragrans in Hainan, China.

    PubMed

    He, Zhen; Jiang, Dongmei; Liu, Aiqin; Sang, Liwei; Li, Wenfeng; Li, Shifang

    2011-06-01

    The complete nucleotide sequence of Cymbidium mosaic virus (CymMV) isolated from vanilla in Hainan province, China was determined for the first time. It comprised 6,224 nucleotides; sequence analysis suggested that the isolate we obtained was a member of the genus Potexvirus, and its sequence shared 86.67-96.61% identities with previously reported sequences. Phylogenetic analysis suggested that CymMV from vanilla fragrans was clustered into subgroup A and the isolates in this subgroup displayed little regional difference.

  20. Identification and characterization of earthquake clusters: a comparative analysis for selected sequences in Italy

    NASA Astrophysics Data System (ADS)

    Peresan, Antonella; Gentili, Stefania

    2017-04-01

    Identification and statistical characterization of seismic clusters may provide useful insights about the features of seismic energy release and their relation to physical properties of the crust within a given region. Moreover, a number of studies based on spatio-temporal analysis of main-shocks occurrence require preliminary declustering of the earthquake catalogs. Since various methods, relying on different physical/statistical assumptions, may lead to diverse classifications of earthquakes into main events and related events, we aim to investigate the classification differences among different declustering techniques. Accordingly, a formal selection and comparative analysis of earthquake clusters is carried out for the most relevant earthquakes in North-Eastern Italy, as reported in the local OGS-CRS bulletins, compiled at the National Institute of Oceanography and Experimental Geophysics since 1977. The comparison is then extended to selected earthquake sequences associated with a different seismotectonic setting, namely to events that occurred in the region struck by the recent Central Italy destructive earthquakes, making use of INGV data. Various techniques, ranging from classical space-time windows methods to ad hoc manual identification of aftershocks, are applied for detection of earthquake clusters. In particular, a statistical method based on nearest-neighbor distances of events in space-time-energy domain, is considered. Results from clusters identification by the nearest-neighbor method turn out quite robust with respect to the time span of the input catalogue, as well as to minimum magnitude cutoff. The identified clusters for the largest events reported in North-Eastern Italy since 1977 are well consistent with those reported in earlier studies, which were aimed at detailed manual aftershocks identification. 
The study shows that the data-driven approach, based on the nearest-neighbor distances, can be satisfactorily applied to decompose the seismic catalog into background seismicity and individual sequences of earthquake clusters, also in areas characterized by moderate seismic activity, where the standard declustering techniques may turn out rather gross approximations. With these results acquired, the main statistical features of seismic clusters are explored, including complex interdependence of related events, with the aim to characterize the space-time patterns of earthquakes occurrence in North-Eastern Italy and capture their basic differences with Central Italy sequences.
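
The abstract above does not state the nearest-neighbor distance it uses. As a hedged illustration, the sketch below assumes the commonly cited space-time-magnitude proximity, eta = dt * r^d * 10^(-b * m_parent), defined only for earlier events as candidate parents. The parameter values (`b=1.0`, `d=1.6`), the planar kilometre coordinates, and the function name are assumptions for the example, not values taken from the study.

```python
import math


def nn_parents(events, b=1.0, d=1.6):
    """For each event, find the earlier event with the smallest proximity.

    events: list of (time, x_km, y_km, magnitude) tuples.
    Returns a list of (parent_index or None, eta) in input order.
    Assumed proximity: eta = dt * r**d * 10**(-b * m_parent).
    """
    result = []
    for j, (tj, xj, yj, _mj) in enumerate(events):
        best_i, best_eta = None, math.inf
        for i, (ti, xi, yi, mi) in enumerate(events):
            dt = tj - ti
            if dt <= 0:
                continue  # only strictly earlier events can be parents
            r = math.hypot(xj - xi, yj - yi)
            eta = dt * (r ** d) * 10 ** (-b * mi)
            if eta < best_eta:
                best_i, best_eta = i, eta
        result.append((best_i, best_eta))
    return result


# Hypothetical catalogue: one M6.0 event followed by three small events.
events = [
    (0.0, 0.0, 0.0, 6.0),
    (1.0, 1.0, 0.0, 3.0),
    (1.5, 100.0, 0.0, 3.0),
    (2.0, 1.5, 0.0, 2.5),
]
links = nn_parents(events)
parents = [p for p, _ in links]
```

In a declustering workflow, a threshold on eta would then separate weakly linked (background) events from strongly linked (clustered) ones; the choice of that threshold is not shown here. In this toy catalogue the distant third event still links to the large first event, but with a proximity value roughly three orders of magnitude larger than the nearby events.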

  1. A guide to enterotypes across the human body: meta-analysis of microbial community structures in human microbiome datasets.

    PubMed

    Koren, Omry; Knights, Dan; Gonzalez, Antonio; Waldron, Levi; Segata, Nicola; Knight, Rob; Huttenhower, Curtis; Ley, Ruth E

    2013-01-01

    Recent analyses of human-associated bacterial diversity have categorized individuals into 'enterotypes' or clusters based on the abundances of key bacterial genera in the gut microbiota. There is a lack of consensus, however, on the analytical basis for enterotypes and on the interpretation of these results. We tested how the following factors influenced the detection of enterotypes: clustering methodology, distance metrics, OTU-picking approaches, sequencing depth, data type (whole genome shotgun (WGS) vs. 16S rRNA gene sequence data), and 16S rRNA region. We included 16S rRNA gene sequences from the Human Microbiome Project (HMP) and from 16 additional studies and WGS sequences from the HMP and MetaHIT. In most body sites, we observed smooth abundance gradients of key genera without discrete clustering of samples. Some body habitats displayed bimodal (e.g., gut) or multimodal (e.g., vagina) distributions of sample abundances, but not all clustering methods and workflows accurately highlight such clusters. Because identifying enterotypes in datasets depends not only on the structure of the data but is also sensitive to the methods applied to identifying clustering strength, we recommend that multiple approaches be used and compared when testing for enterotypes.

  2. A Guide to Enterotypes across the Human Body: Meta-Analysis of Microbial Community Structures in Human Microbiome Datasets

    PubMed Central

    Waldron, Levi; Segata, Nicola; Knight, Rob; Huttenhower, Curtis; Ley, Ruth E.

    2013-01-01

    Recent analyses of human-associated bacterial diversity have categorized individuals into ‘enterotypes’ or clusters based on the abundances of key bacterial genera in the gut microbiota. There is a lack of consensus, however, on the analytical basis for enterotypes and on the interpretation of these results. We tested how the following factors influenced the detection of enterotypes: clustering methodology, distance metrics, OTU-picking approaches, sequencing depth, data type (whole genome shotgun (WGS) vs. 16S rRNA gene sequence data), and 16S rRNA region. We included 16S rRNA gene sequences from the Human Microbiome Project (HMP) and from 16 additional studies and WGS sequences from the HMP and MetaHIT. In most body sites, we observed smooth abundance gradients of key genera without discrete clustering of samples. Some body habitats displayed bimodal (e.g., gut) or multimodal (e.g., vagina) distributions of sample abundances, but not all clustering methods and workflows accurately highlight such clusters. Because identifying enterotypes in datasets depends not only on the structure of the data but is also sensitive to the methods applied to identifying clustering strength, we recommend that multiple approaches be used and compared when testing for enterotypes. PMID:23326225

  3. Tardigrade workbench: comparing stress-related proteins, sequence-similar and functional protein clusters as well as RNA elements in tardigrades

    PubMed Central

    2009-01-01

    Background Tardigrades represent an animal phylum with extraordinary resistance to environmental stress. Results To gain insights into their stress-specific adaptation potential, major clusters of related and similar proteins are identified, as well as specific functional clusters delineated comparing all tardigrades and individual species (Milnesium tardigradum, Hypsibius dujardini, Echiniscus testudo, Tulinus stephaniae, Richtersius coronifer) and functional elements in tardigrade mRNAs are analysed. We find that 39.3% of the total sequences clustered in 58 clusters of more than 20 proteins. Among these are ten tardigrade specific as well as a number of stress-specific protein clusters. Tardigrade-specific functional adaptations include strong protein, DNA- and redox protection, maintenance and protein recycling. Specific regulatory elements regulate tardigrade mRNA stability such as lox P DICE elements whereas 14 other RNA elements of higher eukaryotes are not found. Further features of tardigrade specific adaption are rapidly identified by sequence and/or pattern search on the web-tool tardigrade analyzer http://waterbear.bioapps.biozentrum.uni-wuerzburg.de. The work-bench offers nucleotide pattern analysis for promotor and regulatory element detection (tardigrade specific; nrdb) as well as rapid COG search for function assignments including species-specific repositories of all analysed data. Conclusion Different protein clusters and regulatory elements implicated in tardigrade stress adaptations are analysed including unpublished tardigrade sequences. PMID:19821996

  4. Tardigrade workbench: comparing stress-related proteins, sequence-similar and functional protein clusters as well as RNA elements in tardigrades.

    PubMed

    Förster, Frank; Liang, Chunguang; Shkumatov, Alexander; Beisser, Daniela; Engelmann, Julia C; Schnölzer, Martina; Frohme, Marcus; Müller, Tobias; Schill, Ralph O; Dandekar, Thomas

    2009-10-12

    Tardigrades represent an animal phylum with extraordinary resistance to environmental stress. To gain insights into their stress-specific adaptation potential, major clusters of related and similar proteins are identified, as well as specific functional clusters delineated comparing all tardigrades and individual species (Milnesium tardigradum, Hypsibius dujardini, Echiniscus testudo, Tulinus stephaniae, Richtersius coronifer) and functional elements in tardigrade mRNAs are analysed. We find that 39.3% of the total sequences clustered in 58 clusters of more than 20 proteins. Among these are ten tardigrade specific as well as a number of stress-specific protein clusters. Tardigrade-specific functional adaptations include strong protein, DNA- and redox protection, maintenance and protein recycling. Specific regulatory elements regulate tardigrade mRNA stability such as lox P DICE elements whereas 14 other RNA elements of higher eukaryotes are not found. Further features of tardigrade specific adaption are rapidly identified by sequence and/or pattern search on the web-tool tardigrade analyzer http://waterbear.bioapps.biozentrum.uni-wuerzburg.de. The work-bench offers nucleotide pattern analysis for promotor and regulatory element detection (tardigrade specific; nrdb) as well as rapid COG search for function assignments including species-specific repositories of all analysed data. Different protein clusters and regulatory elements implicated in tardigrade stress adaptations are analysed including unpublished tardigrade sequences.

  5. High-Resolution Analysis by Whole-Genome Sequencing of an International Lineage (Sequence Type 111) of Pseudomonas aeruginosa Associated with Metallo-Carbapenemases in the United Kingdom.

    PubMed

    Turton, Jane F; Wright, Laura; Underwood, Anthony; Witney, Adam A; Chan, Yuen-Ting; Al-Shahib, Ali; Arnold, Catherine; Doumith, Michel; Patel, Bharat; Planche, Timothy D; Green, Jonathan; Holliman, Richard; Woodford, Neil

    2015-08-01

    Whole-genome sequencing (WGS) was carried out on 87 isolates of sequence type 111 (ST-111) of Pseudomonas aeruginosa collected between 2005 and 2014 from 65 patients and 12 environmental isolates from 24 hospital laboratories across the United Kingdom on an Illumina HiSeq instrument. Most isolates (73) carried VIM-2, but others carried IMP-1 or IMP-13 (5) or NDM-1 (1); one isolate had VIM-2 and IMP-18, and 7 carried no metallo-beta-lactamase (MBL) gene. Single nucleotide polymorphism analysis divided the isolates into distinct clusters; the NDM-1 isolate was an outlier, and the IMP isolates and 6/7 MBL-negative isolates clustered separately from the main set of 73 VIM-2 isolates. Within the VIM-2 set, there were at least 3 distinct clusters, including a tightly clustered set of isolates from 3 hospital laboratories consistent with an outbreak from a single introduction that was quickly brought under control and a much broader set dominated by isolates from a long-running outbreak in a London hospital likely seeded from an environmental source, requiring different control measures; isolates from 7 other hospital laboratories in London and southeast England were also included. Bayesian evolutionary analysis indicated that all the isolates shared a common ancestor dating back ∼50 years (1960s), with the main VIM-2 set separating approximately 20 to 30 years ago. Accessory gene profiling revealed blocks of genes associated with particular clusters, with some having high similarity (≥95%) to bacteriophage genes. WGS of widely found international lineages such as ST-111 provides the necessary resolution to inform epidemiological investigations and intervention policies. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  6. Analysis of intra-host genetic diversity of Prunus necrotic ringspot virus (PNRSV) using amplicon next generation sequencing.

    PubMed

    Kinoti, Wycliff M; Constable, Fiona E; Nancarrow, Narelle; Plummer, Kim M; Rodoni, Brendan

    2017-01-01

    PCR amplicon next generation sequencing (NGS) analysis offers a broadly applicable and targeted approach to detect populations of both high- and low-frequency virus variants in one or more plant samples. In this study, amplicon NGS was used to explore the diversity of the tripartite genome virus, Prunus necrotic ringspot virus (PNRSV) from 53 PNRSV-infected trees using amplicons from conserved gene regions of each of PNRSV RNA1, RNA2 and RNA3. Sequencing of the amplicons from 53 PNRSV-infected trees revealed differing levels of polymorphism across the three different components of the PNRSV genome with a total number of 5040, 2083 and 5486 sequence variants observed for RNA1, RNA2 and RNA3 respectively. The RNA2 had the lowest diversity of sequences compared to RNA1 and RNA3, reflecting the lack of flexibility tolerated by the replicase gene that is encoded by this RNA component. Distinct PNRSV phylo-groups, consisting of closely related clusters of sequence variants, were observed in each of PNRSV RNA1, RNA2 and RNA3. Most plant samples had a single phylo-group for each RNA component. Haplotype network analysis showed that smaller clusters of PNRSV sequence variants were genetically connected to the largest sequence variant cluster within a phylo-group of each RNA component. Some plant samples had sequence variants occurring in multiple PNRSV phylo-groups in at least one of each RNA and these phylo-groups formed distinct clades that represent PNRSV genetic strains. Variants within the same phylo-group of each Prunus plant sample had ≥97% similarity and phylo-groups within a Prunus plant sample and between samples had ≤97% similarity. Based on the analysis of diversity, a definition of a PNRSV genetic strain was proposed. The proposed definition was applied to determine the number of PNRSV genetic strains in each of the plant samples and the complexity in defining genetic strains in multipartite genome viruses was explored.

  7. Identification and Functional Analysis of the Nocardithiocin Gene Cluster in Nocardia pseudobrasiliensis

    PubMed Central

    Sakai, Kanae; Komaki, Hisayuki; Gonoi, Tohru

    2015-01-01

    Nocardithiocin is a thiopeptide compound isolated from the opportunistic pathogen Nocardia pseudobrasiliensis. It shows a strong activity against acid-fast bacteria and is also active against rifampicin-resistant Mycobacterium tuberculosis. Here, we report the identification of the nocardithiocin gene cluster in N. pseudobrasiliensis IFM 0761 based on conserved thiopeptide biosynthesis gene sequence and the whole genome sequence. The predicted gene cluster was confirmed by gene disruption and complementation. As expected, strains containing the disrupted gene did not produce nocardithiocin while gene complementation restored nocardithiocin production in these strains. The predicted cluster was further analyzed using RNA-seq which showed that the nocardithiocin gene cluster contains 12 genes within a 15.2-kb region. This finding will promote the improvement of nocardithiocin productivity and its derivatives production. PMID:26588225

  8. Phylogenetic analysis of HIV-1 reverse transcriptase sequences from 382 patients recruited in JJ Hospital of Mumbai, India, between 2002 and 2008.

    PubMed

    Deshpande, Alaka; Jauvin, Valerie; Pinson, Patricia; Jeannot, Anne Cecile; Fleury, Herve J

    2009-06-01

    Analysis of reverse transcriptase (RT) sequences of 382 HIV-1 isolates from untreated and treated patients recruited in JJ Hospital (Mumbai, India) between 2002 and 2008 shows that subtype C is largely predominant (98%) and that non-C sequences cluster with A1, B, CRF01_AE, and CRF06_cpx.

  9. GraphTeams: a method for discovering spatial gene clusters in Hi-C sequencing data.

    PubMed

    Schulz, Tizian; Stoye, Jens; Doerr, Daniel

    2018-05-08

    Hi-C sequencing offers novel, cost-effective means to study the spatial conformation of chromosomes. We use data obtained from Hi-C experiments to provide new evidence for the existence of spatial gene clusters. These are sets of genes with associated functionality that exhibit close proximity to each other in the spatial conformation of chromosomes across several related species. We present the first gene cluster model capable of handling spatial data. Our model generalizes a popular computational model for gene cluster prediction, called δ-teams, from sequences to graphs. Following previous lines of research, we subsequently extend our model to allow for several vertices being associated with the same label. The model, called δ-teams with families, is particularly suitable for our application as it enables handling of gene duplicates. We develop algorithmic solutions for both models. We implemented the algorithm for discovering δ-teams with families and integrated it into a fully automated workflow for discovering gene clusters in Hi-C data, called GraphTeams. We applied it to human and mouse data to find intra- and interchromosomal gene cluster candidates. The results include intrachromosomal clusters that seem to exhibit a closer proximity in space than on their chromosomal DNA sequence. We further discovered interchromosomal gene clusters that contain genes from different chromosomes within the human genome, but are located on a single chromosome in mouse. By identifying δ-teams with families, we provide a flexible model to discover gene cluster candidates in Hi-C data. Our analysis of Hi-C data from human and mouse reveals several known gene clusters (thus validating our approach), but also a few sparsely studied or possibly unknown gene cluster candidates that could be the source of further experimental investigations.

  10. Constructing storyboards based on hierarchical clustering analysis

    NASA Astrophysics Data System (ADS)

    Hasebe, Satoshi; Sami, Mustafa M.; Muramatsu, Shogo; Kikuchi, Hisakazu

    2005-07-01

    There is a growing need for quick previews of video content, both to improve the accessibility of video archives and to reduce network traffic. In this paper, a storyboard that contains a user-specified number of keyframes is produced from a given video sequence. It is based on hierarchical cluster analysis of feature vectors that are derived from wavelet coefficients of video frames. Consistent use of the extracted feature vectors is the key to avoiding repeated, computationally intensive parsing of the same video sequence. Experimental results suggest that a significant reduction in computational time is gained by this strategy.
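
    The idea of cutting a hierarchical clustering at a user-specified number of keyframes can be sketched as follows. This is not the authors' implementation: it assumes feature vectors have already been extracted (the toy 2-D vectors stand in for wavelet-derived features), uses Euclidean distance with average linkage, and picks the frame closest to each cluster centroid as the keyframe. The names `select_keyframes` and `euclidean` are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def select_keyframes(features, k):
    """Merge frames bottom-up until k clusters remain; return one frame index per cluster."""
    if not 1 <= k <= len(features):
        raise ValueError("k must be between 1 and the number of frames")
    clusters = [[i] for i in range(len(features))]

    def average_distance(c1, c2):
        # Average linkage: mean distance over all cross-cluster frame pairs.
        total = sum(euclidean(features[i], features[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find and merge the closest pair of clusters.
        _, a, b = min(
            (average_distance(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    dim = len(features[0])
    keyframes = []
    for members in clusters:
        centroid = [sum(features[i][d] for i in members) / len(members) for d in range(dim)]
        # Representative frame: the member nearest to the cluster centroid.
        keyframes.append(min(members, key=lambda i: euclidean(features[i], centroid)))
    return sorted(keyframes)


# Toy, made-up feature vectors for seven frames.
frames = [(0.0, 0.0), (0.1, 0.0), (0.3, 0.0),
          (5.0, 5.0), (5.1, 5.0), (5.4, 5.0),
          (9.0, 0.0)]
print(select_keyframes(frames, 3))
```

    Because the feature vectors are computed once and reused for any requested `k`, the storyboard size can be changed without parsing the video again, which is the saving the abstract points to.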

  11. Phylogeny of Bacteroides, Prevotella, and Porphyromonas spp. and related bacteria.

    PubMed Central

    Paster, B J; Dewhirst, F E; Olsen, I; Fraser, G J

    1994-01-01

    The phylogenetic structure of the bacteroides subgroup of the cytophaga-flavobacter-bacteroides (CFB) phylum was examined by 16S rRNA sequence comparative analysis. Approximately 95% of the 16S rRNA sequence was determined for 36 representative strains of species of Prevotella, Bacteroides, and Porphyromonas and related species by a modified Sanger sequencing method. A phylogenetic tree was constructed from a corrected distance matrix by the neighbor-joining method, and the reliability of tree branching was established by bootstrap analysis. The bacteroides subgroup was divided primarily into three major phylogenetic clusters which contained most of the species examined. The first cluster, termed the prevotella cluster, was composed of 16 species of Prevotella, including P. melaninogenica, P. intermedia, P. nigrescens, and the ruminal species P. ruminicola. Two oral species, P. zoogleoformans and P. heparinolytica, which had been recently placed in the genus Prevotella, did not fall within the prevotella cluster. These two species and six species of Bacteroides, including the type species B. fragilis, formed the second cluster, termed the bacteroides cluster. The third cluster, termed the porphyromonas cluster, was divided into two subclusters. The first contained Porphyromonas gingivalis, P. endodontalis, P. asaccharolytica, P. circumdentaria, P. salivosa, [Bacteroides] levii (the brackets around genus are used to indicate that the species does not belong to the genus by the sensu stricto definition), and [Bacteroides] macacae, and the second subcluster contained [Bacteroides] forsythus and [Bacteroides] distasonis. [Bacteroides] splanchnicus fell just outside the three major clusters but still belonged within the bacteroides subgroup. With few exceptions, the 16S rRNA data were in overall agreement with previously proposed reclassifications of species of Bacteroides, Prevotella, and Porphyromonas. Suggestions are made to accommodate those species which do not fit previous reclassification schemes. PMID:8300528

  12. Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity

    PubMed Central

    Pope, Welkin H; Bowman, Charles A; Russell, Daniel A; Jacobs-Sera, Deborah; Asai, David J; Cresawn, Steven G; Jacobs, William R; Hendrix, Roger W; Lawrence, Jeffrey G; Hatfull, Graham F; Abbazia, Patrick; Ababio, Amma; Adam, Naazneen

    2015-01-01

    The bacteriophage population is large, dynamic, ancient, and genetically diverse. Limited genomic information shows that phage genomes are mosaic, and the genetic architecture of phage populations remains ill-defined. To understand the population structure of phages infecting a single host strain, we isolated, sequenced, and compared 627 phages of Mycobacterium smegmatis. Their genetic diversity is considerable, and there are 28 distinct genomic types (clusters) with related nucleotide sequences. However, amino acid sequence comparisons show pervasive genomic mosaicism, and quantification of inter-cluster and intra-cluster relatedness reveals a continuum of genetic diversity, albeit with uneven representation of different phages. Furthermore, rarefaction analysis shows that the mycobacteriophage population is not closed, and there is a constant influx of genes from other sources. Phage isolation and analysis was performed by a large consortium of academic institutions, illustrating the substantial benefits of a disseminated, structured program involving large numbers of freshman undergraduates in scientific discovery. DOI: http://dx.doi.org/10.7554/eLife.06416.001 PMID:25919952

  13. Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity.

    PubMed

    Pope, Welkin H; Bowman, Charles A; Russell, Daniel A; Jacobs-Sera, Deborah; Asai, David J; Cresawn, Steven G; Jacobs, William R; Hendrix, Roger W; Lawrence, Jeffrey G; Hatfull, Graham F

    2015-04-28

    The bacteriophage population is large, dynamic, ancient, and genetically diverse. Limited genomic information shows that phage genomes are mosaic, and the genetic architecture of phage populations remains ill-defined. To understand the population structure of phages infecting a single host strain, we isolated, sequenced, and compared 627 phages of Mycobacterium smegmatis. Their genetic diversity is considerable, and there are 28 distinct genomic types (clusters) with related nucleotide sequences. However, amino acid sequence comparisons show pervasive genomic mosaicism, and quantification of inter-cluster and intra-cluster relatedness reveals a continuum of genetic diversity, albeit with uneven representation of different phages. Furthermore, rarefaction analysis shows that the mycobacteriophage population is not closed, and there is a constant influx of genes from other sources. Phage isolation and analysis was performed by a large consortium of academic institutions, illustrating the substantial benefits of a disseminated, structured program involving large numbers of freshman undergraduates in scientific discovery.

  14. [Analysis of 4 clustered high risk acute flaccid paralysis cases in Shanxi Province in 2006].

    PubMed

    Yan, Dong-mei; Zhang, Yong; Wang, Dong-yan

    2010-04-01

    This study analyzed the epidemiology of 4 clustered high-risk acute flaccid paralysis (AFP) cases reported by Shanxi Province in 2006 and the VP1 gene characteristics of the type III polioviruses isolated from the four AFP cases. Virus isolation and identification were conducted according to the 4th edition of the WHO polio laboratory manual. The VP1 regions were amplified and sequenced, and phylogenetic trees based on the VP1 region were constructed. Three of the four high-risk AFP cases were suspected to be vaccine-associated paralytic poliomyelitis (VAPP), and their onset dates were close together. VP1 sequencing of the four type III isolates revealed identities of 99.7%, 99.9%, 99.4% and 99.9%, respectively, compared with the vaccine reference strain BJOPV3. According to WHO criteria, the four isolates were identified as type III vaccine-related polioviruses. Phylogenetic analysis based on the VP1 coding sequence showed that the four type III polioviruses were not significantly related. The type III polioviruses isolated from the 3 suspected VAPP cases shared one nucleotide mutation at position 2637 (C-->U), which results in an amino acid change from Val to Ala. Laboratory surveillance for clustered high-risk AFP cases should be strengthened so that poliovirus circulation can be detected and prevented in a timely manner.

  15. Genetic diversity among eight Dendrolimus species in Eurasia (Lepidoptera: Lasiocampidae) inferred from mitochondrial COI and COII, and nuclear ITS2 markers.

    PubMed

    Kononov, Alexander; Ustyantsev, Kirill; Wang, Baode; Mastro, Victor C; Fet, Victor; Blinov, Alexander; Baranchikov, Yuri

    2016-12-22

    Moths of the genus Dendrolimus (Lepidoptera: Lasiocampidae) are among the major pests of coniferous forests worldwide. Taxonomy and nomenclature of this genus are not entirely established, and there are many species with a controversial taxonomic position. We present a comparative evolutionary analysis of the most economically important Dendrolimus species in Eurasia. Our analysis was based on the nucleotide sequences of COI and COII mitochondrial genes and ITS2 spacer of nuclear ribosomal genes. All known sequences were extracted from GenBank. An additional 112 new sequences were identified for 28 specimens of D. sibiricus, D. pini, and D. superans from five regions of Siberia and the Russian Far East to be able to compare the disparate data from all previous studies. In total, 528 sequences were used in phylogenetic analysis. Two clusters of closely related species in Dendrolimus were found. The first cluster includes D. pini, D. sibiricus, and D. superans; and the second, D. spectabilis, D. punctatus, and D. tabulaeformis. Species D. houi and D. kikuchii appear to be the most basal in the genus. Genetic difference among the second cluster species is very low in contrast to the first cluster species. The phylogenetic position of D. tabulaeformis as a subspecies was supported. It was found that D. sibiricus recently separated from D. superans. Integration of D. sibiricus mitochondrial DNA sequences and the spread of this species to the west of Eurasia have been established as the cause of the unjustified allocation of a new species: D. kilmez. Our study further clarifies taxonomic problems in the genus and gives more complete information on the genetic structure of D. pini, D. sibiricus, and D. superans.

  16. HTLV-1aA introduction into Brazil and its association with the trans-Atlantic slave trade.

    PubMed

    Amoussa, Adjile Edjide Roukiyath; Wilkinson, Eduan; Giovanetti, Marta; de Almeida Rego, Filipe Ferreira; Araujo, Thessika Hialla A; de Souza Gonçalves, Marilda; de Oliveira, Tulio; Alcantara, Luiz Carlos Junior

    2017-03-01

    Human T-lymphotropic virus (HTLV) is an endemic virus in some parts of the world, with Africa being home to most of the viral genetic diversity. In Brazil, HTLV-1 is endemic amongst Japanese and African immigrant populations. Multiple introductions of the virus in Brazil from other epidemic foci were hypothesized. The long terminal repeat (LTR) region of HTLV-1 was used to infer the origin of the virus in Brazil, using phylogenetic analysis. LTR sequences were obtained from the HTLV-1 database (http://htlv1db.bahia.fiocruz.br). Sequences were aligned and maximum-likelihood and Bayesian tree topologies were inferred. Brazilian specific clusters were identified and molecular-clock and coalescent models were used to estimate each cluster's time to the most recent common ancestor (tMRCA). Three Brazilian clusters were identified with posterior probabilities ranging from 0.61 to 0.99. Molecular clock analysis of these three clusters dated their respective tMRCAs to between the year 1499 and the year 1668. Additional analysis also identified a close association between Brazilian sequences and new sequences from South Africa. Our results support the hypothesis of multiple introductions of HTLV-1 into Brazil, with the majority of introductions occurring in the post-Columbian period. Our results further suggest that HTLV-1 introduction into Brazil was facilitated by the trans-Atlantic slave trade from endemic areas of Africa. The close association between southern African and Brazilian sequences also suggested that greater numbers of the southern African Bantu population might also have been part of the slave trade than previously thought. Copyright © 2016. Published by Elsevier B.V.

  17. Differences in community composition of bacteria in four glaciers in western China

    NASA Astrophysics Data System (ADS)

    An, L. Z.; Chen, Y.; Xiang, S.-R.; Shang, T.-C.; Tian, L.-D.

    2010-06-01

    Microbial community patterns vary in glaciers worldwide, presenting unique responses to global climatic and environmental changes. Four bacterial clone libraries were established by 16S rRNA gene amplification from four ice layers along the 42-m-long ice core MuztB drilled from the Muztag Ata Glacier. A total of 151 bacterial sequences obtained from the ice core MuztB were phylogenetically compared with the 71 previously reported sequences from three ice cores extracted from ice caps Malan, Dunde, and Puruogangri. Six phylogenetic clusters Flavisolibacter, Flexibacter (Bacteroidetes), Acinetobacter, Enterobacter (Gammaproteobacteria), Planococcus/Anoxybacillus (Firmicutes), and Propionibacter/Luteococcus (Actinobacteria) frequently occurred along the Muztag Ata Glacier profile, and their proportion varied by seasons. Sequence analysis showed that most of the sequences from the ice core clustered with those from cold environments, and the sequence clusters from the same glacier more closely grouped together than those from the geographically isolated glaciers. Moreover, bacterial communities from the same location or similarly aged ice formed a cluster, and were clearly separate from those from other geographically isolated glaciers. In summary, the findings provide preliminary evidence of zonal distribution of microbial community, and suggest biogeography of microorganisms in glacier ice.

  18. Differences in community composition of bacteria in four deep ice sheets in western China

    NASA Astrophysics Data System (ADS)

    An, L.; Chen, Y.; Xiang, S.-R.; Shang, T.-C.; Tian, L.-De

    2010-02-01

    Microbial community patterns vary in glaciers worldwide, presenting unique responses to global climatic and environmental changes. Four bacterial clone libraries were established by 16S rRNA gene amplification from four ice layers along the 42-m-long ice core MuztB drilled from the Muztag Ata Glacier. A total of 152 bacterial sequences obtained from the ice core MuztB were phylogenetically compared with the 71 previously reported sequences from three ice cores extracted from ice caps Malan, Dunde, and Puruogangri. The six functional clusters Flavisolibacter, Flexibacter (Bacteroidetes), Acinetobacter, Enterobacter (Gammaproteobacteria), Planococcus/Anoxybacillus (Firmicutes), and Propionibacter/Luteococcus (Actinobacteria) frequently occurred along the Muztag Ata Glacier profile. Sequence analysis showed that most of the sequences from the ice core clustered with those from cold environments, and the sequences from the same glacier formed a distinct cluster. Moreover, bacterial communities from the same location or similarly aged ice formed a cluster, and were clearly separate from those from other geographically isolated glaciers. In summary, the findings provide preliminary evidence of zonal distribution of the microbial community and support our hypothesis of the spatial and temporal biogeography of microorganisms in glacial ice.

  19. SACCHARIS: an automated pipeline to streamline discovery of carbohydrate active enzyme activities within polyspecific families and de novo sequence datasets.

    PubMed

    Jones, Darryl R; Thomas, Dallas; Alger, Nicholas; Ghavidel, Ata; Inglis, G Douglas; Abbott, D Wade

    2018-01-01

    Deposition of new genetic sequences in online databases is expanding at an unprecedented rate. As a result, sequence identification continues to outpace functional characterization of carbohydrate active enzymes (CAZymes). In this paradigm, the discovery of enzymes with novel functions is often hindered by high volumes of uncharacterized sequences particularly when the enzyme sequence belongs to a family that exhibits diverse functional specificities (i.e., polyspecificity). Therefore, to direct sequence-based discovery and characterization of new enzyme activities we have developed an automated in silico pipeline entitled: Sequence Analysis and Clustering of CarboHydrate Active enzymes for Rapid Informed prediction of Specificity (SACCHARIS). This pipeline streamlines the selection of uncharacterized sequences for discovery of new CAZyme or CBM specificity from families currently maintained on the CAZy website or within user-defined datasets. SACCHARIS was used to generate a phylogenetic tree of GH43, a CAZyme family with defined subfamily designations. This analysis confirmed that large datasets can be organized into sequence clusters of manageable sizes that possess related functions. Seeding this tree with a GH43 sequence from Bacteroides dorei DSM 17855 (BdGH43b) revealed it partitioned as a single sequence within the tree. This pattern was consistent with it possessing a unique enzyme activity for GH43, as BdGH43b is the first α-glucanase described for this family. The capacity of SACCHARIS to extract and cluster characterized carbohydrate binding module sequences was demonstrated using family 6 CBMs (i.e., CBM6s). This CBM family displays a polyspecific ligand binding profile and contains many structurally determined members. Using SACCHARIS to identify a cluster of divergent sequences, a CBM6 sequence from a unique clade was demonstrated to bind yeast mannan, which represents the first description of an α-mannan binding CBM. Additionally, we have performed a CAZome analysis of an in-house sequenced bacterial genome and a comparative analysis of B. thetaiotaomicron VPI-5482 and B. thetaiotaomicron 7330, to demonstrate that SACCHARIS can generate "CAZome fingerprints", which differentiate between the saccharolytic potential of two related strains in silico. Establishing sequence-function and sequence-structure relationships in polyspecific CAZyme families are promising approaches for streamlining enzyme discovery. SACCHARIS facilitates this process by embedding CAZyme and CBM family trees generated from biochemically to structurally characterized sequences, with protein sequences that have unknown functions. In addition, these trees can be integrated with user-defined datasets (e.g., genomics, metagenomics, and transcriptomics) to inform experimental characterization of new CAZymes or CBMs not currently curated, and for researchers to compare differential sequence patterns between entire CAZomes. In this light, SACCHARIS provides an in silico tool that can be tailored for enzyme bioprospecting in datasets of increasing complexity and for diverse applications in glycobiotechnology.

  20. Phylodynamic and Phylogeographic Patterns of the HIV Type 1 Subtype F1 Parenteral Epidemic in Romania

    PubMed Central

    Hué, Stéphane; Buckton, Andrew J.; Myers, Richard E.; Duiculescu, Dan; Ene, Luminita; Oprea, Cristiana; Tardei, Gratiela; Rugina, Sorin; Mardarescu, Mariana; Floch, Corinne; Notheis, Gundula; Zöhrer, Bettina; Cane, Patricia A.; Pillay, Deenan

    2012-01-01

    In the late 1980s an HIV-1 epidemic emerged in Romania that was dominated by subtype F1. The main route of infection is believed to be parenteral transmission in children. We sequenced partial pol coding regions of 70 subtype F1 samples from children and adolescents from the PENTA-EPPICC network of which 67 were from Romania. Phylogenetic reconstruction using the sequences and other publicly available global subtype F sequences showed that 79% of Romanian F1 sequences formed a statistically robust monophyletic cluster. The monophyletic cluster was epidemiologically linked to parenteral transmission in children. Coalescent-based analysis dated the origins of the parenteral epidemic to 1983 [1981–1987; 95% HPD]. The analysis also shows that the epidemic's effective population size has remained fairly constant since the early 1990s suggesting limited onward spread of the virus within the population. Furthermore, phylogeographic analysis suggests that the root location of the parenteral epidemic was Bucharest. PMID:22251065

  1. The application of cluster analysis in the intercomparison of loop structures in RNA.

    PubMed

    Huang, Hung-Chung; Nagaswamy, Uma; Fox, George E

    2005-04-01

    We have developed a computational approach for the comparison and classification of RNA loop structures. Hairpin or interior loops identified in atomic resolution RNA structures were intercompared by conformational matching. The root-mean-square deviation (RMSD) values between all pairs of RNA fragments of interest, even if from different molecules, are calculated. Subsequently, cluster analysis is performed on the resulting matrix of RMSD distances using the unweighted pair group method with arithmetic mean (UPGMA). The cluster analysis objectively reveals groups of folds that resemble one another. To demonstrate the utility of the approach, a comprehensive analysis of all the terminal hairpin tetraloops that have been observed in 15 RNA structures that have been determined by X-ray crystallography was undertaken. The method found major clusters corresponding to the well-known GNRA and UNCG types. In addition, two tetraloops with the unusual primary sequence UMAC (M is A or C) were successfully assigned to the GNRA cluster. Larger loop structures were also examined and the clustering results confirmed the occurrence of variations of the GNRA and UNCG tetraloops in these loops and provided a systematic means for locating them. Nineteen examples of larger loops that closely resemble either the GNRA or UNCG tetraloop were found in the large ribosomal RNAs. When the clustering approach was extended to include all structures in the SCOR database, novel relationships were detected including one between the ANYA motif and a less common folding of the GAAA tetraloop sequence.
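
    The clustering step described in this abstract, UPGMA applied to a matrix of pairwise RMSD values, can be sketched in a few lines. This is an illustrative sketch, not the authors' code: it assumes the pairwise RMSD matrix has already been computed, and the loop labels, distance values, and function names (`upgma`, `flat_clusters`) are made up for the example.

```python
def upgma(dist, labels):
    """UPGMA on a symmetric distance matrix.

    Returns the merge history as (cluster_a, cluster_b, merge_distance) tuples,
    where merge_distance is the average distance between the two clusters.
    """
    n = len(labels)
    clusters = {i: (labels[i],) for i in range(n)}
    sizes = {i: 1 for i in range(n)}
    # Pairwise distances keyed by (smaller_id, larger_id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        (a, b), merge_distance = min(d.items(), key=lambda kv: kv[1])
        merges.append((clusters[a], clusters[b], merge_distance))
        for c in clusters:
            if c in (a, b):
                continue
            dac = d.pop((min(a, c), max(a, c)))
            dbc = d.pop((min(b, c), max(b, c)))
            # Size-weighted mean keeps this equal to the average over all member pairs.
            d[(c, next_id)] = (sizes[a] * dac + sizes[b] * dbc) / (sizes[a] + sizes[b])
        del d[(a, b)]
        clusters[next_id] = clusters[a] + clusters[b]
        sizes[next_id] = sizes[a] + sizes[b]
        del clusters[a], clusters[b], sizes[a], sizes[b]
        next_id += 1
    return merges


def flat_clusters(merges, labels, max_distance):
    """Apply merges up to max_distance and return the resulting groups."""
    groups = {(label,) for label in labels}
    for a, b, merge_distance in merges:
        if merge_distance > max_distance:
            break
        groups -= {a, b}
        groups.add(a + b)
    return sorted(sorted(g) for g in groups)


# Made-up pairwise RMSD values (in angstroms) for five hypothetical loops.
loops = ["loopA", "loopB", "loopC", "loopD", "loopE"]
rmsd = [
    [0.0, 0.5, 3.0, 3.0, 1.0],
    [0.5, 0.0, 3.0, 3.0, 1.5],
    [3.0, 3.0, 0.0, 0.75, 4.0],
    [3.0, 3.0, 0.75, 0.0, 4.0],
    [1.0, 1.5, 4.0, 4.0, 0.0],
]
history = upgma(rmsd, loops)
print(flat_clusters(history, loops, max_distance=2.0))
```

    The cut-off passed to `flat_clusters` is arbitrary here; in practice it would be chosen by inspecting the merge distances for a clear gap between within-family and between-family values.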

  2. The application of cluster analysis in the intercomparison of loop structures in RNA

    PubMed Central

    HUANG, HUNG-CHUNG; NAGASWAMY, UMA; FOX, GEORGE E.

    2005-01-01

    We have developed a computational approach for the comparison and classification of RNA loop structures. Hairpin or interior loops identified in atomic resolution RNA structures were intercompared by conformational matching. The root-mean-square deviation (RMSD) values between all pairs of RNA fragments of interest, even if from different molecules, are calculated. Subsequently, cluster analysis is performed on the resulting matrix of RMSD distances using the unweighted pair group method with arithmetic mean (UPGMA). The cluster analysis objectively reveals groups of folds that resemble one another. To demonstrate the utility of the approach, a comprehensive analysis of all the terminal hairpin tetraloops that have been observed in 15 RNA structures that have been determined by X-ray crystallography was undertaken. The method found major clusters corresponding to the well-known GNRA and UNCG types. In addition, two tetraloops with the unusual primary sequence UMAC (M is A or C) were successfully assigned to the GNRA cluster. Larger loop structures were also examined and the clustering results confirmed the occurrence of variations of the GNRA and UNCG tetraloops in these loops and provided a systematic means for locating them. Nineteen examples of larger loops that closely resemble either the GNRA or UNCG tetraloop were found in the large ribosomal RNAs. When the clustering approach was extended to include all structures in the SCOR database, novel relationships were detected including one between the ANYA motif and a less common folding of the GAAA tetraloop sequence. PMID:15769871

  3. International linkage of two food-borne hepatitis A clusters through traceback of mussels, the Netherlands, 2012.

    PubMed

    Boxman, Ingeborg L A; Verhoef, Linda; Vennema, Harry; Ngui, Siew-Lin; Friesema, Ingrid H M; Whiteside, Chris; Lees, David; Koopmans, Marion

    2016-01-01

    This report describes an outbreak investigation starting with two closely related suspected food-borne clusters of Dutch hepatitis A cases, nine primary cases in total, with an unknown source in the Netherlands. The hepatitis A virus (HAV) genotype IA sequences of both clusters were highly similar (459/460 nt) and were not reported earlier. Food questionnaires and a case-control study revealed an association with consumption of mussels. Analysis of mussel supply chains identified the most likely production area. International enquiries led to identification of a cluster of patients near this production area with identical HAV sequences with onsets predating the first Dutch cluster of cases. The most likely source for this cluster was a case who returned from an endemic area in Central America, and a subsequent household cluster from which treated domestic sewage was discharged into the suspected mussel production area. Notably, mussels from this area were also consumed by a separate case in the United Kingdom sharing an identical strain with the second Dutch cluster. In conclusion, a small number of patients in a non-endemic area led to geographically dispersed hepatitis A outbreaks with food as vehicle. This link would have gone unnoticed without sequence analyses and international collaboration.

  4. The nif Gene Operon of the Methanogenic Archaeon Methanococcus maripaludis

    PubMed Central

    Kessler, Peter S.; Blank, Carrine; Leigh, John A.

    1998-01-01

    Nitrogen fixation occurs in two domains, Archaea and Bacteria. We have characterized a nif (nitrogen fixation) gene cluster in the methanogenic archaeon Methanococcus maripaludis. Sequence analysis revealed eight genes, six with sequence similarity to known nif genes and two with sequence similarity to glnB. The gene order, nifH, ORF105 (similar to glnB), ORF121 (similar to glnB), nifD, nifK, nifE, nifN, and nifX, was the same as that found in part in other diazotrophic methanogens and except for the presence of the glnB-like genes, also resembled the order found in many members of the Bacteria. Using transposon insertion mutagenesis, we determined that an 8-kb region required for nitrogen fixation corresponded to the nif gene cluster. Northern analysis revealed the presence of either a single 7.6-kb nif mRNA transcript or 10 smaller mRNA species containing portions of the large transcript. Polar effects of transposon insertions demonstrated that all of these mRNAs arose from a single promoter region, where transcription initiated 80 bp 5′ to nifH. Distinctive features of the nif gene cluster include the presence of the six primary nif genes in a single operon, the placement of the two glnB-like genes within the cluster, the apparent physical separation of the cluster from any other nif genes that might be in the genome, the fragmentation pattern of the mRNA, and the regulation of expression by a repression mechanism described previously. Our study and others with methanogenic archaea reporting multiple mRNAs arising from gene clusters with only a single putative promoter sequence suggest that mRNA processing following transcription may be a common occurrence in methanogens. PMID:9515920

  5. Comprehensive identification and clustering of CLV3/ESR-related (CLE) genes in plants finds groups with potentially shared function.

    PubMed

    Goad, David M; Zhu, Chuanmei; Kellogg, Elizabeth A

    2017-10-01

    CLV3/ESR (CLE) proteins are important signaling peptides in plants. The short CLE peptide (12-13 amino acids) is cleaved from a larger pre-propeptide and functions as an extracellular ligand. The CLE family is large and has resisted attempts at classification because the CLE domain is too short for reliable phylogenetic analysis and the pre-propeptide is too variable. We used a model-based search for CLE domains from 57 plant genomes and used the entire pre-propeptide for comprehensive clustering analysis. In total, 1628 CLE genes were identified in land plants, with none recognizable from green algae. These CLEs form 12 groups within which CLE domains are largely conserved and pre-propeptides can be aligned. Most clusters contain sequences from monocots, eudicots and Amborella trichopoda, with sequences from Picea abies, Selaginella moellendorffii and Physcomitrella patens scattered in some clusters. We easily identified previously known clusters involved in vascular differentiation and nodulation. In addition, we found a number of discrete groups whose function remains poorly characterized. Available data indicate that CLE proteins within a cluster are likely to share function, whereas those from different clusters play at least partially different roles. Our analysis provides a foundation for future evolutionary and functional studies. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.

  6. Stratification of co-evolving genomic groups using ranked phylogenetic profiles

    PubMed Central

    Freilich, Shiri; Goldovsky, Leon; Gottlieb, Assaf; Blanc, Eric; Tsoka, Sophia; Ouzounis, Christos A

    2009-01-01

    Background: Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact on genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database. Results: The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach - a combination of sequence searches, statistical estimation and clustering - analyses the degree of sequence divergence between sets of protein sequences and allows the classification of protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples. Conclusion: Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples. PMID:19860884

  7. Molecular comparison of the structural proteins encoding gene clusters of two related Lactobacillus delbrueckii bacteriophages.

    PubMed Central

    Vasala, A; Dupont, L; Baumann, M; Ritzenthaler, P; Alatossava, T

    1993-01-01

    Virulent phage LL-H and temperate phage mv4 are two related bacteriophages of Lactobacillus delbrueckii. The gene clusters encoding structural proteins of these two phages have been sequenced and further analyzed. Six open reading frames (ORF-1 to ORF-6) were detected. Protein sequencing and Western immunoblotting experiments confirmed that ORF-3 (g34) encoded the main capsid protein Gp34. The presence of a putative late promoter in front of the phage LL-H g34 gene was suggested by primer extension experiments. Comparative sequence analysis between phage LL-H and phage mv4 revealed striking similarities in the structure and organization of this gene cluster, suggesting that the genes encoding phage structural proteins belong to a highly conservative module. PMID:8497043

  8. Next-Generation Sequencing of Coccidioides immitis Isolated during Cluster Investigation

    PubMed Central

    Engelthaler, David M.; Chiller, Tom; Schupp, James A.; Colvin, Joshua; Beckstrom-Sternberg, Stephen M.; Driebe, Elizabeth M.; Moses, Tracy; Tembe, Waibhav; Sinari, Shripad; Beckstrom-Sternberg, James S.; Christoforides, Alexis; Pearson, John V.; Carpten, John; Keim, Paul; Peterson, Ashley; Terashita, Dawn

    2011-01-01

    Next-generation sequencing enables use of whole-genome sequence typing (WGST) as a viable and discriminatory tool for genotyping and molecular epidemiologic analysis. We used WGST to confirm the linkage of a cluster of Coccidioides immitis isolates from 3 patients who received organ transplants from a single donor who later had positive test results for coccidioidomycosis. Isolates from the 3 patients were nearly genetically identical (a total of 3 single-nucleotide polymorphisms identified among them), thereby demonstrating direct descent of the 3 isolates from an original isolate. We used WGST to demonstrate the genotypic relatedness of C. immitis isolates that were also epidemiologically linked. Thus, WGST offers unique benefits to public health for investigation of clusters considered to be linked to a single source. PMID:21291593

  9. A pyrosequencing assay for the quantitative methylation analysis of the PCDHB gene cluster, the major factor in neuroblastoma methylator phenotype.

    PubMed

    Banelli, Barbara; Brigati, Claudio; Di Vinci, Angela; Casciano, Ida; Forlani, Alessandra; Borzì, Luana; Allemanni, Giorgio; Romani, Massimo

    2012-03-01

    Epigenetic alterations are hallmarks of cancer and powerful biomarkers, whose clinical utilization is made difficult by the absence of standardization and of common methods of data interpretation. The coordinate methylation of many loci in cancer is defined as 'CpG island methylator phenotype' (CIMP) and identifies clinically distinct groups of patients. In neuroblastoma (NB), CIMP is defined by a methylation signature, which includes different loci, but its predictive power on outcome is entirely recapitulated by the PCDHB cluster only. We have developed a robust and cost-effective pyrosequencing-based assay that could facilitate the clinical application of CIMP in NB. This assay permits the unbiased simultaneous amplification and sequencing of 17 out of 19 genes of the PCDHB cluster for quantitative methylation analysis, taking into account all the sequence variations. As some of these variations were at CpG doublets, we bypassed the data interpretation conducted by the methylation analysis software to assign the corrected methylation value at these sites. The final result of the assay is the mean methylation level of 17 gene fragments in the protocadherin B (PCDHB) cluster. We have utilized this assay to compare the methylation levels of the PCDHB cluster between high-risk and very low-risk NB patients, confirming the predictive value of CIMP. Our results demonstrate that the pyrosequencing-based assay described herein is a powerful instrument for the analysis of this gene cluster that may simplify the data comparison between different laboratories and, in the future, could facilitate its clinical application. Furthermore, our results demonstrate that, in principle, pyrosequencing can be efficiently utilized for the methylation analysis of gene clusters with high internal homologies.

  10. Cutaneous Granulomas in Dolphins Caused by Novel Uncultivated Paracoccidioides brasiliensis

    PubMed Central

    Vilela, Raquel; Bossart, Gregory D.; St. Leger, Judy A.; Dalton, Leslie M.; Reif, John S.; Schaefer, Adam M.; McCarthy, Peter J.; Fair, Patricia A.

    2016-01-01

    Cutaneous granulomas in dolphins were believed to be caused by Lacazia loboi, which also causes a similar disease in humans. This hypothesis was recently challenged by reports that fungal DNA sequences from dolphins grouped this pathogen with Paracoccidioides brasiliensis. We conducted phylogenetic analysis of fungi from 6 bottlenose dolphins (Tursiops truncatus) with cutaneous granulomas and chains of yeast cells in infected tissues. Kex gene sequences of P. brasiliensis from dolphins showed 100% homology with sequences from cultivated P. brasiliensis, 73% with those of L. loboi, and 93% with those of P. lutzii. Parsimony analysis placed DNA sequences from dolphins within a cluster with human P. brasiliensis strains. This cluster was the sister taxon to P. lutzii and L. loboi. Our molecular data support previous findings and suggest that a novel uncultivated strain of P. brasiliensis restricted to cutaneous lesions in dolphins is probably the cause of lacaziosis/lobomycosis, herein referred to as paracoccidioidomycosis ceti. PMID:27869614

  11. Cutaneous Granulomas in Dolphins Caused by Novel Uncultivated Paracoccidioides brasiliensis.

    PubMed

    Vilela, Raquel; Bossart, Gregory D; St Leger, Judy A; Dalton, Leslie M; Reif, John S; Schaefer, Adam M; McCarthy, Peter J; Fair, Patricia A; Mendoza, Leonel

    2016-12-01

    Cutaneous granulomas in dolphins were believed to be caused by Lacazia loboi, which also causes a similar disease in humans. This hypothesis was recently challenged by reports that fungal DNA sequences from dolphins grouped this pathogen with Paracoccidioides brasiliensis. We conducted phylogenetic analysis of fungi from 6 bottlenose dolphins (Tursiops truncatus) with cutaneous granulomas and chains of yeast cells in infected tissues. Kex gene sequences of P. brasiliensis from dolphins showed 100% homology with sequences from cultivated P. brasiliensis, 73% with those of L. loboi, and 93% with those of P. lutzii. Parsimony analysis placed DNA sequences from dolphins within a cluster with human P. brasiliensis strains. This cluster was the sister taxon to P. lutzii and L. loboi. Our molecular data support previous findings and suggest that a novel uncultivated strain of P. brasiliensis restricted to cutaneous lesions in dolphins is probably the cause of lacaziosis/lobomycosis, herein referred to as paracoccidioidomycosis ceti.

  12. A Phylogenetic Analysis of the Genus Fragaria (Strawberry) Using Intron-Containing Sequence from the ADH-1 Gene

    PubMed Central

    DiMeglio, Laura M.; Yu, Hongrun; Davis, Thomas M.

    2014-01-01

    The genus Fragaria encompasses species at ploidy levels ranging from diploid to decaploid. The cultivated strawberry, Fragaria×ananassa, and its two immediate progenitors, F. chiloensis and F. virginiana, are octoploids. To elucidate the ancestries of these octoploid species, we performed a phylogenetic analysis using intron-containing sequences of the nuclear ADH-1 gene from 39 germplasm accessions representing nineteen Fragaria species and one outgroup species, Dasiphora fruticosa. All trees from Maximum Parsimony and Maximum Likelihood analyses showed two major clades, Clade A and Clade B. Each of the sampled octoploids contributed alleles to both major clades. All octoploid-derived alleles in Clade A clustered with alleles of diploid F. vesca, with the exception of one octoploid allele that clustered with the alleles of diploid F. mandshurica. All octoploid-derived alleles in clade B clustered with the alleles of only one diploid species, F. iinumae. When gaps encoded as binary characters were included in the Maximum Parsimony analysis, tree resolution was improved with the addition of six nodes, and the bootstrap support was generally higher, rising above the 50% threshold for an additional nine branches. These results, coupled with the congruence of the sequence data and the coded gap data, validate and encourage the employment of sequence sets containing gaps for phylogenetic analysis. Our phylogenetic conclusions, based upon sequence data from the ADH-1 gene located on F. vesca linkage group II, complement and generally agree with those obtained from analyses of protein-encoding genes GBSSI-2 and DHAR located on F. vesca linkage groups V and VII, respectively, but differ from a previous study that utilized rDNA sequences and did not detect the ancestral role of F. iinumae. PMID:25078607

  13. Novel species including Mycobacterium fukienense sp. is found from tuberculosis patients in Fujian Province, China, using phylogenetic analysis of Mycobacterium chelonae/abscessus complex.

    PubMed

    Zhang, Yuan Yuan; Li, Yan Bing; Huang, Ming Xiang; Zhao, Xiu Qin; Zhang, Li Shui; Liu, Wen En; Wan, Kang Lin

    2013-11-01

    To identify the novel species 'Mycobacterium fukienense' sp. nov. of Mycobacterium chelonae/abscessus complex from tuberculosis patients in Fujian Province, China. Five of 27 clinical Mycobacterium isolates (Cls) were previously identified as M. chelonae/abscessus complex by sequencing the hsp65, rpoB, 16S-23S rRNA internal transcribed spacer region (ITS), recA and sodA house-keeping genes commonly used to describe the molecular characteristics of Mycobacterium. Clinical Mycobacterium isolates were classified according to the gene sequence using a clustering analysis program. Sequence similarity within clusters and diversity between clusters were analyzed. The 5 isolates were identified with distinct sequences exhibiting 99.8% homology in the hsp65 gene. However, a complete lack of homology was observed among the sequences of the rpoB, 16S-23S rRNA internal transcribed spacer region (ITS), sodA, and recA genes as compared with M. abscessus. Furthermore, no match for rpoB, sodA, and recA genes was identified among the published sequences. The novel species, Mycobacterium fukienense, is identified from tuberculosis patients in Fujian Province, China, and does not belong to any existing subspecies of M. chelonae/abscessus complex. Copyright © 2013 The Editorial Board of Biomedical and Environmental Sciences. Published by China CDC. All rights reserved.

  14. Genetic structure of Cantharellus formosus populations in a second-growth temperate rain forest of the Pacific Northwest

    USGS Publications Warehouse

    Redman, Regina S.; Ranson, Judith; Rodriguez, Rusty J.

    2006-01-01

    Cantharellus formosus growing on the Olympic Peninsula of the Pacific Northwest was sampled from September – November 1995 for genetic analysis. A total of ninety-six basidiomes from five clusters separated from one another by 3 - 25 meters were genetically characterized by PCR analysis of 13 arbitrary loci and rDNA sequences. The number of basidiomes in each cluster varied from 15 to 25 and genetic analysis delineated 15 genets among the clusters. Analysis of variance utilizing thirteen apPCR generated genetic molecular markers and PCR amplification of the ribosomal ITS regions indicated that 81.41% of the genetic variation occurred between clusters and 18.59% within clusters. Proximity of the basidiomes within a cluster was not an indicator of genotypic similarity. The molecular profiles of each cluster were distinct and defined as unique populations containing 2 - 6 genets. The monitoring and analysis of this species through non-lethal sampling and future applications are discussed.

  15. From sequencer to supercomputer: an automatic pipeline for managing and processing next generation sequencing data.

    PubMed

    Camerlengo, Terry; Ozer, Hatice Gulcin; Onti-Srinivasan, Raghuram; Yan, Pearlly; Huang, Tim; Parvin, Jeffrey; Huang, Kun

    2012-01-01

    Next Generation Sequencing is highly resource intensive. NGS tasks related to data processing, management and analysis require high-end computing servers or even clusters. Additionally, processing NGS experiments requires suitable storage space and significant manual interaction. At The Ohio State University's Biomedical Informatics Shared Resource, we designed and implemented a scalable architecture to address the challenges associated with the resource intensive nature of NGS secondary analysis built around Illumina Genome Analyzer II sequencers and Illumina's Gerald data processing pipeline. The software infrastructure includes a distributed computing platform consisting of a LIMS called QUEST (http://bisr.osumc.edu), an Automation Server, a computer cluster for processing NGS pipelines, and a network attached storage device expandable up to 40TB. The system has been architected to scale to multiple sequencers without requiring additional computing or labor resources. This platform demonstrates how to manage and automate NGS experiments in an institutional or core facility setting.

  16. Resistance gene candidates identified by PCR with degenerate oligonucleotide primers map to clusters of resistance genes in lettuce.

    PubMed

    Shen, K A; Meyers, B C; Islam-Faridi, M N; Chin, D B; Stelly, D M; Michelmore, R W

    1998-08-01

    The recent cloning of genes for resistance against diverse pathogens from a variety of plants has revealed that many share conserved sequence motifs. This provides the possibility of isolating numerous additional resistance genes by polymerase chain reaction (PCR) with degenerate oligonucleotide primers. We amplified resistance gene candidates (RGCs) from lettuce with multiple combinations of primers with low degeneracy designed from motifs in the nucleotide binding sites (NBSs) of RPS2 of Arabidopsis thaliana and N of tobacco. Genomic DNA, cDNA, and bacterial artificial chromosome (BAC) clones were successfully used as templates. Four families of sequences were identified that had the same similarity to each other as to resistance genes from other species. The relationship of the amplified products to resistance genes was evaluated by several sequence and genetic criteria. The amplified products contained open reading frames with additional sequences characteristic of NBSs. Hybridization of RGCs to genomic DNA and to BAC clones revealed large numbers of related sequences. Genetic analysis demonstrated the existence of clustered multigene families for each of the four RGC sequences. This parallels classical genetic data on clustering of disease resistance genes. Two of the four families mapped to known clusters of resistance genes; these two families were therefore studied in greater detail. Additional evidence that these RGCs could be resistance genes was gained by the identification of leucine-rich repeat (LRR) regions in sequences adjoining the NBS similar to those in RPM1 and RPS2 of A. thaliana. Fluorescent in situ hybridization confirmed the clustered genomic distribution of these sequences. The use of PCR with degenerate oligonucleotide primers is therefore an efficient method to identify numerous RGCs in plants.

  17. Implementation of hybrid clustering based on partitioning around medoids algorithm and divisive analysis on human Papillomavirus DNA

    NASA Astrophysics Data System (ADS)

    Arimbi, Mentari Dian; Bustamam, Alhadi; Lestari, Dian

    2017-03-01

    Data clustering can be executed through partition or hierarchical methods for many types of data including DNA sequences. Both clustering methods can be combined by running a partition algorithm in the first level and a hierarchical algorithm in the second level, an approach called hybrid clustering. In the partition phase some popular methods such as PAM, K-means, or Fuzzy c-means could be applied. In this study we selected partitioning around medoids (PAM) for our partition stage. Following the partition algorithm, in the hierarchical stage we applied the divisive analysis algorithm (DIANA) in order to obtain more specific cluster and sub-cluster structures. The number of main clusters is determined using the Davies Bouldin Index (DBI) value. We choose as optimal the number of clusters that minimizes the DBI value. In this work, we conduct the clustering on 1252 HPV DNA sequences from GenBank. Characteristic extraction is performed first, followed by normalization and genetic distance calculation using Euclidean distance. In our implementation, we used the hybrid PAM and DIANA using the R open source programming tool. In our results, we obtained 3 main clusters with an average DBI value of 0.979, using PAM in the first stage. After executing DIANA in the second stage, we obtained 4 sub clusters for Cluster-1, 9 sub clusters for Cluster-2 and 2 sub clusters for Cluster-3, with DBI values of 0.972, 0.771, and 0.768 for each main cluster respectively. Since the second stage produces lower DBI values than the first stage, we conclude that this hybrid approach can improve the accuracy of our clustering results.
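
    A minimal sketch of the Davies Bouldin Index used above to choose the number of clusters, written in Python purely for illustration (the study itself used R). It assumes the standard centroid-based definition with Euclidean distance; the function names and the toy 2-D points are hypothetical and are not taken from the paper.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def davies_bouldin(clusters):
    """Davies-Bouldin Index for a partition given as a list of clusters.

    Each cluster is a list of numeric vectors. Lower values indicate
    more compact, better separated clusters. Requires at least two
    clusters with distinct centroids.
    """
    centers = [centroid(c) for c in clusters]
    # Scatter S_i: mean distance from each member to its cluster centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity ratio of cluster i against every other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Two candidate partitions of the same four toy points.
well_separated = [[[0, 0], [0, 2]], [[10, 0], [10, 2]]]
poorly_separated = [[[0, 0], [10, 0]], [[0, 2], [10, 2]]]

# Pick the candidate partition with the smallest index.
best = min([well_separated, poorly_separated], key=davies_bouldin)
```

    In the same spirit as the abstract, a candidate number of clusters would be kept when its partition yields the smallest index among those tried.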

  18. Phylogenetic Distribution of the Capsid Assembly Protein Gene (g20) of Cyanophages in Paddy Floodwaters in Northeast China

    PubMed Central

    Jing, Ruiyong; Liu, Junjie; Yu, Zhenhua; Liu, Xiaobing; Wang, Guanghua

    2014-01-01

    Numerous studies have revealed the high diversity of cyanophages in marine and freshwater environments, but little is currently known about the diversity of cyanophages in paddy fields, particularly in Northeast (NE) China. To elucidate the genetic diversity of cyanophages in paddy floodwaters in NE China, viral capsid assembly protein gene (g20) sequences from five floodwater samples were amplified with the primers CPS1 and CPS8. Denaturing gradient gel electrophoresis (DGGE) was applied to distinguish different g20 clones. In total, 54 clones differing in g20 nucleotide sequences were obtained in this study. Phylogenetic analysis showed that the distribution of g20 sequences in this study was different from that in Japanese paddy fields, and all the sequences were grouped into Clusters α, β, γ and ε. Within Clusters α and β, three new small clusters (PFW-VII∼-IX) were identified. UniFrac analysis of g20 clone assemblages demonstrated that the community compositions of cyanophage varied among marine, lake and paddy field environments. In paddy floodwater, community compositions of cyanophage were also different between NE China and Japan. PMID:24533125

  19. TCW: Transcriptome Computational Workbench

    PubMed Central

    Soderlund, Carol; Nelson, William; Willer, Mark; Gang, David R.

    2013-01-01

    Background: The analysis of transcriptome data involves many steps and various programs, along with organization of large amounts of data and results. Without a methodical approach for storage, analysis and query, the resulting ad hoc analysis can lead to human error, loss of data and results, inefficient use of time, and lack of verifiability, repeatability, and extensibility. Methodology: The Transcriptome Computational Workbench (TCW) provides Java graphical interfaces for methodical analysis for both single and comparative transcriptome data without the use of a reference genome (e.g. for non-model organisms). The singleTCW interface steps the user through importing transcript sequences (e.g. Illumina) or assembling long sequences (e.g. Sanger, 454, transcripts), annotating the sequences, and performing differential expression analysis using published statistical programs in R. The data, metadata, and results are stored in a MySQL database. The multiTCW interface builds a comparison database by importing sequence and annotation from one or more single TCW databases, executes the ESTscan program to translate the sequences into proteins, and then incorporates one or more clusterings, where the clustering options are to execute the orthoMCL program, compute transitive closure, or import clusters. Both singleTCW and multiTCW allow extensive query and display of the results, where singleTCW displays the alignment of annotation hits to transcript sequences, and multiTCW displays multiple transcript alignments with MUSCLE or pairwise alignments. The query programs can be executed on the desktop for fastest analysis, or from the web for sharing the results. Conclusion: It is now affordable to buy a multi-processor machine, and easy to install Java and MySQL. By simply downloading the TCW, the user can interactively analyze, query and view their data. The TCW allows in-depth data mining of the results, which can lead to a better understanding of the transcriptome.
TCW is freely available from www.agcol.arizona.edu/software/tcw. PMID:23874959
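
    The "transitive closure" clustering option mentioned in this record can be illustrated with a small union-find sketch in Python. This is not TCW's implementation (TCW is a Java application and the abstract does not describe its internals); the function name and the transcript identifiers are hypothetical, and the input is assumed to be a list of pairwise similarity hits.

```python
def transitive_closure_clusters(pairs):
    """Group items so that any two items linked by a chain of pairs share a cluster."""
    parent = {}

    def find(x):
        # Register unseen items as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), set()).add(item)
    # Sort members and clusters so the result is deterministic.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical pairwise hits between translated transcripts.
hits = [("t1", "t2"), ("t2", "t3"), ("t4", "t5")]
clusters = transitive_closure_clusters(hits)
```

    Here t1 and t3 end up in the same cluster even though they were never directly paired, which is the defining property of a transitive-closure grouping.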

  20. TCW: transcriptome computational workbench.

    PubMed

    Soderlund, Carol; Nelson, William; Willer, Mark; Gang, David R

    2013-01-01

    The analysis of transcriptome data involves many steps and various programs, along with organization of large amounts of data and results. Without a methodical approach for storage, analysis and query, the resulting ad hoc analysis can lead to human error, loss of data and results, inefficient use of time, and lack of verifiability, repeatability, and extensibility. The Transcriptome Computational Workbench (TCW) provides Java graphical interfaces for methodical analysis for both single and comparative transcriptome data without the use of a reference genome (e.g. for non-model organisms). The singleTCW interface steps the user through importing transcript sequences (e.g. Illumina) or assembling long sequences (e.g. Sanger, 454, transcripts), annotating the sequences, and performing differential expression analysis using published statistical programs in R. The data, metadata, and results are stored in a MySQL database. The multiTCW interface builds a comparison database by importing sequence and annotation from one or more single TCW databases, executes the ESTscan program to translate the sequences into proteins, and then incorporates one or more clusterings, where the clustering options are to execute the orthoMCL program, compute transitive closure, or import clusters. Both singleTCW and multiTCW allow extensive query and display of the results, where singleTCW displays the alignment of annotation hits to transcript sequences, and multiTCW displays multiple transcript alignments with MUSCLE or pairwise alignments. The query programs can be executed on the desktop for fastest analysis, or from the web for sharing the results. It is now affordable to buy a multi-processor machine, and easy to install Java and MySQL. By simply downloading the TCW, the user can interactively analyze, query and view their data. The TCW allows in-depth data mining of the results, which can lead to a better understanding of the transcriptome. 
TCW is freely available from www.agcol.arizona.edu/software/tcw.

  1. Phytoplasma phylogenetics based on analysis of secA and 23S rRNA gene sequences for improved resolution of candidate species of 'Candidatus Phytoplasma'.

    PubMed

    Hodgetts, Jennifer; Boonham, Neil; Mumford, Rick; Harrison, Nigel; Dickinson, Matthew

    2008-08-01

    Phytoplasma phylogenetics has focused primarily on sequences of the non-coding 16S rRNA gene and the 16S-23S rRNA intergenic spacer region (16-23S ISR), and primers that enable amplification of these regions from all phytoplasmas by PCR are well established. In this study, primers based on the secA gene have been developed into a semi-nested PCR assay that results in a sequence of the expected size (about 480 bp) from all 34 phytoplasmas examined, including strains representative of 12 16Sr groups. Phylogenetic analysis of secA gene sequences showed similar clustering of phytoplasmas when compared with clusters resolved by similar sequence analyses of a 16-23S ISR-23S rRNA gene contig or of the 16S rRNA gene alone. The main differences between trees were in the branch lengths, which were elongated in the 16-23S ISR-23S rRNA gene tree when compared with the 16S rRNA gene tree and elongated still further in the secA gene tree, despite this being a shorter sequence. The improved resolution in the secA gene-derived phylogenetic tree resulted in the 16SrII group splitting into two distinct clusters, while phytoplasmas associated with coconut lethal yellowing-type diseases split into three distinct groups, thereby supporting past proposals that they represent different candidate species within 'Candidatus Phytoplasma'. The ability to differentiate 16Sr groups and subgroups by virtual RFLP analysis of secA gene sequences suggests that this gene may provide an informative alternative molecular marker for pathogen identification and diagnosis of phytoplasma diseases.

  2. 'Inter-Arrival Time' Inspired Algorithm and its Application in Clustering and Molecular Phylogeny

    NASA Astrophysics Data System (ADS)

    Kolekar, Pandurang S.; Kale, Mohan M.; Kulkarni-Kale, Urmila

    2010-10-01

    Bioinformatics, being a multidisciplinary field, involves applications of various methods from allied areas of science for data mining using computational approaches. Clustering and molecular phylogeny are key areas in bioinformatics, which help in the study of classification and evolution of organisms. Molecular phylogeny algorithms can be divided into distance-based and character-based methods. Most of these methods depend on pre-alignment of sequences and become computationally intensive as the size of the data increases, and hence demand alternative, efficient approaches. 'Inter-arrival time distribution' (IATD) is a popular concept in the theory of stochastic system modeling but its potential in molecular data analysis has not been fully explored. The present study reports application of IATD in bioinformatics for clustering and molecular phylogeny. The proposed method provides IATDs of nucleotides in genomic sequences. A distance function based on statistical parameters of IATDs is proposed and the distance matrix thus obtained is used for clustering and molecular phylogeny. The method is applied to a dataset of 3' non-coding region (NCR) sequences of Dengue virus type 3 (DENV-3), subtype III, reported in 2008. The phylogram thus obtained revealed the geographical distribution of DENV-3 isolates. Sri Lankan DENV-3 isolates were further observed to be clustered in two sub-clades corresponding to pre- and post-Dengue hemorrhagic fever emergence groups. These results are consistent with those reported earlier, which were obtained using pre-aligned sequence data as input. These findings encourage applications of the IATD-based method in molecular phylogenetic analysis in particular and data mining in general.
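
    A rough Python sketch of the idea of nucleotide inter-arrival times as an alignment-free sequence description. The abstract does not specify which statistical parameters or distance function the authors used, so the choices here (mean and population standard deviation of the gaps for each base, compared by Euclidean distance) are assumptions made for illustration, as are the function names and the toy sequences.

```python
import math
from statistics import mean, pstdev


def inter_arrival_times(seq, base):
    """Gaps, in positions, between successive occurrences of `base` in `seq`."""
    positions = [i for i, ch in enumerate(seq.upper()) if ch == base]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


def iatd_features(seq, alphabet="ACGT"):
    """Mean and standard deviation of the inter-arrival gaps for each base."""
    features = []
    for base in alphabet:
        gaps = inter_arrival_times(seq, base)
        if gaps:
            features.extend([mean(gaps), pstdev(gaps)])
        else:
            # Fewer than two occurrences: no gap can be measured.
            features.extend([0.0, 0.0])
    return features


def iatd_distance(seq_a, seq_b):
    """Euclidean distance between the two feature vectors (an assumed choice)."""
    return math.dist(iatd_features(seq_a), iatd_features(seq_b))


# Toy sequences of equal composition but different arrangement.
sequences = ["ACGTACGT", "AACCGGTT", "ACGTACGA"]
distance_matrix = [[iatd_distance(a, b) for b in sequences] for a in sequences]
```

    The resulting matrix could then be passed to any distance-based clustering or tree-building method; no prior alignment of the sequences is needed to compute it.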

  3. Functional organization of a single nif cluster in the mesophilic archaeon Methanosarcina mazei strain Gö1

    PubMed Central

    Ehlers, Claudia; Veit, Katharina; Gottschalk, Gerhard; Schmitz, Ruth A.

    2002-01-01

    The mesophilic methanogenic archaeon Methanosarcina mazei strain Gö1 is able to utilize molecular nitrogen (N2) as its sole nitrogen source. We have identified and characterized a single nitrogen fixation (nif) gene cluster in M. mazei Gö1 with an approximate length of 9 kbp. Sequence analysis revealed seven genes with sequence similarities to nifH, nifI1, nifI2, nifD, nifK, nifE and nifN, similar to other diazotrophic methanogens and certain bacteria such as Clostridium acetobutylicum, with the two glnB-like genes (nifI1 and nifI2) located between nifH and nifD. Phylogenetic analysis of deduced amino acid sequences for the nitrogenase structural genes of M. mazei Gö1 showed that they are most closely related to Methanosarcina barkeri nif2 genes, and also closely resemble those for the corresponding nif products of the gram-positive bacterium C. acetobutylicum. Northern blot analysis and reverse transcription PCR analysis demonstrated that the M. mazei nif genes constitute an operon transcribed only under nitrogen starvation as a single 8 kb transcript. Sequence analysis revealed a palindromic sequence at the transcriptional start site in front of the M. mazei nifH gene, which may have a function in transcriptional regulation of the nif operon. PMID:15803652

  4. Expressed sequence tag analysis of guinea pig (Cavia porcellus) eye tissues for NEIBank

    PubMed Central

    Simpanya, Mukoma F.; Wistow, Graeme; Gao, James; David, Larry L.; Giblin, Frank J.

    2008-01-01

    Purpose: To characterize gene expression patterns in guinea pig ocular tissues and identify orthologs of human genes from NEIBank expressed sequence tags. Methods: RNA was extracted from dissected eye tissues of 2.5-month-old guinea pigs to make three unamplified and unnormalized cDNA libraries in the pCMVSport-6 vector for the lens, retina, and eye minus lens and retina. Over 4,000 clones were sequenced from each library and were analyzed using GRIST for clustering and gene identification. Lens crystallin EST data were validated using two-dimensional electrophoresis (2-DE), matrix assisted laser desorption (MALDI), and electrospray ionization mass spectrometry (ESIMS). Results: Combined data from the three libraries generated a total of 6,694 distinctive gene clusters, with each library having between 1,000 and 3,000 clusters. Approximately 60% of the total gene clusters were novel cDNA sequences and had significant homologies to other mammalian sequences in GenBank. Complete cDNA sequences were obtained for many guinea pig lens proteins, including αA/αAinsert-, γN-, and γS-crystallins, lengsin and GRIFIN. The ratio of αA- to αB-crystallin on 2-DE gels was 8:1 in the lens nucleus and 6.5:1 in the cortex. Analysis of ESTs, genome sequence, and proteins (by MALDI) did not reveal any evidence for the presence of γD-, γE-, and γF-crystallin in the guinea pig. Predicted masses of many guinea pig lens crystallins were confirmed by ESIMS analysis. For the retina, orthologs of human phototransduction genes were found, such as Rhodopsin, S-antigen (Sag, Arrestin), and Transducin. The guinea-pig ortholog of NRL, a key rod photoreceptor-specific transcription factor, was also represented in EST data. In the ‘rest-of-eye’ library, the most abundant transcripts included decorin and keratin 12, representative of the cornea. Conclusions: Genomic analysis of guinea pig eye tissues provides sequence-verified clones for future studies.
Guinea pig orthologs of many human eye specific genes were identified. Guinea pig gene structures were similar to their human and rodent gene counterparts. Surprisingly, no orthologs of γD-, γE-, and γF-crystallin were found in EST, proteomic, or the current guinea pig genome data. PMID:19104676

  5. Comparative analysis of prophages in Streptococcus mutans genomes

    PubMed Central

    Fu, Tiwei; Fan, Xiangyu; Long, Quanxin; Deng, Wanyan; Song, Jinlin

    2017-01-01

    Prophages have been considered genetic units that have an intimate association with novel phenotypic properties of bacterial hosts, such as pathogenicity and genomic variation. Little is known about the genetic information of prophages in the genome of Streptococcus mutans, a major pathogen of human dental caries. In this study, we identified 35 prophage-like elements in S. mutans genomes and performed a comparative genomic analysis. Comparative genomic and phylogenetic analyses of prophage sequences revealed that the prophages could be classified into three main large clusters: Cluster A, Cluster B, and Cluster C. The S. mutans prophages in each cluster were compared. The genomic sequences of phismuN66-1, phismuNLML9-1, and phismu24-1 all shared similarities with the previously reported S. mutans phages M102, M102AD, and ϕAPCM01. The genomes were organized into seven major gene clusters according to the putative functions of the predicted open reading frames: packaging and structural modules, integrase, host lysis modules, DNA replication/recombination modules, transcriptional regulatory modules, other protein modules, and hypothetical protein modules. Moreover, an integrase gene was only identified in phismuNLML9-1 prophages. PMID:29158986

  6. Identification of a current hot spot of HIV type 1 transmission in Mongolia by molecular epidemiological analysis.

    PubMed

    Davaalkham, Jagdagsuren; Unenchimeg, Puntsag; Baigalmaa, Chultem; Erdenetuya, Gombo; Nyamkhuu, Dulmaa; Shiino, Teiichiro; Tsuchiya, Kiyoto; Hayashida, Tsunefusa; Gatanaga, Hiroyuki; Oka, Shinichi

    2011-10-01

    We investigated the current molecular epidemiological status of HIV-1 in Mongolia, a country with very low incidence of HIV-1 though with rapid expansion in recent years. HIV-1 pol (1065 nt) and env (447 nt) genes were sequenced to construct phylogenetic trees. The evolutionary rates, molecular clock phylogenies, and other evolutionary parameters were estimated from heterochronous genomic sequences of HIV-1 subtype B by the Bayesian Markov chain Monte Carlo method. We obtained 41 sera from 56 reported HIV-1-positive cases as of May 2009. The main route of infection was men who have sex with men (MSM). Dominant subtypes were subtype B in 32 cases (78%) followed by subtype CRF02_AG (9.8%). The phylogenetic analysis of the pol gene identified two clusters in subtype B sequences. Cluster 1 consisted of 21 cases including MSM and other routes of infection, and cluster 2 consisted of eight MSM cases. The tree analyses demonstrated very short branch lengths in cluster 1, suggesting a surprisingly active expansion of HIV-1 transmission during a short period with the same ancestor virus. Evolutionary analysis indicated that the outbreak started around the early 2000s. This study identified a current hot spot of HIV-1 transmission and potential seed of the epidemic in Mongolia. Comprehensive preventive measures targeting this group are urgently needed.

  7. Differentiation of Trichophyton rubrum clinical isolates from Japanese and Chinese patients by randomly amplified polymorphic DNA and DNA sequence analysis of the non-transcribed spacer region of the rRNA gene.

    PubMed

    Yang, Xiumin; Sugita, Takashi; Takashima, Masako; Hiruma, Masataro; Li, Ruoyu; Sudo, Hajime; Ogawa, Hideoki; Ikeda, Shigaku

    2009-04-01

    Trichophyton rubrum is the most common pathogen causing dermatophytosis worldwide. Recent genetic investigations showed that the microorganism originated in Africa and then spread to Europe and North America via Asia. We investigated the intraspecific diversity of T. rubrum isolated from two closely located Asian countries, Japan and China. A total of 150 clinical isolates of T. rubrum obtained from Japanese and Chinese patients were analyzed by randomly amplified polymorphic DNA (RAPD) and DNA sequence analysis of the non-transcribed spacer (NTS) region in the rRNA gene. RAPD analysis divided the 150 strains into two major clusters, A and B. Of the Japanese isolates, 30% belonged to cluster A and 70% belonged to cluster B, whereas 91% of the Chinese isolates were in cluster A. The NTS region of the rRNA gene was divided into four major groups (I-IV) based on DNA sequencing. The majority of Japanese isolates were type IV (51%), and the majority of Chinese isolates were type III (75%). These results suggest that although Japan and China are neighboring countries, the origins of T. rubrum isolates from these countries may not be identical. These findings provide information useful for tracing the global transmission routes of T. rubrum.

  8. Recent increased identification and transmission of HIV-1 unique recombinant forms in Sweden.

    PubMed

    Neogi, Ujjwal; Siddik, Abu Bakar; Kalaghatgi, Prabhav; Gisslén, Magnus; Bratt, Göran; Marrone, Gaetano; Sönnerborg, Anders

    2017-07-25

    We have previously described a temporal increase in non-B subtypes in Sweden, and we hypothesized that this increased viral heterogeneity may become a hotspot for the development of more complex and unique recombinant forms (URFs) if the epidemics converge. In the present study, we performed subtyping using four automated tools and phylogenetic analysis by RAxML of pol gene sequences (n = 5246) and HIV-1 near full-length genome (HIV-NFLG) sequences (n = 104). A CD4+ T-cell decline trajectory algorithm was used to estimate time of HIV infection. Transmission clusters were identified using the family-joining method. The analysis of HIV-NFLG and pol gene described 10.6% (11/104) and 2.6% (137/5246) of the strains as URFs, respectively. An increasing trend of URFs was observed in recent years by both approaches (p = 0.0082; p < 0.0001). Transmission cluster analysis using the pol gene of all URFs identified 14 clusters with two to eight sequences. Larger transmission clusters of URFs (BF1 and 01B) were observed among MSM, most of whom were sero-diagnosed recently. Understanding the increased appearance and transmission of URFs in recent years could be important for public health interventions, and the use of HIV-NFLG would provide better statistical support for such assessments.

  9. Taxonomic evaluation of Streptomyces albus and related species using multilocus sequence analysis

    USDA-ARS's Scientific Manuscript database

    In phylogenetic analyses of the genus Streptomyces using 16S rRNA gene sequences, Streptomyces albus subsp. albus NRRL B-1811T formed a cluster with 5 other species having identical or nearly identical 16S rRNA gene sequences. Moreover, the morphological and physiological characteristics of these ot...

  10. Seismic clusters analysis in Northeastern Italy by the nearest-neighbor approach

    NASA Astrophysics Data System (ADS)

    Peresan, Antonella; Gentili, Stefania

    2018-01-01

    The main features of earthquake clusters in Northeastern Italy are explored, with the aim of gaining new insights into local-scale patterns of seismicity in the area. The study is based on a systematic analysis of robustly and uniformly detected seismic clusters, which are identified by a statistical method based on nearest-neighbor distances of events in the space-time-energy domain. The method permits us to highlight and investigate the internal structure of earthquake sequences, and to differentiate the spatial properties of seismicity according to the different topological features of the cluster structure. To analyze the seismicity of Northeastern Italy, we use information from local OGS bulletins, compiled at the National Institute of Oceanography and Experimental Geophysics since 1977. A preliminary reappraisal of the earthquake bulletins is carried out and the area of sufficient completeness is outlined. Various techniques are considered to estimate the scaling parameters that characterize earthquake occurrence in the region, namely the b-value and the fractal dimension of the epicenter distribution, required for the application of the nearest-neighbor technique. Specifically, average robust estimates of the parameters of the Unified Scaling Law for Earthquakes, USLE, are assessed for the whole outlined region and are used to compute the nearest-neighbor distances. Cluster identification by the nearest-neighbor method turns out to be quite reliable and robust with respect to the minimum magnitude cutoff of the input catalog; the identified clusters are consistent with those obtained from manual aftershock identification of selected sequences. We demonstrate that the earthquake clusters have distinct preferred geographic locations, and we identify two areas that differ substantially in the examined clustering properties. Specifically, burst-like sequences are associated with the north-western part and swarm-like sequences with the south-eastern part of the study region. The territorial heterogeneity of earthquake clustering is in good agreement with the spatial variability of scaling parameters identified by the USLE. In particular, the fractal dimension is higher to the west (about 1.2-1.4), suggesting a spatially more distributed seismicity, compared to the eastern part of the investigated territory, where the fractal dimension is very low (about 0.8-1.0).
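
    The abstract does not spell out the nearest-neighbor distance itself. As a minimal sketch, assuming the form commonly used for this kind of analysis, eta_ij = t_ij * r_ij^d * 10^(-b * m_i) for an earlier event i and a later event j, the linking step could look like the following. The event tuples, the function names, and the parameter values (b = 1.0 as a placeholder, d = 1.2 taken from the range quoted above) are illustrative assumptions, not the authors' implementation.

```python
import math

def nn_distance(parent, child, b=1.0, d=1.2):
    """Space-time-energy proximity of `child` to an earlier event `parent`.

    Events are (time_years, x_km, y_km, magnitude) tuples (assumed layout).
    b is the b-value and d the fractal dimension of epicenters.
    """
    t_i, x_i, y_i, m_i = parent
    t_j, x_j, y_j, _ = child
    dt = t_j - t_i
    if dt <= 0:
        return math.inf  # only earlier events can act as parents
    r = math.hypot(x_j - x_i, y_j - y_i)
    return dt * (r ** d) * 10.0 ** (-b * m_i)

def nearest_parents(events, b=1.0, d=1.2):
    """For each event, return (index of nearest earlier event, distance)."""
    result = []
    for j, child in enumerate(events):
        best = (None, math.inf)
        for i, parent in enumerate(events):
            if i == j:
                continue
            eta = nn_distance(parent, child, b, d)
            if eta < best[1]:
                best = (i, eta)
        result.append(best)
    return result

# Toy catalog: one large event, one nearby follower, one distant later event.
events = [
    (0.00, 0.0, 0.0, 5.0),
    (0.01, 1.0, 0.0, 2.0),
    (3.00, 80.0, 60.0, 2.5),
]
links = nearest_parents(events)
```

    In this kind of scheme, clusters are typically obtained by keeping only the links whose distance falls below a chosen threshold; the threshold is left out here because the abstract does not state one.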

  11. [Comparative analysis of clustered regularly interspaced short palindromic repeats (CRISPRs) loci in the genomes of halophilic archaea].

    PubMed

    Zhang, Fan; Zhang, Bing; Xiang, Hua; Hu, Songnian

    2009-11-01

    Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) is a widespread system that provides acquired resistance against phages in bacteria and archaea. Here we aim to perform a genome-wide analysis of CRISPRs in the extremely halophilic archaea whose whole genome sequences are currently available. We used bioinformatics methods including alignment, conservation analysis, GC content and RNA structure prediction to analyze the CRISPR structures of 7 haloarchaeal genomes. We identified the CRISPR structures in 5 halophilic archaea and revealed a conserved palindromic motif in the flanking regions of these CRISPR structures. In addition, we found that the repeat sequences of large CRISPR structures in halophilic archaea were highly conserved, and two types of predicted RNA secondary structures derived from the repeat sequences were likely determined by the fourth base of the repeat sequence. Our results support the proposal that the leader sequence may function as a recognition site by having palindromic structures in flanking regions, and the stem-loop secondary structure formed by repeat sequences may function in mediating the interaction between foreign genetic elements and CAS-encoded proteins.

  12. Structure-Based Phylogenetic Analysis of the Lipocalin Superfamily.

    PubMed

    Lakshmi, Balasubramanian; Mishra, Madhulika; Srinivasan, Narayanaswamy; Archunan, Govindaraju

    2015-01-01

    Lipocalins constitute a superfamily of extracellular proteins that are found in all three kingdoms of life. Although very divergent in their sequences and functions, they show remarkable similarity in 3-D structures. Lipocalins bind and transport small hydrophobic molecules. Earlier sequence-based phylogenetic studies of lipocalins highlighted that they have a long evolutionary history. However, the molecular and structural basis of their functional diversity is not completely understood. The main objective of the present study is to understand the functional diversity of the lipocalins using a structure-based phylogenetic approach. The present study with 39 protein domains from the lipocalin superfamily suggests that the clusters of lipocalins obtained by structure-based phylogeny correspond well with the functional diversity. The detailed analysis of each of the clusters and sub-clusters reveals that the 39 lipocalin domains cluster based on their mode of ligand binding, though the clustering was performed on the basis of gross domain structure. The outliers in the phylogenetic tree are often from single-member families. The structure-based phylogenetic approach has also provided pointers for assigning putative functions to the domains of unknown function in the lipocalin family. The approach employed in the present study can be used in the future for the functional identification of new lipocalin proteins and may be extended to other protein families whose members show poor sequence similarity but high structural similarity.

  13. A Nomadic Subtelomeric Disease Resistance Gene Cluster in Common Bean

    PubMed Central

    David, Perrine; Chen, Nicolas W.G.; Pedrosa-Harand, Andrea; Thareau, Vincent; Sévignac, Mireille; Cannon, Steven B.; Debouck, Daniel; Langin, Thierry; Geffroy, Valérie

    2009-01-01

    The B4 resistance (R) gene cluster is one of the largest clusters known in common bean (Phaseolus vulgaris [Pv]). It is located in a peculiar genomic environment in the subtelomeric region of the short arm of chromosome 4, adjacent to two heterochromatic blocks (knobs). We sequenced 650 kb spanning this locus and annotated 97 genes, 26 of which correspond to Coiled-Coil-Nucleotide-Binding-Site-Leucine-Rich-Repeat (CNL). Conserved microsynteny was observed between the Pv B4 locus and corresponding regions of Medicago truncatula and Lotus japonicus in chromosomes Mt6 and Lj2, respectively. The notable exception was the CNL sequences, which were completely absent in these regions. The origin of the Pv B4-CNL sequences was investigated through phylogenetic analysis, which reveals that, in the Pv genome, paralogous CNL genes are shared among nonhomologous chromosomes (4 and 11). Together, our results suggest that Pv B4-CNL was derived from CNL sequences from another cluster, the Co-2 cluster, through an ectopic recombination event. Integration of the soybean (Glycine max) genome data enables us to date more precisely this event and also to infer that a single CNL moved from the Co-2 to the B4 cluster. Moreover, we identified a new 528-bp satellite repeat, referred to as khipu, specific to the Phaseolus genus, present both between B4-CNL sequences and in the two knobs identified at the B4 R gene cluster. The khipu repeat is present on most chromosomal termini, indicating the existence of frequent ectopic recombination events in Pv subtelomeric regions. Our results highlight the importance of ectopic recombination in R gene evolution. PMID:19776165

  14. DSAP: deep-sequencing small RNA analysis pipeline.

    PubMed

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
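
    The first two DSAP steps, cleanup and clustering of tags, can be illustrated with a short sketch. This is not DSAP's own code: the adaptor string, the function names, and the reading of "poly-A/T/C/G/N removal" as dropping tags made of a single repeated nucleotide are all assumptions made for illustration. The input follows the tab-delimited "sequence, copy number" layout described above.

```python
from collections import Counter

ADAPTER = "TCGTATGCCG"  # hypothetical 3' adaptor prefix, for illustration only

def clean_tag(seq, adapter=ADAPTER):
    """Trim everything from the first adaptor occurrence; drop homopolymers."""
    seq = seq.upper()
    cut = seq.find(adapter)
    if cut != -1:
        seq = seq[:cut]
    if not seq or len(set(seq)) == 1:
        return None  # empty after trimming, or poly-A/T/C/G/N
    return seq

def cluster_tags(lines):
    """Merge copy numbers of tags that are identical after cleanup."""
    clusters = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        seq, count = line.split("\t")
        cleaned = clean_tag(seq)
        if cleaned is not None:
            clusters[cleaned] += int(count)
    return clusters

# Toy input: the same tag with and without adaptor, plus two homopolymer tags.
raw = [
    "TGAGGTAGTAGGTTGTATAGTTTCGTATGCCG\t120",
    "TGAGGTAGTAGGTTGTATAGTT\t30",
    "AAAAAAAAAAAAAAAAAAAA\t7",
    "NNNNNNNNNNNNNNNN\t2",
]
clusters = cluster_tags(raw)
```

    The resulting counter maps each unique cleaned sequence to its summed copy number, which is the kind of table the later ncRNA and miRNA matching steps would take as input.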

  15. Full genome analysis of bovine astrovirus from fecal samples of cattle in Japan: identification of possible interspecies transmission of bovine astrovirus.

    PubMed

    Nagai, Makoto; Omatsu, Tsutomu; Aoki, Hiroshi; Otomaru, Konosuke; Uto, Takehiko; Koizumi, Motoya; Minami-Fukuda, Fujiko; Takai, Hikaru; Murakami, Toshiaki; Masuda, Tsuneyuki; Yamasato, Hiroshi; Shiokawa, Mai; Tsuchiaka, Shinobu; Naoi, Yuki; Sano, Kaori; Okazaki, Sachiko; Katayama, Yukie; Oba, Mami; Furuya, Tetsuya; Shirai, Junsuke; Mizutani, Tetsuya

    2015-10-01

    A viral metagenomics approach was used to investigate fecal samples of Japanese calves with and without diarrhea. Of the different viral pathogens detected, read counts gave nearly complete astrovirus-related RNA sequences in 15 of the 146 fecal samples collected in three distinct areas (Hokkaido, Ishikawa, and Kagoshima Prefectures) between 2009 and 2015. Due to the lack of genetic information about bovine astroviruses (BoAstVs) in Japan, these sequences were analyzed in this study. Nine of the 15 Japanese BoAstVs were closely related to Chinese BoAstVs and clustered into a lineage (tentatively named lineage 1) in all phylogenetic trees. Three of 15 strains were phylogenetically separate from lineage 1, showing low sequence identities, and clustered instead with an American strain isolated from cattle with respiratory disease (tentatively named lineage 2). Interestingly, two of 15 strains clustered with lineage 1 in the open reading frame (ORF)1a and ORF1b regions, while they clustered with lineage 2 in the ORF2 region. Remarkably, one of 15 strains exhibited low amino acid sequence similarity to other BoAstVs and was clustered separately with porcine astrovirus type 5 in all trees, and ovine astrovirus in the ORF2 region, suggesting past interspecies transmission.

  16. Hydrophobic cluster analysis of G protein-coupled receptors: a powerful tool to derive structural and functional information from 2D-representation of protein sequences.

    PubMed

    Lentes, K U; Mathieu, E; Bischoff, R; Rasmussen, U B; Pavirani, A

    1993-01-01

    Current methods for comparative analyses of protein sequences are 1D-alignments of amino acid sequences based on the maximization of amino acid identity (homology) and the prediction of secondary structure elements. This approach has a major drawback once the amino acid identity drops below 20-25%, since maximization of a homology score does not take into account any structural information. A new technique called Hydrophobic Cluster Analysis (HCA) has been developed by Lemesle-Varloot et al. (Biochimie 72, 555-574, 1990). It consists of comparing several sequences simultaneously and combining homology detection with secondary structure analysis. HCA is primarily based on the detection and comparison of structural segments constituting the hydrophobic core of globular protein domains, with or without transmembrane domains. We have applied HCA to the analysis of different families of G-protein coupled receptors, such as catecholamine receptors as well as peptide hormone receptors. Utilizing HCA, the thrombin receptor, a new and as yet unique member of the family of G-protein coupled receptors, can be clearly classified as being closely related to the family of neuropeptide receptors rather than to the catecholamine receptors, for which the shape of the hydrophobic clusters and the length of their third cytoplasmic loop are very different. Furthermore, the potential of HCA to predict relationships between new putative and already characterized members of this family of receptors will be presented.

  17. Application of clustering methods: Regularized Markov clustering (R-MCL) for analyzing dengue virus similarity

    NASA Astrophysics Data System (ADS)

    Lestari, D.; Raharjo, D.; Bustamam, A.; Abdillah, B.; Widhianto, W.

    2017-07-01

    Dengue virus consists of 10 different constituent proteins and is classified into 4 major serotypes (DEN 1 - DEN 4). This study was designed to cluster 30 protein sequences of dengue virus taken from the Virus Pathogen Database and Analysis Resource (VIPR) using the Regularized Markov Clustering (R-MCL) algorithm and then to analyze the result. Implemented in Python 3.4, the R-MCL algorithm produces 8 clusters, several of which have more than one centroid. The number of centroids shows the density level of interaction. Protein interactions that are connected in a tissue form a complex protein that serves as a specific biological process unit. The analysis of the result shows that R-MCL clustering produces clusters of the dengue virus family based on the similar roles of their constituent proteins, regardless of serotype.

  18. Genetic diversity analysis of Gossypium arboreum germplasm accessions using genotyping-by-sequencing.

    PubMed

    Li, Ruijuan; Erpelding, John E

    2016-10-01

    The diploid cotton species Gossypium arboreum possesses many favorable agronomic traits such as drought tolerance and disease resistance, which can be utilized in the development of improved upland cotton cultivars. The USDA National Plant Germplasm System maintains more than 1600 G. arboreum accessions. Little information is available on the genetic diversity of the collection, thereby limiting the utilization of this cotton species. The genetic diversity and population structure of the G. arboreum germplasm collection were assessed by genotyping-by-sequencing of 375 accessions. Using genome-wide single nucleotide polymorphism sequence data, two major clusters were inferred with 302 accessions in Cluster 1, 64 accessions in Cluster 2, and nine accessions unassigned due to their nearly equal membership to each cluster. These two clusters were further evaluated independently, resulting in the identification of two sub-clusters for the 302 Cluster 1 accessions and three sub-clusters for the 64 Cluster 2 accessions. Low to moderate genetic diversity between clusters and sub-clusters was observed, indicating a narrow genetic base. Cluster 2 accessions were more genetically diverse and the majority of the accessions in this cluster were landraces. In contrast, Cluster 1 is composed of varieties or breeding lines more recently added to the collection. The majority of the accessions had kinship values ranging from 0.6 to 0.8. Eight pairs of accessions were identified as potential redundancies due to their high kinship relatedness. The genetic diversity and genotype data from this study are essential to enhance germplasm utilization to identify genetically diverse accessions for the detection of quantitative trait loci associated with important traits that would benefit upland cotton improvement.

  19. Non-redundant patent sequence databases with value-added annotations at two levels

    PubMed Central

    Li, Weizhong; McWilliam, Hamish; de la Torre, Ana Richart; Grodowski, Adam; Benediktovich, Irina; Goujon, Mickael; Nauche, Stephane; Lopez, Rodrigo

    2010-01-01

    The European Bioinformatics Institute (EMBL-EBI) provides public access to patent data, including abstracts, chemical compounds and sequences. Sequences can appear multiple times due to the filing of the same invention with multiple patent offices, or the use of the same sequence by different inventors in different contexts. Information relating to the source invention may be incomplete, and biological information available in patent documents elsewhere may not be reflected in the annotation of the sequence. Search and analysis of these data have become increasingly challenging for both the scientific and intellectual-property communities. Here, we report a collection of non-redundant patent sequence databases, which cover the EMBL-Bank nucleotides patent class and the patent protein databases and contain value-added annotations from patent documents. The databases were created at two levels by the use of sequence MD5 checksums. Sequences within a level-1 cluster are 100% identical over their whole length. Level-2 clusters were defined by sub-grouping level-1 clusters based on patent family information. Value-added annotations, such as publication number corrections, earliest publication dates and feature collations, significantly enhance the quality of the data, allowing for better tracking and cross-referencing. The databases are available at: http://www.ebi.ac.uk/patentdata/nr/. PMID:19884134
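
    The two-level grouping described above can be sketched in a few lines. This is an illustrative reading of the abstract rather than the EMBL-EBI pipeline: the record layout, the identifiers, and the normalisation of sequences to upper case before hashing are assumptions. Level 1 keys on the MD5 checksum of the sequence, and level 2 splits each level-1 cluster by patent family.

```python
import hashlib
from collections import defaultdict

def md5_of(sequence):
    """MD5 checksum of a sequence, normalised to upper case without whitespace."""
    normalised = "".join(sequence.split()).upper()
    return hashlib.md5(normalised.encode("ascii")).hexdigest()

def two_level_clusters(records):
    """records: iterable of (entry_id, patent_family, sequence) tuples.

    Returns {checksum: {patent_family: [entry_id, ...]}}. The outer key is a
    level-1 cluster (identical sequence); the inner key is a level-2 sub-group.
    """
    level1 = defaultdict(lambda: defaultdict(list))
    for entry_id, family, sequence in records:
        level1[md5_of(sequence)][family].append(entry_id)
    return level1

# Hypothetical entries: three copies of one sequence filed in two families,
# plus one entry that differs in its last base.
records = [
    ("E1", "FAM-A", "ATGGCC"),
    ("E2", "FAM-A", "atggcc"),
    ("E3", "FAM-B", "ATGGCC"),
    ("E4", "FAM-B", "ATGGCA"),
]
clusters = two_level_clusters(records)
```

    Hashing avoids pairwise comparison: sequences that are 100% identical over their whole length share a checksum, so a single pass over the records is enough to form the level-1 clusters.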

  20. Non-redundant patent sequence databases with value-added annotations at two levels.

    PubMed

    Li, Weizhong; McWilliam, Hamish; de la Torre, Ana Richart; Grodowski, Adam; Benediktovich, Irina; Goujon, Mickael; Nauche, Stephane; Lopez, Rodrigo

    2010-01-01

    The European Bioinformatics Institute (EMBL-EBI) provides public access to patent data, including abstracts, chemical compounds and sequences. Sequences can appear multiple times due to the filing of the same invention with multiple patent offices, or the use of the same sequence by different inventors in different contexts. Information relating to the source invention may be incomplete, and biological information available in patent documents elsewhere may not be reflected in the annotation of the sequence. Search and analysis of these data have become increasingly challenging for both the scientific and intellectual-property communities. Here, we report a collection of non-redundant patent sequence databases, which cover the EMBL-Bank nucleotides patent class and the patent protein databases and contain value-added annotations from patent documents. The databases were created at two levels by the use of sequence MD5 checksums. Sequences within a level-1 cluster are 100% identical over their whole length. Level-2 clusters were defined by sub-grouping level-1 clusters based on patent family information. Value-added annotations, such as publication number corrections, earliest publication dates and feature collations, significantly enhance the quality of the data, allowing for better tracking and cross-referencing. The databases are available at: http://www.ebi.ac.uk/patentdata/nr/.

  1. HIV transmission patterns among The Netherlands, Suriname, and The Netherlands Antilles: a molecular epidemiological study.

    PubMed

    Kramer, Merlijn A; Cornelissen, Marion; Paraskevis, Dimitrios; Prins, Maria; Coutinho, Roel A; van Sighem, Ard I; Sabajo, Lesley; Duits, Ashley J; Winkel, Cai N; Prins, Jan M; van der Ende, Marchina E; Kauffmann, Robert H; Op de Coul, Eline L

    2011-02-01

    We aimed to study patterns of HIV transmission among Suriname, The Netherlands Antilles, and The Netherlands. Fragments of env, gag, and pol genes of 55 HIV-infected Surinamese, Antillean, and Dutch heterosexuals living in The Netherlands and 72 HIV-infected heterosexuals living in Suriname and the Antilles were amplified and sequenced. We included 145 pol sequences of HIV-infected Surinamese, Antillean, and Dutch heterosexuals living in The Netherlands from an observational cohort. All sequences were phylogenetically analyzed by neighbor-joining. Additionally, HIV-1 mobility among ethnic groups was estimated. A phylogenetic tree of all pol sequences showed two Surinamese and three Antillean clusters of related strains, but no clustering between ethnic groups. Clusters included sequences of individuals living in Suriname and the Antilles as well as those who have migrated to The Netherlands. Similar clustering patterns were observed in env and gag. Analysis of HIV mobility among ethnic groups showed significantly lower migration between groups than expected under the hypothesis of panmixis, apart from higher HIV migration between Antilleans in The Netherlands and all other groups. Our study shows that HIV transmission mainly occurs within the ethnic group. This suggests that cultural factors could have a larger impact on HIV mobility than geographic distance.

  2. The Complete Mitochondrial Genome of Gossypium hirsutum and Evolutionary Analysis of Higher Plant Mitochondrial Genomes

    PubMed Central

    Su, Aiguo; Geng, Jianing; Grover, Corrinne E.; Hu, Songnian; Hua, Jinping

    2013-01-01

    Background Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that undergo frequent intramolecular recombination. Upland Cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant in the world. Sequencing of the cotton mitochondrial (mt) genome could be helpful for research on the evolution of plant mt genomes. Methodology/Principal Findings We combined 454 sequencing with screening of a fosmid library of the Gossypium hirsutum mt genome and sequencing of positive clones, and conducted a series of evolutionary analyses on the mt genomes of Cycas taitungensis and 24 angiosperms. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are found conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes. Conclusion The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes, suggests that evolution of mt genomes is consistent with plant taxonomy but independent among different species. PMID:23940520

  3. The complete mitochondrial genome of Gossypium hirsutum and evolutionary analysis of higher plant mitochondrial genomes.

    PubMed

    Liu, Guozheng; Cao, Dandan; Li, Shuangshuang; Su, Aiguo; Geng, Jianing; Grover, Corrinne E; Hu, Songnian; Hua, Jinping

    2013-01-01

    Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that undergo frequent intramolecular recombination. Upland Cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant in the world. Sequencing of the cotton mitochondrial (mt) genome could be helpful for research on the evolution of plant mt genomes. We combined 454 sequencing with screening of a fosmid library of the Gossypium hirsutum mt genome and sequencing of positive clones, and conducted a series of evolutionary analyses on the mt genomes of Cycas taitungensis and 24 angiosperms. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are found conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes. The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes, suggests that evolution of mt genomes is consistent with plant taxonomy but independent among different species.

  4. The morphological transformation of red sequence galaxies in clusters since z ∼ 1

    NASA Astrophysics Data System (ADS)

    Cerulo, P.; Couch, W. J.; Lidman, C.; Demarco, R.; Huertas-Company, M.; Mei, S.; Sánchez-Janssen, R.; Barrientos, L. F.; Muñoz, R.

    2017-11-01

    The study of galaxy morphology is fundamental to understand the physical processes driving the structural evolution of galaxies. It has long been known that dense environments host high fractions of early-type galaxies and low fractions of late-type galaxies, indicating that the environment affects the structural evolution of galaxies. In this paper, we present an analysis of the morphological composition of red sequence galaxies in a sample of nine galaxy clusters at 0.8 < z < 1.5 drawn from the HAWK-I Cluster Survey (HCS), with the aim of investigating the evolutionary paths of galaxies with different morphologies. We classify galaxies according to their apparent bulge-to-total light ratio and compare with red sequence galaxies from the lower redshift WIde-field Nearby Galaxy-cluster Survey (WINGS) and ESO Distant Cluster Survey (EDisCS). We find that, while the HCS red sequence is dominated by elliptical galaxies at all luminosities and stellar masses, the WINGS red sequence is dominated by elliptical galaxies only at its bright end (MV < -21.0 mag), while S0s become the most frequent class at fainter luminosities. Disc-dominated galaxies comprise 10-14 per cent of the red sequence population in the low (WINGS) and high (HCS) redshift samples, although their fraction increases up to 40 per cent at 0.4 < z < 0.8 (EDisCS). We find a 20 per cent increase in the fraction of S0 galaxies from z ∼ 1.5 to 0.05 on the red sequence. These results suggest that elliptical and S0 galaxies follow different evolutionary histories and, in particular, that S0 galaxies result, at least at intermediate luminosities (-22.0 < MV < -20.0), from the morphological transformation of quiescent spiral galaxies.

  5. Application of hybrid clustering using parallel k-means algorithm and DIANA algorithm

    NASA Astrophysics Data System (ADS)

    Umam, Khoirul; Bustamam, Alhadi; Lestari, Dian

    2017-03-01

    DNA is one of the carriers of genetic information in living organisms. Encoding, sequencing, and clustering DNA sequences have become key routine tasks in molecular biology, particularly in bioinformatics applications. There are two types of clustering: hierarchical clustering and partitioning clustering. In this paper, we combine the two types, K-Means (partitioning clustering) and DIANA (hierarchical clustering), and therefore call the approach hybrid clustering. Hybrid clustering using the Parallel K-Means algorithm and the DIANA algorithm is applied to cluster DNA sequences of Human Papillomavirus (HPV). The clustering process starts with collecting HPV DNA sequences from NCBI (National Center for Biotechnology Information), followed by feature extraction from the DNA sequences. The extracted features are stored in matrix form; this matrix is normalized using Min-Max normalization, and genetic distance is calculated using Euclidean distance. The hybrid clustering is then applied using implementations of the Parallel K-Means algorithm and the DIANA algorithm. The aim of using hybrid clustering is to obtain better clustering results. To validate the resulting clusters and find the optimum number of clusters, we use the Davies-Bouldin Index (DBI). In this study, Parallel K-Means clustering grouped the data into 5 clusters with a minimal DBI value of 0.8741, and hybrid clustering grouped the data into 13 sub-clusters with minimal DBI values of 0.8216, 0.6845, 0.3331, 0.1994 and 0.3952. The DBI values of hybrid clustering are lower than the DBI value of Parallel K-Means clustering alone, which was performed at the first stage. This means that hybrid clustering gives a better result for clustering HPV DNA sequences than Parallel K-Means clustering alone.
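
    The validation steps named in this abstract (Min-Max normalization, Euclidean distance, Davies-Bouldin Index) can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the function names and the toy feature matrix in the usage note are assumptions, and the cluster labels are taken as given rather than produced by K-Means or DIANA.

```python
import math


def min_max_normalize(rows):
    """Rescale every column of a feature matrix to the range [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def davies_bouldin(rows, labels):
    """Davies-Bouldin Index for a labelled data set (lower is better).

    Assumes at least two clusters whose centroids do not coincide.
    """
    clusters = {}
    for row, lab in zip(rows, labels):
        clusters.setdefault(lab, []).append(row)
    # Centroid and mean distance-to-centroid (scatter) of each cluster.
    centroids = {
        k: [sum(c) / len(pts) for c in zip(*pts)] for k, pts in clusters.items()
    }
    scatter = {
        k: sum(euclidean(p, centroids[k]) for p in pts) / len(pts)
        for k, pts in clusters.items()
    }
    keys = list(clusters)
    total = 0.0
    for i in keys:
        # Worst-case similarity ratio of cluster i against every other cluster.
        total += max(
            (scatter[i] + scatter[j]) / euclidean(centroids[i], centroids[j])
            for j in keys
            if j != i
        )
    return total / len(keys)
```

    For example, with the hypothetical matrix `[[0, 0], [0, 2], [10, 0], [10, 2]]` and labels `[0, 0, 1, 1]`, calling `davies_bouldin(min_max_normalize(matrix), labels)` scores that two-cluster partition; comparing such scores across candidate partitions is how an index of this kind is typically used to choose the number of clusters.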

  6. High-accuracy identification of incident HIV-1 infections using a sequence clustering based diversity measure.

    PubMed

    Xia, Xia-Yu; Ge, Meng; Hsi, Jenny H; He, Xiang; Ruan, Yu-Hua; Wang, Zhi-Xin; Shao, Yi-Ming; Pan, Xian-Ming

    2014-01-01

    Accurate estimates of HIV-1 incidence are essential for monitoring epidemic trends and evaluating intervention efforts. However, the long asymptomatic stage of HIV-1 infection makes it difficult to effectively distinguish incident infections from chronic ones. Current incidence assays based on serology or viral sequence diversity are both still lacking in accuracy. In the present work, a sequence clustering based diversity (SCBD) assay was devised by utilizing the fact that viral sequences derived from each transmitted/founder (T/F) strain tend to cluster together at early stage, and that only the intra-cluster diversity is correlated with the time since HIV-1 infection. The dot-matrix pairwise alignment was used to eliminate the disproportional impact of insertion/deletions (indels) and recombination events, and so was the proportion of clusterable sequences (Pc) as an index to identify late chronic infections with declined viral genetic diversity. Tested on a dataset containing 398 incident and 163 chronic infection cases collected from the Los Alamos HIV database (last modified 2/8/2012), our SCBD method achieved 99.5% sensitivity and 98.8% specificity, with an overall accuracy of 99.3%. Further analysis and evaluation also suggested its performance was not affected by host factors such as the viral subtypes and transmission routes. The SCBD method demonstrated the potential of sequencing based techniques to become useful for identifying incident infections. Its use may be most advantageous for settings with low to moderate incidence relative to available resources. The online service is available at http://www.bioinfo.tsinghua.edu.cn:8080/SCBD/index.jsp.
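
    The three performance figures quoted above are related through a confusion matrix, which can be sketched as follows. The abstract does not report the underlying counts; the counts in the usage note (396 of 398 incident cases and 161 of 163 chronic cases classified correctly) are an assumption, back-calculated only because they reproduce the reported percentages. The function name is likewise hypothetical.

```python
def diagnostic_metrics(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity, accuracy) as fractions.

    Here "positive" means an incident infection and "negative" a chronic one.
    """
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    total = true_pos + false_neg + true_neg + false_pos
    accuracy = (true_pos + true_neg) / total
    return sensitivity, specificity, accuracy
```

    Under the assumed counts, `diagnostic_metrics(396, 2, 161, 2)` gives values that round to 99.5%, 98.8% and 99.3%, consistent with the sensitivity, specificity and overall accuracy stated in the abstract for 398 incident and 163 chronic cases.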

  7. Wide distribution of O157-antigen biosynthesis gene clusters in Escherichia coli.

    PubMed

    Iguchi, Atsushi; Shirai, Hiroki; Seto, Kazuko; Ooka, Tadasuke; Ogura, Yoshitoshi; Hayashi, Tetsuya; Osawa, Kayo; Osawa, Ro

    2011-01-01

    Most Escherichia coli O157-serogroup strains are classified as enterohemorrhagic E. coli (EHEC), which is known as an important food-borne pathogen for humans. They usually produce Shiga toxin (Stx) 1 and/or Stx2, and express H7-flagella antigen (or nonmotile). However, O157 strains that do not produce Stxs and express H antigens different from H7 are sometimes isolated from clinical and other sources. Multilocus sequence analysis revealed that these 21 O157:non-H7 strains tested in this study belong to multiple evolutionary lineages different from that of EHEC O157:H7 strains, suggesting a wide distribution of the gene set encoding the O157-antigen biosynthesis in multiple lineages. To gain insight into the gene organization and the sequence similarity of the O157-antigen biosynthesis gene clusters, we conducted genomic comparisons of the chromosomal regions (about 59 kb in each strain) covering the O-antigen gene cluster and its flanking regions between six O157:H7/non-H7 strains. Gene organization of the O157-antigen gene cluster was identical among O157:H7/non-H7 strains, but was divided into two distinct types at the nucleotide sequence level. Interestingly, distribution of the two types did not clearly follow the evolutionary lineages of the strains, suggesting that horizontal gene transfer of both types of O157-antigen gene clusters has occurred independently among E. coli strains. Additionally, detailed sequence comparison revealed that some positions of the repetitive extragenic palindromic (REP) sequences in the regions flanking the O-antigen gene clusters were coincident with possible recombination points. From these results, we conclude that the horizontal transfer of the O157-antigen gene clusters induced the emergence of multiple O157 lineages within E. coli and speculate that REP sequences may involve one of the driving forces for exchange and evolution of O-antigen loci.

  8. Structure-sequence based analysis for identification of conserved regions in proteins

    DOEpatents

    Zemla, Adam T; Zhou, Carol E; Lam, Marisa W; Smith, Jason R; Pardes, Elizabeth

    2013-05-28

    Disclosed are computational methods, and associated hardware and software products for scoring conservation in a protein structure based on a computationally identified family or cluster of protein structures. A method of computationally identifying a family or cluster of protein structures is also disclosed herein.

  9. Genome mining-directed activation of a silent angucycline biosynthetic gene cluster in Streptomyces chattanoogensis.

    PubMed

    Zhou, Zhenxing; Xu, Qingqing; Bu, Qingting; Guo, Yuanyang; Liu, Shuiping; Liu, Yu; Du, Yiling; Li, Yongquan

    2015-02-09

    Genomic sequencing of actinomycetes has revealed the presence of numerous gene clusters seemingly capable of natural product biosynthesis, yet most clusters are cryptic under laboratory conditions. Bioinformatics analysis of the completely sequenced genome of Streptomyces chattanoogensis L10 (CGMCC 2644) revealed a silent angucycline biosynthetic gene cluster. The overexpression of a pathway-specific activator gene under the constitutive ermE* promoter successfully triggered the expression of the angucycline biosynthetic genes. Two novel members of the angucycline antibiotic family, chattamycins A and B, were further isolated and elucidated. Biological activity assays demonstrated that chattamycin B possesses good antitumor activities against human cancer cell lines and moderate antibacterial activities. The results presented here provide a feasible method to activate silent angucycline biosynthetic gene clusters to discover potential new drug leads. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. Longitudinal Metagenomic Analysis of Hospital Air Identifies Clinically Relevant Microbes.

    PubMed

    King, Paula; Pham, Long K; Waltz, Shannon; Sphar, Dan; Yamamoto, Robert T; Conrad, Douglas; Taplitz, Randy; Torriani, Francesca; Forsyth, R Allyn

    2016-01-01

    We describe the sampling of sixty-three uncultured hospital air samples collected over a six-month period and analysis using shotgun metagenomic sequencing. Our primary goals were to determine the longitudinal metagenomic variability of this environment, identify and characterize genomes of potential pathogens and determine whether they are atypical to the hospital airborne metagenome. Air samples were collected from eight locations which included patient wards, the main lobby and outside. The resulting DNA libraries produced 972 million sequences representing 51 gigabases. Hierarchical clustering of samples by the most abundant 50 microbial orders generated three major nodes which primarily clustered by type of location. Because the indoor locations were longitudinally consistent, episodic relative increases in microbial genomic signatures related to the opportunistic pathogens Aspergillus, Penicillium and Stenotrophomonas were identified as outliers at specific locations. Further analysis of microbial reads specific for Stenotrophomonas maltophilia indicated homology to a sequenced multi-drug resistant clinical strain and we observed broad sequence coverage of resistance genes. We demonstrate that a shotgun metagenomic sequencing approach can be used to characterize the resistance determinants of pathogen genomes that are uncharacteristic for an otherwise consistent hospital air microbial metagenomic profile.

  11. Analysis of intra-host genetic diversity of Prunus necrotic ringspot virus (PNRSV) using amplicon next generation sequencing

    PubMed Central

    Constable, Fiona E.; Nancarrow, Narelle; Plummer, Kim M.; Rodoni, Brendan

    2017-01-01

    PCR amplicon next generation sequencing (NGS) analysis offers a broadly applicable and targeted approach to detect populations of both high- or low-frequency virus variants in one or more plant samples. In this study, amplicon NGS was used to explore the diversity of the tripartite genome virus, Prunus necrotic ringspot virus (PNRSV) from 53 PNRSV-infected trees using amplicons from conserved gene regions of each of PNRSV RNA1, RNA2 and RNA3. Sequencing of the amplicons from 53 PNRSV-infected trees revealed differing levels of polymorphism across the three different components of the PNRSV genome with a total number of 5040, 2083 and 5486 sequence variants observed for RNA1, RNA2 and RNA3 respectively. The RNA2 had the lowest diversity of sequences compared to RNA1 and RNA3, reflecting the lack of flexibility tolerated by the replicase gene that is encoded by this RNA component. Distinct PNRSV phylo-groups, consisting of closely related clusters of sequence variants, were observed in each of PNRSV RNA1, RNA2 and RNA3. Most plant samples had a single phylo-group for each RNA component. Haplotype network analysis showed that smaller clusters of PNRSV sequence variants were genetically connected to the largest sequence variant cluster within a phylo-group of each RNA component. Some plant samples had sequence variants occurring in multiple PNRSV phylo-groups in at least one of each RNA and these phylo-groups formed distinct clades that represent PNRSV genetic strains. Variants within the same phylo-group of each Prunus plant sample had ≥97% similarity and phylo-groups within a Prunus plant sample and between samples had ≤97% similarity. Based on the analysis of diversity, a definition of a PNRSV genetic strain was proposed. The proposed definition was applied to determine the number of PNRSV genetic strains in each of the plant samples and the complexity in defining genetic strains in multipartite genome viruses was explored. PMID:28632759

  12. Identification of Sinorhizobium (Ensifer) medicae based on a specific genomic sequence unveiled by M13-PCR fingerprinting.

    PubMed

    Dourado, Ana Catarina; Alves, Paula I L; Tenreiro, Tania; Ferreira, Eugénio M; Tenreiro, Rogério; Fareleira, Paula; Crespo, M Teresa Barreto

    2009-12-01

    A collection of nodule isolates from Medicago polymorpha obtained from southern and central Portugal was evaluated by M13-PCR fingerprinting and hierarchical cluster analysis. Several genomic clusters were obtained which, by 16S rRNA gene sequencing of selected representatives, were shown to be associated with particular taxonomic groups of rhizobia and other soil bacteria. The method provided a clear separation between rhizobia and co-isolated non-symbiotic soil contaminants. Ten M13-PCR groups were assigned to Sinorhizobium (Ensifer) medicae and included all isolates responsible for the formation of nitrogen-fixing nodules upon re-inoculation of M. polymorpha test-plants. In addition, enterobacterial repetitive intergenic consensus (ERIC)-PCR fingerprinting indicated a high genomic heterogeneity within the major M13-PCR clusters of S. medicae isolates. Based on nucleotide sequence data of an M13-PCR amplicon of ca. 1500 bp, observed only in S. medicae isolates and spanning locus Smed_3707 to Smed_3709 from the pSMED01 plasmid sequence of S. medicae WSM419 genome's sequence, a pair of PCR primers was designed and used for direct PCR amplification of a 1399-bp sequence within this fragment. Additional in silico and in vitro experiments, as well as phylogenetic analysis, confirmed the specificity of this primer combination and therefore the reliability of this approach in the prompt identification of S. medicae isolates and their distinction from other soil bacteria.

  13. Characterization of apple stem grooving virus and apple chlorotic leaf spot virus identified in a crab apple tree.

    PubMed

    Li, Yongqiang; Deng, Congliang; Bian, Yong; Zhao, Xiaoli; Zhou, Qi

    2017-04-01

    Apple stem grooving virus (ASGV), apple chlorotic leaf spot virus (ACLSV), and prunus necrotic ringspot virus (PNRSV) were identified in a crab apple tree by small RNA deep sequencing. The complete genome sequence of ACLSV isolate BJ (ACLSV-BJ) was 7554 nucleotides and shared 67.0%-83.0% nucleotide sequence identity with other ACLSV isolates. A phylogenetic tree based on the complete genome sequence of all available ACLSV isolates showed that ACLSV-BJ clustered with the isolates SY01 from hawthorn, MO5 from apple, and JB, KMS and YH from pear. The complete nucleotide sequence of ASGV-BJ was 6509 nucleotides (nt) long and shared 78.2%-80.7% nucleotide sequence identity with other isolates. ASGV-BJ and the isolate ASGV_kfp clustered together in the phylogenetic tree as an independent clade. Recombination analysis showed that isolate ASGV-BJ was a naturally occurring recombinant.

  14. Fuzzy cluster analysis of simple physicochemical properties of amino acids for recognizing secondary structure in proteins.

    PubMed Central

    Mocz, G.

    1995-01-01

    Fuzzy cluster analysis has been applied to the 20 amino acids by using 65 physicochemical properties as a basis for classification. The clustering products, the fuzzy sets (i.e., classical sets with associated membership functions), have provided a new measure of amino acid similarities for use in protein folding studies. This work demonstrates that fuzzy sets of simple molecular attributes, when assigned to amino acid residues in a protein's sequence, can predict the secondary structure of the sequence with reasonable accuracy. An approach is presented for discriminating standard folding states, using near-optimum information splitting in half-overlapping segments of the sequence of assigned membership functions. The method is applied to a nonredundant set of 252 proteins and yields approximately 73% matching for correctly predicted and correctly rejected residues with approximately 60% overall success rate for the correctly recognized ones in three folding states: alpha-helix, beta-strand, and coil. The most useful attributes for discriminating these states appear to be related to size, polarity, and thermodynamic factors. Van der Waals volume, apparent average thickness of surrounding molecular free volume, and a measure of dimensionless surface electron density can explain approximately 95% of prediction results. Hydrogen bonding and hydrophobicity indices do not yet enable clear clustering and prediction. PMID:7549882

  15. Mechanisms of haplotype divergence at the RGA08 nucleotide-binding leucine-rich repeat gene locus in wild banana (Musa balbisiana).

    PubMed

    Baurens, Franc-Christophe; Bocs, Stéphanie; Rouard, Mathieu; Matsumoto, Takashi; Miller, Robert N G; Rodier-Goud, Marguerite; MBéguié-A-MBéguié, Didier; Yahiaoui, Nabila

    2010-07-16

    Comparative sequence analysis of complex loci such as resistance gene analog clusters allows estimating the degree of sequence conservation and mechanisms of divergence at the intraspecies level. In banana (Musa sp.), two diploid wild species Musa acuminata (A genome) and Musa balbisiana (B genome) contribute to the polyploid genome of many cultivars. The M. balbisiana species is associated with vigour and tolerance to pests and disease and little is known on the genome structure and haplotype diversity within this species. Here, we compare two genomic sequences of 253 and 223 kb corresponding to two haplotypes of the RGA08 resistance gene analog locus in M. balbisiana "Pisang Klutuk Wulung" (PKW). Sequence comparison revealed two regions of contrasting features. The first is a highly colinear gene-rich region where the two haplotypes diverge only by single nucleotide polymorphisms and two repetitive element insertions. The second corresponds to a large cluster of RGA08 genes, with 13 and 18 predicted RGA genes and pseudogenes spread over 131 and 152 kb respectively on each haplotype. The RGA08 cluster is enriched in repetitive element insertions, in duplicated non-coding intergenic sequences including low complexity regions and shows structural variations between haplotypes. Although some allelic relationships are retained, a large diversity of RGA08 genes occurs in this single M. balbisiana genotype, with several RGA08 paralogs specific to each haplotype. The RGA08 gene family has evolved by mechanisms of unequal recombination, intragenic sequence exchange and diversifying selection. An unequal recombination event taking place between duplicated non-coding intergenic sequences resulted in a different RGA08 gene content between haplotypes pointing out the role of such duplicated regions in the evolution of RGA clusters. Based on the synonymous substitution rate in coding sequences, we estimated a 1 million year divergence time for these M. balbisiana haplotypes. A large RGA08 gene cluster identified in wild banana corresponds to a highly variable genomic region between haplotypes surrounded by conserved flanking regions. High level of sequence identity (70 to 99%) of the genic and intergenic regions suggests a recent and rapid evolution of this cluster in M. balbisiana.

  16. Phylogenetic analysis of Hungarian goose parvovirus isolates and vaccine strains.

    PubMed

    Tatár-Kis, Tímea; Mató, Tamás; Markos, Béla; Palya, Vilmos

    2004-08-01

    Polymerase chain reaction and sequencing were used to analyse goose parvovirus field isolates and vaccine strains. Two fragments of the genome were amplified. Fragment "A" represents a region of VP3 gene, while fragment "B" represents a region upstream of the VP3 gene, encompassing part of the VP1 gene. In the region of fragment "A" the deduced amino acid sequence of the strains was identical, therefore differentiation among strains could be done only at the nucleotide level, which resulted in the formation of three groups: Hungarian, West-European and Asian strains. In the region of fragment "B", separation of groups could be done by both nucleotide and deduced amino acid sequence level. The nucleotide sequences resulted in the same groups as for fragment "A" but with a different clustering pattern among the Hungarian strains. Within the "Hungarian" group most of the recent field isolates fell into one cluster, very closely related or identical to each other, indicating a very slow evolutionary change. The attenuated strains and field isolates from 1979/80 formed a separate cluster. When vaccine strains and field isolates were compared, two specific amino acid differences were found that can be considered as possible markers for vaccinal strains. Sequence analysis of fragment "B" seems to be a suitable method for differentiation of attenuated vaccine strains from virulent strains. Copyright 2004 Houghton Trust Ltd

  17. Valine/isoleucine variants drive selective pressure in the VP1 sequence of EV-A71 enteroviruses.

    PubMed

    Duy, Nghia Ngu; Huong, Le Thi Thanh; Ravel, Patrice; Huong, Le Thi Song; Dwivedi, Ankit; Sessions, October Michael; Hou, Yan'An; Chua, Robert; Kister, Guilhem; Afelt, Aneta; Moulia, Catherine; Gubler, Duane J; Thiem, Vu Dinh; Thanh, Nguyen Thi Hien; Devaux, Christian; Duong, Tran Nhu; Hien, Nguyen Tran; Cornillot, Emmanuel; Gavotte, Laurent; Frutos, Roger

    2017-05-08

    In 2011-2012, Northern Vietnam experienced its first large scale hand foot and mouth disease (HFMD) epidemic. In 2011, a major HFMD epidemic was also reported in South Vietnam with fatal cases. This 2011-2012 outbreak was the first one to occur in North Vietnam providing grounds to study the etiology, origin and dynamic of the disease. We report here the analysis of the VP1 gene of strains isolated throughout North Vietnam during the 2011-2012 outbreak and before. The VP1 gene of 106 EV-A71 isolates from North Vietnam and 2 from Central Vietnam were sequenced. Sequence alignments were analyzed at the nucleic acid and protein level. Gene polymorphism was also analyzed. A Factorial Correspondence Analysis was performed to correlate amino acid mutations with clinical parameters. The sequences were distributed into four phylogenetic clusters. Three clusters corresponded to the subgenogroup C4 and the last one corresponded to the subgenogroup C5. Each cluster displayed different polymorphism characteristics. Proteins were highly conserved but three sites bearing only Isoleucine (I) or Valine (V) were characterized. The isoleucine/valine variability matched the clusters. Spatiotemporal analysis of the I/V variants showed that all variants which emerged in 2011 and then in 2012 were not the same but were all present in the region prior to the 2011-2012 outbreak. Some correlation was found between certain I/V variants and ethnicity and severity. The 2011-2012 outbreak was not caused by an exogenous strain coming from South Vietnam or elsewhere but by strains already present and circulating at low level in North Vietnam. However, what triggered the outbreak remains unclear. A selective pressure is applied on I/V variants which matches the genetic clusters. I/V variants were shown on other viruses to correlate with pathogenicity. This should be investigated in EV-A71. I/V variants are an easy and efficient way to survey and identify circulating EV-A71 strains.

  18. Phylogenetic analysis of Newcastle disease viruses from Bangladesh suggests continuing evolution of genotype XIII.

    PubMed

    Barman, Lalita Rani; Nooruzzaman, Mohammed; Sarker, Rahul Deb; Rahman, Md Tazinur; Saife, Md Rajib Bin; Giasuddin, Mohammad; Das, Bidhan Chandra; Das, Priya Mohan; Chowdhury, Emdadul Haque; Islam, Mohammad Rafiqul

    2017-10-01

    A total of 23 Newcastle disease virus (NDV) isolates from Bangladesh taken between 2010 and 2012 were characterized on the basis of partial F gene sequences. All the isolates belonged to genotype XIII of class II NDV but segregated into three sub-clusters. One sub-cluster with 17 isolates aligned with sub-genotype XIIIc. The other two sub-clusters were phylogenetically distinct from the previously described sub-genotypes XIIIa, XIIIb and XIIIc and could be candidates of new sub-genotypes; however, that needs to be validated through full-length F gene sequence data. The results of the present study suggest that genotype XIII NDVs are under continuing evolution in Bangladesh.

  19. Pichia stipitis genomics, transcriptomics, and gene clusters

    Treesearch

    Thomas W. Jeffries; Jennifer R. Headman Van Vleet

    2009-01-01

    Genome sequencing and subsequent global gene expression studies have advanced our understanding of the lignocellulose-fermenting yeast Pichia stipitis. These studies have provided an insight into its central carbon metabolism, and analysis of its genome has revealed numerous functional gene clusters and tandem repeats. Specialized physiological traits are often the...

  20. Phylogenomic and MALDI-TOF MS Analysis of Streptococcus sinensis HKU4T Reveals a Distinct Phylogenetic Clade in the Genus Streptococcus

    PubMed Central

    Tse, Herman; Chen, Jonathan H.K.; Tang, Ying; Lau, Susanna K.P.; Woo, Patrick C.Y.

    2014-01-01

    Streptococcus sinensis is a recently discovered human pathogen isolated from blood cultures of patients with infective endocarditis. Its phylogenetic position, as well as those of its closely related species, remains inconclusive when single genes were used for phylogenetic analysis. For example, S. sinensis branched out from members of the anginosus, mitis, and sanguinis groups in the 16S ribosomal RNA gene phylogenetic tree, but it was clustered with members of the anginosus and sanguinis groups when groEL gene sequences were used for analysis. In this study, we sequenced the draft genome of S. sinensis and used a polyphasic approach, including concatenated genes, whole genomes, and matrix-assisted laser desorption ionization-time of flight mass spectrometry to analyze the phylogeny of S. sinensis. The size of the S. sinensis draft genome is 2.06 Mb, with GC content of 42.2%. Phylogenetic analysis using 50 concatenated genes or whole genomes revealed that S. sinensis formed a distinct cluster with Streptococcus oligofermentans and Streptococcus cristatus, and these three streptococci were clustered with the “sanguinis group.” As for phylogenetic analysis using hierarchical cluster analysis of the mass spectra of streptococci, S. sinensis also formed a distinct cluster with S. oligofermentans and S. cristatus, but these three streptococci were clustered with the “mitis group.” On the basis of the findings, we propose a novel group, named “sinensis group,” to include S. sinensis, S. oligofermentans, and S. cristatus, in the Streptococcus genus. Our study also illustrates the power of phylogenomic analyses for resolving ambiguities in bacterial taxonomy. PMID:25331233

  1. Phylogenomic and MALDI-TOF MS analysis of Streptococcus sinensis HKU4T reveals a distinct phylogenetic clade in the genus Streptococcus.

    PubMed

    Teng, Jade L L; Huang, Yi; Tse, Herman; Chen, Jonathan H K; Tang, Ying; Lau, Susanna K P; Woo, Patrick C Y

    2014-10-20

    Streptococcus sinensis is a recently discovered human pathogen isolated from blood cultures of patients with infective endocarditis. Its phylogenetic position, as well as those of its closely related species, remains inconclusive when single genes were used for phylogenetic analysis. For example, S. sinensis branched out from members of the anginosus, mitis, and sanguinis groups in the 16S ribosomal RNA gene phylogenetic tree, but it was clustered with members of the anginosus and sanguinis groups when groEL gene sequences were used for analysis. In this study, we sequenced the draft genome of S. sinensis and used a polyphasic approach, including concatenated genes, whole genomes, and matrix-assisted laser desorption ionization-time of flight mass spectrometry to analyze the phylogeny of S. sinensis. The size of the S. sinensis draft genome is 2.06 Mb, with GC content of 42.2%. Phylogenetic analysis using 50 concatenated genes or whole genomes revealed that S. sinensis formed a distinct cluster with Streptococcus oligofermentans and Streptococcus cristatus, and these three streptococci were clustered with the "sanguinis group." As for phylogenetic analysis using hierarchical cluster analysis of the mass spectra of streptococci, S. sinensis also formed a distinct cluster with S. oligofermentans and S. cristatus, but these three streptococci were clustered with the "mitis group." On the basis of the findings, we propose a novel group, named "sinensis group," to include S. sinensis, S. oligofermentans, and S. cristatus, in the Streptococcus genus. Our study also illustrates the power of phylogenomic analyses for resolving ambiguities in bacterial taxonomy. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  2. Characterization of a microcystin and detection of microcystin synthetase genes from a Brazilian isolate of Nostoc.

    PubMed

    Genuário, Diego Bonaldo; Silva-Stenico, Maria Estela; Welker, Martin; Beraldo Moraes, Luiz Alberto; Fiore, Marli Fátima

    2010-04-01

    A nostocalean nitrogen-fixing cyanobacterium isolated from an eutrophic freshwater reservoir located in Piracicaba, São Paulo, Brazil, was evaluated for the production of hepatotoxic cyclic heptapeptides, microcystins. Morphologically this new cyanobacterium strain appears closest to Nostoc, however, in the phylogenetic analysis of 16S rRNA gene it falls into a highly stable cluster only distantly related to the typical Nostoc cluster. Extracts of Nostoc sp. CENA88 cultured cells, investigated using ELISA assay, gave positive results and the microcystin profile revealed by ESI-Q-TOF/MS/MS analysis confirmed the production of [Dha(7)]MCYST-YR. Further, Nostoc sp. CENA88 genomic DNA was analyzed by PCR for sequences of mcyD, mcyE and mcyG genes of microcystin synthetase (mcy) cluster. The result revealed the presence of mcyD, mcyE and mcyG genes with similarities to those from mcy of Nostoc sp. strains 152 and IO-102-I and other cyanobacterial genera. The phylogenetic tree based on concatenated McyG, McyD and McyE amino acids clustered the sequences according to cyanobacterial genera, with the exception of the Nostoc sp. CENA88 sequence, which was placed in a clade distantly related from other Nostoc strains, as previously observed also in the 16S rRNA phylogenetic analysis. The present study describes for the first time a Brazilian Nostoc microcystin producer and also the occurrence of demethyl MCYST-YR variant in this genus. The sequenced Nostoc genes involved in the microcystin synthesis can contribute to a better understanding of the toxigenicity and evolution of this cyanotoxin. Copyright 2009 Elsevier Ltd. All rights reserved.

  3. Molecular evidence of Burkholderia pseudomallei genotypes based on geographical distribution.

    PubMed

    Zulkefli, Noorfatin Jihan; Mariappan, Vanitha; Vellasamy, Kumutha Malar; Chong, Chun Wie; Thong, Kwai Lin; Ponnampalavanar, Sasheela; Vadivelu, Jamuna; Teh, Cindy Shuan Ju

    2016-01-01

    Background. Central intermediary metabolism (CIM) in bacteria is defined as a set of metabolic biochemical reactions within a cell, which is essential for the cell to survive in response to environmental perturbations. The genes associated with CIM are commonly found in both pathogenic and non-pathogenic strains. As these genes are involved in vital metabolic processes of bacteria, we explored the efficiency of the genes in genotypic characterization of Burkholderia pseudomallei isolates, compared with the established pulsed-field gel electrophoresis (PFGE) and multilocus sequence typing (MLST) schemes. Methods. Nine previously sequenced B. pseudomallei isolates from Malaysia were characterized by PFGE, MLST and CIM genes. The isolates were later compared to the other 39 B. pseudomallei strains, retrieved from GenBank using both MLST and sequence analysis of CIM genes. UniFrac and hierarchical clustering analyses were performed using the results generated by both MLST and sequence analysis of CIM genes. Results. Genetic relatedness of nine Malaysian B. pseudomallei isolates and the other 39 strains was investigated. The nine Malaysian isolates were subtyped into six PFGE profiles, four MLST profiles and five sequence types based on CIM genes alignment. All methods demonstrated the clonality of OB and CB as well as CMS and THE. However, PFGE showed less than 70% similarity between a pair of morphology variants, OS and OB. In contrast, OS was identical to the soil isolate, MARAN. To have a better understanding of the genetic diversity of B. pseudomallei worldwide, we further aligned the sequences of genes used in MLST and genes associated with CIM for the nine Malaysian isolates and 39 B. pseudomallei strains from NCBI database. Overall, based on the CIM genes, the strains were subtyped into 33 profiles where the majority of the strains from Asian countries were clustered together. 
On the other hand, MLST resolved the isolates into 31 profiles which formed three clusters. Hierarchical clustering using UniFrac distance suggested that the isolates from Australia were genetically distinct from the Asian isolates. Nevertheless, statistically significant differences were detected between isolates from Malaysia, Thailand and Australia. Discussion. Overall, PFGE showed higher discriminative power in clustering the nine Malaysian B. pseudomallei isolates and indicated its suitability for localized epidemiological study. Compared to MLST, CIM genes showed higher resolution in distinguishing those non-related strains and better clustering of strains from different geographical regions. A closer genetic relatedness of Malaysian isolates with all Asian strains in comparison to Australian strains was observed. This finding was supported by UniFrac analysis which resulted in geographical segregation between Australia and the Asian countries.
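
    The abstract above counts how many distinct profiles a typing scheme resolves. As a rough illustration of that idea only, the sketch below groups isolates that share an identical allele profile across the typed loci. The isolate names, the allele numbers and the function name `group_by_profile` are hypothetical and are not taken from the study.

```python
from collections import defaultdict

def group_by_profile(profiles):
    """Group isolates whose allele numbers match at every locus.

    profiles: dict mapping isolate name -> tuple of allele numbers,
    one entry per locus (hypothetical input format).
    Returns a list of isolate groups, one group per distinct profile.
    """
    groups = defaultdict(list)
    for isolate, alleles in profiles.items():
        groups[tuple(alleles)].append(isolate)
    return list(groups.values())

# Made-up allele profiles for seven loci; not data from the abstract.
toy_profiles = {
    "iso1": (1, 3, 2, 4, 1, 2, 5),
    "iso2": (1, 3, 2, 4, 1, 2, 5),  # identical to iso1 -> same profile
    "iso3": (1, 3, 2, 4, 1, 7, 5),  # differs at one locus
    "iso4": (2, 1, 2, 3, 1, 2, 6),
}
groups = group_by_profile(toy_profiles)
print(len(groups), "distinct profiles:", groups)
```

    A scheme that types more variable loci will tend to split such groups further, which is one way to read the comparison of discriminatory power in the abstract.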

  4. Genetic diversity studies in pea (Pisum sativum L.) using simple sequence repeat markers.

    PubMed

    Kumari, P; Basal, N; Singh, A K; Rai, V P; Srivastava, C P; Singh, P K

    2013-03-13

    The genetic diversity among 28 pea (Pisum sativum L.) genotypes was analyzed using 32 simple sequence repeat markers. A total of 44 polymorphic bands, with an average of 2.1 bands per primer, were obtained. The polymorphism information content ranged from 0.309 to 0.657 with an average of 0.493. The variation in genetic diversity among these cultivars ranged from 0.11 to 0.73. Cluster analysis based on Jaccard's similarity coefficient using the unweighted pair-group method with arithmetic mean (UPGMA) revealed 2 distinct clusters, I and II, comprising 6 and 22 genotypes, respectively. Cluster II was further differentiated into 2 subclusters, IIA and IIB, with 12 and 10 genotypes, respectively. Principal component (PC) analysis revealed results similar to those of UPGMA. The first, second, and third PCs contributed 21.6, 16.1, and 14.0% of the variation, respectively; cumulative variation of the first 3 PCs was 51.7%.
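
    The cluster analysis named above (Jaccard's similarity coefficient followed by UPGMA) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' software: markers are assumed to be scored as 0/1 band presence, distance is taken as 1 minus the Jaccard similarity, and the genotype names and scores are invented.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two 0/1 band-presence vectors."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    present = sum(1 for x, y in zip(a, b) if x or y)
    return shared / present if present else 1.0

def upgma(profiles):
    """Agglomerate genotypes by average linkage on 1 - Jaccard.

    profiles: dict mapping genotype name -> list of 0/1 band scores.
    Returns the merges in order as (members_a, members_b, distance).
    """
    def distance(group_a, group_b):
        # UPGMA distance: mean of all between-group pairwise distances.
        pairs = [(i, j) for i in group_a for j in group_b]
        return sum(1 - jaccard(profiles[i], profiles[j]) for i, j in pairs) / len(pairs)

    active = [(name,) for name in profiles]
    merges = []
    while len(active) > 1:
        # Merge the two closest groups, then repeat on the reduced set.
        a, b = min(combinations(active, 2), key=lambda pair: distance(*pair))
        merges.append((a, b, distance(a, b)))
        active = [g for g in active if g not in (a, b)] + [a + b]
    return merges

# Hypothetical band scores (1 = band present), not data from the study.
toy = {
    "G1": [1, 1, 0, 1, 0],
    "G2": [1, 1, 0, 0, 0],
    "G3": [0, 0, 1, 1, 1],
    "G4": [0, 0, 1, 0, 1],
}
merges = upgma(toy)
for a, b, d in merges:
    print(a, b, round(d, 3))
```

    Cutting the merge sequence before the last step leaves two groups, which is the kind of two-cluster split the abstract reports. For real marker sets a library implementation of average linkage would normally be preferable to this quadratic toy loop.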

  5. First CCD UBVI photometric analysis of six open cluster candidates

    NASA Astrophysics Data System (ADS)

    Piatti, A. E.; Clariá, J. J.; Ahumada, A. V.

    2011-04-01

    We have obtained CCD UBVI_KC photometry down to V ∼ 22 for the open cluster candidates Haffner 3, Haffner 5, NGC 2368, Haffner 25, Hogg 3 and Hogg 4 and their surrounding fields. None of these objects have been photometrically studied so far. Our analysis shows that these stellar groups are not genuine open clusters since no clear main sequences or other meaningful features can be seen in their colour-magnitude and colour-colour diagrams. We checked for possible differential reddening across the studied fields that could be hiding the characteristics of real open clusters. However, the dust in the directions to these objects appears to be uniformly distributed. Moreover, star counts carried out within and outside the open cluster candidate fields do not support the hypothesis that these objects are real open clusters or even open cluster remnants.

  6. Dispersion of the HIV-1 Epidemic in Men Who Have Sex with Men in the Netherlands: A Combined Mathematical Model and Phylogenetic Analysis.

    PubMed

    Bezemer, Daniela; Cori, Anne; Ratmann, Oliver; van Sighem, Ard; Hermanides, Hillegonda S; Dutilh, Bas E; Gras, Luuk; Rodrigues Faria, Nuno; van den Hengel, Rob; Duits, Ashley J; Reiss, Peter; de Wolf, Frank; Fraser, Christophe

    2015-11-01

    The HIV-1 subtype B epidemic amongst men who have sex with men (MSM) is resurgent in many countries despite the widespread use of effective combination antiretroviral therapy (cART). In this combined mathematical and phylogenetic study of observational data, we aimed to find out the extent to which the resurgent epidemic is the result of newly introduced strains or of growth of already circulating strains. As of November 2011, the ATHENA observational HIV cohort of all patients in care in the Netherlands since 1996 included HIV-1 subtype B polymerase sequences from 5,852 patients. Patients who were diagnosed between 1981 and 1995 were included in the cohort if they were still alive in 1996. The ten most similar sequences to each ATHENA sequence were selected from the Los Alamos HIV Sequence Database, and a phylogenetic tree was created of a total of 8,320 sequences. Large transmission clusters that included ≥10 ATHENA sequences were selected, with a local support value ≥ 0.9 and median pairwise patristic distance below the fifth percentile of distances in the whole tree. Time-varying reproduction numbers of the large MSM-majority clusters were estimated through mathematical modeling. We identified 106 large transmission clusters, including 3,061 (52%) ATHENA and 652 Los Alamos sequences. Half of the HIV sequences from MSM registered in the cohort in the Netherlands (2,128 of 4,288) were included in 91 large MSM-majority clusters. Strikingly, at least 54 (59%) of these 91 MSM-majority clusters were already circulating before 1996, when cART was introduced, and have persisted to the present. Overall, 1,226 (35%) of the 3,460 diagnoses among MSM since 1996 were found in these 54 long-standing clusters. The reproduction numbers of all large MSM-majority clusters were around the epidemic threshold value of one over the whole study period. A tendency towards higher numbers was visible in recent years, especially in the more recently introduced clusters. 
The mean age of MSM at diagnosis increased by 0.45 years/year within clusters, but new clusters appeared with lower mean age. Major strengths of this study are the high proportion of HIV-positive MSM with a sequence in this study and the combined application of phylogenetic and modeling approaches. Main limitations are the assumption that the sampled population is representative of the overall HIV-positive population and the assumption that the diagnosis interval distribution is similar between clusters. The resurgent HIV epidemic amongst MSM in the Netherlands is driven by several large, persistent, self-sustaining, and, in many cases, growing sub-epidemics shifting towards new generations of MSM. Many of the sub-epidemics have been present since the early epidemic, to which new sub-epidemics are being added.
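
    The abstract gives three selection criteria for a large transmission cluster: at least 10 cohort sequences, local support of at least 0.9, and a median pairwise patristic distance below the fifth percentile of distances in the whole tree. The sketch below encodes just that filter. The `Cluster` record, its field names and the example numbers are assumptions made for illustration; extracting clades and distances from a real tree is not shown.

```python
from dataclasses import dataclass
from statistics import median, quantiles

@dataclass
class Cluster:
    n_cohort: int     # number of cohort sequences in the clade
    support: float    # local support value of the clade
    patristic: list   # pairwise patristic distances within the clade

def fifth_percentile(values):
    # quantiles(..., n=20) returns 19 cut points; the first is the 5th percentile.
    return quantiles(values, n=20)[0]

def is_large_cluster(cluster, tree_p5, min_size=10, min_support=0.9):
    """Apply the three criteria quoted in the abstract to one candidate clade."""
    return (
        cluster.n_cohort >= min_size
        and cluster.support >= min_support
        and median(cluster.patristic) < tree_p5
    )

# Hypothetical whole-tree distances and candidate clades.
tree_distances = [i / 100 for i in range(1, 101)]
p5 = fifth_percentile(tree_distances)
tight = Cluster(n_cohort=12, support=0.95, patristic=[0.01, 0.02, 0.03])
weak = Cluster(n_cohort=12, support=0.85, patristic=[0.01, 0.02, 0.03])
print(is_large_cluster(tight, p5), is_large_cluster(weak, p5))
```

    Computing the percentile once for the whole tree and passing it in keeps the per-clade check cheap when many candidate clades are screened.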

  7. Analysis of the beak and feather disease viral genome indicates the existence of several genotypes which have a complex psittacine host specificity.

    PubMed

    de Kloet, E; de Kloet, S R

    2004-12-01

    A study was made of the phylogenetic relationships between fifteen complete nucleotide sequences as well as 43 nucleotide sequences of the putative coat protein gene of different strains belonging to the virus species Beak and feather disease virus obtained from 39 individuals of 16 psittacine species. The species included among others, cockatoos (Cacatuini), African grey parrots (Psittacus erithacus) and peach-faced lovebirds (Agapornis roseicollis), which were infected at different geographical locations, within and outside Australia, the native origin of the virus. The derived amino acid sequences of the putative coat protein were highly diverse, with differences between some strains amounting to 50 of the 250 amino acids. Phylogenetic analysis demonstrated that the putative coat gene sequences form six clusters which show a varying degree of psittacine species specificity. Most, but not all strains infecting African grey parrots formed a single cluster as did the strains infecting the cockatoos. Strains infecting the lovebirds clustered with those infecting such Australasian species as Eclectus roratus, Psittacula kramerii and Psephotus haematogaster. Although individual birds included in this study were, where studied, often infected by closely related strains, infection by highly diverged strains was also detected. The possible relationship between BFD viral strains and clinical disease signs is discussed.

  8. Multilocus sequence typing and pulsed-field gel electrophoresis analysis of Oenococcus oeni from different wine-producing regions of China.

    PubMed

    Wang, Tao; Li, Hua; Wang, Hua; Su, Jing

    2015-04-16

    The present study established a typing method with NotI-based pulsed-field gel electrophoresis (PFGE) and stress response gene schemed multilocus sequence typing (MLST) for 55 Oenococcus oeni strains isolated from six individual regions in China and two model strains PSU-1 (CP000411) and ATCC BAA-1163 (AAUV00000000). Seven stress response genes, cfa, clpL, clpP, ctsR, mleA, mleP and omrA, were selected for MLST testing, and positive selective pressure was detected for these genes. Furthermore, both methods separated the strains into two clusters. The PFGE clusters are correlated with the region, whereas the sequence types (STs) formed by the MLST confirm the two clusters identified by PFGE. In addition, the population structure was a mixture of evolutionary pathways, and the strains exhibited both clonal and panmictic characteristics.

  9. SAMSA2: a standalone metatranscriptome analysis pipeline.

    PubMed

    Westreich, Samuel T; Treiber, Michelle L; Mills, David A; Korf, Ian; Lemay, Danielle G

    2018-05-21

    Complex microbial communities are an area of growing interest in biology. Metatranscriptomics allows researchers to quantify microbial gene expression in an environmental sample via high-throughput sequencing. Metatranscriptomic experiments are computationally intensive because the experiments generate a large volume of sequence data and each sequence must be compared with reference sequences from thousands of organisms. SAMSA2 is an upgrade to the original Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) pipeline that has been redesigned for standalone use on a supercomputing cluster. SAMSA2 is faster due to the use of the DIAMOND aligner, and more flexible and reproducible because it uses local databases. SAMSA2 is available with detailed documentation, and example input and output files along with examples of master scripts for full pipeline execution. SAMSA2 is a rapid and efficient metatranscriptome pipeline for analyzing large RNA-seq datasets in a supercomputing cluster environment. SAMSA2 provides simplified output that can be examined directly or used for further analyses, and its reference databases may be upgraded, altered or customized to fit the needs of any experiment.

  10. Development of self-compressing BLSOM for comprehensive analysis of big sequence data.

    PubMed

    Kikuchi, Akihito; Ikemura, Toshimichi; Abe, Takashi

    2015-01-01

    With the remarkable increase in genomic sequence data from various organisms, novel tools are needed for comprehensive analyses of available big sequence data. We previously developed a Batch-Learning Self-Organizing Map (BLSOM), which can cluster genomic fragment sequences according to phylotype solely dependent on oligonucleotide composition, and applied it to genome and metagenomic studies. BLSOM is suitable for high-performance parallel-computing and can analyze big data simultaneously, but a large-scale BLSOM needs a large computational resource. We have developed Self-Compressing BLSOM (SC-BLSOM) for reduction of computation time, which allows us to carry out comprehensive analysis of big sequence data without the use of high-performance supercomputers. The strategy of SC-BLSOM is to hierarchically construct BLSOMs according to data class, such as phylotype. The first-layer BLSOM was constructed with each of the divided input data pieces that represents the data subclass, such as phylotype division, resulting in compression of the number of data pieces. The second BLSOM was constructed with a total of weight vectors obtained in the first-layer BLSOMs. We compared SC-BLSOM with the conventional BLSOM by analyzing bacterial genome sequences. SC-BLSOM could be constructed faster than BLSOM and cluster the sequences according to phylotype with high accuracy, showing the method's suitability for efficient knowledge discovery from big sequence data.
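
    The input to a composition-based map such as the one described above is an oligonucleotide frequency vector per sequence fragment. The sketch below shows one way such a vector might be computed; the oligonucleotide length `k=4`, the function name and the handling of ambiguous bases are assumptions, and the self-organizing map itself is not implemented here.

```python
from collections import Counter
from itertools import product

def kmer_frequencies(seq, k=4):
    """Return the relative frequency of every A/C/G/T k-mer in seq.

    The vector has 4**k entries in lexicographic k-mer order. Windows that
    contain other characters (for example N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]

# Toy fragment; dinucleotides are used only to keep the printed vector short.
vector = kmer_frequencies("ACGTACGT", k=2)
print(len(vector), round(sum(vector), 6))
```

    Vectors like this, one per genomic fragment, are what a clustering method based solely on oligonucleotide composition would take as input.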

  11. On the determination of age and mass functions of stars in young open star clusters from the analysis of their luminosity functions

    NASA Astrophysics Data System (ADS)

    Piskunov, A. E.; Belikov, A. N.; Kharchenko, N. V.; Sagar, R.; Subramaniam, A.

    2004-04-01

    We construct the observed luminosity functions of the remote young open clusters NGC 2383, 2384, 4103, 4755, 7510 and Hogg 15 from CCD observations of them. The observed LFs are corrected for field star contamination determined with the help of a Galactic star count model. In the case of Hogg 15 and NGC 2383 we also consider the additional contamination from neighbouring clusters NGC 4609 and 2384, respectively. These corrections provide a realistic pattern of cluster LF in the vicinity of the main-sequence (MS) turn-on point and at fainter magnitudes reveal the so-called H-feature arising as a result of the transition of the pre-MS phase to the MS, which is dependent on the cluster age. The theoretical LFs are constructed representing a cluster population model with continuous star formation for a short time-scale and a power-law initial mass function (IMF), and these are fitted to the observed LF. As a result, we are able to determine for each cluster a set of parameters describing the cluster population (the age, duration of star formation, IMF slope and percentage of field star contamination). It is found that in spite of the non-monotonic behaviour of observed LFs, cluster IMFs can be described as power-law functions with slopes similar to Salpeter's value. The present main-sequence turn-on cluster ages are several times lower than those derived from the fitting of theoretical isochrones to the turn-off region of the upper main sequences.

  12. Inferring HIV-1 Transmission Dynamics in Germany From Recently Transmitted Viruses.

    PubMed

    Pouran Yousef, Kaveh; Meixenberger, Karolin; Smith, Maureen R; Somogyi, Sybille; Gromöller, Silvana; Schmidt, Daniel; Gunsenheimer-Bartmeyer, Barbara; Hamouda, Osamah; Kücherer, Claudia; von Kleist, Max

    2016-11-01

    Although HIV continues to spread globally, novel intervention strategies such as treatment as prevention (TasP) may bring the epidemic to a halt. However, their effective implementation requires a profound understanding of the underlying transmission dynamics. We analyzed parameters of the German HIV epidemic based on phylogenetic clustering of viral sequences from recently infected seroconverters with known infection dates. Viral baseline and follow-up pol sequences (n = 1943) from 1159 drug-naïve individuals were selected from a nationwide long-term observational study initiated in 1997. Putative transmission clusters were computed based on a maximum likelihood phylogeny. Using individual follow-up sequences, we optimized our clustering threshold to maximize the likelihood of co-clustering individuals connected by direct transmission. The sizes of putative transmission clusters scaled inversely with their abundance and their distribution exhibited a heavy tail. Clusters based on the optimal clustering threshold were significantly more likely to contain members of the same or bordering German federal states. Interinfection times between co-clustered individuals were significantly shorter (26 weeks; interquartile range: 13-83) than in a null model. Viral intraindividual evolution may be used to select criteria that maximize co-clustering of transmission pairs in the absence of strong adaptive selection pressure. Interinfection times of co-clustered individuals may then be an indicator of the typical time to onward transmission. Our analysis suggests that onward transmission may have occurred early after infection, when individuals are typically unaware of their serological status. The latter argues that TasP should be combined with HIV testing campaigns to reduce the possibility of transmission before TasP initiation.

  13. PAQ: Partition Analysis of Quasispecies.

    PubMed

    Baccam, P; Thompson, R J; Fedrigo, O; Carpenter, S; Cornette, J L

    2001-01-01

    The complexities of genetic data may not be accurately described by any single analytical tool. Phylogenetic analysis is often used to study the genetic relationship among different sequences. Evolutionary models and assumptions are invoked to reconstruct trees that describe the phylogenetic relationship among sequences. Genetic databases are rapidly accumulating large amounts of sequences. Newly acquired sequences, which have not yet been characterized, may require preliminary genetic exploration in order to build models describing the evolutionary relationship among sequences. There are clustering techniques that rely less on models of evolution, and thus may provide nice exploratory tools for identifying genetic similarities. Some of the more commonly used clustering methods perform better when data can be grouped into mutually exclusive groups. Genetic data from viral quasispecies, which consist of closely related variants that differ by small changes, however, may best be partitioned by overlapping groups. We have developed an intuitive exploratory program, Partition Analysis of Quasispecies (PAQ), which utilizes a non-hierarchical technique to partition sequences that are genetically similar. PAQ was used to analyze a data set of human immunodeficiency virus type 1 (HIV-1) envelope sequences isolated from different regions of the brain and another data set consisting of the equine infectious anemia virus (EIAV) regulatory gene rev. Analysis of the HIV-1 data set by PAQ was consistent with phylogenetic analysis of the same data, and the EIAV rev variants were partitioned into two overlapping groups. PAQ provides an additional tool which can be used to glean information from genetic data and can be used in conjunction with other tools to study genetic similarities and genetic evolution of viral quasispecies.

  14. The optimal design of stepped wedge trials with equal allocation to sequences and a comparison to other trial designs.

    PubMed

    Thompson, Jennifer A; Fielding, Katherine; Hargreaves, James; Copas, Andrew

    2017-12-01

    Background/Aims We sought to optimise the design of stepped wedge trials with an equal allocation of clusters to sequences and explored sample size comparisons with alternative trial designs. Methods We developed a new expression for the design effect for a stepped wedge trial, assuming that observations are equally correlated within clusters and an equal number of observations in each period between sequences switching to the intervention. We minimised the design effect with respect to (1) the fraction of observations before the first and after the final sequence switches (the periods with all clusters in the control or intervention condition, respectively) and (2) the number of sequences. We compared the design effect of this optimised stepped wedge trial to the design effects of a parallel cluster-randomised trial, a cluster-randomised trial with baseline observations, and a hybrid trial design (a mixture of cluster-randomised trial and stepped wedge trial) with the same total cluster size for all designs. Results We found that a stepped wedge trial with an equal allocation to sequences is optimised by obtaining all observations after the first sequence switches and before the final sequence switches to the intervention; this means that the first sequence remains in the control condition and the last sequence remains in the intervention condition for the duration of the trial. With this design, the optimal number of sequences is [Formula: see text], where [Formula: see text] is the cluster-mean correlation, [Formula: see text] is the intracluster correlation coefficient, and m is the total cluster size. The optimal number of sequences is small when the intracluster correlation coefficient and cluster size are small and large when the intracluster correlation coefficient or cluster size is large. A cluster-randomised trial remains more efficient than the optimised stepped wedge trial when the intracluster correlation coefficient or cluster size is small. 
A cluster-randomised trial with baseline observations always requires a larger sample size than the optimised stepped wedge trial. The hybrid design can always give an equally or more efficient design, but will be at most 5% more efficient. We provide a strategy for selecting a design if the optimal number of sequences is unfeasible. For a non-optimal number of sequences, the sample size may be reduced by allowing a proportion of observations before the first or after the final sequence has switched. Conclusion The standard stepped wedge trial is inefficient. To reduce sample sizes when a hybrid design is unfeasible, stepped wedge trial designs should have no observations before the first sequence switches or after the final sequence switches.
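
    The abstract's own expression for the optimal number of sequences is elided in this record and is not reconstructed here. For orientation only, the sketch below computes two standard textbook quantities that the abstract refers to: the design effect of a parallel cluster-randomised trial, 1 + (m - 1)ρ, and the cluster-mean correlation, mρ / (1 + (m - 1)ρ), where m is the total cluster size and ρ the intracluster correlation coefficient. The function names and example values are illustrative assumptions.

```python
def parallel_crt_design_effect(m, icc):
    """Design effect of a parallel cluster-randomised trial: 1 + (m - 1) * icc."""
    return 1 + (m - 1) * icc

def cluster_mean_correlation(m, icc):
    """Correlation between two means of m observations from the same cluster."""
    return m * icc / (1 + (m - 1) * icc)

# Illustrative values only: 50 observations per cluster, ICC of 0.05.
m, icc = 50, 0.05
print(round(parallel_crt_design_effect(m, icc), 3))
print(round(cluster_mean_correlation(m, icc), 3))
```

    Both quantities grow with the cluster size and with the intracluster correlation coefficient, which matches the abstract's statement that the optimal number of sequences is large when either of them is large.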

  15. New Insights on Taxonomy, Phylogeny and Population Genetics of Leishmania (Viannia) Parasites Based on Multilocus Sequence Analysis

    PubMed Central

    Boité, Mariana C.; Mauricio, Isabel L.; Miles, Michael A.; Cupolillo, Elisa

    2012-01-01

    The Leishmania genus comprises up to 35 species, some with status still under discussion. The multilocus sequence typing (MLST)—extensively used for bacteria—has been proposed for pathogenic trypanosomatids. For Leishmania, however, a detailed analysis and revision on the taxonomy is still required. We have partially sequenced four housekeeping genes—glucose-6-phosphate dehydrogenase (G6PD), 6-phosphogluconate dehydrogenase (6PGD), mannose phosphate isomerase (MPI) and isocitrate dehydrogenase (ICD)—from 96 Leishmania (Viannia) strains and assessed their discriminatory typing capacity. The fragments had different degrees of diversity, and are thus suitable to be used in combination for intra- and inter-specific inferences. Species-specific single nucleotide polymorphisms were detected, but not for all species; ambiguous sites indicating heterozygosis were observed, as well as the putative homozygous donor. A large number of haplotypes were detected for each marker; for 6PGD a possible ancestral allele for L. (Viannia) was found. Maximum parsimony-based haplotype networks were built. Strains of different species, as identified by multilocus enzyme electrophoresis (MLEE), formed separated clusters in each network, with exceptions. NeighborNet of concatenated sequences confirmed species-specific clusters, suggesting recombination occurring in L. braziliensis and L. guyanensis. Phylogenetic analysis indicates L. lainsoni and L. naiffi as the most divergent species and does not support L. shawi as a distinct species, placing it in the L. guyanensis cluster. BURST analysis resulted in six clonal complexes (CC), corresponding to distinct species. The L. braziliensis strains evaluated correspond to one widely geographically distributed CC and another restricted to one endemic area. 
This study demonstrates the value of systematic multilocus sequence analysis (MLSA) for determining intra- and inter-species relationships and presents an approach to validate the species status of some entities. Furthermore, it contributes to the phylogeny of L. (Viannia) and might be helpful for epidemiological and population genetics analysis based on haplotype/diplotype determinations and inferences. PMID:23133690

  16. Phylogenetic Network Analysis Revealed the Occurrence of Horizontal Gene Transfer of 16S rRNA in the Genus Enterobacter

    PubMed Central

    Sato, Mitsuharu; Miyazaki, Kentaro

    2017-01-01

    Horizontal gene transfer (HGT) is a ubiquitous genetic event in bacterial evolution, but it seldom occurs for genes involved in highly complex supramolecules (or biosystems), which consist of many gene products. The ribosome is one such supramolecule, but several bacteria harbor dissimilar and/or chimeric 16S rRNAs in their genomes, suggesting the occurrence of HGT of this gene. However, we know little about whether the genes actually experience HGT and, if so, the frequency of such a transfer. This is primarily because the methods currently employed for phylogenetic analysis (e.g., neighbor-joining, maximum likelihood, and maximum parsimony) of 16S rRNA genes assume point mutation-driven tree-shape evolution as an evolutionary model, which is intrinsically inappropriate to decipher the evolutionary history for genes driven by recombination. To address this issue, we applied a phylogenetic network analysis, which has been used previously for detection of genetic recombination in homologous alleles, to the 16S rRNA gene. We focused on the genus Enterobacter, whose phylogenetic relationships inferred by multi-locus sequence alignment analysis and 16S rRNA sequences are incompatible. All 10 complete genomic sequences were retrieved from the NCBI database, in which 71 16S rRNA genes were included. Neighbor-joining analysis demonstrated that the genes residing in the same genomes clustered, indicating the occurrence of intragenomic recombination. However, as suggested by the low bootstrap values, evolutionary relationships between the clusters were uncertain. We then applied phylogenetic network analysis to representative sequences from each cluster. We found three ancestral 16S rRNA groups; the others were likely created through recursive recombination between the ancestors and chimeric descendants. Despite the large sequence changes caused by the recombination events, the RNA secondary structures were conserved. 
Successive intergenomic and intragenomic recombination thus shaped the evolution of 16S rRNA genes in the genus Enterobacter. PMID:29180992

  17. Exploring root symbiotic programs in the model legume Medicago truncatula using EST analysis.

    PubMed

    Journet, Etienne-Pascal; van Tuinen, Diederik; Gouzy, Jérome; Crespeau, Hervé; Carreau, Véronique; Farmer, Mary-Jo; Niebel, Andreas; Schiex, Thomas; Jaillon, Olivier; Chatagnier, Odile; Godiard, Laurence; Micheli, Fabienne; Kahn, Daniel; Gianinazzi-Pearson, Vivienne; Gamas, Pascal

    2002-12-15

    We report on a large-scale expressed sequence tag (EST) sequencing and analysis program aimed at characterizing the sets of genes expressed in roots of the model legume Medicago truncatula during interactions with either of two microsymbionts, the nitrogen-fixing bacterium Sinorhizobium meliloti or the arbuscular mycorrhizal fungus Glomus intraradices. We have designed specific tools for in silico analysis of EST data, in relation to chimeric cDNA detection, EST clustering, encoded protein prediction, and detection of differential expression. Our 21 473 5'- and 3'-ESTs could be grouped into 6359 EST clusters, corresponding to distinct virtual genes, along with 52 498 other M. truncatula ESTs available in the dbEST (NCBI) database that were recruited in the process. These clusters were manually annotated, using a specifically developed annotation interface. Analysis of EST cluster distribution in various M. truncatula cDNA libraries, supported by a refined R test to evaluate statistical significance and by 'electronic northern' representation, enabled us to identify a large number of novel genes predicted to be up- or down-regulated during either symbiotic root interaction. These in silico analyses provide a first global view of the genetic programs for root symbioses in M. truncatula. A searchable database has been built and can be accessed through a public interface.

  18. Whole-genome sequencing of Aspergillus tubingensis G131 and overview of its secondary metabolism potential.

    PubMed

    Choque, Elodie; Klopp, Christophe; Valiere, Sophie; Raynal, José; Mathieu, Florence

    2018-03-15

    Black Aspergilli represent one of the most important fungal resources of primary and secondary metabolites for biotechnological industry. Having several black Aspergilli sequenced genomes should allow targeting the production of certain metabolites with bioactive properties. In this study, we report the draft genome of a black Aspergillus, A. tubingensis G131, isolated from a French Mediterranean vineyard. This 35 Mb genome includes 10,994 predicted genes. A genomic-based discovery identifies 80 secondary metabolite biosynthetic gene clusters. Genomic sequences of these clusters were blasted on 3 chosen black Aspergilli genomes: A. tubingensis CBS 134.48, A. niger CBS 513.88 and A. kawachii IFO 4308. This comparison highlights different levels of cluster conservation between the four strains. It also allows identifying seven unique clusters in A. tubingensis G131. Moreover, the putative secondary metabolite clusters for asperazine and naphtho-gamma-pyrones production were proposed based on this genomic analysis. Key biosynthetic genes required for the production of 2 mycotoxins, ochratoxin A and fumonisin, are absent from this draft genome. Even if intergenic sequences of these mycotoxin biosynthetic pathways are present, this could not lead to the production of those mycotoxins by A. tubingensis G131. Functional and bioinformatics analyses of the A. tubingensis G131 genome highlight its potential for metabolite production, in particular for TAN-1612, asperazine and naphtho-gamma-pyrones, which present antioxidant, anticancer or antibiotic properties.

  19. Strategies for Achieving High Sequencing Accuracy for Low Diversity Samples and Avoiding Sample Bleeding Using Illumina Platform

    PubMed Central

    Mitra, Abhishek; Skrzypczak, Magdalena; Ginalski, Krzysztof; Rowicka, Maga

    2015-01-01

    Sequencing microRNA, reduced representation sequencing, Hi-C technology and any method requiring the use of in-house barcodes result in sequencing libraries with low initial sequence diversity. Sequencing such data on the Illumina platform typically produces low quality data due to the limitations of the Illumina cluster calling algorithm. Moreover, even in the case of diverse samples, these limitations are causing substantial inaccuracies in multiplexed sample assignment (sample bleeding). Such inaccuracies are unacceptable in clinical applications, and in some other fields (e.g. detection of rare variants). Here, we discuss how both problems with quality of low-diversity samples and sample bleeding are caused by incorrect detection of clusters on the flowcell during initial sequencing cycles. We propose simple software modifications (Long Template Protocol) that overcome this problem. We present experimental results showing that our Long Template Protocol remarkably increases data quality for low diversity samples, as compared with the standard analysis protocol; it also substantially reduces sample bleeding for all samples. For comprehensiveness, we also discuss and compare experimental results from alternative approaches to sequencing low diversity samples. First, we discuss how the low diversity problem, if caused by barcodes, can be avoided altogether at the barcode design stage. Second and third, we present modified guidelines, which are more stringent than the manufacturer’s, for mixing low diversity samples with diverse samples and lowering cluster density, which in our experience consistently produces high quality data from low diversity samples. Fourth and fifth, we present rescue strategies that can be applied when sequencing results in low quality data and when there is no more biological material available. In such cases, we propose that the flowcell be re-hybridized and sequenced again using our Long Template Protocol. 
Alternatively, we discuss how analysis can be repeated from saved sequencing images using the Long Template Protocol to increase accuracy. PMID:25860802

  20. Regional spread of HIV-1 M subtype B in middle-aged patients by random env-C2V4 region sequencing

    PubMed Central

    Stürmer, Martin; Zimmermann, Katrin; Fritzsche, Carlos; Reisinger, Emil; Doelken, Gottfried; Berger, Annemarie; Doerr, Hans W.; Eberle, Josef

    2010-01-01

    A transmission cluster of HIV-1 M:B was identified by C2V4 region sequencing of the HIV-1 env gene in 11 patients in North-East Germany with a median age of 52 (range 26–65) who, except for one, were not aware of any risky behaviour. According to the information they made available, the 10 male patients and 1 female patient deteriorated immunologically within 4 years after putative HIV acquisition. Nucleic acid sequence analysis showed an R5 virus in all patients and, in 7 of 11, a rarely found crown motif of the V3 loop, GPGSALFTT. Analysis of the formation of this cluster showed that there is still a huge discrepancy between awareness and behaviour regarding HIV transmission in middle-aged patients, and that a local outbreak can be detected by nucleic acid analysis of the hypervariable env region. PMID:20217125

  1. Genetic diversity studies and identification of SSR markers associated with Fusarium wilt (Fusarium udum) resistance in cultivated pigeonpea (Cajanus cajan).

    PubMed

    Singh, A K; Rai, V P; Chand, R; Singh, R P; Singh, M N

    2013-01-01

    Genetic diversity analysis and identification of simple sequence repeat (SSR) markers correlated with Fusarium wilt resistance were performed in a set of 36 elite cultivated pigeonpea genotypes differing in levels of resistance to Fusarium wilt. Twenty-four polymorphic SSR markers were screened across these genotypes and amplified a total of 59 alleles, with a high average polymorphic information content value of 0.52. Cluster analysis, done by UPGMA and PCA, grouped the 36 pigeonpea genotypes into two main clusters according to their Fusarium wilt reaction. Based on the Kruskal-Wallis ANOVA and simple regression analysis, six SSR markers were found to be significantly associated with Fusarium wilt resistance. The phenotypic variation explained by these markers ranged from 23.7 to 56.4%. The present study helps to assess the feasibility of using prescreened SSR markers in genetic diversity analysis and their potential association with disease resistance.
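
    The polymorphic information content (PIC) reported above is a per-marker statistic computed from allele frequencies. The abstract does not say which formula was used; the sketch below assumes the commonly used definition PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2. The function name and the example frequencies are hypothetical.

```python
def pic(allele_freqs):
    """Polymorphic information content of one marker from its allele frequencies."""
    total = sum(allele_freqs)
    p = [f / total for f in allele_freqs]  # normalise so the frequencies sum to 1
    homozygosity = sum(x * x for x in p)
    cross = 0.0
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            cross += 2 * (p[i] ** 2) * (p[j] ** 2)
    return 1 - homozygosity - cross


# Hypothetical markers: a balanced two-allele marker and a skewed three-allele marker.
print(pic([0.5, 0.5]))
print(pic([0.7, 0.2, 0.1]))
```

    Under this definition a two-allele marker cannot exceed a PIC of 0.375, so an average of 0.52 implies that many of the screened markers carry more than two reasonably frequent alleles.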

  2. Microsatellite markers identify three lineages of Phytophthora ramorum in US nurseries, yet single lineages in US forest and European nursery populations.

    PubMed

    Ivors, K; Garbelotto, M; Vries, I D E; Ruyter-Spira, C; Te Hekkert, B; Rosenzweig, N; Bonants, P

    2006-05-01

    Analysis of 12 polymorphic simple sequence repeats identified in the genome sequence of Phytophthora ramorum, causal agent of 'sudden oak death', revealed genotypic diversity to be significantly higher in nurseries (91% of total) than in forests (18% of total). Our analysis identified only two closely related genotypes in US forests, while the genetic structure of populations from European nurseries was of intermediate complexity, including multiple, closely related genotypes. Multilocus analysis determined populations in US forests reproduce clonally and are likely descendants of a single introduced individual. The 151 isolates analysed clustered in three clades. US forest and European nursery isolates clustered into two distinct clades, while one isolate from a US nursery belonged to a third novel clade. The combined microsatellite, sequencing and morphological analyses suggest the three clades represent distinct evolutionary lineages. All three clades were identified in some US nurseries, emphasizing the role of commercial plant trade in the movement of this pathogen.

  3. Characterization of genome sequences and clinical features of coxsackievirus A6 strains collected in Hyogo, Japan in 1999-2013.

    PubMed

    Ogi, Miki; Yano, Yoshihiko; Chikahira, Masatsugu; Takai, Denshi; Oshibe, Tomohiro; Arashiro, Takeshi; Hanaoka, Nozomu; Fujimoto, Tsuguto; Hayashi, Yoshitake

    2017-08-01

    Coxsackievirus A6 (CV-A6) is an enterovirus, which is known to cause herpangina. However, since 2009 it has frequently been isolated from children with hand, foot, and mouth disease (HFMD). In Japan, CV-A6 has been linked to HFMD outbreaks in 2011 and 2013. In this study, the full-length genome sequences of CV-A6 strains were analyzed to identify associations with clinical manifestations. Five thousand six hundred and twelve children with suspected enterovirus infection (0-17 years old) between 1999 and 2013 in Hyogo Prefecture, Japan, were enrolled. Enterovirus infection was confirmed with reverse transcriptase-PCR in 753 children (791 samples), 127 of whom (133 samples) were positive for CV-A6 based on the direct sequencing of the VP4 region. The complete genomes of CV-A6 from 22 positive patients with different clinical manifestations were investigated. A phylogenetic analysis divided these 22 strains into two clusters based on the VP1 region; cluster I contained strains collected in 1999-2009 and mostly related to herpangina, and cluster II contained strains collected in 2011-2013 and related to HFMD outbreaks. Based on the full-length polyprotein analysis, the amino acid differences between the strains in cluster I and II were 97.7 ± 0.28%. Amino acid differences were detected in 17 positions within the polyprotein. Strains collected in 1999-2009 and those collected in 2011-2013 were clustered separately by phylogenetic analysis based on the 5'UTR and 3Dpol regions, as well as the VP1 region. In conclusion, HFMD outbreaks caused by CV-A6 have recently been frequent in Japan, and the accumulation of genomic changes might be associated with the clinical course. © 2017 Wiley Periodicals, Inc.

  4. [Bioinformatics Analysis of Clustered Regularly Interspaced Short Palindromic Repeats in the Genomes of Shigella].

    PubMed

    Wang, Pengfei; Wang, Yingfang; Duan, Guangcai; Xue, Zerun; Wang, Linlin; Guo, Xiangjiao; Yang, Haiyan; Xi, Yuanlin

    2015-04-01

    This study aimed to explore the features of clustered regularly interspaced short palindromic repeat (CRISPR) structures in Shigella by using bioinformatics. We used bioinformatics methods, including BLAST, alignment and RNA structure prediction, to analyze the CRISPR structures of Shigella genomes. The results showed that CRISPRs existed in the four groups of Shigella, and the flanking sequences of upstream CRISPRs could be classified into the same group as those of the downstream CRISPRs. We also found some relatively conserved palindromic motifs in the leader sequences. Repeat sequences fell into the same group as their corresponding flanking sequences, and could be classified into two different types by their RNA secondary structures, which contain a "stem" and a "ring". Some spacers were found to be homologous to partial sequences of plasmids or phages. The study indicated that there were correlations between repeat sequences and flanking sequences, and the repeats might act as a kind of recognition mechanism to mediate the interaction between foreign genetic elements and Cas proteins.

  5. Biochemical characterization and phylogenetic analysis based on 16S rRNA sequences for V-factor dependent members of Pasteurellaceae derived from laboratory rats.

    PubMed

    Hayashimoto, Nobuhito; Ueno, Masami; Tkakura, Akira; Itoh, Toshio

    2007-06-01

    Phylogenetic analysis based on 16S rRNA sequences with sequence data of some bacterial species of Pasteurellaceae related to rodents deposited in GenBank was performed along with biochemical characterization for the 20 strains of V-factor dependent members of Pasteurellaceae derived from laboratory rats to obtain basic information and to investigate the taxonomic positions. The results of biochemical tests for all strains were identical except for three tests, the ornithine decarboxylase test, and fermentation tests of D(+) mannose and D(+) xylose. The biochemical properties of 8 of 20 strains that showed negative results for the fermentation test of D(+) xylose agreed with those of Haemophilus parainfluenzae complex. By phylogenetic analysis, the strains were divided into two clusters that agreed with the results of the fermentation test of xylose (group I: negative reaction for xylose, group II: positive reaction for xylose). The clusters were independent of other bacterial species of Pasteurellaceae tested. The sequences of the strains in group I showed 99.7-99.8% similarity and the strains in group II showed 99.3-99.7% similarity. None of the strains in group I had a close relation with Haemophilus parainfluenzae by phylogenetic analysis, although they showed the same biochemical properties. In conclusion, the strains had characteristic biochemical properties and formed two independent groups within the "rodent cluster" of Pasteurellaceae that differed in the results of the fermentation test of xylose. Therefore, they seemed to be hitherto undescribed taxa in Pasteurellaceae.

  6. Serial analysis of gene expression (SAGE) in normal human trabecular meshwork.

    PubMed

    Liu, Yutao; Munro, Drew; Layfield, David; Dellinger, Andrew; Walter, Jeffrey; Peterson, Katherine; Rickman, Catherine Bowes; Allingham, R Rand; Hauser, Michael A

    2011-04-08

    To identify the genes expressed in normal human trabecular meshwork tissue, a tissue critical to the pathogenesis of glaucoma. Total RNA was extracted from human trabecular meshwork (HTM) harvested from 3 different donors. Extracted RNA was used to synthesize individual SAGE (serial analysis of gene expression) libraries using the I-SAGE Long kit from Invitrogen. Libraries were analyzed using SAGE 2000 software to extract the 17 base pair sequence tags. The extracted sequence tags were mapped to the genome using SAGE Genie map. A total of 298,834 SAGE tags were identified from all HTM libraries (96,842, 88,126, and 113,866 tags, respectively). Collectively, there were 107,325 unique tags. There were 10,329 unique tags with a minimum of 2 counts from a single library. These tags were mapped to known unique Unigene clusters. Approximately 29% of the tags (orphan tags) did not map to a known Unigene cluster. Thirteen percent of the tags mapped to at least 2 Unigene clusters. Sequence tags from many glaucoma-related genes, including myocilin, optineurin, and WD repeat domain 36, were identified. This is the first time SAGE analysis has been used to characterize the gene expression profile in normal HTM. SAGE analysis provides an unbiased sampling of gene expression of the target tissue. These data will provide new and valuable information to improve understanding of the biology of human aqueous outflow.

  7. Clustering and visualizing similarity networks of membrane proteins.

    PubMed

    Hu, Geng-Ming; Mai, Te-Lun; Chen, Chi-Ming

    2015-08-01

    We proposed a fast and unsupervised clustering method, minimum span clustering (MSC), for analyzing the sequence-structure-function relationship of biological networks, and demonstrated its validity in clustering the sequence/structure similarity networks (SSN) of 682 membrane protein (MP) chains. The MSC clustering of MPs based on their sequence information was found to be consistent with their tertiary structures and functions. For the largest seven clusters predicted by MSC, the consistency in chain function within the same cluster is found to be 100%. From analyzing the edge distribution of SSN for MPs, we found a characteristic threshold distance for the boundary between clusters, over which SSN of MPs could be properly clustered by an unsupervised sparsification of the network distance matrix. The clustering results of MPs from both MSC and the unsupervised sparsification methods are consistent with each other, and have high intracluster similarity and low intercluster similarity in sequence, structure, and function. Our study showed a strong sequence-structure-function relationship of MPs. We discussed evidence of convergent evolution of MPs and suggested applications in finding structural similarities and predicting biological functions of MP chains based on their sequence information. © 2015 Wiley Periodicals, Inc.

  8. Subtyping Salmonella enterica serovar enteritidis isolates from different sources by using sequence typing based on virulence genes and clustered regularly interspaced short palindromic repeats (CRISPRs).

    PubMed

    Liu, Fenyun; Kariyawasam, Subhashinie; Jayarao, Bhushan M; Barrangou, Rodolphe; Gerner-Smidt, Peter; Ribot, Efrain M; Knabel, Stephen J; Dudley, Edward G

    2011-07-01

    Salmonella enterica subsp. enterica serovar Enteritidis is a major cause of food-borne salmonellosis in the United States. Two major food vehicles for S. Enteritidis are contaminated eggs and chicken meat. Improved subtyping methods are needed to accurately track specific strains of S. Enteritidis related to human salmonellosis throughout the chicken and egg food system. A sequence typing scheme based on virulence genes (fimH and sseL) and clustered regularly interspaced short palindromic repeats (CRISPRs), designated CRISPR-including multi-virulence-locus sequence typing (CRISPR-MVLST), was used to characterize 35 human clinical isolates, 46 chicken isolates, 24 egg isolates, and 63 hen house environment isolates of S. Enteritidis. A total of 27 sequence types (STs) were identified among the 167 isolates. CRISPR-MVLST identified three persistent and predominant STs circulating among U.S. human clinical isolates and chicken, egg, and hen house environmental isolates in Pennsylvania, and an ST that was found only in eggs and humans. It also identified a potential environment-specific sequence type. Moreover, cluster analysis based on fimH and sseL identified a number of clusters, of which several were found in more than one outbreak, as well as 11 singletons. Further research is needed to determine if CRISPR-MVLST might help identify the ecological origins of S. Enteritidis strains that contaminate chickens and eggs.

  9. The accelerated build-up of the red sequence in high-redshift galaxy clusters

    NASA Astrophysics Data System (ADS)

    Cerulo, P.; Couch, W. J.; Lidman, C.; Demarco, R.; Huertas-Company, M.; Mei, S.; Sánchez-Janssen, R.; Barrientos, L. F.; Muñoz, R. P.

    2016-04-01

    We analyse the evolution of the red sequence in a sample of galaxy clusters at redshifts 0.8 < z < 1.5 taken from the HAWK-I Cluster Survey (HCS). The comparison with the low-redshift (0.04 < z < 0.08) sample of the WIde-field Nearby Galaxy-cluster Survey (WINGS) and other literature results shows that the slope and intrinsic scatter of the cluster red sequence have undergone little evolution since z = 1.5. We find that the luminous-to-faint ratio and the slope of the faint end of the luminosity distribution of the HCS red sequence are consistent with those measured in WINGS, implying that there is no deficit of red galaxies at magnitudes fainter than M_V^* at high redshifts. We find that the most massive HCS clusters host a population of bright red sequence galaxies at M_V < -22.0 mag, which are not observed in low-mass clusters. Interestingly, we also note the presence of a population of very bright (M_V < -23.0 mag) and massive (log (M*/M⊙) > 11.5) red sequence galaxies in the WINGS clusters, which do not include only the brightest cluster galaxies and which are not present in the HCS clusters, suggesting that they formed at epochs later than z = 0.8. The comparison with the luminosity distribution of a sample of passive red sequence galaxies drawn from the COSMOS/UltraVISTA field in the photometric redshift range 0.8 < z_phot < 1.5 shows that the red sequence in clusters is more developed at the faint end, suggesting that halo mass plays an important role in setting the time-scales for the build-up of the red sequence.

  10. Genomic analysis of Ascochyta rabiei identifies dynamic genome environments of solanapyrone biosynthesis gene cluster and a novel type of pathway-specific regulator

    USDA-ARS's Scientific Manuscript database

    Secondary metabolite genes are often clustered together and situated in particular genomic regions such as the subtelomere, which can facilitate niche adaptation in fungi. Solanapyrones are toxic secondary metabolites produced by fungi occupying different ecological niches. Full genome sequencing of...

  11. Whole genome sequence analysis of Geitlerinema sp. FC II unveils competitive edge of the strain in marine cultivation system for biofuel production.

    PubMed

    Batchu, Navish Kumar; Khater, Shradha; Patil, Sonal; Nagle, Vinod; Das, Gautam; Bhadra, Bhaskar; Sapre, Ajit; Dasgupta, Santanu

    2018-03-05

    A filamentous cyanobacterium, Geitlerinema sp. FC II, was isolated from a marine algae culture pond at Reliance Industries Limited (RIL), India. The 6.7 Mb draft genome of FC II encodes 6697 protein-coding genes. Analysis of the whole genome sequence revealed the presence of a nif gene cluster, supporting its capability to fix atmospheric nitrogen. The FC II genome contains two variants of sulfide:quinone oxidoreductases (SQR), which is a crucial electron donor in cyanobacterial metabolic processes. FC II is characterized by the presence of multiple CRISPR-Cas (Clustered Regularly Interspaced Short Palindromic Repeats - CRISPR associated proteins) clusters, multiple variants of genes encoding photosystem reaction centres, and biosynthetic gene clusters for alkanes, polyketides and non-ribosomal peptides. The presence of these pathways will help FC II gain an ecological advantage over other strains for biomass production in large scale cultivation systems. Hence, FC II may be used for production of biofuel and other industrially important metabolites. Copyright © 2018 Elsevier Inc. All rights reserved.

  12. Diversity amongst trigeminal neurons revealed by high throughput single cell sequencing

    PubMed Central

    Nguyen, Minh Q.; Wu, Youmei; Bonilla, Lauren S.; von Buchholtz, Lars J.

    2017-01-01

    The trigeminal ganglion contains somatosensory neurons that detect a range of thermal, mechanical and chemical cues and innervate unique sensory compartments in the head and neck including the eyes, nose, mouth, meninges and vibrissae. We used single-cell sequencing and in situ hybridization to examine the cellular diversity of the trigeminal ganglion in mice, defining thirteen clusters of neurons. We show that clusters are well conserved in dorsal root ganglia suggesting they represent distinct functional classes of somatosensory neurons and not specialization associated with their sensory targets. Notably, functionally important genes (e.g. the mechanosensory channel Piezo2 and the capsaicin gated ion channel Trpv1) segregate into multiple clusters and often are expressed in subsets of cells within a cluster. Therefore, the 13 genetically-defined classes are likely to be physiologically heterogeneous rather than highly parallel (i.e., redundant) lines of sensory input. Our analysis harnesses the power of single-cell sequencing to provide a unique platform for in silico expression profiling that complements other approaches linking gene-expression with function and exposes unexpected diversity in the somatosensory system. PMID:28957441

  13. 5S ribosomal ribonucleic acid sequences in Bacteroides and Fusobacterium: evolutionary relationships within these genera and among eubacteria in general

    NASA Technical Reports Server (NTRS)

    Van den Eynde, H.; De Baere, R.; Shah, H. N.; Gharbia, S. E.; Fox, G. E.; Michalik, J.; Van de Peer, Y.; De Wachter, R.

    1989-01-01

    The 5S ribosomal ribonucleic acid (rRNA) sequences were determined for Bacteroides fragilis, Bacteroides thetaiotaomicron, Bacteroides capillosus, Bacteroides veroralis, Porphyromonas gingivalis, Anaerorhabdus furcosus, Fusobacterium nucleatum, Fusobacterium mortiferum, and Fusobacterium varium. A dendrogram constructed by a clustering algorithm from these sequences, which were aligned with all other hitherto known eubacterial 5S rRNA sequences, showed differences as well as similarities with respect to results derived from 16S rRNA analyses. In the 5S rRNA dendrogram, Bacteroides clustered together with Cytophaga and Fusobacterium, as in 16S rRNA analyses. Intraphylum relationships deduced from 5S rRNAs suggested that Bacteroides is specifically related to Cytophaga rather than to Fusobacterium, as was suggested by 16S rRNA analyses. Previous taxonomic considerations concerning the genus Bacteroides, based on biochemical and physiological data, were confirmed by the 5S rRNA sequence analysis.

  14. Sequence determination and analysis of the NSs genes of two tospoviruses.

    PubMed

    Hallwass, Mariana; Leastro, Mikhail O; Lima, Mirtes F; Inoue-Nagata, Alice K; Resende, Renato O

    2012-03-01

    The tospoviruses groundnut ringspot virus (GRSV) and zucchini lethal chlorosis virus (ZLCV) cause severe losses in many crops, especially in solanaceous and cucurbit species. In this study, the non-structural NSs gene and the 5'UTRs of these two biologically distinct tospoviruses were cloned and sequenced. The NSs sequences of GRSV and ZLCV were both 1,404 nucleotides long. Pairwise comparison showed that the NSs amino acid sequence of GRSV shared 69.6% identity with that of ZLCV and 75.9% identity with that of TSWV, while the NSs sequences of ZLCV and TSWV shared 67.9% identity. Phylogenetic analysis based on NSs sequences confirmed that these viruses cluster in the American clade.
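
    Percent identity figures such as those above are computed from a pairwise alignment. The abstract does not state which tool or which denominator was used, so the following is only a sketch under one common convention: identical residues divided by the number of alignment columns that are not gapped in both sequences. The function name and the toy peptide fragments are hypothetical and are not NSs sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two already-aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # a column gapped in both sequences carries no information
        compared += 1
        if a == b:
            matches += 1  # a and b are equal and, given the check above, not gaps
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments: four of five columns are identical.
print(percent_identity("MKTAY", "MKSAY"))
```

    Other conventions divide by the shorter sequence length or exclude every gapped column, which can shift the reported value by a few percentage points for gap-rich alignments.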

  15. Visualizing and Clustering Protein Similarity Networks: Sequences, Structures, and Functions.

    PubMed

    Mai, Te-Lun; Hu, Geng-Ming; Chen, Chi-Ming

    2016-07-01

    Research in the recent decade has demonstrated the usefulness of protein network knowledge in furthering the study of molecular evolution of proteins, understanding the robustness of cells to perturbation, and annotating new protein functions. In this study, we aimed to provide a general clustering approach to visualize the sequence-structure-function relationship of protein networks, and investigate possible causes for inconsistency in the protein classifications based on sequences, structures, and functions. Such visualization of protein networks could facilitate our understanding of the overall relationship among proteins and help researchers comprehend various protein databases. As a demonstration, we clustered 1437 enzymes by their sequences and structures using the minimum span clustering (MSC) method. The general structure of this protein network was delineated at two clustering resolutions, and the second level MSC clustering was found to be highly similar to existing enzyme classifications. The clustering of these enzymes based on sequence, structure, and function information is consistent with each other. For proteases, the Jaccard's similarity coefficient is 0.86 between sequence and function classifications, 0.82 between sequence and structure classifications, and 0.78 between structure and function classifications. From our clustering results, we discussed possible examples of divergent evolution and convergent evolution of enzymes. Our clustering approach provides a panoramic view of the sequence-structure-function network of proteins, helps visualize the relation between related proteins intuitively, and is useful in predicting the structure and function of newly determined protein sequences.
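
    The Jaccard similarity coefficients quoted above compare two classifications of the same proteins. The abstract does not say how the coefficient was computed; a common choice, assumed here, is the pair-counting form: the number of item pairs grouped together in both classifications divided by the number of pairs grouped together in at least one. The function names and the toy labels are hypothetical.

```python
from itertools import combinations


def co_clustered_pairs(labels):
    """Set of item pairs that share a class label in one classification."""
    items = sorted(labels)
    return {(a, b) for a, b in combinations(items, 2) if labels[a] == labels[b]}


def jaccard_between_classifications(labels_x, labels_y):
    """Pair-counting Jaccard coefficient between two labelings of the same items."""
    shared = set(labels_x) & set(labels_y)  # only compare items present in both
    pairs_x = co_clustered_pairs({k: labels_x[k] for k in shared})
    pairs_y = co_clustered_pairs({k: labels_y[k] for k in shared})
    union = pairs_x | pairs_y
    return len(pairs_x & pairs_y) / len(union) if union else 1.0


# Hypothetical example: four enzymes classified by sequence and by function.
by_sequence = {"e1": "A", "e2": "A", "e3": "B", "e4": "B"}
by_function = {"e1": 1, "e2": 1, "e3": 1, "e4": 2}
print(jaccard_between_classifications(by_sequence, by_function))
```

    A value of 1 means the two classifications group exactly the same pairs together, so coefficients of 0.78 to 0.86 indicate that most, but not all, co-membership relations are shared.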

  16. Role of Sequencing the Measles Virus Hemagglutinin Gene and Hypervariable Region in the Measles Outbreak Investigations in Sweden During 2013-2014.

    PubMed

    Harvala, Heli; Wiman, Åsa; Wallensten, Anders; Zakikhany, Katherina; Englund, Hélène; Brytting, Maria

    2016-02-15

    It is increasingly difficult to differentiate measles viruses (MeVs) relating to certain outbreaks on the basis of the nucleoprotein (N) gene sequence only, as the diversity of circulating MeV strains has decreased. We studied genomic regions that could provide better molecular discrimination between epidemiologically linked and unlinked MeV variants identified in Sweden during 2013-2014. The hemagglutinin (H) gene and hypervariable region between the fusion and matrix genes (MF-HVR) from 53 MeV-positive samples were amplified and sequenced. Data on phylogenetic clustering of MeVs on the basis of N, H, and MF-HVR sequences were compared to epidemiological data. MeVs were genotyped: 27 were B3, and 26 were D8. One genotype B3 cluster based on the N gene sequence contained epidemiologically unrelated viruses from 4 outbreaks, whereas analysis of H and MF-HVR sequences separated them into phylogenetic clusters consistent with the epidemiological data. Similarly, the single cluster of viruses with a genotype D8 N gene could be divided into the 5 outbreak groups on the basis of the phylogeny of MF-HVR sequences. A detailed picture of MeV circulation with more-defined links between outbreaks was obtained by sequencing the H gene and MF-HVR. Further identification and better genetic characterization of MeVs internationally is essential in identifying sources and routes of MeV spread within and beyond Europe in the elimination end game. © The Author 2015. Published by Oxford University Press for the Infectious Diseases Society of America. All rights reserved. For permissions, e-mail journals.permissions@oup.com.

  17. Assessing the genome level diversity of Listeria monocytogenes from contaminated ice cream and environmental samples linked to a listeriosis outbreak in the United States.

    PubMed

    Chen, Yi; Luo, Yan; Curry, Phillip; Timme, Ruth; Melka, David; Doyle, Matthew; Parish, Mickey; Hammack, Thomas S; Allard, Marc W; Brown, Eric W; Strain, Errol A

    2017-01-01

    A listeriosis outbreak in the United States implicated contaminated ice cream produced by one company, which operated 3 facilities. We performed single nucleotide polymorphism (SNP)-based whole genome sequencing (WGS) analysis on Listeria monocytogenes from food, environmental and clinical sources, identifying two clusters and a single branch, belonging to PCR serogroup IIb and genetic lineage I. WGS Cluster I, representing one outbreak strain, contained 82 food and environmental isolates from Facility I and 4 clinical isolates. These isolates differed by up to 29 SNPs, exhibited 9 pulsed-field gel electrophoresis (PFGE) profiles and multilocus sequence typing (MLST) sequence type (ST) 5 of clonal complex 5 (CC5). WGS Cluster II contained 51 food and environmental isolates from Facility II, 4 food isolates from Facility I and 5 clinical isolates. Among them the isolates from Facility II and clinical isolates formed a clade and represented another outbreak strain. Isolates in this clade differed by up to 29 SNPs, exhibited 3 PFGE profiles and ST5. The only isolate collected from Facility III belonged to singleton ST489, which was in a single branch separate from Clusters I and II, and was not associated with the outbreak. WGS analyses clustered together outbreak-associated isolates exhibiting multiple PFGE profiles, while differentiating them from epidemiologically unrelated isolates that exhibited outbreak PFGE profiles. The complete genome of a Cluster I isolate allowed the identification and analyses of putative prophages, revealing that Cluster I isolates differed by the gain or loss of three putative prophages, causing the banding pattern differences among all 3 AscI-PFGE profiles observed in Cluster I isolates. WGS data suggested that certain ice cream varieties and/or production lines might have contamination sources unique to them. The SNP-based analysis was able to distinguish CC5 as a group from non-CC5 isolates and differentiate among CC5 isolates from different outbreaks/incidents.

  18. Assessing the genome level diversity of Listeria monocytogenes from contaminated ice cream and environmental samples linked to a listeriosis outbreak in the United States

    PubMed Central

    Chen, Yi; Luo, Yan; Curry, Phillip; Timme, Ruth; Melka, David; Doyle, Matthew; Parish, Mickey; Hammack, Thomas S.; Allard, Marc W.; Brown, Eric W.; Strain, Errol A.

    2017-01-01

    A listeriosis outbreak in the United States implicated contaminated ice cream produced by one company, which operated 3 facilities. We performed single nucleotide polymorphism (SNP)-based whole genome sequencing (WGS) analysis on Listeria monocytogenes from food, environmental and clinical sources, identifying two clusters and a single branch, belonging to PCR serogroup IIb and genetic lineage I. WGS Cluster I, representing one outbreak strain, contained 82 food and environmental isolates from Facility I and 4 clinical isolates. These isolates differed by up to 29 SNPs, exhibited 9 pulsed-field gel electrophoresis (PFGE) profiles and multilocus sequence typing (MLST) sequence type (ST) 5 of clonal complex 5 (CC5). WGS Cluster II contained 51 food and environmental isolates from Facility II, 4 food isolates from Facility I and 5 clinical isolates. Among them the isolates from Facility II and clinical isolates formed a clade and represented another outbreak strain. Isolates in this clade differed by up to 29 SNPs, exhibited 3 PFGE profiles and ST5. The only isolate collected from Facility III belonged to singleton ST489, which was in a single branch separate from Clusters I and II, and was not associated with the outbreak. WGS analyses clustered together outbreak-associated isolates exhibiting multiple PFGE profiles, while differentiating them from epidemiologically unrelated isolates that exhibited outbreak PFGE profiles. The complete genome of a Cluster I isolate allowed the identification and analyses of putative prophages, revealing that Cluster I isolates differed by the gain or loss of three putative prophages, causing the banding pattern differences among all 3 AscI-PFGE profiles observed in Cluster I isolates. WGS data suggested that certain ice cream varieties and/or production lines might have contamination sources unique to them. The SNP-based analysis was able to distinguish CC5 as a group from non-CC5 isolates and differentiate among CC5 isolates from different outbreaks/incidents. PMID:28166293

  19. MytiBase: a knowledgebase of mussel (M. galloprovincialis) transcribed sequences

    PubMed Central

    Venier, Paola; De Pittà, Cristiano; Bernante, Filippo; Varotto, Laura; De Nardi, Barbara; Bovo, Giuseppe; Roch, Philippe; Novoa, Beatriz; Figueras, Antonio; Pallavicini, Alberto; Lanfranchi, Gerolamo

    2009-01-01

    Background: Although bivalves are among the most studied marine organisms due to their ecological role, economic importance and use in pollution biomonitoring, very little information is available on the genome sequences of mussels. This study reports the functional analysis of a large-scale Expressed Sequence Tag (EST) sequencing from different tissues of Mytilus galloprovincialis (the Mediterranean mussel) challenged with toxic pollutants, temperature and potentially pathogenic bacteria. Results: We have constructed and sequenced seventeen cDNA libraries from different Mediterranean mussel tissues: gills, digestive gland, foot, anterior and posterior adductor muscle, mantle and haemocytes. A total of 24,939 clones were sequenced from these libraries generating 18,788 high-quality ESTs which were assembled into 2,446 overlapping clusters and 4,666 singletons resulting in a total of 7,112 non-redundant sequences. In particular, a high-quality normalized cDNA library (Nor01) was constructed as determined by the high rate of gene discovery (65.6%). Bioinformatic screening of the non-redundant M. galloprovincialis sequences identified 159 microsatellite-containing ESTs. Clusters, consensuses, related similarities and gene ontology searches have been organized in a dedicated, searchable database. Conclusion: We defined the first species-specific catalogue of M. galloprovincialis ESTs including 7,112 unique transcribed sequences. Putative microsatellite markers were identified. This annotated catalogue represents a valuable platform for expression studies, marker validation and genetic linkage analysis for investigations in the biology of Mediterranean mussels. PMID:19203376

  20. Characterization and analysis of a transcriptome from the boreal spider crab Hyas araneus.

    PubMed

    Harms, Lars; Frickenhaus, Stephan; Schiffer, Melanie; Mark, Felix C; Storch, Daniela; Pörtner, Hans-Otto; Held, Christoph; Lucassen, Magnus

    2013-12-01

    Research investigating the genetic basis of physiological responses has significantly broadened our understanding of the mechanisms underlying organismic response to environmental change. However, genomic data are currently available for few taxa only, thus excluding physiological model species from this approach. In this study we report the transcriptome of the model organism Hyas araneus from Spitsbergen (Arctic). We generated 20,479 transcripts, using the 454 GS FLX sequencing technology in combination with an Illumina HiSeq sequencing approach. Annotation by Blastx revealed 7159 blast hits in the NCBI non-redundant protein database. The comparison between the spider crab H. araneus transcriptome and EST libraries of the European lobster Homarus americanus and the porcelain crab Petrolisthes cinctipes yielded 3229/2581 sequences with a significant hit, respectively. The clustering by the Markov Clustering Algorithm (MCL) revealed a common core of 1710 clusters present in all three species and 5903 unique clusters for H. araneus. The combined sequencing approaches generated transcripts that will greatly expand the limited genomic data available for crustaceans. We introduce the MCL clustering for transcriptome comparisons as a simple approach to estimate similarities between transcriptomic libraries of different size and quality and to analyze homologies within the selected group of species. In particular, we identified a large variety of reverse transcriptase (RT) sequences not only in the H. araneus transcriptome and other decapod crustaceans, but also sea urchin, supporting the hypothesis of a heritable, anti-viral immunity and the proposed viral fragment integration by host-derived RTs in marine invertebrates. © 2013.
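
The Markov clustering step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy similarity graph, the function names `normalize_columns` and `mcl`, and the parameter values (inflation 2.0, 50 iterations) are assumptions chosen for illustration only. A real analysis would more likely run a dedicated MCL implementation on sequence-similarity scores.

```python
def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, iterations=50, eps=1e-6):
    """Toy Markov clustering: alternate expansion and inflation steps."""
    n = len(adjacency)
    # Add self-loops so that every node keeps some probability mass.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walks).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power, then renormalize the columns.
        m = normalize_columns([[x ** inflation for x in row] for row in m])
    # Every row that retains mass defines a cluster from its non-zero columns.
    clusters = {frozenset(j for j in range(n) if row[j] > eps) for row in m}
    return sorted(sorted(c) for c in clusters if c)


# Hypothetical similarity graph: sequences 0-2 are mutually similar, as are 3-4.
toy_graph = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
print(mcl(toy_graph))
```

In this sketch a higher inflation value would tend to produce finer-grained clusters, which is the usual tuning knob when comparing libraries of different size.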

  1. Significance of flow clustering and sequencing on sediment transport: 1D sediment transport modelling

    NASA Astrophysics Data System (ADS)

    Hassan, Kazi; Allen, Deonie; Haynes, Heather

    2016-04-01

    This paper considers 1D hydraulic model data on the effect of high flow clusters and sequencing on sediment transport. Using observed flow gauge data from the River Caldew, England, a novel stochastic modelling approach was developed in order to create alternative 50 year flow sequences. Whilst the observed probability density of gauge data was preserved in all sequences, the order in which those flows occurred was varied using the output from a Hidden Markov Model (HMM) with generalised Pareto distribution (GP). In total, one hundred 50 year synthetic flow series were generated and used as the inflow boundary conditions for individual flow series model runs using the 1D sediment transport model HEC-RAS. The model routed graded sediment through the case study river reach to define the long-term morphological changes. Comparison of individual simulations provided a detailed understanding of the sensitivity of channel capacity to flow sequence. Specifically, each 50 year synthetic flow sequence was analysed using a 3-month, 6-month or 12-month rolling window approach and classified for clusters in peak discharge. As a cluster is described as a temporal grouping of flow events above a specified threshold, the threshold condition used herein is considered as a morphologically active channel forming discharge event. Thus, clusters were identified for peak discharges in excess of 10%, 20%, 50%, 100% and 150% of the 1 year Return Period (RP) event. The window of above-peak flows also required cluster definition and was tested for timeframes 1, 2, 10 and 30 days. Subsequently, clusters could be described in terms of the number of events, maximum peak flow discharge, cumulative flow discharge and skewness (i.e. a description of the flow sequence). The model output for each cluster was analysed for the cumulative flow volume and cumulative sediment transport (mass). This was then compared to the total sediment transport of a single flow event of equivalent flow volume. 
Results illustrate that clustered flood events generated sediment loads up to an order of magnitude greater than that of individual events of the same flood volume. Correlations were significant for sediment volume compared to both maximum flow discharge (R² < 0.8) and number of events (R² -0.5 to -0.7) within the cluster. The strongest correlations occurred for clusters with a greater number of flow events only slightly above-threshold. This illustrates that the numerical model can capture a degree of the non-linear morphological response to flow magnitude. Analysis of the relationship between morphological change and the skewness of flow events within each cluster was also determined, illustrating only minor sensitivity to cluster peak distribution skewness. This is surprising and discussion is presented on model limitations, including the capability of sediment transport formulae to effectively account for temporal processes of antecedent flow, hysteresis, local supply etc.
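
The cluster definition used in this abstract (a temporal grouping of flow events above a threshold, tested for several timeframes) can be sketched as follows. This is an illustrative simplification, not the authors' method: it treats every above-threshold day as one event, and the function name `find_flow_clusters`, the parameter `max_gap_days` and the sample discharge values are all hypothetical.

```python
def find_flow_clusters(flows, threshold, max_gap_days):
    """Group above-threshold days into clusters.

    Two consecutive above-threshold days belong to the same cluster when
    they are separated by at most `max_gap_days` days.
    """
    clusters = []
    current = []  # day indices of the cluster currently being built
    for day, discharge in enumerate(flows):
        if discharge <= threshold:
            continue
        if current and day - current[-1] > max_gap_days:
            clusters.append(current)
            current = []
        current.append(day)
    if current:
        clusters.append(current)
    # Summarize each cluster with descriptors named in the abstract.
    return [
        {
            "days": c,
            "n_events": len(c),
            "max_peak": max(flows[d] for d in c),
            "cumulative_flow": sum(flows[d] for d in c),
        }
        for c in clusters
    ]


# Hypothetical daily peak discharges (arbitrary units) and threshold.
daily_flows = [5, 12, 3, 15, 2, 2, 2, 2, 11, 4]
for summary in find_flow_clusters(daily_flows, threshold=10, max_gap_days=2):
    print(summary)
```

Varying `threshold` and `max_gap_days` in such a sketch mirrors the sensitivity testing over threshold percentages and 1, 2, 10 and 30 day timeframes described in the abstract.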

  2. Microbial and viral-like rhodopsins present in coastal marine sediments from four polar and subpolar regions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    López, José L.; Golemba, Marcelo; Hernández, Edgardo

    Rhodopsins are broadly distributed. In this work, we analyzed 23 metagenomes corresponding to marine sediment samples from four regions that share cold climate conditions (Norway, Sweden, Argentina and Antarctica). In order to investigate the gene evolution of viral rhodopsins, an initial set of 6224 bacterial rhodopsin sequences according to COG5524 were retrieved from the 23 metagenomes. After selection by the presence of transmembrane domains and alignment, 123 viral (51) and non-viral (72) sequences (>50 amino acids) were finally included in further analysis. Viral rhodopsin genes were homologs of Phaeocystis globosa virus and Organic lake Phycodnavirus. Non-viral microbial rhodopsin genes were ascribed to Bacteroidetes, Planctomycetes, Firmicutes, Actinobacteria, Cyanobacteria, Proteobacteria, Deinococcus-Thermus, Cryptophyta and Fungi. A rescreening using Blastp, using as queries the viral sequences previously described, retrieved 30 sequences (>100 amino acids). Phylogeographic analysis revealed a geographical clustering of the sequences affiliated to the viral group. This clustering was not observed for the microbial non-viral sequences. The phylogenetic reconstruction allowed us to propose the existence of a putative ancestor of viral rhodopsin genes related to Actinobacteria and Chloroflexi. This is the first report about the existence of a phylogeographic association of the viral rhodopsin sequences from marine sediments.

  3. Diversity of the small subunit ribosomal RNA gene of the arbuscular mycorrhizal fungi colonizing Clintonia borealis from a mixed-wood boreal forest.

    PubMed

    DeBellis, Tonia; Widden, Paul

    2006-11-01

    Arbuscular mycorrhizal fungi (AMF) communities in Clintonia borealis roots from a boreal mixed forest in northwestern Québec were investigated. Roots were sampled from 100 m² plots whose overstory was dominated by either trembling aspen (Populus tremuloides Michx.), white birch (Betula papyrifera Marsh.), or mixed white spruce (Picea glauca (Moench) Voss) and balsam fir (Abies balsamea (L.) Mill.). Part of the 18S ribosomal gene of the AMF was amplified and the resulting PCR products were cloned. Restriction analysis of the 576 resulting clones yielded 92 different restriction patterns which were then sequenced. Fifty-two sequences closely matched other Glomus sequences from Genbank. Phylogenetic analysis revealed 10 different AMF sequence types, most of which clustered with other uncultured AM sequences from plant roots from various field sites. Compared with other AMF communities from comparable studies, richness and diversity were higher than observed in an arable field, but lower than seen in a tropical forest and a temperate wetland. The AMF communities from Clintonia roots under the different canopy types did not differ significantly and the dominant sequence type, which clustered with AM sequences from a variety of environments and hosts at distant geographical locations, represented 66.9% of all the clones analyzed.

  4. Whole Genome Sequencing Demonstrates Limited Transmission within Identified Mycobacterium tuberculosis Clusters in New South Wales, Australia

    PubMed Central

    Gurjav, Ulziijargal; Outhred, Alexander C.; Jelfs, Peter; McCallum, Nadine; Wang, Qinning; Hill-Cawthorne, Grant A.; Marais, Ben J.; Sintchenko, Vitali

    2016-01-01

    Australia has a low tuberculosis incidence rate with most cases occurring among recent immigrants. Given suboptimal cluster resolution achieved with 24-locus mycobacterium interspersed repetitive unit (MIRU-24) genotyping, the added value of whole genome sequencing was explored. MIRU-24 profiles of all Mycobacterium tuberculosis culture-confirmed tuberculosis cases diagnosed between 2009 and 2013 in New South Wales (NSW), Australia, were examined and clusters identified. The relatedness of cases within the largest MIRU-24 clusters was assessed using whole genome sequencing and phylogenetic analyses. Of 1841 culture-confirmed TB cases, 91.9% (1692/1841) had complete demographic and genotyping data. East-African Indian (474; 28.0%) and Beijing (470; 27.8%) lineage strains predominated. The overall rate of MIRU-24 clustering was 20.1% (340/1692) and was highest among Beijing lineage strains (35.7%; 168/470). One Beijing and three East-African Indian (EAI) clonal complexes were responsible for the majority of observed clusters. Whole genome sequencing of the 4 largest clusters (30 isolates) demonstrated diverse single nucleotide polymorphisms (SNPs) within identified clusters. All sequenced EAI strains and 70% of Beijing lineage strains clustered by MIRU-24 typing demonstrated distinct SNP profiles. The superior resolution provided by whole genome sequencing demonstrated limited M. tuberculosis transmission within NSW, even within identified MIRU-24 clusters. Routine whole genome sequencing could provide valuable public health guidance in low burden settings. PMID:27737005

  5. MAGIC database and interfaces: an integrated package for gene discovery and expression.

    PubMed

    Cordonnier-Pratt, Marie-Michèle; Liang, Chun; Wang, Haiming; Kolychev, Dmitri S; Sun, Feng; Freeman, Robert; Sullivan, Robert; Pratt, Lee H

    2004-01-01

    The rapidly increasing rate at which biological data is being produced requires a corresponding growth in relational databases and associated tools that can help laboratories contend with that data. With this need in mind, we describe here a Modular Approach to a Genomic, Integrated and Comprehensive (MAGIC) Database. This Oracle 9i database derives from an initial focus in our laboratory on gene discovery via production and analysis of expressed sequence tags (ESTs), and subsequently on gene expression as assessed by both EST clustering and microarrays. The MAGIC Gene Discovery portion of the database focuses on information derived from DNA sequences and on its biological relevance. In addition to MAGIC SEQ-LIMS, which is designed to support activities in the laboratory, it contains several additional subschemas. The latter include MAGIC Admin for database administration, MAGIC Sequence for sequence processing as well as sequence and clone attributes, MAGIC Cluster for the results of EST clustering, MAGIC Polymorphism in support of microsatellite and single-nucleotide-polymorphism discovery, and MAGIC Annotation for electronic annotation by BLAST and BLAT. The MAGIC Microarray portion is a MIAME-compliant database with two components at present. These are MAGIC Array-LIMS, which makes possible remote entry of all information into the database, and MAGIC Array Analysis, which provides data mining and visualization. Because all aspects of interaction with the MAGIC Database are via a web browser, it is ideally suited not only for individual research laboratories but also for core facilities that serve clients at any distance.

  6. Identification of the Monooxygenase Gene Clusters Responsible for the Regioselective Oxidation of Phenol to Hydroquinone in Mycobacteria

    PubMed Central

    Furuya, Toshiki; Hirose, Satomi; Osanai, Hisashi; Semba, Hisashi; Kino, Kuniki

    2011-01-01

    Mycobacterium goodii strain 12523 is an actinomycete that is able to oxidize phenol regioselectively at the para position to produce hydroquinone. In this study, we investigated the genes responsible for this unique regioselective oxidation. On the basis of the fact that the oxidation activity of M. goodii strain 12523 toward phenol is induced in the presence of acetone, we first identified acetone-induced proteins in this microorganism by two-dimensional electrophoretic analysis. The N-terminal amino acid sequence of one of these acetone-induced proteins shares 100% identity with that of the protein encoded by the open reading frame Msmeg_1971 in Mycobacterium smegmatis strain mc2155, whose genome sequence has been determined. Since Msmeg_1971, Msmeg_1972, Msmeg_1973, and Msmeg_1974 constitute a putative binuclear iron monooxygenase gene cluster, we cloned this gene cluster of M. smegmatis strain mc2155 and its homologous gene cluster found in M. goodii strain 12523. Sequence analysis of these binuclear iron monooxygenase gene clusters revealed the presence of four genes designated mimABCD, which encode an oxygenase large subunit, a reductase, an oxygenase small subunit, and a coupling protein, respectively. When the mimA gene (Msmeg_1971) of M. smegmatis strain mc2155, which was also found to be able to oxidize phenol to hydroquinone, was deleted, this mutant lost the oxidation ability. This ability was restored by introduction of the mimA gene of M. smegmatis strain mc2155 or of M. goodii strain 12523 into this mutant. Interestingly, we found that these gene clusters also play essential roles in propane and acetone metabolism in these mycobacteria. PMID:21183637

  7. Untangling Magmatic Processes and Hydrothermal Alteration of in situ Superfast Spreading Ocean Crust at ODP/IODP Site 1256 with Fuzzy c-means Cluster Analysis of Rock Magnetic Properties

    NASA Astrophysics Data System (ADS)

    Dekkers, M. J.; Heslop, D.; Herrero-Bervera, E.; Acton, G.; Krasa, D.

    2014-12-01

    Ocean Drilling Program (ODP)/Integrated ODP (IODP) Hole 1256D (6°44.1' N, 91°56.1' W) on the Cocos Plate occurs in 15.2 Ma oceanic crust generated by superfast seafloor spreading. Presently, it is the only drill hole that has sampled all three oceanic crust layers in a tectonically undisturbed setting. Here we interpret down-hole trends in several rock-magnetic parameters with fuzzy c-means cluster analysis, a multivariate statistical technique. The parameters include the magnetization ratio, the coercivity ratio, the coercive force, the low-field susceptibility, and the Curie temperature. By their combined, multivariate, analysis the effects of magmatic and hydrothermal processes can be evaluated. The optimal number of clusters - a key point in the analysis because there is no a priori information on this - was determined through a combination of approaches: by calculation of several cluster validity indices, by testing for coherent cluster distributions on non-linear-map plots, and importantly by testing for stability of the cluster solution from all possible starting points. Here, we consider a solution robust if the cluster allocation is independent of the starting configuration. The five-cluster solution appeared to be robust. Three clusters are distinguished in the extrusive segment of the Hole that express increasing hydrothermal alteration of the lavas. The sheeted dike and gabbro portions are characterized by two clusters, both with higher coercivities than in lava samples. Extensive alteration, however, can obliterate magnetic property differences between lavas, dikes, and gabbros. The imprint of thermochemical alteration on the iron-titanium oxides is only partially related to the porosity of the rocks. All clusters display rock magnetic characteristics in line with a stable NRM. This implies that the entire sampled sequence of ocean crust can contribute to marine magnetic anomalies.
Determination of the absolute paleointensity with thermal techniques is not straightforward because of the propensity of oxyexsolution during laboratory heating and/or the presence of intergrowths. The upper part of the extrusive sequence, the granoblastic portion of the dikes, and moderately altered gabbros may contain a comparatively uncontaminated thermoremanent magnetization.
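
The robustness criterion in this abstract (a solution counts as robust if the cluster allocation is independent of the starting configuration) can be sketched with a small fuzzy c-means routine. This is a one-dimensional toy under stated assumptions, not the authors' multivariate analysis: the function names `fuzzy_c_means_1d` and `is_robust`, the fuzzifier m = 2, the fixed iteration count, the tolerance and the sample values are all hypothetical.

```python
def fuzzy_c_means_1d(data, start_centres, m=2.0, iterations=100):
    """Run fuzzy c-means on 1-D data from given starting centres."""
    centres = list(start_centres)
    c = len(centres)
    power = 2.0 / (m - 1.0)
    for _ in range(iterations):
        memberships = []
        for x in data:
            # Guard against a zero distance when a point sits on a centre.
            dist = [abs(x - v) or 1e-12 for v in centres]
            memberships.append(
                [1.0 / sum((dist[i] / dist[k]) ** power for k in range(c))
                 for i in range(c)]
            )
        # Update each centre as the membership-weighted mean of the data.
        centres = [
            sum(u[i] ** m * x for u, x in zip(memberships, data))
            / sum(u[i] ** m for u in memberships)
            for i in range(c)
        ]
    return sorted(centres)


def is_robust(data, starts, tol=1e-3):
    """Check that all starting configurations reach the same centres."""
    solutions = [fuzzy_c_means_1d(data, s) for s in starts]
    first = solutions[0]
    return all(abs(a - b) < tol
               for sol in solutions[1:] for a, b in zip(first, sol))


# Hypothetical measurements forming two well-separated groups.
values = [0.9, 1.0, 1.1, 4.8, 5.0, 5.2]
starts = [(0.0, 6.0), (2.0, 4.0), (0.9, 5.2)]
print(fuzzy_c_means_1d(values, starts[0]))
print(is_robust(values, starts))
```

Repeating the check for different numbers of clusters would be one simple way to compare candidate solutions alongside the cluster validity indices the abstract mentions.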

  8. Whole-Genome Sequencing and Comparative Genome Analysis of Bacillus subtilis Strains Isolated from Non-Salted Fermented Soybean Foods.

    PubMed

    Kamada, Mayumi; Hase, Sumitaka; Fujii, Kazushi; Miyake, Masato; Sato, Kengo; Kimura, Keitarou; Sakakibara, Yasubumi

    2015-01-01

    Bacillus subtilis is the main component in the fermentation of soybeans. To investigate the genetics of the soybean-fermenting B. subtilis strains and its relationship with the productivity of extracellular poly-γ-glutamic acid (γPGA), we sequenced the whole genome of eight B. subtilis strains isolated from non-salted fermented soybean foods in Southeast Asia. Assembled nucleotide sequences were compared with those of a natto (fermented soybean food) starter strain B. subtilis BEST195 and the laboratory standard strain B. subtilis 168 that is incapable of γPGA production. Detected variants were investigated in terms of insertion sequences, biotin synthesis, production of subtilisin NAT, and regulatory genes for γPGA synthesis, which were related to the fermentation process. Comparing genome sequences, we found that the strains that produce γPGA have a deletion in a protein that constitutes the flagellar basal body, and this deletion was not found in the non-producing strains. We further identified diversity in variants of the bio operon, which is responsible for the biotin auxotrophism of the natto starter strains. Phylogenetic analysis using multilocus sequencing typing revealed that the B. subtilis strains isolated from the non-salted fermented soybeans were not clustered together, while the natto-fermenting strains were tightly clustered; this analysis also suggested that the strain isolated from "Tua Nao" of Thailand traces a different evolutionary process from other strains.

  9. Morphological and Inter Simple Sequence Repeat (ISSR) markers analyses of Corynespora cassiicola isolates from rubber plantations in Malaysia.

    PubMed

    Nghia, Nguyen Anh; Kadir, Jugah; Sunderasan, E; Puad Abdullah, Mohd; Malik, Adam; Napis, Suhaimi

    2008-10-01

    Morphological features and Inter Simple Sequence Repeat (ISSR) polymorphism were employed to analyse 21 Corynespora cassiicola isolates obtained from a number of Hevea clones grown in rubber plantations in Malaysia. The C. cassiicola isolates used in this study were collected from several states in Malaysia from 1998 to 2005. The morphology of the isolates was characteristic of that previously described for C. cassiicola. Variations in colony and conidial morphology were observed not only among isolates but also within a single isolate with no inclination to either clonal or geographical origin of the isolates. ISSR analysis delineated the isolates into two distinct clusters. The dendrogram created from UPGMA analysis based on Nei and Li's coefficient (calculated from the binary matrix data of 106 amplified DNA bands generated from 8 ISSR primers) showed that cluster 1 encompasses 12 isolates from the states of Johor and Selangor (this cluster was further split into 2 sub clusters, 1A and 1B, with sub cluster 1B consisting of a unique isolate, CKT05D), while cluster 2 comprises 9 isolates that were obtained from the other states. Detached leaf assay performed on selected Hevea clones showed that the pathogenicity of representative isolates from cluster 1 (with the exception of CKT05D) resembled that of race 1; and isolates in cluster 2 showed pathogenicity similar to race 2 of the fungus that was previously identified in Malaysia. The isolate CKT05D from sub cluster 1B showed pathogenicity dissimilar to either race 1 or race 2.
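
The dendrogram construction named in this abstract (UPGMA on Nei and Li's coefficient computed from a binary band matrix) can be sketched as follows. The coefficient is the standard 2 × shared bands / (bands in x + bands in y); everything else is an assumption for illustration, including the function names, the use of 1 - similarity as the distance, and the three made-up band profiles, which are not data from the study.

```python
def nei_li_similarity(x, y):
    """Nei and Li's coefficient for two binary band profiles (1 = band present)."""
    shared = sum(1 for a, b in zip(x, y) if a and b)
    total = sum(x) + sum(y)
    return 2.0 * shared / total if total else 0.0


def upgma(dist, labels):
    """Build a nested-tuple tree by average-linkage (UPGMA) clustering."""
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (tree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of current clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        for c in clusters:
            dac = d.pop((min(a, c), max(a, c)))
            dbc = d.pop((min(b, c), max(b, c)))
            # Size-weighted average distance to the merged cluster.
            d[(c, next_id)] = (size_a * dac + size_b * dbc) / (size_a + size_b)
        del d[(a, b)]
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical band profiles for three isolates.
profiles = {
    "A": [1, 1, 1, 0, 0, 1],
    "B": [1, 1, 1, 0, 1, 1],
    "C": [0, 0, 1, 1, 1, 0],
}
names = list(profiles)
distances = [[1.0 - nei_li_similarity(profiles[p], profiles[q]) for q in names]
             for p in names]
print(upgma(distances, names))
```

The nested tuples in the result record the merge order, so the innermost pair is the most similar pair of isolates.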

  10. Comparative Genomic Hybridization Analysis of Two Predominant Nordic Group I (Proteolytic) Clostridium botulinum Type B Clusters

    PubMed Central

    Lindström, Miia; Hinderink, Katja; Somervuo, Panu; Kiviniemi, Katri; Nevas, Mari; Chen, Ying; Auvinen, Petri; Carter, Andrew T.; Mason, David R.; Peck, Michael W.; Korkeala, Hannu

    2009-01-01

    Comparative genomic hybridization analysis of 32 Nordic group I Clostridium botulinum type B strains isolated from various sources revealed two homogeneous clusters, clusters BI and BII. The type B strains differed from reference strain ATCC 3502 by 413 coding sequence (CDS) probes, sharing 88% of all the ATCC 3502 genes represented on the microarray. The two Nordic type B clusters differed from each other by their response to 145 CDS probes related mainly to transport and binding, adaptive mechanisms, fatty acid biosynthesis, the cell membranes, bacteriophages, and transposon-related elements. The most prominent differences between the two clusters were related to resistance to toxic compounds frequently found in the environment, such as arsenic and cadmium, reflecting different adaptive responses in the evolution of the two clusters. Other relatively variable CDS groups were related to surface structures and the gram-positive cell wall, suggesting that the two clusters possess different antigenic properties. All the type B strains carried CDSs putatively related to capsule formation, which may play a role in adaptation to different environmental and clinical niches. Sequencing showed that representative strains of the two type B clusters both carried subtype B2 neurotoxin genes. As many of the type B strains studied have been isolated from foods or associated with botulism, it is expected that the two group I C. botulinum type B clusters present a public health hazard in Nordic countries. Knowing the genetic and physiological markers of these clusters will assist in targeting control measures against these pathogens. PMID:19270141

  11. Discrimination of multilocus sequence typing-based Campylobacter jejuni subgroups by MALDI-TOF mass spectrometry.

    PubMed

    Zautner, Andreas Erich; Masanta, Wycliffe Omurwa; Tareen, Abdul Malik; Weig, Michael; Lugert, Raimond; Groß, Uwe; Bader, Oliver

    2013-11-07

    Campylobacter jejuni, the most common bacterial pathogen causing gastroenteritis, shows a wide genetic diversity. Previously, we demonstrated by the combination of multi locus sequence typing (MLST)-based UPGMA-clustering and analysis of 16 genetic markers that twelve different C. jejuni subgroups can be distinguished. Among these are two prominent subgroups. The first subgroup contains the majority of hyperinvasive strains and is characterized by a dimeric form of the chemotaxis-receptor Tlp7(m+c). The second has an extended amino acid metabolism and is characterized by the presence of a periplasmic asparaginase (ansB) and gamma-glutamyl-transpeptidase (ggt). Phyloproteomic principal component analysis (PCA) hierarchical clustering of MALDI-TOF based intact cell mass spectrometry (ICMS) spectra was able to group particular C. jejuni subgroups of phylogenetically related isolates in distinct clusters. In particular, the aforementioned Tlp7(m+c)(+) and ansB+/ ggt+ subgroups could be discriminated by PCA. Overlay of ICMS spectra of all isolates led to the identification of characteristic biomarker ions for these specific C. jejuni subgroups. Thus, mass peak shifts can be used to identify the C. jejuni subgroup with an extended amino acid metabolism. Although the PCA hierarchical clustering of ICMS-spectra groups the tested isolates in a different order compared to MLST-based UPGMA-clustering, the isolates of the indicator-groups form predominantly coherent clusters. These clusters reflect phenotypic aspects better than phylogenetic clustering, indicating that the genes corresponding to the biomarker ions are phylogenetically coupled to the tested marker genes. Thus, PCA clustering could be an additional tool for analyzing the relatedness of bacterial isolates.

  12. The biological characteristics of predominant strains of HIV-1 genotype: modeling of HIV-1 infection among men who have sex with men.

    PubMed

    Dai, Di; Shang, Hong; Han, Xiao-Xu; Zhao, Bin; Liu, Jing; Ding, Hai-Bo; Xu, Jun-Jie; Chu, Zhen-Xing

    2015-04-01

    To investigate the molecular subtypes of prevalent HIV-1 strains and characterize the genetics of dominant strains among men who have sex with men. Molecular epidemiology surveys in this study concentrated on the prevalent HIV-1 strains in Liaoning province by year. 229 adult patients infected with HIV-1 and part of a high-risk group of men who have sex with men were recruited. Reverse transcription and nested PCR amplification were performed. Sequencing reactions were conducted and edited, followed by codon-based alignment. NJ phylogenetic tree analyses detected two distinct CRF01_AE phylogenetic clusters, designated clusters 1 and 2. Clusters 1 and 2 accounted for 12.8% and 84.2% of sequences in the pol gene and 17.6% and 73.1% of sequences in the env gene, respectively. Another six samples were distributed on other phylogenetic clusters. Cluster 1 increased significantly from 5.6% to 20.0%, but cluster 2 decreased from 87.5% to 80.0%. Genetic distance analysis indicated that CRF01_AE cluster 1 in Liaoning was homologous to epidemic CRF01_AE strains, but CRF01_AE cluster 2 was different from other scattered strains. Additionally, significant differences were found in tetra-peptide motifs at the tip of V3 loop between cluster 1 and 2; however, differences in coreceptor usage were not detected. This study shows that the subtype CRF01_AE strain may be the most prevalent epidemic strain among men who have sex with men. Genetic characteristics of the subtype CRF01_AE cluster strain in Liaoning showed homology to the prevalent strains of men who have sex with men in other parts of China. © 2015 Wiley Periodicals, Inc.

  13. Identification of the Coumermycin A1 Biosynthetic Gene Cluster of Streptomyces rishiriensis DSM 40489

    PubMed Central

    Wang, Zhao-Xin; Li, Shu-Ming; Heide, Lutz

    2000-01-01

    The biosynthetic gene cluster of the aminocoumarin antibiotic coumermycin A1 was cloned by screening of a cosmid library of Streptomyces rishiriensis DSM 40489 with heterologous probes from a dTDP-glucose 4,6-dehydratase gene, involved in deoxysugar biosynthesis, and from the aminocoumarin resistance gyrase gene gyrBr. Sequence analysis of a 30.8-kb region upstream of gyrBr revealed the presence of 28 complete open reading frames (ORFs). Fifteen of the identified ORFs showed, on average, 84% identity to corresponding ORFs in the biosynthetic gene cluster of novobiocin, another aminocoumarin antibiotic. Possible functions of 17 ORFs in the biosynthesis of coumermycin A1 could be assigned by comparison with sequences in GenBank. Experimental proof for the function of the identified gene cluster was provided by an insertional gene inactivation experiment, which resulted in an abolishment of coumermycin A1 production. PMID:11036020

  14. Leptospira interrogans serovars Bratislava and Muenchen animal infections: Implications for epidemiology and control.

    PubMed

    Arent, Z; Frizzell, C; Gilmore, C; Allen, A; Ellis, W A

    2016-07-15

    Strains of Leptospira interrogans belonging to two very closely related serovars - Bratislava and Muenchen - have been associated with disease in domestic animals, in particular pigs, but also in horses and dogs. Similar strains have also been recovered from various wildlife species. Their epidemiology is poorly understood. Two hundred and forty seven such isolates, from UK domestic animal and wildlife species, were examined by restriction endonuclease analysis in an attempt to elucidate their epidemiology. A representative sub-sample of 65 of these isolates was further examined by multiple-locus variable-number tandem repeat analysis and 22 by secY sequencing. Ten restriction pattern types were identified. The majority of isolates fell into one of three restriction endonuclease analysis pattern types designated B2a, B2b and M2a. B2a was ubiquitous and was isolated from 10 species and represented the majority of the horse and all dog isolates. B2b was very different, being isolated only from pigs, indicating that this type was maintained by pigs. The pattern M2a was reported for the majority of isolates from pigs but also was common in small rodents isolates. Five restriction pattern types were found only in wildlife suggesting that they are unlikely to pose a disease threat to domestic animals. Multiple-locus variable-number tandem repeat analysis identified six clusters. The REA types B2a and B2b were all found in one MLVA cluster while the majority of the M2a strains examined occurred in another cluster. The secY sequencing detected only one sequence type, clustered with other serovars of Leptospira interrogans. Copyright © 2016 Elsevier B.V. All rights reserved.

  15. Mechanisms of haplotype divergence at the RGA08 nucleotide-binding leucine-rich repeat gene locus in wild banana (Musa balbisiana)

    PubMed Central

    2010-01-01

    Background Comparative sequence analysis of complex loci such as resistance gene analog clusters allows estimating the degree of sequence conservation and mechanisms of divergence at the intraspecies level. In banana (Musa sp.), two diploid wild species Musa acuminata (A genome) and Musa balbisiana (B genome) contribute to the polyploid genome of many cultivars. The M. balbisiana species is associated with vigour and tolerance to pests and disease and little is known on the genome structure and haplotype diversity within this species. Here, we compare two genomic sequences of 253 and 223 kb corresponding to two haplotypes of the RGA08 resistance gene analog locus in M. balbisiana "Pisang Klutuk Wulung" (PKW). Results Sequence comparison revealed two regions of contrasting features. The first is a highly colinear gene-rich region where the two haplotypes diverge only by single nucleotide polymorphisms and two repetitive element insertions. The second corresponds to a large cluster of RGA08 genes, with 13 and 18 predicted RGA genes and pseudogenes spread over 131 and 152 kb respectively on each haplotype. The RGA08 cluster is enriched in repetitive element insertions, in duplicated non-coding intergenic sequences including low complexity regions and shows structural variations between haplotypes. Although some allelic relationships are retained, a large diversity of RGA08 genes occurs in this single M. balbisiana genotype, with several RGA08 paralogs specific to each haplotype. The RGA08 gene family has evolved by mechanisms of unequal recombination, intragenic sequence exchange and diversifying selection. An unequal recombination event taking place between duplicated non-coding intergenic sequences resulted in a different RGA08 gene content between haplotypes pointing out the role of such duplicated regions in the evolution of RGA clusters. Based on the synonymous substitution rate in coding sequences, we estimated a 1 million year divergence time for these M. balbisiana haplotypes. Conclusions A large RGA08 gene cluster identified in wild banana corresponds to a highly variable genomic region between haplotypes surrounded by conserved flanking regions. High level of sequence identity (70 to 99%) of the genic and intergenic regions suggests a recent and rapid evolution of this cluster in M. balbisiana. PMID:20637079

  16. Reconstructing evolutionary trees in parallel for massive sequences.

    PubMed

    Zou, Quan; Wan, Shixiang; Zeng, Xiangxiang; Ma, Zhanshan Sam

    2017-12-14

    Building evolutionary trees for massive unaligned DNA sequences is both crucial and challenging, and reconstruction for ultra-large sequence sets is especially hard. Massive multiple sequence alignment is likewise demanding in time and space. Hadoop and Spark, which were developed recently, offer new prospects for these classical computational biology problems. In this paper, we tried to solve multiple sequence alignment and evolutionary reconstruction in parallel. HPTree, which is developed in this paper, can process big DNA sequence files quickly. It works well on files larger than 1 GB and performs better than other evolutionary reconstruction tools. Users can run HPTree to reconstruct evolutionary trees on computer clusters or cloud platforms (e.g. Amazon Cloud). HPTree could support population evolution research and metagenomic analysis. We employ the Hadoop and Spark platforms to design an evolutionary tree reconstruction software tool for massive unaligned DNA sequences. Clustering and multiple sequence alignment are done in parallel, and the neighbour-joining method is used for tree building. The software and its source code are available at http://lab.malab.cn/soft/HPtree/ .
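
    The abstract above names neighbour-joining as the tree-building method. As a rough illustration only, the sketch below shows the textbook pair-selection step of neighbour-joining (the Q-matrix criterion) on a small distance matrix. It is not HPTree's code: the function names `nj_q_matrix` and `nj_pick_pair` and the five-taxon example matrix are assumptions introduced here, and nothing in it reflects the Hadoop/Spark parallelisation the authors describe.

```python
def nj_q_matrix(dist):
    """Return the neighbour-joining Q-matrix for a symmetric distance matrix.

    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
    return q


def nj_pick_pair(dist):
    """Return the index pair (i, j), i < j, with the smallest Q value.

    Ties are broken in favour of the first pair encountered.
    """
    q = nj_q_matrix(dist)
    n = len(dist)
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            if best is None or q[i][j] < q[best[0]][best[1]]:
                best = (i, j)
    return best


# Hypothetical distances between five taxa (a, b, c, d, e).
example = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair = nj_pick_pair(example)  # taxa joined first in this example
```

    A full neighbour-joining run would repeat this step, replacing the chosen pair with a new internal node and recomputing distances until only two nodes remain.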

  17. Analysis of 16S libraries of mouse gastrointestinal microflora reveals a large new group of mouse intestinal bacteria.

    PubMed

    Salzman, Nita H; de Jong, Hendrik; Paterson, Yvonne; Harmsen, Hermie J M; Welling, Gjalt W; Bos, Nicolaas A

    2002-11-01

    Total genomic DNA from samples of intact mouse small intestine, large intestine, caecum and faeces was used as template for PCR amplification of 16S rRNA gene sequences with conserved bacterial primers. Phylogenetic analysis of the amplification products revealed 40 unique 16S rDNA sequences. Of these sequences, 25% (10/40) corresponded to described intestinal organisms of the mouse, including Lactobacillus spp., Helicobacter spp., segmented filamentous bacteria and members of the altered Schaedler flora (ASF360, ASF361, ASF502 and ASF519); 75% (30/40) represented novel sequences. A large number (11/40) of the novel sequences revealed a new operational taxonomic unit (OTU) belonging to the Cytophaga-Flavobacter-Bacteroides phylum, which the authors named 'mouse intestinal bacteria'. 16S rRNA probes were developed for this new OTU. Upon analysis of the novel sequences, eight were found to cluster within the Eubacterium rectale-Clostridium coccoides group and three clustered within the Bacteroides group. One of the novel sequences was distantly related to Verrucomicrobium spinosum and one was distantly related to Bacillus mycoides. Oligonucleotide probes specific for the 16S rRNA of these novel clones were generated. Using a combination of four previously described and four newly designed probes, approximately 80% of bacteria recovered from the murine large intestine and 71% of bacteria recovered from the murine caecum could be identified by fluorescence in situ hybridization (FISH).

  18. Phylodynamic Analysis Reveals CRF01_AE Dissemination between Japan and Neighboring Asian Countries and the Role of Intravenous Drug Use in Transmission

    PubMed Central

    Shiino, Teiichiro; Hattori, Junko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru

    2014-01-01

    Background One major circulating HIV-1 subtype in Southeast Asian countries is CRF01_AE, but little is known about its epidemiology in Japan. We conducted a molecular phylodynamic study of patients newly diagnosed with CRF01_AE from 2003 to 2010. Methods Plasma samples from patients registered in the Japanese Drug Resistance HIV-1 Surveillance Network were analyzed for protease-reverse transcriptase sequences; all sequences underwent subtyping and phylogenetic analysis using distance-matrix-based, maximum likelihood and Bayesian coalescent Markov Chain Monte Carlo (MCMC) phylogenetic inferences. Transmission clusters were identified using the interior branch test and depth-first searches for sub-tree partitions. Times of most recent common ancestor (tMRCAs) of significant clusters were estimated using Bayesian MCMC analysis. Results Among 3618 patients registered in our network, 243 were infected with CRF01_AE. The majority of individuals with CRF01_AE were Japanese, predominantly male, and reported heterosexual contact as their risk factor. We found 5 large clusters with ≥5 members and 25 small clusters consisting of pairs of individuals with highly related CRF01_AE strains. The earliest cluster showed a tMRCA of 1996, and consisted of individuals whose known risk was heterosexual contact. The other four large clusters showed later tMRCAs between 2000 and 2002, with members including intravenous drug users (IVDU) and non-Japanese, but not men who have sex with men (MSM). In contrast, small clusters included a high frequency of individuals reporting MSM risk factors. Phylogenetic analysis also showed that some individuals were infected with HIV strains spread in East and South-eastern Asian countries. Conclusions Introduction of CRF01_AE viruses into Japan is estimated to have occurred in the 1990s. CRF01_AE spread via heterosexual behavior, then among persons connected with non-Japanese, IVDU, and MSM. Phylogenetic analysis demonstrated that some viral variants are largely restricted to Japan, while others have a broad geographic distribution. PMID:25025900

  19. Improved efficiency in amplification of Escherichia coli O-antigen gene clusters using genome-wide sequence comparison

    USDA-ARS's Scientific Manuscript database

    Background: In many bacteria including E. coli, genes encoding O-antigens are clustered in the chromosome, with a 39-bp JUMPstart sequence and gnd gene located upstream and downstream of the cluster, respectively. For determining the DNA sequence of the E. coli O-antigen gene cluster, one set of P...

  20. Analysis of the gene cluster encoding toluene/o-xylene monooxygenase from Pseudomonas stutzeri OX1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bertoni, G.; Martino, M.; Galli, E.

    The toluene/o-xylene monooxygenase cloned from Pseudomonas stutzeri OX1 displays a very broad range of substrates and a very peculiar regioselectivity, because it is able to hydroxylate more than one position on the aromatic ring of several hydrocarbons and phenols. The nucleotide sequence of the gene cluster coding for this enzymatic system has been determined. The sequence analysis revealed the presence of six open reading frames (ORFs) homologous to other genes clustered in operons coding for multicomponent monooxygenases found in benzene- and toluene-degradative pathways cloned from Pseudomonas strains. Significant similarities were also found with multicomponent monooxygenase systems for phenol, methane, alkene, and dimethyl sulfide cloned from different bacterial strains. The knockout of each ORF and complementation with the wild-type allele indicated that all six ORFs are essential for the full activity of the toluene/o-xylene monooxygenase in Escherichia coli. This analysis also shows that despite its activity on both hydrocarbons and phenols, toluene/o-xylene monooxygenase belongs to a toluene multicomponent monooxygenase subfamily rather than to the monooxygenases active on phenols.

  1. Phylogeny of isolates of Prunus necrotic ringspot virus from the Ilarvirus Ringtest and identification of group-specific features.

    PubMed

    Hammond, R W

    2003-06-01

    Isolates of Prunus necrotic ringspot virus (PNRSV) were examined to establish the level of naturally occurring sequence variation in the coat protein (CP) gene and to identify group-specific genome features that may prove valuable for the generation of diagnostic reagents. Phylogenetic analysis of a 452 bp sequence of 68 virus isolates, 20 obtained from the European Union Ilarvirus Ringtest held in October 1998, confirmed the clustering of the isolates into three distinct groups. Although no correlation was found between the sequence and host or geographic origin, there was a general trend for severe isolates to cluster into one group. Group-specific features have been identified for discrimination between virus strains.

  2. New Insights into the Diversity of the Genus Faecalibacterium.

    PubMed

    Benevides, Leandro; Burman, Sriti; Martin, Rebeca; Robert, Véronique; Thomas, Muriel; Miquel, Sylvie; Chain, Florian; Sokol, Harry; Bermudez-Humaran, Luis G; Morrison, Mark; Langella, Philippe; Azevedo, Vasco A; Chatel, Jean-Marc; Soares, Siomar

    2017-01-01

    Faecalibacterium prausnitzii is a commensal bacterium, ubiquitous in the gastrointestinal tracts of animals and humans. This species is a functionally important member of the microbiota and studies suggest it has an impact on the physiology and health of the host. F. prausnitzii is the only identified species in the genus Faecalibacterium, but a recent study clustered strains of this species in two different phylogroups. Here, we propose the existence of distinct species in this genus through the use of comparative genomics. Briefly, we performed analyses of 16S rRNA gene phylogeny, phylogenomics, whole genome Multi-Locus Sequence Typing (wgMLST), Average Nucleotide Identity (ANI), gene synteny, and pangenome to better elucidate the phylogenetic relationships among strains of Faecalibacterium. For this, we used 12 newly sequenced, assembled, and curated genomes of F. prausnitzii, which were isolated from feces of healthy volunteers from France and Australia, and combined these with published data from 5 strains downloaded from public databases. The phylogenetic analysis of the 16S rRNA sequences, together with the wgMLST profiles and a phylogenomic tree based on comparisons of genome similarity, all supported the clustering of Faecalibacterium strains in different genospecies. Additionally, the global analysis of gene synteny among all strains showed a highly fragmented profile, whereas the intra-cluster analyses revealed larger and more conserved collinear blocks. Finally, ANI analysis substantiated the presence of three distinct clusters (A, B, and C) composed of five, four, and four strains, respectively. The pangenome analysis of each cluster corroborated the classification of these clusters into three distinct species, each containing less variability than that found within the global pangenome of all strains. Here, we propose that comparison of pangenome subsets and their associated α values may be used as an alternative approach, together with ANI, in the in silico classification of new species. Altogether, our results provide evidence not only for the reconsideration of the phylogenetic and genomic relatedness among strains currently assigned to F. prausnitzii, but also for the need for lineage (strain-based) differentiation of this taxon to better define how specific members might be associated with positive or negative host interactions.

  3. Draft Genome Sequencing and Comparative Analysis of Aspergillus sojae NBRC4239

    PubMed Central

    Sato, Atsushi; Oshima, Kenshiro; Noguchi, Hideki; Ogawa, Masahiro; Takahashi, Tadashi; Oguma, Tetsuya; Koyama, Yasuji; Itoh, Takehiko; Hattori, Masahira; Hanya, Yoshiki

    2011-01-01

    We sequenced the genome of the filamentous fungus Aspergillus sojae NBRC4239, isolated from the koji used to prepare Japanese soy sauce. We used 454 pyrosequencing technology and investigated the genome with respect to enzymes and secondary metabolites in comparison with other sequenced Aspergilli. Assembly of 454 reads generated a non-redundant sequence of 39.5 Mb possessing 13 033 putative genes and 65 scaffolds composed of 557 contigs. Of the 2847 open reading frames with Pfam domain scores of >150 found in A. sojae NBRC4239, 81.7% had a high degree of similarity with the genes of A. oryzae. Comparative analysis identified serine carboxypeptidase and aspartic protease genes unique to A. sojae NBRC4239. While A. oryzae possessed three copies of the α-amylase gene, A. sojae NBRC4239 possessed only a single copy. Comparison of 56 gene clusters for secondary metabolites between A. sojae NBRC4239 and A. oryzae revealed that 24 clusters were conserved, whereas 32 clusters differed between them; the differences included a deletion of 18 508 bp containing mfs1, mao1, dmaT, and pks-nrps for cyclopiazonic acid (CPA) biosynthesis, explaining the lack of CPA production in A. sojae. The A. sojae NBRC4239 genome data will be useful to characterize functional features of the koji moulds used in Japanese industries. PMID:21659486

  4. Comparison of cluster-based and source-attribution methods for estimating transmission risk using large HIV sequence databases.

    PubMed

    Le Vu, Stéphane; Ratmann, Oliver; Delpech, Valerie; Brown, Alison E; Gill, O Noel; Tostevin, Anna; Fraser, Christophe; Volz, Erik M

    2018-06-01

    Phylogenetic clustering of HIV sequences from a random sample of patients can reveal epidemiological transmission patterns, but interpretation is hampered by limited theoretical support, and the statistical properties of clustering analysis remain poorly understood. Alternatively, source attribution methods allow fitting of HIV transmission models and thereby quantify aspects of disease transmission. A simulation study was conducted to assess error rates of clustering methods for detecting transmission risk factors. We modeled HIV epidemics among men having sex with men and generated phylogenies comparable to those that can be obtained from HIV surveillance data in the UK. Clustering and source attribution approaches were applied to evaluate their ability to identify patient attributes as transmission risk factors. We find that commonly used methods show a misleading association between cluster size or odds of clustering and covariates that are correlated with time since infection, regardless of their influence on transmission. Clustering methods usually have higher error rates and lower sensitivity than the source attribution method for identifying transmission risk factors, but neither method provides robust estimates of transmission risk ratios. The source attribution method can alleviate drawbacks of phylogenetic clustering, but formal population genetic modeling may be required to estimate quantitative transmission risk factors. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  5. A segmentation method for lung nodule image sequences based on superpixels and density-based spatial clustering of applications with noise

    PubMed Central

    Zhang, Wei; Zhang, Xiaolong; Qiang, Yan; Tian, Qi; Tang, Xiaoxian

    2017-01-01

    The fast and accurate segmentation of lung nodule image sequences is the basis of subsequent processing and diagnostic analyses. However, previous research investigating nodule segmentation algorithms cannot entirely segment cavitary nodules, and the segmentation of juxta-vascular nodules is inaccurate and inefficient. To solve these problems, we propose a new method for the segmentation of lung nodule image sequences based on superpixels and density-based spatial clustering of applications with noise (DBSCAN). First, our method uses three-dimensional computed tomography image features of the average intensity projection combined with multi-scale dot enhancement for preprocessing. Hexagonal clustering and morphological optimized sequential linear iterative clustering (HMSLIC) for sequence image oversegmentation is then proposed to obtain superpixel blocks. The adaptive weight coefficient is then constructed to calculate the distance required between superpixels to achieve precise lung nodules positioning and to obtain the subsequent clustering starting block. Moreover, by fitting the distance and detecting the change in slope, an accurate clustering threshold is obtained. Thereafter, a fast DBSCAN superpixel sequence clustering algorithm, which is optimized by the strategy of only clustering the lung nodules and adaptive threshold, is then used to obtain lung nodule mask sequences. Finally, the lung nodule image sequences are obtained. The experimental results show that our method rapidly, completely and accurately segments various types of lung nodule image sequences. PMID:28880916
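
    For readers unfamiliar with the clustering step named in the abstract above, the following is a minimal sketch of plain DBSCAN on 2-D points. It is not the authors' superpixel pipeline: there is no HMSLIC oversegmentation, no adaptive weight coefficient and no adaptive threshold, and the names `region_query`, `dbscan`, `eps` and `min_pts` as well as the sample points are assumptions made for illustration.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, idx, eps):
    """Indices of all points within distance eps of points[idx] (itself included)."""
    return [j for j, p in enumerate(points) if math.dist(points[idx], p) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


# Hypothetical data: two tight groups and one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
sample_labels = dbscan(sample, eps=1.5, min_pts=3)
```

    In the paper's setting the items being clustered would be superpixel blocks with a purpose-built distance, rather than raw coordinates with Euclidean distance as assumed here.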

  6. Naturally selected hepatitis C virus polymorphisms confer broad neutralizing antibody resistance.

    PubMed

    Bailey, Justin R; Wasilewski, Lisa N; Snider, Anna E; El-Diwany, Ramy; Osburn, William O; Keck, Zhenyong; Foung, Steven K H; Ray, Stuart C

    2015-01-01

    For hepatitis C virus (HCV) and other highly variable viruses, broadly neutralizing mAbs are an important guide for vaccine development. The development of resistance to anti-HCV mAbs is poorly understood, in part due to a lack of neutralization testing against diverse, representative panels of HCV variants. Here, we developed a neutralization panel expressing diverse, naturally occurring HCV envelopes (E1E2s) and used this panel to characterize neutralizing breadth and resistance mechanisms of 18 previously described broadly neutralizing anti-HCV human mAbs. The observed mAb resistance could not be attributed to polymorphisms in E1E2 at known mAb-binding residues. Additionally, hierarchical clustering analysis of neutralization resistance patterns revealed relationships between mAbs that were not predicted by prior epitope mapping, identifying 3 distinct neutralization clusters. Using this clustering analysis and envelope sequence data, we identified polymorphisms in E2 that confer resistance to multiple broadly neutralizing mAbs. These polymorphisms, which are not at mAb contact residues, also conferred resistance to neutralization by plasma from HCV-infected subjects. Together, our method of neutralization clustering with sequence analysis reveals that polymorphisms at noncontact residues may be a major immune evasion mechanism for HCV, facilitating viral persistence and presenting a challenge for HCV vaccine development.

  7. Insights into magmatic processes and hydrothermal alteration of in situ superfast spreading ocean crust at ODP/IODP site 1256 from a cluster analysis of rock magnetic properties

    NASA Astrophysics Data System (ADS)

    Dekkers, Mark J.; Heslop, David; Herrero-Bervera, Emilio; Acton, Gary; Krasa, David

    2014-08-01

    We analyze magnetic properties from Ocean Drilling Program (ODP)/Integrated ODP (IODP) Hole 1256D (6°44.1' N, 91°56.1' W) on the Cocos Plate in ~15.2 Ma oceanic crust generated by superfast seafloor spreading, the only drill hole that has sampled all three oceanic crust layers in a tectonically undisturbed setting. Fuzzy c-means cluster analysis and nonlinear mapping are utilized to study down-hole trends in the ratio of the saturation remanent magnetization and the saturation magnetization, the coercive force, the ratio of the remanent coercive force and coercive force, the low-field magnetic susceptibility, and the Curie temperature, to evaluate the effects of magmatic and hydrothermal processes on magnetic properties. A statistically robust five-cluster solution separates the data predominantly into three clusters that express increasing hydrothermal alteration of the lavas, which differ from two distinct clusters mainly representing the dikes and gabbros. Extensive alteration can obliterate magnetic property differences between lavas, dikes, and gabbros. The imprint of thermochemical alteration on the iron-titanium oxides is only partially related to the porosity of the rocks. Thus, the analysis complements interpretation based on electrofacies analysis. All clusters display rock magnetic characteristics compatible with an ability to retain a stable natural remanent magnetization, suggesting that the entire sampled sequence of ocean crust can contribute to marine magnetic anomalies. Paleointensity determination is difficult because of the propensity of oxyexsolution during laboratory heating and/or the presence of intergrowths. The upper part of the extrusive sequence, the granoblastic dikes, and moderately altered gabbros may contain a comparatively uncontaminated thermoremanent magnetization.
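
    The abstract above relies on fuzzy c-means clustering. As a hedged illustration of how that technique assigns graded memberships, here is a small sketch for one-dimensional data with the standard membership and centre update rules. The study itself clusters several magnetic parameters at once; the function names, the fuzzifier default `m=2.0`, the fixed iteration count and the toy values below are all assumptions, not details taken from the paper.

```python
def memberships(data, centers, m=2.0):
    """Fuzzy membership of every data point in every cluster (rows sum to 1)."""
    power = 2.0 / (m - 1.0)
    u = []
    for x in data:
        d = [abs(x - c) for c in centers]
        if 0.0 in d:
            # Point coincides with one or more centres: share membership among them.
            hits = [1.0 if di == 0.0 else 0.0 for di in d]
            total = sum(hits)
            u.append([h / total for h in hits])
        else:
            inv = [di ** -power for di in d]
            total = sum(inv)
            u.append([v / total for v in inv])
    return u


def update_centers(data, u, m=2.0):
    """New centres as means of the data weighted by membership ** m."""
    centers = []
    for i in range(len(u[0])):
        weights = [row[i] ** m for row in u]
        centers.append(sum(w * x for w, x in zip(weights, data)) / sum(weights))
    return centers


def fuzzy_c_means(data, centers, m=2.0, iterations=50):
    """Alternate membership and centre updates for a fixed number of iterations."""
    for _ in range(iterations):
        u = memberships(data, centers, m)
        centers = update_centers(data, u, m)
    return centers, memberships(data, centers, m)


# Hypothetical 1-D measurements forming two groups, with rough starting centres.
values = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
final_centers, final_u = fuzzy_c_means(values, [0.0, 6.0])
```

    Unlike hard clustering, each sample keeps a membership in every cluster, which is what allows intermediate (for example, partly altered) samples to sit between groups.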

  8. Analysis of the intergenic region of tomato spotted wilt Tospovirus medium RNA segment.

    PubMed

    Bhat, A I; Pappu, S S; Pappu, H R; Deom, C M; Culbreath, A K

    1999-06-01

    The intergenic region (IGR) of the medium (M) RNA of tomato spotted wilt Tospovirus (TSWV) isolates naturally infecting peanut (groundnut), pepper, potato, stokesia, tobacco and watermelon in Georgia (GA) and a peanut isolate from Florida (FL) was cloned and sequenced. The IGR sequences were compared with one another and with respective M RNA IGRs of TSWV isolates from Brazil and Japan and other tospoviruses. The length of M IGR of GA and FL isolates varied from 271 to 277 nucleotides. The M IGRs of TSWV from potato and stokesia, and tobacco and watermelon were identical with each other in their length and sequence. IGR sequences were more conserved (95-100%) among the populations of TSWV from GA and FL, than when compared with those of TSWV isolates from other countries (83-94%). The conserved motif (CAAACTTTGG) present in the IGRs of both M and small (S) RNAs of a Brazilian isolate of TSWV was also conserved in the isolates studied. Cluster analysis of the IGR sequences showed that all GA and FL isolates are closely clustered and are distinct from the TSWV isolates from other countries as well as from other tospoviruses.

  9. 16S-23S rRNA gene internal transcribed spacer sequences for analysis of the phylogenetic relationships among species of the genus Porphyromonas.

    PubMed

    Conrads, Georg; Citron, Diane M; Tyrrell, Kerin L; Horz, Hans-Peter; Goldstein, Ellie J C

    2005-03-01

    The 16S-23S rRNA gene internal transcribed spacer (ITS) regions of 11 reference strains of Porphyromonas species, together with Bacteroides distasonis and Tannerella forsythensis, were analysed to examine interspecies relationships. Compared with the phylogenetic tree generated using 16S rRNA gene sequences, the resolution of the ITS sequence-based tree was higher, but species positioning and clustering were similar with both approaches. The recent separation of Porphyromonas gulae and Porphyromonas gingivalis into distinct species was confirmed by the ITS data. In addition, analysis of the ITS sequences of 24 clinical isolates of Porphyromonas asaccharolytica plus the type strain ATCC 25260(T) divided the sequences into two clusters, of which one was alpha-fucosidase-positive (like the type strain) while the other was alpha-fucosidase-negative. The latter resembled the previously studied unusual extra-oral isolates of 'Porphyromonas endodontalis-like organisms' (PELOs) which could therefore be called 'Porphyromonas asaccharolytica-like organisms' (PALOs), based on the genetic identification. Moreover, the proposal of alpha-fucosidase-negative P. asaccharolytica strains as a new species should also be considered.

  10. DNA sequence analysis of the photosynthesis region of Rhodobacter sphaeroides 2.4.1.

    PubMed

    Choudhary, M; Kaplan, S

    2000-02-15

    This paper describes the DNA sequence of the photosynthesis region of Rhodobacter sphaeroides 2.4.1 (T). The photosynthesis gene cluster is located within an approximately 73 kb Ase I genomic DNA fragment containing the puf, puhA, cycA and puc operons. A total of 65 open reading frames (ORFs) have been identified, of which 61 showed significant similarity to genes/proteins of other organisms while only four did not reveal any significant sequence similarity to any gene/protein sequences in the database. The data were compared with the corresponding genes/ORFs from a different strain of R. sphaeroides and Rhodobacter capsulatus, a close relative of R. sphaeroides. A detailed analysis of the gene organization in the photosynthesis region revealed a similar gene order in both species with some notable differences located to the pucBAC = cycA region. In addition, photosynthesis gene regulatory protein (PpsR, FNR, IHF) binding motifs in upstream sequences of a number of photosynthesis genes have been identified and shown to differ between these two species. The difference in gene organization relative to pucBAC and cycA suggests that this region originated independently of the photosynthesis gene cluster of R. sphaeroides.

  11. The sirodesmin biosynthetic gene cluster of the plant pathogenic fungus Leptosphaeria maculans.

    PubMed

    Gardiner, Donald M; Cozijnsen, Anton J; Wilson, Leanne M; Pedras, M Soledade C; Howlett, Barbara J

    2004-09-01

    Sirodesmin PL is a phytotoxin produced by the fungus Leptosphaeria maculans, which causes blackleg disease of canola (Brassica napus). This phytotoxin belongs to the epipolythiodioxopiperazine (ETP) class of toxins produced by fungi including mammalian and plant pathogens. We report the cloning of a cluster of genes with predicted roles in the biosynthesis of sirodesmin PL and show via gene disruption that one of these genes (encoding a two-module non-ribosomal peptide synthetase) is essential for sirodesmin PL biosynthesis. Of the nine genes in the cluster tested, all are co-regulated with the production of sirodesmin PL in culture. A similar cluster is present in the genome of the opportunistic human pathogen Aspergillus fumigatus and is most likely responsible for the production of gliotoxin, which is also an ETP. Homologues of the genes in the cluster were also identified in expressed sequence tags of the ETP producing fungus Chaetomium globosum. Two other fungi with publicly available genome sequences, Magnaporthe grisea and Fusarium graminearum, had similar gene clusters. A comparative analysis of all four clusters is presented. This is the first report of the genes responsible for the biosynthesis of an ETP. Copyright 2004 Blackwell Publishing Ltd

  12. Genotypes and subgenotypes of hepatitis B virus circulating in an endemic area in Peru.

    PubMed

    Ramírez-Soto, Max Carlos; Bracho, Maria Alma; González-Candelas, Fernando; Huichi-Atamari, Milagros

    2018-01-01

    Although hepatitis B virus (HBV) infection is still endemic in Abancay, Peru, two decades after vaccination against hepatitis B started in the area, little is known about the diversity and circulation of genotypes and subgenotypes of the virus. To identify the genotypes and subtypes of HBV circulating in Abancay, complete genome sequences of 11 treatment-naive HBV-infected patients were obtained, and phylogenetic analysis was conducted with these and additional sequences from GenBank. Genotyping revealed the presence of genotype F in all the samples from Abancay. Subgenotype F1b was dominant and only one isolate belonged to subgenotype F4, which represents the first description of this subgenotype in Peru. Phylogenetic analysis revealed that most subgenotype F1b isolates from Peru clustered in a subgroup along with two sequences from Argentina, whereas two clusters with two HBV/F1b sequences each were indicative of recent epidemiological linkage, but only one could be verified by independent data. These results suggest that the HBV subgenotype F1b seems to be the predominant subgenotype in Abancay, Peru.

  13. Genome sequencing reveals complex secondary metabolome in the marine actinomycete Salinispora tropica

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Udwary, Daniel W.; Zeigler, Lisa; Asolkar, Ratnakar

    2007-05-01

    Recent fermentation studies have identified actinomycetes of the marine-dwelling genus Salinispora as prolific natural product producers. To further evaluate their biosynthetic potential, we analyzed all identifiable secondary natural product gene clusters from the recently sequenced 5,184,724 bp S. tropica CNB-440 circular genome. Our analysis shows that biosynthetic potential meets or exceeds that shown by previous Streptomyces genome sequences as well as other natural product-producing actinomycetes. The S. tropica genome features nine polyketide synthase systems of every known formally classified family, non-ribosomal peptide synthetases and several hybrid clusters. While a few clusters appear to encode molecules previously identified in Streptomyces species, the majority of the 15 biosynthetic loci are novel. Specific chemical information about putative and observed natural product molecules is presented and discussed. In addition, our bioinformatic analysis was critical for the structure elucidation of the novel polyene macrolactam salinilactam A. This study demonstrates the potential for genomic analysis to complement and strengthen traditional natural product isolation studies and firmly establishes the genus Salinispora as a rich source of novel drug-like molecules.

  14. Evolution of massive stars in very young clusters and associations

    NASA Technical Reports Server (NTRS)

    Stothers, R. B.

    1985-01-01

    Statistics concerning the stellar content of young galactic clusters and associations which show well defined main sequence turnups have been analyzed in order to derive information about stellar evolution in high-mass galaxies. The analytical approach is semiempirical and uses natural spectroscopic groups of stars on the H-R diagram together with the stars' apparent magnitudes. The new approach does not depend on absolute luminosities and requires only the most basic elements of stellar evolution theory. The following conclusions are offered on the basis of the statistical analysis: (1) O-type main-sequence stars evolve to a spectral type of B1 during core hydrogen burning; (2) most O-type blue stragglers are newly formed massive stars burning core hydrogen; (3) supergiants lying redward of the main-sequence turnup are burning core helium; and (4) most Wolf-Rayet stars are burning core helium and originally had masses greater than 30-40 solar masses. The statistics of the natural spectroscopic stars in young galactic clusters and associations are given in a table.

  15. A high HIV-1 strain variability in London, UK, revealed by full-genome analysis: Results from the ICONIC project.

    PubMed

    Yebra, Gonzalo; Frampton, Dan; Gallo Cassarino, Tiziano; Raffle, Jade; Hubb, Jonathan; Ferns, R Bridget; Waters, Laura; Tong, C Y William; Kozlakidis, Zisis; Hayward, Andrew; Kellam, Paul; Pillay, Deenan; Clark, Duncan; Nastouli, Eleni; Leigh Brown, Andrew J

    2018-01-01

    The ICONIC project has developed an automated high-throughput pipeline to generate HIV nearly full-length genomes (NFLG, i.e. from gag to nef) from next-generation sequencing (NGS) data. The pipeline was applied to 420 HIV samples collected at University College London Hospitals NHS Trust and Barts Health NHS Trust (London) and sequenced using an Illumina MiSeq at the Wellcome Trust Sanger Institute (Cambridge). Consensus genomes were generated and subtyped using COMET, and unique recombinants were studied with jpHMM and SimPlot. Maximum-likelihood phylogenetic trees were constructed using RAxML to identify transmission networks using the Cluster Picker. The pipeline generated sequences of at least 1Kb of length (median = 7.46Kb, IQR = 4.01Kb) for 375 out of the 420 samples (89%), with 174 (46.4%) being NFLG. A total of 365 sequences (169 of them NFLG) corresponded to unique subjects and were included in the down-stream analyses. The most frequent HIV subtypes were B (n = 149, 40.8%) and C (n = 77, 21.1%) and the circulating recombinant form CRF02_AG (n = 32, 8.8%). We found 14 different CRFs (n = 66, 18.1%) and multiple URFs (n = 32, 8.8%) that involved recombination between 12 different subtypes/CRFs. The most frequent URFs were B/CRF01_AE (4 cases) and A1/D, B/C, and B/CRF02_AG (3 cases each). Most URFs (19/26, 73%) lacked breakpoints in the PR+RT pol region, rendering them undetectable if only that was sequenced. Twelve (37.5%) of the URFs could have emerged within the UK, whereas the rest were probably imported from sub-Saharan Africa, South East Asia and South America. For 2 URFs we found highly similar pol sequences circulating in the UK. We detected 31 phylogenetic clusters using the full dataset: 25 pairs (mostly subtypes B and C), 4 triplets and 2 quadruplets. Some of these were not consistent across different genes due to inter- and intra-subtype recombination. Clusters involved 70 sequences, 19.2% of the dataset. 
    The initial analysis of genome sequences detected substantial hidden variability in the London HIV epidemic. Analysing full genome sequences, as opposed to only PR+RT, identified previously undetected recombinants. It provided a more reliable description of CRFs (that would be otherwise misclassified) and transmission clusters.

  16. Taxonomic evaluation of Streptomyces albus and related species using multilocus sequence analysis and proposals to emend the description of Streptomyces albus and describe Streptomyces pathocidini sp. nov

    USDA-ARS's Scientific Manuscript database

    In phylogenetic analyses of the genus Streptomyces using 16S rRNA gene sequences, Streptomyces albus subsp. albus NRRL B-1811T forms a cluster with 5 other species having identical or nearly identical 16S rRNA gene sequences. Moreover, the morphological and physiological characteristics of these oth...

  17. The complete genome sequence of a south Indian isolate of Rice tungro spherical virus reveals evidence of genetic recombination between distinct isolates.

    PubMed

    Sailaja, B; Anjum, Najreen; Patil, Yogesh K; Agarwal, Surekha; Malathi, P; Krishnaveni, D; Balachandran, S M; Viraktamath, B C; Mangrauthia, Satendra K

    2013-12-01

    In this study, the complete genome of a south Indian isolate of Rice tungro spherical virus (RTSV) from Andhra Pradesh (AP) was sequenced, and the predicted amino acid sequence was analysed. The RTSV RNA genome consists of 12,171 nt without the poly(A) tail, encoding a putative typical polyprotein of 3,470 amino acids. Furthermore, cleavage sites and sequence motifs of the polyprotein were predicted. Multiple alignment with other RTSV isolates showed a nucleotide sequence identity of 95% to east Indian isolates and 90% to Philippines isolates. A phylogenetic tree based on the complete genome sequence showed that Indian isolates clustered together, while the Vt6 and PhilA isolates from the Philippines formed two separate clusters. Twelve recombination events were detected in the RNA genome of RTSV using the Recombination Detection Program version 3. Recombination analysis suggested a significant role for the 5' end and central region of the genome in virus evolution. Further, the AP and Odisha isolates appeared to be important RTSV isolates involved in the diversification of this virus in India through recombination. The addition of the complete genome of the first south Indian isolate provided an opportunity to establish the molecular evolution of RTSV through recombination analysis and phylogenetic relationships.

  18. Approximation algorithm for the problem of partitioning a sequence into clusters

    NASA Astrophysics Data System (ADS)

    Kel'manov, A. V.; Mikhailova, L. V.; Khamidullin, S. A.; Khandeev, V. I.

    2017-08-01

    We consider the problem of partitioning a finite sequence of Euclidean points into a given number of clusters (subsequences) using the criterion of the minimal sum (over all clusters) of intracluster sums of squared distances from the elements of the clusters to their centers. It is assumed that the center of one of the desired clusters is at the origin, while the center of each of the other clusters is unknown and determined as the mean value over all elements in this cluster. Additionally, the partition obeys two structural constraints on the indices of sequence elements contained in the clusters with unknown centers: (1) the concatenation of the indices of elements in these clusters is an increasing sequence, and (2) the difference between an index and the preceding one is bounded above and below by prescribed constants. It is shown that this problem is strongly NP-hard. A 2-approximation algorithm is constructed that is polynomial-time for a fixed number of clusters.
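
    The criterion and the two index constraints described in this abstract can be illustrated with a minimal Python sketch. It only evaluates a given assignment; it does not implement the 2-approximation algorithm. The function names, the convention that label 0 marks the origin-centered cluster, and the bounds t_min and t_max are assumptions made for illustration, and the constraint check reflects one reading of the abstract.

```python
def partition_cost(points, labels):
    """Sum over clusters of squared distances from members to the cluster center.

    Assumption: label 0 marks the cluster whose center is fixed at the origin;
    every other label uses the mean of its members as the center.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for label, members in clusters.items():
        dim = len(members[0])
        if label == 0:
            center = [0.0] * dim
        else:
            center = [sum(m[k] for m in members) / len(members) for k in range(dim)]
        total += sum((m[k] - center[k]) ** 2 for m in members for k in range(dim))
    return total


def satisfies_index_constraints(labels, t_min, t_max):
    """Check the two structural constraints on the non-origin clusters.

    (1) Reading the sequence left to right, cluster labels never decrease,
        so the concatenated index lists form an increasing sequence.
    (2) Gaps between consecutive selected indices lie in [t_min, t_max].
    """
    idx = [i for i, label in enumerate(labels) if label != 0]
    pairs = list(zip(idx, idx[1:]))
    ordered = all(labels[a] <= labels[b] for a, b in pairs)
    bounded = all(t_min <= b - a <= t_max for a, b in pairs)
    return ordered and bounded


# Toy sequence of 2-D points; labels are a hypothetical assignment.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (0.1, 0.0), (5.0, 0.0)]
labels = [0, 1, 1, 0, 2]
print(partition_cost(points, labels))
print(satisfies_index_constraints(labels, t_min=1, t_max=2))
```

    In the toy data, the origin cluster contributes 0.01, cluster 1 contributes 2, and the single-element cluster 2 contributes 0.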

  19. HIV-1 subtype F1 epidemiological networks among Italian heterosexual males are associated with introduction events from South America.

    PubMed

    Lai, Alessia; Simonetti, Francesco R; Zehender, Gianguglielmo; De Luca, Andrea; Micheli, Valeria; Meraviglia, Paola; Corsi, Paola; Bagnarelli, Patrizia; Almi, Paolo; Zoncada, Alessia; Paolucci, Stefania; Gonnelli, Angela; Colao, Grazia; Tacconi, Danilo; Franzetti, Marco; Ciccozzi, Massimo; Zazzi, Maurizio; Balotta, Claudia

    2012-01-01

    About 40% of the Italian HIV-1 epidemic due to non-B variants is sustained by the F1 clade, which circulates at high prevalence in South America and Eastern Europe. The aim of this study was to define clade F1 origin, population dynamics and epidemiological networks through phylogenetic approaches. We analyzed pol sequences of 343 patients carrying F1 subtype stored in the ARCA database from 1998 to 2009. Citizenship of patients was as follows: 72.6% Italians, 9.3% South Americans and 7.3% Rumanians. Heterosexuals, Homo-bisexuals, Intravenous Drug Users accounted for 58.1%, 24.0% and 8.8% of patients, respectively. Phylogenetic analysis indicated that 70% of sequences clustered in 27 transmission networks. Two distinct groups were identified; the first clade, encompassing 56 sequences, included all Rumanian patients. The second group involved the remaining clusters and included 10 South American Homo-bisexuals in 9 distinct clusters. Heterosexual modality of infection was significantly associated with the probability of being detected in transmission networks. Heterosexuals were prevalent among both Italians (67.2%) and Rumanians (50%); by contrast, Homo-bisexuals accounted for 71.4% of South Americans. Among patients with resistant strains the proportion of clustering sequences was 57.1%, involving 14 clusters (51.8%). Resistance in clusters tended to be higher in South Americans (28.6%) compared to Italian (17.7%) and Rumanian patients (14.3%). A striking proportion of epidemiological networks could be identified in heterosexuals carrying F1 subtype residing in Italy. Italian Heterosexual males predominated within epidemiological clusters while foreign patients were mainly Heterosexual Rumanians, both males and females, and South American Homo-bisexuals. Tree topology suggested that F1 variant from South America gave rise to the Italian F1 epidemic through multiple introduction events.
    The contact tracing also revealed an unexpected burden of resistance in epidemiological clusters, underlining the need for public interventions to limit the spread of non-B subtypes and transmitted drug resistance.

  20. Evolution of coding and non-coding genes in HOX clusters of a marsupial.

    PubMed

    Yu, Hongshi; Lindsay, James; Feng, Zhi-Ping; Frankenberg, Stephen; Hu, Yanqiu; Carone, Dawn; Shaw, Geoff; Pask, Andrew J; O'Neill, Rachel; Papenfuss, Anthony T; Renfree, Marilyn B

    2012-06-18

    The HOX gene clusters are thought to be highly conserved amongst mammals and other vertebrates, but the long non-coding RNAs have only been studied in detail in human and mouse. The sequencing of the kangaroo genome provides an opportunity to use comparative analyses to compare the HOX clusters of a mammal with a distinct body plan to those of other mammals. Here we report a comparative analysis of HOX gene clusters between an Australian marsupial of the kangaroo family and the eutherians. There was a strikingly high level of conservation of HOX gene sequence and structure and non-protein coding genes including the microRNAs miR-196a, miR-196b, miR-10a and miR-10b and the long non-coding RNAs HOTAIR, HOTAIRM1 and HOXA11AS that play critical roles in regulating gene expression and controlling development. By microRNA deep sequencing and comparative genomic analyses, two conserved microRNAs (miR-10a and miR-10b) were identified and one new candidate microRNA with typical hairpin precursor structure that is expressed in both fibroblasts and testes was found. The prediction of microRNA target analysis showed that several known microRNA targets, such as miR-10, miR-414 and miR-464, were found in the tammar HOX clusters. In addition, several novel and putative miRNAs were identified that originated from elsewhere in the tammar genome and that target the tammar HOXB and HOXD clusters. This study confirms that the emergence of known long non-coding RNAs in the HOX clusters clearly predate the marsupial-eutherian divergence 160 Ma ago. It also identified a new potentially functional microRNA as well as conserved miRNAs. These non-coding RNAs may participate in the regulation of HOX genes to influence the body plan of this marsupial.

  1. Evolution of coding and non-coding genes in HOX clusters of a marsupial

    PubMed Central

    2012-01-01

    Background: The HOX gene clusters are thought to be highly conserved amongst mammals and other vertebrates, but the long non-coding RNAs have only been studied in detail in human and mouse. The sequencing of the kangaroo genome provides an opportunity to use comparative analyses to compare the HOX clusters of a mammal with a distinct body plan to those of other mammals. Results: Here we report a comparative analysis of HOX gene clusters between an Australian marsupial of the kangaroo family and the eutherians. There was a strikingly high level of conservation of HOX gene sequence and structure and non-protein coding genes including the microRNAs miR-196a, miR-196b, miR-10a and miR-10b and the long non-coding RNAs HOTAIR, HOTAIRM1 and HOXA11AS that play critical roles in regulating gene expression and controlling development. By microRNA deep sequencing and comparative genomic analyses, two conserved microRNAs (miR-10a and miR-10b) were identified and one new candidate microRNA with typical hairpin precursor structure that is expressed in both fibroblasts and testes was found. The prediction of microRNA target analysis showed that several known microRNA targets, such as miR-10, miR-414 and miR-464, were found in the tammar HOX clusters. In addition, several novel and putative miRNAs were identified that originated from elsewhere in the tammar genome and that target the tammar HOXB and HOXD clusters. Conclusions: This study confirms that the emergence of known long non-coding RNAs in the HOX clusters clearly predate the marsupial-eutherian divergence 160 Ma ago. It also identified a new potentially functional microRNA as well as conserved miRNAs. These non-coding RNAs may participate in the regulation of HOX genes to influence the body plan of this marsupial. PMID:22708672

  2. [EST-SSR identification, markers development of Ligusticum chuanxiong based on Ligusticum chuanxiong transcriptome sequences].

    PubMed

    Yuan, Can; Peng, Fang; Yang, Ze-Mao; Zhong, Wen-Juan; Mou, Fang-Sheng; Gong, Yi-Yun; Ji, Pei-Cheng; Pu, De-Qiang; Huang, Hai-Yan; Yang, Xiao; Zhang, Chao

    2017-09-01

    Ligusticum chuanxiong is a well-known traditional Chinese medicinal plant. The development of its molecular markers and the study of its germplasm resources are very important. In this study, we obtained 24 422 unigenes by assembling transcriptome sequencing reads of L. chuanxiong root. EST-SSRs were detected and 4 073 SSR loci were identified. EST-SSR distribution and characteristic analysis showed that mono-nucleotide repeats were the main repeat type, accounting for 41.0%. In addition, the sequences containing SSRs were functionally annotated in Gene Ontology (GO) and KEGG pathways and were assigned to 49 GO categories and 242 KEGG pathways; among them, 2 201 sequences were annotated against the Nr database. By validating 235 EST-SSRs, 74 primer pairs were ultimately shown to give high-quality amplification. Subsequently, genetic diversity analysis, UPGMA cluster analysis, PCoA analysis and population structure analysis of 34 L. chuanxiong germplasm resources were carried out with the 74 primer pairs. In both the UPGMA tree and the PCoA results, L. chuanxiong resources were clustered into two groups, which are believed to be partially related to their geographical distribution. In this study, EST-SSRs in L. chuanxiong were identified for the first time, and the newly developed molecular markers would contribute significantly to further genetic diversity studies, purity detection, gene mapping, and molecular breeding. Copyright © by the Chinese Pharmaceutical Association.

  3. An unsupervised classification approach for analysis of Landsat data to monitor land reclamation in Belmont county, Ohio

    NASA Technical Reports Server (NTRS)

    Brumfield, J. O.; Bloemer, H. H. L.; Campbell, W. J.

    1981-01-01

    Two unsupervised classification procedures for analyzing Landsat data used to monitor land reclamation in a surface mining area in east central Ohio are compared for agreement with data collected from the corresponding locations on the ground. One procedure is based on a traditional unsupervised-clustering/maximum-likelihood algorithm sequence that assumes spectral groupings in the Landsat data in n-dimensional space; the other is based on a nontraditional unsupervised-clustering/canonical-transformation/clustering algorithm sequence that not only assumes spectral groupings in n-dimensional space but also includes an additional feature-extraction technique. It is found that the nontraditional procedure provides an appreciable improvement in spectral groupings and apparently increases the level of accuracy in the classification of land cover categories.

  4. Multilocus sequence analysis of Anaplasma phagocytophilum reveals three distinct lineages with different host ranges in clinically ill French cattle.

    PubMed

    Chastagner, Amélie; Dugat, Thibaud; Vourc'h, Gwenaël; Verheyden, Hélène; Legrand, Loïc; Bachy, Véronique; Chabanne, Luc; Joncour, Guy; Maillard, Renaud; Boulouis, Henri-Jean; Haddad, Nadia; Bailly, Xavier; Leblond, Agnès

    2014-12-09

    Molecular epidemiology represents a powerful approach to elucidate the complex epidemiological cycles of multi-host pathogens, such as Anaplasma phagocytophilum. A. phagocytophilum is a tick-borne bacterium that affects a wide range of wild and domesticated animals. Here, we characterized its genetic diversity in populations of French cattle; we then compared the observed genotypes with those found in horses, dogs, and roe deer to determine whether genotypes of A. phagocytophilum are shared among different hosts. We sampled 120 domesticated animals (104 cattle, 13 horses, and 3 dogs) and 40 wild animals (roe deer) and used multilocus sequence analysis on nine loci (ankA, msp4, groESL, typA, pled, gyrA, recG, polA, and an intergenic region) to characterize the genotypes of A. phagocytophilum present. Phylogenic analysis revealed three genetic clusters of bacterial variants in domesticated animals. The two principal clusters included 98% of the bacterial genotypes found in cattle, which were only distantly related to those in roe deer. One cluster comprised only cattle genotypes, while the second contained genotypes from cattle, horses, and dogs. The third contained all roe deer genotypes and three cattle genotypes. Geographical factors could not explain this clustering pattern. These results suggest that roe deer do not contribute to the spread of A. phagocytophilum in cattle in France. Further studies should explore if these different clusters are associated with differing disease severity in domesticated hosts. Additionally, it remains to be seen if the three clusters of A. phagocytophilum genotypes in cattle correspond to distinct epidemiological cycles, potentially involving different reservoir hosts.

  5. Population clustering based on copy number variations detected from next generation sequencing data.

    PubMed

    Duan, Junbo; Zhang, Ji-Gang; Wan, Mingxi; Deng, Hong-Wen; Wang, Yu-Ping

    2014-08-01

    Copy number variations (CNVs) can be used as significant bio-markers and next generation sequencing (NGS) provides a high resolution detection of these CNVs. But how to extract features from CNVs and further apply them to genomic studies such as population clustering has become a big challenge. In this paper, we propose a novel method for population clustering based on CNVs from NGS. First, CNVs are extracted from each sample to form a feature matrix. Then, this feature matrix is decomposed into the source matrix and weight matrix with non-negative matrix factorization (NMF). The source matrix consists of common CNVs that are shared by all the samples from the same group, and the weight matrix indicates the corresponding level of CNVs from each sample. Therefore, using NMF of CNVs one can differentiate samples from different ethnic groups, i.e. population clustering. To validate the approach, we applied it to the analysis of both simulation data and two real data sets from the 1000 Genomes Project. The results on simulation data demonstrate that the proposed method can recover the true common CNVs with high quality. The results on the first real data analysis show that the proposed method can cluster two family trios with different ancestries into two ethnic groups and the results on the second real data analysis show that the proposed method can be applied to the whole genome with a large sample size consisting of multiple groups. Both results demonstrate the potential of the proposed method for population clustering.
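
    The decomposition step described in this abstract can be sketched in Python as follows. This is a dependency-free illustration of NMF using the standard multiplicative updates for the Frobenius objective, not the authors' implementation. The matrix orientation (features in rows, samples in columns), the function names, the toy matrix, and the choice of update rule are all assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, s, w):
    """Frobenius norm of v - s*w."""
    approx = matmul(s, w)
    return sum((x - y) ** 2 for rv, ra in zip(v, approx) for x, y in zip(rv, ra)) ** 0.5


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor v (features x samples) into s (features x rank) and w (rank x samples).

    s plays the role of the source matrix (shared CNV patterns) and
    w the role of the weight matrix (level of each pattern per sample).
    """
    rng = random.Random(seed)
    n_features, n_samples = len(v), len(v[0])
    s = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_features)]
    w = [[rng.random() + 0.1 for _ in range(n_samples)] for _ in range(rank)]
    for _ in range(iterations):
        # Update the weights: w <- w * (s^T v) / (s^T s w)
        st = transpose(s)
        num, den = matmul(st, v), matmul(matmul(st, s), w)
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_samples)]
             for i in range(rank)]
        # Update the sources: s <- s * (v w^T) / (s w w^T)
        wt = transpose(w)
        num, den = matmul(v, wt), matmul(s, matmul(w, wt))
        s = [[s[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_features)]
    return s, w


# Hypothetical CNV feature matrix: rows are CNV features, columns are samples.
features = [[1, 1, 0, 0],
            [1, 1, 0, 0],
            [0, 0, 1, 1],
            [0, 0, 1, 1]]
sources, weights = nmf(features, rank=2)
# Assign each sample to the component with the largest weight.
groups = [max(range(2), key=lambda k: weights[k][j]) for j in range(4)]
print(groups, frobenius_error(features, sources, weights))
```

    Assigning each sample to its dominant component is one simple way to read groups off the weight matrix; the abstract does not specify how the authors do this.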

  6. Horizontal transfer of a large and highly toxic secondary metabolic gene cluster between fungi.

    PubMed

    Slot, Jason C; Rokas, Antonis

    2011-01-25

    Genes involved in intermediary and secondary metabolism in fungi are frequently physically linked or clustered. For example, in Aspergillus nidulans the entire pathway for the production of sterigmatocystin (ST), a highly toxic secondary metabolite and a precursor to the aflatoxins (AF), is located in a ∼54 kb, 23 gene cluster. We discovered that a complete ST gene cluster in Podospora anserina was horizontally transferred from Aspergillus. Phylogenetic analysis shows that most Podospora cluster genes are adjacent to or nested within Aspergillus cluster genes, although the two genera belong to different taxonomic classes. Furthermore, the Podospora cluster is highly conserved in content, sequence, and microsynteny with the Aspergillus ST/AF clusters and its intergenic regions contain 14 putative binding sites for AflR, the transcription factor required for activation of the ST/AF biosynthetic genes. Examination of ∼52,000 Podospora expressed sequence tags identified transcripts for 14 genes in the cluster, with several expressed at multiple life cycle stages. The presence of putative AflR-binding sites and the expression evidence for several cluster genes, coupled with the recent independent discovery of ST production in Podospora [1], suggest that this HGT event probably resulted in a functional cluster. Given the abundance of metabolic gene clusters in fungi, our finding that one of the largest known metabolic gene clusters moved intact between species suggests that such transfers might have significantly contributed to fungal metabolic diversity. Copyright © 2011 Elsevier Ltd. All rights reserved.

  7. The Bologna Annotation Resource (BAR 3.0): improving protein functional annotation

    PubMed Central

    Casadio, Rita

    2017-01-01

    BAR 3.0 updates our server BAR (Bologna Annotation Resource) for predicting protein structural and functional features from sequence. We increase data volume, query capabilities and information conveyed to the user. The core of BAR 3.0 is a graph-based clustering procedure of UniProtKB sequences, following strict pairwise similarity criteria (sequence identity ≥40% with alignment coverage ≥90%). Each cluster contains the available annotation downloaded from UniProtKB, GO, PFAM and PDB. After statistical validation, GO terms and PFAM domains are cluster-specific and annotate new sequences entering the cluster after satisfying similarity constraints. BAR 3.0 includes 28 869 663 sequences in 1 361 773 clusters, of which 22.2% (22 241 661 sequences) and 47.4% (24 555 055 sequences) have at least one validated GO term and one PFAM domain, respectively. 1.4% of the clusters (36% of all sequences) include PDB structures and the cluster is associated to a hidden Markov model that allows building template-target alignment suitable for structural modeling. Some other 3 399 026 sequences are singletons. BAR 3.0 offers an improved search interface, allowing queries by UniProtKB-accession, Fasta sequence, GO-term, PFAM-domain, organism, PDB and ligand/s. When evaluated on the CAFA2 targets, BAR 3.0 largely outperforms our previous version and scores among state-of-the-art methods. BAR 3.0 is publicly available and accessible at http://bar.biocomp.unibo.it/bar3. PMID:28453653
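
    A graph-based clustering under pairwise similarity criteria like those quoted in this abstract (identity ≥40%, coverage ≥90%) can be illustrated with a short Python sketch. It treats clusters as connected components of the similarity graph, which is an assumption; the abstract does not spell out BAR 3.0's actual procedure. The function name, the tuple layout of the hits, and the sequence identifiers are hypothetical.

```python
def cluster_by_similarity(ids, hits, min_identity=0.40, min_coverage=0.90):
    """Group sequences into connected components of the similarity graph.

    ids  : list of sequence identifiers
    hits : iterable of (id_a, id_b, identity, coverage) pairwise alignments,
           with identity and coverage expressed as fractions in [0, 1]
    An edge is kept only if both thresholds are met.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in hits:
        if identity >= min_identity and coverage >= min_coverage:
            parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    # Largest clusters first; unlinked sequences end up as singletons.
    return sorted(groups.values(), key=lambda g: (-len(g), g))


ids = ["P1", "P2", "P3", "P4"]
hits = [
    ("P1", "P2", 0.55, 0.95),
    ("P2", "P3", 0.42, 0.91),
    ("P3", "P4", 0.80, 0.50),  # high identity but coverage below threshold
]
print(cluster_by_similarity(ids, hits))
# → [['P1', 'P2', 'P3'], ['P4']]
```

    The last hit shows why both criteria matter: a high-identity match over only half of the alignment does not link P4 to the cluster, so P4 remains a singleton.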

  8. Laboratory diagnosis and genetic analysis of a family clustering outbreak of aseptic meningitis due to echovirus 30

    PubMed Central

    Ye, Hongyan; Yan, Juying; Xie, Guoliang; Cui, Dawei; Yu, Fei; Wang, Yiyin; Yang, Xianzhi; Zhou, Fangman; Zhang, Yanjun; Tian, Xueli; Chen, Yu

    2016-01-01

    Echovirus 30 (E30) is a major pathogen associated with aseptic meningitis. In the summer of 2014, a family-clustered outbreak of aseptic meningitis occurred in the urban–rural fringe of Ningbo city in Zhejiang Province, China. To identify the etiologic agent, specimens were tested by cell culture and reverse transcriptase–polymerase chain reaction. Pathogen testing confirmed that the outbreak was caused by E30. The first case was a 6-year-old child attending a local kindergarten, who suffered from headache and fever. The same symptoms subsequently appeared in his parents, aunts, and six other relatives. Vomiting occurred in the majority of the patients and diarrhea in some of them. White blood cell counts in cerebrospinal fluid (CSF) exceeded the normal range in all patients. Protein levels in CSF were above the normal range in half of the patients. Glucose levels in CSF were within the normal range in all patients. We isolated six E30 strains from the stool specimens of patients and sequenced the VP1 region. The sequences showed 100% identity at both the nucleotide and amino acid levels. Phylogenetic analysis showed that the isolates in this study grouped into sublineage D2 together with sequences isolated from other areas of China in the 2000s and 2010s. Our study describes the first family-clustered outbreak of aseptic meningitis caused by E30 in Zhejiang Province, China. It is essential to establish an enterovirus molecular surveillance system in China to prevent mass outbreaks in Zhejiang. PMID:27646838

  9. Laboratory diagnosis and genetic analysis of a family clustering outbreak of aseptic meningitis due to echovirus 30.

    PubMed

    Zheng, Shufa; Ye, Hongyan; Yan, Juying; Xie, Guoliang; Cui, Dawei; Yu, Fei; Wang, Yiyin; Yang, Xianzhi; Zhou, Fangman; Zhang, Yanjun; Tian, Xueli; Chen, Yu

    2016-09-01

    Echovirus 30 (E30) is a major pathogen associated with aseptic meningitis. In the summer of 2014, a family-clustered outbreak of aseptic meningitis occurred in the urban-rural fringe of Ningbo city in Zhejiang Province, China. To identify the etiologic agent, specimens were tested by cell culture and reverse transcriptase-polymerase chain reaction. Pathogen testing confirmed that the outbreak was caused by E30. The first case was a 6-year-old child attending a local kindergarten, who suffered from headache and fever. The same symptoms subsequently appeared in his parents, aunts, and six other relatives. Vomiting occurred in the majority of the patients and diarrhea in some of them. White blood cell counts in cerebrospinal fluid (CSF) exceeded the normal range in all patients. Protein levels in CSF were above the normal range in half of the patients. Glucose levels in CSF were within the normal range in all patients. We isolated six E30 strains from the stool specimens of patients and sequenced the VP1 region. The sequences showed 100% identity at both the nucleotide and amino acid levels. Phylogenetic analysis showed that the isolates in this study grouped into sublineage D2 together with sequences isolated from other areas of China in the 2000s and 2010s. Our study describes the first family-clustered outbreak of aseptic meningitis caused by E30 in Zhejiang Province, China. It is essential to establish an enterovirus molecular surveillance system in China to prevent mass outbreaks in Zhejiang.

  10. Comparison of topological clustering within protein networks using edge metrics that evaluate full sequence, full structure, and active site microenvironment similarity.

    PubMed

    Leuthaeuser, Janelle B; Knutson, Stacy T; Kumar, Kiran; Babbitt, Patricia C; Fetrow, Jacquelyn S

    2015-09-01

    The development of accurate protein function annotation methods has emerged as a major unsolved biological problem. Protein similarity networks, one approach to function annotation via annotation transfer, group proteins into similarity-based clusters. An underlying assumption is that the edge metric used to identify such clusters correlates with functional information. In this contribution, this assumption is evaluated by observing topologies in similarity networks using three different edge metrics: sequence (BLAST), structure (TM-Align), and active site similarity (active site profiling, implemented in DASP). Network topologies for four well-studied protein superfamilies (enolase, peroxiredoxin (Prx), glutathione transferase (GST), and crotonase) were compared with curated functional hierarchies and structure. As expected, network topology differs, depending on edge metric; comparison of topologies provides valuable information on structure/function relationships. Subnetworks based on active site similarity correlate with known functional hierarchies at a single edge threshold more often than sequence- or structure-based networks. Sequence- and structure-based networks are useful for identifying sequence and domain similarities and differences; therefore, it is important to consider the clustering goal before deciding appropriate edge metric. Further, conserved active site residues identified in enolase and GST active site subnetworks correspond with published functionally important residues. Extension of this analysis yields predictions of functionally determinant residues for GST subgroups. These results support the hypothesis that active site similarity-based networks reveal clusters that share functional details and lay the foundation for capturing functionally relevant hierarchies using an approach that is both automatable and can deliver greater precision in function annotation than current similarity-based methods. 
    © 2015 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.

  11. Comparison of topological clustering within protein networks using edge metrics that evaluate full sequence, full structure, and active site microenvironment similarity

    PubMed Central

    Leuthaeuser, Janelle B; Knutson, Stacy T; Kumar, Kiran; Babbitt, Patricia C; Fetrow, Jacquelyn S

    2015-01-01

    The development of accurate protein function annotation methods has emerged as a major unsolved biological problem. Protein similarity networks, one approach to function annotation via annotation transfer, group proteins into similarity-based clusters. An underlying assumption is that the edge metric used to identify such clusters correlates with functional information. In this contribution, this assumption is evaluated by observing topologies in similarity networks using three different edge metrics: sequence (BLAST), structure (TM-Align), and active site similarity (active site profiling, implemented in DASP). Network topologies for four well-studied protein superfamilies (enolase, peroxiredoxin (Prx), glutathione transferase (GST), and crotonase) were compared with curated functional hierarchies and structure. As expected, network topology differs, depending on edge metric; comparison of topologies provides valuable information on structure/function relationships. Subnetworks based on active site similarity correlate with known functional hierarchies at a single edge threshold more often than sequence- or structure-based networks. Sequence- and structure-based networks are useful for identifying sequence and domain similarities and differences; therefore, it is important to consider the clustering goal before deciding appropriate edge metric. Further, conserved active site residues identified in enolase and GST active site subnetworks correspond with published functionally important residues. Extension of this analysis yields predictions of functionally determinant residues for GST subgroups. These results support the hypothesis that active site similarity-based networks reveal clusters that share functional details and lay the foundation for capturing functionally relevant hierarchies using an approach that is both automatable and can deliver greater precision in function annotation than current similarity-based methods. PMID:26073648

  12. Epidemiological study of phylogenetic transmission clusters in a local HIV-1 epidemic reveals distinct differences between subtype B and non-B infections.

    PubMed

    Chalmet, Kristen; Staelens, Delfien; Blot, Stijn; Dinakis, Sylvie; Pelgrom, Jolanda; Plum, Jean; Vogelaers, Dirk; Vandekerckhove, Linos; Verhofstede, Chris

    2010-09-07

    The number of HIV-1 infected individuals in the Western world continues to rise. More in-depth understanding of regional HIV-1 epidemics is necessary for the optimal design and adequate use of future prevention strategies. The use of a combination of phylogenetic analysis of HIV sequences, with data on patients' demographics, infection route, clinical information and laboratory results, will allow a better characterization of individuals responsible for local transmission. Baseline HIV-1 pol sequences, obtained through routine drug-resistance testing, from 506 patients, newly diagnosed between 2001 and 2009, were used to construct phylogenetic trees and identify transmission-clusters. Patients' demographics, laboratory and clinical data, were retrieved anonymously. Statistical analysis was performed to identify subtype-specific and transmission-cluster-specific characteristics. Multivariate analysis showed significant differences between the 59.7% of individuals with subtype B infection and the 40.3% non-B infected individuals, with regard to route of transmission, origin, infection with Chlamydia (p = 0.01) and infection with Hepatitis C virus (p = 0.017). More and larger transmission-clusters were identified among the subtype B infections (p < 0.001). Overall, in multivariate analysis, clustering was significantly associated with Caucasian origin, infection through homosexual contact and younger age (all p < 0.001). Bivariate analysis additionally showed a correlation between clustering and syphilis (p < 0.001), higher CD4 counts (p = 0.002), Chlamydia infection (p = 0.013) and primary HIV (p = 0.017). Combination of phylogenetics with demographic information, laboratory and clinical data, revealed that HIV-1 subtype B infected Caucasian men-who-have-sex-with-men with high prevalence of sexually transmitted diseases, account for the majority of local HIV-transmissions. 
    This finding elucidates observed epidemiological trends through molecular analysis, and justifies sustained focus in prevention on this high risk group.

  13. Stellar Clustering in the Dark Filament IRDC 321.706+0.066

    NASA Astrophysics Data System (ADS)

    Soto King, Piera

    2017-06-01

    We investigate the star formation process in the infrared dark cloud IRDC 321.706+0.066, which hosts three infrared clusters recently discovered by Barbá et al. (2015) using images of the VISTA Variables in the Vía Láctea public survey: La Serena 210, 211 and 212. The aim is to characterize the stellar content of the three clusters and to investigate the star formation sequence in a filamentary dark cloud. We present a new photometric analysis of VVV images, and we use data from other surveys. We confirmed the presence of the three VVV clusters. We also propose a new cluster

  14. Intermediate to low-mass stellar content of Westerlund 1

    NASA Astrophysics Data System (ADS)

    Brandner, W.; Clark, J. S.; Stolte, A.; Waters, R.; Negueruela, I.; Goodwin, S. P.

    2008-01-01

    We have analysed near-infrared NTT/SofI observations of the starburst cluster Westerlund 1, which is among the most massive young clusters in the Milky Way. A comparison of colour-magnitude diagrams with theoretical main-sequence and pre-main sequence evolutionary tracks yields improved extinction and distance estimates of AKs = 1.13 ± 0.03 mag and d = 3.55 ± 0.17 kpc (DM = 12.75 ± 0.10 mag). The pre-main sequence population is best fit by a Palla & Stahler isochrone for an age of 3.2 Myr, while the main sequence population is in agreement with a cluster age of 3 to 5 Myr. An analysis of the structural parameters of the cluster yields that the half-mass radius of the cluster population increases towards lower mass, indicative of the presence of mass segregation. The cluster is clearly elongated with an eccentricity of 0.20 for stars with masses between 10 and 32 M_⊙, and 0.15 for stars with masses in the range 3 to 10 M_⊙. We derive the slope of the stellar mass function for stars with masses between 3.4 and 27 M_⊙. In an annulus with radii between 0.75 and 1.5 pc from the cluster centre, we obtain a slope of Γ = -1.3. Closer in, the mass function of Westerlund 1 is shallower with Γ = -0.6. The extrapolation of the mass function for stars with masses from 0.08 to 120 M_⊙ yields an initial total stellar mass of ≈52 000 M_⊙, and a present-day mass of 20 000 to 45 000 M_⊙ (about 10 times the stellar mass of the Orion nebula cluster, and 2 to 4 times the mass of the NGC 3603 young cluster), indicating that Westerlund 1 is the most massive starburst cluster identified to date in the Milky Way. Based on observations collected at the European Southern Observatory, La Silla, Chile, and retrieved from the ESO archive (Prog ID 67.C-0514).
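
    As a quick consistency check on the distance and distance modulus quoted above, the two values can be related through the standard definition of the distance modulus. This worked line assumes that DM denotes the extinction-corrected distance modulus; the symbols are the abstract's own.

```latex
\[
DM = 5\log_{10}\!\left(\frac{d}{10\,\mathrm{pc}}\right)
\quad\Longrightarrow\quad
d = 10^{\,DM/5 + 1}\,\mathrm{pc}
  = 10^{\,12.75/5 + 1}\,\mathrm{pc}
  = 10^{3.55}\,\mathrm{pc}
  \approx 3.55\,\mathrm{kpc}.
\]
```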

  15. Molecular characterization and phylogenetic inferences of Dermanyssus gallinae isolates in Italy within an European framework.

    PubMed

    Marangi, M; Cantacessi, C; Sparagano, O A E; Camarda, A; Giangaspero, A

    2014-12-01

    In order to investigate the genetic relationships between Dermanyssus gallinae (Metastigmata: Dermanyssidae) (de Geer) isolates from poultry farms in Italy and other European countries, phylogenetic analysis was performed using a portion of the cytochrome c oxidase subunit 1 (cox1) gene of the mitochondrial DNA and the internal transcribed spacers (ITS1+5.8S+ITS2) of the ribosomal DNA. A total of 360 cox1 sequences and 360 ITS+ sequences were obtained from mites collected on 24 different poultry farms in 10 different regions of Northern and Southern Italy. Phylogenetic analysis of the cox1 sequences resulted in the clustering of two groups (A and B), whereas phylogenetic analysis of the ITS+ resulted in largely unresolved clusters. Knowledge of the genetic make-up of mite populations within countries, together with comparative analyses of D. gallinae isolates from different countries, will provide better understanding of the population dynamics of D. gallinae. This will also allow the identification of genetic markers of emerging acaricide resistance and the development of alternative strategies for the prevention and treatment of infestations. © 2014 The Royal Entomological Society.

  16. GATA: A graphic alignment tool for comparative sequence analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nix, David A.; Eisen, Michael B.

    2005-01-01

    Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution yet the enhancer retains its activity. Dot plot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.

  17. On the age and mass function of the globular cluster M 4: A different interpretation of recent deep HST observations

    NASA Astrophysics Data System (ADS)

    De Marchi, G.; Paresce, F.; Straniero, O.; Prada Moroni, P. G.

    2004-03-01

    Very deep images of the Galactic globular cluster M 4 (NGC 6121) through the F606W and F814W filters were taken in 2001 with the WFPC2 on board the HST. A first published analysis of this data set (Richer et al. 2002) produced the result that the age of M 4 is 12.7 ± 0.7 Gyr (Hansen et al. 2002), thus setting a robust lower limit to the age of the universe. In view of the great astronomical importance of getting this number right, we have subjected the same data set to the simplest possible photometric analysis that completely avoids uncertain assumptions about the origin of the detected sources. This analysis clearly reveals both a thin main sequence, from which can be deduced the deepest statistically complete mass function yet determined for a globular cluster, and a white dwarf (WD) sequence extending all the way down to the 5σ detection limit at I ≃ 27. The WD sequence is abruptly terminated at exactly this limit as expected by detection statistics. Using our most recent theoretical WD models (Prada Moroni & Straniero 2002) to obtain the expected WD sequence for different ages in the observed bandpasses, we find that the data so far obtained do not reach the peak of the WD luminosity function, thus only allowing one to set a lower limit to the age of M 4 of ~9 Gyr. Thus, the problem of determining the absolute age of a globular cluster and, therefore, the onset of GC formation with cosmologically significant accuracy remains completely open. Only observations several magnitudes deeper than the limit obtained so far would allow one to approach this objective. Based on observations with the NASA/ESA Hubble Space Telescope, obtained at the Space Telescope Science Institute, which is operated by AURA for NASA under contract NAS5-26555.

  18. Genome sequencing and secondary metabolism of the postharvest pathogen Penicillium griseofulvum.

    PubMed

    Banani, Houda; Marcet-Houben, Marina; Ballester, Ana-Rosa; Abbruscato, Pamela; González-Candelas, Luis; Gabaldón, Toni; Spadaro, Davide

    2016-01-05

    Penicillium griseofulvum is associated in stored apples with blue mould, the most important postharvest disease of pome fruit. This pathogen can simultaneously produce both detrimental and beneficial secondary metabolites (SM). In order to gain insight into SM synthesis in P. griseofulvum in vitro and during disease development on apple, we sequenced the genome of P. griseofulvum strain PG3 and analysed important SM clusters. PG3 genome sequence (29.3 Mb) shows that P. griseofulvum branched off after the divergence of P. oxalicum but before the divergence of P. chrysogenum. Genome-wide analysis of P. griseofulvum revealed putative gene clusters for patulin, griseofulvin and roquefortine C biosynthesis. Furthermore, we quantified the SM production in vitro and on apples during the course of infection. The expression kinetics of key genes of SM produced in infected apple were examined. We found additional SM clusters, including those potentially responsible for the synthesis of penicillin, yanuthone D, cyclopiazonic acid and we predicted a cluster putatively responsible for the synthesis of chanoclavine I. These findings provide relevant information to understand the molecular basis of SM biosynthesis in P. griseofulvum, to allow further research directed to the overexpression or blocking the synthesis of specific SM.

  19. A novel alignment-free method to classify protein folding types by combining spectral graph clustering with Chou's pseudo amino acid composition.

    PubMed

    Tripathi, Pooja; Pandey, Paras N

    2017-07-07

    The present work employs pseudo amino acid composition (PseAAC) to encode protein sequences in numeric form. The encoded sequences are then arranged in a similarity matrix, which serves as input for a spectral graph clustering method. Spectral methods have been used previously for clustering protein sequences, but they use pairwise alignment scores of the protein sequences in the similarity matrix. The alignment score depends on the length of the sequences, so clustering short and long sequences together may not be a good idea. This motivated combining PseAAC with the spectral clustering algorithm. We extensively tested our method and compared its performance with other existing machine learning methods. We consistently observed that the number of clusters obtained for a given set of proteins is close to the number of superfamilies in that set, and that PseAAC combined with spectral graph clustering shows the best classification results. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. Single-cell RNA-sequencing reveals a distinct population of proglucagon-expressing cells specific to the mouse upper small intestine.

    PubMed

    Glass, Leslie L; Calero-Nieto, Fernando J; Jawaid, Wajid; Larraufie, Pierre; Kay, Richard G; Göttgens, Berthold; Reimann, Frank; Gribble, Fiona M

    2017-10-01

    To identify sub-populations of intestinal preproglucagon-expressing (PPG) cells producing Glucagon-like Peptide-1, and their associated expression profiles of sensory receptors, thereby enabling the discovery of therapeutic strategies that target these cell populations for the treatment of diabetes and obesity. We performed single cell RNA sequencing of PPG-cells purified by flow cytometry from the upper small intestine of 3 GLU-Venus mice. Cells from 2 mice were sequenced at low depth, and from the third mouse at high depth. High quality sequencing data from 234 PPG-cells were used to identify clusters by tSNE analysis. qPCR was performed to compare the longitudinal and crypt/villus locations of cluster-specific genes. Immunofluorescence and mass spectrometry were used to confirm protein expression. PPG-cells formed 3 major clusters: a group with typical characteristics of classical L-cells, including high expression of Gcg and Pyy (comprising 51% of all PPG-cells); a cell type overlapping with Gip-expressing K-cells (14%); and a unique cluster expressing Tph1 and Pzp that was predominantly located in proximal small intestine villi and co-produced 5-HT (35%). Expression of G-protein coupled receptors differed between clusters, suggesting the cell types are differentially regulated and would be differentially targetable. Our findings support the emerging concept that many enteroendocrine cell populations are highly overlapping, with individual cells producing a range of peptides previously assigned to distinct cell types. Different receptor expression profiles across the clusters highlight potential drug targets to increase gut hormone secretion for the treatment of diabetes and obesity. Copyright © 2017 The Authors. Published by Elsevier GmbH. All rights reserved.

  1. Molecular evolution of miraculin-like proteins in soybean Kunitz super-family.

    PubMed

    Selvakumar, Purushotham; Gahloth, Deepankar; Tomar, Prabhat Pratap Singh; Sharma, Nidhi; Sharma, Ashwani Kumar

    2011-12-01

    Miraculin-like proteins (MLPs) belong to the soybean Kunitz super-family and have been characterized from many plant families such as Rutaceae, Solanaceae and Rubiaceae. Many of them possess trypsin inhibitory activity and are involved in plant defense. MLPs exhibit higher sequence identity (~30-95%) to native miraculin protein, itself a member of the Kunitz super-family, than to a typical Kunitz family member (~30%). The sequence and structure-function comparison of MLPs with that of a classical Kunitz inhibitor has demonstrated that MLPs have evolved to form a distinct group within the Kunitz super-family. Sequence analysis of new genes along with available MLP sequences in the literature revealed three major groups for these proteins. A significant feature of Rutaceae MLP type 2 sequences is the presence of a phosphorylation motif. Subtle changes are seen in putative reactive loop residues among different MLPs, suggesting altered specificities to specific proteases. In phylogenetic analysis, Rutaceae MLP type 1 and type 2 proteins clustered together on separate branches, whereas native miraculin along with other MLPs formed distinct clusters. Site-specific positive Darwinian selection was observed at many sites in both groups of Rutaceae MLP sequences, with most of the residues undergoing positive selection located in loop regions. The results demonstrate the sequence and thereby the structure-function divergence of MLPs as a distinct group within the soybean Kunitz super-family due to biotic and abiotic stresses of the local environment.

  2. A method of alignment masking for refining the phylogenetic signal of multiple sequence alignments.

    PubMed

    Rajan, Vaibhav

    2013-03-01

    Inaccurate inference of positional homologies in multiple sequence alignments and systematic errors introduced by alignment heuristics obfuscate phylogenetic inference. Alignment masking, the elimination of phylogenetically uninformative or misleading sites from an alignment before phylogenetic analysis, is a common practice in phylogenetic analysis. Although masking is often done manually, automated methods are necessary to handle the much larger data sets being prepared today. In this study, we introduce the concept of subsplits and demonstrate their use in extracting phylogenetic signal from alignments. We design a clustering approach for alignment masking where each cluster contains similar columns (similarity being defined on the basis of compatible subsplits); our approach then identifies noisy clusters and eliminates them. Trees inferred from the columns in the retained clusters are found to be topologically closer to the reference trees. We test our method on numerous standard benchmarks (both synthetic and biological data sets) and compare its performance with other methods of alignment masking. We find that our method can eliminate sites more accurately than other methods, particularly on divergent data, and can improve the topologies of the inferred trees in likelihood-based analyses. Software available upon request from the author.

  3. Prenatal Diagnosis and Molecular Analysis of a Large Novel Deletion (--JS) Causing α0-Thalassemia.

    PubMed

    Cao, Jinru; He, Shuzhen; Pu, Yudong; Liu, Jingjing; Liu, Fuping; Feng, Jun

    α-Thalassemia (α-thal) is a single-gene hereditary disease, very common in tropical and subtropical regions of the world, caused by large deletions or point mutations of the α-globin gene cluster. Here, we report for the first time a novel large α-thal deletion in a Chinese family from Jiangsu Province, People's Republic of China (PRC), which removes almost the entire α2 and α1 genes from the α-globin gene cluster. Thus, it was named the Jiangsu deletion (--JS) on the α-globin gene cluster causing α0-thal. Heterozygotes for this deletion showed an α-thal trait phenotype with reduced mean corpuscular volume (MCV) and mean corpuscular hemoglobin (Hb) (MCH) levels. The sequencing results showed that a 2538 bp deletion (NG_000006.1: g.35801_38338) existed in this novel genotype on the basis of -α4.2 (leftward), indicating a deletion of about 6.8 kb from the α-globin cluster. In addition, a 29 bp sequence was inserted into the deletion during the recombination events that led to this deletion. Pedigree analysis showed that the proband inherited the novel allele from his mother.

  4. High depth, whole-genome sequencing of cholera isolates from Haiti and the Dominican Republic.

    PubMed

    Sealfon, Rachel; Gire, Stephen; Ellis, Crystal; Calderwood, Stephen; Qadri, Firdausi; Hensley, Lisa; Kellis, Manolis; Ryan, Edward T; LaRocque, Regina C; Harris, Jason B; Sabeti, Pardis C

    2012-09-11

    Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x); four of the seven isolates were previously sequenced. Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis between the newly sequenced and thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961), 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways. Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.

  5. Predictive Rate-Distortion for Infinite-Order Markov Processes

    NASA Astrophysics Data System (ADS)

    Marzen, Sarah E.; Crutchfield, James P.

    2016-06-01

    Predictive rate-distortion analysis suffers from the curse of dimensionality: clustering arbitrarily long pasts to retain information about arbitrarily long futures requires resources that typically grow exponentially with length. The challenge is compounded for infinite-order Markov processes, since conditioning on finite sequences cannot capture all of their past dependencies. Spectral arguments confirm a popular intuition: algorithms that cluster finite-length sequences fail dramatically when the underlying process has long-range temporal correlations and can fail even for processes generated by finite-memory hidden Markov models. We circumvent the curse of dimensionality in rate-distortion analysis of finite- and infinite-order processes by casting predictive rate-distortion objective functions in terms of the forward- and reverse-time causal states of computational mechanics. Examples demonstrate that the resulting algorithms yield substantial improvements.

  6. Molecular evolution of the HoxA cluster in the three major gnathostome lineages

    PubMed Central

    Chiu, Chi-hua; Amemiya, Chris; Dewar, Ken; Kim, Chang-Bae; Ruddle, Frank H.; Wagner, Günter P.

    2002-01-01

    The duplication of Hox clusters and their maintenance in a lineage has a prominent but little understood role in chordate evolution. Here we examined how Hox cluster duplication may influence changes in cluster architecture and patterns of noncoding sequence evolution. We sequenced the entire duplicated HoxAa and HoxAb clusters of zebrafish (Danio rerio) and extended the 5′ (posterior) part of the HoxM (HoxA-like) cluster of horn shark (Heterodontus francisci) containing the hoxa11 and hoxa13 orthologs as well as intergenic and flanking noncoding sequences. The duplicated HoxA clusters in zebrafish each house considerably fewer genes and are dramatically shorter than the single HoxA clusters of human and horn shark. We compared the intergenic sequences of the HoxA clusters of human, horn shark, zebrafish (Aa, Ab), and striped bass and found extensive conservation of noncoding sequence motifs, i.e., phylogenetic footprints, between the human and horn shark, representing two of the three gnathostome lineages. These are putative cis-regulatory elements that may play a role in the regulation of the ancestral HoxA cluster. In contrast, homologous regions of the duplicated HoxAa and HoxAb clusters of zebrafish and the HoxA cluster of striped bass revealed a striking loss of conservation of these putative cis-regulatory sequences in the 3′ (anterior) segment of the cluster, where zebrafish only retains single representatives of group 1, 3, 4, and 5 (HoxAa) and group 2 (HoxAb) genes and in the 5′ part of the clusters, where zebrafish retains two copies of the group 13, 11, and 9 genes, i.e., AbdB-like genes. In analyzing patterns of cis-sequence evolution in the 5′ part of the clusters, we explicitly looked for evidence of complementary loss of conserved noncoding sequences, as predicted by the duplication-degeneration-complementation model in which genetic redundancy after gene duplication is resolved because of the fixation of complementary degenerative mutations. 
Our data did not yield evidence supporting this prediction. We conclude that changes in the pattern of cis-sequence conservation after Hox cluster duplication are more consistent with being the outcome of adaptive modification rather than passive mechanisms that erode redundancy created by the duplication event. These results support the view that genome duplications may provide a mechanism whereby master control genes undergo radical modifications conducive to major alterations in body plan. Such genomic revolutions may contribute significantly to the evolutionary process. PMID:11943847

  7. Community detection in sequence similarity networks based on attribute clustering

    DOE PAGES

    Chowdhary, Janamejaya; Loeffler, Frank E.; Smith, Jeremy C.

    2017-07-24

    Networks are powerful tools for the presentation and analysis of interactions in multi-component systems. A commonly studied mesoscopic feature of networks is their community structure, which arises from grouping together similar nodes into one community and dissimilar nodes into separate communities. Here, the community structure of protein sequence similarity networks is determined with a new method: Attribute Clustering Dependent Communities (ACDC). Sequence similarity has hitherto typically been quantified by the alignment score or its expectation value. However, pair alignments with the same score or expectation value cannot thus be differentiated. To overcome this deficiency, the method constructs, for pair alignments, an extended alignment metric, the link attribute vector, which includes the score and other alignment characteristics. Rescaling components of the attribute vectors qualitatively identifies a systematic variation of sequence similarity within protein superfamilies. The problem of community detection is then mapped to clustering the link attribute vectors, selection of an optimal subset of links and community structure refinement based on the partition density of the network. ACDC-predicted communities are found to be in good agreement with gold standard sequence databases for which the "ground truth" community structures (or families) are known. ACDC is therefore a community detection method for sequence similarity networks based entirely on pair similarity information. A serial implementation of ACDC is available from https://cmb.ornl.gov/resources/developments

  8. Community detection in sequence similarity networks based on attribute clustering

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chowdhary, Janamejaya; Loeffler, Frank E.; Smith, Jeremy C.

    Networks are powerful tools for the presentation and analysis of interactions in multi-component systems. A commonly studied mesoscopic feature of networks is their community structure, which arises from grouping together similar nodes into one community and dissimilar nodes into separate communities. Here, the community structure of protein sequence similarity networks is determined with a new method: Attribute Clustering Dependent Communities (ACDC). Sequence similarity has hitherto typically been quantified by the alignment score or its expectation value. However, pair alignments with the same score or expectation value cannot thus be differentiated. To overcome this deficiency, the method constructs, for pair alignments, an extended alignment metric, the link attribute vector, which includes the score and other alignment characteristics. Rescaling components of the attribute vectors qualitatively identifies a systematic variation of sequence similarity within protein superfamilies. The problem of community detection is then mapped to clustering the link attribute vectors, selection of an optimal subset of links and community structure refinement based on the partition density of the network. ACDC-predicted communities are found to be in good agreement with gold standard sequence databases for which the "ground truth" community structures (or families) are known. ACDC is therefore a community detection method for sequence similarity networks based entirely on pair similarity information. A serial implementation of ACDC is available from https://cmb.ornl.gov/resources/developments

  9. Whole-Genome Sequencing and Comparative Genome Analysis of Bacillus subtilis Strains Isolated from Non-Salted Fermented Soybean Foods

    PubMed Central

    Kamada, Mayumi; Hase, Sumitaka; Fujii, Kazushi; Miyake, Masato; Sato, Kengo; Kimura, Keitarou; Sakakibara, Yasubumi

    2015-01-01

    Bacillus subtilis is the main component in the fermentation of soybeans. To investigate the genetics of the soybean-fermenting B. subtilis strains and their relationship with the productivity of extracellular poly-γ-glutamic acid (γPGA), we sequenced the whole genomes of eight B. subtilis strains isolated from non-salted fermented soybean foods in Southeast Asia. Assembled nucleotide sequences were compared with those of a natto (fermented soybean food) starter strain B. subtilis BEST195 and the laboratory standard strain B. subtilis 168 that is incapable of γPGA production. Detected variants were investigated in terms of insertion sequences, biotin synthesis, production of subtilisin NAT, and regulatory genes for γPGA synthesis, which are related to the fermentation process. Comparing genome sequences, we found that the strains that produce γPGA have a deletion in a protein that constitutes the flagellar basal body, and this deletion was not found in the non-producing strains. We further identified diversity in variants of the bio operon, which is responsible for the biotin auxotrophism of the natto starter strains. Phylogenetic analysis using multilocus sequence typing revealed that the B. subtilis strains isolated from the non-salted fermented soybeans were not clustered together, while the natto-fermenting strains were tightly clustered; this analysis also suggested that the strain isolated from “Tua Nao” of Thailand traces a different evolutionary process from other strains. PMID:26505996

  10. Spatial methods for deriving crop rotation history

    NASA Astrophysics Data System (ADS)

    Mueller-Warrant, George W.; Trippe, Kristin M.; Whittaker, Gerald W.; Anderson, Nicole P.; Sullivan, Clare S.

    2017-08-01

    Benefits of converting 11 years of remote sensing classification data into cropping history of agricultural fields included measuring lengths of rotation cycles and identifying specific sequences of intervening crops grown between final years of old grass seed stands and establishment of new ones. Spatial and non-spatial methods were complementary. Individual-year classification errors were often correctable in spreadsheet-based non-spatial analysis, whereas their presence in spatial data generally led to exclusion of fields from further analysis. Markov-model testing of non-spatial data revealed that year-to-year cropping sequences did not match average frequencies for transitions among crops grown in western Oregon, implying that rotations into new grass seed stands were influenced by growers' desires to achieve specific objectives. Moran's I spatial analysis of length of time between consecutive grass seed stands revealed that clustering of fields was relatively uncommon, with high and low value clusters only accounting for 7.1 and 6.2% of fields.

  11. Exploring Connectivity in Sequence Space of Functional RNA

    NASA Technical Reports Server (NTRS)

    Wei, Chenyu; Pohorille, Andrzej; Popovic, Milena; Ditzler, Mark

    2017-01-01

    Emergence of replicable genetic molecules was one of the marking points in the origin of life, evolution of which can be conceptualized as a walk through the space of all possible sequences. A theoretical concept of fitness landscape helps to understand evolutionary processes through assigning a value of fitness to each genotype. Then, evolution of a phenotype is viewed as a series of consecutive, single-point mutations. Natural selection biases evolution toward peaks of high fitness and away from valleys of low fitness, whereas neutral drift occurs in the sequence space without direction as mutations are introduced at random. Large networks of neutral or near-neutral mutations on a fitness landscape, especially for sufficiently long genomes, are possible or even inevitable. Their detection in experiments, however, has been elusive. Although a few near-neutral evolutionary pathways have been found, recent experimental evidence indicates landscapes consist of largely isolated islands. The generality of these results, however, is not clear, as the genome length or the fraction of functional molecules in the genotypic space might have been insufficient for the emergence of large, neutral networks. Thorough investigation of the structure of the fitness landscape is essential to understand the mechanisms of evolution of early genomes. RNA molecules are commonly assumed to play the pivotal role in the origin of genetic systems. They are widely believed to be early, if not the earliest, genetic and catalytic molecules, with abundant biochemical activities as aptamers and ribozymes, i.e. RNA molecules capable, respectively, of binding small molecules or catalyzing chemical reactions. Here, we present results of our recent studies on the structure of the sequence space of RNA ligase ribozymes selected through in vitro evolution. Several hundred thousand sequences active to a different degree were obtained by way of deep sequencing. 
Analysis of these sequences revealed several large clusters defined such that every sequence in a cluster can be reached from any other sequence in the same cluster through a series of single point mutations. Sequences in a single cluster appear to adopt more than one secondary structure. The mechanism of refolding within a single cluster was examined. To shed light on possible evolutionary paths in the space of ribozymes, the connectivity between clusters was investigated. The effect of length of RNA molecules on the structure of the fitness landscape and possible evolutionary paths was examined by way of comparing functional sequences of 20 and 80 nucleobases in length. It was found that sequences of different lengths shared secondary structure motifs that were presumed responsible for catalytic activity, with increasing complexity and global structural rearrangements emerging in longer molecules.
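
    The cluster definition used in this abstract (every sequence in a cluster reachable from any other through a series of single point mutations) can be illustrated with a small sketch. It is not the authors' pipeline: it assumes equal-length sequences, treats point mutations as substitutions only, and the function names and the toy input are hypothetical.

```python
from collections import deque

ALPHABET = "ACGU"  # RNA bases; an assumption for this illustration


def single_mutation_neighbors(seq):
    """Yield every sequence that differs from seq by exactly one substitution."""
    for i, base in enumerate(seq):
        for alt in ALPHABET:
            if alt != base:
                yield seq[:i] + alt + seq[i + 1:]


def mutation_clusters(sequences):
    """Group sequences into clusters linked by chains of single substitutions."""
    pool = set(sequences)
    seen = set()
    clusters = []
    for start in sorted(pool):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        cluster = []
        # Breadth-first search over observed sequences only.
        while queue:
            current = queue.popleft()
            cluster.append(current)
            for neighbor in single_mutation_neighbors(current):
                if neighbor in pool and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        clusters.append(sorted(cluster))
    return clusters


# Hypothetical toy data, not taken from the study.
example = ["AAAA", "AAAU", "AAUU", "GGGG", "GGGC", "CCCC"]
clusters = mutation_clusters(example)
```

    Generating the neighbors of each sequence and looking them up in a set avoids comparing all pairs, which matters when the pool holds several hundred thousand sequences.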

  12. Multilocus Sequence Typing Analysis of Staphylococcus lugdunensis Implies a Clonal Population Structure

    PubMed Central

    Chassain, Benoît; Lemée, Ludovic; Didi, Jennifer; Thiberge, Jean-Michel; Brisse, Sylvain; Pons, Jean-Louis

    2012-01-01

    Staphylococcus lugdunensis is recognized as one of the major pathogenic species within the genus Staphylococcus, even though it belongs to the coagulase-negative group. A multilocus sequence typing (MLST) scheme was developed to study the genetic relationships and population structure of 87 S. lugdunensis isolates from various clinical and geographic sources by DNA sequence analysis of seven housekeeping genes (aroE, dat, ddl, gmk, ldh, recA, and yqiL). The number of alleles ranged from four (gmk and ldh) to nine (yqiL). Allelic profiles allowed the definition of 20 different sequence types (STs) and five clonal complexes. The 20 STs lacked correlation with geographic source. Isolates recovered from hematogenic infections (blood or osteoarticular isolates) or from skin and soft tissue infections did not cluster in separate lineages. Penicillin-resistant isolates clustered mainly in one clonal complex, unlike glycopeptide-tolerant isolates, which did not constitute a distinct subpopulation within S. lugdunensis. Phylogenies from the sequences of the seven individual housekeeping genes were congruent, indicating a predominantly mutational evolution of these genes. Quantitative analysis of the linkages between alleles from the seven loci revealed a significant linkage disequilibrium, thus confirming a clonal population structure for S. lugdunensis. This first MLST scheme for S. lugdunensis provides a new tool for investigating the macroepidemiology and phylogeny of this unusually virulent coagulase-negative Staphylococcus. PMID:22785196
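
As an illustration of how allelic profiles define sequence types, the sketch below numbers each distinct seven-locus profile in order of first appearance. It is a hypothetical example: the isolate names and allele numbers are invented, real ST numbers are assigned by a curated scheme rather than by order of appearance, and the single-locus-variant test is a commonly used convention for grouping STs that the abstract does not itself specify.

```python
# Locus names are those listed in the abstract.
LOCI = ("aroE", "dat", "ddl", "gmk", "ldh", "recA", "yqiL")


def assign_sequence_types(profiles):
    """Map each isolate to an ST number; identical allelic profiles share one ST."""
    st_of_profile = {}
    st_of_isolate = {}
    for isolate, profile in profiles.items():
        key = tuple(profile[locus] for locus in LOCI)
        if key not in st_of_profile:
            # Illustrative numbering: next unused integer.
            st_of_profile[key] = len(st_of_profile) + 1
        st_of_isolate[isolate] = st_of_profile[key]
    return st_of_isolate, st_of_profile


def is_single_locus_variant(p, q):
    """True if two allelic profiles differ at exactly one of the seven loci."""
    return sum(a != b for a, b in zip(p, q)) == 1


# Hypothetical isolates and allele numbers.
profiles = {
    "iso1": dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 1))),
    "iso2": dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 2))),
    "iso3": dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 1))),
}
st_of_isolate, st_of_profile = assign_sequence_types(profiles)
```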

  13. [Diversity of beta-proteobacterial ammonia-oxidizing bacteria and ammonia-oxidizing archaea in shrimp farm sediment].

    PubMed

    Gao, Lihai; Lin, Weitie

    2011-01-01

    To study the diversity of ammonia-oxidizing bacteria (AOB) and ammonia-oxidizing archaea (AOA) in shrimp farm sediment, total microbial DNA was directly extracted from the sediment. Clone libraries of amoA genes were constructed with beta-Proteobacterial-AOB and AOA specific primers. The libraries were screened by PCR-restriction fragment length polymorphism (RFLP) analysis and clones with unique RFLP patterns were sequenced. Phylogenetic analyses of the amoA gene fragments showed that all AOB sequences from shrimp farm sediment were affiliated with Nitrosomonas (61.54%) or Nitrosomonas-like (38.46%) species and grouped into the Nitrosomonas communis cluster, the Nitrosomonas sp. Nm148 cluster and the Nitrosomonas oligotropha cluster. All AOA sequences belonged to the kingdom Crenarchaeote except that one Operational Taxonomic Unit (OTU) sequence was Unclassified-Archaea and fell within cluster S (soil origin). AOB and AOA species composition included 13 OTUs and 9 OTUs, respectively. The clone coverage of bacterial and archaeal amoA genes was 73.47% and 90.43%, respectively. The Shannon-Wiener index, Evenness index, Simpson index and Richness index of AOB were higher than those of AOA. These findings represent the first detailed examination of archaeal amoA diversity in shrimp farm sediment and demonstrate that diverse communities of Crenarchaeote capable of ammonia oxidation are present within shrimp farm sediment, where they may be actively involved in nitrification.
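
The indices named in this abstract can be computed from per-OTU clone counts. The sketch below is illustrative only: the abstract does not give its formulas, so the Shannon-Wiener index with natural logarithm, the Gini-Simpson form of the Simpson index, Pielou's evenness and Good's estimator of clone coverage are all assumptions, and the richness index is left out because its definition is not stated. The function name and the counts are hypothetical.

```python
import math


def diversity_indices(counts):
    """Compute common diversity indices from a list of clones per OTU."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)                      # total number of clones
    s = len(counts)                      # number of observed OTUs
    props = [c / n for c in counts]

    shannon = -sum(p * math.log(p) for p in props)      # H' = -sum p ln p
    simpson = 1.0 - sum(p * p for p in props)           # Gini-Simpson, 1 - sum p^2
    evenness = shannon / math.log(s) if s > 1 else 0.0  # Pielou, H' / ln S
    singletons = sum(1 for c in counts if c == 1)
    coverage = 1.0 - singletons / n                     # Good's coverage

    return {
        "otus": s,
        "shannon": shannon,
        "simpson": simpson,
        "evenness": evenness,
        "coverage": coverage,
    }


# Hypothetical clone counts, not data from the study.
even_library = diversity_indices([4, 4])
skewed_library = diversity_indices([2, 1, 1])
```

Under these assumptions, a library with many singleton OTUs has low coverage, which suggests that further sampling would be expected to reveal additional OTUs.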

  14. Transcriptomic analysis of the interaction between Helianthus annuus and its obligate parasite Plasmopara halstedii shows single nucleotide polymorphisms in CRN sequences.

    PubMed

    As-sadi, Falah; Carrere, Sébastien; Gascuel, Quentin; Hourlier, Thibaut; Rengel, David; Le Paslier, Marie-Christine; Bordat, Amandine; Boniface, Marie-Claude; Brunel, Dominique; Gouzy, Jérôme; Godiard, Laurence; Vincourt, Patrick

    2011-10-11

    Downy mildew in sunflowers (Helianthus annuus L.) is caused by the oomycete Plasmopara halstedii (Farl.) Berlese et de Toni. Despite efforts by the international community to breed mildew-resistant varieties, downy mildew remains a major threat to the sunflower crop. Very few genomic, genetic and molecular resources are currently available to study this pathogen. Using a 454 sequencing method, expressed sequence tags (EST) during the interaction between H. annuus and P. halstedii have been generated and a search was performed for sites in putative effectors to show polymorphisms between the different races of P. halstedii. A 454 pyrosequencing run of two infected sunflower samples (inbred lines XRQ and PSC8 infected with race 710 of P. halstedii, which exhibit incompatible and compatible interactions, respectively) generated 113,720 and 172,107 useable reads. From these reads, 44,948 contigs and singletons have been produced. A bioinformatic portal, HP, was specifically created for in-depth analysis of these clusters. Using in silico filtering, 405 clusters were defined as being specific to oomycetes, and 172 were defined as non-specific oomycete clusters. A subset of these two categories was checked using PCR amplification, and 86% of the tested clusters were validated. Twenty putative RXLR and CRN effectors were detected using PSI-BLAST. Using corresponding sequences from four races (100, 304, 703 and 710), 22 SNPs were detected, providing new information on pathogen polymorphisms. This study identified a large number of genes that are expressed during H. annuus/P. halstedii compatible or incompatible interactions. It also reveals, for the first time, that an infection mechanism exists in P. halstedii similar to that in other oomycetes associated with the presence of putative RXLR and CRN effectors. SNPs discovered in CRN effector sequences were used to determine the genetic distances between the four races of P. halstedii. 
This work therefore provides valuable tools for further discoveries regarding the H. annuus/P. halstedii pathosystem.

  15. Transcriptomic analysis of the interaction between Helianthus annuus and its obligate parasite Plasmopara halstedii shows single nucleotide polymorphisms in CRN sequences

    PubMed Central

    2011-01-01

    Background: Downy mildew in sunflowers (Helianthus annuus L.) is caused by the oomycete Plasmopara halstedii (Farl.) Berlese et de Toni. Despite efforts by the international community to breed mildew-resistant varieties, downy mildew remains a major threat to the sunflower crop. Very few genomic, genetic and molecular resources are currently available to study this pathogen. Using a 454 sequencing method, expressed sequence tags (EST) during the interaction between H. annuus and P. halstedii have been generated and a search was performed for sites in putative effectors to show polymorphisms between the different races of P. halstedii. Results: A 454 pyrosequencing run of two infected sunflower samples (inbred lines XRQ and PSC8 infected with race 710 of P. halstedii, which exhibit incompatible and compatible interactions, respectively) generated 113,720 and 172,107 useable reads. From these reads, 44,948 contigs and singletons have been produced. A bioinformatic portal, HP, was specifically created for in-depth analysis of these clusters. Using in silico filtering, 405 clusters were defined as being specific to oomycetes, and 172 were defined as non-specific oomycete clusters. A subset of these two categories was checked using PCR amplification, and 86% of the tested clusters were validated. Twenty putative RXLR and CRN effectors were detected using PSI-BLAST. Using corresponding sequences from four races (100, 304, 703 and 710), 22 SNPs were detected, providing new information on pathogen polymorphisms. Conclusions: This study identified a large number of genes that are expressed during H. annuus/P. halstedii compatible or incompatible interactions. It also reveals, for the first time, that an infection mechanism exists in P. halstedii similar to that in other oomycetes associated with the presence of putative RXLR and CRN effectors. SNPs discovered in CRN effector sequences were used to determine the genetic distances between the four races of P. halstedii. 
This work therefore provides valuable tools for further discoveries regarding the H. annuus/P. halstedii pathosystem. PMID:21988821

  16. Photometric binary stars in Praesepe and the search for globular cluster binaries

    NASA Technical Reports Server (NTRS)

    Bolte, Michael

    1991-01-01

    A radial velocity study of the stars which are located on a second sequence above the single-star zero-age main sequence at a given color in the color-magnitude diagram of the open cluster Praesepe (NGC 2632) shows that 10, and possibly 11, of 17 are binary systems. Of the binary systems, five have full amplitudes for their velocity variations that are greater than 50 km/s. To the extent that they can be applied to globular clusters, these results suggest that (1) observations of 'second-sequence' stars in globular clusters would be an efficient way of finding main-sequence binary systems in globulars, and (2) current instrumentation on large telescopes is sufficient for establishing unambiguously the existence of main-sequence binary systems in nearby globular clusters.

  17. Genomic Identification and Analysis of Shared Cis-regulator Elements in a Developmentally Critical homeobox Cluster

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chris Amemiya

    2003-04-01

    The goals of this project were to isolate, characterize, and sequence the Dlx3/Dlx7 bigene cluster from twelve different species of mammals. The Dlx3 and Dlx7 genes are known to encode homeobox transcription factors involved in patterning of structures in the vertebrate jaw as well as vertebrate limbs. Genomic sequences from the respective taxa will subsequently be compared in order to identify conserved non-coding sequences that are potential cis-regulatory elements. Based on the comparisons they will fashion transgenic mouse experiments to functionally test the strength of the potential cis-regulatory elements. A goal of the project is to attempt to identify those elements that may function in coordinately regulating both Dlx3 and Dlx7 functions.

  18. Whole genome sequencing analysis of Salmonella enterica serovar Weltevreden isolated from human stool and contaminated food samples collected from the Southern coastal area of China.

    PubMed

    Li, Baisheng; Yang, Xingfen; Tan, Hailing; Ke, Bixia; He, Dongmei; Wang, Haiyan; Chen, Qiuxia; Ke, Changwen; Zhang, Yonghui

    2018-02-02

    Salmonella enterica serovar Weltevreden is the most common non-typhoid Salmonella found in South and Southeast Asia. It causes zoonoses worldwide through the consumption of contaminated foods and seafood, and is considered as an important food-borne pathogen in China, especially in the Southern coastal area. We compared the whole genomes of 44 S. Weltevreden strains isolated from human stool and contaminated food samples from Southern Coastal China, in order to investigate their phylogenetic relationships and establish their genetic relatedness to known international strains. ResFinder analysis of the draft genomes of isolated strains detected antimicrobial resistance (AMR) genes in only eight isolates, equivalent to minimum inhibitory concentration assay, and only a few isolates showed resistance to tetracycline, ciprofloxacin or ampicillin. In silico MLST analysis revealed that 43 out of 44 S. Weltevreden strains belonged to sequence type 365 (CC205), the most common sequence type of the serovars. Phylogenetic analysis of the 44 domestic and 26 international isolates suggested that the population of S. Weltevreden could be segregated into six phylogenetic clusters. Cluster I included two strains from food and strains of the "Island Cluster", indicating potential inter-transmission between different countries and regions through foods. The predominant S. Weltevreden isolates obtained from the samples from Southern coastal China were found to be phylogenetically related to strains from Southern East Asia, and formed clusters II-VI. The study has demonstrated that WGS-based analysis may be used to improve our understanding of the epidemiology of this bacterium as part of a food-borne disease surveillance program. The methods used are also more widely applicable to other geographical regions and areas and could therefore be useful for improving our understanding of the international spread of S. Weltevreden on a global scale. Copyright © 2017. 
Published by Elsevier B.V.

  19. The Bologna Annotation Resource (BAR 3.0): improving protein functional annotation.

    PubMed

    Profiti, Giuseppe; Martelli, Pier Luigi; Casadio, Rita

    2017-07-03

    BAR 3.0 updates our server BAR (Bologna Annotation Resource) for predicting protein structural and functional features from sequence. We increase data volume, query capabilities and information conveyed to the user. The core of BAR 3.0 is a graph-based clustering procedure of UniProtKB sequences, following strict pairwise similarity criteria (sequence identity ≥40% with alignment coverage ≥90%). Each cluster contains the available annotation downloaded from UniProtKB, GO, PFAM and PDB. After statistical validation, GO terms and PFAM domains are cluster-specific and annotate new sequences entering the cluster after satisfying similarity constraints. BAR 3.0 includes 28 869 663 sequences in 1 361 773 clusters, of which 22.2% (22 241 661 sequences) and 47.4% (24 555 055 sequences) have at least one validated GO term and one PFAM domain, respectively. 1.4% of the clusters (36% of all sequences) include PDB structures and the cluster is associated to a hidden Markov model that allows building template-target alignment suitable for structural modeling. Some other 3 399 026 sequences are singletons. BAR 3.0 offers an improved search interface, allowing queries by UniProtKB-accession, Fasta sequence, GO-term, PFAM-domain, organism, PDB and ligand/s. When evaluated on the CAFA2 targets, BAR 3.0 largely outperforms our previous version and scores among state-of-the-art methods. BAR 3.0 is publicly available and accessible at http://bar.biocomp.unibo.it/bar3. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  20. Position-specific binding of FUS to nascent RNA regulates mRNA length

    PubMed Central

    Masuda, Akio; Takeda, Jun-ichi; Okuno, Tatsuya; Okamoto, Takaaki; Ohkawara, Bisei; Ito, Mikako; Ishigaki, Shinsuke; Sobue, Gen

    2015-01-01

    More than half of all human genes produce prematurely terminated polyadenylated short mRNAs. However, the underlying mechanisms remain largely elusive. CLIP-seq (cross-linking immunoprecipitation [CLIP] combined with deep sequencing) of FUS (fused in sarcoma) in neuronal cells showed that FUS is frequently clustered around an alternative polyadenylation (APA) site of nascent RNA. ChIP-seq (chromatin immunoprecipitation [ChIP] combined with deep sequencing) of RNA polymerase II (RNAP II) demonstrated that FUS stalls RNAP II and prematurely terminates transcription. When an APA site is located upstream of an FUS cluster, FUS enhances polyadenylation by recruiting CPSF160 and up-regulates the alternative short transcript. In contrast, when an APA site is located downstream from an FUS cluster, polyadenylation is not activated, and the RNAP II-suppressing effect of FUS leads to down-regulation of the alternative short transcript. CAGE-seq (cap analysis of gene expression [CAGE] combined with deep sequencing) and PolyA-seq (a strand-specific and quantitative method for high-throughput sequencing of 3' ends of polyadenylated transcripts) revealed that position-specific regulation of mRNA lengths by FUS is operational in two-thirds of transcripts in neuronal cells, with enrichment in genes involved in synaptic activities. PMID:25995189

  1. Genetic basis for mycophenolic acid production and strain-dependent production variability in Penicillium roqueforti.

    PubMed

    Gillot, Guillaume; Jany, Jean-Luc; Dominguez-Santos, Rebeca; Poirier, Elisabeth; Debaets, Stella; Hidalgo, Pedro I; Ullán, Ricardo V; Coton, Emmanuel; Coton, Monika

    2017-04-01

    Mycophenolic acid (MPA) is a secondary metabolite produced by various Penicillium species including Penicillium roqueforti. The MPA biosynthetic pathway was recently described in Penicillium brevicompactum. In this study, an in silico analysis of the P. roqueforti FM164 genome sequence localized a 23.5-kb putative MPA gene cluster. The cluster contains seven genes putatively coding seven proteins (MpaA, MpaB, MpaC, MpaDE, MpaF, MpaG, MpaH) and is highly similar (i.e. gene synteny, sequence homology) to the P. brevicompactum cluster. To confirm the involvement of this gene cluster in MPA biosynthesis, gene silencing using RNA interference targeting mpaC, encoding a putative polyketide synthase, was performed in a high MPA-producing P. roqueforti strain (F43-1). In the obtained transformants, decreased MPA production (measured by LC-Q-TOF/MS) was correlated to reduced mpaC gene expression by Q-RT-PCR. In parallel, mycotoxin quantification on multiple P. roqueforti strains suggested strain-dependent MPA-production. Thus, the entire MPA cluster was sequenced for P. roqueforti strains with contrasted MPA production and a 174bp deletion in mpaC was observed in low MPA-producers. PCRs directed towards the deleted region among 55 strains showed an excellent correlation with MPA quantification. Our results indicated the clear involvement of mpaC gene as well as surrounding cluster in P. roqueforti MPA biosynthesis. Copyright © 2016 Elsevier Ltd. All rights reserved.

  2. Evaluation of BLAST-based edge-weighting metrics used for homology inference with the Markov Clustering algorithm.

    PubMed

    Gibbons, Theodore R; Mount, Stephen M; Cooper, Endymion D; Delwiche, Charles F

    2015-07-10

    Clustering protein sequences according to inferred homology is a fundamental step in the analysis of many large data sets. Since the publication of the Markov Clustering (MCL) algorithm in 2002, it has been the centerpiece of several popular applications. Each of these approaches generates an undirected graph that represents sequences as nodes connected to each other by edges weighted with a BLAST-based metric. MCL is then used to infer clusters of homologous proteins by analyzing these graphs. The various approaches differ only by how they weight the edges, yet there has been very little direct examination of the relative performance of alternative edge-weighting metrics. This study compares the performance of four BLAST-based edge-weighting metrics: the bit score, bit score ratio (BSR), bit score over anchored length (BAL), and negative common log of the expectation value (NLE). Performance is tested using the Extended CEGMA KOGs (ECK) database, which we introduce here. All metrics performed similarly when analyzing full-length sequences, but dramatic differences emerged as progressively larger fractions of the test sequences were split into fragments. The BSR and BAL successfully rescued subsets of clusters by strengthening certain types of alignments between fragmented sequences, but also shifted the largest correct scores down near the range of scores generated from spurious alignments. This penalty outweighed the benefits in most test cases, and was greatly exacerbated by increasing the MCL inflation parameter, making these metrics less robust than the bit score or the more popular NLE. Notably, the bit score performed as well or better than the other three metrics in all scenarios. The results provide a strong case for use of the bit score, which appears to offer equivalent or superior performance to the more popular NLE. 
The insight that MCL-based clustering methods can be improved using a more tractable edge-weighting metric will greatly simplify future implementations. We demonstrate this with our own minimalist Python implementation: Porthos, which uses only standard libraries and can process a graph with 25M+ edges connecting the 60k+ KOG sequences in half a minute using less than half a gigabyte of memory.
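
To make the edge-weighting metrics concrete, the sketch below derives two of them from the bit score and expectation value of a BLAST hit. It is a hedged illustration and not code from Porthos: the function names are hypothetical, the cap applied when the E-value is reported as 0 is an arbitrary assumption, and the bit score ratio is normalized here by the larger of the two self-hit bit scores, which is one possible convention that the abstract does not confirm. The bit score over anchored length (BAL) is omitted because its definition is not given in the abstract.

```python
import math


def nle(evalue: float, cap: float = 200.0) -> float:
    """Negative common log of the expectation value (NLE).

    An E-value of 0 has no finite logarithm, so it is mapped to `cap`
    (an assumed placeholder value); larger weights are also clipped to `cap`.
    """
    if evalue <= 0.0:
        return cap
    return min(cap, -math.log10(evalue))


def bit_score_ratio(bit_score: float, self_score_a: float, self_score_b: float) -> float:
    """Bit score of a hit divided by the larger self-hit bit score (assumed convention)."""
    return bit_score / max(self_score_a, self_score_b)


# Hypothetical hit: bit score 150, E-value 1e-50, self-hit scores 300 and 200.
raw_weight = 150.0                       # the bit score itself can serve as the edge weight
nle_weight = nle(1e-50)
bsr_weight = bit_score_ratio(150.0, 300.0, 200.0)
zero_evalue_weight = nle(0.0)
```

Using the raw bit score needs no normalization or cap, which is consistent with the abstract's remark that it is the more tractable choice.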

  3. Diversity of Bradyrhizobium strains nodulating Lupinus micranthus on both sides of the Western Mediterranean: Algeria and Spain.

    PubMed

    Bourebaba, Yasmina; Durán, David; Boulila, Farida; Ahnia, Hadjira; Boulila, Abdelghani; Temprano, Francisco; Palacios, José M; Imperial, Juan; Ruiz-Argüeso, Tomás; Rey, Luis

    2016-06-01

    Lupinus micranthus is a lupine distributed in the Mediterranean basin whose nitrogen fixing symbiosis has not been described in detail. In this study, 101 slow-growing nodule isolates were obtained from L. micranthus thriving in soils on both sides of the Western Mediterranean. The diversity of the isolates, 60 from Algeria and 41 from Spain, was addressed by multilocus sequence analysis of housekeeping genes (16S rRNA, atpD, glnII and recA) and one symbiotic gene (nodC). Using genomic fingerprints from BOX elements, 37 different profiles were obtained (22 from Algeria and 15 from Spain). Phylogenetic analysis based on 16S rRNA and concatenated atpD, glnII and recA sequences of a representative isolate of each BOX profile displayed a homogeneous distribution of profiles in six different phylogenetic clusters. All isolates were taxonomically ascribed to the genus Bradyrhizobium. Three clusters comprising 24, 6, and 4 isolates, respectively, accounted for most of the profiles. The largest cluster was close to the Bradyrhizobium canariense lineage, while the other two were related to B. cytisi/B. rifense. The three remaining clusters included only one isolate each, and were close to B. canariense, B. japonicum and B. elkanii species, respectively. In contrast, phylogenetic clustering of BOX profiles based on nodC sequences yielded only two phylogenetic groups. One of them included all the profiles except one, and belonged to symbiovar genistearum. The remaining profile, constituted by a strain related to B. elkanii, was not related to any well-defined symbiotic lineage, and may constitute both a new symbiovar and a new genospecies. Copyright © 2016 Elsevier GmbH. All rights reserved.

  4. Genome sequence of an industrial microorganism Streptomyces avermitilis: deducing the ability of producing secondary metabolites.

    PubMed

    Omura, S; Ikeda, H; Ishikawa, J; Hanamoto, A; Takahashi, C; Shinose, M; Takahashi, Y; Horikawa, H; Nakazawa, H; Osonoe, T; Kikuchi, H; Shiba, T; Sakaki, Y; Hattori, M

    2001-10-09

    Streptomyces avermitilis is a soil bacterium that carries out not only a complex morphological differentiation but also the production of secondary metabolites, one of which, avermectin, is commercially important in human and veterinary medicine. The major interest in this genus Streptomyces is the diversity of its production of secondary metabolites as an industrial microorganism. A major factor in its prominence as a producer of the variety of secondary metabolites is its possession of several metabolic pathways for biosynthesis. Here we report sequence analysis of S. avermitilis, covering 99% of its genome. At least 8.7 million base pairs exist in the linear chromosome; this is the largest bacterial genome sequence, and it provides insights into the intrinsic diversity of the production of the secondary metabolites of Streptomyces. Twenty-five kinds of secondary metabolite gene clusters were found in the genome of S. avermitilis. Four of them are concerned with the biosyntheses of melanin pigments, in which two clusters encode tyrosinase and its cofactor, another two encode an ochronotic pigment derived from homogentisic acid, and another polyketide-derived melanin. The gene clusters for carotenoid and siderophore biosyntheses are composed of seven and five genes, respectively. There are eight kinds of gene clusters for type-I polyketide compound biosyntheses, and two clusters are involved in the biosyntheses of type-II polyketide-derived compounds. Furthermore, a polyketide synthase that resembles phloroglucinol synthase was detected. Eight clusters are involved in the biosyntheses of peptide compounds that are synthesized by nonribosomal peptide synthetases. These secondary metabolite clusters are widely located in the genome but half of them are near both ends of the genome. The total length of these clusters occupies about 6.4% of the genome.

  5. Limited overlap between phylogenetic HIV and hepatitis C virus clusters illustrates the dynamic sexual network structure of Dutch HIV-infected MSM.

    PubMed

    Vanhommerig, Joost W; Bezemer, Daniela; Molenkamp, Richard; Van Sighem, Ard I; Smit, Colette; Arends, Joop E; Lauw, Fanny N; Brinkman, Kees; Rijnders, Bart J; Newsum, Astrid M; Bruisten, Sylvia M; Prins, Maria; Van Der Meer, Jan T; Van De Laar, Thijs J; Schinkel, Janke

    2017-09-24

    MSM are at increased risk for infection with HIV-1 and hepatitis C virus (HCV). Is HIV/HCV coinfection confined to specific HIV transmission networks? An HIV phylogenetic tree was constructed for 5038 HIV-1 subtype B polymerase (pol) sequences obtained from MSM in the AIDS therapy evaluation in the Netherlands cohort. We investigated the existence of HIV clusters with increased HCV prevalence, the HIV phylogenetic density (i.e. the number of potential HIV transmission partners) of HIV/HCV-coinfected MSM compared with HIV-infected MSM without HCV, and the overlap in HIV and HCV phylogenies using HCV nonstructural protein 5B sequences from 183 HIV-infected MSM with acute HCV infection. Five hundred and sixty-three of 5038 (11.2%) HIV-infected MSM tested HCV positive. Phylogenetic analysis revealed 93 large HIV clusters (≥10 MSM), 370 small HIV clusters (2-9 MSM), and 867 singletons with a median HCV prevalence of 11.5, 11.6, and 9.3%, respectively. We identified six large HIV clusters with elevated HCV prevalence (range 23.5-46.2%). Median HIV phylogenetic densities for MSM with HCV (3, interquartile range 1-7) and without HCV (3, interquartile range 1-8) were similar. HCV phylogeny showed 12 MSM-specific HCV clusters (cluster size: 2-39 HCV sequences); 12.7% of HCV infections were part of the same HIV and HCV cluster. We observed few HIV clusters with elevated HCV prevalence, no increase in the HIV phylogenetic density of HIV/HCV-coinfected MSM compared to HIV-infected MSM without HCV, and limited overlap between HIV and HCV phylogenies among HIV/HCV-coinfected MSM. Our data do not support the existence of MSM-specific sexual networks that fuel both the HIV and HCV epidemic.

  6. Spatio-Temporal History of HIV-1 CRF35_AD in Afghanistan and Iran.

    PubMed

    Eybpoosh, Sana; Bahrampour, Abbas; Karamouzian, Mohammad; Azadmanesh, Kayhan; Jahanbakhsh, Fatemeh; Mostafavi, Ehsan; Zolala, Farzaneh; Haghdoost, Ali Akbar

    2016-01-01

    HIV-1 Circulating Recombinant Form 35_AD (CRF35_AD) has an important position in the epidemiological profile of Afghanistan and Iran. Despite the presence of this clade in Afghanistan and Iran for over a decade, our understanding of its origin and dissemination patterns is limited. In this study, we performed a Bayesian phylogeographic analysis to reconstruct the spatio-temporal dispersion pattern of this clade using eligible CRF35_AD gag and pol sequences available in the Los Alamos HIV database (432 sequences available from Iran, 16 sequences available from Afghanistan, and a single CRF35_AD-like pol sequence available from USA). Bayesian Markov Chain Monte Carlo algorithm was implemented in BEAST v1.8.1. Between-country dispersion rates were tested with Bayesian stochastic search variable selection method and were considered significant where Bayes factor values were greater than three. The findings suggested that CRF35_AD sequences were genetically similar to parental sequences from Kenya and Uganda, and to a set of subtype A1 sequences available from Afghan refugees living in Pakistan. Our results also showed that across all phylogenies, Afghan and Iranian CRF35_AD sequences formed a monophyletic cluster (posterior clade credibility> 0.7). The divergence date of this cluster was estimated to be between 1990 and 1992. Within this cluster, a bidirectional dispersion of the virus was observed across Afghanistan and Iran. We could not clearly identify if Afghanistan or Iran first established or received this epidemic, as the root location of this cluster could not be robustly estimated. Three CRF35_AD sequences from Afghan refugees living in Pakistan nested among Afghan and Iranian CRF35_AD branches. However, the CRF35_AD-like sequence available from USA diverged independently from Kenyan subtype A1 sequences, suggesting it not to be a true CRF35_AD lineage. 
Potential factors contributing to viral exchange between Afghanistan and Iran could be injection drug networks and mass migration of Afghan refugees and labourers to Iran, which calls for extensive preventive efforts.

  7. Spatio-Temporal History of HIV-1 CRF35_AD in Afghanistan and Iran

    PubMed Central

    Eybpoosh, Sana; Bahrampour, Abbas; Karamouzian, Mohammad; Azadmanesh, Kayhan; Jahanbakhsh, Fatemeh; Mostafavi, Ehsan; Zolala, Farzaneh; Haghdoost, Ali Akbar

    2016-01-01

    HIV-1 Circulating Recombinant Form 35_AD (CRF35_AD) has an important position in the epidemiological profile of Afghanistan and Iran. Despite the presence of this clade in Afghanistan and Iran for over a decade, our understanding of its origin and dissemination patterns is limited. In this study, we performed a Bayesian phylogeographic analysis to reconstruct the spatio-temporal dispersion pattern of this clade using eligible CRF35_AD gag and pol sequences available in the Los Alamos HIV database (432 sequences available from Iran, 16 sequences available from Afghanistan, and a single CRF35_AD-like pol sequence available from USA). Bayesian Markov Chain Monte Carlo algorithm was implemented in BEAST v1.8.1. Between-country dispersion rates were tested with Bayesian stochastic search variable selection method and were considered significant where Bayes factor values were greater than three. The findings suggested that CRF35_AD sequences were genetically similar to parental sequences from Kenya and Uganda, and to a set of subtype A1 sequences available from Afghan refugees living in Pakistan. Our results also showed that across all phylogenies, Afghan and Iranian CRF35_AD sequences formed a monophyletic cluster (posterior clade credibility> 0.7). The divergence date of this cluster was estimated to be between 1990 and 1992. Within this cluster, a bidirectional dispersion of the virus was observed across Afghanistan and Iran. We could not clearly identify if Afghanistan or Iran first established or received this epidemic, as the root location of this cluster could not be robustly estimated. Three CRF35_AD sequences from Afghan refugees living in Pakistan nested among Afghan and Iranian CRF35_AD branches. However, the CRF35_AD-like sequence available from USA diverged independently from Kenyan subtype A1 sequences, suggesting it not to be a true CRF35_AD lineage. 
Potential factors contributing to viral exchange between Afghanistan and Iran could be injection drug networks and mass migration of Afghan refugees and labourers to Iran, which calls for extensive preventive efforts. PMID:27280293

  8. Amplification of the entire kanamycin biosynthetic gene cluster during empirical strain improvement of Streptomyces kanamyceticus.

    PubMed

    Yanai, Koji; Murakami, Takeshi; Bibb, Mervyn

    2006-06-20

    Streptomyces kanamyceticus 12-6 is a derivative of the wild-type strain developed for industrial kanamycin (Km) production. Southern analysis and DNA sequencing revealed amplification of a large genomic segment including the entire Km biosynthetic gene cluster in the chromosome of strain 12-6. At 145 kb, the amplifiable unit of DNA (AUD) is the largest AUD reported in Streptomyces. Striking repetitive DNA sequences belonging to the clustered regularly interspaced short palindromic repeats family were found in the AUD and may play a role in its amplification. Strain 12-6 contains a mixture of different chromosomes with varying numbers of AUDs, sometimes exceeding 36 copies and producing an amplified region >5.7 Mb. The level of Km production depended on the copy number of the Km biosynthetic gene cluster, suggesting that DNA amplification occurred during strain improvement as a consequence of selection for increased Km resistance. Amplification of DNA segments including entire antibiotic biosynthetic gene clusters might be a common mechanism leading to increased antibiotic production in industrial strains.

  9. A Photometric Survey of the Open Clusters NGC 7789 and M67

    NASA Astrophysics Data System (ADS)

    Janes, Kenneth

    2010-01-01

    Although there is strong evidence that stellar activity declines as a star ages, beyond about the age of the Hyades (600 Myr) there is little direct confirmation of this decline in stars of known age. This report is an update of an earlier report (Hayes-Gehrke, et al., 2004, AJ, 128, 2862) of a long-term project to explore stellar activity in old open clusters. I have now accumulated 12 years of photometry of the old clusters NGC 7789 (about 1.8 Gyr) and M 67 (about 4 Gyr). An analysis of these data has revealed a substantial number of low-amplitude variable stars in both clusters, including a number of previously-discovered eclipsing binary stars, and several stars near the main sequence turnoff of both clusters that exhibit apparently erratic variations. Some of the M 67 erratics are known X-ray sources. On the main sequence, the large majority of stars show little or no evidence for variability at the 0.1% - 0.2% level, consistent with a regular systematic decline in activity level with age.

  10. Escherichia coli O-Antigen Gene Clusters of Serogroups O62, O68, O131, O140, O142, and O163: DNA Sequences and Similarity between O62 and O68, and PCR-Based Serogrouping

    PubMed Central

    Liu, Yanhong; Yan, Xianghe; DebRoy, Chitrita; Fratamico, Pina M.; Needleman, David S.; Li, Robert W.; Wang, Wei; Losada, Liliana; Brinkac, Lauren; Radune, Diana; Toro, Magaly; Hegde, Narasimha; Meng, Jianghong

    2015-01-01

    The DNA sequence of the O-antigen gene clusters of Escherichia coli serogroups O62, O68, O131, O140, O142, and O163 was determined, and primers based on the wzx (O-antigen flippase) and/or wzy (O-antigen polymerase) genes within the O-antigen gene clusters were designed and used in PCR assays to identify each serogroup. Specificity was tested with E. coli reference strains, field isolates belonging to the target serogroups, and non-E. coli bacteria. The PCR assays were highly specific for the respective serogroups; however, the PCR assay targeting the O62 wzx gene reacted positively with strains belonging to E. coli O68, which was determined by serotyping. Analysis of the O-antigen gene cluster sequences of serogroups O62 and O68 reference strains showed that they were 94% identical at the nucleotide level, although O62 contained an insertion sequence (IS) element located between the rmlA and rmlC genes within the O-antigen gene cluster. A PCR assay targeting the rmlA and rmlC genes flanking the IS element was used to differentiate O62 and O68 serogroups. The PCR assays developed in this study can be used for the detection and identification of E. coli O62/O68, O131, O140, O142, and O163 strains isolated from different sources. PMID:25664526

  11. Clustered regularly interspaced short palindromic repeats (CRISPRs) analysis of members of the Mycobacterium tuberculosis complex.

    PubMed

    Botelho, Ana; Canto, Ana; Leão, Célia; Cunha, Mónica V

    2015-01-01

    Typical CRISPR (clustered, regularly interspaced, short palindromic repeat) regions consist of short direct repeats (DRs) interspersed with similarly sized non-repetitive spacers, derived from transmissible genetic elements, acquired when the cell is challenged with foreign DNA. The analysis of the structure, in number and nature, of CRISPR spacers is a valuable tool for molecular typing since these loci are polymorphic among strains, giving rise to characteristic signatures. The existence of CRISPR structures in the genome of the members of Mycobacterium tuberculosis complex (MTBC) enabled the development of a genotyping method, based on the analysis of the presence or absence of 43 oligonucleotide spacers separated by conserved DRs. This method, called spoligotyping, consists of PCR amplification of the DR chromosomal region and recognition, after hybridization, of the spacers that are present. In this workflow, the PCR products are applied to a membrane containing synthetic oligonucleotides that have complementary sequences to the spacer sequences. Lack of hybridization of the PCR products to a specific oligonucleotide sequence indicates absence of the corresponding spacer sequence in the examined strain. Spoligotyping gained great notoriety as a robust identification and typing tool for members of MTBC, enabling multiple epidemiological studies on human and animal tuberculosis.
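
    The presence/absence readout of the 43 spacers described above is often summarised as a 15-digit octal code. The sketch below illustrates that shorthand; the octal convention (14 three-spacer groups plus one final binary digit), the function name, and the example pattern are assumptions for illustration and are not taken from the abstract.

```python
def spoligotype_octal(pattern: str) -> str:
    """Convert a 43-character presence/absence string to a 15-digit octal code.

    '1' means the spacer hybridized (present), '0' means it is absent.
    Assumed convention: spacers 1-42 are read as 14 triplets, each written
    as one octal digit; spacer 43 is appended as a single 0 or 1.
    """
    if len(pattern) != 43 or set(pattern) - {"0", "1"}:
        raise ValueError("expected a 43-character string of 0s and 1s")
    triplets = [pattern[i:i + 3] for i in range(0, 42, 3)]
    digits = [str(int(t, 2)) for t in triplets]
    return "".join(digits) + pattern[42]


# Hypothetical pattern: spacers 1-34 absent, spacers 35-43 present.
example = "0" * 34 + "1" * 9
code = spoligotype_octal(example)
print(code)
```

    Two strains with the same code share the same spacer signature, which is what makes the code convenient for comparing isolates across studies.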

  12. IMG-ABC: A Knowledge Base To Fuel Discovery of Biosynthetic Gene Clusters and Novel Secondary Metabolites.

    PubMed

    Hadjithomas, Michalis; Chen, I-Min Amy; Chu, Ken; Ratner, Anna; Palaniappan, Krishna; Szeto, Ernest; Huang, Jinghua; Reddy, T B K; Cimermančič, Peter; Fischbach, Michael A; Ivanova, Natalia N; Markowitz, Victor M; Kyrpides, Nikos C; Pati, Amrita

    2015-07-14

    In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of "big" genomic data for discovering small molecules. IMG-ABC relies on IMG's comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC's focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-existent void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of uncovered phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules. IMG-ABC is the largest publicly available database of predicted and experimental biosynthetic gene clusters and the secondary metabolites they produce. The system also includes powerful search and analysis tools that are integrated with IMG's extensive genomic/metagenomic data and analysis tool kits. 
As new research on biosynthetic gene clusters and secondary metabolites is published and more genomes are sequenced, IMG-ABC will continue to expand, with the goal of becoming an essential component of any bioinformatic exploration of the secondary metabolism world. Copyright © 2015 Hadjithomas et al.

  13. Taxonomic evaluation of Streptomyces albus and related species using multilocus sequence analysis and proposals to emend the description of Streptomyces albus and describe Streptomyces pathocidini sp. nov.

    PubMed Central

    Doroghazi, J. R.; Ju, K.-S.; Metcalf, W. W.

    2014-01-01

    In phylogenetic analyses of the genus Streptomyces using 16S rRNA gene sequences, Streptomyces albus subsp. albus NRRL B-1811T forms a cluster with five other species having identical or nearly identical 16S rRNA gene sequences. Moreover, the morphological and physiological characteristics of these other species, including Streptomyces almquistii NRRL B-1685T, Streptomyces flocculus NRRL B-2465T, Streptomyces gibsonii NRRL B-1335T and Streptomyces rangoonensis NRRL B-12378T are quite similar. This cluster is of particular taxonomic interest because Streptomyces albus is the type species of the genus Streptomyces. The related strains were subjected to multilocus sequence analysis (MLSA) utilizing partial sequences of the housekeeping genes atpD, gyrB, recA, rpoB and trpB and confirmation of previously reported phenotypic characteristics. The five strains formed a coherent cluster supported by a 100 % bootstrap value in phylogenetic trees generated from sequence alignments prepared by concatenating the sequences of the housekeeping genes, and identical tree topology was observed using various different tree-making algorithms. Moreover, all but one strain, S. flocculus NRRL B-2465T, exhibited identical sequences for all of the five housekeeping gene loci sequenced, but NRRL B-2465T still exhibited an MLSA evolutionary distance of 0.005 from the other strains, a value that is lower than the 0.007 MLSA evolutionary distance threshold proposed for species-level relatedness. These data support a proposal to reclassify S. almquistii, S. flocculus, S. gibsonii and S. rangoonensis as later heterotypic synonyms of S. albus with NRRL B-1811T as the type strain. The MLSA sequence database also demonstrated utility for quickly and conclusively confirming that numerous strains within the ARS Culture Collection had been previously misidentified as subspecies of S. albus and that Streptomyces albus subsp. 
pathocidicus should be redescribed as a novel species, Streptomyces pathocidini sp. nov., with the type strain NRRL B-24287T. PMID:24277863

  14. Identifying and reducing error in cluster-expansion approximations of protein energies.

    PubMed

    Hahn, Seungsoo; Ashenberg, Orr; Grigoryan, Gevorg; Keating, Amy E

    2010-12-01

    Protein design involves searching a vast space for sequences that are compatible with a defined structure. This can pose significant computational challenges. Cluster expansion is a technique that can accelerate the evaluation of protein energies by generating a simple functional relationship between sequence and energy. The method consists of several steps. First, for a given protein structure, a training set of sequences with known energies is generated. Next, this training set is used to expand energy as a function of clusters consisting of single residues, residue pairs, and higher order terms, if required. The accuracy of the sequence-based expansion is monitored and improved using cross-validation testing and iterative inclusion of additional clusters. As a trade-off for evaluation speed, the cluster-expansion approximation causes prediction errors, which can be reduced by including more training sequences, including higher order terms in the expansion, and/or reducing the sequence space described by the cluster expansion. This article analyzes the sources of error and introduces a method whereby accuracy can be improved by judiciously reducing the described sequence space. The method is applied to describe the sequence-stability relationship for several protein structures: coiled-coil dimers and trimers, a PDZ domain, and T4 lysozyme as examples with computationally derived energies, and SH3 domains in amphiphysin-1 and endophilin-1 as examples where the expanded pseudo-energies are obtained from experiments. Our open-source software package Cluster Expansion Version 1.0 allows users to expand their own energy function of interest and thereby apply cluster expansion to custom problems in protein design. © 2010 Wiley Periodicals, Inc.
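
    The expansion described above (a constant plus single-residue and residue-pair terms) can be sketched as a table lookup. This is a minimal illustration, assuming the coefficients have already been fitted; the function name, the dictionary layout, and all numeric values are hypothetical and are not taken from the authors' Cluster Expansion software.

```python
from itertools import combinations


def ce_energy(seq, const, single, pair):
    """Evaluate a cluster expansion truncated at pair terms.

    single maps (position, residue) to an energy contribution;
    pair maps (pos_i, pos_j, residue_i, residue_j) to a contribution.
    Missing entries are treated as zero (an assumption of this sketch).
    """
    energy = const
    for i, aa in enumerate(seq):
        energy += single.get((i, aa), 0.0)
    for i, j in combinations(range(len(seq)), 2):
        energy += pair.get((i, j, seq[i], seq[j]), 0.0)
    return energy


# Hypothetical fitted coefficients for a three-position design problem.
const = -1.0
single = {(0, "L"): -0.5, (1, "E"): 0.2, (2, "K"): 0.1}
pair = {(1, 2, "E", "K"): -0.8}

e_lek = ce_energy("LEK", const, single, pair)
e_lea = ce_energy("LEA", const, single, pair)
print(e_lek, e_lea)
```

    Evaluation is a handful of dictionary lookups per sequence, which is the source of the speed-up; the price is the prediction error that the abstract discusses.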

  15. Stellar Clusters in the NGC 6334 Star-Forming Complex

    NASA Astrophysics Data System (ADS)

    Feigelson, Eric D.; Martin, Amanda L.; McNeill, Collin J.; Broos, Patrick S.; Garmire, Gordon P.

    2009-07-01

    The full stellar population of NGC 6334, one of the most spectacular regions of massive star formation in the nearby Galaxy, has not been well sampled in past studies. We analyze here a mosaic of two Chandra X-ray Observatory images of the region using sensitive data analysis methods, giving a list of 1607 faint X-ray sources with arcsecond positions and approximate line-of-sight absorption. About 95% of these are expected to be cluster members, most lower mass pre-main-sequence stars. Extrapolating to low X-ray levels, the total stellar population is estimated to be 20,000-30,000 pre-main-sequence stars. The X-ray sources show a complicated spatial pattern with ~10 distinct star clusters. The heavily obscured clusters are mostly associated with previously known far-infrared sources and radio H II regions. The lightly obscured clusters are mostly newly identified in the X-ray images. Dozens of likely OB stars are found, both in clusters and dispersed throughout the region, suggesting that star formation in the complex has proceeded over millions of years. A number of extraordinarily heavily absorbed X-ray sources are associated with the active regions of star formation.

  16. Molecular Epidemiological Survey and Genetic Characterization of Anaplasma Species in Mongolian Livestock.

    PubMed

    Ochirkhuu, Nyamsuren; Konnai, Satoru; Odbileg, Raadan; Murata, Shiro; Ohashi, Kazuhiko

    2017-08-01

    Anaplasma species are obligate intracellular rickettsial pathogens that cause great economic loss to the animal industry. Few studies on Anaplasma infections in Mongolian livestock have been conducted. This study examined the prevalence of Anaplasma marginale, Anaplasma ovis, Anaplasma phagocytophilum, and Anaplasma bovis by polymerase chain reaction assay in 928 blood samples collected from native cattle and dairy cattle (Bos taurus), yaks (Bos grunniens), sheep (Ovis aries), and goats (Capra aegagrus hircus) in four provinces of Ulaanbaatar city in Mongolia. We genetically characterized positive samples through sequencing analysis based on the heat-shock protein groEL, major surface protein 4 (msp4), and 16S rRNA genes. Only A. ovis was detected in Mongolian livestock (cattle, yaks, sheep, and goats), with 413 animals (44.5%) positive for groEL and 308 animals (33.2%) positive for msp4 genes. In the phylogenetic tree, we separated A. ovis sequences into two distinct clusters based on the groEL gene. One cluster comprised sequences derived mainly from sheep and goats, which was similar to that in A. ovis isolates from other countries. The other divergent cluster comprised sequences derived from cattle and yaks and appeared to be newly branched from that in previously published single isolates in Mongolian cattle. In addition, the msp4 gene of A. ovis using same and different samples with groEL gene of the pathogen demonstrated that all sequences derived from all animal species, except for three sequences derived from cattle and yak, were clustered together, and were identical or similar to those in isolates from other countries. We used 16S rRNA gene sequences to investigate the genetically divergent A. ovis and identified high homology of 99.3-100%. However, the sequences derived from cattle did not match those derived from sheep and goats. The results of this study on the prevalence and molecular characterization of A. 
ovis in Mongolian livestock can facilitate the control of infectious diseases in livestock.

  17. Identification of cephalopod species from the North and Baltic Seas using morphology, COI and 18S rDNA sequences

    NASA Astrophysics Data System (ADS)

    Gebhardt, Katharina; Knebelsberger, Thomas

    2015-09-01

    We morphologically analyzed 79 cephalopod specimens from the North and Baltic Seas belonging to 13 separate species. Another 29 specimens showed morphological features of either Alloteuthis media or Alloteuthis subulata or were found to be in between. Reliable identification features to distinguish between A. media and A. subulata are currently not available. The analysis of the DNA barcoding region of the COI gene revealed intraspecific distances (uncorrected p) ranging from 0 to 2.13 % (average 0.1 %) and interspecific distances between 3.31 and 22 % (average 15.52 %). All species formed monophyletic clusters in a neighbor-joining analysis and were supported by bootstrap values of ≥99 %. All COI haplotypes belonging to the 29 Alloteuthis specimens were grouped in one cluster. Neither COI nor 18S rDNA sequences helped to distinguish between the different Alloteuthis morphotypes. For species identification purposes, we recommend the use of COI, as it showed higher bootstrap support of species clusters and less amplification and sequencing failure compared to 18S. Our data strongly support the assumption that the genus Alloteuthis is only represented by a single species, at least in the North Sea. It remained unclear whether this species is A. subulata or A. media. All COI sequences including important metadata were uploaded to the Barcode of Life Data Systems and can be used as a reference library for the molecular identification of more than 50 % of the cephalopod fauna known from the North and Baltic Seas.
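
    The uncorrected p-distance reported above is the proportion of aligned sites at which two sequences differ. The following is a minimal sketch, assuming pre-aligned sequences of equal length and assuming that sites with gaps or ambiguous bases are skipped (pairwise deletion); the function name and the toy sequences are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned nucleotide sequences.

    Only sites where both sequences carry A, C, G or T are compared;
    gaps and ambiguity codes are skipped (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


# Toy aligned fragments, not real COI data.
d = p_distance("ACGTACGT", "ACGTACGA")
print(f"{100 * d:.2f} %")
```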

  18. Statistical analysis of life history calendar data.

    PubMed

    Eerola, Mervi; Helske, Satu

    2016-04-01

    The life history calendar is a data-collection tool for obtaining reliable retrospective data about life events. To illustrate the analysis of such data, we compare the model-based probabilistic event history analysis and the model-free data mining method, sequence analysis. In event history analysis, we estimate instead of transition hazards the cumulative prediction probabilities of life events in the entire trajectory. In sequence analysis, we compare several dissimilarity metrics and contrast data-driven and user-defined substitution costs. As an example, we study young adults' transition to adulthood as a sequence of events in three life domains. The events define the multistate event history model and the parallel life domains in multidimensional sequence analysis. The relationship between life trajectories and excess depressive symptoms in middle age is further studied by their joint prediction in the multistate model and by regressing the symptom scores on individual-specific cluster indices. The two approaches complement each other in life course analysis; sequence analysis can effectively find typical and atypical life patterns while event history analysis is needed for causal inquiries. © The Author(s) 2012.
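
    One dissimilarity commonly used in sequence analysis is an edit distance with substitution and insertion/deletion costs (often called optimal matching). The sketch below shows how the choice of substitution costs changes the distance between two life trajectories. It is an illustration only: the abstract does not say which metrics were compared, and the state codes, cost values, and function name are hypothetical.

```python
def om_distance(s, t, sub_cost, indel=1.0):
    """Edit distance between two state sequences.

    sub_cost(a, b) gives the cost of substituting state a with state b;
    every insertion or deletion costs `indel`.
    """
    n, m = len(s), len(t)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if s[i - 1] == t[j - 1] else sub_cost(s[i - 1], t[j - 1])
            d[i][j] = min(d[i - 1][j] + indel,     # deletion
                          d[i][j - 1] + indel,     # insertion
                          d[i - 1][j - 1] + sub)   # match or substitution
    return d[n][m]


# Hypothetical yearly states: S = studying, W = working, P = parenthood.
def constant_cost(a, b):
    return 2.0


user_costs = {frozenset("SW"): 1.0, frozenset("WP"): 1.5, frozenset("SP"): 2.0}


def user_cost(a, b):
    return user_costs[frozenset((a, b))]


d_const = om_distance("SSWW", "SWWW", constant_cost)
d_user = om_distance("SSWW", "SWWW", user_cost)
print(d_const, d_user)
```

    The resulting pairwise distance matrix is what a clustering step would then consume to find typical and atypical life patterns.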

  19. Characterization of limes (Citrus aurantifolia) grown in Bhutan and Indonesia using high-throughput sequencing

    PubMed Central

    Penjor, Tshering; Mimura, Takashi; Matsumoto, Ryoji; Yamamoto, Masashi; Nagano, Yukio

    2014-01-01

    Lime [Citrus aurantifolia (Cristm.) Swingle] is a Citrus species that is a popular ingredient in many cuisines. Some citrus plants are known to originate in the area ranging from northeastern India to southwestern China. In the current study, we characterized and compared limes grown in Bhutan (n = 5 accessions) and Indonesia (n = 3 accessions). The limes were separated into two groups based on their morphology. Restriction site-associated DNA sequencing (RAD-seq) separated the eight accessions into two clusters. One cluster contained four accessions from Bhutan, whereas the other cluster contained one accession from Bhutan and the three accessions from Indonesia. This genetic classification supported the morphological classification of limes. The analysis suggests that the properties associated with asexual reproduction, and somatic homologous recombination, have contributed to the genetic diversification of limes. PMID:24781859

  20. Genetic characterization of Measles Viruses in China, 2004

    PubMed Central

    Zhang, Yan; Ji, Yixin; Jiang, Xiaohong; Xu, Songtao; Zhu, Zhen; Zheng, Lei; He, Jilan; Ling, Hua; Wang, Yan; Liu, Yang; Du, Wen; Yang, Xuelei; Mao, Naiying; Xu, Wenbo

    2008-01-01

    Wild-type measles viruses were genetically characterized by nucleotide sequencing of the C-terminal region of the N protein gene and phylogenetic analysis of 59 isolates from 16 provinces of China in 2004. The results showed that all of the isolates belonged to genotype H1. Fifty-one isolates belonged to cluster 1 and 8 isolates to cluster 2, and viruses from both clusters were distributed throughout China without a distinct geographic pattern. The nucleotide sequence and predicted amino acid homologies of the 59 H1 strains were 96.5%–100% and 95.7%–100%, respectively. The report showed that the transmission pattern of genotype H1 viruses in China in 2004 was consistent with ongoing endemic transmission of multiple lineages of a single, endemic genotype. Multiple transmission pathways led to multiple lineages within the endemic genotype. PMID:18928575

  1. Mumps virus F gene and HN gene sequencing as a molecular tool to study mumps virus transmission.

    PubMed

    Gouma, Sigrid; Cremer, Jeroen; Parkkali, Saara; Veldhuijzen, Irene; van Binnendijk, Rob S; Koopmans, Marion P G

    2016-11-01

    Various mumps outbreaks have occurred in the Netherlands since 2004, particularly among persons who had received 2 doses of measles, mumps, and rubella (MMR) vaccination. Genomic typing of pathogens can be used to track outbreaks, but the established genotyping of mumps virus based on the small hydrophobic (SH) gene sequences did not provide sufficient resolution. Therefore, we expanded the sequencing to include fusion (F) gene and haemagglutinin-neuraminidase (HN) gene sequences in addition to the SH gene sequences from 109 mumps virus genotype G strains obtained between 2004 and mid 2015 in the Netherlands. When the molecular information from these 3 genes was combined, we were able to identify separate mumps virus clusters and track mumps virus transmission. The analyses suggested that multiple mumps virus introductions occurred in the Netherlands between 2004 and 2015 resulting in several mumps outbreaks throughout this period, whereas during some local outbreaks the molecular data pointed towards endemic circulation. Combined analysis of epidemiological data and sequence data collected in 2015 showed good support for the phylogenetic clustering. Copyright © 2016 Elsevier B.V. All rights reserved.

  2. The Gaia-ESO Survey: the present-day radial metallicity distribution of the Galactic disc probed by pre-main-sequence clusters

    NASA Astrophysics Data System (ADS)

    Spina, L.; Randich, S.; Magrini, L.; Jeffries, R. D.; Friel, E. D.; Sacco, G. G.; Pancino, E.; Bonito, R.; Bravi, L.; Franciosini, E.; Klutsch, A.; Montes, D.; Gilmore, G.; Vallenari, A.; Bensby, T.; Bragaglia, A.; Flaccomio, E.; Koposov, S. E.; Korn, A. J.; Lanzafame, A. C.; Smiljanic, R.; Bayo, A.; Carraro, G.; Casey, A. R.; Costado, M. T.; Damiani, F.; Donati, P.; Frasca, A.; Hourihane, A.; Jofré, P.; Lewis, J.; Lind, K.; Monaco, L.; Morbidelli, L.; Prisinzano, L.; Sousa, S. G.; Worley, C. C.; Zaggia, S.

    2017-05-01

    Context. The radial metallicity distribution in the Galactic thin disc represents a crucial constraint for modelling disc formation and evolution. Open star clusters allow us to derive both the radial metallicity distribution and its evolution over time. Aims: In this paper we perform the first investigation of the present-day radial metallicity distribution based on [Fe/H] determinations in late type members of pre-main-sequence clusters. Because of their youth, these clusters are therefore essential for tracing the current interstellar medium metallicity. Methods: We used the products of the Gaia-ESO Survey analysis of 12 young regions (age < 100 Myr), covering Galactocentric distances from 6.67 to 8.70 kpc. For the first time, we derived the metal content of star forming regions farther than 500 pc from the Sun. Median metallicities were determined through samples of reliable cluster members. For ten clusters the membership analysis is discussed in the present paper, while for the other two clusters (i.e. Chamaeleon I and Gamma Velorum) we adopted the members identified in our previous works. Results: All the pre-main-sequence clusters considered in this paper have close-to-solar or slightly sub-solar metallicities. The radial metallicity distribution traced by these clusters is almost flat, with the innermost star forming regions having [Fe/H] values that are 0.10-0.15 dex lower than the majority of the older clusters located at similar Galactocentric radii. Conclusions: This homogeneous study of the present-day radial metallicity distribution in the Galactic thin disc favours models that predict a flattening of the radial gradient over time. On the other hand, the decrease of the average [Fe/H] at young ages is not easily explained by the models. Our results reveal a complex interplay of several processes (e.g. star formation activity, initial mass function, supernova yields, gas flows) that controlled the recent evolution of the Milky Way. 
Based on observations made with the ESO/VLT, at Paranal Observatory, under program 188.B-3002 (The Gaia-ESO Public Spectroscopic Survey). Full Table 1 is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/601/A70

  3. Global Occurrence of Archaeal amoA Genes in Terrestrial Hot Springs

    PubMed Central

    Zhang, Chuanlun L.; Ye, Qi; Huang, Zhiyong; Li, WenJun; Chen, Jinquan; Song, Zhaoqi; Zhao, Weidong; Bagwell, Christopher; Inskeep, William P.; Ross, Christian; Gao, Lei; Wiegel, Juergen; Romanek, Christopher S.; Shock, Everett L.; Hedlund, Brian P.

    2008-01-01

    Despite the ubiquity of ammonium in geothermal environments and the thermodynamic favorability of aerobic ammonia oxidation, thermophilic ammonia-oxidizing microorganisms belonging to the crenarchaeota kingdom have only recently been described. In this study, we analyzed microbial mats and surface sediments from 21 hot spring samples (pH 3.4 to 9.0; temperature, 41 to 86°C) from the United States, China, and Russia and obtained 846 putative archaeal ammonia monooxygenase large-subunit (amoA) gene and transcript sequences, representing a total of 41 amoA operational taxonomic units (OTUs) at 2% identity. The amoA gene sequences were highly diverse, yet they clustered within two major clades of archaeal amoA sequences known from water columns, sediments, and soils: clusters A and B. Eighty-four percent (711/846) of the sequences belonged to cluster A, which is typically found in water columns and sediments, whereas 16% (135/846) belonged to cluster B, which is typically found in soils and sediments. Although a few amoA OTUs were present in several geothermal regions, most were specific to a single region. In addition, cluster A amoA genes formed geographic groups, while cluster B sequences did not group geographically. With the exception of only one hot spring, principal-component analysis and UPGMA (unweighted-pair group method using average linkages) based on the UniFrac metric derived from cluster A grouped the springs by location, regardless of temperature or bulk water pH, suggesting that geography may play a role in structuring communities of putative ammonia-oxidizing archaea (AOA). The amoA genes were distinct from those of low-temperature environments; in particular, pair-wise comparisons between hot spring amoA genes and those from sympatric soils showed less than 85% sequence identity, underscoring the distinctness of hot spring archaeal communities from those of the surrounding soil system. 
Reverse transcription-PCR showed that amoA genes were transcribed in situ in one spring and the transcripts were closely related to the amoA genes amplified from the same spring. Our study demonstrates the global occurrence of putative archaeal amoA genes in a wide variety of terrestrial hot springs and suggests that geography may play an important role in selecting different assemblages of AOA. PMID:18676703
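
    Grouping sequences into OTUs at a fixed cutoff, as mentioned above, can be sketched with a greedy centroid procedure. This is an illustration only: the abstract does not state which OTU-picking method was used, the 2% figure is read here as a distance cutoff (at least 98% identity), and the function names and toy sequences are hypothetical.

```python
def distance(a: str, b: str) -> float:
    """Fraction of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, cutoff=0.02):
    """Assign each sequence to the first OTU whose centroid is within cutoff.

    The first sequence placed in an OTU acts as its centroid; a sequence
    that matches no existing centroid starts a new OTU.
    """
    centroids = []
    otus = []
    for s in seqs:
        for k, c in enumerate(centroids):
            if distance(s, c) <= cutoff:
                otus[k].append(s)
                break
        else:
            centroids.append(s)
            otus.append([s])
    return otus


# Toy 100-nt sequences, not real amoA data.
base = "ACGT" * 25
near = "T" + base[1:]      # 1 mismatch  -> 1% distance from base
far = "TTTTT" + base[5:]   # 4 mismatches -> 4% distance from base

otus = greedy_otus([base, near, far])
print(len(otus), [len(o) for o in otus])
```

    Greedy assignment depends on input order, so real pipelines usually sort sequences (for example by abundance) before clustering.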

  4. Global occurrence of archaeal amoA genes in terrestrial hot springs.

    PubMed

    Zhang, Chuanlun L; Ye, Qi; Huang, Zhiyong; Li, Wenjun; Chen, Jinquan; Song, Zhaoqi; Zhao, Weidong; Bagwell, Christopher; Inskeep, William P; Ross, Christian; Gao, Lei; Wiegel, Juergen; Romanek, Christopher S; Shock, Everett L; Hedlund, Brian P

    2008-10-01

    Despite the ubiquity of ammonium in geothermal environments and the thermodynamic favorability of aerobic ammonia oxidation, thermophilic ammonia-oxidizing microorganisms belonging to the crenarchaeota kingdom have only recently been described. In this study, we analyzed microbial mats and surface sediments from 21 hot spring samples (pH 3.4 to 9.0; temperature, 41 to 86 degrees C) from the United States, China, and Russia and obtained 846 putative archaeal ammonia monooxygenase large-subunit (amoA) gene and transcript sequences, representing a total of 41 amoA operational taxonomic units (OTUs) at 2% identity. The amoA gene sequences were highly diverse, yet they clustered within two major clades of archaeal amoA sequences known from water columns, sediments, and soils: clusters A and B. Eighty-four percent (711/846) of the sequences belonged to cluster A, which is typically found in water columns and sediments, whereas 16% (135/846) belonged to cluster B, which is typically found in soils and sediments. Although a few amoA OTUs were present in several geothermal regions, most were specific to a single region. In addition, cluster A amoA genes formed geographic groups, while cluster B sequences did not group geographically. With the exception of only one hot spring, principal-component analysis and UPGMA (unweighted-pair group method using average linkages) based on the UniFrac metric derived from cluster A grouped the springs by location, regardless of temperature or bulk water pH, suggesting that geography may play a role in structuring communities of putative ammonia-oxidizing archaea (AOA). The amoA genes were distinct from those of low-temperature environments; in particular, pair-wise comparisons between hot spring amoA genes and those from sympatric soils showed less than 85% sequence identity, underscoring the distinctness of hot spring archaeal communities from those of the surrounding soil system. 
Reverse transcription-PCR showed that amoA genes were transcribed in situ in one spring and the transcripts were closely related to the amoA genes amplified from the same spring. Our study demonstrates the global occurrence of putative archaeal amoA genes in a wide variety of terrestrial hot springs and suggests that geography may play an important role in selecting different assemblages of AOA.

  5. The writer independent online handwriting recognition system frog on hand and cluster generative statistical dynamic time warping.

    PubMed

    Bahlmann, Claus; Burkhardt, Hans

    2004-03-01

    In this paper, we give a comprehensive description of our writer-independent online handwriting recognition system frog on hand. The focus of this work concerns the presentation of the classification/training approach, which we call cluster generative statistical dynamic time warping (CSDTW). CSDTW is a general, scalable, HMM-based method for variable-sized, sequential data that holistically combines cluster analysis and statistical sequence modeling. It can handle general classification problems that rely on this sequential type of data, e.g., speech recognition, genome processing, robotics, etc. Contrary to previous attempts, clustering and statistical sequence modeling are embedded in a single feature space and use a closely related distance measure. We show character recognition experiments of frog on hand using CSDTW on the UNIPEN online handwriting database. The recognition accuracy is significantly higher than reported results of other handwriting recognition systems. Finally, we describe the real-time implementation of frog on hand on a Linux Compaq iPAQ embedded device.
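
    The alignment idea underlying the method above is dynamic time warping (DTW), which compares sequences of different lengths by allowing samples to be stretched in time. The sketch below is plain, textbook DTW on one-dimensional sequences, not the statistical, HMM-based CSDTW variant the authors describe; the function name and the toy inputs are hypothetical.

```python
def dtw(a, b):
    """Classic dynamic time warping cost between two numeric sequences.

    Local cost is the absolute difference between samples; the allowed
    steps are diagonal, vertical and horizontal moves in the cost matrix.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # a advances, b repeats
                                 d[i][j - 1],      # b advances, a repeats
                                 d[i - 1][j - 1])  # both advance
    return d[n][m]


# The second sequence repeats one sample; warping absorbs the repeat.
same_shape = dtw([1, 2, 3], [1, 2, 2, 3])
offset = dtw([0, 0], [1, 1])
print(same_shape, offset)
```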

  6. ORGANIZATION OF THE nif GENES OF THE NONHETEROCYSTOUS CYANOBACTERIUM TRICHODESMIUM SP. IMS101.

    PubMed

    Dominic, Benny; Zani, Sabino; Chen, Yi-Bu; Mellon, Mark T; Zehr, Jonathan P

    2000-08-26

    An approximately 16-kb fragment of the Trichodesmium sp. IMS101 (a nonheterocystous filamentous cyanobacterium) "conventional" nif gene cluster was cloned and sequenced. The gene organization of the Trichodesmium and Anabaena variabilis vegetative (nif 2) nitrogenase gene clusters spanning the region from nif B to nif W is similar except for the absence of two open reading frames (ORF3 and ORF1) in Trichodesmium. The Trichodesmium nif EN genes encode a fused Nif EN polypeptide that does not appear to be processed into individual Nif E and Nif N polypeptides. Fused nif EN genes were previously found in the A. variabilis nif 2 genes, but we have found that fused nif EN genes are widespread in the nonheterocystous cyanobacteria. Although the gene organization of the nonheterocystous filamentous Trichodesmium nif gene cluster is very similar to that of the A. variabilis vegetative nif 2 gene cluster, phylogenetic analysis of nif sequences does not support close relatedness of Trichodesmium and A. variabilis vegetative (nif 2) nitrogenase genes.

  7. Domain Evolution and Functional Diversification of Sulfite Reductases

    NASA Astrophysics Data System (ADS)

    Dhillon, Ashita; Goswami, Sulip; Riley, Monica; Teske, Andreas; Sogin, Mitchell

    2005-02-01

    Sulfite reductases are key enzymes of assimilatory and dissimilatory sulfur metabolism, which occur in diverse bacterial and archaeal lineages. They share a highly conserved domain "C-X5-C-n-C-X3-C" for binding siroheme and iron-sulfur clusters that facilitate electron transfer to the substrate. For each sulfite reductase cluster, the siroheme-binding domain is positioned slightly differently at the N-terminus of dsrA and dsrB, while in the assimilatory proteins the siroheme domain is located at the C-terminus. Our sequence and phylogenetic analysis of the siroheme-binding domain shows that sulfite reductase sequences diverged from a common ancestor into four separate clusters (aSir, alSir, dsr, and asrC) that are biochemically distinct; each serves a different assimilatory or dissimilatory role in sulfur metabolism. The phylogenetic distribution and functional grouping in sulfite reductase clusters (dsrA and dsrB vs. aSiR, asrC, and alSir) suggest that their functional diversification during evolution may have preceded the bacterial/archaeal divergence.

  8. Genome-Based Comparison of Clostridioides difficile: Average Amino Acid Identity Analysis of Core Genomes.

    PubMed

    Cabal, Adriana; Jun, Se-Ran; Jenjaroenpun, Piroon; Wanchai, Visanu; Nookaew, Intawat; Wongsurawat, Thidathip; Burgess, Mary J; Kothari, Atul; Wassenaar, Trudy M; Ussery, David W

    2018-02-14

    Infections due to Clostridioides difficile (previously known as Clostridium difficile) are a major problem in hospitals, where cases can be caused by community-acquired strains as well as by nosocomial spread. Whole genome sequences from clinical samples contain a lot of information, but it needs to be analyzed and compared in such a way that the outcome is useful for clinicians or epidemiologists. Here, we compare 663 publicly available complete genome sequences of C. difficile using average amino acid identity (AAI) scores. This analysis revealed that most of these genomes (640, 96.5%) clearly belong to the same species, while the remaining 23 genomes produce four distinct clusters within the Clostridioides genus. The main C. difficile cluster can be further divided into sub-clusters, depending on the chosen cutoff. We demonstrate that MLST, either based on partial or full gene-length, results in biased estimates of genetic differences and does not capture the true degree of similarity or differences of complete genomes. Presence of genes coding for C. difficile toxins A and B (ToxA/B), as well as the binary C. difficile toxin (CDT), was deduced from their unique PfamA domain architectures. Out of the 663 C. difficile genomes, 535 (80.7%) contained at least one copy of ToxA or ToxB, while these genes were missing from 128 genomes. Although some clusters were enriched for toxin presence, these genes are variably present in a given genetic background. The CDT genes were found in 191 genomes, which were restricted to a few clusters only, and only one cluster lacked the toxin A/B genes consistently. A total of 310 genomes contained ToxA/B without CDT (47%). Further, published metagenomic data from stools were used to assess the presence of C. difficile sequences in blinded cases of C. difficile infection (CDI) and controls, to test if metagenomic analysis is sensitive enough to detect the pathogen, and to establish strain relationships between cases from the same hospital. We conclude that metagenomics can contribute to the identification of CDI and can assist in characterization of the most probable causative strain in CDI patients.
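
    The genome counts quoted in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts given in the abstract; the helper name `percent` and the variable names are assumptions introduced for illustration.

```python
TOTAL_GENOMES = 663  # complete C. difficile genomes compared in the study


def percent(part, total=TOTAL_GENOMES):
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100.0 * part / total, 1)


same_species = 640         # genomes in the main C. difficile cluster
other_clusters = 23        # genomes forming four separate clusters
with_toxin_ab = 535        # genomes with at least one copy of ToxA or ToxB
without_toxin_ab = 128     # genomes lacking both toxin genes
toxin_ab_without_cdt = 310 # genomes with ToxA/B but no CDT

# The two partitions of the data set should each add up to the total.
partitions_consistent = (
    same_species + other_clusters == TOTAL_GENOMES
    and with_toxin_ab + without_toxin_ab == TOTAL_GENOMES
)
```

    Under these counts, `percent(same_species)` and `percent(with_toxin_ab)` reproduce the 96.5% and 80.7% reported in the abstract, and `percent(toxin_ab_without_cdt)` gives 46.8, which the abstract rounds to 47%.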

  9. Genome Neighborhood Network Reveals Insights into Enediyne Biosynthesis and Facilitates Prediction and Prioritization for Discovery

    PubMed Central

    Rudolf, Jeffrey D.; Yan, Xiaohui; Shen, Ben

    2015-01-01

    The enediynes are one of the most fascinating families of bacterial natural products given their unprecedented molecular architecture and extraordinary cytotoxicity. Enediynes are rare with only 11 structurally characterized members and four additional members isolated in their cycloaromatized form. Recent advances in DNA sequencing have resulted in an explosion of microbial genomes. A virtual survey of the GenBank and JGI genome databases revealed 87 enediyne biosynthetic gene clusters from 78 bacteria strains, implying enediynes are more common than previously thought. Here we report the construction and analysis of an enediyne genome neighborhood network (GNN) as a high-throughput approach to analyze secondary metabolite gene clusters. Analysis of the enediyne GNN facilitated rapid gene cluster annotation, revealed genetic trends in enediyne biosynthetic gene clusters resulting in a simple prediction scheme to determine 9- vs 10-membered enediyne gene clusters, and supported a genomic-based strain prioritization method for enediyne discovery. PMID:26318027

  10. Niche specificity of ammonia-oxidizing archaeal and bacterial communities in a freshwater wetland receiving municipal wastewater in Daqing, Northeast China.

    PubMed

    Lee, Kwok-Ho; Wang, Yong-Feng; Li, Hui; Gu, Ji-Dong

    2014-12-01

    Ecophysiological differences between ammonia-oxidizing bacteria (AOB) and ammonia-oxidizing archaea (AOA) enable them to adapt to different niches in complex freshwater wetland ecosystems. The community characters of AOA and AOB in the different niches in a freshwater wetland receiving municipal wastewater, as well as the physicochemical parameters of sediment/soil samples, were investigated in this study. AOA community structures varied and separated from each other among four different niches. Wetland vegetation including aquatic macrophytes and terrestrial plants affected the AOA community composition but less for AOB, whereas sediment depths might contribute to the AOB community shift. The diversity of AOA communities was higher than that of AOB across all four niches. Archaeal and bacterial amoA genes (encoding for the alpha-subunit of ammonia monooxygenases) were most diverse in the dry-land niche, indicating O2 availability might favor ammonia oxidation. The majority of AOA amoA sequences belonged to the Soil/sediment Cluster B in the freshwater wetland ecosystems, while the dominant AOB amoA sequences were affiliated with Nitrosospira-like cluster. In the Nitrosospira-like cluster, AOB amoA gene sequences affiliated with the uncultured ammonia-oxidizing beta-proteobacteria constituted the largest portion (99%). Moreover, independent methods for phylogenetic tree analysis supported high parsimony bootstrap values. As a consequence, it is proposed that Nitrosospira-like amoA gene sequences recovered in this study represent a potentially novel cluster, grouping with the sequences from Gulf of Mexico deposited in the public databases.

  11. Classification of Cowpox Viruses into Several Distinct Clades and Identification of a Novel Lineage

    PubMed Central

    Franke, Annika; Pfaff, Florian; Jenckel, Maria; Hoffmann, Bernd; Höper, Dirk; Antwerpen, Markus; Meyer, Hermann; Beer, Martin; Hoffmann, Donata

    2017-01-01

    Cowpox virus (CPXV) was considered a uniform species within the genus Orthopoxvirus (OPV). Previous phylogenetic analysis indicated that CPXV is polyphyletic and isolates may cluster into different clades with two of these clades showing genetic similarities to either variola (VARV) or vaccinia viruses (VACV). Further analyses were initiated to assess both the genetic diversity and the evolutionary background of circulating CPXVs. Here we report the full-length sequences of 20 CPXV strains isolated from different animal species and humans in Germany. A phylogenetic analysis of altogether 83 full-length OPV genomes confirmed the polyphyletic character of the species CPXV and suggested at least four different clades. The German isolates from this study mainly clustered into two CPXV-like clades, and VARV- and VACV-like strains were not observed. A single strain, isolated from a cotton-top tamarin, clustered distantly from all other CPXVs and might represent a novel and unique evolutionary lineage. The classification of CPXV strains into clades roughly followed their geographic origin, with the highest clade diversity so far observed for Germany. Furthermore, we found evidence for recombination between OPV clades without significant disruption of the observed clustering. In conclusion, this analysis markedly expands the number of available CPXV full-length sequences, confirms the co-circulation of several CPXV clades in Germany, and provides the first data about a new evolutionary CPXV lineage. PMID:28604604

  12. antiSMASH 2.0--a versatile platform for genome mining of secondary metabolite producers.

    PubMed

    Blin, Kai; Medema, Marnix H; Kazempour, Daniyal; Fischbach, Michael A; Breitling, Rainer; Takano, Eriko; Weber, Tilmann

    2013-07-01

    Microbial secondary metabolites are a potent source of antibiotics and other pharmaceuticals. Genome mining of their biosynthetic gene clusters has become a key method to accelerate their identification and characterization. In 2011, we developed antiSMASH, a web-based analysis platform that automates this process. Here, we present the highly improved antiSMASH 2.0 release, available at http://antismash.secondarymetabolites.org/. For the new version, antiSMASH was entirely re-designed using a plug-and-play concept that allows easy integration of novel predictor or output modules. antiSMASH 2.0 now supports input of multiple related sequences simultaneously (multi-FASTA/GenBank/EMBL), which allows the analysis of draft genomes comprising multiple contigs. Moreover, direct analysis of protein sequences is now possible. antiSMASH 2.0 has also been equipped with the capacity to detect additional classes of secondary metabolites, including oligosaccharide antibiotics, phenazines, thiopeptides, homo-serine lactones, phosphonates and furans. The algorithm for predicting the core structure of the cluster end product is now also covering lantipeptides, in addition to polyketides and non-ribosomal peptides. The antiSMASH ClusterBlast functionality has been extended to identify sub-clusters involved in the biosynthesis of specific chemical building blocks. The new features currently make antiSMASH 2.0 the most comprehensive resource for identifying and analyzing novel secondary metabolite biosynthetic pathways in microorganisms.

  13. A TALE OF DWARFS AND GIANTS: USING A z = 1.62 CLUSTER TO UNDERSTAND HOW THE RED SEQUENCE GREW OVER THE LAST 9.5 BILLION YEARS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rudnick, Gregory H.; Tran, Kim-Vy; Papovich, Casey

    2012-08-10

    We study the red sequence in a cluster of galaxies at z = 1.62 and follow its evolution over the intervening 9.5 Gyr to the present day. Using deep YJKs imaging with the HAWK-I instrument on the Very Large Telescope, we identify a tight red sequence and construct its rest-frame i-band luminosity function (LF). There is a marked deficit of faint red galaxies in the cluster that causes a turnover in the LF. We compare the red-sequence LF to that for clusters at z < 0.8, correcting the luminosities for passive evolution. The shape of the cluster red-sequence LF does not evolve between z = 1.62 and z = 0.6 but at z < 0.6 the faint population builds up significantly. Meanwhile, between z = 1.62 and 0.6 the inferred total light on the red sequence grows by a factor of ~2 and the bright end of the LF becomes more populated. We construct a simple model for red-sequence evolution that grows the red sequence in total luminosity and matches the constant LF shape at z > 0.6. In this model the cluster accretes blue galaxies from the field whose star formation is quenched and who are subsequently allowed to merge. We find that three to four mergers among cluster galaxies during the 4 Gyr between z = 1.62 and z = 0.6 match the observed LF evolution between the two redshifts. The inferred merger rate is consistent with other studies of this cluster. Our result supports the picture that galaxy merging during the major growth phase of massive clusters is an important process in shaping the red-sequence population at all luminosities.

  14. MIPS: a database for genomes and protein sequences.

    PubMed Central

    Mewes, H W; Heumann, K; Kaps, A; Mayer, K; Pfeiffer, F; Stocker, S; Frishman, D

    1999-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried near Munich, Germany, develops and maintains genome oriented databases. It is commonplace that the amount of sequence data available increases rapidly, but not the capacity of qualified manual annotation at the sequence databases. Therefore, our strategy aims to cope with the data stream by the comprehensive application of analysis tools to sequences of complete genomes, the systematic classification of protein sequences and the active support of sequence analysis and functional genomics projects. This report describes the systematic and up-to-date analysis of genomes (PEDANT), a comprehensive database of the yeast genome (MYGD), a database reflecting the progress in sequencing the Arabidopsis thaliana genome (MATD), the database of assembled, annotated human EST clusters (MEST), and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). MIPS provides access through its WWW server (http://www.mips.biochem.mpg.de) to a spectrum of generic databases, including the above mentioned as well as a database of protein families (PROTFAM), the MITOP database, and the all-against-all FASTA database. PMID:9847138

  15. Phylogenetic Analysis of Prevalent Tuberculosis and Non-Tuberculosis Mycobacteria in Isfahan, Iran, Based on a 360 bp Sequence of the rpoB Gene

    PubMed Central

    Nasr Esfahani, Bahram; Moghim, Sharareh; Ghasemian Safaei, Hajieh; Moghoofei, Mohsen; Sedighi, Mansour; Hadifar, Shima

    2016-01-01

    Background Taxonomic and phylogenetic studies of Mycobacterium species have been based around the 16S rRNA gene for many years. However, due to the high strain similarity between species in the Mycobacterium genus (94.3% - 100%), defining a valid phylogenetic tree is difficult; consequently, its use in estimating the boundaries between species is limited. The sequence of the rpoB gene makes it an appropriate gene for phylogenetic analysis, especially in bacteria with limited variation. Objectives In the present study, a 360bp sequence of rpoB was used for precise classification of Mycobacterium strains isolated in Isfahan, Iran. Materials and Methods From February to October 2013, 57 clinical and environmental isolates were collected, subcultured, and identified by phenotypic methods. After DNA extraction, a 360bp fragment was PCR-amplified and sequenced. The phylogenetic tree was constructed based on consensus sequence data, using MEGA5 software. Results Slow and fast-growing groups of the Mycobacterium strains were clearly differentiated based on the constructed tree of 56 common Mycobacterium isolates. Each species with a unique title in the tree was identified; in total, 13 nodes with a bootstrap value of over 50% were supported. Among the slow-growing group was Mycobacterium kansasii, with M. tuberculosis in a cluster with a bootstrap value of 98% and M. gordonae in another cluster with a bootstrap value of 90%. In the fast-growing group, one cluster with a bootstrap value of 89% was defined, including all fast-growing members present in this study. Conclusions The results suggest that only the application of the rpoB gene sequence is sufficient for taxonomic categorization and definition of a new Mycobacterium species, due to its high resolution power and proper variation in its sequence (85% - 100%); the resulting tree has high validity. PMID:27284397

  16. VizieR Online Data Catalog: NGC 6802 dwarf cluster members and non-members (Tang+, 2017)

    NASA Astrophysics Data System (ADS)

    Tang, B.; Geisler, D.; Friel, E.; Villanova, S.; Smiljanic, R.; Casey, A. R.; Randich, S.; Magrini, L.; San, Roman I.; Munoz, C.; Cohen, R. E.; Mauro, F.; Bragaglia, A.; Donati, P.; Tautvaisiene, G.; Drazdauskas, A.; Zenoviene, R.; Snaith, O.; Sousa, S.; Adibekyan, V.; Costado, M. T.; Blanco-Cuaresma, S.; Jimenez-Esteban, F.; Carraro, G.; Zwitter, T.; Francois, P.; Jofre, P.; Sordo, R.; Gilmore, G.; Flaccomio, E.; Koposov, S.; Korn, A. J.; Lanzafame, A. C.; Pancino, E.; Bayo, A.; Damiani, F.; Franciosini, E.; Hourihane, A.; Lardo, C.; Lewis, J.; Monaco, L.; Morbidelli, L.; Prisinzano, L.; Sacco, G.; Worley, C. C.; Zaggia, S.

    2016-11-01

    The dwarf stars in NGC 6802 observed by GIRAFFE spectrograph are separated into four tables: 1. cluster members in the lower main sequence; 2. cluster members in the upper main sequence; 3. non-member dwarfs in the lower main sequence; 4. non-member dwarfs in the upper main sequence. The star coordinates, V band magnitude, V-I color, and radial velocity are given. (4 data files).

  17. Phylogenetic Relationships of Citrus and Its Relatives Based on matK Gene Sequences

    PubMed Central

    Penjor, Tshering; Uehara, Miki; Ide, Manami; Matsumoto, Natsumi; Matsumoto, Ryoji

    2013-01-01

    The genus Citrus includes mandarin, orange, lemon, grapefruit and lime, which have high economic and nutritional value. The family Rutaceae can be divided into 7 subfamilies, including Aurantioideae. The genus Citrus belongs to the subfamily Aurantioideae. In this study, we sequenced the chloroplast matK genes of 135 accessions from 22 genera of Aurantioideae and analyzed them phylogenetically. Our study includes many accessions that have not been examined in other studies. The subfamily Aurantioideae has been classified into 2 tribes, Clauseneae and Citreae, and our current molecular analysis clearly discriminates Citreae from Clauseneae by using only 1 chloroplast DNA sequence. Our study confirms previous observations on the molecular phylogeny of Aurantioideae in many aspects. However, we have provided novel information on these genetic relationships. For example, inconsistent with the previous observation, and consistent with our preliminary study using the chloroplast rbcL genes, our analysis showed that Feroniella oblata is not nested in Citrus species and is closely related with Feronia limonia. Furthermore, we have shown that Murraya paniculata is similar to Merrillia caloxylon and is dissimilar to Murraya koenigii. We found that “true citrus fruit trees” could be divided into 2 subclusters. One subcluster included Citrus, Fortunella, and Poncirus, while the other cluster included Microcitrus and Eremocitrus. Compared to previous studies, our current study is the most extensive phylogenetic study of Citrus species since it includes 93 accessions. The results indicate that Citrus species can be classified into 3 clusters: a citron cluster, a pummelo cluster, and a mandarin cluster. Although most mandarin accessions belonged to the mandarin cluster, we found some exceptions. We also obtained the information on the genetic background of various species of acid citrus grown in Japan. Because the genus Citrus contains many important accessions, we have comprehensively discussed the classification of this genus. PMID:23638116

  18. Detection and phylogenetic analysis of bacteriophage WO in spiders (Araneae).

    PubMed

    Yan, Qian; Qiao, Huping; Gao, Jin; Yun, Yueli; Liu, Fengxiang; Peng, Yu

    2015-11-01

    Phage WO is a bacteriophage found in Wolbachia. Herein, we present the first phylogenetic study of WOs that infect spiders (Araneae). Seven species of spiders (Araneus alternidens, Nephila clavata, Hylyphantes graminicola, Prosoponoides sinensis, Pholcus crypticolens, Coleosoma octomaculatum, and Nurscia albofasciata) from six families were infected by Wolbachia and WO, followed by comprehensive sequence analysis. Interestingly, WO could only be detected in Wolbachia-infected spiders. The relative infection rates of those seven species of spiders were 75, 100, 88.9, 100, 62.5, 72.7, and 100%, respectively. Our results indicated that both Wolbachia and WO were found in three different body parts of N. clavata, and WO could be passed to the next generation of H. graminicola by vertical transmission. There were three different WO sequences from A. alternidens and two different WO sequences from C. octomaculatum. Only one sequence of WO was found for the other five species of spiders. The discovered WO sequences ranged from 239 to 311 bp. A phylogenetic tree was generated using maximum likelihood (ML) based on the orf7 gene sequences. According to the phylogenetic tree, WOs in N. clavata and H. graminicola were clustered in the same group. WOs from A. alternidens (WAlt1) and C. octomaculatum (WOct2) were closely related to another clade, whereas WO in P. sinensis was classified as a sole cluster.

  19. ClusterMine360: a database of microbial PKS/NRPS biosynthesis

    PubMed Central

    Conway, Kyle R.; Boddy, Christopher N.

    2013-01-01

    ClusterMine360 (http://www.clustermine360.ca/) is a database of microbial polyketide and non-ribosomal peptide gene clusters. It takes advantage of crowd-sourcing by allowing members of the community to make contributions while automation is used to help achieve high data consistency and quality. The database currently has >200 gene clusters from >185 compound families. It also features a unique sequence repository containing >10 000 polyketide synthase/non-ribosomal peptide synthetase domains. The sequences are filterable and downloadable as individual or multiple sequence FASTA files. We are confident that this database will be a useful resource for members of the polyketide synthases/non-ribosomal peptide synthetases research community, enabling them to keep up with the growing number of sequenced gene clusters and rapidly mine these clusters for functional information. PMID:23104377

  20. Genetic Diversity and Differentiation of Colletotrichum spp. Isolates Associated with Leguminosae Using Multigene Loci, RAPD and ISSR

    PubMed Central

    Mahmodi, Farshid; Kadir, J. B.; Puteh, A.; Pourdad, S. S.; Nasehi, A.; Soleimani, N.

    2014-01-01

    Genetic diversity and differentiation of 50 Colletotrichum spp. isolates from legume crops were studied through multigene loci, RAPD and ISSR analysis. DNA sequence comparisons by six genes (ITS, ACT, Tub2, CHS-1, GAPDH, and HIS3) verified species identity of C. truncatum, C. dematium and C. gloeosporioides and identified C. capsici as a synonym of C. truncatum. Based on the matrix distance analysis of multigene sequences, the Colletotrichum species showed diverse degrees of intra- and interspecific divergence (0.0 to 1.4% and 15.5–19.9%, respectively). A multilocus molecular phylogenetic analysis clustered Colletotrichum spp. isolates into 3 well-defined clades, representing three distinct species: C. truncatum, C. dematium and C. gloeosporioides. The ISSR and RAPD cluster analysis exhibited a high degree of variability among different isolates and permitted the grouping of isolates of Colletotrichum spp. into three distinct clusters. Distinct populations of Colletotrichum spp. isolates were genetically in accordance with host specificity and inconsistent with geographical origins. The large population of C. truncatum showed greater amounts of genetic diversity than smaller populations of C. dematium and C. gloeosporioides species. Results of ISSR and RAPD markers were congruent, but the effective marker ratio and the number of private alleles were greater in ISSR markers. PMID:25288981

  1. The Psychology of Yoga Practitioners: A Cluster Analysis.

    PubMed

    Genovese, Jeremy E C; Fondran, Kristine M

    2017-11-01

    Yoga practitioners (N = 261) completed the revised Expression of Spirituality Inventory (ESI) and the Multidimensional Body-Self Relations Questionnaire. Cluster analysis revealed three clusters: Cluster A scored high on all four spiritual constructs. They had high positive evaluations of their appearance, but a lower orientation towards their appearance. They tended to have a high evaluation of their fitness and health, and higher body satisfaction. Cluster B showed lower scores on the spiritual constructs. Like Cluster A, members of Cluster B tended to show high positive evaluations of appearance and fitness. They also had higher body satisfaction. Members of Cluster B had a higher fitness orientation and a higher appearance orientation than members of Cluster A. Members of Cluster C had low scores for all spiritual constructs. They had a low evaluation of, and unhappiness with, their appearance. They were unhappy with the size and appearance of their bodies. They tended to see themselves as overweight. There was a significant difference in years of practice between the three groups (Kruskal-Wallis, p = .0041). Members of Cluster A have the most years of yoga experience and members of Cluster B have more yoga experience than members of Cluster C. These results suggest the possible existence of a developmental trajectory for yoga practitioners. Such a developmental sequence may have important implications for yoga practice and instruction.

  2. The Psychology of Yoga Practitioners: A Cluster Analysis.

    PubMed

    Genovese, Jeremy E C; Fondran, Kristine M

    2017-03-30

    Yoga practitioners (N = 261) completed the revised Expression of Spirituality Inventory (ESI) and the Multidimensional Body-Self Relations Questionnaire. Cluster analysis revealed three clusters: Cluster A scored high on all four spiritual constructs. They had high positive evaluations of their appearance, but a lower orientation towards their appearance. They tended to have a high evaluation of their fitness and health, and higher body satisfaction. Cluster B showed lower scores on the spiritual constructs. Like Cluster A, members of Cluster B tended to show high positive evaluations of appearance and fitness. They also had higher body satisfaction. Members of Cluster B had a higher fitness orientation and a higher appearance orientation than members of Cluster A. Members of Cluster C had low scores for all spiritual constructs. They had a low evaluation of, and unhappiness with, their appearance. They were unhappy with the size and appearance of their bodies. They tended to see themselves as overweight. There was a significant difference in years of practice between the three groups (Kruskal-Wallis, p = .0041). Members of Cluster A have the most years of yoga experience and members of Cluster B have more yoga experience than members of Cluster C. These results suggest the possible existence of a developmental trajectory for yoga practitioners. Such a developmental sequence may have important implications for yoga practice and instruction.

  3. Complete Genome Sequences of Newcastle Disease Virus Strains Circulating in Chicken Populations of Indonesia

    PubMed Central

    Xiao, Sa; Paldurai, Anandan; Nayak, Baibaswata; Samuel, Arthur; Bharoto, Eny E.; Prajitno, Teguh Y.; Collins, Peter L.

    2012-01-01

    Eight highly virulent Newcastle disease virus (NDV) strains were isolated from vaccinated commercial chickens in Indonesia during outbreaks in 2009 and 2010. The complete genome sequences of two NDV strains and the sequences of the surface protein genes (F and HN) of six other strains were determined. Phylogenetic analysis classified them into two new subgroups of genotype VII in the class II cluster that were genetically distinct from vaccine strains. This is the first report of complete genome sequences of NDV strains isolated from chickens in Indonesia. PMID:22532534

  4. Clonal structure in Ichthyobacterium seriolicida, the causative agent of bacterial haemolytic jaundice in yellowtail, Seriola quinqueradiata, inferred from molecular epidemiological analysis.

    PubMed

    Matsuyama, T; Fukuda, Y; Sakai, T; Tanimoto, N; Nakanishi, M; Nakamura, Y; Takano, T; Nakayasu, C

    2017-08-01

    Bacterial haemolytic jaundice caused by Ichthyobacterium seriolicida has been responsible for mortality in farmed yellowtail, Seriola quinqueradiata, in western Japan since the 1980s. In this study, polymorphic analysis of I. seriolicida was performed using three molecular methods: amplified fragment length polymorphism (AFLP) analysis, multilocus sequence typing (MLST) and multiple-locus variable-number tandem repeat analysis (MLVA). Twenty-eight isolates were analysed using AFLP, while 31 isolates were examined by MLST and MLVA. No polymorphisms were identified by AFLP analysis using EcoRI and MseI, or by MLST of internal fragments of eight housekeeping genes. However, MLVA revealed variation in repeat numbers of three elements, allowing separation of the isolates into 16 sequence types. The unweighted pair group method using arithmetic averages cluster analysis of the MLVA data identified four major clusters, and all isolates belonged to clonal complexes. It is likely that I. seriolicida populations share a common ancestor, which may be a recently introduced strain. © 2016 John Wiley & Sons Ltd.

  5. Complete Genome Sequence of a Highly Virulent Newcastle Disease Virus Currently Circulating in Mexico

    PubMed Central

    Xiao, Sa; Paldurai, Anandan; Nayak, Baibaswata; Mirande, Armando; Collins, Peter L.

    2013-01-01

    The complete genome sequence was determined for a highly virulent Newcastle disease virus strain from vaccinated chicken farms in Mexico during outbreaks in 2010. On the basis of phylogenetic analysis this strain was classified into genotype V in the class II cluster that was closely related to Mexican strains that appeared in 2004–2006. PMID:23409252

  6. Phylogenetic investigation of a statewide HIV-1 epidemic reveals ongoing and active transmission networks among men who have sex with men

    PubMed Central

    Chan, Philip A.; Hogan, Joseph W.; Huang, Austin; DeLong, Allison; Salemi, Marco; Mayer, Kenneth H.; Kantor, Rami

    2015-01-01

    Background Molecular epidemiologic evaluation of HIV-1 transmission networks can elucidate behavioral components of transmission that can be targets for intervention. Methods We combined phylogenetic and statistical approaches using pol sequences from patients diagnosed 2004-2011 at a large HIV center in Rhode Island, following 75% of the state’s HIV population. Phylogenetic trees were constructed using maximum likelihood and putative transmission clusters were evaluated using latent class analyses (LCA) to determine association of cluster size with underlying demographic/behavioral characteristics. A logistic growth model was used to assess intra-cluster dynamics over time and predict “active” clusters that were more likely to harbor undiagnosed infections. Results Of 1,166 HIV-1 subtype B sequences, 31% were distributed among 114 statistically-supported, monophyletic clusters (range: 2-15 sequences/cluster). Sequences from men who have sex with men (MSM) formed 52% of clusters. LCA demonstrated that sequences from recently diagnosed (2008-2011) MSM with primary HIV infection (PHI) and other sexually transmitted infections (STIs) were more likely to form larger clusters (Odds Ratio 1.62-11.25, p<0.01). MSM in clusters were more likely to have anonymous partners and meet partners at sex clubs and pornographic stores. Four large clusters with 38 sequences (100% male, 89% MSM) had a high-probability of harboring undiagnosed infections and included younger MSM with PHI and STIs. Conclusions In this first large-scale molecular epidemiologic investigation of HIV-1 transmission in New England, sexual networks among recently diagnosed MSM with PHI and concomitant STIs contributed to ongoing transmission. Characterization of transmission dynamics revealed actively growing clusters which may be targets for intervention. PMID:26258569

  7. Spectroscopic characterization of galaxy clusters in RCS-1: spectroscopic confirmation, redshift accuracy, and dynamical mass-richness relation

    NASA Astrophysics Data System (ADS)

    Gilbank, David G.; Barrientos, L. Felipe; Ellingson, Erica; Blindert, Kris; Yee, H. K. C.; Anguita, T.; Gladders, M. D.; Hall, P. B.; Hertling, G.; Infante, L.; Yan, R.; Carrasco, M.; Garcia-Vergara, Cristina; Dawson, K. S.; Lidman, C.; Morokuma, T.

    2018-05-01

    We present follow-up spectroscopic observations of galaxy clusters from the first Red-sequence Cluster Survey (RCS-1). This work focuses on two samples, a lower redshift sample of ∼30 clusters ranging in redshift from z ∼ 0.2-0.6 observed with multiobject spectroscopy (MOS) on 4-6.5-m class telescopes and a z ∼ 1 sample of ∼10 clusters with 8-m class telescope observations. We examine the detection efficiency and redshift accuracy of the now widely used red-sequence technique for selecting clusters via overdensities of red-sequence galaxies. Using both these data and extended samples including previously published RCS-1 spectroscopy and spectroscopic redshifts from SDSS, we find that the red-sequence redshift using simple two-filter cluster photometric redshifts is accurate to σz ≈ 0.035(1 + z) in RCS-1. This accuracy can potentially be improved with better survey photometric calibration. For the lower redshift sample, ∼5 per cent of clusters show some (minor) contamination from secondary systems with the same red-sequence intruding into the measurement aperture of the original cluster. At z ∼ 1, the rate rises to ∼20 per cent. Approximately ten per cent of projections are expected to be serious, where the two components contribute significant numbers of their red-sequence galaxies to another cluster. Finally, we present a preliminary study of the mass-richness calibration using velocity dispersions to probe the dynamical masses of the clusters. We find a relation broadly consistent with that seen in the local universe from the WINGS sample at z ∼ 0.05.
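
    The quoted accuracy, σz ≈ 0.035(1 + z), can be evaluated at the redshifts of the two samples. The following is a minimal sketch; the function name and the default coefficient argument are illustrative choices, not part of the survey's software.

```python
def red_sequence_sigma_z(z, coeff=0.035):
    """Return the 1-sigma red-sequence redshift scatter, coeff * (1 + z)."""
    if z < 0:
        raise ValueError("redshift must be non-negative")
    return coeff * (1.0 + z)


# Evaluate the scatter across the redshift range covered by the two samples.
for z in (0.2, 0.6, 1.0):
    print(f"z = {z:.1f}: sigma_z ~ {red_sequence_sigma_z(z):.3f}")
```

    The scatter grows linearly with (1 + z), so under this relation it is about two thirds larger at z ∼ 1 than at z ∼ 0.2.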

  8. Novel snake papillomavirus does not cluster with other non-mammalian papillomaviruses.

    PubMed

    Lange, Christian E; Favrot, Claude; Ackermann, Mathias; Gull, Jessica; Vetsch, Elisabeth; Tobler, Kurt

    2011-09-12

    Papillomaviruses (PVs) are associated with the development of neoplasias and have been found in several different species, most of them in humans and other mammals. We identified, cloned and sequenced PV DNA from pigmented papilloma-like lesions of a diamond python (Morelia spilota spilota). This represents the first complete PV genome discovered in a Squamata host (MsPV1). It consists of 7048 nt and contains the characteristic open reading frames (ORFs) E6, E7, E1, E2, L1 and L2. The L1 ORF sequence showed the highest percentage of sequence identities to human PV5 (57.9%) and Caribbean manatee (Trichechus manatus) PV1 (55.4%), thus, establishing a new clade. According to phylogenetic analysis, the MsPV1 genome clusters with PVs of mammalian rather than sauropsid hosts.
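
    Percentages of sequence identity such as those reported for the L1 ORF are computed from a pairwise alignment. The sketch below shows one common convention, assuming the two sequences are already aligned and that columns containing a gap are excluded from the denominator; the abstract does not state which convention or tool the authors used, and the toy sequences are invented.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over ungapped columns of two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (hypothetical data): 5 matches over 6 ungapped columns.
print(round(percent_identity("ATG-CCA", "ATGACTA"), 1))
```

    Other conventions divide by the full alignment length or by the shorter sequence, which yields lower values for the same alignment.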

  9. ApiEST-DB: analyzing clustered EST data of the apicomplexan parasites.

    PubMed

    Li, Li; Crabtree, Jonathan; Fischer, Steve; Pinney, Deborah; Stoeckert, Christian J; Sibley, L David; Roos, David S

    2004-01-01

    ApiEST-DB (http://www.cbil.upenn.edu/paradbs-servlet/) provides integrated access to publicly available EST data from protozoan parasites in the phylum Apicomplexa. The database currently incorporates a total of nearly 100,000 ESTs from several parasite species of clinical and/or veterinary interest, including Eimeria tenella, Neospora caninum, Plasmodium falciparum, Sarcocystis neurona and Toxoplasma gondii. To facilitate analysis of these data, EST sequences were clustered and assembled to form consensus sequences for each organism, and these assemblies were then subjected to automated annotation via similarity searches against protein and domain databases. The underlying relational database infrastructure, Genomics Unified Schema (GUS), enables complex biologically based queries, facilitating validation of gene models, identification of alternative splicing, detection of single nucleotide polymorphisms, identification of stage-specific genes and recognition of phylogenetically conserved and phylogenetically restricted sequences.

  10. Novel snake papillomavirus does not cluster with other non-mammalian papillomaviruses

    PubMed Central

    2011-01-01

    Papillomaviruses (PVs) are associated with the development of neoplasias and have been found in several different species, most of them in humans and other mammals. We identified, cloned and sequenced PV DNA from pigmented papilloma-like lesions of a diamond python (Morelia spilota spilota). This represents the first complete PV genome discovered in a Squamata host (MsPV1). It consists of 7048 nt and contains the characteristic open reading frames (ORFs) E6, E7, E1, E2, L1 and L2. The L1 ORF sequence showed the highest percentage of sequence identities to human PV5 (57.9%) and Caribbean manatee (Trichechus manatus) PV1 (55.4%), thus, establishing a new clade. According to phylogenetic analysis, the MsPV1 genome clusters with PVs of mammalian rather than sauropsid hosts. PMID:21910860

  11. Proposals for revival of Streptomyces setonii and reclassification of S. fimicarius as a later synonym of S. setonii and S. albovinaceus as a later synonym of S. globisporus based on combined 16S rRNA-gyrB gene analysis

    USDA-ARS's Scientific Manuscript database

    The 16S rRNA and gyrB genes of 22 Streptomyces species belonging to the Streptomyces griseus cluster were sequenced, and their taxonomic positions were re-evaluated. For correct analysis, all of the publicly available sequences of the species were collected and compared with those obtained in this s...

  12. Comparison of Molecular Typing Methods Useful for Detecting Clusters of Campylobacter jejuni and C. coli Isolates through Routine Surveillance

    PubMed Central

    Taboada, Eduardo; Grant, Christopher C. R.; Blakeston, Connie; Pollari, Frank; Marshall, Barbara; Rahn, Kris; MacKinnon, Joanne; Daignault, Danielle; Pillai, Dylan; Ng, Lai-King

    2012-01-01

    Campylobacter spp. may be responsible for unreported outbreaks of food-borne disease. The detection of these outbreaks is made more difficult by the fact that appropriate methods for detecting clusters of Campylobacter have not been well defined. We have compared the characteristics of five molecular typing methods on Campylobacter jejuni and C. coli isolates obtained from human and nonhuman sources during sentinel site surveillance during a 3-year period. Comparative genomic fingerprinting (CGF) appears to be one of the optimal methods for the detection of clusters of cases, and it could be supplemented by the sequencing of the flaA gene short variable region (flaA SVR sequence typing), with or without subsequent multilocus sequence typing (MLST). Different methods may be optimal for uncovering different aspects of source attribution. Finally, the use of several different molecular typing or analysis methods for comparing individuals within a population reveals much more about that population than a single method. Similarly, comparing several different typing methods reveals a great deal about differences in how the methods group individuals within the population. PMID:22162562

  13. HIV-1 diversity, transmission dynamics and primary drug resistance in Angola.

    PubMed

    Bártolo, Inês; Zakovic, Suzana; Martin, Francisco; Palladino, Claudia; Carvalho, Patrícia; Camacho, Ricardo; Thamm, Sven; Clemente, Sofia; Taveira, Nuno

    2014-01-01

    To assess HIV-1 diversity, transmission dynamics and prevalence of transmitted drug resistance (TDR) in Angola, five years after ART scale-up. Population sequencing of the pol gene was performed on 139 plasma samples collected in 2009 from drug-naive HIV-1 infected individuals living in Luanda. HIV-1 subtypes were determined using phylogenetic analysis. Drug resistance mutations were identified using the Calibrated Population Resistance Tool (CPR). Transmission networks were determined using phylogenetic analysis of all Angolan sequences present in the databases. Evolutionary trends were determined by comparison with a similar survey performed in 2001. 47.1% of the viruses were pure subtypes (all except B), 47.1% were recombinants and 5.8% were untypable. The prevalence of subtype A decreased significantly from 2001 to 2009 (40.0% to 10.8%, P = 0.0019) while the prevalence of unique recombinant forms (URFs) increased > 2-fold (40.0% to 83.1%, P < 0.0001). The most frequent URFs comprised untypable sequences with subtypes H (U/H, n = 7, 10.8%), A (U/A, n = 6, 9.2%) and G (G/U, n = 4, 6.2%). Newly identified U/H recombinants formed a highly supported monophyletic cluster suggesting a local and common origin. TDR mutation K103N was found in one (0.7%) patient (1.6% in 2001). Out of the 364 sequences sampled for transmission network analysis, 130 (35.7%) were part of a transmission network. Forty eight transmission clusters were identified; the majority (56.3%) comprised sequences sampled in 2008-2010 in Luanda which is consistent with a locally fuelled epidemic. Very low genetic distance was found in 27 transmission pairs sampled in the same year, suggesting recent transmission events. Transmission of drug resistant strains was still negligible in Luanda in 2009, five years after the scale-up of ART. 
    The dominance of small and recent transmission clusters and the emergence of new URFs are consistent with a rising HIV-1 epidemic mainly driven by heterosexual transmission.

  14. A high HIV-1 strain variability in London, UK, revealed by full-genome analysis: Results from the ICONIC project

    PubMed Central

    Frampton, Dan; Gallo Cassarino, Tiziano; Raffle, Jade; Hubb, Jonathan; Ferns, R. Bridget; Waters, Laura; Tong, C. Y. William; Kozlakidis, Zisis; Hayward, Andrew; Kellam, Paul; Pillay, Deenan; Clark, Duncan; Nastouli, Eleni; Leigh Brown, Andrew J.

    2018-01-01

    Background & methods The ICONIC project has developed an automated high-throughput pipeline to generate HIV nearly full-length genomes (NFLG, i.e. from gag to nef) from next-generation sequencing (NGS) data. The pipeline was applied to 420 HIV samples collected at University College London Hospitals NHS Trust and Barts Health NHS Trust (London) and sequenced using an Illumina MiSeq at the Wellcome Trust Sanger Institute (Cambridge). Consensus genomes were generated and subtyped using COMET, and unique recombinants were studied with jpHMM and SimPlot. Maximum-likelihood phylogenetic trees were constructed using RAxML to identify transmission networks using the Cluster Picker. Results The pipeline generated sequences of at least 1Kb of length (median = 7.46Kb, IQR = 4.01Kb) for 375 out of the 420 samples (89%), with 174 (46.4%) being NFLG. A total of 365 sequences (169 of them NFLG) corresponded to unique subjects and were included in the down-stream analyses. The most frequent HIV subtypes were B (n = 149, 40.8%) and C (n = 77, 21.1%) and the circulating recombinant form CRF02_AG (n = 32, 8.8%). We found 14 different CRFs (n = 66, 18.1%) and multiple URFs (n = 32, 8.8%) that involved recombination between 12 different subtypes/CRFs. The most frequent URFs were B/CRF01_AE (4 cases) and A1/D, B/C, and B/CRF02_AG (3 cases each). Most URFs (19/26, 73%) lacked breakpoints in the PR+RT pol region, rendering them undetectable if only that was sequenced. Twelve (37.5%) of the URFs could have emerged within the UK, whereas the rest were probably imported from sub-Saharan Africa, South East Asia and South America. For 2 URFs we found highly similar pol sequences circulating in the UK. We detected 31 phylogenetic clusters using the full dataset: 25 pairs (mostly subtypes B and C), 4 triplets and 2 quadruplets. Some of these were not consistent across different genes due to inter- and intra-subtype recombination. Clusters involved 70 sequences, 19.2% of the dataset. 
    Conclusions The initial analysis of genome sequences detected substantial hidden variability in the London HIV epidemic. Analysing full genome sequences, as opposed to only PR+RT, identified previously undetected recombinants. It provided a more reliable description of CRFs (that would be otherwise misclassified) and transmission clusters. PMID:29389981

  15. Hepatitis C infection among intravenous drug users attending therapy programs in Cyprus.

    PubMed

    Demetriou, Victoria L; van de Vijver, David A M C; Hezka, Johana; Kostrikis, Leondios G

    2010-02-01

    The most high-risk population for HCV transmission worldwide today are intravenous drug users. HCV genotypes in the general population in Cyprus demonstrate a polyphyletic infection and include subtypes associated with intravenous drug users. The prevalence of HCV, HBV, and HIV infection, HCV genotypes and risk factors among intravenous drug users in Cyprus were investigated here for the first time. Blood samples and interviews were obtained from 40 consenting users in treatment centers, and were tested for HCV, HBV, and HIV antibodies. On the HCV-positive samples, viral RNA extraction, RT-PCR and sequencing were performed. Phylogenetic analysis determined subtype and any relationships with database sequences and statistical analysis determined any correlation of risk factors with HCV infection. The prevalence of HCV infection was 50%, but no HBV or HIV infections were found. Of the PCR-positive samples, eight (57%) were genotype 3a, and six (43%) were 1b. No other subtypes, recombinant strains or mixed infections were observed. The phylogenetic analysis of the injecting drug users' strains against database sequences observed no clustering, which does not allow determination of transmission route, possibly due to a limitation of sequences in the database. However, three clusters were discovered among the drug users' sequences, revealing small groups who possibly share injecting equipment. Statistical analysis showed the risk factor associated with HCV infection is drug use duration. Overall, the polyphyletic nature of HCV infection in Cyprus is confirmed, but the transmission route remains unknown. These findings highlight the need for harm-reduction strategies to reduce HCV transmission. (c) 2009 Wiley-Liss, Inc.

  16. Transcription of two adjacent carbohydrate utilization gene clusters in Bifidobacterium breve UCC2003 is controlled by LacI- and repressor open reading frame kinase (ROK)-type regulators.

    PubMed

    O'Connell, Kerry Joan; Motherway, Mary O'Connell; Liedtke, Andrea; Fitzgerald, Gerald F; Paul Ross, R; Stanton, Catherine; Zomer, Aldert; van Sinderen, Douwe

    2014-06-01

    Members of the genus Bifidobacterium are commonly found in the gastrointestinal tracts of mammals, including humans, where their growth is presumed to be dependent on various diet- and/or host-derived carbohydrates. To understand transcriptional control of bifidobacterial carbohydrate metabolism, we investigated two genetic carbohydrate utilization clusters dedicated to the metabolism of raffinose-type sugars and melezitose. Transcriptomic and gene inactivation approaches revealed that the raffinose utilization system is positively regulated by an activator protein, designated RafR. The gene cluster associated with melezitose metabolism was shown to be subject to direct negative control by a LacI-type transcriptional regulator, designated MelR1, in addition to apparent indirect negative control by means of a second LacI-type regulator, MelR2. In silico analysis, DNA-protein interaction, and primer extension studies revealed the MelR1 and MelR2 operator sequences, each of which is positioned just upstream of or overlapping the correspondingly regulated promoter sequences. Similar analyses identified the RafR binding operator sequence located upstream of the rafB promoter. This study indicates that transcriptional control of gene clusters involved in carbohydrate metabolism in bifidobacteria is subject to conserved regulatory systems, representing either positive or negative control.

  17. Subgenotype analysis of Cryptosporidium isolates from humans, cattle, and zoo ruminants in Portugal.

    PubMed

    Alves, Margarida; Xiao, Lihua; Sulaiman, Irshad; Lal, Altaf A; Matos, Olga; Antunes, Francisco

    2003-06-01

    Cryptosporidium parvum and Cryptosporidium hominis isolates from human immunodeficiency virus-infected patients, cattle, and wild ruminants were characterized by PCR and DNA sequencing analysis of the 60-kDa glycoprotein gene. Seven alleles were identified, three corresponding to C. hominis and four corresponding to C. parvum. One new allele was found (IId), and one (IIb) had only been found in Portugal. Isolates from cattle and wild ruminants clustered in two alleles. In contrast, human isolates clustered in seven alleles, showing extensive allelic diversity.

  18. Biogeography of Burkholderia pseudomallei in the Torres Strait Islands of Northern Australia

    PubMed Central

    Baker, Anthony; Mayo, Mark; Owens, Leigh; Burgess, Graham; Norton, Robert; McBride, William John Hannan; Currie, Bart J.

    2013-01-01

    It has been hypothesized that biogeographical boundaries are a feature of Burkholderia pseudomallei ecology, and they impact the epidemiology of melioidosis on a global scale. This study examined the relatedness of B. pseudomallei sourced from islands in the Torres Strait of Northern Australia to determine if the geography of isolated island communities is a determinant of the organisms' dispersal. Environmental sampling on Badu Island in the Near Western Island cluster recovered a single clone. An additional 32 clinical isolates from the region were sourced. Isolates were characterized using multilocus sequence typing and a multiplex PCR targeting the flagellum gene cluster. Gene cluster analysis determined that 69% of the isolates from the region encoded the ancestral Burkholderia thailandensis-like flagellum and chemotaxis gene cluster, a proportion significantly lower than that reported from mainland Australia and consistent with observations of isolates from southern Papua New Guinea. A goodness-of-fit test indicated that there was geographic localization of sequence types throughout the archipelago, with the exception of Thursday Island, the economic and cultural hub of the region. Sequence types common to mainland Australia and Papua New Guinea were identified. These findings demonstrate for the first time an environmental reservoir for B. pseudomallei in the Torres Strait, and multilocus sequence typing suggests that the organism is not randomly distributed throughout this region and that seawater may provide a barrier to dispersal of the organism. Moreover, these findings support an anthropogenic dispersal hypothesis for the spread of B. pseudomallei throughout this region. PMID:23698533

  19. Analysis of genetic diversity and population structure of oil palm (Elaeis guineensis) from China and Malaysia based on species-specific simple sequence repeat markers.

    PubMed

    Zhou, L X; Xiao, Y; Xia, W; Yang, Y D

    2015-12-08

    Genetic diversity and patterns of population structure of the 94 oil palm lines were investigated using species-specific simple sequence repeat (SSR) markers. We designed primers for 63 SSR loci based on their flanking sequences and conducted amplification in 94 oil palm DNA samples. The amplification result showed that a relatively high level of genetic diversity was observed between oil palm individuals according to a set of 21 polymorphic microsatellite loci. The observed heterozygosity (Ho) was 0.3683 and 0.4035, with an average of 0.3859. The Ho value was a reliable determinant of the discriminatory power of the SSR primer combinations. The principal component analysis and unweighted pair-group method with arithmetic averaging cluster analysis showed the 94 oil palm lines were grouped into one cluster. These results demonstrated that the oil palm in Hainan Province of China and the germplasm introduced from Malaysia may be from the same source. The SSR protocol was effective and reliable for assessing the genetic diversity of oil palm. Knowledge of the genetic diversity and population structure will be crucial for establishing appropriate management stocks for this species.
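
    Observed heterozygosity (Ho) is conventionally the fraction of genotyped individuals that carry two different alleles at a locus, averaged over loci. The sketch below illustrates that standard definition only; the genotype encoding (allele pairs as fragment sizes, None for a failed amplification) and the toy data are assumptions, not the study's data or software.

```python
def observed_heterozygosity(genotypes):
    """Fraction of scored individuals that are heterozygous at one locus.

    genotypes: list of (allele_1, allele_2) tuples, or None for missing data.
    """
    scored = [g for g in genotypes if g is not None]
    if not scored:
        raise ValueError("no scored genotypes at this locus")
    heterozygous = sum(1 for allele_1, allele_2 in scored if allele_1 != allele_2)
    return heterozygous / len(scored)


def mean_observed_heterozygosity(loci):
    """Average the per-locus Ho over a dict of locus name -> genotype list."""
    values = [observed_heterozygosity(g) for g in loci.values()]
    return sum(values) / len(values)


# Hypothetical SSR genotypes (allele sizes in bp) for five individuals.
toy_loci = {
    "locus_A": [(150, 150), (150, 154), (154, 158), None, (150, 150)],
    "locus_B": [(200, 204), (200, 200), (200, 200), (204, 204), (200, 200)],
}
print(mean_observed_heterozygosity(toy_loci))
```

    Missing genotypes are dropped per locus, so each locus is averaged over the individuals actually scored there.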

  20. Complete genome sequence of a coxsackievirus B3 recombinant isolated from an aseptic meningitis outbreak in eastern China.

    PubMed

    Zhang, Wenqiang; Lin, Xiaojuan; Jiang, Ping; Tao, Zexin; Liu, Xiaolin; Ji, Feng; Wang, Tongzhan; Wang, Suting; Lv, Hui; Xu, Aiqiang; Wang, Haiyan

    2016-08-01

    Coxsackievirus B3 (CV-B3) has frequently been associated with aseptic meningitis outbreaks in China. To identify sequence motifs related to aseptic meningitis and to construct an infectious clone, the genome sequence of 08TC170, a representative strain isolated from cerebrospinal fluid (CSF) samples from an outbreak in Shandong in 2008, was determined, and the coding regions for P1-P3 and VP1 were aligned. The first 21 and last 20 residues were "TTAAAACAGCCTGTGGGTTGT" and "ATTCTCCGCATTCGGTGCGG", respectively. The whole genome consisted of 7401 nucleotides, sharing 80.8 % identity with the prototype strain Nancy and low sequence similarity with members of clusters A-C. In contrast, 08TC170 showed high sequence similarity to members of cluster D. An especially high level of sequence identity (≥97.7 %) was found within a branch constituted by 08TC170 and four Chinese strains that clustered together in all of the P1-P3 phylogenic trees. In addition, 08TC170 also possessed a close relationship to the Hong Kong strain 26362/08 in VP1. Similarity plot analysis showed that 08TC170 was most similar to the Chinese CV-B3 strain SSM in P1 and the partial P2 coding region but to the CV-B5 or E-6 strain in 2C and following regions. A T277A mutation was found in 08TC170 and other strains isolated in 2008-2010, but not in strains isolated before 2008, which had high sequence similarity and formed the cluster A277. The results suggested that 08TC170 was the product of both intertypic recombination and point mutation, whose effects on viral neurovirulence will be investigated in a further study. The high homology between 08TC170 and other strains revealed their co-circulation in mainland China and Hong Kong and indicates that further surveillance is needed.

  1. Ortholog-based screening and identification of genes related to intracellular survival.

    PubMed

    Yang, Xiaowen; Wang, Jiawei; Bing, Guoxia; Bie, Pengfei; De, Yanyan; Lyu, Yanli; Wu, Qingmin

    2018-04-20

    Bioinformatics and comparative genomics analysis methods were used to predict unknown pathogen genes based on homology with identified or functionally clustered genes. In this study, the genes of common pathogens were analyzed to screen and identify genes associated with intracellular survival through sequence similarity, phylogenetic tree analysis and the λ-Red recombination system test method. The total 38,952 protein-coding genes of common pathogens were divided into 19,775 clusters. As demonstrated through a COG analysis, information storage and processing genes might play an important role in intracellular survival. Only 19 clusters were present in facultative intracellular pathogens, and not all were present in extracellular pathogens. Construction of a phylogenetic tree selected 18 of these 19 clusters. Comparisons with the DEG database and previous research revealed that seven other clusters are considered essential gene clusters and that seven other clusters are associated with intracellular survival. Moreover, this study confirmed that clusters screened by orthologs with similar function could be replaced with an approved uvrY gene and its orthologs, and the results revealed that the usg gene is associated with intracellular survival. The study improves the current understanding of the characteristics of intracellular pathogens and allows further exploration of the intracellular survival-related gene modules in these pathogens. Copyright © 2018. Published by Elsevier B.V.

  2. Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring

    PubMed Central

    2012-01-01

    Background Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recently, there are indications that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families. Results The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appear to play an important role in the molecular structure and/or function. 
    Conclusions Our results demonstrate that the method we present here using a k-modes site clustering algorithm based on interdependency evaluation among sites obtained from a sequence alignment of homologous proteins can provide significant insights into the complex, hierarchical inter-residue structural relationships within the 3D structure of a protein family. PMID:22793672

  3. Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring.

    PubMed

    Durston, Kirk K; Chiu, David Ky; Wong, Andrew Kc; Li, Gary Cl

    2012-07-13

    Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recently, there are indications that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families. The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appear to play an important role in the molecular structure and/or function. 
    Our results demonstrate that the method we present here using a k-modes site clustering algorithm based on interdependency evaluation among sites obtained from a sequence alignment of homologous proteins can provide significant insights into the complex, hierarchical inter-residue structural relationships within the 3D structure of a protein family.
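
    The interdependency between two aligned sites can be scored with a normalized mutual information measure. The sketch below computes one such measure for a pair of alignment columns, assuming normalization by the joint entropy, I(X;Y) / H(X,Y); the abstract does not give the exact normalization, so this choice, the function names, and the toy columns are assumptions. It covers only the pairwise score, not the k-modes clustering itself.

```python
from collections import Counter
from math import log2


def entropy(counts, total):
    """Shannon entropy in bits from a collection of category counts."""
    return -sum((c / total) * log2(c / total) for c in counts if c)


def normalized_mutual_information(col_x, col_y):
    """I(X;Y) / H(X,Y) for two equally long alignment columns."""
    if len(col_x) != len(col_y) or len(col_x) == 0:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_x)
    h_x = entropy(Counter(col_x).values(), n)
    h_y = entropy(Counter(col_y).values(), n)
    h_xy = entropy(Counter(zip(col_x, col_y)).values(), n)
    if h_xy == 0:
        return 0.0  # both columns are fully conserved; no information to share
    return (h_x + h_y - h_xy) / h_xy


# Toy columns (one residue per sequence in the alignment).
print(normalized_mutual_information("AAGG", "LLVV"))  # perfectly co-varying sites
print(normalized_mutual_information("AAGG", "LVLV"))  # independent sites
```

    With this normalization the score lies between 0 (independent sites) and 1 (each site fully determines the other), which makes scores comparable across site pairs with different residue diversity.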

  4. EVIDENCE FOR THE UNIVERSALITY OF PROPERTIES OF RED-SEQUENCE GALAXIES IN X-RAY- AND RED-SEQUENCE-SELECTED CLUSTERS AT z ∼ 1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Foltz, R.; Wilson, G.; DeGroot, A.

    We study the slope, intercept, and scatter of the color–magnitude and color–mass relations for a sample of 10 infrared red-sequence-selected clusters at z ∼ 1. The quiescent galaxies in these clusters formed the bulk of their stars above z ≳ 3 with an age spread Δt ≳ 1 Gyr. We compare UVJ color–color and spectroscopic-based galaxy selection techniques, and find a 15% difference in the galaxy populations classified as quiescent by these methods. We compare the color–magnitude relations from our red-sequence selected sample with X-ray- and photometric-redshift-selected cluster samples of similar mass and redshift. Within uncertainties, we are unable to detect any difference in the ages and star formation histories of quiescent cluster members in clusters selected by different methods, suggesting that the dominant quenching mechanism is insensitive to cluster baryon partitioning at z ∼ 1.

  5. A singular value decomposition approach for improved taxonomic classification of biological sequences

    PubMed Central

    2011-01-01

    Background Singular value decomposition (SVD) is a powerful technique for information retrieval; it helps uncover relationships between elements that are not prima facie related. SVD was initially developed to reduce the time needed for information retrieval and analysis of very large data sets in the complex internet environment. Since information retrieval from large-scale genome and proteome data sets has a similar level of complexity, SVD-based methods could also facilitate data analysis in this research area. Results We found that SVD applied to amino acid sequences demonstrates relationships and provides a basis for producing clusters and cladograms, demonstrating evolutionary relatedness of species that correlates well with Linnaean taxonomy. The choice of a reasonable number of singular values is crucial for SVD-based studies. We found that fewer singular values are needed to produce biologically significant clusters when SVD is employed. Subsequently, we developed a method to determine the lowest number of singular values and fewest clusters needed to guarantee biological significance; this system was developed and validated by comparison with Linnaean taxonomic classification. Conclusions By using SVD, we can reduce uncertainty concerning the appropriate rank value necessary to perform accurate information retrieval analyses. In tests, clusters that we developed with SVD perfectly matched what was expected based on Linnaean taxonomy. PMID:22369633

  6. OrthoMCL: Identification of Ortholog Groups for Eukaryotic Genomes

    PubMed Central

    Li, Li; Stoeckert, Christian J.; Roos, David S.

    2003-01-01

    The identification of orthologous groups is useful for genome annotation, studies on gene/protein evolution, comparative genomics, and the identification of taxonomically restricted sequences. Methods successfully exploited for prokaryotic genome analysis have proved difficult to apply to eukaryotes, however, as larger genomes may contain multiple paralogous genes, and sequence information is often incomplete. OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs. This method performs similarly to the INPARANOID algorithm when applied to two genomes, but can be extended to cluster orthologs from multiple species. OrthoMCL clusters are coherent with groups identified by EGO, but improved recognition of “recent” paralogs permits overlapping EGO groups representing the same gene to be merged. Comparison with previously assigned EC annotations suggests a high degree of reliability, implying utility for automated eukaryotic genome annotation. OrthoMCL has been applied to the proteome data set from seven publicly available genomes (human, fly, worm, yeast, Arabidopsis, the malaria parasite Plasmodium falciparum, and Escherichia coli). A Web interface allows queries based on individual genes or user-defined phylogenetic patterns (http://www.cbil.upenn.edu/gene-family). Analysis of clusters incorporating P. falciparum genes identifies numerous enzymes that were incompletely annotated in first-pass annotation of the parasite genome. PMID:12952885

  7. Phylogenetic Evidence for Lateral Gene Transfer in the Intestine of Marine Iguanas

    PubMed Central

    Nelson, David M.; Cann, Isaac K. O.; Altermann, Eric; Mackie, Roderick I.

    2010-01-01

    Background Lateral gene transfer (LGT) appears to promote genotypic and phenotypic variation in microbial communities in a range of environments, including the mammalian intestine. However, the extent and mechanisms of LGT in intestinal microbial communities of non-mammalian hosts remain poorly understood. Methodology/Principal Findings We sequenced two fosmid inserts obtained from a genomic DNA library derived from an agar-degrading enrichment culture of marine iguana fecal material. The inserts harbored 16S rRNA genes that place the organism from which they originated within Clostridium cluster IV, a well-documented group that inhabits the mammalian intestinal tract. However, sequence analysis indicates that 52% of the protein-coding genes on the fosmids have top BLASTX hits to bacterial species that are not members of Clostridium cluster IV, and phylogenetic analysis suggests that at least 10 of 44 coding genes on the fosmids may have been transferred from Clostridium cluster XIVa to cluster IV. The fosmids encoded four transposase-encoding genes and an integrase-encoding gene, suggesting their involvement in LGT. In addition, several coding genes likely involved in sugar transport were probably acquired through LGT. Conclusion Our phylogenetic evidence suggests that LGT may be common among phylogenetically distinct members of the phylum Firmicutes inhabiting the intestinal tract of marine iguanas. PMID:20520734

  8. An Unexpected Detection of Bifurcated Blue Straggler Sequences in the Young Globular Cluster NGC 2173

    NASA Astrophysics Data System (ADS)

    Li, Chengyuan; Deng, Licai; de Grijs, Richard; Jiang, Dengkai; Xin, Yu

    2018-03-01

    The bifurcated patterns in the color–magnitude diagrams of blue straggler stars (BSSs) have attracted significant attention. This type of special (but rare) pattern of two distinct blue straggler sequences is commonly interpreted as evidence that cluster core-collapse-driven stellar collisions are an efficient formation mechanism. Here, we report the detection of a bifurcated blue straggler distribution in a young Large Magellanic Cloud cluster, NGC 2173. Because of the cluster’s low central stellar number density and its young age, dynamical analysis shows that stellar collisions alone cannot explain the observed BSSs. Therefore, binary evolution is instead the most viable explanation of the origin of these BSSs. However, the reason why binary evolution would render the color–magnitude distribution of BSSs bifurcated remains unclear. C. Li, L. Deng, and R. de Grijs jointly designed this project.

  9. SIMAP--a comprehensive database of pre-calculated protein sequence similarities, domains, annotations and clusters.

    PubMed

    Rattei, Thomas; Tischler, Patrick; Götz, Stefan; Jehl, Marc-André; Hoser, Jonathan; Arnold, Roland; Conesa, Ana; Mewes, Hans-Werner

    2010-01-01

    The prediction of protein function as well as the reconstruction of evolutionary genesis employing sequence comparison at large is still the most powerful tool in sequence analysis. Due to the exponential growth of the number of known protein sequences and the subsequent quadratic growth of the similarity matrix, the computation of the Similarity Matrix of Proteins (SIMAP) becomes a computationally intensive task. The SIMAP database provides a comprehensive and up-to-date pre-calculation of the protein sequence similarity matrix, sequence-based features and sequence clusters. As of September 2009, SIMAP covers 48 million proteins and more than 23 million non-redundant sequences. Novel features of SIMAP include the expansion of the sequence space by including databases such as ENSEMBL as well as the integration of metagenomes based on their consistent processing and annotation. Furthermore, protein function predictions by Blast2GO are pre-calculated for all sequences in SIMAP and the data access and query functions have been improved. SIMAP assists biologists to query the up-to-date sequence space systematically and facilitates large-scale downstream projects in computational biology. Access to SIMAP is freely provided through the web portal for individuals (http://mips.gsf.de/simap/) and for programmatic access through DAS (http://webclu.bio.wzw.tum.de/das/) and Web-Service (http://mips.gsf.de/webservices/services/SimapService2.0?wsdl).
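
    The quadratic growth mentioned in this abstract can be made concrete with a small arithmetic sketch. It assumes a symmetric all-against-all comparison in which each unordered pair of distinct sequences is scored once; how SIMAP actually schedules its comparisons is not stated here, and `pairwise_comparisons` is a hypothetical helper.

```python
def pairwise_comparisons(n):
    """Number of unordered pairs among n distinct sequences: n(n-1)/2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # 23 million is the lower bound on non-redundant sequences quoted above.
    for n in (1_000, 1_000_000, 23_000_000):
        print(f"{n:>12,} sequences -> {pairwise_comparisons(n):,} pairs")
```

    Doubling the number of sequences roughly quadruples the number of pairs, which is why pre-calculating the matrix once and serving it from a database is attractive.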

  10. Identification of the Regulator Gene Responsible for the Acetone-Responsive Expression of the Binuclear Iron Monooxygenase Gene Cluster in Mycobacteria ▿

    PubMed Central

    Furuya, Toshiki; Hirose, Satomi; Semba, Hisashi; Kino, Kuniki

    2011-01-01

    The mimABCD gene cluster encodes the binuclear iron monooxygenase that oxidizes propane and phenol in Mycobacterium smegmatis strain MC2 155 and Mycobacterium goodii strain 12523. Interestingly, expression of the mimABCD gene cluster is induced by acetone. In this study, we investigated the regulator gene responsible for this acetone-responsive expression. In the genome sequence of M. smegmatis strain MC2 155, the mimABCD gene cluster is preceded by a gene designated mimR, which is divergently transcribed. Sequence analysis revealed that MimR exhibits amino acid similarity with the NtrC family of transcriptional activators, including AcxR and AcoR, which are involved in acetone and acetoin metabolism, respectively. Unexpectedly, many homologs of the mimR gene were also found in the sequenced genomes of actinomycetes. A plasmid carrying a transcriptional fusion of the intergenic region between the mimR and mimA genes with a promoterless green fluorescent protein (GFP) gene was constructed and introduced into M. smegmatis strain MC2 155. Using a GFP reporter system, we confirmed by deletion and complementation analyses that the mimR gene product is the positive regulator of the mimABCD gene cluster expression that is responsive to acetone. M. goodii strain 12523 also utilized the same regulatory system as M. smegmatis strain MC2 155. Although transcriptional activators of the NtrC family generally control transcription using the σ54 factor, a gene encoding the σ54 factor was absent from the genome sequence of M. smegmatis strain MC2 155. These results suggest the presence of a novel regulatory system in actinomycetes, including mycobacteria. PMID:21856847

  11. Diversity of diazotrophic gut inhabitants of pikas (Ochotonidae) revealed by PCR-DGGE analysis.

    PubMed

    Kizilova, A K; Kravchenko, I K

    2014-01-01

    Diazotrophic gut symbionts are considered to act as nitrogen providers for their hosts, as was shown for various termite species. Although the diet of lagomorphs, like pikas or rabbits, is very poor in nitrogen and energy, their fecal matter contains 30-40% of protein. Since our hypothesis was that pikas maintained a diazotrophic consortium in their gastrointestinal tract, we conducted the first investigation of microbial diversity in pika guts. We obtained gut samples from animals of several Ochotona species, O. hyperborea (Northern pika), O. mantchurica (Manchurian pika), and O. dauurica (Daurian pika), in order to retrieve and compare the nitrogen-fixing communities of different pika species. The age and gender of the animals were taken into consideration. We amplified 320-bp long fragments of the nifH gene using the DNA extracted directly from the colon and cecum samples of pika's gut, resolved them by DGGE, and performed phylogenetic reconstruction of 51 sequences obtained from excised bands. No significant difference was detected between the nitrogen-fixing gut inhabitants of different pika species. NifH sequences fell into two clusters. The first cluster contained the sequences affiliated with NifH Cluster I (Zehr et al., 2003) with similarity to Sphingomonas sp., Bradyrhizobium sp., and various uncultured bacteria from soil and rhizosphere. Sequences from the second group were related to Treponema sp., Fibrobacter succinogenes, and uncultured clones from the guts of various termites and belonged to NifH Cluster III. We suggest that diazotrophic organisms from the second cluster are genuine endosymbionts of pikas and provide nitrogen for further synthesis processes thus allowing these animals not to be short of protein.

  12. Protein sequences clustering of herpes virus by using Tribe Markov clustering (Tribe-MCL)

    NASA Astrophysics Data System (ADS)

    Bustamam, A.; Siswantining, T.; Febriyani, N. L.; Novitasari, I. D.; Cahyaningrum, R. D.

    2017-07-01

    Herpes viruses are widespread, and one of their important characteristics is the ability to cause acute and chronic infection at certain times, so that infection can lead to severe complications. The herpes virus is composed of DNA containing protein and wrapped by glycoproteins. In this work, the herpes virus family is classified and analyzed by clustering its protein sequences using the Tribe Markov Clustering (Tribe-MCL) algorithm. Tribe-MCL is an efficient clustering method, based on the theory of Markov chains, that classifies protein families from protein sequences using pre-computed sequence similarity information. We implement the Tribe-MCL algorithm using the open-source program R. We select 24 herpes virus protein sequences obtained from the NCBI database. The dataset consists of three types of glycoprotein: B, F, and H. Each type is represented by eight herpes viruses that infect humans. In our simulations with inflation factors r = 1.5, 2, and 3, we obtain different numbers of clusters: the greater the inflation factor, the greater the number of clusters. Each protein is grouped together with proteins of the same type.
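
    The role of the inflation factor can be seen in a minimal sketch of the generic Markov Cluster (MCL) iteration that Tribe-MCL builds on. This is written in Python rather than the R used in the study and is not the authors' code; the function name `mcl`, the added self-loops, and the convergence tolerance are assumptions made for illustration.

```python
import numpy as np


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-6):
    """Cluster nodes of a non-negative similarity matrix with basic MCL."""
    M = np.asarray(similarity, dtype=float)
    M = M + np.eye(M.shape[0])            # self-loops keep every column non-empty
    M = M / M.sum(axis=0, keepdims=True)  # make columns sum to 1 (Markov matrix)
    for _ in range(max_iter):
        prev = M
        M = M @ M                         # expansion: two-step random walk
        M = M ** inflation                # inflation: boost strong transitions
        M = M / M.sum(axis=0, keepdims=True)
        if np.abs(M - prev).max() < tol:
            break
    # Each non-empty row belongs to an attractor; its non-zero columns
    # are the members of that attractor's cluster.
    clusters = set()
    for row in M:
        members = frozenset(np.nonzero(row > 1e-6)[0].tolist())
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Toy similarity graph (made up): nodes 0-2 and nodes 3-4 are unconnected.
    S = np.array([
        [0, 1, 1, 0, 0],
        [1, 0, 1, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 1, 0],
    ])
    for r in (1.5, 2.0, 3.0):
        print(r, mcl(S, inflation=r))
```

    In a real analysis the similarity matrix would come from pre-computed sequence comparisons (for example, transformed alignment scores), and the inflation value would be varied as in the abstract to see how cluster granularity changes.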

  13. Molecular phylogeny of grey mullets (Teleostei: Mugilidae) in Greece: evidence from sequence analysis of mtDNA segments.

    PubMed

    Papasotiropoulos, Vasilis; Klossa-Kilia, Elena; Alahiotis, Stamatis N; Kilias, George

    2007-08-01

    Mitochondrial DNA sequence analysis has been used to explore genetic differentiation and phylogenetic relationships among five species of the Mugilidae family, Mugil cephalus, Chelon labrosus, Liza aurata, Liza ramada, and Liza saliens. DNA was isolated from samples originating from the Messolongi Lagoon in Greece. Three mtDNA segments (12s rRNA, 16s rRNA, and CO I) were PCR amplified and sequenced. Sequencing analysis revealed that the greatest genetic differentiation was observed between M. cephalus and all the other species studied, while C. labrosus and L. aurata were the closest taxa. Dendrograms obtained by the neighbor-joining method and Bayesian inference analysis exhibited the same topology. According to this topology, M. cephalus is the most distinct species and the remaining taxa are clustered together, with C. labrosus and L. aurata forming a single group. The latter result brings into question the monophyletic origin of the genus Liza.

  14. Draft genome sequence of marine-derived Streptomyces sp. TP-A0598, a producer of anti-MRSA antibiotic lydicamycins.

    PubMed

    Komaki, Hisayuki; Ichikawa, Natsuko; Hosoyama, Akira; Fujita, Nobuyuki; Igarashi, Yasuhiro

    2015-01-01

    Streptomyces sp. TP-A0598, isolated from seawater, produces lydicamycin, a structurally unique type I polyketide bearing two nitrogen-containing five-membered rings, and four congeners, TPU-0037-A, -B, -C, and -D. We herein report the 8 Mb draft genome sequence of this strain, together with classification and features of the organism and generation, annotation and analysis of the genome sequence. The genome encodes 7,240 putative ORFs, of which 4,450 ORFs were assigned COG categories. Also, 66 tRNA genes and one rRNA operon were identified. The genome contains eight gene clusters involved in the production of polyketides and nonribosomal peptides. Among them, a PKS/NRPS gene cluster was assigned to be responsible for lydicamycin biosynthesis and a plausible biosynthetic pathway was proposed on the basis of gene function prediction. These genome sequence data will facilitate probing the potential of secondary metabolism in marine-derived Streptomyces.

  15. Phylogenetic Analysis of the Latvian HIV-1 Epidemic

    PubMed Central

    Balode, Dace; Skar, Helena; Mild, Mattias; Kolupajeva, Tatjana; Ferdats, Andris; Rozentale, Baiba; Leitner, Thomas

    2012-01-01

    Abstract The Latvian HIV-1 outbreak among intravenous drug users (IDUs) in 1997–1998 involved subtype A1. To obtain a more complete picture of the Latvian HIV-1 epidemic, 315 HIV-1-infected patients diagnosed in 1990–2005 representing different transmission groups and geographic regions were phylogenetically characterized using env V3 and gag p17 sequences. Subtypes A1 and B infections were found in 76% and 22% of the patients, respectively. The subtype A1 sequences formed one large cluster, which also included sequences from other parts of the former Soviet Union (FSU), whereas most subtype B sequences formed three distinct clusters. We estimated that subtype A1 was introduced from FSU around 1997 and initially spread explosively among IDUs in Riga. A recent increase of heterosexually infected persons did not form a separate subepidemic, but had multiple interactions with the IDU epidemic. Subtype B was introduced before the collapse of the Soviet Union and primarily has spread among men who have sex with men. PMID:22049908

  16. Star Formation in the Orion Nebula Cluster

    NASA Astrophysics Data System (ADS)

    Palla, Francesco; Stahler, Steven W.

    1999-11-01

    We study the record of star formation activity within the dense cluster associated with the Orion Nebula. The bolometric luminosity function of 900 visible members is well matched by a simplified theoretical model for cluster formation. This model assumes that stars are produced at a constant rate and distributed according to the field-star initial mass function. Our best-fit age for the system, within this framework, is 2×10^6 yr. To undertake a more detailed analysis, we present a new set of theoretical pre-main-sequence tracks. These cover all masses from 0.1 to 6.0 Msolar, and start from a realistic stellar birthline. The tracks end along a zero-age main-sequence that is in excellent agreement with the empirical one. As a further aid to cluster studies, we offer an heuristic procedure for the correction of pre-main-sequence luminosities and ages to account for the effects of unresolved binary companions. The Orion Nebula stars fall neatly between our birthline and zero-age main-sequence in the H-R diagram. All those more massive than about 8 Msolar lie close to the main sequence, as also predicted by theory. After accounting for the finite sensitivity of the underlying observations, we confirm that the population between 0.4 and 6.0 Msolar roughly follows a standard initial mass function. We see no evidence for a turnover at lower masses. We next use our tracks to compile stellar ages, also between 0.4 and 6.0 Msolar. Our age histogram reveals that star formation began at a low level some 10^7 yr ago and has gradually accelerated to the present epoch. The period of most active formation is indeed confined to a few × 10^6 yr, and has recently ended with gas dispersal from the Trapezium. We argue that the acceleration in stellar births, which extends over a wide range in mass, reflects the gravitational contraction of the parent cloud spawning this cluster.

  17. The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families.

    PubMed

    Yooseph, Shibu; Sutton, Granger; Rusch, Douglas B; Halpern, Aaron L; Williamson, Shannon J; Remington, Karin; Eisen, Jonathan A; Heidelberg, Karla B; Manning, Gerard; Li, Weizhong; Jaroszewski, Lukasz; Cieplak, Piotr; Miller, Christopher S; Li, Huiying; Mashiyama, Susan T; Joachimiak, Marcin P; van Belle, Christopher; Chandonia, John-Marc; Soergel, David A; Zhai, Yufeng; Natarajan, Kannan; Lee, Shaun; Raphael, Benjamin J; Bafna, Vineet; Friedman, Robert; Brenner, Steven E; Godzik, Adam; Eisenberg, David; Dixon, Jack E; Taylor, Susan S; Strausberg, Robert L; Frazier, Marvin; Venter, J Craig

    2007-03-01

    Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.

  18. Rising prevalence of non-B HIV-1 subtypes in North Carolina and evidence for local onward transmission.

    PubMed

    Dennis, Ann M; Hué, Stephane; Learner, Emily; Sebastian, Joseph; Miller, William C; Eron, Joseph J

    2017-01-01

    HIV-1 diversity is increasing in North American and European cohorts which may have public health implications. However, little is known about non-B subtype diversity in the southern United States, despite the region being the epicenter of the nation's epidemic. We characterized HIV-1 diversity and transmission clusters to identify the extent to which non-B strains are transmitted locally. We conducted cross-sectional analyses of HIV-1 partial pol sequences collected from 1997 to 2014 from adults accessing routine clinical care in North Carolina (NC). Subtypes were evaluated using COMET and phylogenetic analysis. Putative transmission clusters were identified using maximum-likelihood trees. Clusters involving non-B strains were confirmed and their dates of origin were estimated using Bayesian phylogenetics. Data were combined with demographic information collected at the time of sample collection and country of origin for a subset of patients. Among 24,972 sequences from 15,246 persons, the non-B subtype prevalence increased from 0% to 3.46% over the study period. Of 325 persons with non-B subtypes, diversity was high with over 15 pure subtypes and recombinants; subtype C (28.9%) and CRF02_AG (24.0%) were most common. While identification of transmission clusters was lower for persons with non-B versus B subtypes, several local transmission clusters (≥3 persons) involving non-B subtypes were identified and all were presumably due to heterosexual transmission. Prevalence of non-B subtype diversity remains low in NC but a statistically significant rise was identified over time which likely reflects multiple importation. However, the combined phylogenetic clustering analysis reveals evidence for local onward transmission. Detection of these non-B clusters suggests heterosexual transmission and may guide diagnostic and prevention interventions.

  19. Linking secondary metabolites to gene clusters through genome sequencing of six diverse Aspergillus species

    DOE PAGES

    Kjerbolling, Inge; Vesth, Tammi C.; Frisvad, Jens C.; ...

    2018-01-09

    The fungal genus of Aspergillus is highly interesting, containing everything from industrial cell factories over model organisms to human pathogens. In particular, this group has a prolific production of bioactive secondary metabolites (SMs). In this work, four diverse Aspergillus species (A. campestris, A. novofumigatus, A. ochraceoroseus and A. steynii) have been whole genome PacBio sequenced to provide genetic references in three Aspergillus sections. Additionally, A. taichungensis and A. candidus were sequenced for SM elucidation. Thirteen Aspergillus genomes were analysed with comparative genomics to determine phylogeny and genetic diversity, showing that each new genome contains 15–27% genes not found in other sequenced Aspergilli. In particular, the new species A. novofumigatus was compared to the pathogenic species A. fumigatus. This suggests that A. novofumigatus can produce most of the same allergens, virulence and pathogenicity factors as A. fumigatus suggesting that A. novofumigatus could be as pathogenic as A. fumigatus. Furthermore, SMs were linked to gene clusters based on biological and chemical knowledge and analysis, genome sequences and predictive algorithms.

  20. Linking secondary metabolites to gene clusters through genome sequencing of six diverse Aspergillus species

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kjerbolling, Inge; Vesth, Tammi C.; Frisvad, Jens C.

    The fungal genus of Aspergillus is highly interesting, containing everything from industrial cell factories over model organisms to human pathogens. In particular, this group has a prolific production of bioactive secondary metabolites (SMs). In this work, four diverse Aspergillus species (A. campestris, A. novofumigatus, A. ochraceoroseus and A. steynii) have been whole genome PacBio sequenced to provide genetic references in three Aspergillus sections. Additionally, A. taichungensis and A. candidus were sequenced for SM elucidation. Thirteen Aspergillus genomes were analysed with comparative genomics to determine phylogeny and genetic diversity, showing that each new genome contains 15–27% genes not found in other sequenced Aspergilli. In particular, the new species A. novofumigatus was compared to the pathogenic species A. fumigatus. This suggests that A. novofumigatus can produce most of the same allergens, virulence and pathogenicity factors as A. fumigatus suggesting that A. novofumigatus could be as pathogenic as A. fumigatus. Furthermore, SMs were linked to gene clusters based on biological and chemical knowledge and analysis, genome sequences and predictive algorithms.

  1. Strategies for high-altitude adaptation revealed from high-quality draft genome of non-violacein producing Janthinobacterium lividum ERGS5:01.

    PubMed

    Kumar, Rakshak; Acharya, Vishal; Singh, Dharam; Kumar, Sanjay

    2018-01-01

    A light pink coloured bacterial strain ERGS5:01 isolated from glacial stream water of Sikkim Himalaya was affiliated to Janthinobacterium lividum based on 16S rRNA gene sequence identity and phylogenetic clustering. Whole genome sequencing was performed for the strain to confirm its taxonomy, as it lacked the typical violet pigmentation of the genus, and also to decipher its survival strategy in the aquatic ecosystem of high elevation. The PacBio RSII sequencing generated a genome of 5,168,928 bp with 4575 protein-coding genes and 118 RNA genes. Whole genome-based multilocus sequence analysis clustering, an in silico DDH similarity value of 95.1%, and an ANI value of 99.25% established the identity of the strain ERGS5:01 (MCC 2953) as a non-violacein producing J. lividum. The genome comparisons across genus Janthinobacterium revealed an open pan-genome with scope for the addition of new orthologous clusters to complete the genomic inventory. The genomic insight provided the genetic basis of tolerance to freezing and frequent freeze-thaw cycles, and of industrially important enzymes. Extended insight into the genome provided clues to crucial genes associated with adaptation in the harsh aquatic ecosystem of high altitude.

  2. Long-range comparison of human and mouse Sprr loci to identify conserved noncoding sequences involved in coordinate regulation

    PubMed Central

    Martin, Natalia; Patel, Satyakam; Segre, Julia A.

    2004-01-01

    Mammalian epidermis provides a permeability barrier between an organism and its environment. Under homeostatic conditions, epidermal cells produce structural proteins, which are cross-linked in an orderly fashion to form a cornified envelope (CE). However, under genetic or environmental stress, specific genes are induced to rapidly build a temporary barrier. Small proline-rich (SPRR) proteins are the primary constituents of the CE. Under stress the entire family of 14 Sprr genes is upregulated. The Sprr genes are clustered within the larger epidermal differentiation complex on mouse chromosome 3, human chromosome 1q21. The clustering of the Sprr genes and their upregulation under stress suggest that these genes may be coordinately regulated. To identify enhancer elements that regulate this stress response activation of the Sprr locus, we utilized bioinformatic tools and classical biochemical dissection. Long-range comparative sequence analysis identified conserved noncoding sequences (CNSs). Clusters of epidermal-specific DNaseI-hypersensitive sites (HSs) mapped to specific CNSs. Increased prevalence of these HSs in barrier-deficient epidermis provides in vivo evidence of the regulation of the Sprr locus by these conserved sequences. Individual components of these HSs were cloned, and one was shown to have strong enhancer activity specific to conditions when the Sprr genes are coordinately upregulated. PMID:15574822

  3. Camps 2.0: exploring the sequence and structure space of prokaryotic, eukaryotic, and viral membrane proteins.

    PubMed

    Neumann, Sindy; Hartmann, Holger; Martin-Galiano, Antonio J; Fuchs, Angelika; Frishman, Dmitrij

    2012-03-01

    Structural bioinformatics of membrane proteins is still in its infancy, and the picture of their fold space is only beginning to emerge. Because only a handful of three-dimensional structures are available, sequence comparison and structure prediction remain the main tools for investigating sequence-structure relationships in membrane protein families. Here we present a comprehensive analysis of the structural families corresponding to α-helical membrane proteins with at least three transmembrane helices. The new version of our CAMPS database (CAMPS 2.0) covers nearly 1300 eukaryotic, prokaryotic, and viral genomes. Using an advanced classification procedure, which is based on high-order hidden Markov models and considers both sequence similarity as well as the number of transmembrane helices and loop lengths, we identified 1353 structurally homogeneous clusters roughly corresponding to membrane protein folds. Only 53 clusters are associated with experimentally determined three-dimensional structures, and for these clusters CAMPS is in reasonable agreement with structure-based classification approaches such as SCOP and CATH. We therefore estimate that ∼1300 structures would need to be determined to provide a sufficient structural coverage of polytopic membrane proteins. CAMPS 2.0 is available at http://webclu.bio.wzw.tum.de/CAMPS2.0/. Copyright © 2011 Wiley Periodicals, Inc.

  4. Molecular phylogeny of Coxsackievirus A16 in Shenzhen, China, from 2005 to 2009.

    PubMed

    Zong, Wenping; He, Yaqing; Yu, Shouyi; Yang, Hong; Xian, Huixia; Liao, Yuxue; Hu, Guifang

    2011-04-01

    Phylogenetic analysis of a Coxsackievirus A16 (CA16) sequence from Shenzhen, China, and other Chinese and international CA16 sequences revealed a pattern of endemic cocirculation of strains of clusters B2a and B2b within subtype B2 viruses. Amino acid evolution and nucleotide variation in the VP1 region were slight for 5 years.

  5. Isolation and molecular characterization of a urease-negative Actinobacillus pleuropneumoniae mutant.

    PubMed

    Ito, Hiroya; Takahashi, Sayaka; Asai, Tetsuo; Tamura, Yutaka; Yamamoto, Koshi

    2018-01-01

    An atypical urease-negative mutant of Actinobacillus pleuropneumoniae serovar 2 was isolated in Japan. Nucleotide sequence analysis of the urease gene cluster revealed that the insertion of a short DNA sequence into the cbiM gene was responsible for the urease-negative activity of the mutant. Veterinary diagnostic laboratories should be watchful for the presence of aberrant urease-negative A. pleuropneumoniae isolates.

  6. Detection of emerging rotavirus G12P[8] in Sonora, México.

    PubMed

    González-Ochoa, G; J, G de; Calleja-García, P M; Rosas-Rodríguez, J A; Virgen-Ortíz, A; Tamez-Guerra, P

    2016-06-01

    Rotavirus is the most common cause of gastroenteritis in children up to five years of age worldwide. The aim of the present study was to analyze the genotypes of rotavirus strains isolated from children with gastroenteritis, after the introduction of the rotavirus vaccine in México. Rotavirus was detected in 14/100 (14%) fecal samples from children with gastroenteritis, using a commercial test kit. The viral genome was purified from these samples and used as a template in RT-PCR amplification of the VP4 and VP7 genes, followed by gene cloning and sequencing. Among the rotavirus strains, 4/14 (28.5%) were characterized as G12P[8], 2/14 (14.3%), as G12P (not typed), and 3/14 (21.42%) as G (not typed) P[8]. Phylogenetic analysis of the VP7 gene showed that G12 genotypes clustered in lineage III. Phylogenetic analysis revealed that VP4 genotype P[8] sequences clustered in lineage V, whereas other P[8] sequences previously reported in Mexico (2005-2008) clustered in different lineages. Rotavirus genotype G12 is currently recognized as a globally emerging rotavirus. To our knowledge, this is the first report of this emerging rotavirus strain G12P[8] in México. Ongoing surveillance is recommended to monitor the distribution of rotavirus genotypes and to continually reassess the suitability of currently available rotavirus vaccines.

  7. Population-genetic analysis of HvABCG31 promoter sequence in wild barley (Hordeum vulgare ssp. spontaneum)

    PubMed Central

    2012-01-01

    Background: The cuticle is an important adaptive structure whose origin played a crucial role in the transition of plants from aqueous to terrestrial conditions. HvABCG31/Eibi1 is an ABCG transporter gene, involved in cuticle formation that was recently identified in wild barley (Hordeum vulgare ssp. spontaneum). To study the genetic variation of HvABCG31 in different habitats, its 2 kb promoter region was sequenced from 112 wild barley accessions collected from five natural populations from southern and northern Israel. The sites included three mesic and two xeric habitats, and differed in annual rainfall, soil type, and soil water capacity. Results: Phylogenetic analysis of the aligned HvABCG31 promoter sequences clustered the majority of accessions (69 out of 71) from the three northern mesic populations into one cluster, while all 21 accessions from the Dead Sea area, a xeric southern population, and two isolated accessions (one from a xeric population at Mitzpe Ramon and one from the xeric ‘African Slope’ of “Evolution Canyon”) formed the second cluster. The southern arid populations included six haplotypes, but they differed from the consensus sequence at a large number of positions, while the northern mesic populations included 15 haplotypes that were, on average, more similar to the consensus sequence. Most of the haplotypes (20 of 22) were unique to a population. Interestingly, higher genetic variation occurred within populations (54.2%) than among populations (45.8%). Analysis of the promoter region detected a large number of transcription factor binding sites: 121–128 and 121–134 sites in the two southern arid populations, and 123–128, 125–128, and 123–125 sites in the three northern mesic populations. Three types of TFBSs were significantly enriched: those related to GA (gibberellin), Dof (DNA binding with one finger), and light. Conclusions: Drought stress and adaptive natural selection may have been important determinants in the observed sequence variation of HvABCG31 promoter. Abiotic stresses may be involved in the HvABCG31 gene transcription regulations, generating more protective cuticles in plants under stresses. PMID:23006777

  8. Genetic diversity among air yam (Dioscorea bulbifera) varieties based on single sequence repeat markers.

    PubMed

    Silva, D M; Siqueira, M V B M; Carrasco, N F; Mantello, C C; Nascimento, W F; Veasey, E A

    2016-05-23

    Dioscorea is the largest genus in the Dioscoreaceae family, and includes a number of economically important species including the air yam, D. bulbifera L. This study aimed to develop new single sequence repeat primers and characterize the genetic diversity of local varieties that originated in several municipalities of Brazil. We developed an enriched genomic library for D. bulbifera resulting in seven primers, six of which were polymorphic, and added four polymorphic loci developed for other Dioscorea species. This resulted in 10 polymorphic primers to evaluate 42 air yam accessions. Thirty-three alleles (bands) were found, with an average of 3.3 alleles per locus. The discrimination power ranged from 0.113 to 0.834, with an average of 0.595. Both principal coordinate and cluster analyses (using the Jaccard Index) failed to clearly separate the accessions according to their origins. However, the 13 accessions from Conceição dos Ouros, Minas Gerais State were clustered above zero on the principal coordinate 2 axis, and were also clustered into one subgroup in the cluster analysis. Accessions from Ubatuba, São Paulo State were clustered below zero on the same principal coordinate 2 axis, except for one accession, although they were scattered in several subgroups in the cluster analysis. Therefore, we found little spatial structure in the accessions, although those from Conceição dos Ouros and Ubatuba exhibited some spatial structure, and that there is a considerable level of genetic diversity in D. bulbifera maintained by traditional farmers in Brazil.
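
    The Jaccard Index mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each accession is scored as a 0/1 presence/absence profile over the same set of bands; the function name and the example profiles are hypothetical and are not taken from the study.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard index for two equal-length 0/1 band profiles.

    Shared absences (0/0) are ignored: only bands present in at
    least one of the two accessions enter the denominator.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    pairs = list(zip(profile_a, profile_b))
    shared = sum(1 for x, y in pairs if x and y)
    either = sum(1 for x, y in pairs if x or y)
    # Convention (an assumption): two profiles with no bands at all score 0.0.
    return shared / either if either else 0.0


# Hypothetical accessions scored for four bands.
accession_1 = [1, 1, 0, 0]
accession_2 = [1, 0, 1, 0]
similarity = jaccard_similarity(accession_1, accession_2)  # 1 shared / 3 present
```

    A distance for clustering is commonly taken as one minus this similarity; whether the study used that form is not stated in the abstract.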

  9. Effects of weather conditions on emergency ambulance calls for acute coronary syndromes

    NASA Astrophysics Data System (ADS)

    Vencloviene, Jone; Babarskiene, Ruta; Dobozinskas, Paulius; Siurkaite, Viktorija

    2015-08-01

    The aim of this study was to evaluate the relationship between weather conditions and daily emergency ambulance calls for acute coronary syndromes (ACS). The study included data on 3631 patients who called the ambulance for chest pain and were admitted to the department of cardiology as patients with ACS. We investigated the effect of daily air temperature (T), barometric pressure (BP), relative humidity, and wind speed (WS) to detect the risk areas for low and high daily volume (DV) of emergency calls. We used the classification and regression tree method as well as cluster analysis. The clusters were created by applying the k-means cluster algorithm using the standardized daily weather variables. The analysis was performed separately during cold (October-April) and warm (May-September) seasons. During the cold period, the greatest DV was observed on days of low T during the 3-day sequence, on cold and windy days, and on days of low BP and high WS during the 3-day sequence; low DV was associated with high BP and decreased WS on the previous day. During June-September, a lower DV was associated with low BP, windless days, and high BP and low WS during the 3-day sequence. During the warm period, the greatest DV was associated with increased BP and changing WS during the 3-day sequence. These results suggest that daily T, BP, and WS on the day of the ambulance call and on the two previous days may be prognostic variables for the risk of ACS.
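
    As an illustration of k-means clustering on standardized daily weather variables, the following is a minimal sketch only. The abstract does not give the number of clusters, the initialization, or the software used, so the z-score standardization, the choice of the first k days as starting centers, and the sample values are all assumptions made for the example.

```python
import math


def standardize(rows):
    """Convert each column to z-scores (mean 0, population SD 1)."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    sds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; the first k points serve as initial centers (assumption)."""
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centers


# Hypothetical days: (air temperature in C, barometric pressure in hPa, wind speed in m/s).
days = [(-10, 1000, 8), (-8, 1002, 7), (5, 1025, 2), (6, 1023, 1)]
labels, centers = kmeans(standardize(days), k=2)
```

    Standardizing first keeps pressure, whose raw values are in the thousands, from dominating the Euclidean distances.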

  10. Genetic characterization of a new astrovirus detected in dogs suffering from diarrhoea.

    PubMed

    Toffan, Anna; Jonassen, Christine Monceyron; De Battisti, Cristian; Schiavon, Eliana; Kofstad, Tone; Capua, Ilaria; Cattoli, Giovanni

    2009-10-20

    Astroviruses have been described in several animal species frequently associated with diarrhoea, especially in young animals. In dogs, astrovirus-like particles have been observed sporadically and very little is known about their epidemiology and characteristics. In this paper, we describe the detection of astrovirus-like particles in symptomatic puppies. Furthermore, for the first time in this species, the presumptive identification made by electron microscopy was confirmed by genetic analysis of the viral RNA conducted directly on the clinical specimens. Genetic sequences of ORF2 (2443 nt), encoding for the capsid protein, and partial sequence of ORF1b (346 nt), encoding for the viral polymerase, identified the viruses as members of the family Astroviridae. The phylogenetic analysis clearly clustered canine astroviruses in the genus Mamastrovirus. Relative closest similarities were revealed with a cluster comprising human, porcine and feline astroviruses, based on the ORF2 sequences available. Based on the species definition for astroviruses and on the data obtained in this study, we suggest a new species of astrovirus - canine astrovirus, CaAstV - to be included in the genus Mamastrovirus.

  11. MBGD update 2015: microbial genome database for flexible ortholog analysis utilizing a diverse set of genomic data.

    PubMed

    Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu

    2015-01-01

    The microbial genome database for comparative analysis (MBGD) (available at http://mbgd.genome.ad.jp/) is a comprehensive ortholog database for flexible comparative analysis of microbial genomes, where the users are allowed to create an ortholog table among any specified set of organisms. Because of the rapid increase in microbial genome data owing to the next-generation sequencing technology, it becomes increasingly challenging to maintain high-quality orthology relationships while allowing the users to incorporate the latest genomic data available into an analysis. Because many of the recently accumulating genomic data are draft genome sequences for which some complete genome sequences of the same or closely related species are available, MBGD now stores draft genome data and allows the users to incorporate them into a user-specific ortholog database using the MyMBGD functionality. In this function, draft genome data are incorporated into an existing ortholog table created only from the complete genome data in an incremental manner to prevent low-quality draft data from affecting clustering results. In addition, to provide high-quality orthology relationships, the standard ortholog table containing all the representative genomes, which is first created by the rapid classification program DomClust, is now refined using DomRefine, a recently developed program for improving domain-level clustering using multiple sequence alignment information. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. Characterization of Foodborne Outbreaks of Salmonella enterica Serovar Enteritidis with Whole-Genome Sequencing Single Nucleotide Polymorphism-Based Analysis for Surveillance and Outbreak Detection.

    PubMed

    Taylor, Angela J; Lappi, Victoria; Wolfgang, William J; Lapierre, Pascal; Palumbo, Michael J; Medus, Carlota; Boxrud, David

    2015-10-01

    Salmonella enterica serovar Enteritidis is a significant cause of gastrointestinal illness in the United States; however, current molecular subtyping methods lack resolution for this highly clonal serovar. Advances in next-generation sequencing technologies have made it possible to examine whole-genome sequencing (WGS) as a potential molecular subtyping tool for outbreak detection and source trace back. Here, we conducted a retrospective analysis of S. Enteritidis isolates from seven epidemiologically confirmed foodborne outbreaks and sporadic isolates (not epidemiologically linked) to determine the utility of WGS to identify outbreaks. A collection of 55 epidemiologically characterized clinical and environmental S. Enteritidis isolates were sequenced. Single nucleotide polymorphism (SNP)-based cluster analysis of the S. Enteritidis genomes revealed well supported clades, with less than four-SNP pairwise diversity, that were concordant with epidemiologically defined outbreaks. Sporadic isolates were an average of 42.5 SNPs distant from the outbreak clusters. Isolates collected from the same patient over several weeks differed by only two SNPs. Our findings show that WGS provided greater resolution between outbreak, sporadic, and suspect isolates than the current gold standard subtyping method, pulsed-field gel electrophoresis (PFGE). Furthermore, results could be obtained in a time frame suitable for surveillance activities, supporting the use of WGS as an outbreak detection and characterization method for S. Enteritidis. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
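
    The idea of grouping isolates by pairwise SNP differences can be sketched as follows. This is not the authors' pipeline: it assumes pre-aligned sequences of equal length, counts only positions where both bases are unambiguous, and links isolates by single linkage when they differ by fewer than a chosen number of SNPs. The isolate names, sequences, and function names are hypothetical.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences carry different A/C/G/T bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(seq_a, seq_b)
        if x != y and x in "ACGT" and y in "ACGT"  # skip gaps and ambiguous calls
    )


def cluster_by_threshold(sequences, max_snps):
    """Single-linkage grouping: link isolates that differ by fewer than max_snps."""
    names = list(sequences)
    parent = {name: name for name in names}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if snp_distance(sequences[a], sequences[b]) < max_snps:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Hypothetical aligned SNP columns for three isolates.
isolates = {
    "iso1": "ACGTACGTAC",
    "iso2": "ACGTACGTAT",
    "iso3": "TGCATGCATG",
}
clusters = cluster_by_threshold(isolates, max_snps=4)
```

    The threshold of four echoes the "less than four-SNP pairwise diversity" reported in the abstract, but a fixed cutoff is a simplification of a tree-based clade analysis.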

  13. A RAD-based phylogenetics for Orestias fishes from Lake Titicaca.

    PubMed

    Takahashi, Tetsumi; Moreno, Edmundo

    2015-12-01

    The fish genus Orestias is endemic to the Andes highlands, and Lake Titicaca is the centre of the species diversity of the genus. Previous phylogenetic studies based on a single locus of mitochondrial and nuclear DNA strongly support the monophyly of a group composed of many of the species endemic to the Lake Titicaca basin (the Lake Titicaca radiation), but the relationships among the species in the radiation remain unclear. Recently, restriction site-associated DNA (RAD) sequencing, which can produce a vast number of short sequences from various loci of nuclear DNA, has emerged as a useful way to resolve complex phylogenetic problems. To propose a new phylogenetic hypothesis of Orestias fishes of the Lake Titicaca radiation, we conducted a cluster analysis based on morphological similarities among fish samples and a molecular phylogenetic analysis based on RAD sequencing. From a morphological cluster analysis, we recognised four species groups in the radiation, and three of the four groups were resolved as monophyletic groups in maximum-likelihood trees based on RAD sequencing data. The other morphology-based group was not resolved as a monophyletic group in molecular phylogenies, and some members of the group were diverged from its sister group close to the root of the Lake Titicaca radiation. The evolution of these fishes is discussed from the phylogenetic relationships. Copyright © 2015 Elsevier Inc. All rights reserved.

  14. Phylogenetic structure of Leishmania tropica in the new endemic focus Birjand in East Iran in comparison to other Iranian endemic regions.

    PubMed

    Karamian, Mehdi; Kuhls, Katrin; Hemmati, Mina; Ghatee, Mohammad Amin

    2016-06-01

    Iran has been identified as being among the countries with the highest number of cutaneous leishmaniasis (CL) cases. South Khorasan province in East Iran is an emerging focus of CL. Species identification of sixty clinical samples by ITS1 PCR-RFLP presented evidence for the dominance of Leishmania tropica (90%) in this region. Analysis of the ITS1 sequence of 19 L. tropica isolates revealed seven closely related sequence types. In addition, ITS1 sequences available in GenBank from other Iranian regions were compiled for comparison with the studied isolates. Iranian L. tropica was distributed in two main clusters. All East Iranian sequence types were grouped with strains from foci from Southeast and Central regions in cluster A, showing highly similar sequences. The highest similarity was observed between most L. tropica from East and all isolates from Southeast regions and from Savojbolagh county in Central Iran. Southwest L. tropica was shown to be paraphyletic as the isolates were distributed in both clusters A and B. All Northeastern L. tropica were part of cluster B, however they showed significant heterogeneity and were distributed in different subclusters. Distribution of L. tropica populations was to some extent congruent with genetic lineages of Phlebotomus sergenti in Iran and may be evidence for parasite-vector co-evolution. Southeast-East L. tropica was also similar to strains from Herat province in Afghanistan at the East border of Iran. This is the first comprehensive study on population structure of L. tropica in Iran that provides a guideline for appropriate sampling for further molecular based epidemiological studies. Copyright © 2016 Elsevier B.V. All rights reserved.

  15. The molecular epidemiology of HIV-1 in the Comunidad Valenciana (Spain): analysis of transmission clusters.

    PubMed

    Patiño-Galindo, Juan Ángel; Torres-Puente, Manoli; Bracho, María Alma; Alastrué, Ignacio; Juan, Amparo; Navarro, David; Galindo, María José; Ocete, Dolores; Ortega, Enrique; Gimeno, Concepción; Belda, Josefina; Domínguez, Victoria; Moreno, Rosario; González-Candelas, Fernando

    2017-09-14

    HIV infections are still a very serious concern for public health worldwide. We have applied molecular evolution methods to study the HIV-1 epidemics in the Comunidad Valenciana (CV, Spain) from a public health surveillance perspective. For this, we analysed 1804 HIV-1 sequences comprising protease and reverse transcriptase (PR/RT) coding regions, sampled between 2004 and 2014. These sequences were subtyped and subjected to phylogenetic analyses in order to detect transmission clusters. In addition, univariate and multinomial comparisons were performed to detect epidemiological differences between HIV-1 subtypes, and risk groups. The HIV epidemic in the CV is dominated by subtype B infections among local men who have sex with men (MSM). 270 transmission clusters were identified (>57% of the dataset), 12 of which included ≥10 patients; 11 of subtype B (9 affecting MSMs) and one (n = 21) of CRF14, affecting predominately intravenous drug users (IDUs). Dated phylogenies revealed these large clusters to have originated from the mid-80s to the early 2000s. Subtype B is more likely to form transmission clusters than non-B variants and MSMs to cluster than other risk groups. Multinomial analyses revealed an association between non-B variants, which are not established in the local population yet, and different foreign groups.

  16. Phylogenetic and Temporal Dynamics of Human Immunodeficiency Virus Type 1 CRF01_AE in China

    PubMed Central

    Su, Xueli; Lu, Hongyan; Pang, Xinghuo; Yan, Hong; Feng, Xia; He, Xiong; Zeng, Yi

    2013-01-01

    To explore the epidemic history of HIV-1 CRF01_AE in China, 408 fragments of gag gene sequences of CRF01_AE sampled in 2002–2010 were determined from different geographical regions and risk populations in China. Phylogenetic analysis indicates that the CRF01_AE sequences can be grouped into four clusters, suggesting that at least four genetically independent CRF01_AE descendants are circulating in China, of which two were closely related to the isolates from Thailand and Vietnam. Cluster 1 has the most extensive distribution in China. In North China, cluster 1 and cluster 4 were mainly transmitted through homosexuality. The real substance of the recent HIV-1 epidemic in men who have sex with men (MSM) of North China is a rapid spread of CRF01_AE, or rather two distinctive natives CRF01_AE. The time of the most recent common ancestor (tMRCA) of four CRF01_AE clusters ranged from the years 1990.9 to 2003.8 in different regions of China. This is the first phylogenetic and temporal dynamics study of HIV-1 CRF01_AE in China. PMID:23365653

  17. Identification of a Pantoea Biosynthetic Cluster That Directs the Synthesis of an Antimicrobial Natural Product

    PubMed Central

    Walterson, Alyssa M.; Smith, Derek D. N.; Stavrinides, John

    2014-01-01

    Fire Blight is a destructive disease of apple and pear caused by the enteric bacterial pathogen, Erwinia amylovora. E. amylovora initiates infection by colonizing the stigmata of apple and pear trees, and entering the plants through natural openings. Epiphytic populations of the related enteric bacterium, Pantoea, reduce the incidence of disease through competition and antibiotic production. In this study, we identify an antibiotic from Pantoea ananatis BRT175, which is effective against E. amylovora and select species of Pantoea. We used transposon mutagenesis to create a mutant library, screened approximately 5,000 mutants for loss of antibiotic production, and recovered 29 mutants. Sequencing of the transposon insertion sites of these mutants revealed multiple independent disruptions of an 8.2 kb cluster consisting of seven genes, which appear to be coregulated. An analysis of the distribution of this cluster revealed that it was not present in any other of our 115 Pantoea isolates, or in any of the fully sequenced Pantoea genomes, and is most closely related to antibiotic biosynthetic clusters found in three different species of Pseudomonas. This identification of this biosynthetic cluster highlights the diversity of natural products produced by Pantoea. PMID:24796857

  18. [Comparative analysis of variable regions in the genomes of variola virus].

    PubMed

    Babkin, I V; Nepomniashchikh, T S; Maksiutov, R A; Gutorov, V V; Babkina, I N; Shchelkunov, S N

    2008-01-01

    Nucleotide sequences of two extended segments of the terminal variable regions in variola virus genome were determined. The size of the left segment was 13.5 kbp and of the right, 10.5 kbp. Totally, over 540 kbp were sequenced for 22 variola virus strains. The conducted phylogenetic analysis and the data published earlier allowed us to find the interrelations between 70 variola virus isolates, the character of their clustering, and the degree of intergroup and intragroup variations of the clusters of variola virus strains. The most polymorphic loci of the genome segments studied were determined. It was demonstrated that these loci are localized to either noncoding genome regions or to the regions of destroyed open reading frames, characteristic of the ancestor virus. These loci are promising for development of the strategy for genotyping variola virus strains. Analysis of recombination using various methods demonstrated that, with the only exception, no statistically significant recombinational events in the genomes of variola virus strains studied were detectable.

  19. Molecular phylogeny of 21 tropical bamboo species reconstructed by integrating non-coding internal transcribed spacer (ITS1 and 2) sequences and their consensus secondary structure.

    PubMed

    Ghosh, Jayadri Sekhar; Bhattacharya, Samik; Pal, Amita

    2017-06-01

    The unavailability of the reproductive structure and unpredictability of vegetative characters for the identification and phylogenetic study of bamboo prompted the application of molecular techniques for greater resolution and consensus. We first employed internal transcribed spacer (ITS1, 5.8S rRNA and ITS2) sequences to construct the phylogenetic tree of 21 tropical bamboo species. While the sequence alone could grossly reconstruct the traditional phylogeny amongst the 21 tropical species studied, some anomalies were encountered that prompted a further refinement of the phylogenetic analyses. Therefore, we integrated the secondary structure of the ITS sequences to derive individual sequence-structure matrix to gain more resolution on the phylogenetic reconstruction. The results showed that ITS sequence-structure is the reliable alternative to the conventional phenotypic method for the identification of bamboo species. The best-fit topology obtained by the sequence-structure based phylogeny over the sole sequence based one underscores closer clustering of all the studied Bambusa species (Sub-tribe Bambusinae), while Melocanna baccifera, which belongs to Sub-Tribe Melocanneae, disjointedly clustered as an out-group within the consensus phylogenetic tree. In this study, we demonstrated the dependability of the combined (ITS sequence+structure-based) approach over the only sequence-based analysis for phylogenetic relationship assessment of bamboo.

  20. Ages of intermediate-age Magellanic Cloud star clusters

    NASA Technical Reports Server (NTRS)

    Flower, P. J.

    1984-01-01

    Ages of intermediate-age Large Magellanic Cloud star clusters have been estimated without locating the faint, unevolved portion of cluster main sequences. Six clusters with established color-magnitude diagrams were selected for study: SL 868, NGC 1783, NGC 1868, NGC 2121, NGC 2209, and NGC 2231. Since red giant photometry is more accurate than the necessarily fainter main-sequence photometry, the distributions of red giants on the cluster color-magnitude diagrams were compared to a grid of 33 stellar evolutionary tracks, evolved from the main sequence through core-helium exhaustion, spanning the expected mass and metallicity range for Magellanic Cloud cluster red giants. The time-dependent behavior of the luminosity of the model red giants was used to estimate cluster ages from the observed cluster red giant luminosities. Except for the possibility of SL 868 being an old globular cluster, all clusters studied were found to have ages less than 10^9 yr. It is concluded that there is currently no substantial evidence for a major cluster population of large, populous clusters greater than 10^9 yr old in the Large Magellanic Cloud.

  1. Transcriptional analysis of exopolysaccharides biosynthesis gene clusters in Lactobacillus plantarum.

    PubMed

    Vastano, Valeria; Perrone, Filomena; Marasco, Rosangela; Sacco, Margherita; Muscariello, Lidia

    2016-04-01

    Exopolysaccharides (EPS) from lactic acid bacteria contribute to specific rheology and texture of fermented milk products and find applications also in non-dairy foods and in therapeutics. Recently, four clusters of genes (cps) associated with surface polysaccharide production have been identified in Lactobacillus plantarum WCFS1, a probiotic and food-associated lactobacillus. These clusters are involved in cell surface architecture and probably in release and/or exposure of immunomodulating bacterial molecules. Here we show a transcriptional analysis of these clusters. Indeed, RT-PCR experiments revealed that the cps loci are organized in five operons. Moreover, by reverse transcription-qPCR analysis performed on L. plantarum WCFS1 (wild type) and WCFS1-2 (ΔccpA), we demonstrated that expression of three cps clusters is under the control of the global regulator CcpA. These results, together with the identification of putative CcpA target sequences (catabolite responsive element CRE) in the regulatory region of four out of five transcriptional units, strongly suggest for the first time a role of the master regulator CcpA in EPS gene transcription among lactobacilli.

  2. Clustering evolving proteins into homologous families.

    PubMed

    Chan, Cheong Xin; Mahbob, Maisarah; Ragan, Mark A

    2013-04-08

    Clustering sequences into groups of putative homologs (families) is a critical first step in many areas of comparative biology and bioinformatics. The performance of clustering approaches in delineating biologically meaningful families depends strongly on characteristics of the data, including content bias and degree of divergence. New, highly scalable methods have recently been introduced to cluster the very large datasets being generated by next-generation sequencing technologies. However, there has been little systematic investigation of how characteristics of the data impact the performance of these approaches. Using clusters from a manually curated dataset as reference, we examined the performance of a widely used graph-based Markov clustering algorithm (MCL) and a greedy heuristic approach (UCLUST) in delineating protein families coded by three sets of bacterial genomes of different G+C content. Both MCL and UCLUST generated clusters that are comparable to the reference sets at specific parameter settings, although UCLUST tends to under-cluster compositionally biased sequences (G+C content 33% and 66%). Using simulated data, we sought to assess the individual effects of sequence divergence, rate heterogeneity, and underlying G+C content. Performance decreased with increasing sequence divergence, decreasing among-site rate variation, and increasing G+C bias. Two MCL-based methods recovered the simulated families more accurately than did UCLUST. MCL using local alignment distances is more robust across the investigated range of sequence features than are greedy heuristics using distances based on global alignment. Our results demonstrate that sequence divergence, rate heterogeneity and content bias can individually and in combination affect the accuracy with which MCL and UCLUST can recover homologous protein families. For application to data that are more divergent, and exhibit higher among-site rate variation and/or content bias, MCL may often be the better choice, especially if computational resources are not limiting.

  3. Phylogenetic analysis of canine distemper virus in domestic dogs in Nanjing, China.

    PubMed

    Bi, Zhenwei; Wang, Yongshan; Wang, Xiaoli; Xia, Xingxia

    2015-02-01

    Canine distemper virus (CDV) infects a broad range of carnivores, including wild and domestic Canidae. The hemagglutinin gene, which encodes the attachment protein that determines viral tropism, has been widely used to determine the relationship between CDV strains of different lineages circulating worldwide. We determined the full-length H gene sequences of seven CDV field strains detected in domestic dogs in Nanjing, China. A phylogenetic analysis of the H gene sequences of CDV strains from different geographic regions and vaccine strains was performed. Four of the seven CDV strains were grouped in the same cluster of the Asia-1 lineage to which the vast majority of Chinese CDV strains belong, whereas the other three were clustered within the Asia-4 lineage, which has never been detected in China. This represents the first record of detection of strains of the Asia-4 lineage in China since this lineage was reported in Thailand in 2013.

  4. DoOP: Databases of Orthologous Promoters, collections of clusters of orthologous upstream sequences from chordates and plants

    PubMed Central

    Barta, Endre; Sebestyén, Endre; Pálfy, Tamás B.; Tóth, Gábor; Ortutay, Csaba P.; Patthy, László

    2005-01-01

    DoOP (http://doop.abc.hu/) is a database of eukaryotic promoter sequences (upstream regions) aiming to facilitate the recognition of regulatory sites conserved between species. The annotated first exons of human and Arabidopsis thaliana genes were used as queries in BLAST searches to collect the most closely related orthologous first exon sequences from Chordata and Viridiplantae species. Up to 3000 bp DNA segments upstream from these first exons constitute the clusters in the chordate and plant sections of the Database of Orthologous Promoters. Release 1.0 of DoOP contains 21 061 chordate clusters from 284 different species and 7548 plant clusters from 269 different species. The database can be used to find and retrieve promoter sequences of a given gene from various species and it is also suitable to see the most trivial conserved sequence blocks in the orthologous upstream regions. Users can search DoOP with either sequence or text (annotation) to find promoter clusters of various genes. In addition to the sequence data, the positions of the conserved sequence blocks derived from multiple alignments, the positions of repetitive elements and the positions of transcription start sites known from the Eukaryotic Promoter Database (EPD) can be viewed graphically. PMID:15608291

  5. DoOP: Databases of Orthologous Promoters, collections of clusters of orthologous upstream sequences from chordates and plants.

    PubMed

    Barta, Endre; Sebestyén, Endre; Pálfy, Tamás B; Tóth, Gábor; Ortutay, Csaba P; Patthy, László

    2005-01-01

    DoOP (http://doop.abc.hu/) is a database of eukaryotic promoter sequences (upstream regions) aiming to facilitate the recognition of regulatory sites conserved between species. The annotated first exons of human and Arabidopsis thaliana genes were used as queries in BLAST searches to collect the most closely related orthologous first exon sequences from Chordata and Viridiplantae species. Up to 3000 bp DNA segments upstream from these first exons constitute the clusters in the chordate and plant sections of the Database of Orthologous Promoters. Release 1.0 of DoOP contains 21,061 chordate clusters from 284 different species and 7548 plant clusters from 269 different species. The database can be used to find and retrieve promoter sequences of a given gene from various species and it is also suitable to see the most trivial conserved sequence blocks in the orthologous upstream regions. Users can search DoOP with either sequence or text (annotation) to find promoter clusters of various genes. In addition to the sequence data, the positions of the conserved sequence blocks derived from multiple alignments, the positions of repetitive elements and the positions of transcription start sites known from the Eukaryotic Promoter Database (EPD) can be viewed graphically.

  6. Acquisition of initial /s/-stop and stop-/s/ sequences in Greek.

    PubMed

    Syrika, Asimina; Nicolaidis, Katerina; Edwards, Jan; Beckman, Mary E

    2011-09-01

    Previous work on children's acquisition of complex sequences points to a tendency for affricates to be acquired before clusters, but there is no clear evidence of a difference in order of acquisition between clusters with /s/ that violate the Sonority Sequencing Principle (SSP), such as /s/ followed by stop in onset position, and other clusters that obey the SSP. One problem with studies that have compared the acquisition of SSP-obeying and SSP-violating clusters is that the component sounds in the two types of sequences were different. This paper examines the acquisition of initial /s/-stop and stop-/s/ sequences by sixty Greek children aged 2 through 5 years. Results showed greater accuracy for the /s/-stop relative to the stop-/s/ sequences, but no difference in accuracy between /ts/, which is usually analyzed as an affricate in Greek, and the other stop-/s/ sequences. Moreover, errors for the /s/-stop sequences and /ts/ primarily involved stop substitutions, whereas errors for /ps/ and /ks/ were more variable and often involved fricative substitutions, a pattern which may have a perceptual explanation. Finally, /ts/ showed a distinct temporal pattern relative to the stop-/s/ clusters /ps/ and /ks/, similar to what has been reported for productions of Greek adults.

  7. Acquisition of initial /s/-stop and stop-/s/ sequences in Greek

    PubMed Central

    Syrika, Asimina; Nicolaidis, Katerina; Edwards, Jan; Beckman, Mary E.

    2010-01-01

    Previous work on children’s acquisition of complex sequences points to a tendency for affricates to be acquired before clusters, but there is no clear evidence of a difference in order of acquisition between clusters with /s/ that violate the Sonority Sequencing Principle (SSP), such as /s/ followed by stop in onset position, and other clusters that obey the SSP. One problem with studies that have compared the acquisition of SSP-obeying and SSP-violating clusters is that the component sounds in the two types of sequences were different. This paper examines the acquisition of initial /s/-stop and stop-/s/ sequences by sixty Greek children aged 2 through 5 years. Results showed greater accuracy for the /s/-stop relative to the stop-/s/ sequences, but no difference in accuracy between /ts/, which is usually analyzed as an affricate in Greek, and the other stop-/s/ sequences. Moreover, errors for the /s/-stop sequences and /ts/ primarily involved stop substitutions, whereas errors for /ps/ and /ks/ were more variable and often involved fricative substitutions, a pattern which may have a perceptual explanation. Finally, /ts/ showed a distinct temporal pattern relative to the stop-/s/ clusters /ps/ and /ks/, similarly to what has been reported for productions of Greek adults. PMID:22070044

  8. Human Papillomavirus Type 6 and 11 Genetic Variants Found in 71 Oral and Anogenital Epithelial Samples from Australia

    PubMed Central

    Danielewski, Jennifer A.; Garland, Suzanne M.; McCloskey, Jenny; Hillman, Richard J.; Tabrizi, Sepehr N.

    2013-01-01

    Genetic variation of 49 human papillomavirus (HPV) 6 and 22 HPV11 isolates from recurrent respiratory papillomatosis (RRP) (n = 17), genital warts (n = 43), anal cancer (n = 6) and cervical neoplasia cells (n = 5), was determined by sequencing the long control region (LCR) and the E6 and E7 genes. Comparative analysis of genetic variability was examined to determine whether different disease states resulting from HPV6 or HPV11 infection cluster into distinct variant groups. Sequence variation analysis of HPV6 revealed that isolates cluster into variants within previously described HPV6 lineages, with the majority (65%) clustering to HPV6 sublineage B1 across the three genomic regions examined. Overall 72 HPV6 and 25 HPV11 single nucleotide variations, insertions and deletions were observed within samples examined. In addition, missense alterations were observed in the E6/E7 genes for 6 HPV6 and 5 HPV11 variants. No nucleotide variations were identified in any isolates at the four E2 binding sites for HPV6 or HPV11, nor were any isolates found to be identical to the HPV6 lineage A or HPV11 sublineage A1 reference genomes. Overall, a high degree of sequence conservation was observed between isolates across each of the regions investigated for both HPV6 and HPV11. Genetic variants identified a slight association with HPV6 and anogenital lesions (p = 0.04). This study provides important information on the genetic diversity of circulating HPV6 and HPV11 variants within the Australian population and supports the observation that the majority of HPV6 isolates cluster to the HPV6 sublineage B1 with anogenital lesions demonstrating an association with this sublineage (p = 0.02). Comparative analysis of Australian isolates for both HPV6 and HPV11 to those from other geographical regions based on the LCR revealed a high degree of sequence similarity throughout the world, confirming previous observations that there are no geographically specific variants for these HPV types. PMID:23691108

  9. Graph analysis of cell clusters forming vascular networks

    NASA Astrophysics Data System (ADS)

    Alves, A. P.; Mesquita, O. N.; Gómez-Gardeñes, J.; Agero, U.

    2018-03-01

    This manuscript describes the experimental observation of vasculogenesis in chick embryos by means of network analysis. The formation of the vascular network was observed in the area opaca of embryos from 40 to 55 h of development. In the area opaca endothelial cell clusters self-organize as a primitive and approximately regular network of capillaries. The process was observed by bright-field microscopy in control embryos and in embryos treated with Bevacizumab (Avastin), an antibody that inhibits the signalling of the vascular endothelial growth factor (VEGF). The sequence of images of the vascular growth were thresholded, and used to quantify the forming network in control and Avastin-treated embryos. This characterization is made by measuring vessels density, number of cell clusters and the largest cluster density. From the original images, the topology of the vascular network was extracted and characterized by means of the usual network metrics such as: the degree distribution, average clustering coefficient, average short path length and assortativity, among others. This analysis allows to monitor how the largest connected cluster of the vascular network evolves in time and provides with quantitative evidence of the disruptive effects that Avastin has on the tree structure of vascular networks.
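
    The network metrics named in this abstract (degree distribution, average clustering coefficient, largest connected cluster) can be illustrated with a minimal sketch. It assumes the vascular network has already been reduced to an undirected edge list; the function names and the toy graph are hypothetical and are not taken from the paper, and nodes with fewer than two neighbours are assumed to contribute a clustering coefficient of zero.

```python
from collections import Counter, defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def degree_distribution(adj):
    """Map each degree value to the number of nodes having that degree."""
    return dict(Counter(len(nbrs) for nbrs in adj.values()))


def average_clustering(adj):
    """Mean local clustering coefficient; nodes of degree < 2 count as 0 (assumption)."""
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        nb = list(nbrs)
        # Count edges that exist between pairs of neighbours of this node.
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nb[j] in adj[nb[i]]
        )
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def largest_component_size(adj):
    """Number of nodes in the largest connected component (depth-first search)."""
    seen = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best


# Hypothetical toy network: a triangle with one pendant node, plus a separate pair.
toy_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (5, 6)]
toy_graph = build_graph(toy_edges)
```

    Applied to one thresholded image per time point, tracking `largest_component_size` over time would give the kind of growth curve the abstract describes for the largest connected cluster.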

  10. Photometric and spectroscopic study of low mass embedded star clusters in reflection nebulae

    NASA Astrophysics Data System (ADS)

    Soares, J. B.; Bica, E.; Ahumada, A. V.; Clariá, J. J.

    2005-02-01

    An analysis of the candidate embedded stellar systems in the reflection nebulae vdBH-RN 26, vdBH-RN 38, vdBH-RN 53a, GGD 20, ESO 95-RN 18 and NGC 6595 is presented. Optical spectroscopic data from CASLEO (Argentina) in conjunction with near infrared photometry from the 2MASS Point Source Catalogue were employed. The analysis is based on source surface density, colour-colour and colour-magnitude diagrams together with theoretical pre-main sequence isochrones. We take into account the field population affecting the analysis by carrying out a statistical subtraction. The fundamental parameters for the stellar systems were derived. The resulting ages are in the range 1-4 Myr and the objects are dominated by pre-main sequence stars. The observed masses locked in the clusters are less than 25 M⊙. The studied systems have no stars of spectral types earlier than B, indicating that star clusters do not necessarily evolve through an HII region phase. The relatively small locked mass combined with the fact that they are not numerous in catalogues suggests that these low mass clusters are not important donors of stars to the field populations. Based on observations made at Complejo Astronómico El Leoncito, which is operated under agreement between the Consejo Nacional de Investigaciones Científicas y Técnicas de la República Argentina and the National Universities of La Plata, Córdoba and San Juan, Argentina.

  11. Novel Insights into the Diversity of Catabolic Metabolism from Ten Haloarchaeal Genomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, Iain; Scheuner, Carmen; Goker, Markus

    2011-05-03

    The extremely halophilic archaea are present worldwide in saline environments and have important biotechnological applications. Ten complete genomes of haloarchaea are now available, providing an opportunity for comparative analysis. We report here the comparative analysis of five newly sequenced haloarchaeal genomes with five previously published ones. Whole genome trees based on protein sequences provide strong support for deep relationships between the ten organisms. Using a soft clustering approach, we identified 887 protein clusters present in all halophiles. Of these core clusters, 112 are not found in any other archaea and therefore constitute the haloarchaeal signature. Four of the halophiles were isolated from water, and four were isolated from soil or sediment. Although there are few habitat-specific clusters, the soil/sediment halophiles tend to have greater capacity for polysaccharide degradation, siderophore synthesis, and cell wall modification. Halorhabdus utahensis and Haloterrigena turkmenica encode over forty glycosyl hydrolases each, and may be capable of breaking down naturally occurring complex carbohydrates. H. utahensis is specialized for growth on carbohydrates and has few amino acid degradation pathways. It uses the non-oxidative pentose phosphate pathway instead of the oxidative pathway, giving it more flexibility in the metabolism of pentoses. These new genomes expand our understanding of haloarchaeal catabolic pathways, providing a basis for further experimental analysis, especially with regard to carbohydrate metabolism. Halophilic glycosyl hydrolases for use in biofuel production are more likely to be found in halophiles isolated from soil or sediment.

  12. Tracking inter-institutional spread of NDM and identification of a novel NDM-positive plasmid, pSg1-NDM, using next-generation sequencing approaches.

    PubMed

    Khong, Wei Xin; Marimuthu, Kalisvar; Teo, Jeanette; Ding, Yichen; Xia, Eryu; Lee, Jia Jun; Ong, Rick Twee-Hee; Venkatachalam, Indumathi; Cherng, Benjamin; Pada, Surinder Kaur; Choong, Weng Lam; Smitasin, Nares; Ooi, Say Tat; Deepak, Rama Narayana; Kurup, Asok; Fong, Raymond; Van La, My; Tan, Thean Yen; Koh, Tse Hsien; Lin, Raymond Tzer Pin; Tan, Eng Lee; Krishnan, Prabha Unny; Singh, Siddharth; Pitout, Johann D; Teo, Yik-Ying; Yang, Liang; Ng, Oon Tek

    2016-11-01

    Owing to gene transposition and plasmid conjugation, New Delhi metallo-β-lactamase (NDM) is typically identified among varied Enterobacteriaceae species and STs. We used WGS to characterize the chromosomal and plasmid molecular epidemiology of NDM transmission involving four institutions in Singapore. Thirty-three Enterobacteriaceae isolates (collection years 2010-14) were sequenced using short-read sequencing-by-synthesis and analysed. Long-read single molecule, real-time sequencing (SMRTS) was used to characterize genetically a novel plasmid pSg1-NDM carried on Klebsiella pneumoniae ST147. In 20 (61%) isolates, blaNDM was located on the pNDM-ECS01 plasmid in the background of multiple bacterial STs, including eight K. pneumoniae STs and five Escherichia coli STs. In six (18%) isolates, a novel blaNDM-positive plasmid, pSg1-NDM, was found only in K. pneumoniae ST147. The pSg1-NDM-K. pneumoniae ST147 clone (Sg1-NDM) was fully sequenced using SMRTS. pSg1-NDM, a 90,103 bp IncR plasmid, carried genes responsible for resistance to six classes of antimicrobials. A large portion of pSg1-NDM had no significant homology to any known plasmids in GenBank. pSg1-NDM had no conjugative transfer region. Combined chromosomal-plasmid phylogenetic analysis revealed five clusters of clonal bacterial NDM-positive plasmid transmission, of which two were inter-institution clusters. The largest inter-institution cluster involved six K. pneumoniae ST147-pSg1-NDM isolates. Fifteen patients were involved in transmission clusters, of which four had ward contact, six had hospital contact and five had an unknown transmission link. A combined sequencing-by-synthesis and SMRTS approach can determine effectively the transmission clusters of blaNDM and genetically characterize novel plasmids. Plasmid molecular epidemiology is important to understanding NDM spread as blaNDM-positive plasmids can conjugate extensively across species and STs. © The Author 2016. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  13. Subsampled open-reference clustering creates consistent, comprehensive OTU definitions and scales to billions of sequences.

    PubMed

    Rideout, Jai Ram; He, Yan; Navas-Molina, Jose A; Walters, William A; Ursell, Luke K; Gibbons, Sean M; Chase, John; McDonald, Daniel; Gonzalez, Antonio; Robbins-Pianka, Adam; Clemente, Jose C; Gilbert, Jack A; Huse, Susan M; Zhou, Hong-Wei; Knight, Rob; Caporaso, J Gregory

    2014-01-01

    We present a performance-optimized algorithm, subsampled open-reference OTU picking, for assigning marker gene (e.g., 16S rRNA) sequences generated on next-generation sequencing platforms to operational taxonomic units (OTUs) for microbial community analysis. This algorithm provides benefits over de novo OTU picking (clustering can be performed largely in parallel, reducing runtime) and closed-reference OTU picking (all reads are clustered, not only those that match a reference database sequence with high similarity). Because more of our algorithm can be run in parallel relative to "classic" open-reference OTU picking, it makes open-reference OTU picking tractable on massive amplicon sequence data sets (though on smaller data sets, "classic" open-reference OTU clustering is often faster). We illustrate that here by applying it to the first 15,000 samples sequenced for the Earth Microbiome Project (1.3 billion V4 16S rRNA amplicons). To the best of our knowledge, this is the largest OTU picking run ever performed, and we estimate that our new algorithm runs in less than 1/5 the time than would be required of "classic" open reference OTU picking. We show that subsampled open-reference OTU picking yields results that are highly correlated with those generated by "classic" open-reference OTU picking through comparisons on three well-studied datasets. An implementation of this algorithm is provided in the popular QIIME software package, which uses uclust for read clustering. All analyses were performed using QIIME's uclust wrappers, though we provide details (aided by the open-source code in our GitHub repository) that will allow implementation of subsampled open-reference OTU picking independently of QIIME (e.g., in a compiled programming language, where runtimes should be further reduced). Our analyses should generalize to other implementations of these OTU picking algorithms. Finally, we present a comparison of parameter settings in QIIME's OTU picking workflows and make recommendations on settings for these free parameters to optimize runtime without reducing the quality of the results. These optimized parameters can vastly decrease the runtime of uclust-based OTU picking in QIIME.

  14. Discrete Cosine Transform Image Coding With Sliding Block Codes

    NASA Astrophysics Data System (ADS)

    Divakaran, Ajay; Pearlman, William A.

    1989-11-01

    A transform trellis coding scheme for images is presented. A two-dimensional discrete cosine transform is applied to the image, followed by a search on a trellis-structured code. This code is a sliding block code that utilizes a constrained-size reproduction alphabet. The image is divided into blocks by the transform coding. The non-stationarity of the image is counteracted by grouping these blocks in clusters through a clustering algorithm, and then encoding the clusters separately. Mandela ordered sequences are formed from each cluster, i.e., identically indexed coefficients from each block are grouped together to form one-dimensional sequences. A separate search ensues on each of these Mandela ordered sequences. Padding sequences are used to improve the trellis search fidelity. The padding sequences absorb the error caused by the building up of the trellis to full size. The simulations were carried out on a 256x256 image ('LENA'). The results are comparable to any existing scheme. The visual quality of the image is enhanced considerably by the padding and clustering.

  15. MetaCAA: A clustering-aided methodology for efficient assembly of metagenomic datasets.

    PubMed

    Reddy, Rachamalla Maheedhar; Mohammed, Monzoorul Haque; Mande, Sharmila S

    2014-01-01

    A key challenge in analyzing metagenomics data pertains to assembly of sequenced DNA fragments (i.e. reads) originating from various microbes in a given environmental sample. Several existing methodologies can assemble reads originating from a single genome. However, these methodologies cannot be applied for efficient assembly of metagenomic sequence datasets. In this study, we present MetaCAA - a clustering-aided methodology which helps in improving the quality of metagenomic sequence assembly. MetaCAA initially groups sequences constituting a given metagenome into smaller clusters. Subsequently, sequences in each cluster are independently assembled using CAP3, an existing single genome assembly program. Contigs formed in each of the clusters along with the unassembled reads are then subjected to another round of assembly for generating the final set of contigs. Validation using simulated and real-world metagenomic datasets indicates that MetaCAA aids in improving the overall quality of assembly. A software implementation of MetaCAA is available at https://metagenomics.atc.tcs.com/MetaCAA. Copyright © 2014 Elsevier Inc. All rights reserved.

  16. Genotypic diversity of oscillatoriacean strains belonging to the genera Geitlerinema and Spirulina determined by 16S rDNA restriction analysis.

    PubMed

    Margheri, Maria C; Piccardi, Raffaella; Ventura, Stefano; Viti, Carlo; Giovannetti, Luciana

    2003-05-01

    Genotypic diversity of several cyanobacterial strains mostly isolated from marine or brackish waters, belonging to the genera Geitlerinema and Spirulina, was investigated by amplified 16S ribosomal DNA restriction analysis and compared with morphological features and response to salinity. Cluster analysis was performed on amplified 16S rDNA restriction profiles of these strains along with profiles obtained from sequence data of five Spirulina-like strains, including three representatives of the new genus Halospirulina. Our strains with tightly coiled trichomes from hypersaline waters could be assigned to the Halospirulina genus. Among the uncoiled strains, the two strains of hypersaline origin clustered together and were found to be distant from their counterparts of marine and freshwater habitat. Moreover, another cluster, formed by alkali-tolerant strains with tightly coiled trichomes, was well delineated.

  17. First description of Grapevine leafroll-associated virus 5 in Argentina and partial genome sequence.

    PubMed

    Gómez Talquenca, Sebastián; Muñoz, Claudio; Grau, Oscar; Gracia, Olga

    2009-02-01

    An accession of Vitis vinifera cv. Red Globe from Argentina was found to be infected with Grapevine leafroll-associated virus-5 by ELISA. It was partially sequenced, and three ORFs, corresponding to HSP70h, HSP90h, and CP, were found. This isolate shares a high amino acid identity with the previously reported sequence of the virus, and identities between 80% and 90% with previously reported GLRaV-9 and GLRaV-4 isolates. The analysis of the sequence supports the clustering together with GLRaV-4 and GLRaV-9 inside the Ampelovirus genus.

  18. THE RED SEQUENCE AT BIRTH IN THE GALAXY CLUSTER Cl J1449+0856 AT z = 2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Strazzullo, V.; Pannella, M.; Daddi, E.

    We use Hubble Space Telescope/WFC3 imaging to study the red population in the IR-selected, X-ray detected, low-mass cluster Cl J1449+0856 at z = 2, one of the few bona fide established clusters discovered at this redshift, and likely a typical progenitor of an average massive cluster today. This study explores the presence and significance of an early red sequence in the core of this structure, investigating the nature of red-sequence galaxies, highlighting environmental effects on cluster galaxy populations at high redshift, and at the same time underlining similarities and differences with other distant dense environments. Our results suggest that the red population in the core of Cl J1449+0856 is made of a mixture of quiescent and dusty star-forming galaxies, with a seedling of the future red sequence already growing in the very central cluster region, and already characterizing the inner cluster core with respect to lower-density environments. On the other hand, the color–magnitude diagram of this cluster is definitely different from that of lower-redshift z ≲ 1 clusters, as well as of some rare particularly evolved massive clusters at similar redshift, and it is suggestive of a transition phase between active star formation and passive evolution occurring in the protocluster and established lower-redshift cluster regimes.

  19. Development of Genomic Microsatellite Markers in Carthamus tinctorius L. (Safflower) Using Next Generation Sequencing and Assessment of Their Cross-Species Transferability and Utility for Diversity Analysis

    PubMed Central

    Variath, Murali Tottekkad; Joshi, Gopal; Bali, Sapinder; Agarwal, Manu; Kumar, Amar; Jagannath, Arun; Goel, Shailendra

    2015-01-01

    Background: Safflower (Carthamus tinctorius L.), an Asteraceae member, yields high quality edible oil rich in unsaturated fatty acids and is resilient to dry conditions. The crop holds tremendous potential for improvement through concerted molecular breeding programs due to the availability of significant genetic and phenotypic diversity. Genomic resources that could facilitate such breeding programs remain largely underdeveloped in the crop. The present study was initiated to develop a large set of novel microsatellite markers for safflower using next generation sequencing. Principal Findings: Low throughput genome sequencing of safflower was performed using Illumina paired end technology providing ~3.5X coverage of the genome. Analysis of sequencing data allowed identification of 23,067 regions harboring perfect microsatellite loci. The safflower genome was found to be rich in dinucleotide repeats followed by tri-, tetra-, penta- and hexa-nucleotides. Primer pairs were designed for 5,716 novel microsatellite sequences with repeat length ≥ 20 bases and optimal flanking regions. A subset of 325 microsatellite loci was tested for amplification, of which 294 loci produced robust amplification. The validated primers were used for assessment of 23 safflower accessions belonging to diverse agro-climatic zones of the world leading to identification of 93 polymorphic primers (31.6%). The numbers of observed alleles at each locus ranged from two to four and mean polymorphism information content was found to be 0.3075. The polymorphic primers were tested for cross-species transferability on nine wild relatives of cultivated safflower. All primers except one showed amplification in at least two wild species while 25 primers amplified across all the nine species. The UPGMA dendrogram clustered C. tinctorius accessions and wild species separately into two major groups. The proposed progenitor species of safflower, C. oxyacantha and C. palaestinus were genetically closer to cultivated safflower and formed a distinct cluster. The cluster analysis also distinguished diploid and tetraploid wild species of safflower. Conclusion: Next generation sequencing of safflower genome generated a large set of microsatellite markers. The novel markers developed in this study will add to the existing repertoire of markers and can be used for diversity analysis, synteny studies, construction of linkage maps and marker-assisted selection. PMID:26287743
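
    The abstract reports a mean polymorphism information content (PIC) but does not state the formula used. As a hedged illustration, the sketch below assumes the widely used Botstein et al. definition, PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², where p_i is the frequency of allele i at a locus. The function names and the toy allele calls are hypothetical and are not data from the study.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(allele_calls):
    """Convert a list of observed allele labels at one locus into frequencies."""
    if not allele_calls:
        raise ValueError("at least one allele call is required")
    counts = Counter(allele_calls)
    total = sum(counts.values())
    return [n / total for n in counts.values()]


def pic(freqs, tol=1e-9):
    """Polymorphism information content of one locus (Botstein-style formula, assumed)."""
    if abs(sum(freqs) - 1.0) > tol:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


def mean_pic(loci):
    """Average PIC over a collection of loci, each given as a list of allele labels."""
    values = [pic(allele_frequencies(calls)) for calls in loci]
    return sum(values) / len(values)


# Hypothetical allele calls for two loci scored across four accessions.
toy_loci = [
    ["a", "a", "b", "b"],  # two equally frequent alleles
    ["a", "a", "a", "a"],  # monomorphic locus
]
```

    Under this definition a locus with two equally frequent alleles has a PIC of 0.375 and a monomorphic locus has a PIC of 0, so a mean near 0.3 is what one would expect from loci carrying two to four alleles with uneven frequencies.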

  20. Whole genome characterization of human influenza A(H1N1)pdm09 viruses isolated from Kenya during the 2009 pandemic.

    PubMed

    Gachara, George; Symekher, Samuel; Otieno, Michael; Magana, Japheth; Opot, Benjamin; Bulimo, Wallace

    2016-06-01

    An influenza pandemic caused by a novel influenza virus A(H1N1)pdm09 spread worldwide in 2009 and is estimated to have caused between 151,700 and 575,400 deaths globally. While whole genome data on new virus enables a deeper insight in the pathogenesis, epidemiology, and drug sensitivities of the circulating viruses, there are relatively limited complete genetic sequences available for this virus from African countries. We describe herein the full genome analysis of influenza A(H1N1)pdm09 viruses isolated in Kenya between June 2009 and August 2010. A total of 40 influenza A(H1N1)pdm09 viruses isolated during the pandemic were selected. The segments from each isolate were amplified and directly sequenced. The resulting sequences of individual gene segments were concatenated and used for subsequent analysis. These were used to infer phylogenetic relationships and also to reconstruct the time of most recent ancestor, time of introduction into the country, rates of substitution and to estimate a time-resolved phylogeny. The Kenyan complete genome sequences clustered with globally distributed clade 2 and clade 7 sequences but local clade 2 viruses did not circulate beyond the introductory foci while clade 7 viruses disseminated country wide. The time of the most recent common ancestor was estimated between April and June 2009, and distinct clusters circulated during the pandemic. The complete genome had an estimated rate of nucleotide substitution of 4.9 × 10⁻³ substitutions/site/year and greater diversity in surface expressed proteins was observed. We show that two clades of influenza A(H1N1)pdm09 virus were introduced into Kenya from the UK and the pandemic was sustained as a result of importations. Several closely related but distinct clusters co-circulated locally during the peak pandemic phase but only one cluster dominated in the late phase of the pandemic suggesting that it possessed greater adaptability. Copyright © 2016 Elsevier B.V. All rights reserved.

  1. Genetic diversity analysis of Capparis spinosa L. populations by using ISSR markers.

    PubMed

    Liu, C; Xue, G P; Cheng, B; Wang, X; He, J; Liu, G H; Yang, W J

    2015-12-09

    Capparis spinosa L. is an important medicinal species in the Xinjiang Province of China. Ten natural populations of C. spinosa from 3 locations in North, Central, and South Xinjiang were studied using morphological traits and inter simple sequence repeat (ISSR) molecular markers to assess genetic diversity and population structure. In this study, the 10 ISSR primers produced 313 amplified DNA fragments, with 52% of fragments being polymorphic. Unweighted pair-group method with arithmetic average (UPGMA) cluster analysis indicated that 10 C. spinosa populations were clustered into 3 geographically distinct groups. Nei's gene diversity and Shannon's information index of C. spinosa populations in different regions ranged from 0.1312 to 0.2001 and from 0.1004 to 0.1875, respectively. The 362 markers were used to construct the dendrogram based on the UPGMA cluster analysis. The dendrogram indicated that 10 populations of C. spinosa were clustered into 3 geographically distinct groups. The results showed these genotypes have high genetic diversity, and can be used for an alternative breeding program.
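
    The two diversity statistics quoted in this abstract can be sketched as follows. The abstract does not say how they were computed, so this is only an illustration under the usual textbook definitions: Nei's gene diversity h = 1 - Σ p_i² and Shannon's information index I = -Σ p_i ln p_i per locus, each averaged over loci. How allele frequencies were estimated from dominant ISSR bands is not stated either, so the frequencies below are simply assumed as given; all names and values are hypothetical.

```python
import math


def nei_gene_diversity(freqs):
    """Nei's gene diversity for one locus: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def shannon_index(freqs):
    """Shannon's information index for one locus; zero-frequency alleles contribute nothing."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def mean_over_loci(stat, loci_freqs):
    """Average a per-locus statistic over a list of per-locus frequency vectors."""
    if not loci_freqs:
        raise ValueError("at least one locus is required")
    return sum(stat(freqs) for freqs in loci_freqs) / len(loci_freqs)


# Hypothetical allele frequencies (band-present, band-absent) at two ISSR loci.
toy_loci_freqs = [
    [0.5, 0.5],  # maximally polymorphic two-allele locus
    [1.0, 0.0],  # monomorphic locus
]
```

    For a two-allele locus, h cannot exceed 0.5 and I cannot exceed ln 2 (about 0.693), which gives a scale for reading the population-level ranges reported above.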

  2. SEAN: SNP prediction and display program utilizing EST sequence clusters.

    PubMed

    Huntley, Derek; Baldo, Angela; Johri, Saurabh; Sergot, Marek

    2006-02-15

    SEAN is an application that predicts single nucleotide polymorphisms (SNPs) using multiple sequence alignments produced from expressed sequence tag (EST) clusters. The algorithm uses rules of sequence identity and SNP abundance to determine the quality of the prediction. A Java viewer is provided to display the EST alignments and predicted SNPs.

  3. CNL Disease Resistance Genes in Soybean and Their Evolutionary Divergence

    PubMed Central

    Nepal, Madhav P; Benson, Benjamin V

    2015-01-01

    Disease resistance genes (R-genes) encode proteins involved in detecting pathogen attack and activating downstream defense molecules. Recent availability of soybean genome sequences makes it possible to examine the diversity of gene families including disease-resistant genes. The objectives of this study were to identify coiled-coil NBS-LRR (= CNL) R-genes in soybean, infer their evolutionary relationships, and assess structural as well as functional divergence of the R-genes. Profile hidden Markov models were used for sequence identification and model-based maximum likelihood was used for phylogenetic analysis, and variation in chromosomal positioning, gene clustering, and functional divergence were assessed. We identified 188 soybean CNL genes nested into four clades consistent to their orthologs in Arabidopsis. Gene clustering analysis revealed the presence of 41 gene clusters located on 13 different chromosomes. Analyses of the Ks-values and chromosomal positioning suggest duplication events occurring at varying timescales, and an extrapericentromeric positioning may have facilitated their rapid evolution. Each of the four CNL clades exhibited distinct patterns of gene expression. Phylogenetic analysis further supported the extrapericentromeric positioning effect on the divergence and retention of the CNL genes. The results are important for understanding the diversity and divergence of CNL genes in soybean, which would have implication in soybean crop improvement in future. PMID:25922568

  4. CNL Disease Resistance Genes in Soybean and Their Evolutionary Divergence.

    PubMed

    Nepal, Madhav P; Benson, Benjamin V

    2015-01-01

    Disease resistance genes (R-genes) encode proteins involved in detecting pathogen attack and activating downstream defense molecules. Recent availability of soybean genome sequences makes it possible to examine the diversity of gene families including disease-resistant genes. The objectives of this study were to identify coiled-coil NBS-LRR (= CNL) R-genes in soybean, infer their evolutionary relationships, and assess structural as well as functional divergence of the R-genes. Profile hidden Markov models were used for sequence identification and model-based maximum likelihood was used for phylogenetic analysis, and variation in chromosomal positioning, gene clustering, and functional divergence were assessed. We identified 188 soybean CNL genes nested into four clades consistent to their orthologs in Arabidopsis. Gene clustering analysis revealed the presence of 41 gene clusters located on 13 different chromosomes. Analyses of the Ks-values and chromosomal positioning suggest duplication events occurring at varying timescales, and an extrapericentromeric positioning may have facilitated their rapid evolution. Each of the four CNL clades exhibited distinct patterns of gene expression. Phylogenetic analysis further supported the extrapericentromeric positioning effect on the divergence and retention of the CNL genes. The results are important for understanding the diversity and divergence of CNL genes in soybean, which would have implication in soybean crop improvement in future.

  5. A Public Health Model for the Molecular Surveillance of HIV Transmission in San Diego, California

    PubMed Central

    May, Susanne; Tweeten, Samantha; Drumright, Lydia; Pacold, Mary E.; Kosakovsky Pond, Sergei L.; Pesano, Rick L.; Lie, Yolanda S.; Richman, Douglas D.; Frost, Simon D.W.; Woelk, Christopher H.; Little, Susan J.

    2009-01-01

    Background: Current public health efforts often use molecular technologies to identify and contain communicable disease networks, but not for HIV. Here, we investigate how molecular epidemiology can be used to identify highly-related HIV networks within a population and how voluntary contact tracing of sexual partners can be used to selectively target these networks. Methods: We evaluated the use of HIV-1 pol sequences obtained from participants of a community-recruited cohort (n=268) and a primary infection research cohort (n=369) to define highly related transmission clusters and the use of contact tracing to link other individuals (n=36) within these clusters. The presence of transmitted drug resistance was interpreted from the pol sequences (Calibrated Population Resistance v3.0). Results: Phylogenetic clustering was conservatively defined when the genetic distance between any two pol sequences was <1%, which identified 34 distinct transmission clusters within the combined community-recruited and primary infection research cohorts containing 160 individuals. Although sequences from the epidemiologically-linked partners represented approximately 5% of the total sequences, they clustered with 60% of the sequences that clustered from the combined cohorts (O.R. 21.7; p<0.01). Major resistance to at least one class of antiretroviral medication was found in 19% of clustering sequences. Conclusions: Phylogenetic methods can be used to identify individuals who are within highly related transmission groups, and contact tracing of epidemiologically-linked partners of recently infected individuals can be used to link into previously-defined transmission groups. These methods could be used to implement selectively targeted prevention interventions. PMID:19098493
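
    The clustering rule in this abstract (link two pol sequences when their genetic distance is below 1%) can be sketched as a simple distance-threshold procedure. The abstract does not specify the distance model or the linkage rule, so this sketch assumes an uncorrected p-distance on pre-aligned sequences and single-linkage grouping (a sequence joins a cluster if it is within the threshold of any member). Function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def threshold_clusters(seqs, threshold=0.01):
    """Group sequence names by single linkage at distance < threshold.

    Returns a sorted list of clusters (each a sorted list of names);
    sequences that link to nothing are omitted.
    """
    parent = {name: name for name in seqs}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) >= 2)


# Hypothetical 200-site alignment: B differs from A at 1 site, C from B at 1 site,
# and D differs from all the others at 15 sites.
base = "ACGT" * 50
toy_alignment = {
    "A": base,
    "B": "T" + base[1:],
    "C": "TA" + base[2:],
    "D": "G" * 20 + base[20:],
}
```

    In the toy alignment, A and C are exactly 1% apart and so are not linked directly, yet both fall in one cluster through B. A stricter reading of "any two sequences" (complete linkage) would split them, which is why the linkage rule is flagged as an assumption here.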

  6. Comparative genomic analysis of Acinetobacter strains isolated from murine colonic crypts.

    PubMed

    Saffarian, Azadeh; Touchon, Marie; Mulet, Céline; Tournebize, Régis; Passet, Virginie; Brisse, Sylvain; Rocha, Eduardo P C; Sansonetti, Philippe J; Pédron, Thierry

    2017-07-11

    A restricted set of aerobic bacteria dominated by the Acinetobacter genus was identified in murine intestinal colonic crypts. The vicinity of such bacteria with intestinal stem cells could indicate that they protect the crypt against cytotoxic and genotoxic signals. Genome analyses of these bacteria were performed to better appreciate their biodegradative capacities. Two taxonomically different clusters of Acinetobacter were isolated from murine proximal colonic crypts, one was identified as A. modestus and the other as A. radioresistens. Their identification was performed through biochemical parameters and housekeeping gene sequencing. After selection of one strain of each cluster (A. modestus CM11G and A. radioresistens CM38.2), comparative genomic analysis was performed on whole-genome sequencing data. The antibiotic resistance pattern of these two strains is different, in line with the many genes involved in resistance to heavy metals identified in both genomes. Moreover whereas the operon benABCDE involved in benzoate metabolism is encoded by the two genomes, the operon antABC encoding the anthranilate dioxygenase, and the phenol hydroxylase gene cluster are absent in the A. modestus genomic sequence, indicating that the two strains have different capacities to metabolize xenobiotics. A common feature of the two strains is the presence of a type IV pili system, and the presence of genes encoding proteins pertaining to secretion systems such as Type I and Type II secretion systems. Our comparative genomic analysis revealed that different Acinetobacter isolated from the same biological niche, even if they share a large majority of genes, possess unique features that could play a specific role in the protection of the intestinal crypt.

  7. Piscine reovirus: Genomic and molecular phylogenetic analysis from farmed and wild salmonids collected on the Canada/US Pacific Coast

    USGS Publications Warehouse

    Siah, Ahmed; Morrison, Diane B.; Fringuelli, Elena; Savage, Paul S.; Richmond, Zina; Purcell, Maureen K.; Johns, Robert; Johnson, Stewart C.; Saksida, Sonja M.

    2015-01-01

    Piscine reovirus (PRV) is a double stranded non-enveloped RNA virus detected in farmed and wild salmonids. This study examined the phylogenetic relationships among different PRV sequence types present in samples from salmonids in Western Canada and the US, including Alaska (US), British Columbia (Canada) and Washington State (US). Tissues testing positive for PRV were partially sequenced for segment S1, producing 71 sequences that grouped into 10 unique sequence types. Sequence analysis revealed no identifiable geographical or temporal variation among the sequence types. Identical sequence types were found in fish sampled in 2001, 2005 and 2014. In addition, PRV positive samples from fish derived from Alaska, British Columbia and Washington State share identical sequence types. Comparative analysis of the phylogenetic tree indicated that Canada/US Pacific Northwest sequences formed a subgroup with some Norwegian sequence types (group II), distinct from other Norwegian and Chilean sequences (groups I, III and IV). Representative PRV positive samples from farmed and wild fish in British Columbia and Washington State were subjected to genome sequencing using next generation sequencing methods. Individual analysis of each of the 10 partial segments indicated that the Canadian and US PRV sequence types clustered separately from available whole genome sequences of some Norwegian and Chilean sequences for all segments except the segment S4. In summary, PRV was genetically homogenous over a large geographic distance (Alaska to Washington State), and the sequence types were relatively stable over a 13 year period.

  8. Piscine Reovirus: Genomic and Molecular Phylogenetic Analysis from Farmed and Wild Salmonids Collected on the Canada/US Pacific Coast

    PubMed Central

    Siah, Ahmed; Morrison, Diane B.; Fringuelli, Elena; Savage, Paul; Richmond, Zina; Johns, Robert; Purcell, Maureen K.; Johnson, Stewart C.; Saksida, Sonja M.

    2015-01-01

    Piscine reovirus (PRV) is a double stranded non-enveloped RNA virus detected in farmed and wild salmonids. This study examined the phylogenetic relationships among different PRV sequence types present in samples from salmonids in Western Canada and the US, including Alaska (US), British Columbia (Canada) and Washington State (US). Tissues testing positive for PRV were partially sequenced for segment S1, producing 71 sequences that grouped into 10 unique sequence types. Sequence analysis revealed no identifiable geographical or temporal variation among the sequence types. Identical sequence types were found in fish sampled in 2001, 2005 and 2014. In addition, PRV positive samples from fish derived from Alaska, British Columbia and Washington State share identical sequence types. Comparative analysis of the phylogenetic tree indicated that Canada/US Pacific Northwest sequences formed a subgroup with some Norwegian sequence types (group II), distinct from other Norwegian and Chilean sequences (groups I, III and IV). Representative PRV positive samples from farmed and wild fish in British Columbia and Washington State were subjected to genome sequencing using next generation sequencing methods. Individual analysis of each of the 10 partial segments indicated that the Canadian and US PRV sequence types clustered separately from available whole genome sequences of some Norwegian and Chilean sequences for all segments except the segment S4. In summary, PRV was genetically homogenous over a large geographic distance (Alaska to Washington State), and the sequence types were relatively stable over a 13 year period. PMID:26536673

  9. A clustering package for nucleotide sequences using Laplacian Eigenmaps and Gaussian Mixture Model.

    PubMed

    Bruneau, Marine; Mottet, Thierry; Moulin, Serge; Kerbiriou, Maël; Chouly, Franz; Chretien, Stéphane; Guyeux, Christophe

    2018-02-01

    In this article, a new Python package for nucleotide sequences clustering is proposed. This package, freely available on-line, implements a Laplacian eigenmap embedding and a Gaussian Mixture Model for DNA clustering. It takes nucleotide sequences as input, and produces the optimal number of clusters along with a relevant visualization. Despite the fact that we did not optimise the computational speed, our method still performs reasonably well in practice. Our focus was mainly on data analytics and accuracy and as a result, our approach outperforms the state of the art, even in the case of divergent sequences. Furthermore, an a priori knowledge on the number of clusters is not required here. For the sake of illustration, this method is applied on a set of 100 DNA sequences taken from the mitochondrially encoded NADH dehydrogenase 3 (ND3) gene, extracted from a collection of Platyhelminthes and Nematoda species. The resulting clusters are tightly consistent with the phylogenetic tree computed using a maximum likelihood approach on gene alignment. They are coherent too with the NCBI taxonomy. Further test results based on synthesized data are then provided, showing that the proposed approach is better able to recover the clusters than the most widely used software, namely Cd-hit-est and BLASTClust. Copyright © 2017 Elsevier Ltd. All rights reserved.

  10. Photometric Calibrations of Gemini Images of NGC 6253

    NASA Astrophysics Data System (ADS)

    Pearce, Sean; Jeffery, Elizabeth

    2017-01-01

    We present preliminary results of our analysis of the metal-rich open cluster NGC 6253 using imaging data from GMOS on the Gemini-South Observatory. These data are part of a larger project to observe the effects of high metallicity on white dwarf cooling processes, especially the white dwarf cooling age, which have important implications on the processes of stellar evolution. To standardize the Gemini photometry, we have also secured imaging data of both the cluster and standard star fields using the 0.6-m SARA Observatory at CTIO. By analyzing and comparing the standard star fields of both the SARA data and the published Gemini zero-points of the standard star fields, we will calibrate the data obtained for the cluster. These calibrations are an important part of the project to obtain a standardized deep color-magnitude diagram to analyze the cluster. We present the process of verifying our standardization process. With a standardized CMD, we also present an analysis of the cluster's main sequence turn off age.

  11. Sequence spaces [Formula: see text] and [Formula: see text] with application in clustering.

    PubMed

    Khan, Mohd Shoaib; Alamri, Badriah As; Mursaleen, M; Lohani, Qm Danish

    2017-01-01

    Distance measures play a central role in evolving the clustering technique. Due to the rich mathematical background and natural implementation of [Formula: see text] distance measures, researchers were motivated to use them in almost every clustering process. Besides [Formula: see text] distance measures, there exist several other distance measures. Sargent introduced a special type of distance measures [Formula: see text] and [Formula: see text] which is closely related to [Formula: see text]. In this paper, we generalized the Sargent sequence spaces through introduction of [Formula: see text] and [Formula: see text] sequence spaces. Moreover, it is shown that both spaces are BK-spaces, and one is a dual of another. Further, we have clustered the two-moon dataset by using an induced [Formula: see text]-distance measure (induced by the Sargent sequence space [Formula: see text]) in the k-means clustering algorithm. The clustering result established the efficacy of replacing the Euclidean distance measure by the [Formula: see text]-distance measure in the k-means algorithm.

  12. The CNO Bi-cycle in the Open Cluster NGC 752

    NASA Astrophysics Data System (ADS)

    Hawkins, Keith; Schuler, S.; King, J.; The, L.

    2011-01-01

    The CNO bi-cycle is the primary energy source for main sequence stars more massive than the sun. To test our understanding of stellar evolution models using the CNO bi-cycle, we have undertaken light-element (CNO) abundance analysis of three main sequence dwarf stars and three red giant stars in the open cluster NGC 752 utilizing high resolution (R 50,000) spectroscopy from the Keck Observatory. Preliminary results indicate, as expected, there is a depletion of carbon in the giants relative to the dwarfs. Additional analysis is needed to determine if the amount of depletion is in line with model predictions, as seen in the Hyades open cluster. Oxygen abundances are derived from the high-excitation O I triplet, and there is a 0.19 dex offset in the [O/H] abundances between the giants and dwarfs which may be explained by non-local thermodynamic equilibrium (NLTE), although further analysis is needed to verify this. The standard procedure for spectroscopically determining stellar parameters used here allows for a measurement of the cluster metallicity, [Fe/H] = 0.04 ± 0.02. In addition to the Fe abundances we have determined Na, Mg, and Al abundances to determine the status of other nucleosynthesis processes. The Na, Mg and Al abundances of the giants are enhanced relative to the dwarfs, which is consistent with similar findings in giants of other open clusters. Support for K. Hawkins was provided by the NOAO/KPNO Research Experiences for Undergraduates (REU) Program which is funded by the National Science Foundation Research Experiences for Undergraduates Program and the Department of Defense ASSURE program through Scientific Program Order No. 13 (AST-0754223) of the Cooperative Agreement No. AST-0132798 between the Association of Universities for Research in Astronomy (AURA) and the NSF.

  13. Binning of shallowly sampled metagenomic sequence fragments reveals that low abundance bacteria play important roles in sulfur cycling and degradation of complex organic polymers in an acid mine drainage community

    NASA Astrophysics Data System (ADS)

    Dick, G. J.; Andersson, A.; Banfield, J. F.

    2007-12-01

    Our understanding of environmental microbiology has been greatly enhanced by community genome sequencing of DNA recovered directly from the environment. Community genomics provides insights into the diversity, community structure, metabolic function, and evolution of natural populations of uncultivated microbes, thereby revealing dynamics of how microorganisms interact with each other and their environment. Recent studies have demonstrated the potential for reconstructing near-complete genomes from natural environments while highlighting the challenges of analyzing community genomic sequence, especially from diverse environments. A major challenge of shotgun community genome sequencing is identification of DNA fragments from minor community members for which only low coverage of genomic sequence is present. We analyzed community genome sequence retrieved from biofilms in an acid mine drainage (AMD) system in the Richmond Mine at Iron Mountain, CA, with an emphasis on identification and assembly of DNA fragments from low-abundance community members. The Richmond Mine hosts an extensive, relatively low diversity subterranean chemolithoautotrophic community that is sustained entirely by oxidative dissolution of pyrite. The activity of these microorganisms greatly accelerates the generation of AMD. Previous and ongoing work in our laboratory has focused on reconstructing genomes of dominant community members, including several bacteria and archaea. We binned contigs from several samples (including one new sample and two that had been previously analyzed) by tetranucleotide frequency with clustering by Self-Organizing Maps (SOM). The binning, evaluated by comparison with information from the manually curated assembly of the dominant organisms, was found to be very effective: fragments were correctly assigned with 95% accuracy. Improperly assigned fragments often contained sequences that are either evolutionarily constrained (e.g. 16S rRNA genes) or mobile elements that are not expected to reflect the tetranucleotide frequency signature of the host genome. Four unknown tetranucleotide frequency clusters with significant sequence (6 Mb total) were noted and analyzed further. Based on phylogenetic markers and BLAST results, these clusters represent low abundance bacteria including Actinobacteria, Firmicutes, and Proteobacteria. Functional analysis of these clusters revealed that the low-abundance bacteria harbor genes that could potentially encode important ecosystem functions such as sulfur utilization (e.g. polysulfide reductase) and polymer degradation (e.g. chitinase and glycoside hydrolase). We conclude that ESOM clustering of tetranucleotide frequency patterns is an effective method for rapidly binning shotgun community genomic sequences and a valuable tool for analyzing minor community members, which despite their low abundance may play crucial ecological roles.
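    The tetranucleotide frequency signature that drives this binning can be sketched as follows. This is a minimal illustration, assuming plain contig strings; the function names are hypothetical, and it omits steps that real binning workflows may apply (such as merging reverse-complement tetramers or normalizing for contig length) as well as the self-organizing map itself.

```python
from collections import Counter
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides in a fixed order, so vectors are comparable.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(contig: str) -> list[float]:
    """Relative frequency of each tetranucleotide in a contig (overlapping windows)."""
    seq = contig.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Windows containing N or other ambiguity codes are ignored.
    valid = [counts[k] for k in KMERS]
    total = sum(valid)
    if total == 0:
        raise ValueError("contig has no unambiguous tetranucleotides")
    return [c / total for c in valid]


def euclidean(u: list[float], v: list[float]) -> float:
    """Distance between two frequency vectors; small values suggest a similar signature."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

    The resulting 256-dimensional vectors are the kind of input a clustering method such as a self-organizing map would receive; contigs from the same genome are expected to lie close together.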

  14. Phylogenetic Analysis of Shewanella Strains by DNA Relatedness Derived from Whole Genome Microarray DNA-DNA Hybridization and Comparison with Other Methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Liyou; Yi, T. Y.; Van Nostrand, Joy

    Phylogenetic analyses were done for the Shewanella strains isolated from Baltic Sea (38 strains), US DOE Hanford Uranium bioremediation site [Hanford Reach of the Columbia River (HRCR), 11 strains], Pacific Ocean and Hawaiian sediments (8 strains), and strains from other resources (16 strains) with three outgroup strains, Rhodopseudomonas palustris, Clostridium cellulolyticum, and Thermoanaerobacter ethanolicus X514, using DNA relatedness derived from WCGA-based DNA-DNA hybridizations, sequence similarities of 16S rRNA gene and gyrB gene, and sequence similarities of 6 loci of Shewanella genome selected from a shared gene list of the Shewanella strains with sequenced whole genomes, based on their average nucleotide identity (ANI). The phylogenetic trees based on 16S rRNA and gyrB gene sequences, and DNA relatedness derived from WCGA hybridizations of the tested Shewanella strains share exactly the same sub-clusters with very few exceptions, in which the strains were basically grouped by species. However, the phylogenetic analysis based on DNA relatedness derived from WCGA hybridizations dramatically increased the differentiation resolution at the species and strain levels within the genus Shewanella. When the tree based on DNA relatedness derived from WCGA hybridizations was compared to the tree based on the combined sequences of the selected functional genes (6 loci), we found that the resolutions of both methods are similar, but the clustering of the tree based on DNA relatedness derived from WCGA hybridizations was clearer. These results indicate that WCGA-based DNA-DNA hybridization is an ideal alternative to conventional DNA-DNA hybridization methods and it is superior to the phylogenetic methods based on sequence similarities of single genes. Detailed analysis is being performed for the re-classification of the strains examined.

  15. Characterization of expressed sequence tag-derived simple sequence repeat markers for Aspergillus flavus: emphasis on variability of isolates from the southern United States.

    PubMed

    Wang, Xinwang; Wadl, Phillip A; Wood-Jones, Alicia; Windham, Gary; Trigiano, Robert N; Scruggs, Mary; Pilgrim, Candace; Baird, Richard

    2012-12-01

    Simple sequence repeat (SSR) markers were developed from Aspergillus flavus expressed sequence tag (EST) database to conduct an analysis of genetic relationships of Aspergillus isolates from numerous host species and geographical regions, but primarily from the United States. Twenty-nine primers were designed from 362 tri-nucleotide EST-SSR sequences. Eighteen polymorphic loci were used to genotype 96 Aspergillus species isolates. The number of alleles detected per locus ranged from 2 to 24 with a mean of 8.2 alleles. Haploid diversity ranged from 0.28 to 0.91. A genetic distance matrix was used to perform principal coordinates analysis (PCA) and to generate dendrograms using unweighted pair group method with arithmetic mean (UPGMA). Two principal coordinates explained more than 75 % of the total variation among the isolates. One clade was identified for A. flavus isolates (n = 87) with the other Aspergillus species (n = 7) using PCA, but five distinct clusters were present when the other taxa were excluded from the analysis. Six groups were noted when the EST-SSR data were compared using UPGMA. However, the latter PCA or UPGMA comparison resulted in no direct associations with host species, geographical region or aflatoxin production. Furthermore, there was no direct correlation to visible morphological features such as sclerotial types. The isolates from Mississippi Delta region, which contained the largest percentage of isolates, did not show any unusual clustering except for isolates K32, K55, and 199. Further studies of these three isolates are warranted to evaluate their pathogenicity, aflatoxin production potential, additional gene sequences (e.g., RPB2), and morphological comparisons.
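    For readers unfamiliar with UPGMA, the dendrogram-building step named in this abstract can be sketched as below. This is a generic textbook version under stated assumptions: the input is a symmetric distance matrix given as a list of lists, the function name and return format are hypothetical, and nothing here reproduces the authors' software or their EST-SSR distance measure.

```python
def upgma(names: list[str], matrix: list[list[float]]):
    """Build a UPGMA tree as nested tuples and record (subtree, height) per merge."""
    trees = {i: name for i, name in enumerate(names)}   # cluster id -> subtree
    sizes = {i: 1 for i in trees}                       # cluster id -> leaf count
    dist = {(i, j): matrix[i][j] for i in trees for j in trees if i < j}
    merges = []
    next_id = len(names)

    while len(trees) > 1:
        # Join the closest pair of current clusters.
        (i, j), d_ij = min(dist.items(), key=lambda kv: kv[1])
        for k in trees:
            if k in (i, j):
                continue
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            # Size-weighted mean keeps the result equal to the average
            # over all leaf pairs (the "arithmetic mean" in UPGMA).
            dist[(k, next_id)] = (sizes[i] * d_ik + sizes[j] * d_jk) / (sizes[i] + sizes[j])
        dist = {p: v for p, v in dist.items() if i not in p and j not in p}
        trees[next_id] = (trees.pop(i), trees.pop(j))
        sizes[next_id] = sizes.pop(i) + sizes.pop(j)
        merges.append((trees[next_id], d_ij / 2))       # node height is half the distance
        next_id += 1

    (root,) = trees.values()
    return root, merges
```

    With four hypothetical isolates A to D, where A and B are closest, the sketch joins A and B first, then C, then D:

```python
names = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
root, merges = upgma(names, matrix)
print(root)                      # ('D', ('C', ('A', 'B')))
print([h for _, h in merges])    # [1.0, 3.0, 5.0]
```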

  16. Development of phoH as a Novel Signature Gene for Assessing Marine Phage Diversity

    PubMed Central

    Goldsmith, Dawn B.; Crosti, Giuseppe; Dwivedi, Bhakti; McDaniel, Lauren D.; Varsani, Arvind; Suttle, Curtis A.; Weinbauer, Markus G.; Sandaa, Ruth-Anne; Breitbart, Mya

    2011-01-01

    Phages play a key role in the marine environment by regulating the transfer of energy between trophic levels and influencing global carbon and nutrient cycles. The diversity of marine phage communities remains difficult to characterize because of the lack of a signature gene common to all phages. Recent studies have demonstrated the presence of host-derived auxiliary metabolic genes in phage genomes, such as those belonging to the Pho regulon, which regulates phosphate uptake and metabolism under low-phosphate conditions. Among the completely sequenced phage genomes in GenBank, this study identified Pho regulon genes in nearly 40% of the marine phage genomes, while only 4% of nonmarine phage genomes contained these genes. While several Pho regulon genes were identified, phoH was the most prevalent, appearing in 42 out of 602 completely sequenced phage genomes. Phylogenetic analysis demonstrated that phage phoH sequences formed a cluster distinct from those of their bacterial hosts. PCR primers designed to amplify a region of the phoH gene were used to determine the diversity of phage phoH sequences throughout a depth profile in the Sargasso Sea and at six locations worldwide. phoH was present at all sites examined, and a high diversity of phoH sequences was recovered. Most phoH sequences belonged to clusters without any cultured representatives. Each depth and geographic location had a distinct phoH composition, although most phoH clusters were recovered from multiple sites. Overall, phoH is an effective signature gene for examining phage diversity in the marine environment. PMID:21926220

  17. IMG-ABC. A knowledge base to fuel discovery of biosynthetic gene clusters and novel secondary metabolites

    DOE PAGES

    Hadjithomas, Michalis; Chen, I-Min Amy; Chu, Ken; ...

    2015-07-14

    In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of “big” genomic data for discovering small molecules. IMG-ABC relies on IMG’s comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC’s focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-existent void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of uncovered phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules. IMG-ABC is the largest publicly available database of predicted and experimental biosynthetic gene clusters and the secondary metabolites they produce. The system also includes powerful search and analysis tools that are integrated with IMG’s extensive genomic/metagenomic data and analysis tool kits. As new research on biosynthetic gene clusters and secondary metabolites is published and more genomes are sequenced, IMG-ABC will continue to expand, with the goal of becoming an essential component of any bioinformatic exploration of the secondary metabolism world.

  18. Phylogeny of the Defined Murine Microbiota: Altered Schaedler Flora

    PubMed Central

    Dewhirst, Floyd E.; Chien, Chih-Ching; Paster, Bruce J.; Ericson, Rebecca L.; Orcutt, Roger P.; Schauer, David B.; Fox, James G.

    1999-01-01

    The “altered Schaedler flora” (ASF) was developed for colonizing germfree rodents with a standardized microbiota. The purpose of this study was to identify each of the eight ASF strains by 16S rRNA sequence analysis. Three strains were previously identified as Lactobacillus acidophilus (strain ASF 360), Lactobacillus salivarius (strain ASF 361), and Bacteroides distasonis (strain ASF 519) based on phenotypic criteria. 16S rRNA analysis indicated that each of the strains differed from its presumptive identity. The 16S rRNA sequence of strain ASF 361 is essentially identical to the 16S rRNA sequences of the type strains of Lactobacillus murinus and Lactobacillus animalis (both isolated from mice), and all of these strains probably belong to a single species. Strain ASF 360 is a novel lactobacillus that clusters with L. acidophilus and Lactobacillus lactis. Strain ASF 519 falls into an unnamed genus containing [Bacteroides] distasonis, [Bacteroides] merdae, [Bacteroides] forsythus, and CDC group DF-3. This unnamed genus is in the Cytophaga-Flavobacterium-Bacteroides phylum and is most closely related to the genus Porphyromonas. The spiral-shaped strain, strain ASF 457, is in the Flexistipes phylum and exhibits sequence identity with rodent isolates of Robertson. The remaining four ASF strains, which are extremely oxygen-sensitive fusiform bacteria, group phylogenetically with the low-G+C-content gram-positive bacteria (Firmicutes, Bacillus-Clostridium group). ASF 356, ASF 492, and ASF 502 fall into Clostridium cluster XIV of Collins et al. Morphologically, ASF 492 resembles members of this cluster, Roseburia cecicola, and Eubacterium plexicaudatum. The 16S rRNA sequence of ASF 492 is identical to that of E. plexicaudatum. Since the type strain and other viable original isolates of E. plexicaudatum have been lost, strain ASF 492 is a candidate for a neotype strain. 
Strain ASF 500 branches deeply in the low-G+C-content gram-positive phylogenetic tree but is not closely related to any organisms whose 16S rRNA sequences are currently in the GenBank database. The 16S rRNA sequence information determined in the present study should allow rapid identification of ASF strains and should permit detailed analysis of the interactions of ASF organisms during development of intestinal disease in mice that are coinfected with a variety of pathogenic microorganisms. PMID:10427008

  19. Infrasound and seismic analysis of the SpaceX Falcon9 explosion sequence of 1-September-2016

    NASA Astrophysics Data System (ADS)

    Thompson, G.; McNutt, S. R.; Brown, R. G.; Braunmiller, J.; Mehta, C.

    2017-12-01

    During a static launch test on 1-Sep-2016 at Kennedy Space Center, a SpaceX Falcon 9 rocket exploded causing loss of the rocket and the payload, and extensively damaging the launch complex. The sequence was captured by a 3-element infrasound array and a broadband 3-component seismometer at the Astronaut Beach House, just 0.87 miles (1.4 km) from the launch pad. Manual picking identified 153 impulsive airwave signals over a 26-minute interval and these were compared to video recordings of the sequence. The explosion onset consisted of a moderate signal on both seismic and infrasound (52 Pa) instruments. This corresponds to the rupture of the second-stage fuel tank. We found no signals before this, so we do not believe that there was an external cause. The primary fuel tank ruptured 4 seconds later and was the strongest event by far, producing an infrasound signal that exceeded 1400 Pa ( 2000 Pa in reduced pressure). The seismic signal consists mainly of air-coupled Rayleigh waves with frequencies of 5-23 Hz. The infrasound events occurred in four clusters. The first cluster included the onset and main events and 46 smaller events. This was followed by several minutes without infrasound signals during which a 3.5 minute continuous seismic vibration occurred. Cluster 2 consisted of 4 events ranging from 117-256 Pa. Cluster 3 comprised 96 events of 7-78 Pa. Cluster 4 consisted of 5 events with overpressures of 23-63 Pa. Gaps of several minutes without infrasound and seismic signals occurred between clusters 2 and 3, and 3 and 4. In terms of energy, the main event dominated; in terms of numbers, cluster 3 had the most infrasound events. The seismic and infrasound data are complementary to video recordings of the explosion, and provide additional characterization that may be useful to interpret the sequence of events. Because of the proximity of our array to this rocket explosion, our dataset may be unique.

  20. Discrimination of Scedosporium prolificans against Pseudallescheria boydii and Scedosporium apiospermum by semiautomated repetitive sequence-based PCR.

    PubMed

    Steinmann, J; Schmidt, D; Buer, J; Rath, P-M

    2011-07-01

    The laboratory identification of Pseudallescheria and Scedosporium isolates at the species level is important for clinical and epidemiological purposes. This study used semiautomated repetitive sequence-based polymerase chain reaction (rep-PCR) to identify Pseudallescheria/Scedosporium. Reference strains of Pseudallescheria boydii (n = 12), Scedosporium prolificans (n = 8), Scedosporium apiospermum (n = 9), and clinical/environmental isolates (P. boydii, 7; S. prolificans, 7; S. apiospermum, 7) were analyzed by rep-PCR. All clinical isolates were identified by morphological and phenotypic characteristics and by sequence analysis. Species identification of reference strains was based on the results of available databases. Rep-PCR studies were also conducted with various molds to differentiate Pseudallescheria/Scedosporium spp. from other commonly encountered filamentous fungi. All tested Pseudallescheria/Scedosporium isolates were distinguishable from the other filamentous fungi. All Scedosporium prolificans strains clustered within the cutoff of 85%, and species identification by rep-PCR showed an agreement of 100% with sequence analysis. However, several isolates of P. boydii and S. apiospermum did not cluster within the 85% cutoff with the same species by rep-PCR. Although the identification of P. boydii and S. apiospermum was not correct, the semiautomated rep-PCR system is a promising tool for the identification of S. prolificans isolates.

  1. Molecular epidemiology of goat pox viruses.

    PubMed

    Roy, P; Jaisree, S; Balakrishnan, S; Senthilkumar, K; Mahaprabhu, R; Mishra, A; Maity, B; Ghosh, T K; Karmakar, A P

    2018-02-01

    Goat pox disease outbreaks were observed in different places affecting Black Bengal Goats in West Bengal (WB) and Tellicherry, Vembur and non-descriptive breeds in Tamil Nadu (TN) causing severe lesions and mortality up to 30%. Clinical specimens from all the outbreaks were screened by polymerase chain reaction followed by restriction fragment length polymorphism (PCR-RFLP) and confirmed the diseases as Goat Pox. Virus isolation in Vero cell line was done with randomly selected ten samples, cytopathic effects (CPE) characterized by syncytia and intracytoplasmic inclusion bodies were observed after several blind passages. Nucleotide sequence of complete p32 gene using randomly selected two isolates and three clinical specimens revealed presence of Goat pox virus (GTPV)-specific signature residues in all the sequences. Phylogenetic analysis using the present five sequences along with GenBank data of GTPV complete p32 gene sequences showed all the GTPV sequences cluster together except Pellor strain (NC004003) and FZ Chinese strain (KC951854). The five sequences either from WB or TN cluster more closely with GTPV isolates of Maharashtra state that were responsible for cross species outbreak of pox disease in both sheep (KF468759) and goats (KF468762) in India during the year 2010. All the Indian goat pox viruses, including the Mukteswar strain, isolated in 1946 and sequence reported in 2004 clustered together with the GTPVs causing the recent outbreaks. It was observed that GTPVs caused similar clinical manifestation irrespective of their geographical locations and breed characteristics, no variation observed among the Indian isolates based on p32 gene over the period of seventy years and disease outbreaks could not be observed or reported in vaccinated goats. © 2017 Blackwell Verlag GmbH.

  2. A national study of the molecular epidemiology of HIV-1 in Australia 2005-2012.

    PubMed

    Castley, Alison; Sawleshwarkar, Shailendra; Varma, Rick; Herring, Belinda; Thapa, Kiran; Dwyer, Dominic; Chibo, Doris; Nguyen, Nam; Hawke, Karen; Ratcliff, Rodney; Garsia, Roger; Kelleher, Anthony; Nolan, David

    2017-01-01

    Rates of new HIV-1 diagnoses are increasing in Australia, with evidence of an increasing proportion of non-B HIV-1 subtypes reflecting a growing impact of migration and travel. The present study aims to define HIV-1 subtype diversity patterns and investigate possible HIV-1 transmission networks within Australia. The Australian Molecular Epidemiology Network (AMEN) HIV collaborating sites in Western Australia, South Australia, Victoria, Queensland and western Sydney (New South Wales), provided baseline HIV-1 partial pol sequence, age and gender information for 4,873 patients who had genotypes performed during 2005-2012. HIV-1 phylogenetic analyses utilised MEGA V6, with a stringent classification of transmission pairs or clusters (bootstrap ≥98%, genetic distance ≤1.5% from at least one other sequence in the cluster). HIV-1 subtype B represented 74.5% of the 4,873 sequences (WA 59%, SA 68.4%, w-Syd 73.8%, Vic 75.6%, Qld 82.1%), with a similar proportion of transmission pairs and clusters found in the B and non-B cohorts (23% vs 24.5% of sequences, p = 0.3). Significantly more subtype B clusters comprised ≥3 sequences compared with non-B clusters (45.0% vs 24.0%, p = 0.021) and significantly more subtype B pairs and clusters were male-only (88% compared to 53% CRF01_AE and 17% subtype C clusters). Factors associated with being in a cluster of any size included being sequenced in a more recent time period (p<0.001), being younger (p<0.001), being male (p = 0.023) and having a B subtype (p = 0.02). Being in a larger cluster (>3) was associated with being sequenced in a more recent time period (p = 0.05) and being male (p = 0.008). This nationwide HIV-1 study of 4,873 patient sequences highlights the increased diversity of HIV-1 subtypes within the Australian epidemic, as well as differences in transmission networks associated with these HIV-1 subtypes. These findings provide epidemiological insights not readily available using standard surveillance methods and can inform the development of effective public health strategies in the current paradigm of HIV prevention in Australia.
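
    As a rough illustration of the distance part of the cluster definition in this record (genetic distance ≤1.5% from at least one other sequence in the cluster), the following Python sketch links aligned sequences by pairwise distance and reports the connected groups. The function names, the use of an uncorrected p-distance, and the toy sequences are assumptions made for illustration. The bootstrap ≥98% criterion requires a phylogenetic tree and is not modelled here, and this is not the study's MEGA workflow.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def distance_clusters(seqs, max_dist=0.015):
    """Group sequences that lie within max_dist of at least one other member.

    seqs maps a (hypothetical) sequence name to its aligned sequence string.
    Groups are found with a small union-find; singletons are dropped.
    """
    parent = {name: name for name in seqs}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= max_dist:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    # Keep transmission pairs and larger clusters only.
    return sorted(sorted(g) for g in groups.values() if len(g) >= 2)


# Toy alignment of 200 sites (made-up data, not from the study).
toy = {
    "s1": "A" * 200,
    "s2": "A" * 198 + "CC",    # 1% from s1
    "s3": "A" * 196 + "CCCC",  # 2% from s1, 1% from s2
    "s4": "G" * 200,           # unrelated
}
print(distance_clusters(toy))  # [['s1', 's2', 's3']]
```

    Because membership only requires proximity to one other member, this behaves like single-linkage clustering: s1 and s3 differ by 2% but fall in the same cluster through s2.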

  3. Complex spatiotemporal evolution of the 2008 Mw 4.9 Mogul earthquake swarm (Reno, Nevada): Interplay of fluid and faulting

    NASA Astrophysics Data System (ADS)

    Ruhl, C. J.; Abercrombie, R. E.; Smith, K. D.; Zaliapin, I.

    2016-11-01

    After approximately 2 months of swarm-like earthquakes in the Mogul neighborhood of west Reno, NV, seismicity rates and event magnitudes increased over several days culminating in an Mw 4.9 dextral strike-slip earthquake on 26 April 2008. Although very shallow, the Mw 4.9 main shock had a different sense of slip than locally mapped dip-slip surface faults. We relocate 7549 earthquakes, calculate 1082 focal mechanisms, and statistically cluster the relocated earthquake catalog to understand the character and interaction of active structures throughout the Mogul, NV earthquake sequence. Rapid temporary instrument deployment provides high-resolution coverage of microseismicity, enabling a detailed analysis of swarm behavior and faulting geometry. Relocations reveal an internally clustered sequence in which foreshocks evolved on multiple structures surrounding the eventual main shock rupture. The relocated seismicity defines a fault-fracture mesh and detailed fault structure from approximately 2-6 km depth on the previously unknown Mogul fault that may be an evolving incipient strike-slip fault zone. The seismicity volume expands before the main shock, consistent with pore pressure diffusion, and the aftershock volume is much larger than is typical for an Mw 4.9 earthquake. We group events into clusters using space-time-magnitude nearest-neighbor distances between events and develop a cluster criterion through randomization of the relocated catalog. Identified clusters are largely main shock-aftershock sequences, without evidence for migration, occurring within the diffuse background seismicity. The migration rate of the largest foreshock cluster and simultaneous background events is consistent with it having triggered, or having been triggered by, an aseismic slip event.
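
    The record above groups events using space-time-magnitude nearest-neighbor distances but does not state the formula. The Python sketch below assumes one commonly used form, eta = dt * r^d * 10^(-b * m_parent), where dt is the time between an earlier (parent) event and a later (child) event, r their separation, and m_parent the parent magnitude. The parameter values b = 1.0 and d = 1.6, the flat x/y coordinates, and the three toy events are all illustrative assumptions, not values from the study.

```python
import math


def nn_distance(parent, child, b=1.0, d=1.6):
    """Space-time-magnitude distance from a candidate parent to a child event.

    Returns infinity when the candidate parent does not precede the child.
    """
    dt = child["t"] - parent["t"]
    if dt <= 0:
        return math.inf
    r = math.hypot(child["x"] - parent["x"], child["y"] - parent["y"])
    return dt * r ** d * 10 ** (-b * parent["m"])


def nearest_parents(events, b=1.0, d=1.6):
    """For each event, return (index of nearest parent or None, distance)."""
    result = []
    for j, child in enumerate(events):
        best_i, best_eta = None, math.inf
        for i, parent in enumerate(events):
            if i == j:
                continue
            eta = nn_distance(parent, child, b, d)
            if eta < best_eta:
                best_i, best_eta = i, eta
        result.append((best_i, best_eta))
    return result


# Made-up catalog: time in days, position in km, magnitude.
events = [
    {"t": 0.0, "x": 0.0, "y": 0.0, "m": 4.0},
    {"t": 1.0, "x": 1.0, "y": 0.0, "m": 2.0},
    {"t": 2.0, "x": 10.0, "y": 0.0, "m": 1.0},
]
parents = nearest_parents(events)
```

    In this toy catalog both later events attach to the first, larger event, because the magnitude term shrinks its distance to every later event. A cluster criterion would then be a threshold on these distances; the record describes deriving that threshold by randomizing the relocated catalog, which is not attempted here.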

  4. Application of Geostatistical Methods and Machine Learning for spatio-temporal Earthquake Cluster Analysis

    NASA Astrophysics Data System (ADS)

    Schaefer, A. M.; Daniell, J. E.; Wenzel, F.

    2014-12-01

    Earthquake clustering tends to be an increasingly important part of general earthquake research especially in terms of seismic hazard assessment and earthquake forecasting and prediction approaches. The distinct identification and definition of foreshocks, aftershocks, mainshocks and secondary mainshocks is taken into account using a point based spatio-temporal clustering algorithm originating from the field of classic machine learning. This can be further applied for declustering purposes to separate background seismicity from triggered seismicity. The results are interpreted and processed to assemble 3D-(x,y,t) earthquake clustering maps which are based on smoothed seismicity records in space and time. In addition, multi-dimensional Gaussian functions are used to capture clustering parameters for spatial distribution and dominant orientations. Clusters are further processed using methodologies originating from geostatistics, which have been mostly applied and developed in mining projects during the last decades. A 2.5D variogram analysis is applied to identify spatio-temporal homogeneity in terms of earthquake density and energy output. The results are mitigated using Kriging to provide an accurate mapping solution for clustering features. As a case study, seismic data of New Zealand and the United States is used, covering events since the 1950s, from which an earthquake cluster catalogue is assembled for most of the major events, including a detailed analysis of the Landers and Christchurch sequences.

  5. Quantiprot - a Python package for quantitative analysis of protein sequences.

    PubMed

    Konopka, Bogumił M; Marciniak, Marta; Dyrka, Witold

    2017-07-17

    The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of the approach is that quantitative properties define a multidimensional solution space, where sequences can be related to each other and differences can be meaningfully interpreted. Quantiprot is a software package in Python, which provides a simple and consistent interface to multiple methods for quantitative characterization of protein sequences. The package can be used to calculate dozens of characteristics directly from sequences or using physico-chemical properties of amino acids. Besides basic measures, Quantiprot performs quantitative analysis of recurrence and determinism in the sequence, calculates distribution of n-grams and computes the Zipf's law coefficient. We propose three main fields of application of the Quantiprot package. First, quantitative characteristics can be used in alignment-free similarity searches, and in clustering of large and/or divergent sequence sets. Second, a feature space defined by quantitative properties can be used in comparative studies of protein families and organisms. Third, the feature space can be used for evaluating generative models, where a large number of sequences generated by the model can be compared to actually observed sequences.
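
    To make two of the measures named in this record concrete, the sketch below counts n-grams in a sequence and estimates a Zipf-style coefficient as the least-squares slope of log frequency against log rank. It uses only the Python standard library and deliberately does not call the Quantiprot API; the function names and the way the coefficient is fitted are assumptions for illustration and may differ from the package's own definitions.

```python
import math
from collections import Counter


def ngram_counts(seq, n=2):
    """Count overlapping n-grams in a sequence string."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def zipf_slope(counts):
    """Least-squares slope of log(frequency) versus log(rank).

    A slope near -1 corresponds to the classic Zipf distribution.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct n-grams")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Bigram counts for a short made-up peptide string.
bigrams = ngram_counts("AAAB", n=2)  # Counter({'AA': 2, 'AB': 1})

# Frequencies proportional to 1/rank give a slope of -1.
ideal = Counter({"x": 12, "y": 6, "z": 4})
slope = zipf_slope(ideal)
```

    Features of this kind are numeric, so sequences of different lengths can be compared in a common feature space without an alignment, which is the use the record proposes.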

  6. Transcription of Two Adjacent Carbohydrate Utilization Gene Clusters in Bifidobacterium breve UCC2003 Is Controlled by LacI- and Repressor Open Reading Frame Kinase (ROK)-Type Regulators

    PubMed Central

    O'Connell, Kerry Joan; O'Connell Motherway, Mary; Liedtke, Andrea; Fitzgerald, Gerald F.; Ross, R. Paul; Stanton, Catherine; Zomer, Aldert

    2014-01-01

    Members of the genus Bifidobacterium are commonly found in the gastrointestinal tracts of mammals, including humans, where their growth is presumed to be dependent on various diet- and/or host-derived carbohydrates. To understand transcriptional control of bifidobacterial carbohydrate metabolism, we investigated two genetic carbohydrate utilization clusters dedicated to the metabolism of raffinose-type sugars and melezitose. Transcriptomic and gene inactivation approaches revealed that the raffinose utilization system is positively regulated by an activator protein, designated RafR. The gene cluster associated with melezitose metabolism was shown to be subject to direct negative control by a LacI-type transcriptional regulator, designated MelR1, in addition to apparent indirect negative control by means of a second LacI-type regulator, MelR2. In silico analysis, DNA-protein interaction, and primer extension studies revealed the MelR1 and MelR2 operator sequences, each of which is positioned just upstream of or overlapping the correspondingly regulated promoter sequences. Similar analyses identified the RafR binding operator sequence located upstream of the rafB promoter. This study indicates that transcriptional control of gene clusters involved in carbohydrate metabolism in bifidobacteria is subject to conserved regulatory systems, representing either positive or negative control. PMID:24705323

  7. Sequence Similarity of Clostridium difficile Strains by Analysis of Conserved Genes and Genome Content Is Reflected by Their Ribotype Affiliation

    PubMed Central

    Kurka, Hedwig; Ehrenreich, Armin; Ludwig, Wolfgang; Monot, Marc; Rupnik, Maja; Barbut, Frederic; Indra, Alexander; Dupuy, Bruno; Liebl, Wolfgang

    2014-01-01

    PCR-ribotyping is a broadly used method for the classification of isolates of Clostridium difficile, an emerging intestinal pathogen, causing infections with increased disease severity and incidence in several European and North American countries. We have now carried out clustering analysis with selected genes of numerous C. difficile strains as well as gene content comparisons of their genomes in order to broaden our view of the relatedness of strains assigned to different ribotypes. We analyzed the genomic content of 48 C. difficile strains representing 21 different ribotypes. The calculation of distance matrix-based dendrograms using the neighbor joining method for 14 conserved genes (standard phylogenetic marker genes) from the genomes of the C. difficile strains demonstrated that the genes from strains with the same ribotype generally clustered together. Further, certain ribotypes always clustered together and formed ribotype groups, i.e. ribotypes 078, 033 and 126, as well as ribotypes 002 and 017, indicating their relatedness. Comparisons of the gene contents of the genomes of ribotypes that clustered according to the conserved gene analysis revealed that the number of common genes of the ribotypes belonging to each of these three ribotype groups were very similar for the 078/033/126 group (at most 69 specific genes between the different strains with the same ribotype) but less similar for the 002/017 group (86 genes difference). It appears that the ribotype is indicative not only of a specific pattern of the amplified 16S–23S rRNA intergenic spacer but also reflects specific differences in the nucleotide sequences of the conserved genes studied here. It can be anticipated that the sequence deviations of more genes of C. difficile strains are correlated with their PCR-ribotype. In conclusion, the results of this study corroborate and extend the concept of clonal C. difficile lineages, which correlate with ribotype affiliation. PMID:24482682

  8. [Chlorobaculum macestae sp. nov., a new green sulfur bacterium].

    PubMed

    Koppen, O I; Berg, I A; Lebedeva, N V; Taisova, A S; Kolganova, T V; Slobodova, N V; Bulygina, E S; Turova, T P; Ivanovskiĭ, R N

    2008-01-01

    The investigated green sulfur bacterium, strain M, was isolated from a sulfidic spring on the Black Sea Coast of the Caucasus. The cells of strain M are straight or curved rods 0.6-0.9 × 1.8-4.2 µm in size. According to the cell wall structure, the bacteria are gram-negative. Chlorosomes are located along the cell periphery. Strain M is an obligate anaerobe capable of photoautotrophic growth on sulfide, thiosulfate, and H2. It utilizes ammonium, urea, casein hydrolysate, and N2 as nitrogen sources and sulfide, thiosulfate, and elemental sulfur as sulfur sources. Bacteriochlorophyll c and the carotenoid chlorobactene are the main pigments. The optimal growth temperature is 25-28 °C; the optimal pH is 6.8. The strain does not require NaCl. Vitamin B12 stimulates growth. The content of the G+C base pairs in the DNA of strain M is 58.3 mol %. In the phylogenetic tree constructed on the basis of analysis of nucleotide sequences of 16S rRNA genes, strain M forms a separate branch, which occupies an intermediate position between the phylogenetic cluster containing representatives of the genus Chlorobaculum (94.9-96.8%) and the cluster containing species of the genus Chlorobium (94.1-96.5%). According to the results of analysis of the amino acid sequence corresponding to the fmo gene, strain M represents a branch which, unlike that in the "ribosomal" tree, falls into the cluster of the genus Chlorobaculum (95.8-97.2%). Phylogenetic analysis of the amino acid sequence corresponding to the nifH gene placed species of the genera Chlorobaculum and Chlorobium into a single cluster, whereas strain M formed a separate branch. The results obtained allow us to describe strain M as a new species of the genus Chlorobaculum. Chlorobaculum macestae sp. nov.

  9. WImpiBLAST: web interface for mpiBLAST to help biologists perform large-scale annotation using high performance computing.

    PubMed

    Sharma, Parichit; Mantri, Shrikant S

    2014-01-01

    The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. Here, we describe the WImpiBLAST web interface features and architecture, explain design decisions, describe workflows and provide a detailed analysis.

  10. WImpiBLAST: Web Interface for mpiBLAST to Help Biologists Perform Large-Scale Annotation Using High Performance Computing

    PubMed Central

    Sharma, Parichit; Mantri, Shrikant S.

    2014-01-01

    The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. Here, we describe the WImpiBLAST web interface features and architecture, explain design decisions, describe workflows and provide a detailed analysis. PMID:24979410

  11. Biosynthetic Investigations of Lactonamycin and Lactonamycin Z: Cloning of the Biosynthetic Gene Clusters and Discovery of an Unusual Starter Unit

    PubMed Central

    Zhang, Xiujun; Alemany, Lawrence B.; Fiedler, Hans-Peter; Goodfellow, Michael; Parry, Ronald J.

    2008-01-01

    The antibiotics lactonamycin and lactonamycin Z provide attractive leads for antibacterial drug development. Both antibiotics contain a novel aglycone core called lactonamycinone. To gain insight into lactonamycinone biosynthesis, cloning and precursor incorporation experiments were undertaken. The lactonamycin gene cluster was initially cloned from Streptomyces rishiriensis. Sequencing of ca. 61 kb of S. rishiriensis DNA revealed the presence of 57 open reading frames. These included genes coding for the biosynthesis of l-rhodinose, the sugar found in lactonamycin, and genes similar to those in the tetracenomycin biosynthetic gene cluster. Since lactonamycin production by S. rishiriensis could not be sustained, additional proof for the identity of the S. rishiriensis cluster was obtained by cloning the lactonamycin Z gene cluster from Streptomyces sanglieri. Partial sequencing of the S. sanglieri cluster revealed 15 genes that exhibited a very high degree of similarity to genes within the lactonamycin cluster, as well as an identical organization. Double-crossover disruption of one gene in the S. sanglieri cluster abolished lactonamycin Z production, and production was restored by complementation. These results confirm the identity of the genetic locus cloned from S. sanglieri and indicate that the highly similar locus in S. rishiriensis encodes lactonamycin biosynthetic genes. Precursor incorporation experiments with S. sanglieri revealed that lactonamycinone is biosynthesized in an unusual manner whereby glycine or a glycine derivative serves as a starter unit that is extended by nine acetate units. Analysis of the gene clusters and of the precursor incorporation data suggested a hypothetical scheme for lactonamycinone biosynthesis. PMID:18070976

  12. Genomic characterization and taxonomic position of a rhabdovirus from a hybrid snakehead.

    PubMed

    Zeng, Weiwei; Wang, Qing; Wang, Yingying; Liu, Cun; Liang, Hongru; Fang, Xiang; Wu, Shuqin

    2014-09-01

    A new rhabdovirus, tentatively designated as hybrid snakehead rhabdovirus C1207 (HSHRV-C1207), was first isolated from a moribund hybrid snakehead (Channa maculata×Channa argus) in China. We present the complete genome sequence of HSHRV-C1207 and a comprehensive sequence comparison between HSHRV-C1207 and other rhabdoviruses. Sequence alignment and phylogenetic analysis revealed that HSHRV-C1207 shared the highest degree of homology with Monopterus albus rhabdovirus and Siniperca chuatsi rhabdovirus. All three viruses clustered into a single group that was distinct from the recognized genera in the family Rhabdoviridae. Our analysis suggests that HSHRV-C1207, as well as MARV and SCRV, should be assigned to a new rhabdovirus genus.

  13. Identification of Novel Sequence Types among Staphylococcus haemolyticus Isolated from Variety of Infections in India.

    PubMed

    Panda, Sasmita; Jena, Smrutiti; Sharma, Savitri; Dhawan, Benu; Nath, Gopal; Singh, Durg Vijai

    2016-01-01

    The aim of this study was to determine sequence types of 34 S. haemolyticus strains isolated from a variety of infections between 2013 and 2016 in India by MLST. The MEGA5.2 software was used to align and compare the nucleotide sequences. Advanced cluster analysis was performed to define the clonal complexes. MLST analysis showed 24 new sequence types (ST) among S. haemolyticus isolates, irrespective of sources and place of isolation. The findings of this study allowed an MLST database to be set up on the PubMLST.org website using BIGSdb software; it is available at http://pubmlst.org/shaemolyticus/. The data of this study thus suggest that MLST can be used to study population structure and diversity among S. haemolyticus isolates.

  14. Identification of Novel Sequence Types among Staphylococcus haemolyticus Isolated from Variety of Infections in India

    PubMed Central

    Panda, Sasmita; Jena, Smrutiti; Sharma, Savitri; Dhawan, Benu; Nath, Gopal

    2016-01-01

    The aim of this study was to determine sequence types of 34 S. haemolyticus strains isolated from a variety of infections between 2013 and 2016 in India by MLST. The MEGA5.2 software was used to align and compare the nucleotide sequences. Advanced cluster analysis was performed to define the clonal complexes. MLST analysis showed 24 new sequence types (ST) among S. haemolyticus isolates, irrespective of sources and place of isolation. The findings of this study allowed an MLST database to be set up on the PubMLST.org website using BIGSdb software; it is available at http://pubmlst.org/shaemolyticus/. The data of this study thus suggest that MLST can be used to study population structure and diversity among S. haemolyticus isolates. PMID:27824930

  15. Transcriptome and gene expression analysis during flower blooming in Rosa chinensis 'Pallida'.

    PubMed

    Yan, Huijun; Zhang, Hao; Chen, Min; Jian, Hongying; Baudino, Sylvie; Caissard, Jean-Claude; Bendahmane, Mohammed; Li, Shubin; Zhang, Ting; Zhou, Ningning; Qiu, Xianqin; Wang, Qigang; Tang, Kaixue

    2014-04-25

    Rosa chinensis 'Pallida' (Rosa L.) is one of the most important ancient rose cultivars originating from China. It contributed the 'tea scent' trait to modern roses. However, little information is available on the gene regulatory networks involved in scent biosynthesis and metabolism in Rosa. In this study, the transcriptome of R. chinensis 'Pallida' petals at different developmental stages, from flower buds to senescent flowers, was investigated using Illumina sequencing technology. De novo assembly generated 89,614 clusters with an average length of 428bp. Based on sequence similarity search with known proteins, 62.9% of total clusters were annotated. Out of these annotated transcripts, 25,705 and 37,159 sequences were assigned to gene ontology and clusters of orthologous groups, respectively. The dataset provides information on transcripts putatively associated with known scent metabolic pathways. Digital gene expression (DGE) was obtained using RNA samples from flower bud, open flower and senescent flower stages. Comparative DGE and quantitative real time PCR permitted the identification of five transcripts encoding proteins putatively associated with scent biosynthesis in roses. The study provides a foundation for scent-related gene discovery in roses. Copyright © 2014. Published by Elsevier B.V.

  16. Tales of diversity: Genomic and morphological characteristics of forty-six Arthrobacter phages

    PubMed Central

    Adair, Tamarah L.; Afram, Patricia; Allen, Katherine G.; Archambault, Megan L.; Aziz, Rahat M.; Bagnasco, Filippa G.; Ball, Sarah L.; Barrett, Natalie A.; Benjamin, Robert C.; Blasi, Christopher J.; Borst, Katherine; Braun, Mary A.; Broomell, Haley; Brown, Conner B.; Brynell, Zachary S.; Bue, Ashley B.; Burke, Sydney O.; Casazza, William; Cautela, Julia A.; Chen, Kevin; Chimalakonda, Nitish S.; Chudoff, Dylan; Connor, Jade A.; Cross, Trevor S.; Curtis, Kyra N.; Dahlke, Jessica A.; Deaton, Bethany M.; Degroote, Sarah J.; DeNigris, Danielle M.; DeRuff, Katherine C.; Dolan, Milan; Dunbar, David; Egan, Marisa S.; Evans, Daniel R.; Fahnestock, Abby K.; Farooq, Amal; Finn, Garrett; Fratus, Christopher R.; Gaffney, Bobby L.; Garlena, Rebecca A.; Garrigan, Kelly E.; Gibbon, Bryan C.; Goedde, Michael A.; Guerrero Bustamante, Carlos A.; Harrison, Melinda; Hartwell, Megan C.; Heckman, Emily L.; Huang, Jennifer; Hughes, Lee E.; Hyduchak, Kathryn M.; Jacob, Aswathi E.; Kaku, Machika; Karstens, Allen W.; Kenna, Margaret A.; Khetarpal, Susheel; King, Rodney A.; Kobokovich, Amanda L.; Kolev, Hannah; Konde, Sai A.; Kriese, Elizabeth; Lamey, Morgan E.; Lantz, Carter N.; Lapin, Jonathan S.; Lawson, Temiloluwa O.; Lee, In Young; Lee, Scott M.; Lee-Soety, Julia Y.; Lehmann, Emily M.; London, Shawn C.; Lopez, A. Javier; Lynch, Kelly C.; Mageeney, Catherine M.; Martynyuk, Tetyana; Mathew, Kevin J.; Mavrich, Travis N.; McDaniel, Christopher M.; McDonald, Hannah; McManus, C. Joel; Medrano, Jessica E.; Mele, Francis E.; Menninger, Jennifer E.; Miller, Sierra N.; Minick, Josephine E.; Nabua, Courtney T.; Napoli, Caroline K.; Nkangabwa, Martha; Oates, Elizabeth A.; Ott, Cassandra T.; Pellerino, Sarah K.; Pinamont, William J.; Pirnie, Ross T.; Pizzorno, Marie C.; Plautz, Emilee J.; Pope, Welkin H.; Pruett, Katelyn M.; Rickstrew, Gabbi; Rimple, Patrick A.; Rinehart, Claire A.; Robinson, Kayla M.; Rose, Victoria A.; Russell, Daniel A.; Schick, Amelia M.; Schlossman, Julia; Schneider, Victoria M.; Sells, Chloe A.; Sieker, Jeremy W.; Silva, Morgan P.; Silvi, Marissa M.; Simon, Stephanie E.; Staples, Amanda K.; Steed, Isabelle L.; Stowe, Emily L.; Stueven, Noah A.; Swartz, Porter T.; Sweet, Emma A.; Sweetman, Abigail T.; Tender, Corrina; Terry, Katrina; Thomas, Chrystal; Thomas, Daniel S.; Thompson, Allison R.; Vanderveen, Lorianna; Varma, Rohan; Vaught, Hannah L.; Vo, Quynh D.; Vonberg, Zachary T.; Ware, Vassie C.; Warrad, Yasmene M.; Wathen, Kaitlyn E.; Weinstein, Jonathan L.; Wyper, Jacqueline F.; Yankauskas, Jakob R.; Zhang, Christine

    2017-01-01

    The vast bacteriophage population harbors an immense reservoir of genetic information. Almost 2000 phage genomes have been sequenced from phages infecting hosts in the phylum Actinobacteria, and analysis of these genomes reveals substantial diversity, pervasive mosaicism, and novel mechanisms for phage replication and lysogeny. Here, we describe the isolation and genomic characterization of 46 phages from environmental samples at various geographic locations in the U.S. infecting a single Arthrobacter sp. strain. These phages include representatives of all three virion morphologies, and Jasmine is the first sequenced podovirus of an actinobacterial host. The phages also span considerable sequence diversity, and can be grouped into 10 clusters according to their nucleotide diversity, and two singletons each with no close relatives. However, the clusters/singletons appear to be genomically well separated from each other, and relatively few genes are shared between clusters. Genome size varies from among the smallest of siphoviral phages (15,319 bp) to over 70 kbp, and G+C contents range from 45–68%, compared to 63.4% for the host genome. Although temperate phages are common among other actinobacterial hosts, these Arthrobacter phages are primarily lytic, and only the singleton Galaxy is likely temperate. PMID:28715480

  17. Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment

    PubMed Central

    2013-01-01

    Background: Next Generation Sequencing techniques are producing enormous amounts of biological sequence data and analysis becomes a major computational problem. Currently, most analysis, especially the identification of conserved regions, relies heavily on Multiple Sequence Alignment and its various heuristics such as progressive alignment, whose run time grows with the square of the number and the length of the aligned sequences and requires significant computational resources. In this work, we present a method to efficiently discover regions of high similarity across multiple sequences without performing expensive sequence alignment. The method is based on approximating edit distance between segments of sequences using p-mer frequency counts. Then, efficient high-throughput data stream clustering is used to group highly similar segments into so called quasi-alignments. Quasi-alignments have numerous applications such as identifying species and their taxonomic class from sequences, comparing sequences for similarities, and, as in this paper, discovering conserved regions across related sequences. Results: In this paper, we show that quasi-alignments can be used to discover highly similar segments across multiple sequences from related or different genomes efficiently and accurately. Experiments on a large number of unaligned 16S rRNA sequences obtained from the Greengenes database show that the method is able to identify conserved regions which agree with known hypervariable regions in 16S rRNA. Furthermore, the experiments show that the proposed method scales well for large data sets with a run time that grows only linearly with the number and length of sequences, whereas for existing multiple sequence alignment heuristics the run time grows super-linearly. Conclusion: Quasi-alignment-based algorithms can detect highly similar regions and conserved areas across multiple sequences. Since the run time is linear and the sequences are converted into a compact clustering model, we are able to identify conserved regions fast or even interactively using a standard PC. Our method has many potential applications such as finding characteristic signature sequences for families of organisms and studying conserved and variable regions in, for example, 16S rRNA. PMID:24564200

  18. Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment.

    PubMed

    Nagar, Anurag; Hahsler, Michael

    2013-01-01

    Next Generation Sequencing techniques are producing enormous amounts of biological sequence data and analysis becomes a major computational problem. Currently, most analysis, especially the identification of conserved regions, relies heavily on Multiple Sequence Alignment and its various heuristics such as progressive alignment, whose run time grows with the square of the number and the length of the aligned sequences and requires significant computational resources. In this work, we present a method to efficiently discover regions of high similarity across multiple sequences without performing expensive sequence alignment. The method is based on approximating edit distance between segments of sequences using p-mer frequency counts. Then, efficient high-throughput data stream clustering is used to group highly similar segments into so called quasi-alignments. Quasi-alignments have numerous applications such as identifying species and their taxonomic class from sequences, comparing sequences for similarities, and, as in this paper, discovering conserved regions across related sequences. In this paper, we show that quasi-alignments can be used to discover highly similar segments across multiple sequences from related or different genomes efficiently and accurately. Experiments on a large number of unaligned 16S rRNA sequences obtained from the Greengenes database show that the method is able to identify conserved regions which agree with known hypervariable regions in 16S rRNA. Furthermore, the experiments show that the proposed method scales well for large data sets with a run time that grows only linearly with the number and length of sequences, whereas for existing multiple sequence alignment heuristics the run time grows super-linearly. Quasi-alignment-based algorithms can detect highly similar regions and conserved areas across multiple sequences. 
Since the run time is linear and the sequences are converted into a compact clustering model, we are able to identify conserved regions fast or even interactively using a standard PC. Our method has many potential applications such as finding characteristic signature sequences for families of organisms and studying conserved and variable regions in, for example, 16S rRNA.
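
    The p-mer counting idea in the abstract above can be illustrated with a minimal sketch. The abstract does not state which distance is applied to the count vectors, so the Manhattan distance used here, the function names, and the default p = 3 are all illustrative assumptions and not the authors' implementation.

```python
from collections import Counter


def pmer_counts(segment, p=3):
    """Count all overlapping p-mers (length-p substrings) in a segment."""
    return Counter(segment[i:i + p] for i in range(len(segment) - p + 1))


def pmer_distance(seg_a, seg_b, p=3):
    """Manhattan distance between the p-mer count vectors of two segments.

    Assumption: the metric is chosen for illustration only. It is cheap to
    compute and is zero for identical segments, which is the property a
    stand-in for edit distance needs.
    """
    counts_a = pmer_counts(seg_a, p)
    counts_b = pmer_counts(seg_b, p)
    # Counter returns 0 for missing keys, so the union of keys is safe.
    return sum(abs(counts_a[k] - counts_b[k])
               for k in counts_a.keys() | counts_b.keys())
```

    Segments whose count vectors lie close together would then be candidates for the same quasi-alignment cluster; no pairwise alignment is needed to compute the distance.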

  19. MICCA: a complete and accurate software for taxonomic profiling of metagenomic data.

    PubMed

    Albanese, Davide; Fontana, Paolo; De Filippo, Carlotta; Cavalieri, Duccio; Donati, Claudio

    2015-05-19

    The introduction of high throughput sequencing technologies has triggered an increase in the number of studies in which the microbiota of environmental and human samples is characterized through the sequencing of selected marker genes. While experimental protocols have undergone a process of standardization that makes them accessible to a large community of scientists, standard and robust data analysis pipelines are still lacking. Here we introduce MICCA, a software pipeline for the processing of amplicon metagenomic datasets that efficiently combines quality filtering, clustering of Operational Taxonomic Units (OTUs), taxonomy assignment and phylogenetic tree inference. MICCA provides accurate results reaching a good compromise between modularity and usability. Moreover, we introduce a de-novo clustering algorithm specifically designed for the inference of Operational Taxonomic Units (OTUs). Tests on real and synthetic datasets show that thanks to the optimized reads filtering process and to the new clustering algorithm, MICCA provides estimates of the number of OTUs and of other common ecological indices that are more accurate and robust than currently available pipelines. Analysis of public metagenomic datasets shows that the higher consistency of results improves our understanding of the structure of environmental and human associated microbial communities. MICCA is an open source project.

  20. Quantifying humpback whale song sequences to understand the dynamics of song exchange at the ocean basin scale.

    PubMed

    Garland, Ellen C; Noad, Michael J; Goldizen, Anne W; Lilley, Matthew S; Rekdahl, Melinda L; Garrigue, Claire; Constantine, Rochelle; Daeschler Hauser, Nan; Poole, M Michael; Robbins, Jooke

    2013-01-01

    Humpback whales have a continually evolving vocal sexual display, or "song," that appears to undergo both evolutionary and "revolutionary" change. All males within a population adhere to the current content and arrangement of the song. Populations within an ocean basin share similarities in their songs; this sharing is complex as multiple variations of the song (song types) may be present within a region at any one time. To quantitatively investigate the similarity of song types, songs were compared at both the individual singer and population level using the Levenshtein distance technique and cluster analysis. The highly stereotyped sequences of themes from the songs of 211 individuals from populations within the western and central South Pacific region from 1998 through 2008 were grouped together based on the percentage of song similarity, and compared to qualitatively assigned song types. The analysis produced clusters of highly similar songs that agreed with previous qualitative assignments. Each cluster contained songs from multiple populations and years, confirming the eastward spread of song types and their progressive evolution through the study region. Quantifying song similarity and exchange will assist in understanding broader song dynamics and contribute to the use of vocal displays as population identifiers.
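
    The Levenshtein comparison described above can be sketched as follows. The distance routine is the standard dynamic-programming algorithm; the normalisation into a similarity score (one minus distance divided by the longer sequence length) is an assumption made for illustration, since the abstract only mentions a "percentage of song similarity". Theme labels are treated as opaque sequence items.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b.

    Works on any sequences of comparable items, e.g. lists of theme labels.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Similarity in [0, 1]; the normalisation is an illustrative assumption."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest
```

    A matrix of such pairwise similarities is the kind of input a hierarchical cluster analysis would take.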

  1. MICCA: a complete and accurate software for taxonomic profiling of metagenomic data

    PubMed Central

    Albanese, Davide; Fontana, Paolo; De Filippo, Carlotta; Cavalieri, Duccio; Donati, Claudio

    2015-01-01

    The introduction of high throughput sequencing technologies has triggered an increase in the number of studies in which the microbiota of environmental and human samples is characterized through the sequencing of selected marker genes. While experimental protocols have undergone a process of standardization that makes them accessible to a large community of scientists, standard and robust data analysis pipelines are still lacking. Here we introduce MICCA, a software pipeline for the processing of amplicon metagenomic datasets that efficiently combines quality filtering, clustering of Operational Taxonomic Units (OTUs), taxonomy assignment and phylogenetic tree inference. MICCA provides accurate results reaching a good compromise between modularity and usability. Moreover, we introduce a de-novo clustering algorithm specifically designed for the inference of Operational Taxonomic Units (OTUs). Tests on real and synthetic datasets show that thanks to the optimized reads filtering process and to the new clustering algorithm, MICCA provides estimates of the number of OTUs and of other common ecological indices that are more accurate and robust than currently available pipelines. Analysis of public metagenomic datasets shows that the higher consistency of results improves our understanding of the structure of environmental and human associated microbial communities. MICCA is an open source project. PMID:25988396

  2. Amplified fragment length polymorphism of Streptococcus suis strains correlates with their profile of virulence-associated genes and clinical background.

    PubMed

    Rehm, Thomas; Baums, Christoph G; Strommenger, Birgit; Beyerbach, Martin; Valentin-Weigand, Peter; Goethe, Ralph

    2007-01-01

    Amplified fragment length polymorphism (AFLP) typing was applied to 116 Streptococcus suis isolates with different clinical backgrounds (invasive/pneumonia/carrier/human) and with known profiles of virulence-associated genes (cps1, -2, -7 and -9, as well as mrp, epf and sly). A dendrogram was generated that allowed identification of two clusters (A and C) with different subclusters (A1, A2, C1 and C2) and two heterogeneous groups of strains (B and D). For comparison, three strains from each AFLP subcluster and group were subjected to multilocus sequence typing (MLST) analysis. The closest relationship and lowest diversity were found for patterns clustering within AFLP subcluster A1, which corresponded with sequence type (ST) complex 1. Strains within subcluster A1 were mainly invasive cps1 and mrp+ epf+ (or epf*) sly+ cps2+ strains of porcine or human origin. A new finding of this study was the clustering of invasive mrp* cps9 isolates within subcluster A2. MLST analysis suggested that A2 correlates with a single ST complex (ST87). In contrast to A1 and A2, subclusters C1 and C2 contained mainly pneumonia isolates of genotype cps7 or cps2 and epf- sly-. In conclusion, this study demonstrates that AFLP allows identification of clusters of S. suis strains with clinical relevance.

  3. R/S analysis of reaction time in Neuron Type Test for human activity in civil aviation

    NASA Astrophysics Data System (ADS)

    Zhang, Hong-Yan; Kang, Ming-Cui; Li, Jing-Qiang; Liu, Hai-Tao

    2017-03-01

    Human factors have become the most serious cause of accidents in civil aviation, which motivates the design and analysis of the Neuron Type Test (NTT) system to explore the intrinsic properties and patterns behind the behaviors of professionals and students in civil aviation. In the experiment, the reaction time sequences of normal practitioners, collected from the NTT, approximately follow a log-normal distribution. We apply the χ² test to compute the goodness-of-fit by transforming the time sequence with the Box-Cox transformation to cluster practitioners. The long-term correlation of each individual practitioner's time sequence is represented by the Hurst exponent via Rescaled Range Analysis, also called Range/Standard deviation (R/S) Analysis. The differing Hurst exponents suggest the existence of different collective behaviors and different intrinsic patterns of human factors in civil aviation.
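
    As a rough illustration of R/S analysis, the sketch below estimates a Hurst exponent as the slope of log(R/S) against log(window size). The non-overlapping windowing, the choice of window sizes, and the function names are assumptions made here; the abstract does not describe the authors' exact procedure.

```python
import math


def rescaled_range(chunk):
    """R/S statistic of one window: range of cumulative deviations over std dev."""
    n = len(chunk)
    mean = sum(chunk) / n
    cum, lo, hi = 0.0, 0.0, 0.0
    for x in chunk:
        cum += x - mean                  # cumulative deviation from the mean
        lo, hi = min(lo, cum), max(hi, cum)
    sd = math.sqrt(sum((x - mean) ** 2 for x in chunk) / n)
    return (hi - lo) / sd if sd > 0 else None   # undefined for constant windows


def hurst_exponent(series, window_sizes):
    """Least-squares slope of log(mean R/S) versus log(window size)."""
    xs, ys = [], []
    for n in window_sizes:
        values = []
        for start in range(0, len(series) - n + 1, n):  # non-overlapping windows
            rs = rescaled_range(series[start:start + n])
            if rs is not None:
                values.append(rs)
        if values:
            xs.append(math.log(n))
            ys.append(math.log(sum(values) / len(values)))
    if len(xs) < 2:
        raise ValueError("need at least two usable window sizes")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
```

    An exponent near 0.5 is conventionally read as no long-term correlation, while values above 0.5 indicate persistence; a strongly trending series gives a value close to 1.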

  4. acdc – Automated Contamination Detection and Confidence estimation for single-cell genome data

    DOE PAGES

    Lux, Markus; Kruger, Jan; Rinke, Christian; ...

    2016-12-20

    A major obstacle in single-cell sequencing is sample contamination with foreign DNA. To guarantee clean genome assemblies and to prevent the introduction of contamination into public databases, considerable quality control efforts are put into post-sequencing analysis. Contamination screening generally relies on reference-based methods such as database alignment or marker gene search, which limits the set of detectable contaminants to organisms with closely related reference species. As genomic coverage in the tree of life is highly fragmented, there is an urgent need for a reference-free methodology for contaminant identification in sequence data. We present acdc, a tool specifically developed to aid the quality control process of genomic sequence data. By combining supervised and unsupervised methods, it reliably detects both known and de novo contaminants. First, 16S rRNA gene prediction and the inclusion of ultrafast exact alignment techniques allow sequence classification using existing knowledge from databases. Second, reference-free inspection is enabled by the use of state-of-the-art machine learning techniques that include fast, non-linear dimensionality reduction of oligonucleotide signatures and subsequent clustering algorithms that automatically estimate the number of clusters. The latter also enables the removal of any contaminant, yielding a clean sample. Furthermore, given the data complexity and the ill-posedness of clustering, acdc employs bootstrapping techniques to provide statistically profound confidence values. Tested on a large number of samples from diverse sequencing projects, our software is able to quickly and accurately identify contamination. Results are displayed in an interactive user interface. Acdc can be run from the web as well as a dedicated command line application, which allows easy integration into large sequencing project analysis workflows. Acdc can reliably detect contamination in single-cell genome data. 
In addition to database-driven detection, it complements existing tools by its unsupervised techniques, which allow for the detection of de novo contaminants. Our contribution has the potential to drastically reduce the amount of resources put into these processes, particularly in the context of limited availability of reference species. As single-cell genome data continues to grow rapidly, acdc adds to the toolkit of crucial quality assurance tools.

  5. acdc – Automated Contamination Detection and Confidence estimation for single-cell genome data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lux, Markus; Kruger, Jan; Rinke, Christian

    A major obstacle in single-cell sequencing is sample contamination with foreign DNA. To guarantee clean genome assemblies and to prevent the introduction of contamination into public databases, considerable quality control efforts are put into post-sequencing analysis. Contamination screening generally relies on reference-based methods such as database alignment or marker gene search, which limits the set of detectable contaminants to organisms with closely related reference species. As genomic coverage in the tree of life is highly fragmented, there is an urgent need for a reference-free methodology for contaminant identification in sequence data. We present acdc, a tool specifically developed to aid the quality control process of genomic sequence data. By combining supervised and unsupervised methods, it reliably detects both known and de novo contaminants. First, 16S rRNA gene prediction and the inclusion of ultrafast exact alignment techniques allow sequence classification using existing knowledge from databases. Second, reference-free inspection is enabled by the use of state-of-the-art machine learning techniques that include fast, non-linear dimensionality reduction of oligonucleotide signatures and subsequent clustering algorithms that automatically estimate the number of clusters. The latter also enables the removal of any contaminant, yielding a clean sample. Furthermore, given the data complexity and the ill-posedness of clustering, acdc employs bootstrapping techniques to provide statistically profound confidence values. Tested on a large number of samples from diverse sequencing projects, our software is able to quickly and accurately identify contamination. Results are displayed in an interactive user interface. Acdc can be run from the web as well as a dedicated command line application, which allows easy integration into large sequencing project analysis workflows. Acdc can reliably detect contamination in single-cell genome data. 
In addition to database-driven detection, it complements existing tools by its unsupervised techniques, which allow for the detection of de novo contaminants. Our contribution has the potential to drastically reduce the amount of resources put into these processes, particularly in the context of limited availability of reference species. As single-cell genome data continues to grow rapidly, acdc adds to the toolkit of crucial quality assurance tools.

  6. Spectroscopic Confirmation of a Massive Red-sequence Selected Galaxy Cluster at Z=1.34 in the SpARCS-South Cluster Survey

    NASA Technical Reports Server (NTRS)

    Wilson, Gillian; Demarco, Ricardo; Muzzin, Adam; Yee, H.K.C.; Lacy, Mark; Surace, Jason; Gilbank, David; Blindert, Kris; Hoekstra, Henk; Majumdar, Subhabrata

    2008-01-01

    The Spitzer Adaptation of the Red-sequence Cluster Survey (SpARCS) is a z'-passband imaging survey, consisting of deep (z' approx. 24 AB) observations made from both hemispheres using the CFHT 3.6m and CTIO 4m telescopes. The survey was designed with the primary aim of detecting galaxy clusters at z > 1. In tandem with pre-existing 3.6 micron observations from the Spitzer Space Telescope SWIRE Legacy Survey, SpARCS detects clusters using an infrared adaptation of the two-filter red-sequence cluster technique. The total effective area of the SpARCS cluster survey is 41.9 sq deg. In this paper, we provide an overview of the 13.6 sq deg Southern CTIO/MOSAICII observations. The 28.3 sq deg Northern CFHT/MegaCam observations are summarized in a companion paper by Muzzin et al. (2008a). In this paper, we also report spectroscopic confirmation of SpARCS J003550-431224, a very rich galaxy cluster at z = 1.335, discovered in the ELAIS-S1 field. To date, this is the highest spectroscopically confirmed redshift for a galaxy cluster discovered using the red-sequence technique. Based on nine confirmed members, SpARCS J003550-431224 has a preliminary velocity dispersion of 1050+/-230 km/s. With its proven capability for efficient cluster detection, SpARCS is a demonstration that we have entered an era of large, homogeneously-selected z > 1 cluster surveys.

  7. Using RSAT oligo-analysis and dyad-analysis tools to discover regulatory signals in nucleic sequences.

    PubMed

    Defrance, Matthieu; Janky, Rekin's; Sand, Olivier; van Helden, Jacques

    2008-01-01

    This protocol explains how to discover functional signals in genomic sequences by detecting over- or under-represented oligonucleotides (words) or spaced pairs thereof (dyads) with the Regulatory Sequence Analysis Tools (http://rsat.ulb.ac.be/rsat/). Two typical applications are presented: (i) predicting transcription factor-binding motifs in promoters of coregulated genes and (ii) discovering phylogenetic footprints in promoters of orthologous genes. The steps of this protocol include purging genomic sequences to discard redundant fragments, discovering over-represented patterns and assembling them to obtain degenerate motifs, scanning sequences and drawing feature maps. The main strength of the method is its statistical ground: the binomial significance provides an efficient control on the rate of false positives. In contrast with optimization-based pattern discovery algorithms, the method supports the detection of under- as well as over-represented motifs. Computation times vary from seconds (gene clusters) to minutes (whole genomes). The execution of the whole protocol should take approximately 1 h.
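
    The binomial significance mentioned above can be illustrated with a small sketch. This is not the RSAT implementation: the counting of overlapping occurrences, the way the expected word probability is supplied by the caller, and the function names are all assumptions made for illustration, and real analyses typically rely on calibrated background models.

```python
import math


def count_word(sequences, word):
    """Return (occurrences, positions scanned) for a word, overlaps included."""
    k = len(word)
    occurrences = positions = 0
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            positions += 1
            occurrences += seq[i:i + k] == word
    return occurrences, positions


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of seeing k or more hits.

    Exact summation; intended for small illustrative inputs only.
    """
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))
```

    For example, under the simplifying assumption of equiprobable, independent nucleotides, a trinucleotide has prior probability (1/4)^3, and `binomial_tail(*count_word(seqs, "ACG"), 0.25 ** 3)` gives the probability of observing at least that many occurrences by chance. A small value flags the word as over-represented.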

  8. Nucleotide sequence analysis of a DNA region involved in capsular polysaccharide biosynthesis reveals the molecular basis of the nontypeability of two Actinobacillus pleuropneumoniae isolates.

    PubMed

    Ito, Hiroya; Ogawa, Torata; Fukamizu, Dai; Morinaga, Yuiko; Kusumoto, Masahiro

    2016-11-01

    The aim of our study was to reveal the molecular basis of the serologic nontypeability of 2 Actinobacillus pleuropneumoniae field isolates. Nine field strains of A. pleuropneumoniae, the causative agent of porcine pleuropneumonia, were isolated from pigs raised on the same farm and sent to our diagnostic laboratory for serotyping. Seven of the 9 strains were identified as serovar 15 strains by immunodiffusion tests. However, 2 strains, designated FH24-2 and FH24-5, could not be serotyped with antiserum prepared against serovars 1-15. Strain FH24-5 showed positive results in 2 serovar 15-specific PCR tests, whereas strain FH24-2 was only positive in 1 of the 2 PCR tests. The nucleotide sequence analysis of gene clusters involved in capsular polysaccharide biosynthesis of the 2 nontypeable strains revealed that both had been rendered nontypeable by the action of ISApl1, a transposable element of A. pleuropneumoniae belonging to the IS30 family. The results showed that ISApl1 of A. pleuropneumoniae can interfere with both the serologic and molecular typing methods, and that nucleotide sequence analysis across the capsular gene clusters is the best means of determining the cause of serologic nontypeability in A. pleuropneumoniae. © 2016 The Author(s).

  9. A combination of PhP typing and β-d-glucuronidase gene sequence variation analysis for differentiation of Escherichia coli from humans and animals.

    PubMed

    Masters, N; Christie, M; Katouli, M; Stratton, H

    2015-06-01

    We investigated the usefulness of the β-d-glucuronidase gene variance in Escherichia coli as a microbial source tracking tool using a novel algorithm for comparison of sequences from a prescreened set of host-specific isolates using a high-resolution PhP typing method. A total of 65 common biochemical phenotypes belonging to 318 E. coli strains isolated from humans and domestic and wild animals were analysed for nucleotide variations at 10 loci along a 518 bp fragment of the 1812 bp β-d-glucuronidase gene. Neighbour-joining analysis of loci variations revealed that 86 (76.8%) human isolates and 91.2% of animal isolates were correctly identified. Pairwise hierarchical clustering improved assignment, with 92 (82.1%) human and 204 (99%) animal strains assigned to their respective clusters. Our data show that initial typing of isolates and selection of common types from different hosts prior to analysis of the β-d-glucuronidase gene sequence improves source identification. We also concluded that numerical profiling of the nucleotide variations can be used as a valuable approach to differentiate human from animal E. coli. This study signifies the usefulness of the β-d-glucuronidase gene as a marker for differentiating human faecal pollution from animal sources.

  10. Phylogenetic and Pathotypic Characterization of Newcastle Disease Viruses Circulating in West Africa and Efficacy of a Current Vaccine

    PubMed Central

    Samuel, Arthur; Nayak, Baibaswata; Paldurai, Anandan; Xiao, Sa; Aplogan, Gilbert L.; Awoume, Kodzo A.; Webby, Richard J.; Ducatez, Mariette F.; Collins, Peter L.

    2013-01-01

    Newcastle disease (ND) is a deadly avian disease worldwide. In Africa, ND is enzootic and causes large economic losses, but little is known about the Newcastle disease virus (NDV) strains circulating in African countries. In this study, 27 NDV isolates collected from apparently healthy chickens in live-bird markets of the West African countries Benin and Togo in 2009 were characterized. All isolates had polybasic fusion (F)-protein cleavage sites and were shown to be highly virulent in standard pathogenicity assays. Infection of 2-week-old chickens with two of the isolates resulted in 100% mortality within 4 days. Phylogenetic analysis of the 27 isolates based on a partial F-protein gene sequence identified three clusters: one containing all the isolates from Togo and one from Benin (cluster 2), one containing most isolates from Benin (cluster 3), and an outlier isolate from Benin (cluster 1). All the three clusters are related to genotype VII strains of NDV. In addition, the cluster of viruses from Togo contained a recently identified 6-nucleotide insert between the hemagglutinin-neuraminidase (HN) and large polymerase (L) genes in a complete genome of an NDV isolate from this geographical region. Multiple strains that include this novel element suggest local emergence of a new genome length class. These results reveal genetic diversity within and among local NDV populations in Africa. Sequence analysis showed that the F and HN proteins of six West African isolates share 83.2 to 86.6% and 86.5 to 87.9% identities, respectively, with vaccine strain LaSota, indicative of considerable diversity. A vaccine efficacy study showed that the LaSota vaccine protected birds from morbidity and mortality but did not prevent shedding of West African challenge viruses. PMID:23254128

  11. Not-so-simple stellar populations in the intermediate-age Large Magellanic Cloud star clusters NGC 1831 and NGC 1868

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Chengyuan; De Grijs, Richard; Deng, Licai

    2014-04-01

    Using a combination of high-resolution Hubble Space Telescope/Wide-Field and Planetary Camera-2 observations, we explore the physical properties of the stellar populations in two intermediate-age star clusters, NGC 1831 and NGC 1868, in the Large Magellanic Cloud based on their color-magnitude diagrams. We show that both clusters exhibit extended main-sequence turn offs. To explain the observations, we consider variations in helium abundance, binarity, age dispersions, and the fast rotation of the clusters' member stars. The observed narrow main sequence excludes significant variations in helium abundance in both clusters. We first establish the clusters' main-sequence binary fractions using the bulk of the clusters' main-sequence stellar populations ≳ 1 mag below their turn-offs. The extent of the turn-off regions in color-magnitude space, corrected for the effects of binarity, implies that age spreads of order 300 Myr may be inferred for both clusters if the stellar distributions in color-magnitude space were entirely due to the presence of multiple populations characterized by an age range. Invoking rapid rotation of the population of cluster members characterized by a single age also allows us to match the observed data in detail. However, when taking into account the extent of the red clump in color-magnitude space, we encounter an apparent conflict for NGC 1831 between the age dispersion derived from that based on the extent of the main-sequence turn off and that implied by the compact red clump. We therefore conclude that, for this cluster, variations in stellar rotation rate are preferred over an age dispersion. For NGC 1868, both models perform equally well.

  12. Genome-wide SNP discovery and population structure analysis in pepper (Capsicum annuum) using genotyping by sequencing.

    PubMed

    Taranto, F; D'Agostino, N; Greco, B; Cardi, T; Tripodi, P

    2016-11-21

    Knowledge on population structure and genetic diversity in vegetable crops is essential for association mapping studies and genomic selection. Genotyping by sequencing (GBS) represents an innovative method for large scale SNP detection and genotyping of genetic resources. Herein we used the GBS approach for the genome-wide identification of SNPs in a collection of Capsicum spp. accessions and for the assessment of the level of genetic diversity in a subset of 222 cultivated pepper (Capsicum annuum) genotypes. GBS analysis generated a total of 7,568,894 master tags, of which 43.4% uniquely aligned to the reference genome CM334. A total of 108,591 SNP markers were identified, of which 105,184 were in C. annuum accessions. In order to explore the genetic diversity of C. annuum and to select a minimal core set representing most of the total genetic variation with minimum redundancy, a subset of 222 C. annuum accessions were analysed using 32,950 high quality SNPs. Based on Bayesian and Hierarchical clustering it was possible to divide the collection into three clusters. Cluster I had the majority of varieties and landraces mainly from Southern and Northern Italy, and from Eastern Europe, whereas clusters II and III comprised accessions of different geographical origins. Considering the genome-wide genetic variation among the accessions included in cluster I, a second round of Bayesian (K = 3) and Hierarchical (K = 2) clustering was performed. These analyses showed that genotypes were grouped not only based on geographical origin, but also on fruit-related features. GBS data has proven useful to assess the genetic diversity in a collection of C. annuum accessions. The high number of SNP markers, uniformly distributed on the 12 chromosomes, allowed the accessions to be distinguished according to geographical origin and fruit-related features. 
SNP markers and information on population structure developed in this study will undoubtedly support genome-wide association mapping studies and marker-assisted selection programs.

  13. Analysis of xylem formation in pine by cDNA sequencing

    NASA Technical Reports Server (NTRS)

    Allona, I.; Quinn, M.; Shoop, E.; Swope, K.; St Cyr, S.; Carlis, J.; Riedl, J.; Retzel, E.; Campbell, M. M.; Sederoff, R.

    1998-01-01

    Secondary xylem (wood) formation is likely to involve some genes expressed rarely or not at all in herbaceous plants. Moreover, environmental and developmental stimuli influence secondary xylem differentiation, producing morphological and chemical changes in wood. To increase our understanding of xylem formation, and to provide material for comparative analysis of gymnosperm and angiosperm sequences, ESTs were obtained from immature xylem of loblolly pine (Pinus taeda L.). A total of 1,097 single-pass sequences were obtained from 5' ends of cDNAs made from gravistimulated tissue from bent trees. Cluster analysis detected 107 groups of similar sequences, ranging in size from 2 to 20 sequences. A total of 361 sequences fell into these groups, whereas 736 sequences were unique. About 55% of the pine EST sequences show similarity to previously described sequences in public databases. About 10% of the recognized genes encode factors involved in cell wall formation. Sequences similar to cell wall proteins, most known lignin biosynthetic enzymes, and several enzymes of carbohydrate metabolism were found. A number of putative regulatory proteins also are represented. Expression patterns of several of these genes were studied in various tissues and organs of pine. Sequencing novel genes expressed during xylem formation will provide a powerful means of identifying mechanisms controlling this important differentiation pathway.

  14. IMG-ABC: An Atlas of Biosynthetic Gene Clusters to Fuel the Discovery of Novel Secondary Metabolites

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, I-Min; Chu, Ken; Ratner, Anna

    2014-10-28

    In the discovery of secondary metabolites (SMs), large-scale analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of relevant computational resources. We present IMG-ABC (https://img.jgi.doe.gov/abc/) -- An Atlas of Biosynthetic gene Clusters within the Integrated Microbial Genomes (IMG) system. IMG-ABC is a rich repository of both validated and predicted biosynthetic clusters (BCs) in cultured isolates, single-cells and metagenomes linked with the SM chemicals they produce and enhanced with focused analysis tools within IMG. The underlying scalable framework enables traversal of phylogenetic dark matter and chemical structure space -- serving as a doorway to a new era in the discovery of novel molecules.

  15. Evaluation of SNP Data from the Malus Infinium Array Identifies Challenges for Genetic Analysis of Complex Genomes of Polyploid Origin.

    PubMed

    Troggio, Michela; Surbanovski, Nada; Bianco, Luca; Moretto, Marco; Giongo, Lara; Banchi, Elisa; Viola, Roberto; Fernández, Felicdad Fernández; Costa, Fabrizio; Velasco, Riccardo; Cestaro, Alessandro; Sargent, Daniel James

    2013-01-01

    High throughput arrays for the simultaneous genotyping of thousands of single-nucleotide polymorphisms (SNPs) have made the rapid genetic characterisation of plant genomes and the development of saturated linkage maps a realistic prospect for many plant species of agronomic importance. However, the correct calling of SNP genotypes in divergent polyploid genomes using array technology can be problematic due to paralogy, and to divergence in probe sequences causing changes in probe binding efficiencies. An Illumina Infinium II whole-genome genotyping array was recently developed for the cultivated apple and used to develop a molecular linkage map for an apple rootstock progeny (M432), but a large proportion of segregating SNPs were not mapped in the progeny, due to unexpected genotype clustering patterns. To investigate the causes of this unexpected clustering we performed BLAST analysis of all probe sequences against the 'Golden Delicious' genome sequence and discovered evidence for paralogous annealing sites and probe sequence divergence for a high proportion of probes contained on the array. Following visual re-evaluation of the genotyping data generated for 8,788 SNPs for the M432 progeny using the array, we manually re-scored genotypes at 818 loci and mapped a further 797 markers to the M432 linkage map. The newly mapped markers included the majority of those that could not be mapped previously, as well as loci that were previously scored as monomorphic, but which segregated due to divergence leading to heterozygosity in probe annealing sites. An evaluation of the 8,788 probes in a diverse collection of Malus germplasm showed that more than half the probes returned genotype clustering patterns that were difficult or impossible to interpret reliably, highlighting implications for the use of the array in genome-wide association studies.

  16. An RNA-Seq Transcriptome Analysis of Orthophosphate-Deficient White Lupin Reveals Novel Insights into Phosphorus Acclimation in Plants

    PubMed Central

    O’Rourke, Jamie A.; Yang, S. Samuel; Miller, Susan S.; Bucciarelli, Bruna; Liu, Junqi; Rydeen, Ariel; Bozsoki, Zoltan; Uhde-Stone, Claudia; Tu, Zheng Jin; Allan, Deborah; Gronwald, John W.; Vance, Carroll P.

    2013-01-01

    Phosphorus, in its orthophosphate form (Pi), is one of the most limiting macronutrients in soils for plant growth and development. However, the whole-genome molecular mechanisms contributing to plant acclimation to Pi deficiency remain largely unknown. White lupin (Lupinus albus) has evolved unique adaptations for growth in Pi-deficient soils, including the development of cluster roots to increase root surface area. In this study, we utilized RNA-Seq technology to assess global gene expression in white lupin cluster roots, normal roots, and leaves in response to Pi supply. We de novo assembled 277,224,180 Illumina reads from 12 complementary DNA libraries to build what is to our knowledge the first white lupin gene index (LAGI 1.0). This index contains 125,821 unique sequences with an average length of 1,155 bp. Of these sequences, 50,734 were transcriptionally active (reads per kilobase per million reads ≥ 3), representing approximately 7.8% of the white lupin genome, using the predicted genome size of Lupinus angustifolius as a reference. We identified a total of 2,128 sequences differentially expressed in response to Pi deficiency with a 2-fold or greater change and P ≤ 0.05. Twelve sequences were consistently differentially expressed due to Pi deficiency stress in three species, Arabidopsis (Arabidopsis thaliana), potato (Solanum tuberosum), and white lupin, making them ideal candidates to monitor the Pi status of plants. Additionally, classic physiological experiments were coupled with RNA-Seq data to examine the role of cytokinin and gibberellic acid in Pi deficiency-induced cluster root development. This global gene expression analysis provides new insights into the biochemical and molecular mechanisms involved in the acclimation to Pi deficiency. PMID:23197803

  17. An RNA-Seq transcriptome analysis of orthophosphate-deficient white lupin reveals novel insights into phosphorus acclimation in plants.

    PubMed

    O'Rourke, Jamie A; Yang, S Samuel; Miller, Susan S; Bucciarelli, Bruna; Liu, Junqi; Rydeen, Ariel; Bozsoki, Zoltan; Uhde-Stone, Claudia; Tu, Zheng Jin; Allan, Deborah; Gronwald, John W; Vance, Carroll P

    2013-02-01

    Phosphorus, in its orthophosphate form (P(i)), is one of the most limiting macronutrients in soils for plant growth and development. However, the whole-genome molecular mechanisms contributing to plant acclimation to P(i) deficiency remain largely unknown. White lupin (Lupinus albus) has evolved unique adaptations for growth in P(i)-deficient soils, including the development of cluster roots to increase root surface area. In this study, we utilized RNA-Seq technology to assess global gene expression in white lupin cluster roots, normal roots, and leaves in response to P(i) supply. We de novo assembled 277,224,180 Illumina reads from 12 complementary DNA libraries to build what is to our knowledge the first white lupin gene index (LAGI 1.0). This index contains 125,821 unique sequences with an average length of 1,155 bp. Of these sequences, 50,734 were transcriptionally active (reads per kilobase per million reads ≥ 3), representing approximately 7.8% of the white lupin genome, using the predicted genome size of Lupinus angustifolius as a reference. We identified a total of 2,128 sequences differentially expressed in response to P(i) deficiency with a 2-fold or greater change and P ≤ 0.05. Twelve sequences were consistently differentially expressed due to P(i) deficiency stress in three species, Arabidopsis (Arabidopsis thaliana), potato (Solanum tuberosum), and white lupin, making them ideal candidates to monitor the P(i) status of plants. Additionally, classic physiological experiments were coupled with RNA-Seq data to examine the role of cytokinin and gibberellic acid in P(i) deficiency-induced cluster root development. This global gene expression analysis provides new insights into the biochemical and molecular mechanisms involved in the acclimation to P(i) deficiency.
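
    The abstract above counts a sequence as transcriptionally active when its reads per kilobase per million reads (RPKM) is at least 3. A minimal sketch of that calculation follows, using the standard RPKM definition. The function names and the example read counts are illustrative assumptions, not values or code from the study.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of sequence per million mapped reads."""
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    # reads / (length in kb) / (total reads in millions)
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def is_transcriptionally_active(read_count, length_bp, total_mapped_reads,
                                threshold=3.0):
    """Apply the RPKM >= 3 cutoff quoted in the abstract."""
    return rpkm(read_count, length_bp, total_mapped_reads) >= threshold


if __name__ == "__main__":
    # Hypothetical numbers: a 1,155 bp sequence in a 10-million-read library.
    for reads in (10, 50):
        value = rpkm(reads, 1155, 10_000_000)
        print(reads, round(value, 2),
              is_transcriptionally_active(reads, 1155, 10_000_000))
```

    Because RPKM normalizes by both sequence length and library size, the same raw read count can fall on either side of the cutoff depending on the library.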

  18. PCR detection of uncultured rumen bacteria.

    PubMed

    Rosero, Jaime A; Strosová, Lenka; Mrázek, Jakub; Fliegerová, Kateřina; Kopečný, Jan

    2012-07-01

    16S rRNA sequences of ruminal uncultured bacterial clones from public databases were phylogenetically examined. The sequences were found to form two unique clusters not affiliated with any known bacterial species: cluster of unidentified sequences of free floating rumen fluid uncultured bacteria (FUB) and cluster of unidentified sequences of bacteria associated with rumen epithelium (AUB). A set of PCR primers targeting 16S rRNA of ruminal free uncultured bacteria and rumen epithelium adhering uncultured bacteria was designed based on these sequences. FUB primers were used for relative quantification of uncultured bacteria in ovine rumen samples. The effort to increase the population size of FUB group has been successful in sulfate reducing broth and culture media supplied with cellulose.

  19. Systematic sequencing of mRNA from the Antarctic krill (Euphausia superba) and first tissue specific transcriptional signature

    PubMed Central

    De Pittà, Cristiano; Bertolucci, Cristiano; Mazzotta, Gabriella M; Bernante, Filippo; Rizzo, Giorgia; De Nardi, Barbara; Pallavicini, Alberto; Lanfranchi, Gerolamo; Costa, Rodolfo

    2008-01-01

    Background Little is known about the genome sequences of Euphausiacea (krill) although these crustaceans are abundant components of the pelagic ecosystems in all oceans and are used in aquaculture and the pharmaceutical industry. This study reports the results of an expressed sequence tag (EST) sequencing project from different tissues of Euphausia superba (the Antarctic krill). Results We have constructed and sequenced five cDNA libraries from different Antarctic krill tissues: head, abdomen, thoracopods and photophores. We have identified 1,770 high-quality ESTs which were assembled into 216 overlapping clusters and 801 singletons resulting in a total of 1,017 non-redundant sequences. Quantitative RT-PCR analysis was performed to quantify and validate the expression levels of ten genes presenting different EST counts in krill tissues. In addition, bioinformatic screening of the non-redundant E. superba sequences identified 69 microsatellite containing ESTs. Clusters, consensuses and related similarity and gene ontology searches were organized in a dedicated E. superba database. Conclusion We defined the first tissue transcriptional signatures of E. superba based on functional categorization among the examined tissues. The analyses of annotated transcripts showed a higher similarity with genes from insects than with genes from Malacostraca, possibly as an effect of the limited number of Malacostraca sequences in the public databases. Our catalogue provides for the first time a genomic tool to investigate the biology of the Antarctic krill. PMID:18226200

  20. A novel adenovirus of Western lowland gorillas (Gorilla gorilla gorilla)

    PubMed Central

    2010-01-01

    Adenoviruses (AdV) broadly infect vertebrate hosts including a variety of primates. We identified a novel AdV in the feces of captive gorillas by isolation in cell culture, electron microscopy and PCR. From the supernatants of infected cultures we amplified DNA polymerase (DPOL), preterminal protein (pTP) and hexon gene sequences with generic pan primate AdV PCR assays. The sequences in-between were amplified by long-distance PCRs of 2 - 10 kb length, resulting in a final sequence of 15.6 kb. Phylogenetic analysis placed the novel gorilla AdV into a cluster of primate AdVs belonging to the species Human adenovirus B (HAdV-B). Depending on the analyzed gene, its position within the cluster was variable. To further elucidate its origin, feces samples of wild gorillas were analyzed. AdV hexon sequences were detected which are indicative for three distinct and novel gorilla HAdV-B viruses, among them a virus nearly identical to the novel AdV isolated from captive gorillas. This shows that the discovered virus is a member of a group of HAdV-B viruses that naturally infect gorillas. The mixed phylogenetic clusters of gorilla, chimpanzee, bonobo and human AdVs within the HAdV-B species indicate that host switches may have been a component of the evolution of human and non-human primate HAdV-B viruses. PMID:21054831

  1. A novel adenovirus of Western lowland gorillas (Gorilla gorilla gorilla).

    PubMed

    Wevers, Diana; Leendertz, Fabian H; Scuda, Nelly; Boesch, Christophe; Robbins, Martha M; Head, Josephine; Ludwig, Carsten; Kühn, Joachim; Ehlers, Bernhard

    2010-11-05

    Adenoviruses (AdV) broadly infect vertebrate hosts including a variety of primates. We identified a novel AdV in the feces of captive gorillas by isolation in cell culture, electron microscopy and PCR. From the supernatants of infected cultures we amplified DNA polymerase (DPOL), preterminal protein (pTP) and hexon gene sequences with generic pan primate AdV PCR assays. The sequences in-between were amplified by long-distance PCRs of 2-10 kb length, resulting in a final sequence of 15.6 kb. Phylogenetic analysis placed the novel gorilla AdV into a cluster of primate AdVs belonging to the species Human adenovirus B (HAdV-B). Depending on the analyzed gene, its position within the cluster was variable. To further elucidate its origin, feces samples of wild gorillas were analyzed. AdV hexon sequences were detected which are indicative for three distinct and novel gorilla HAdV-B viruses, among them a virus nearly identical to the novel AdV isolated from captive gorillas. This shows that the discovered virus is a member of a group of HAdV-B viruses that naturally infect gorillas. The mixed phylogenetic clusters of gorilla, chimpanzee, bonobo and human AdVs within the HAdV-B species indicate that host switches may have been a component of the evolution of human and non-human primate HAdV-B viruses.

  2. Does typing of Chlamydia trachomatis using housekeeping multilocus sequence typing reveal different sexual networks among heterosexuals and men who have sex with men?

    PubMed

    Versteeg, Bart; Bruisten, Sylvia M; van der Ende, Arie; Pannekoek, Yvonne

    2016-04-18

    Chlamydia trachomatis infections remain the most common bacterial sexually transmitted infection worldwide. To gain more insight into the epidemiology and transmission of C. trachomatis, several schemes of multilocus sequence typing (MLST) have been developed. We investigated the clustering of C. trachomatis strains derived from men who have sex with men (MSM) and heterosexuals using the MLST scheme based on 7 housekeeping genes (MLST-7) adapted for clinical specimens and a high-resolution MLST scheme based on 6 polymorphic genes, including ompA (hr-MLST-6). Specimens from 100 C. trachomatis infected MSM and 100 heterosexual women were randomly selected from previous studies and sequenced. We adapted the MLST-7 scheme to a nested assay to be suitable for direct typing of clinical specimens. All selected specimens were typed using both the adapted MLST-7 scheme and the hr-MLST-6 scheme. Clustering of C. trachomatis strains derived from MSM and heterosexuals was assessed using minimum spanning tree analysis. Sufficient chlamydial DNA was present in 188 of the 200 (94%) selected samples. Using the adapted MLST-7 scheme, full MLST profiles were obtained for 187 of 188 tested specimens resulting in a high success rate of 99.5%. Of these 187 specimens, 91 (48.7%) were from MSM and 96 (51.3%) from heterosexuals. We detected 21 sequence types (STs) using the adapted MLST-7 and 79 STs using the hr-MLST-6 scheme. Minimum spanning tree analysis was used to examine the clustering of MLST-7 data, which showed no reflection of separate transmission in MSM and heterosexual hosts. Moreover, typing using the hr-MLST-6 scheme identified genetically related clusters within each of the clusters that were identified by using the MLST-7 scheme. No distinct transmission of C. trachomatis could be observed in MSM and heterosexuals using the adapted MLST-7 scheme in contrast to using the hr-MLST-6. In addition, we compared clustering of both MLST schemes and demonstrated that typing using the hr-MLST-6 scheme is able to identify genetically related clusters of C. trachomatis strains within each of the clusters that were identified by using the MLST-7 scheme.

  3. [Molecular epidemiology and transmission of HIV-1 infection in Zhejiang province, 2015].

    PubMed

    Yang, J Z; Chen, W J; Zhang, W J; He, L; Zhang, J F; Pan, X H

    2017-11-10

    Objective: To understand the distribution of HIV-1 subtype diversity and its transmission characteristics in Zhejiang province. Methods: A total of 302 newly diagnosed HIV-1 positive patients were selected through stratified random sampling in Zhejiang in 2015. HIV-1 pol genes were sequenced successfully with reverse transcription PCR/nested PCR and phylogenetic analysis was conducted for 276 patients. Then a molecular epidemiologic study was performed combined with field epidemiological investigation. Results: Of 276 sequence samples analyzed, 122 CRF07_BC strains (44.2%), 103 CRF01_AE strains (37.3%), 17 CRF08_BC strains (6.1%), 9 B strains (3.2%), 6 CRF55_01B strains (2.2%), 5 C strains (1.8%), 1 CRF59_01B strain (0.4%), 1 CRF67_01B strain (0.4%), 1 A1 strain (0.4%), and 11 URFs strains (4.0%) were identified. Phylogenetic analysis revealed 16 clusters with only 15.1% (34/225) sequences involved among CRF07_BC and CRF01_AE strains. The clustered cases in MSM were higher than that in populations with other transmission routes. And clusters existed between the populations with different transmission routes. Conclusion: The major strains of HIV-1 in Zhejiang are CRF07_BC and CRF01_AE. The HIV subtypes showed more complexity in Zhejiang. It is necessary to strengthen the surveillance for HIV subtypes, carry out classified management and conduct effective prevention and control in the population at high risk.

  4. Routes of transmission during a nosocomial influenza A(H3N2) outbreak among geriatric patients and healthcare workers.

    PubMed

    Eibach, D; Casalegno, J-S; Bouscambert, M; Bénet, T; Regis, C; Comte, B; Kim, B-A; Vanhems, P; Lina, B

    2014-03-01

    Influenza presents a life-threatening infection for hospitalized geriatric patients, who might be nosocomially infected via healthcare workers (HCWs), other patients or visitors. In the 2011/2012 influenza season an influenza A(H3N2) outbreak occurred in the geriatric department at the Hôpital Edouard Herriot, Lyon. The aim of this study was to clarify the transmission chain for this influenza A(H3N2) outbreak by sequence analysis and to identify preventive measures. Laboratory testing of patients with influenza-like illness in the acute care geriatric department revealed 22 cases of influenza between 19th February and 15th March 2012. Incidences for patients and HCWs were calculated and possible epidemiological links were analysed using a questionnaire. Neuraminidase and haemagglutinin genes of culture-positive samples and community influenza samples were sequenced and clustered to detect patients with identical viral strains. Sixteen patients and six HCWs were affected, resulting in an attack rate of 24% and 11% respectively. Six nosocomial infections were recorded. The sequence analysis confirmed three independent influenza clusters on three different sections of the geriatric ward. For at least two clusters, an HCW source was determined. Epidemiological and microbiological results confirm influenza transmission from HCWs to patients. A higher vaccination rate, isolation measures and better hand hygiene are recommended in order to prevent outbreaks in future influenza seasons. Copyright © 2014 The Healthcare Infection Society. Published by Elsevier Ltd. All rights reserved.

  5. Sequence and genetic organization of a Zymomonas mobilis gene cluster that encodes several enzymes of glucose metabolism

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barnell, W.O.; Kyung Cheol Yi; Conway, T.

    1990-12-01

    The Zymomonas mobilis genes that encode glucose-6-phosphate dehydrogenase (zwf), 6-phosphogluconate dehydratase (edd), and glucokinase (glk) were cloned independently by genetic complementation of specific defects in Escherichia coli metabolism. The identity of these cloned genes was confirmed by various biochemical means. Nucleotide sequence analysis established that these three genes are clustered on the genome and revealed an additional open reading frame in this region that has significant amino acid identity to the E. coli xylose-proton symporter and the human glucose transporter. On the basis of this evidence and structural analysis of the deduced primary amino acid sequence, this gene is believed to encode the Z. mobilis glucose-facilitated diffusion protein, glf. The four genes in the 6-kb cluster are organized in the order glf, zwf, edd, glk. The glf and zwf genes are separated by 146 bp. The zwf and edd genes overlap by 8 bp, and their expression may be translationally coupled. The edd and glk genes are separated by 203 bp. The glk gene is followed by tandem transcriptional terminators. The four genes appear to be organized in an operon. Such an arrangement of the genes that govern glucose uptake and the first three steps of the Entner-Doudoroff glycolytic pathway provides the organism with a mechanism for carefully regulating the levels of the enzymes that control carbon flux into the pathway.

  6. Comparative genomic sequence analysis of strawberry and other rosids reveals significant microsynteny

    PubMed Central

    2010-01-01

    Background Fragaria belongs to the Rosaceae, an economically important family that includes a number of important fruit producing genera such as Malus and Prunus. Using genomic sequences from 50 Fragaria fosmids, we have examined the microsynteny between Fragaria and other plant models. Results In more than half of the strawberry fosmids, we found syntenic regions that are conserved in Populus, Vitis, Medicago and/or Arabidopsis with Populus containing the greatest number of syntenic regions with Fragaria. The longest syntenic region was between LG VIII of the poplar genome and the strawberry fosmid 72E18, where seven out of twelve predicted genes were collinear. We also observed an unexpectedly high level of conserved synteny between Fragaria (rosid I) and Vitis (basal rosid). One of the strawberry fosmids, 34E24, contained a cluster of R gene analogs (RGAs) with NBS and LRR domains. We detected clusters of RGAs with high sequence similarity to those in 34E24 in all the genomes compared. In the phylogenetic tree we have generated, all the NBS-LRR genes grouped together with Arabidopsis CNL-A type NBS-LRR genes. The Fragaria RGA grouped together with those of Vitis and Populus in the phylogenetic tree. Conclusions Our analysis shows considerable microsynteny between Fragaria and other plant genomes such as Populus, Medicago, Vitis, and Arabidopsis to a lesser degree. We also detected a cluster of NBS-LRR type genes that are conserved in all the genomes compared. PMID:20565715

  7. Extraordinary proliferation of microorganisms in aposymbiotic pea aphids, Acyrthosiphon pisum.

    PubMed

    Nakabachi, Atsushi; Ishikawa, Hajime; Kudo, Toshiaki

    2003-03-01

    Aposymbiotic pea aphids, which were deprived of their intracellular symbiotic bacterium, Buchnera, exhibit growth retardation and no fecundity. High performance liquid chromatographic (HPLC) analysis revealed that these aposymbiotic aphids, when reared on broad bean plants, accumulated a large amount of histamine. To assess the possibility of extraordinary proliferation of microorganisms other than Buchnera, we enumerated eubacteria and fungi in aphids using the real-time quantitative PCR method that targets genes encoding small-subunit rRNAs. The result showed that these microorganisms were extremely abundant in the aposymbiotic aphids reared on plants. Microbial communities in aposymbiotic aphids were further profiled by phylogenetic analysis of small-subunit rDNAs. Of 172 nonchimeric sequences of fungal 18S rDNAs, 138 (80.2%) belonged to the phylum Ascomycota. Among them, 21 clustered within a monophyletic group consisting of insect-pathogenic fungi and yeast-like symbionts of homopteran insects. Thirty-one (18.0%), two (1.2%), and one (0.6%) clones were clustered within the Basidiomycota, Zygomycota, and Oomycota, respectively. Of 167 nonchimeric sequences of eubacterial 16S rDNAs, 84 (50.3%) belonged to the gamma-subdivision of Proteobacteria to which most primary endosymbionts of insects and prolific histamine producers belong. Forty (24.0%), 25 (15.0%), 10 (6.0%), and five (3.0%) clones were clustered within alpha-Proteobacteria, Cytophaga-Flavobacterium-Bacteroides (CFB) group, Actinobacteria, and beta-Proteobacteria, respectively. Three had no phylogenetic association with known taxonomic divisions. None of the sequences studied in this study coincided exactly with those deposited in GenBank.

  8. New Tools For Understanding Microbial Diversity Using High-throughput Sequence Data

    NASA Astrophysics Data System (ADS)

    Knight, R.; Hamady, M.; Liu, Z.; Lozupone, C.

    2007-12-01

    High-throughput sequencing techniques such as 454 are straining the limits of tools traditionally used to build trees, choose OTUs, and perform other essential sequencing tasks. We have developed a workflow for phylogenetic analysis of large-scale sequence data sets that combines existing tools, such as the Arb phylogeny package and the NAST multiple sequence alignment tool, with new methods for choosing and clustering OTUs and for performing phylogenetic community analysis with UniFrac. This talk discusses the cyberinfrastructure we are developing to support the human microbiome project, and the application of these workflows to analyze very large data sets that contrast the gut microbiota with a range of physical environments. These tools will ultimately help to define core and peripheral microbiomes in a range of environments, and will allow us to understand the physical and biotic factors that contribute most to differences in microbial diversity.

  9. Genetic characterization of poxviruses in Camelus dromedarius in Ethiopia, 2011-2014.

    PubMed

    Gelaye, Esayas; Achenbach, Jenna Elizabeth; Ayelet, Gelagay; Jenberie, Shiferaw; Yami, Martha; Grabherr, Reingard; Loitsch, Angelika; Diallo, Adama; Lamien, Charles Euloge

    2016-10-01

    Camelpox and camel contagious ecthyma are infectious viral diseases of camelids caused by camelpox virus (CMLV) and camel contagious ecthyma virus (CCEV), respectively. Even though pox disease has been causing significant economic losses in camel production in Ethiopia, little is known about the responsible pathogens and their genetic diversity. Thus, the present study aimed at isolation, identification and genetic characterization of the causative viruses. Accordingly, clinical case observations, infectious virus isolation, and molecular and phylogenetic analysis of poxviruses infecting camels in three regions and six districts in the country, Afar (Chifra), Oromia (Arero, Miyu and Yabello) and Somali (Gursum and Jijiga) between 2011 and 2014 were undertaken. The full hemagglutinin (HA) and partial A-type inclusion protein (ATIP) genes of CMLV and full major envelope protein (B2L) gene of CCEV of Ethiopian isolates were sequenced, analyzed and compared among each other and to foreign isolates. The viral isolation confirmed the presence of infectious poxviruses. The preliminary screening by PCR showed 27 CMLVs and 20 CCEVs. The sequence analyses showed that the HA and ATIP gene sequences are highly conserved within the local isolates of CMLVs, and formed a single cluster together with isolates from Somalia and Syria. Unlike CMLVs, the B2L gene analysis of Ethiopian CCEV showed few genetic variations. The phylogenetic analysis revealed three clusters of CCEV in Ethiopia with the isolates clustering according to their geographical origins. To our knowledge, this is the first report indicating the existence of CCEV in Ethiopia where camel contagious ecthyma was misdiagnosed as camelpox. Additionally, this study has also disclosed the existence of co-infections with CMLV and CCEV. A comprehensive characterization of poxviruses affecting camels in Ethiopia and the full genome sequencing of representative isolates are recommended to better understand the dynamics of pox diseases of camels and to assist in the implementation of more efficient control measures. Copyright © 2016 Elsevier B.V. All rights reserved.

  10. Genetic diversity and relationship analysis of Gossypium arboreum accessions.

    PubMed

    Liu, F; Zhou, Z L; Wang, C Y; Wang, Y H; Cai, X Y; Wang, X X; Zhang, Z S; Wang, K B

    2015-11-19

    Simple sequence repeat techniques were used to identify the genetic diversity of 101 Gossypium arboreum accessions collected from India, Vietnam, and the southwest of China (Guizhou, Guangxi, and Yunnan provinces). Twenty-six pairs of SSR primers produced a total of 103 polymorphic loci with an average of 3.96 polymorphic loci per primer. The average of the effective number of alleles, Nei's gene diversity, and Shannon's information index were 0.59, 0.2835, and 0.4361, respectively. The diversity varied among different geographic regions. The result of principal component analysis was consistent with that of unweighted pair group method with arithmetic mean clustering analysis. The 101 G. arboreum accessions were clustered into 2 groups.

  11. Ancient genomic architecture for mammalian olfactory receptor clusters

    PubMed Central

    Aloni, Ronny; Olender, Tsviya; Lancet, Doron

    2006-01-01

    Background Mammalian olfactory receptor (OR) genes reside in numerous genomic clusters of up to several dozen genes. Whole-genome sequence alignment nets of five mammals allow their comprehensive comparison, aimed at reconstructing the ancestral olfactory subgenome. Results We developed a new and general tool for genome-wide definition of genomic gene clusters conserved in multiple species. Syntenic orthologs, defined as gene pairs showing conservation of both genomic location and coding sequence, were subjected to a graph theory algorithm for discovering CLICs (clusters in conservation). When applied to ORs in five mammals, including the marsupial opossum, more than 90% of the OR genes were found within a framework of 48 multi-species CLICs, invoking a general conservation of gene order and composition. A detailed analysis of individual CLICs revealed multiple differences among species, interpretable through species-specific genomic rearrangements and reflecting complex mammalian evolutionary dynamics. One significant instance involves CLIC #1, which lacks a human member, implying the human-specific deletion of an OR cluster, whose mouse counterpart has been tentatively associated with isovaleric acid odorant detection. Conclusion The identified multi-species CLICs demonstrate that most of the mammalian OR clusters have a common ancestry, preceding the split between marsupials and placental mammals. However, only two of these CLICs were capable of incorporating chicken OR genes, parsimoniously implying that all other CLICs emerged subsequent to the avian-mammalian divergence. PMID:17010214

  12. Molecular epidemiology demonstrated three emerging clusters of human immunodeficiency virus type 1 subtype B infection in Hong Kong.

    PubMed

    Leung, Tommy W C; Mak, Darwin; Wong, K H; Wang, Y; Song, Y H; Tsang, D N C; Wong, C; Shao, Y M; Lim, W L

    2008-07-01

    We conducted a molecular epidemiological study on newly diagnosed human immunodeficiency virus type 1 (HIV-1)-infected patients in Hong Kong to identify the epidemiological linkage of HIV-1 infection in the locality. Reverse transcription polymerase chain reaction (RT-PCR) for HIV-1 was performed on newly diagnosed HIV-1-positive sera collected from January 2002 to December 2006. PCR products corresponding to the env C2V3V4 region and gag p17/p24 junction of the HIV-1 genome were sequenced. Phylogenetic analyses performed on the acquired nucleotide sequences revealed that CRF01_AE and subtype B were the two dominant HIV-1 subtypes. Analyses also demonstrated the presence of three emerging HIV-1 clusters among the subtype B sequences in Hong Kong. Each cluster possesses a unique cluster-specific amino acid signature for identification. Data show that one of the clusters (Cluster I) is rapidly expanding. In addition to the unique cluster-specific amino acid signature, the majority of sequences in Cluster I harbor a 6-amino acid insertion at the gag p17/p24 junction in a region that is thought to be closely associated with HIV-1 infectivity.

  13. Ecological Consistency of SSU rRNA-Based Operational Taxonomic Units at a Global Scale

    PubMed Central

    Schmidt, Thomas S. B.; Matias Rodrigues, João F.; von Mering, Christian

    2014-01-01

    Operational Taxonomic Units (OTUs), usually defined as clusters of similar 16S/18S rRNA sequences, are the most widely used basic diversity units in large-scale characterizations of microbial communities. However, it remains unclear how well the various proposed OTU clustering algorithms approximate ‘true’ microbial taxa. Here, we explore the ecological consistency of OTUs – based on the assumption that, like true microbial taxa, they should show measurable habitat preferences (niche conservatism). In a global and comprehensive survey of available microbial sequence data, we systematically parse sequence annotations to obtain broad ecological descriptions of sampling sites. Based on these, we observe that sequence-based microbial OTUs generally show high levels of ecological consistency. However, different OTU clustering methods result in marked differences in the strength of this signal. Assuming that ecological consistency can serve as an objective external benchmark for cluster quality, we conclude that hierarchical complete linkage clustering, which provided the most ecologically consistent partitions, should be the default choice for OTU clustering. To our knowledge, this is the first approach to assess cluster quality using an external, biologically meaningful parameter as a benchmark, on a global scale. PMID:24763141
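
    The abstract above recommends hierarchical complete linkage clustering for defining OTUs. The idea can be sketched in a few lines of plain Python: two clusters are merged only while the largest pairwise distance between their members stays within a cutoff. This is an illustrative toy and not the authors' pipeline. It assumes pre-aligned sequences of equal length, a simple proportion-of-mismatches distance, and a hypothetical default cutoff of 0.03.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of mismatched positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def complete_linkage_otus(seqs, threshold=0.03):
    """Greedy agglomerative clustering with complete linkage.

    Returns a list of clusters, each a list of indices into seqs.
    """
    clusters = [[i] for i in range(len(seqs))]

    def linkage(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(p_distance(seqs[i], seqs[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        # Find the closest pair of clusters under complete linkage.
        (a, b), dist = min(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break  # no remaining pair is similar enough to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


if __name__ == "__main__":
    toy = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC", "CCCCCCCCCG"]
    print(complete_linkage_otus(toy, threshold=0.1))
```

    Complete linkage guarantees that every pair of sequences inside an OTU is within the cutoff, which is stricter than single or average linkage. The toy loop recomputes all linkages at each step and is practical only for small inputs.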

  14. Clinical and epidemiological analysis of Campylobacter fetus subsp. fetus infections in humans and comparative genetic analysis with strains isolated from cattle.

    PubMed

    Escher, Robert; Brunner, Colette; von Steiger, Niklaus; Brodard, Isabelle; Droz, Sara; Abril, Carlos; Kuhnert, Peter

    2016-05-14

    Campylobacter fetus subspecies fetus (CFF) is an important pathogen for both cattle and humans. We performed a systematic epidemiological and clinical study of patients and evaluated the genetic relatedness of 17 human and 17 bovine CFF isolates by using different genotyping methods. In addition, the serotype, the dissemination of the genomic island containing a type IV secretion system (T4SS) and resistance determinants for tetracycline and streptomycin were also evaluated. The isolates from patients diagnosed with CFF infection as well as those from faecal samples of healthy calves were genotyped using pulsed-field gel electrophoresis (PFGE), multilocus sequence typing (MLST), as well as single locus sequence typing (SLST) targeting cmp1 and cmp2 genes encoding two major outer membrane proteins in CFF. The presence of the genomic island and identification of serotype was determined by PCRs targeting genes of the T4SS and the sap locus, respectively. Tetracycline and streptomycin resistance phenotypes were determined by minimal inhibitory concentration. Clinical data obtained from medical records and laboratory data were supplemented by data obtained via telephone interviews with the patients and treating physicians. PFGE analysis defined two major clusters; cluster A containing 16 bovine (80 %) isolates and cluster B containing 13 human (92 %) isolates, suggesting a host preference. Further genotypic analysis using MLST, SLST as well as sap and T4SS PCR showed the presence of genotypically identical isolates in cattle and humans. The low diversity observed within the cmp alleles of CFF corroborates the clonal nature of this pathogen. The genomic island containing the tetracycline and streptomycin resistance determinants was found in 55 % of the isolates in cluster A and correlated with phenotypic antibiotic resistance. Most human and bovine isolates were separated into two phylogenetic clusters. However, several human and bovine isolates were identical by diverse genotyping methods, indicating a possible link between strains from these two hosts.

  15. Extensive concerted evolution of rice paralogs and the road to regaining independence.

    PubMed

    Wang, Xiyin; Tang, Haibao; Bowers, John E; Feltus, Frank A; Paterson, Andrew H

    2007-11-01

    Many genes duplicated by whole-genome duplications (WGDs) are more similar to one another than expected. We investigated whether concerted evolution through conversion and crossing over, well-known to affect tandem gene clusters, also affects dispersed paralogs. Genome sequences for two Oryza subspecies reveal appreciable gene conversion in the approximately 0.4 MY since their divergence, with a gradual progression toward independent evolution of older paralogs. Since divergence from subspecies indica, approximately 8% of japonica paralogs produced 5-7 MYA on chromosomes 11 and 12 have been affected by gene conversion and several reciprocal exchanges of chromosomal segments, while approximately 70-MY-old "paleologs" resulting from a genome duplication (GD) show much less conversion. Sequence similarity analysis in proximal gene clusters also suggests more conversion between younger paralogs. About 8% of paleologs may have been converted since rice-sorghum divergence approximately 41 MYA. Domain-encoding sequences are more frequently converted than nondomain sequences, suggesting a sort of circularity--that sequences conserved by selection may be further conserved by relatively frequent conversion. The higher level of concerted evolution in the 5-7 MY-old segmental duplication may reflect the behavior of many genomes within the first few million years after duplication or polyploidization.

  16. Detection of Helicobacter and Campylobacter spp. from the aquatic environment of marine mammals.

    PubMed

    Goldman, C G; Matteo, M J; Loureiro, J D; Degrossi, J; Teves, S; Heredia, S Rodriguez; Alvarez, K; González, A Beltrán; Catalano, M; Boccio, J; Cremaschi, G; Solnick, J V; Zubillaga, M B

    2009-01-13

    The mechanism by which Helicobacter species are transmitted remains unclear. To examine the possible role of environmental transmission in marine mammals, we sought the presence of Helicobacter spp. and non-Helicobacter bacteria within the order Campylobacterales in water from the aquatic environment of marine mammals, and in fish otoliths regurgitated by dolphins. Water was collected from six pools, two inhabited by dolphins and four inhabited by seals. Regurgitated otoliths were collected from the bottom of dolphins' pools. Samples were evaluated by culture, PCR and DNA sequence analysis. Sequences from dolphins' water and from regurgitated otoliths clustered with 99.8-100% homology with sequences from gastric fluids, dental plaque and saliva from dolphins living in those pools, and with 99.5% homology with H. cetorum. Sequences from seals' water clustered with 99.5% homology with a sequence amplified from a Northern sea lion (AY203900). Control PCR on source water for the pools and from otoliths dissected from feeder fish were negative. The finding of Helicobacter spp. DNA in the aquatic environment suggests that contaminated water from regurgitated fish otoliths and perhaps other tissues may play a role in Helicobacter transmission among marine mammals.

  17. Genetic variability of Baylisascaris schroederi from the Qinling subspecies of the giant panda in China revealed by sequences of three mitochondrial genes.

    PubMed

    Zhao, Zhong-Hui; Bian, Qing-Qing; Ren, Wan-Xin; Cheng, Wen-Yu; Jia, Yan-Qing; Fang, Yan-Qin; Zhao, Guang-Hui

    2014-06-01

    The present study examined the variations in three mitochondrial (mt) DNA sequences, namely cytochrome b (cytb), cytochrome c oxidase subunit 3 (cox3) and NADH dehydrogenase subunit 5 (nad5), among Baylisascaris schroederi isolates from the Qinling subspecies of the giant panda in Shaanxi province, northwestern China. No differences in length were detected in the three mt fragments from different isolates. The intra-specific sequence variations within all B. schroederi samples were 0-2.6% for pcytb, 0-1.8% for pcox3 and 0-2.1% for pnad5, while the inter-specific sequence differences among members of the genus Baylisascaris were 8.2-15.2%, 6.2-15.9% and 8.4-16.0% for pcytb, pcox3 and pnad5, respectively. A phylogenetic analysis of the combined sequences of pcytb, pcox3 and pnad5 showed that all B. schroederi samples in the present study were located in two large clusters, with one cluster containing samples from giant pandas in Sichuan province. These findings provide basic information for further study of molecular epidemiology and control of B. schroederi infection in the Qinling subspecies of the giant panda and throughout China.
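    The abstract above reports variation as percentage ranges (for example 0-2.6% for pcytb) but does not say how they were computed. As a rough illustration only, the sketch below assumes a simple uncorrected pairwise difference (p-distance) over aligned sequences and reports the minimum and maximum across all pairs. The function names, the gap-handling rule and the toy alignment are hypothetical and are not taken from the study.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Percentage of differing sites between two aligned sequences.

    Assumption: columns where either sequence has a gap or a
    non-ACGT symbol are skipped rather than counted as differences.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [
        (x, y)
        for x, y in zip(a.upper(), b.upper())
        if x in "ACGT" and y in "ACGT"
    ]
    if not compared:
        raise ValueError("no comparable sites")
    diffs = sum(x != y for x, y in compared)
    return 100.0 * diffs / len(compared)


def variation_range(seqs: dict) -> tuple:
    """Return (min, max) pairwise p-distance over all sequence pairs."""
    dists = [p_distance(seqs[i], seqs[j]) for i, j in combinations(seqs, 2)]
    return min(dists), max(dists)


# Made-up toy alignment, NOT data from the study.
toy = {
    "iso1": "ACGTACGTAC",
    "iso2": "ACGTACGTAC",
    "iso3": "ACGTACGTTC",
    "iso4": "ACG-ACGTTA",
}
low, high = variation_range(toy)
print(f"pairwise variation: {low:.1f}-{high:.1f}%")
```

    Published analyses usually rely on dedicated alignment and phylogenetics software with substitution-model corrections; the uncorrected distance here is only meant to show what a range such as "0-2.6%" summarizes.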

  18. Transcription factor clusters regulate genes in eukaryotic cells

    PubMed Central

    Hedlund, Erik G; Friemann, Rosmarie; Hohmann, Stefan

    2017-01-01

    Transcription is regulated through the binding of factors to gene promoters to activate or repress expression; however, the mechanisms by which factors find targets remain unclear. Using single-molecule fluorescence microscopy, we determined in vivo stoichiometry and spatiotemporal dynamics of a GFP-tagged repressor, Mig1, from a paradigm signaling pathway of Saccharomyces cerevisiae. We find the repressor operates in clusters, which upon extracellular signal detection, translocate from the cytoplasm, bind to nuclear targets and turn over. Simulations of Mig1 configuration within a 3D yeast genome model combined with a promoter-specific, fluorescent translation reporter confirmed clusters are the functional unit of gene regulation. In vitro and structural analysis on reconstituted Mig1 suggests that clusters are stabilized by depletion forces between intrinsically disordered sequences. We observed similar clusters of a co-regulatory activator from a different pathway, supporting a generalized cluster model for transcription factors that reduces promoter search times through intersegment transfer while stabilizing gene expression. PMID:28841133

  19. Assessing the 5S ribosomal RNA heterogeneity in Arabidopsis thaliana using short RNA next generation sequencing data.

    PubMed

    Szymanski, Maciej; Karlowski, Wojciech M

    2016-01-01

    In eukaryotes, ribosomal 5S rRNAs are products of multigene families organized within clusters of tandemly repeated units. Accumulation of genomic data obtained from a variety of organisms has demonstrated that the potential 5S rRNA coding sequences show a large number of variants, often incompatible with folding into a correct secondary structure. Here, we present results of an analysis of a large set of short RNA sequences generated by next-generation sequencing techniques, to address the problem of heterogeneity of the 5S rRNA transcripts in Arabidopsis and the identification of potentially functional rRNA-derived fragments.

  20. KONAGAbase: a genomic and transcriptomic database for the diamondback moth, Plutella xylostella.

    PubMed

    Jouraku, Akiya; Yamamoto, Kimiko; Kuwazaki, Seigo; Urio, Masahiro; Suetsugu, Yoshitaka; Narukawa, Junko; Miyamoto, Kazuhisa; Kurita, Kanako; Kanamori, Hiroyuki; Katayose, Yuichi; Matsumoto, Takashi; Noda, Hiroaki

    2013-07-09

    The diamondback moth (DBM), Plutella xylostella, is one of the most harmful insect pests for crucifer crops worldwide. DBM has rapidly evolved high resistance to most conventional insecticides such as pyrethroids, organophosphates, fipronil, spinosad, Bacillus thuringiensis, and diamides. Therefore, it is important to develop genomic and transcriptomic DBM resources for analysis of genes related to insecticide resistance, both to clarify the mechanism of resistance of DBM and to facilitate the development of insecticides with a novel mode of action for more effective and environmentally less harmful insecticide rotation. To contribute to this goal, we developed KONAGAbase, a genomic and transcriptomic database for DBM (KONAGA is the Japanese word for DBM). KONAGAbase provides (1) transcriptomic sequences of 37,340 ESTs/mRNAs and 147,370 RNA-seq contigs which were clustered and assembled into 84,570 unigenes (30,695 contigs, 50,548 pseudo singletons, and 3,327 singletons); and (2) genomic sequences of 88,530 WGS contigs with 246,244 degenerate contigs and 106,455 singletons from which 6,310 de novo identified repeat sequences and 34,890 predicted gene-coding sequences were extracted. The unigenes and predicted gene-coding sequences were clustered and 32,800 representative sequences were extracted as a comprehensive putative gene set. These sequences were annotated with BLAST descriptions, Gene Ontology (GO) terms, and Pfam descriptions, respectively. KONAGAbase contains rich graphical user interface (GUI)-based web interfaces for easy and efficient searching, browsing, and downloading sequences and annotation data. Five useful search interfaces consisting of BLAST search, keyword search, BLAST result-based search, GO tree-based search, and genome browser are provided. KONAGAbase is publicly available from our website (http://dbm.dna.affrc.go.jp/px/) through standard web browsers. 
    KONAGAbase provides DBM comprehensive transcriptomic and draft genomic sequences with useful annotation information with easy-to-use web interfaces, which helps researchers to efficiently search for target sequences such as insect resistance-related genes. KONAGAbase will be continuously updated and additional genomic/transcriptomic resources and analysis tools will be provided for further efficient analysis of the mechanism of insecticide resistance and the development of effective insecticides with a novel mode of action for DBM.
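    The unigene breakdown quoted in the KONAGAbase abstract (30,695 contigs, 50,548 pseudo singletons and 3,327 singletons) can be checked against the stated total of 84,570 with a few lines of arithmetic. The sketch below uses only the figures from the abstract; the variable names are illustrative and the percentage shares are derived here, not reported by the authors.

```python
# Counts as reported in the abstract.
unigene_parts = {
    "contigs": 30_695,
    "pseudo singletons": 50_548,
    "singletons": 3_327,
}
reported_unigenes = 84_570

# The three categories should add up to the reported unigene total.
total = sum(unigene_parts.values())
assert total == reported_unigenes, (total, reported_unigenes)

# Derived shares (not stated in the abstract), rounded to one decimal.
shares = {name: round(100 * n / total, 1) for name, n in unigene_parts.items()}
for name, pct in shares.items():
    print(f"{name}: {pct}% of unigenes")
```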
