Science.gov

Sample records for β-globin gene cluster

  1. [Joint locus of α/β-globin genes in Danio rerio is segregated into structural subdomains active at different stages of development].

    PubMed

    Dolgushin, K V; Petrova, N V; Iudinkova, E S; Razin, S V; Iarovaia, O V

    2015-01-01

    In the domain model of eukaryotic genome organization, the functional unit of the genome, together with its regulatory elements, is considered to be a gene or a gene family. In warm-blooded vertebrates, the α- and β-globin gene domains are located on different chromosomes and are organized and regulated differently. In cold-blooded animals, in particular the tropical fish Danio rerio, the α- and β-globin genes are located in a common gene cluster. However, the joint α/β-globin gene cluster is subdivided into two developmental stage-specific subdomains: the adult one and the embryonic-larval one. To find out whether this functional segregation correlates with structural segregation of the domain, we compared the DNase I sensitivity and histone modification profiles of the adult and embryonic-larval segments of the domain in cultured D. rerio fibroblasts. We have demonstrated that, in these nonerythroid cells, the adult and embryonic-larval subdomains differ in DNase I sensitivity and in their profiles of H3K27me3, a histone modification introduced by the PRC2 complex. These observations suggest that the joint α/β-globin gene domain of Danio rerio is segregated into two structural subdomains harboring the adult and embryonic-larval globin genes.

  2. Gene Cluster Statistics with Gene Families

    PubMed Central

    Durand, Dannie

    2009-01-01

    Identifying genomic regions that descended from a common ancestor is important for understanding the function and evolution of genomes. In distantly related genomes, clusters of homologous gene pairs are evidence of candidate homologous regions. Demonstrating the statistical significance of such “gene clusters” is an essential component of comparative genomic analyses. However, currently there are no practical statistical tests for gene clusters that model the influence of the number of homologs in each gene family on cluster significance. In this work, we demonstrate empirically that failure to incorporate gene family size in gene cluster statistics results in overestimation of significance, leading to incorrect conclusions. We further present novel analytical methods for estimating gene cluster significance that take gene family size into account. Our methods do not require complete genome data and are suitable for testing individual clusters found in local regions, such as contigs in an unfinished assembly. We consider pairs of regions drawn from the same genome (paralogous clusters), as well as regions drawn from two different genomes (orthologous clusters). Determining cluster significance under general models of gene family size is computationally intractable. By assuming that all gene families are of equal size, we obtain analytical expressions that allow fast approximation of cluster probabilities. We evaluate the accuracy of this approximation by comparing the resulting gene cluster probabilities with cluster probabilities obtained by simulating a realistic, power-law distributed model of gene family size, with parameters inferred from genomic data. Surprisingly, despite the simplicity of the underlying assumption, our method accurately approximates the true cluster probabilities. It slightly overestimates these probabilities, yielding a conservative test. We present additional simulation results indicating the best choice of parameter values for data

  3. Conversion events in gene clusters

    PubMed Central

    2011-01-01

    Background: Gene clusters containing multiple similar genomic regions in close proximity are of great interest for biomedical studies because of their associations with inherited diseases. However, such regions are difficult to analyze due to their structural complexity and their complicated evolutionary histories, reflecting a variety of large-scale mutational events. In particular, conversion events can mislead inferences about the relationships among these regions, as traced by traditional methods such as construction of phylogenetic trees or multi-species alignments. Results: To correct the distorted information generated by such methods, we have developed an automated pipeline called CHAP (Cluster History Analysis Package) for detecting conversion events. We used this pipeline to analyze the conversion events that affected two well-studied gene clusters (α-globin and β-globin) and three gene clusters for which comparative sequence data were generated from seven primate species: CCL (chemokine ligand), IFN (interferon), and CYP2abf (part of cytochrome P450 family 2). CHAP is freely available at http://www.bx.psu.edu/miller_lab. Conclusions: These studies reveal the value of characterizing conversion events in the context of studying gene clusters in complex genomes. PMID:21798034

  4. Clustering cancer gene expression data by projective clustering ensemble

    PubMed Central

    Yu, Xianxue; Yu, Guoxian

    2017-01-01

    Gene expression data analysis has paramount implications for gene-based treatments, cancer diagnosis and other domains. Clustering is an important and promising tool for analyzing gene expression data. Gene expression data are often characterized by a large number of genes but few samples, so various projective clustering and ensemble techniques have been suggested to address these challenges. However, it is rather challenging to combine these two kinds of techniques so as to avoid the curse of dimensionality and boost the performance of gene expression data clustering. In this paper, we employ a projective clustering ensemble (PCE) to integrate the advantages of projective clustering and ensemble clustering, and to avoid the dilemma of combining multiple projective clusterings. Our experimental results on publicly available cancer gene expression data show that PCE can improve the quality of clustering gene expression data by at least 4.5% (on average) compared with other related techniques, including dimensionality-reduction-based single clustering and ensemble approaches. The empirical study demonstrates that, to further boost the performance of clustering cancer gene expression data, it is necessary and promising to combine projective clustering with ensemble clustering. PCE can serve as an effective alternative technique for clustering gene expression data. PMID:28234920

  5. Persistence drives gene clustering in bacterial genomes

    PubMed Central

    Fang, Gang; Rocha, Eduardo PC; Danchin, Antoine

    2008-01-01

    Background: Gene clustering plays an important role in the organization of the bacterial chromosome, and several mechanisms have been proposed to explain its extent. However, the controversies over the validity of each of these mechanisms remind us that the cause of this gene organization remains an open question. Models proposed to explain clustering did not take into account the function of the gene products or the likely presence or absence of a given gene in a genome. However, genomes harbor two very different categories of genes: those present in a majority of organisms (persistent genes) and those present in very few organisms (rare genes). Results: We show that two classes of genes are significantly clustered in bacterial genomes: the highly persistent and the rare genes. The clustering of rare genes is readily explained by the selfish operon theory. Yet genes persistently present in bacterial genomes are also clustered, and we try to understand why. We propose a model accounting specifically for such clustering, and show that indispensability in a genome with frequent gene deletion and insertion leads to the transient clustering of these genes. The model describes how clusters are created via the gene flux that continuously introduces new genes while deleting others. We then test whether known selective processes, such as co-transcription, physical interaction or functional neighborhood, account for the stabilization of these clusters. Conclusion: We show that it is the strong selective pressure acting on the function of persistent genes, in a permanent state of gene flux that keeps bacterial genome size fairly constant, that drives the clustering of persistent genes. A further selective stabilization process might contribute to maintaining the clustering. PMID:18179692

  6. Gene-Ontology-based clustering of gene expression data.

    PubMed

    Adryan, Boris; Schuh, Reinhard

    2004-11-01

    The expected correlation between genetic co-regulation and affiliation to a common biological process does not necessarily hold when numerical clustering algorithms are applied to gene expression data. GO-Cluster uses the tree structure of the Gene Ontology database as a framework for numerical clustering, thus allowing simple visualization of gene expression data at various levels of the ontology tree. The 32-bit Windows application is freely available at http://www.mpibpc.mpg.de/go-cluster/
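One simple reading of "using the ontology tree as a framework" is to map each gene to its ancestor terms at a chosen tree depth, so that genes sharing an ancestor form a group within which a numerical clustering could then be run. The Python sketch below illustrates only that grouping step; the toy ontology, the term and gene names, and the helpers `depth`, `ancestors_at_depth` and `group_by_depth` are all hypothetical and are not taken from GO-Cluster.

```python
from collections import defaultdict

# Hypothetical toy ontology: term -> list of parent terms (a DAG, as in GO).
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "development": ["root"],
    "lipid_metabolism": ["metabolism"],
    "sugar_metabolism": ["metabolism"],
    "segmentation": ["development"],
}

def depth(term, parents=PARENTS):
    """Length of the shortest path from `term` up to a root term."""
    if not parents[term]:
        return 0
    return 1 + min(depth(p, parents) for p in parents[term])

def ancestors_at_depth(term, level, parents=PARENTS):
    """All ancestors of `term` (including the term itself) whose depth equals `level`."""
    seen, stack, hits = set(), [term], set()
    while stack:
        t = stack.pop()
        if t in seen:
            continue
        seen.add(t)
        if depth(t, parents) == level:
            hits.add(t)
        stack.extend(parents[t])
    return hits

def group_by_depth(annotations, level, parents=PARENTS):
    """Group genes by the ontology terms they map to at a given tree level."""
    groups = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            for anc in ancestors_at_depth(term, level, parents):
                groups[anc].add(gene)
    return dict(groups)

# Hypothetical annotations: gene -> list of directly annotated terms.
genes = {"g1": ["lipid_metabolism"], "g2": ["sugar_metabolism"], "g3": ["segmentation"]}
print(group_by_depth(genes, 1))
```

Moving `level` toward the root merges groups (at level 1, g1 and g2 fall together under "metabolism"), which is the kind of coarse-to-fine view across ontology levels that the abstract describes.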

  7. Finding approximate gene clusters with Gecko 3

    PubMed Central

    Winter, Sascha; Jahn, Katharina; Wehner, Stefanie; Kuchenbecker, Leon; Marz, Manja; Stoye, Jens; Böcker, Sebastian

    2016-01-01

    Gene-order-based comparison of multiple genomes provides signals for functional analysis of genes and the evolutionary process of genome organization. Gene clusters are regions of co-localized genes on genomes of different species. The rapid increase in sequenced genomes necessitates bioinformatics tools for finding gene clusters in hundreds of genomes. Existing tools are often restricted to a few (in many cases, only two) genomes and often make restrictive assumptions, such as short perfect conservation, conserved gene order or monophyletic gene clusters. We present Gecko 3, open-source software for finding gene clusters in hundreds of bacterial genomes that comes with an easy-to-use graphical user interface. The underlying gene cluster model is intuitive, can cope with low degrees of conservation as well as misannotations, and is complemented by a sound statistical evaluation. To evaluate the biological benefit of Gecko 3 and to exemplify our method, we search for gene clusters in a dataset of 678 bacterial genomes using Synechocystis sp. PCC 6803 as a reference. We confirm detected gene clusters by reviewing the literature and comparing them with a database of operons; we detect two novel clusters, which were confirmed by publicly available experimental RNA-Seq data. The computational analysis runs on a laptop computer in <40 min. PMID:27679480

  8. Clustering of High Throughput Gene Expression Data

    PubMed Central

    Pirim, Harun; Ekşioğlu, Burak; Perkins, Andy; Yüceer, Çetin

    2012-01-01

    High-throughput biological data need to be processed, analyzed, and interpreted to address problems in the life sciences. Bioinformatics, computational biology, and systems biology deal with biological problems using computational methods. Clustering is one of the methods used to gain insight into biological processes, particularly at the genomics level. Clustering can be used in many areas of biological data analysis; this paper, however, reviews current clustering algorithms designed specifically for analyzing gene expression data. It is also intended to introduce one of the main problems in bioinformatics, clustering gene expression data, to the operations research community. PMID:23144527

  9. Sequence analysis of porothramycin biosynthetic gene cluster.

    PubMed

    Najmanova, Lucie; Ulanova, Dana; Jelinkova, Marketa; Kamenik, Zdenek; Kettnerova, Eliska; Koberska, Marketa; Gazak, Radek; Radojevic, Bojana; Janata, Jiri

    2014-11-01

    The biosynthetic gene cluster of porothramycin, a sequence-selective DNA-alkylating compound, was identified in the genome of the producing strain Streptomyces albus subsp. albus (ATCC 39897) and subsequently characterized. A 39.7-kb DNA region contains 27 putative genes, 18 of which show high similarity to homologous genes from the biosynthetic gene cluster of the closely related pyrrolobenzodiazepine (PBD) compound anthramycin. However, considering the structures of both compounds, the number of differences in gene composition between the two biosynthetic gene clusters was unexpectedly high, indicating that alternative enzymes participate in the biosynthesis of both porothramycin precursors, anthranilate and a branched L-proline derivative. Based on sequence analysis of the putative NRPS modules Por20 and Por21, we propose that in porothramycin biosynthesis the methylation of the anthranilate unit occurs prior to the condensation reaction, while modifications of the branched proline derivative (oxidation and dimethylation of the side chain) occur on the already condensed PBD core. The two corresponding methyltransferase-encoding genes, por26 and por25, were identified in the porothramycin gene cluster. Surprisingly, a methyltransferase gene, por18, homologous to orf19 from anthramycin biosynthesis, was also detected in the porothramycin gene cluster, even though the corresponding biosynthetic step is missing, as suggested by ultra high-performance liquid chromatography-diode array detection-mass spectrometry (UHPLC-DAD-MS) analysis of the product in the S. albus culture broth.

  10. Clustering of gene ontology terms in genomes.

    PubMed

    Tiirikka, Timo; Siermala, Markku; Vihinen, Mauno

    2014-10-25

    Although protein-coding genes occupy only a small fraction of genomes in higher species, they are not randomly distributed within or between chromosomes. Clustering of genes with related function(s) and/or characteristics has been evident at several different levels. To study how common the clustering of functionally related genes is and in what kinds of functions the end products of these genes are involved, we collected gene ontology (GO) terms for complete genomes and developed a method to detect previously undefined gene clustering. Exhaustive analysis was performed for seven widely studied species ranging from human to Escherichia coli. To overcome problems related to varying gene lengths and densities, a novel method was developed in which a fixed number of genes was analyzed irrespective of the genome span covered. Statistically highly significant GO term clustering was apparent in all the investigated genomes. The analysis window, which ranged from 5 to 50 consecutive genes, revealed extensive GO term clusters for genes with widely varying functions. Here, the most interesting and significant results are discussed, and the complete dataset for each analyzed species is available at the GOme database at http://bioinf.uta.fi/GOme. The results indicated that clusters of genes with related functions are very common, not only in bacteria, in which operons are frequent, but also in all the studied species, irrespective of how complex they are. There are some differences between species, but in all of them GO term clusters are common and of widely differing sizes. The presented method can be applied to analyze any genome or part of a genome for which descriptive features are available, and thus is not restricted to ontology terms. This method can also be applied to investigate gene and protein expression patterns. The results pave the way for further studies of the mechanisms that shape genome structure and the evolutionary forces related to them. Copyright © 2014 Elsevier B.V. All
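The fixed-gene-count window can be sketched as follows: genes are taken in chromosomal order, and for every run of w consecutive genes the number carrying a given term is counted, regardless of how many base pairs the window spans. The abstract does not state which significance test was used, so the permutation test below (shuffling gene order and asking how often the densest window is at least as dense as observed) is an assumption for illustration; `window_counts` and `permutation_pvalue` are hypothetical names.

```python
import random

def window_counts(has_term, w):
    """Count term-annotated genes in every window of w consecutive genes.

    has_term: list of booleans (or 0/1), one per gene, in chromosomal order.
    """
    count = sum(has_term[:w])
    counts = [count]
    for i in range(w, len(has_term)):
        # Slide the window by one gene: add the incoming gene, drop the oldest.
        count += has_term[i] - has_term[i - w]
        counts.append(count)
    return counts

def permutation_pvalue(has_term, w, n_perm=1000, seed=0):
    """Fraction of shuffled gene orders whose densest window matches or beats the observed one."""
    rng = random.Random(seed)
    observed = max(window_counts(has_term, w))
    shuffled = list(has_term)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max(window_counts(shuffled, w)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Five annotated genes packed together among 40 unannotated ones.
clustered = [False] * 20 + [True] * 5 + [False] * 20
print(max(window_counts(clustered, 5)), permutation_pvalue(clustered, 5, n_perm=200))
```

Because the window is defined in genes rather than base pairs, gene-dense and gene-poor regions are compared on the same footing, which is the motivation the abstract gives for the fixed-number design.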

  11. Chicken rRNA Gene Cluster Structure.

    PubMed

    Dyomin, Alexander G; Koshel, Elena I; Kiselev, Artem M; Saifitdinova, Alsu F; Galkina, Svetlana A; Fukagawa, Tatsuo; Kostareva, Anna A; Gaginskaya, Elena R

    2016-01-01

    Ribosomal RNA (rRNA) genes, whose activity results in nucleolus formation, constitute an extremely important part of the genome. Despite extensive exploration of avian genomes, no complete description of the primary structure of an avian rRNA gene has been offered so far. We publish here a complete chicken rRNA gene cluster sequence, including 5'ETS (1836 bp), 18S rRNA gene (1823 bp), ITS1 (2530 bp), 5.8S rRNA gene (157 bp), ITS2 (733 bp), 28S rRNA gene (4441 bp) and 3'ETS (343 bp). The rRNA gene cluster sequence of 11863 bp was assembled from raw reads and deposited in GenBank under accession number KT445934. The assembly was validated through fluorescence in situ hybridization analysis on chicken metaphase chromosomes using computed and synthesized specific probes, as well as through reference assembly of sequenced fragments of a BAC clone containing the chicken NOR (nucleolus organizer region) against the de novo assembled rRNA gene cluster sequence. The results confirmed the validity of the chicken rRNA gene cluster sequence.
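As a quick consistency check on the lengths reported above, the seven components sum exactly to the 11,863-bp cluster. The short Python sketch below also derives 1-based start and end coordinates for each component, assuming (as the exact sum suggests) that the components are contiguous and listed in 5'-to-3' order; the `coordinates` helper is illustrative only.

```python
# Component lengths (bp) as reported in the abstract, in 5'->3' order.
COMPONENTS = [
    ("5'ETS", 1836), ("18S", 1823), ("ITS1", 2530), ("5.8S", 157),
    ("ITS2", 733), ("28S", 4441), ("3'ETS", 343),
]

def coordinates(components):
    """Return (name, start, end) tuples with 1-based inclusive coordinates."""
    result, start = [], 1
    for name, length in components:
        end = start + length - 1
        result.append((name, start, end))
        start = end + 1
    return result

total = sum(length for _, length in COMPONENTS)
print(total)  # 11863
for name, start, end in coordinates(COMPONENTS):
    print(f"{name}\t{start}-{end}")
```

Under this assumption, the 28S rRNA gene occupies positions 7080-11520 and the 3'ETS ends at position 11863.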

  12. Chicken rRNA Gene Cluster Structure

    PubMed Central

    Dyomin, Alexander G.; Koshel, Elena I.; Kiselev, Artem M.; Saifitdinova, Alsu F.; Galkina, Svetlana A.; Fukagawa, Tatsuo; Kostareva, Anna A.

    2016-01-01

    Ribosomal RNA (rRNA) genes, whose activity results in nucleolus formation, constitute an extremely important part of the genome. Despite extensive exploration of avian genomes, no complete description of the primary structure of an avian rRNA gene has been offered so far. We publish here a complete chicken rRNA gene cluster sequence, including 5’ETS (1836 bp), 18S rRNA gene (1823 bp), ITS1 (2530 bp), 5.8S rRNA gene (157 bp), ITS2 (733 bp), 28S rRNA gene (4441 bp) and 3’ETS (343 bp). The rRNA gene cluster sequence of 11863 bp was assembled from raw reads and deposited in GenBank under accession number KT445934. The assembly was validated through fluorescence in situ hybridization analysis on chicken metaphase chromosomes using computed and synthesized specific probes, as well as through reference assembly of sequenced fragments of a BAC clone containing the chicken NOR (nucleolus organizer region) against the de novo assembled rRNA gene cluster sequence. The results confirmed the validity of the chicken rRNA gene cluster sequence. PMID:27299357

  13. Evolution of chordate hox gene clusters.

    PubMed

    Ruddle, F H; Amemiya, C T; Carr, J L; Kim, C B; Ledje, C; Shashikant, C S; Wagner, G P

    1999-05-18

    In this article, we consider the role of the Hox genes in chordate and vertebrate evolution from the viewpoints of molecular and developmental evolution. Models of Hox cluster duplication are considered, with emphasis on a threefold duplication model. We also show that cluster duplication is consistent with a semiconservative model of duplication in which, following duplication, one daughter cluster remains unmodified while the other diverges and assumes a new architecture and presumably new functions. Evidence is reviewed suggesting that Hox gene enhancers have played an important role in body plan evolution. Finally, we contrast invertebrates and vertebrates in terms of genome and Hox cluster duplications, which are present in the latter but not the former. We question whether gene duplication has been important in vertebrates for the introduction of novel features such as limbs, a urogenital system, and specialized neuromuscular interactions.

  14. Clustering gene expression data using graph separators.

    PubMed

    Kaba, Bangaly; Pinet, Nicolas; Lelandais, Gaëlle; Sigayret, Alain; Berry, Anne

    2007-01-01

    Recent work has used graphs to model expression data from microarray experiments, with a view to partitioning the genes into clusters. In this paper, we introduce the use of a decomposition by clique separators. Our aim is to improve the classical clustering methods in two ways: first, we want to allow overlap between clusters, as this seems biologically sound; second, we want to be guided by the structure of the graph to define the number of clusters. We test this approach with a well-known yeast (Saccharomyces cerevisiae) database. Our results are good, as the expression profiles of the clusters we find are very coherent. Moreover, we are able to organize the clusters we find into another graph and order them in a way that turns out to respect the chronological order defined by the sporulation process.

  15. Pichia stipitis genomics, transcriptomics, and gene clusters

    Treesearch

    Thomas W. Jeffries; Jennifer R. Headman Van Vleet

    2009-01-01

    Genome sequencing and subsequent global gene expression studies have advanced our understanding of the lignocellulose-fermenting yeast Pichia stipitis. These studies have provided an insight into its central carbon metabolism, and analysis of its genome has revealed numerous functional gene clusters and tandem repeats. Specialized physiological traits are often the...

  16. Clustering of gene expression profiles: creating initialization-independent clusterings by eliminating unstable genes.

    PubMed

    De Mulder, Wim; Kuiper, Martin; Boel, René

    2010-03-25

    Clustering is an important approach in the analysis of biological data, and often a first step in identifying interesting patterns of coexpression in gene expression data. Because of the high complexity and diversity of gene expression data, many genes cannot easily be assigned to a cluster, but even if their dissimilarity to all other gene groups is large, they will eventually be forced to become members of a cluster. In this paper we show how to detect such elements, called unstable elements. We have developed an approach for iterative clustering algorithms in which unstable elements are deleted, making the iterative algorithm less dependent on the initial centers. Although the approach is unsupervised, the clusters into which the reduced data set is subdivided are less likely to contain false positives. This clustering yields a more differentiated approach for biological data, since the cluster analysis is divided into two parts: the pruned data set is divided into highly consistent clusters in an unsupervised way, and the removed, unstable elements, for which no meaningful cluster exists in unsupervised terms, can be assigned to a cluster using biological knowledge and information about the likelihood of cluster membership. We illustrate our framework on both an artificial and a real biological data set.

  17. Finding and analyzing plant metabolic gene clusters.

    PubMed

    Osbourn, Anne; Papadopoulou, Kalliopi K; Qi, Xiaoquan; Field, Ben; Wegel, Eva

    2012-01-01

    Plants produce an array of diverse secondary metabolites with important ecological functions, providing protection against pests, diseases, and abiotic stresses. Secondary metabolites are also a rich source of bioactive compounds for drug and agrochemical development. Despite the importance of these compounds, the metabolic diversity of plants remains largely unexploited, primarily because of the problems associated with mining large and complex genomes. It has recently emerged that genes for the synthesis of multiple major classes of plant-derived secondary metabolites (benzoxazinoids, diterpenes, triterpenes, and cyanogenic glycosides) are organized in clusters reminiscent of the metabolic gene clusters found in microbes. Many more secondary metabolic clusters are likely to emerge as the body of sequence information available for plants continues to grow, accelerated by high-throughput sequencing. Here, we describe approaches for the identification of secondary metabolic gene clusters in plants through forward and reverse genetics, map-based cloning, and genome mining, and give examples of methods used for the analysis and functional confirmation of new clusters. Copyright © 2012 Elsevier Inc. All rights reserved.

  18. Clustering Genes of Common Evolutionary History.

    PubMed

    Gori, Kevin; Suchan, Tomasz; Alvarez, Nadir; Goldman, Nick; Dessimoz, Christophe

    2016-06-01

    Phylogenetic inference can potentially result in a more accurate tree using data from multiple loci. However, if the loci are incongruent (due to events such as incomplete lineage sorting or horizontal gene transfer), it can be misleading to infer a single tree. To address this, many previous contributions have taken a mechanistic approach, modeling specific processes. Alternatively, one can cluster loci without assuming how these incongruencies might arise. Such "process-agnostic" approaches typically infer a tree for each locus and cluster these. There are, however, many possible combinations of tree distance and clustering methods; their comparative performance in the context of tree incongruence is largely unknown. Furthermore, because standard model selection criteria such as AIC cannot be applied to problems with a variable number of topologies, the issue of inferring the optimal number of clusters is poorly understood. Here, we perform a large-scale simulation study of phylogenetic distances and clustering methods to infer loci of common evolutionary history. We observe that the best-performing combinations are distances accounting for branch lengths followed by spectral clustering or Ward's method. We also introduce two statistical tests to infer the optimal number of clusters and show that they strongly outperform the silhouette criterion, a general-purpose heuristic. We illustrate the usefulness of the approach by 1) identifying errors in a previous phylogenetic analysis of yeast species and 2) identifying topological incongruence among newly sequenced loci of the globeflower fly genus Chiastocheta. We release treeCl, a new program to cluster genes of common evolutionary history (http://git.io/treeCl). © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  19. Combined clustering models for the analysis of gene expression

    SciTech Connect

    Angelova, M.; Ellman, J.

    2010-02-15

    Clustering has become one of the fundamental tools for analyzing gene expression and producing gene classifications. Clustering models enable finding patterns of similarity in order to understand gene function, gene regulation, cellular processes and cell subtypes. However, clustering results have to be combined with sequence data or knowledge about gene function in order to draw biologically meaningful conclusions. In this work, we explore a new model that integrates gene expression with sequence or text information.

  20. Penicillium roqueforti PR toxin gene cluster characterization.

    PubMed

    Hidalgo, Pedro I; Poirier, Elisabeth; Ullán, Ricardo V; Piqueras, Justine; Meslet-Cladière, Laurence; Coton, Emmanuel; Coton, Monika

    2017-03-01

    PR toxin is a well-known isoprenoid mycotoxin produced almost exclusively by Penicillium roqueforti after growth on food or animal feed. This mycotoxin has been described as the most toxic one produced by this species. In this study, an in silico analysis identified for the first time a 22.4-kb biosynthetic gene cluster involved in PR toxin biosynthesis in P. roqueforti. The pathway contains 11 open reading frames encoding ten putative proteins, including the major fungal terpene cyclase aristolochene synthase, involved in the first farnesyl-diphosphate cyclization step, as well as an oxidoreductase, an oxidase, two P450 monooxygenases, a transferase, and two dehydrogenase enzymes. Gene silencing was used to study three genes (ORF5, ORF6, and ORF8, encoding an acetyltransferase and two P450 monooxygenases, respectively) and reduced PR toxin production by 20 to 40% in all transformants, proving the involvement of these genes and the corresponding enzyme activities in PR toxin biosynthesis. Depending on the silenced gene target, eremofortin A and B production was also affected, suggesting that these compounds are biosynthetic intermediates in this pathway. A PR toxin biosynthesis pathway is proposed based on the most recent available data.

  1. Evolution of Hox gene clusters in deuterostomes

    PubMed Central

    2013-01-01

    Hox genes, with their similar roles in animals as evolutionarily distant as humans and flies, have fascinated biologists since their discovery nearly 30 years ago. During the last two decades, reports on Hox genes from a still growing number of eumetazoan species have increased our knowledge on the Hox gene contents of a wide range of animal groups. In this review, we summarize the current Hox inventory among deuterostomes, not only in the well-known teleosts and tetrapods, but also in the earlier vertebrate and invertebrate groups. We draw an updated picture of the ancestral repertoires of the different lineages, a sort of “genome Hox bar-code” for most clades. This scenario allows us to infer differential gene or cluster losses and gains that occurred during deuterostome evolution, which might be causally linked to the morphological changes that led to these widely diverse animal taxa. Finally, we focus on the challenging family of posterior Hox genes, which probably originated through independent tandem duplication events at the origin of each of the ambulacrarian, cephalochordate and vertebrate/urochordate lineages. PMID:23819519

  2. Tumor clustering using nonnegative matrix factorization with gene selection.

    PubMed

    Zheng, Chun-Hou; Huang, De-Shuang; Zhang, Lei; Kong, Xiang-Zhen

    2009-07-01

    Tumor clustering is becoming a powerful method in cancer class discovery. Nonnegative matrix factorization (NMF) has shown advantages over other conventional clustering techniques. Nonetheless, there is still considerable room for improving the performance of NMF. To this end, in this paper, gene selection and explicitly enforcing sparseness are introduced into the factorization process. Particularly, independent component analysis is employed to select a subset of genes so that the effect of irrelevant or noisy genes can be reduced. The NMF and its extensions, sparse NMF and NMF with sparseness constraint, are then used for tumor clustering on the selected genes. A series of elaborate experiments are performed by varying the number of clusters and the number of selected genes to evaluate the cooperation between different gene selection settings and NMF-based clustering. Finally, the experiments on three representative gene expression datasets demonstrated that the proposed scheme can achieve better clustering results.
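For readers unfamiliar with NMF-based clustering, the basic scheme (without the ICA gene selection or sparseness constraints this paper adds) factorizes a nonnegative genes × samples matrix V ≈ WH and assigns each sample to the metagene with the largest coefficient in its column of H. The NumPy sketch below uses the standard Lee–Seung multiplicative updates for the Frobenius-norm objective; the function names, iteration count and toy matrix are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def nmf(V, k, n_iter=500, seed=0, eps=1e-10):
    """Factorise nonnegative V (genes x samples) as W @ H via Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative and do not increase ||V - WH||_F.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def nmf_cluster(V, k, **kwargs):
    """Assign each sample (column of V) to the metagene with the largest coefficient in H."""
    _, H = nmf(V, k, **kwargs)
    return H.argmax(axis=0)

# Toy data: genes 0-2 are high in samples 0-2, genes 3-5 in samples 3-5.
V = np.array([[5, 5, 5, 0, 0, 0]] * 3 + [[0, 0, 0, 5, 5, 5]] * 3, dtype=float)
print(nmf_cluster(V, 2))
```

Cluster indices are arbitrary (NMF is unique at best up to permutation and scaling of the metagenes), so only the grouping of samples, not the label values, is meaningful; the gene-selection step described above would simply restrict V to the selected rows before factorization.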

  3. The rise of operon-like gene clusters in plants.

    PubMed

    Boycheva, Svetlana; Daviet, Laurent; Wolfender, Jean-Luc; Fitzpatrick, Teresa B

    2014-07-01

    Gene clusters are common features of prokaryotic genomes and are also present in eukaryotes. Most known clustered genes are involved in the biosynthesis of secondary metabolites. Although horizontal gene transfer is a primary source of prokaryotic gene cluster (operon) formation and has been reported to occur in eukaryotes, eukaryotic clusters appear to arise predominantly de novo or through gene duplication followed by neo- and subfunctionalization or translocation. Here we aim to provide an overview of the current knowledge and open questions related to plant gene cluster functioning, assembly, and regulation. We also present potential research approaches and point out the benefits of a better understanding of gene clusters in plants for both fundamental and applied plant science. Copyright © 2014 Elsevier Ltd. All rights reserved.

  4. Genome scan identifies a locus affecting gamma-globin expression in human beta-cluster YAC transgenic mice

    SciTech Connect

    Lin, S.D.; Cooper, P.; Fung, J.; Weier, H.U.G.; Rubin, E.M.

    2000-03-01

    Genetic factors affecting postnatal γ-globin expression, a major modifier of the severity of both β-thalassemia and sickle cell anemia, have been difficult to study. This is especially so in mice, which lack a globin gene with an expression pattern equivalent to that of human γ-globin. To model the human β-cluster in mice, with the goal of screening for loci affecting human γ-globin expression in vivo, we introduced a human β-globin cluster YAC transgene into the genome of FVB mice. The β-cluster contained a Greek hereditary persistence of fetal hemoglobin (HPFH) γ allele, resulting in postnatal expression of human γ-globin in transgenic mice. The level of human γ-globin in various F1 hybrids derived from crosses between the FVB transgenics and other inbred mouse strains was assessed. The γ-globin level of the C3HeB/FVB transgenic mice was noted to be significantly elevated. To map genes affecting postnatal γ-globin expression, a 20-centiMorgan (cM) genome scan of a C3HeB/FVB transgenic × FVB backcross was performed, followed by high-resolution marker analysis of promising loci. From this analysis, we mapped a locus within a 2.2-cM interval of mouse chromosome 1, with a LOD score of 4.2, that contributes 10.4% of the variation in γ-globin expression level. Combining transgenic modeling of the human β-globin gene cluster with quantitative trait analysis, we have identified and mapped a murine locus that affects human γ-globin expression in vivo.
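For orientation, the two statistics quoted above (a LOD score and a percentage of variation explained) are linked in the common single-marker regression formulation of QTL mapping. The following is a general textbook definition under normal errors with maximum-likelihood variance estimates, not necessarily the exact model used in this study. With n backcross progeny, maximized likelihoods L_0 (no QTL) and L_1 (QTL at the marker), and corresponding residual sums of squares RSS_0 and RSS_1:

$$
\mathrm{LOD} = \log_{10}\frac{L_1}{L_0} = \frac{n}{2}\,\log_{10}\frac{\mathrm{RSS}_0}{\mathrm{RSS}_1},
\qquad
\mathrm{PVE} = 1 - \frac{\mathrm{RSS}_1}{\mathrm{RSS}_0} = 1 - 10^{-2\,\mathrm{LOD}/n}
$$

The second identity follows from the first because RSS_1/RSS_0 = 10^{-2 LOD/n}; for a fixed LOD, the proportion of variance explained (PVE) therefore shrinks as the number of progeny grows.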

  5. Formation of plant metabolic gene clusters within dynamic chromosomal regions

    PubMed Central

    Field, Ben; Fiston-Lavier, Anna-Sophie; Kemen, Ariane; Geisler, Katrin; Quesneville, Hadi; Osbourn, Anne E.

    2011-01-01

    In bacteria, genes with related functions often are grouped together in operons and are cotranscribed as a single polycistronic mRNA. In eukaryotes, functionally related genes generally are scattered across the genome. Notable exceptions include gene clusters for catabolic pathways in yeast, synthesis of secondary metabolites in filamentous fungi, and the major histocompatibility complex in animals. Until quite recently it was thought that gene clusters in plants were restricted to tandem duplicates (for example, arrays of leucine-rich repeat disease-resistance genes). However, operon-like clusters of coregulated nonhomologous genes are an emerging theme in plant biology, where they may be involved in the synthesis of certain defense compounds. These clusters are unlikely to have arisen by horizontal gene transfer, and the mechanisms behind their formation are poorly understood. Previously in thale cress (Arabidopsis thaliana) we identified an operon-like gene cluster that is required for the synthesis and modification of the triterpene thalianol. Here we characterize a second operon-like triterpene cluster (the marneral cluster) from A. thaliana, compare the features of these two clusters, and investigate the evolutionary events that have led to cluster formation. We conclude that common mechanisms are likely to underlie the assembly and control of operon-like gene clusters in plants. PMID:21876149

  6. Valinomycin biosynthetic gene cluster in Streptomyces: conservation, ecology and evolution.

    PubMed

    Matter, Andrea M; Hoot, Sara B; Anderson, Patrick D; Neves, Susana S; Cheng, Yi-Qiang

    2009-09-29

    Many Streptomyces strains are known to produce the antibiotic valinomycin (VLM), and the VLM biosynthetic gene cluster (vlm) has been characterized in two independent isolates. Here we report the phylogenetic relationships of these strains using both parsimony and likelihood methods, and discuss whether the vlm gene cluster shows evidence of the horizontal transmission that is common in natural product biosynthetic genes. Eight Streptomyces strains from around the world were obtained and sequenced for three regions of the two large nonribosomal peptide synthetase genes (vlm1 and vlm2) involved in VLM biosynthesis. The DNA sequences representing the vlm gene cluster are highly conserved among all eight environmental strains. The geographic distribution pattern of these strains and the strict congruence between the trees of the two vlm genes and the housekeeping genes, 16S rDNA and trpB, suggest vertical transmission of the vlm gene cluster in Streptomyces with no evidence of horizontal gene transfer. We also explored the relationship of the sequence of vlm genes to that of the cereulide biosynthetic genes (ces) found in Bacillus cereus and found them highly divergent from each other at the DNA level (genetic distance values ≥ 95.6%). It is possible that the vlm gene cluster and the ces gene cluster share a relatively distant common ancestor, but these two gene clusters have since evolved independently.

  7. Diversity of Carotenoid Synthesis Gene Clusters from Environmental Enterobacteriaceae Strains

    PubMed Central

    Sedkova, Natalia; Tao, Luan; Rouvière, Pierre E.; Cheng, Qiong

    2005-01-01

    Eight Enterobacteriaceae strains that produce zeaxanthin and derivatives of this compound were isolated from a variety of environmental samples. Phylogenetic analysis showed that these strains grouped with different clusters of Erwinia type strains. Four strains representing the phylogenetic diversity were chosen for further characterization, which revealed their genetic diversity as well as their biochemical diversity. The carotenoid synthesis gene clusters cloned from the four strains had three different gene organizations. Two of the gene clusters, those from strains DC416 and DC260, had the classical organization crtEXYIBZ; the gene cluster from DC413 had the rare organization crtE-idi-XYIBZ; and the gene cluster from DC404 had the unique organization crtE-idi-YIBZ. Besides the diversity in genetic organization, these genes also exhibited considerable sequence diversity. On average, they exhibited 60 to 70% identity with each other, as well as with the corresponding genes of the Pantoea type strains. The four different clusters were individually expressed in Escherichia coli, and the two idi-containing clusters gave more than fivefold-higher carotenoid titers than the two clusters lacking idi. Expression of the crtEYIB genes with and without idi confirmed that the type II idi gene linked with the carotenoid synthesis gene clusters increases carotenoid titer. PMID:16332796

  8. Inferring the Recent Duplication History of a Gene Cluster

    NASA Astrophysics Data System (ADS)

    Song, Giltae; Zhang, Louxin; Vinař, Tomáš; Miller, Webb

    Much important evolutionary activity occurs in gene clusters, where a copy of a gene may be free to evolve new functions. Computational methods to extract evolutionary information from sequence data for such clusters are currently imperfect, in part because accurate sequence data are often lacking in these genomic regions, making the existing methods difficult to apply. We describe a new method for reconstructing the recent evolutionary history of gene clusters. The method’s performance is evaluated on simulated data and on actual human gene clusters.

  9. Computing gene expression data with a knowledge-based gene clustering approach.

    PubMed

    Rosa, Bruce A; Oh, Sookyung; Montgomery, Beronda L; Chen, Jin; Qin, Wensheng

    2010-01-01

    Computational analysis methods for gene expression data gathered in microarray experiments can be used to identify the functions of previously unstudied genes. While obtaining the expression data is not a difficult task, interpreting and extracting the information from the datasets is challenging. In this study, a knowledge-based approach that identifies and saves important functional genes before filtering based on variability and fold-change differences was used to study light regulation. Two clustering methods were used to cluster the filtered datasets, and clusters containing a key light regulatory gene were located. The genes common to both of these clusters were identified, and the genes in this common cluster were ranked by their coexpression with the key gene. This process was repeated for 11 key genes in 3 treatment combinations. The initial filtering method reduced the dataset size from 22,814 probes to an average of 1134 genes, and the resulting common cluster lists contained an average of only 14 genes. These common cluster lists achieved higher gene enrichment scores than the clusters from either clustering method alone. In addition, the filtering method increased the proportion of light-responsive genes in the dataset from 1.8% to 15.2%, and the cluster lists increased this proportion to 18.4%. The relatively short length of these common cluster lists compared with gene groups generated through typical clustering methods or coexpression networks narrows the search for novel functional genes while increasing the likelihood that they are biologically relevant.
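The pipeline described above (protect known genes, filter the rest, intersect the clusters that contain a key gene, then rank by coexpression) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the cutoff parameters, and the use of Pearson correlation as the coexpression measure are all assumptions.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def knowledge_filter(profiles, protected, min_sd, min_fold):
    """Keep protected (known functional) genes unconditionally; keep other genes
    only if they pass both a variability cutoff and a max/min fold-change cutoff.
    Assumes positive, linear-scale expression values."""
    kept = {}
    for gene, values in profiles.items():
        low, high = min(values), max(values)
        fold = high / low if low > 0 else float("inf")
        if gene in protected or (pstdev(values) >= min_sd and fold >= min_fold):
            kept[gene] = values
    return kept


def common_cluster(key_gene, labels_a, labels_b, profiles):
    """Genes that share key_gene's cluster under both clusterings (dicts mapping
    gene -> cluster label), ranked by correlation with the key gene."""
    members_a = {g for g, c in labels_a.items() if c == labels_a[key_gene]}
    members_b = {g for g, c in labels_b.items() if c == labels_b[key_gene]}
    shared = (members_a & members_b) - {key_gene}
    key_profile = profiles[key_gene]
    return sorted(shared, key=lambda g: pearson(profiles[g], key_profile), reverse=True)
```

Protecting known genes before filtering is what keeps a low-variance but biologically important gene from being discarded by the generic cutoffs.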

  10. A Nomadic Subtelomeric Disease Resistance Gene Cluster in Common Bean

    USDA-ARS's Scientific Manuscript database

    The B4 resistance (R)-gene cluster, located in the subtelomeric region of chromosome 4, is one of the largest clusters known in common bean (Phaseolus vulgaris, Pv). We sequenced 650 kb spanning this locus and annotated 97 genes, 26 of which correspond to Coiled-coil-Nucleotide-Binding-Site-Leucine-Rich...

  11. Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets

    PubMed Central

    Narayanan, Manikandan; Vetta, Adrian; Schadt, Eric E.; Zhu, Jun

    2010-01-01

    Many genome-wide datasets are routinely generated to study different aspects of biological systems, but integrating them to obtain a coherent view of the underlying biology remains a challenge. We propose simultaneous clustering of multiple networks as a framework to integrate large-scale datasets on the interactions among and activities of cellular components. Specifically, we develop an algorithm JointCluster that finds sets of genes that cluster well in multiple networks of interest, such as coexpression networks summarizing correlations among the expression profiles of genes and physical networks describing protein-protein and protein-DNA interactions among genes or gene-products. Our algorithm provides an efficient solution to a well-defined problem of jointly clustering networks, using techniques that permit certain theoretical guarantees on the quality of the detected clustering relative to the optimal clustering. These guarantees coupled with an effective scaling heuristic and the flexibility to handle multiple heterogeneous networks make our method JointCluster an advance over earlier approaches. Simulation results showed JointCluster to be more robust than alternate methods in recovering clusters implanted in networks with high false positive rates. In systematic evaluation of JointCluster and some earlier approaches for combined analysis of the yeast physical network and two gene expression datasets under glucose and ethanol growth conditions, JointCluster discovers clusters that are more consistently enriched for various reference classes capturing different aspects of yeast biology or yield better coverage of the analysed genes. These robust clusters, which are supported across multiple genomic datasets and diverse reference classes, agree with known biology of yeast under these growth conditions, elucidate the genetic control of coordinated transcription, and enable functional predictions for a number of uncharacterized genes. PMID:20419151
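To make "sets of genes that cluster well in multiple networks" concrete, the sketch below scores a candidate gene set by its conductance (cut weight divided by the smaller side's volume) in each network and takes the worst case, so a set scores well only if it is a good cluster everywhere. This is an illustrative criterion chosen for this example, not JointCluster's actual objective or search procedure; the function names and the edge-list format are assumptions.

```python
def conductance(edges, subset):
    """Conductance of `subset` in an undirected weighted graph.

    edges: list of (u, v, weight) tuples. Lower values mean the subset is more
    tightly connected internally relative to its links to the rest of the graph.
    """
    cut = vol_in = vol_out = 0.0
    for u, v, w in edges:
        in_u, in_v = u in subset, v in subset
        # Each edge adds its weight to the degree (volume) of both endpoints.
        vol_in += w * (in_u + in_v)
        vol_out += w * ((not in_u) + (not in_v))
        if in_u != in_v:
            cut += w
    denom = min(vol_in, vol_out)
    return cut / denom if denom > 0 else 1.0


def joint_score(networks, subset):
    """Worst-case conductance across networks (e.g. coexpression and physical)."""
    return max(conductance(edges, subset) for edges in networks)
```

Taking the maximum rather than the average is a deliberate choice: a set that is tight in the coexpression network but scattered in the physical network is penalized.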

  12. Efficient Computation of Approximate Gene Clusters Based on Reference Occurrences

    NASA Astrophysics Data System (ADS)

    Jahn, Katharina

    Whole genome comparison based on the analysis of gene cluster conservation has become a popular approach in comparative genomics. While gene order and gene content as a whole randomize over time, it is observed that certain groups of genes, which are often functionally related, remain co-located across species. However, the conservation is usually not perfect, which makes the identification of these structures, often referred to as approximate gene clusters, a challenging task. In this paper, we present a polynomial time algorithm that computes approximate gene clusters based on reference occurrences. We show that our approach yields results highly comparable to those of a more general approach and allows for approximate gene cluster detection in parameter ranges currently not feasible for non-reference based approaches.
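As a point of reference for the idea of an approximate occurrence, here is a deliberately naive sketch: given a reference gene set and a target genome written as an ordered list of gene-family IDs, it reports every interval whose gene set differs from the reference by at most `max_dist` genes (missing plus extra). It is not the paper's algorithm; the distance measure, names, and inputs are assumptions chosen for illustration.

```python
def approximate_occurrences(reference, genome, max_dist):
    """Return (start, end) intervals (end exclusive) of `genome` whose gene set is
    within `max_dist` of the reference set, counting missing + extra genes."""
    ref = set(reference)
    hits = []
    n = len(genome)
    for i in range(n):
        window = set()
        for j in range(i, n):
            window.add(genome[j])
            missing = len(ref - window)
            extra = len(window - ref)
            if missing + extra <= max_dist:
                hits.append((i, j + 1))
            # Extending the window can only add extra genes, never remove them.
            if extra > max_dist:
                break
    return hits
```

This brute-force scan is quadratic in genome length; the point of reference-based methods is to prune such searches so that larger distance thresholds remain tractable.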

  13. Regulation of clustered protocadherin genes in individual neurons.

    PubMed

    Hirayama, Teruyoshi; Yagi, Takeshi

    2017-09-01

    Individual neurons are basic functional units in the complex system of the brain. One aspect of neuronal individuality is generated by stochastic and combinatorial expression of diverse clustered protocadherins (Pcdhs), encoded by the Pcdha, Pcdhb, and Pcdhg gene clusters, that are critical for several aspects of neural circuit formation. Each clustered Pcdh gene has its own promoter containing conserved sequences and is transcribed by a promoter choice mechanism involving interaction between the promoter and enhancers. A CTCF/Cohesin complex induces this interaction by configuration of DNA-looping in the chromatin structure. At the same time, the semi-stochastic expression of clustered Pcdh genes is regulated in individual neurons by DNA methylation: the methyltransferase Dnmt3b regulates methylation state of individual clustered Pcdh genes during early embryonic stages prior to the establishment of neural stem cells. Several other factors, including Smchd1, also contribute to the regulation of clustered Pcdh gene expression. In addition, psychiatric diseases and early life experiences of individuals can influence expression of clustered Pcdh genes in the brain, through epigenetic alterations. Clustered Pcdh gene expression is thus a significant and highly regulated step in establishing neuronal individuality and generating functional neural circuits in the brain. Copyright © 2017. Published by Elsevier Ltd.

  14. Prokaryotic Gene Clusters: A Rich Toolbox for Synthetic Biology

    PubMed Central

    Fischbach, Michael; Voigt, Christopher A.

    2014-01-01

    Bacteria construct elaborate nanostructures, obtain nutrients and energy from diverse sources, synthesize complex molecules, and implement signal processing to react to their environment. These complex phenotypes require the coordinated action of multiple genes, which are often encoded in a contiguous region of the genome, referred to as a gene cluster. Gene clusters sometimes contain all of the genes necessary and sufficient for a particular function. As an evolutionary mechanism, gene clusters facilitate the horizontal transfer of the complete function between species. Here, we review recent work on a number of clusters whose functions are relevant to biotechnology. Engineering these clusters has been hindered by their regulatory complexity, the need to balance the expression of many genes, and a lack of tools to design and manipulate DNA at this scale. Advances in synthetic biology will enable the large-scale bottom-up engineering of the clusters to optimize their functions, wake up cryptic clusters, or to transfer them between organisms. Understanding and manipulating gene clusters will move towards an era of genome engineering, where multiple functions can be “mixed-and-matched” to create a designer organism. PMID:21154668

  15. Remodelling of a homeobox gene cluster by multiple independent gene reunions in Drosophila.

    PubMed

    Chan, Carolus; Jayasekera, Suvini; Kao, Bryant; Páramo, Moisés; von Grotthuss, Marcin; Ranz, José M

    2015-03-05

    Genome clustering of homeobox genes is often thought to reflect arrangements of tandem gene duplicates maintained by advantageous coordinated gene regulation. Here we analyse the chromosomal organization of the NK homeobox genes, presumed to be part of a single cluster in the Bilaterian ancestor, across 20 arthropods. We find that the ProtoNK cluster was extensively fragmented in some lineages, showing that NK clustering in Drosophila species does not reflect selectively maintained gene arrangements. More importantly, the arrangement of NK and neighbouring genes across the phylogeny supports the conclusion that, in two instances within the Drosophila genus, some cluster remnants became reunited via large-scale chromosomal rearrangements. Simulated scenarios of chromosome evolution indicate that these reunion events are unlikely unless the genome neighbourhoods harbouring the participating genes tend to colocalize in the nucleus. Our results underscore how mechanisms other than tandem gene duplication can result in paralogous gene clustering during genome evolution.

  16. Aspergillus nidulans mutants defective in stc gene cluster regulation.

    PubMed Central

    Butchko, R A; Adams, T H; Keller, N P

    1999-01-01

    The genes involved in the biosynthesis of sterigmatocystin (ST), a toxic secondary metabolite produced by Aspergillus nidulans and an aflatoxin (AF) precursor in other Aspergillus spp., are clustered on chromosome IV of A. nidulans. The sterigmatocystin gene cluster (stc gene cluster) is regulated by the pathway-specific transcription factor aflR. The function of aflR appears to be conserved between ST- and AF-producing aspergilli, as are most of the other genes in the cluster. We describe a novel screen for detecting mutants defective in stc gene cluster activity by use of a genetic block early in the ST biosynthetic pathway that results in the accumulation of the first stable intermediate, norsolorinic acid (NOR), an orange-colored compound visible with the unaided eye. We have mutagenized this NOR-accumulating strain and have isolated 176 Nor(-) mutants, 83 of which appear to be wild type in growth and development. Sixty of these 83 mutations are linked to the stc gene cluster and are likely defects in aflR or known stc biosynthetic genes. Of the 23 mutations not linked to the stc gene cluster, 3 prevent accumulation of NOR due to the loss of aflR expression. PMID:10511551

  17. Bioinformatics Prediction of Polyketide Synthase Gene Clusters from Mycosphaerella fijiensis

    PubMed Central

    Noar, Roslyn D.; Daub, Margaret E.

    2016-01-01

    Mycosphaerella fijiensis, causal agent of black Sigatoka disease of banana, is a Dothideomycete fungus closely related to fungi that produce polyketides important for plant pathogenicity. We utilized the M. fijiensis genome sequence to predict PKS genes and their gene clusters and make bioinformatics predictions about the types of compounds produced by these clusters. Eight PKS gene clusters were identified in the M. fijiensis genome, placing M. fijiensis into the 23rd percentile for the number of PKS genes compared to other Dothideomycetes. Analysis of the PKS domains identified three of the PKS enzymes as non-reducing and two as highly reducing. Gene clusters contained types of genes frequently found in PKS clusters, including genes encoding transporters, oxidoreductases, methyltransferases, and non-ribosomal peptide synthases. Phylogenetic analysis identified a putative PKS cluster encoding melanin biosynthesis. None of the other clusters were closely aligned with genes encoding known polyketides; however, three of the PKS genes fell into clades with clusters encoding alternapyrone, fumonisin, and solanapyrone produced by Alternaria and Fusarium species. A search for homologs among available genomic sequences from 103 Dothideomycetes identified close homologs (>80% similarity) for six of the PKS sequences. One of the PKS sequences was not similar (< 60% similarity) to sequences in any of the 103 genomes, suggesting that it encodes a unique compound. Comparison of the M. fijiensis PKS sequences with those of two other banana pathogens, M. musicola and M. eumusae, showed that these two species have close homologs to five of the M. fijiensis PKS sequences, but three others were not found in either species. RT-PCR and RNA-Seq analysis showed that the melanin PKS cluster was down-regulated in infected banana as compared to growth in culture. Three other clusters, however, were strongly upregulated during disease development in banana, suggesting that they may encode

  18. Mining Bacterial Genomes for Secondary Metabolite Gene Clusters.

    PubMed

    Adamek, Martina; Spohn, Marius; Stegmann, Evi; Ziemert, Nadine

    2017-01-01

    With the emergence of bacterial resistance against frequently used antibiotics, novel antibacterial compounds are urgently needed. Traditional bioactivity-guided drug discovery strategies involve laborious screening efforts and display high rediscovery rates. With the progress in next generation sequencing methods and the knowledge that the majority of antibiotics in clinical use are produced as secondary metabolites by bacteria, mining bacterial genomes for secondary metabolites with antimicrobial activity is a promising approach, which can guide a more time- and cost-effective identification of novel compounds. However, what sounds easy to accomplish comes with several challenges. To date, several tools for the prediction of secondary metabolite gene clusters are available, some of which are based on the detection of signature genes, while others search for specific patterns in gene content or regulation. Apart from the mere identification of gene clusters, several other factors, such as determining cluster boundaries and assessing the novelty of the detected cluster, are important. For this purpose, comparison of the predicted secondary metabolite genes with different cluster and compound databases is necessary. Furthermore, it is advisable to classify detected clusters into gene cluster families. So far, there is no standardized procedure for genome mining; however, different approaches to overcome all of these challenges exist and are addressed in this chapter. We give practical guidance on the workflow for secondary metabolite gene cluster identification, which includes the determination of gene cluster boundaries, addresses problems occurring with the use of draft genomes, and gives an outlook on the different methods for gene cluster classification. Based on comprehensible examples, a protocol is set out that should enable readers to mine their own genome data for interesting secondary metabolites.
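The signature-gene strategy mentioned above can be sketched in a few lines: mark genes whose annotation is a signature type (for example a PKS or NRPS core gene), extend each by a fixed flank to approximate the cluster boundary, and merge overlapping regions. This is a toy illustration; the annotation labels, the flank size, and the function name are assumptions, and real tools refine boundaries with far richer rules.

```python
def predict_clusters(genes, signature_types, flank=10_000):
    """Toy signature-gene cluster caller.

    genes: list of (start, end, annotation) tuples in base-pair coordinates.
    signature_types: annotations that seed a candidate cluster (e.g. {"PKS", "NRPS"}).
    flank: hypothetical fixed extension (bp) on each side of a signature gene.
    Returns merged (start, end) candidate regions sorted by position.
    """
    regions = sorted(
        (max(0, start - flank), end + flank)
        for start, end, annotation in genes
        if annotation in signature_types
    )
    merged = []
    for start, end in regions:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

A fixed flank is the crudest possible boundary rule, which is exactly why the chapter treats boundary determination as a separate problem.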

  19. Bioinformatics Prediction of Polyketide Synthase Gene Clusters from Mycosphaerella fijiensis.

    PubMed

    Noar, Roslyn D; Daub, Margaret E

    2016-01-01

    Mycosphaerella fijiensis, causal agent of black Sigatoka disease of banana, is a Dothideomycete fungus closely related to fungi that produce polyketides important for plant pathogenicity. We utilized the M. fijiensis genome sequence to predict PKS genes and their gene clusters and make bioinformatics predictions about the types of compounds produced by these clusters. Eight PKS gene clusters were identified in the M. fijiensis genome, placing M. fijiensis into the 23rd percentile for the number of PKS genes compared to other Dothideomycetes. Analysis of the PKS domains identified three of the PKS enzymes as non-reducing and two as highly reducing. Gene clusters contained types of genes frequently found in PKS clusters, including genes encoding transporters, oxidoreductases, methyltransferases, and non-ribosomal peptide synthases. Phylogenetic analysis identified a putative PKS cluster encoding melanin biosynthesis. None of the other clusters were closely aligned with genes encoding known polyketides; however, three of the PKS genes fell into clades with clusters encoding alternapyrone, fumonisin, and solanapyrone produced by Alternaria and Fusarium species. A search for homologs among available genomic sequences from 103 Dothideomycetes identified close homologs (>80% similarity) for six of the PKS sequences. One of the PKS sequences was not similar (< 60% similarity) to sequences in any of the 103 genomes, suggesting that it encodes a unique compound. Comparison of the M. fijiensis PKS sequences with those of two other banana pathogens, M. musicola and M. eumusae, showed that these two species have close homologs to five of the M. fijiensis PKS sequences, but three others were not found in either species. RT-PCR and RNA-Seq analysis showed that the melanin PKS cluster was down-regulated in infected banana as compared to growth in culture. Three other clusters, however, were strongly upregulated during disease development in banana, suggesting that they may encode

  20. Hierarchical Dirichlet process model for gene expression clustering

    PubMed Central

    2013-01-01

    Clustering is an important data processing tool for interpreting microarray data and for genomic network inference. In this article, we propose a clustering algorithm based on hierarchical Dirichlet processes (HDP). HDP clustering introduces a hierarchical structure into the statistical model that captures the hierarchical features prevalent in biological data such as gene expression data. We develop a Gibbs sampling algorithm based on the Chinese restaurant metaphor for HDP clustering. We apply the proposed HDP algorithm to both regulatory network segmentation and gene expression clustering. The HDP algorithm is shown to outperform several popular clustering algorithms by revealing the underlying hierarchical structure of the data. For the yeast cell cycle data, we compare the HDP result to the standard result and show that the HDP algorithm provides more information and reduces unnecessary clustering fragments. PMID:23587447
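The "Chinese restaurant" metaphor refers to the seating scheme that defines a Dirichlet process prior over partitions: customer i joins an existing table k with probability proportional to its occupancy n_k, or opens a new table with probability proportional to the concentration parameter alpha. The sketch below simulates only this single-level prior; the HDP extends it to a "franchise" of restaurants sharing dishes, and the paper's Gibbs sampler additionally conditions on the expression data, neither of which is shown here.

```python
import random


def chinese_restaurant_process(n_customers, alpha, rng):
    """Sample a partition from a Chinese restaurant process (DP prior only).

    Returns (assignment, tables): assignment[i] is customer i's table index and
    tables[k] is the number of customers seated at table k.
    """
    tables = []
    assignment = []
    for _ in range(n_customers):
        # Existing table k has weight n_k; a new table has weight alpha.
        # Normalizing by (customers so far + alpha) is done by rng.choices.
        weights = tables + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)  # open a new table (new cluster)
        else:
            tables[k] += 1
        assignment.append(k)
    return assignment, tables
```

Larger alpha values produce more, smaller tables; this "rich get richer" behaviour is what lets the number of clusters be inferred rather than fixed in advance.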

  1. Nonlinear model-based method for clustering periodically expressed genes.

    PubMed

    Tian, Li-Ping; Liu, Li-Zhi; Zhang, Qian-Wei; Wu, Fang-Xiang

    2011-01-01

    Clustering periodically expressed genes from their time-course expression data could help elucidate the molecular mechanisms of the underlying biological processes. In this paper, we propose a nonlinear model-based clustering method for periodically expressed gene profiles. As periodically expressed genes are associated with periodic biological processes, the proposed method naturally assumes that a periodically expressed gene dataset is generated by a number of periodic processes. Each periodic process is modelled by a linear combination of trigonometric sine and cosine functions in time plus a Gaussian noise term. A two-stage method is proposed to estimate the model parameters, and a relocation-iteration algorithm is employed to assign each gene to an appropriate cluster. A bootstrapping method and an average adjusted Rand index (AARI) are employed to measure the quality of clustering. One synthetic dataset and two biological datasets were employed to evaluate the performance of the proposed method. The results show that our method yields better-quality clustering of periodically expressed gene data than other clustering methods (e.g., k-means), and thus it is an effective cluster analysis method for such data.
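The model above (each cluster is a sum of sine and cosine terms plus Gaussian noise) suggests a simple relocation loop: fit one trigonometric curve per cluster by least squares, then move each gene to the cluster whose curve leaves the smallest residual. The sketch below assumes a known period and a single harmonic, and replaces the paper's two-stage parameter estimation with plain least squares, so it illustrates the idea rather than reproducing the published method.

```python
import numpy as np


def design_matrix(t, period, n_harmonics=1):
    """Columns: intercept, then sin/cos pairs at each harmonic of the period."""
    cols = [np.ones_like(t, dtype=float)]
    for h in range(1, n_harmonics + 1):
        w = 2 * np.pi * h / period
        cols += [np.sin(w * t), np.cos(w * t)]
    return np.column_stack(cols)


def relocation_clustering(X, t, period, k, n_iter=20, seed=0):
    """X: (n_genes, n_times) expression matrix. Returns cluster labels."""
    rng = np.random.default_rng(seed)
    A = design_matrix(t, period)
    labels = rng.integers(k, size=X.shape[0])
    for _ in range(n_iter):
        coefs = []
        for c in range(k):
            members = X[labels == c]
            if len(members) == 0:
                # Re-seed an empty cluster with one randomly chosen gene.
                members = X[rng.integers(X.shape[0], size=1)]
            # Shared-parameter least squares over all members reduces to
            # fitting the members' mean profile.
            beta, *_ = np.linalg.lstsq(A, members.mean(axis=0), rcond=None)
            coefs.append(beta)
        fitted = np.array([A @ b for b in coefs])  # (k, n_times)
        # Relocate each gene to the cluster curve with the smallest squared residual.
        sse = ((X[:, None, :] - fitted[None, :, :]) ** 2).sum(axis=2)
        new_labels = sse.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

Fitting a parametric curve per cluster, rather than a centroid as in k-means, is what lets genes with the same phase be grouped even when the sampling is sparse or noisy.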

  2. clusterProfiler: an R package for comparing biological themes among gene clusters.

    PubMed

    Yu, Guangchuang; Wang, Li-Gen; Han, Yanyan; He, Qing-Yu

    2012-05-01

    The increasing volume of quantitative data generated from transcriptomics and proteomics requires integrative strategies for analysis. Here, we present an R package, clusterProfiler, that automates the process of biological-term classification and the enrichment analysis of gene clusters. The analysis module and visualization module were combined into a reusable workflow. Currently, clusterProfiler supports three species: humans, mice, and yeast. Methods provided in this package can be easily extended to other species and ontologies. The clusterProfiler package is released under the Artistic-2.0 License within the Bioconductor project. The source code and vignette are freely available at http://bioconductor.org/packages/release/bioc/html/clusterProfiler.html.
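Enrichment analysis of a gene cluster usually asks whether a term (for example a GO category) annotates more cluster genes than chance would predict. The standard model for this is a one-sided hypergeometric test, sketched below as a standalone Python function. It is not clusterProfiler's code (the package is written in R), and the function name and argument order are assumptions.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation p-value, P(X >= k).

    X ~ Hypergeometric: draw n genes (the cluster) from N background genes,
    of which K carry the annotation term; k cluster genes carry the term.
    """
    upper = min(n, K)
    total = comb(N, n)
    # Exact integer arithmetic for each tail term, then a single division.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total
```

In practice these p-values are computed for every term and then corrected for multiple testing before any term is reported as enriched.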

  3. Sesterterpene ophiobolin biosynthesis involving multiple gene clusters in Aspergillus ustus

    PubMed Central

    Chai, Hangzhen; Yin, Ru; Liu, Yongfeng; Meng, Huiying; Zhou, Xianqiang; Zhou, Guolin; Bi, Xupeng; Yang, Xue; Zhu, Tonghan; Zhu, Weiming; Deng, Zixin; Hong, Kui

    2016-01-01

    Terpenoids are the most diverse and abundant natural products, among which sesterterpenes account for less than 2%, with very few reports on their biosynthesis. Ophiobolins are tricyclic 5–8–5 ring sesterterpenes with potential pharmaceutical application. Aspergillus ustus 094102 from a mangrove rhizosphere produces ophiobolin and other terpenes. Using genome sequencing and in silico analysis combined with in vivo genetic manipulation, we obtained five gene cluster knockout mutants with altered ophiobolin yield. Involvement of the five gene clusters in ophiobolin synthesis was confirmed by investigating the key terpene-synthesis enzyme in each gene cluster, either by gene deletion and complementation or by in vitro verification of protein function. The results demonstrate that ophiobolin skeleton biosynthesis involves five gene clusters, which are responsible for C15, C20, C25, and C30 terpenoid biosynthesis. PMID:27273151

  4. Nearest Neighbor Networks: clustering expression data based on gene neighborhoods.

    PubMed

    Huttenhower, Curtis; Flamholz, Avi I; Landis, Jessica N; Sahi, Sauhard; Myers, Chad L; Olszewski, Kellen L; Hibbs, Matthew A; Siemers, Nathan O; Troyanskaya, Olga G; Coller, Hilary A

    2007-07-12

    The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this amount of data into biological insight remains a daunting task. An important initial step in the analysis of microarray data is clustering of genes with similar behavior. A number of classical techniques are commonly used to perform this task, particularly hierarchical and K-means clustering, and many novel approaches have been suggested recently. While these approaches are useful, they are not without drawbacks; these methods can find clusters in purely random data, and even clusters enriched for biological functions can be skewed towards a small number of processes (e.g. ribosomes). We developed Nearest Neighbor Networks (NNN), a graph-based algorithm to generate clusters of genes with similar expression profiles. This method produces clusters based on overlapping cliques within an interaction network generated from mutual nearest neighborhoods. This focus on nearest neighbors rather than on absolute distance measures allows us to capture clusters with high connectivity even when they are spatially separated, and requiring mutual nearest neighbors allows genes with no sufficiently similar partners to remain unclustered. We compared the clusters generated by NNN with those generated by eight other clustering methods. NNN was particularly successful at generating functionally coherent clusters with high precision, and these clusters generally represented a much broader selection of biological processes than those recovered by other methods. The Nearest Neighbor Networks algorithm is a valuable clustering method that effectively groups genes that are likely to be functionally related. It is particularly attractive due to its simplicity, its success in the analysis of large datasets, and its ability to span a
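The core NNN idea (link genes only if they are mutual nearest neighbors, find cliques in that graph, and merge overlapping cliques into clusters) can be sketched as below. The parameter names (`n_neighbors`, clique `size`, `min_shared`), the use of Pearson correlation, and the merge rule are assumptions for illustration and may differ from the published algorithm's exact choices.

```python
from itertools import combinations
from statistics import mean, pstdev


def pearson(x, y):
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sx * sy)


def mutual_nn_graph(profiles, n_neighbors):
    """Link two genes only if each is among the other's n_neighbors most
    correlated genes; genes without such partners end up isolated."""
    genes = list(profiles)
    top = {}
    for g in genes:
        others = [h for h in genes if h != g]
        others.sort(key=lambda h: pearson(profiles[g], profiles[h]), reverse=True)
        top[g] = set(others[:n_neighbors])
    return {g: {h for h in top[g] if g in top[h]} for g in genes}


def cliques_of_size(graph, size):
    """All cliques with exactly `size` genes (brute force; small graphs only)."""
    found = set()
    for g in graph:
        for combo in combinations(sorted(graph[g]), size - 1):
            members = (g, *combo)
            if all(b in graph[a] for a, b in combinations(members, 2)):
                found.add(frozenset(members))
    return found


def merge_cliques(cliques, min_shared):
    """Repeatedly merge groups that share at least min_shared genes."""
    clusters = [set(c) for c in cliques]
    changed = True
    while changed:
        changed = False
        for i, j in combinations(range(len(clusters)), 2):
            if len(clusters[i] & clusters[j]) >= min_shared:
                other = clusters.pop(j)  # j > i, so index i is unaffected
                clusters[i] |= other
                changed = True
                break
    return clusters
```

Because edges require mutual agreement, a gene whose best matches do not reciprocate, like a noisy profile, is left unclustered instead of being forced into the nearest group.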

  5. Nearest Neighbor Networks: clustering expression data based on gene neighborhoods

    PubMed Central

    Huttenhower, Curtis; Flamholz, Avi I; Landis, Jessica N; Sahi, Sauhard; Myers, Chad L; Olszewski, Kellen L; Hibbs, Matthew A; Siemers, Nathan O; Troyanskaya, Olga G; Coller, Hilary A

    2007-01-01

    Background The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this amount of data into biological insight remains a daunting task. An important initial step in the analysis of microarray data is clustering of genes with similar behavior. A number of classical techniques are commonly used to perform this task, particularly hierarchical and K-means clustering, and many novel approaches have been suggested recently. While these approaches are useful, they are not without drawbacks; these methods can find clusters in purely random data, and even clusters enriched for biological functions can be skewed towards a small number of processes (e.g. ribosomes). Results We developed Nearest Neighbor Networks (NNN), a graph-based algorithm to generate clusters of genes with similar expression profiles. This method produces clusters based on overlapping cliques within an interaction network generated from mutual nearest neighborhoods. This focus on nearest neighbors rather than on absolute distance measures allows us to capture clusters with high connectivity even when they are spatially separated, and requiring mutual nearest neighbors allows genes with no sufficiently similar partners to remain unclustered. We compared the clusters generated by NNN with those generated by eight other clustering methods. NNN was particularly successful at generating functionally coherent clusters with high precision, and these clusters generally represented a much broader selection of biological processes than those recovered by other methods. Conclusion The Nearest Neighbor Networks algorithm is a valuable clustering method that effectively groups genes that are likely to be functionally related. 
It is particularly attractive due to its simplicity, its success in the analysis of large datasets
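    To make the "mutual nearest neighbors" idea concrete, here is a minimal sketch. It assumes 1 − Pearson correlation as the distance and a hypothetical neighborhood size k. It covers only the graph-building step: a gene pair is linked only when each gene ranks the other among its k nearest neighbors, so a gene with no sufficiently similar partner stays unclustered. NNN itself derives clusters from overlapping cliques. Here, connected components stand in for that step as a deliberate simplification, so this is not the published algorithm.

```python
import math
from itertools import combinations


def pearson_distance(a, b):
    """Return 1 - Pearson correlation of two equal-length expression profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 1.0  # a flat profile is treated as uncorrelated
    return 1.0 - cov / (sa * sb)


def mutual_knn_edges(profiles, k):
    """Keep an edge only if each gene ranks the other among its k nearest neighbors."""
    genes = sorted(profiles)
    nearest = {}
    for g in genes:
        ranked = sorted(
            (pearson_distance(profiles[g], profiles[h]), h) for h in genes if h != g
        )
        nearest[g] = {h for _, h in ranked[:k]}
    return [(g, h) for g, h in combinations(genes, 2)
            if h in nearest[g] and g in nearest[h]]


def group_by_components(profiles, k):
    """Simplified grouping: connected components of the mutual-kNN graph.

    NNN itself builds clusters from overlapping cliques; that step is not shown here.
    """
    parent = {g: g for g in profiles}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for g, h in mutual_knn_edges(profiles, k):
        parent[find(g)] = find(h)
    groups = {}
    for g in sorted(profiles):
        groups.setdefault(find(g), []).append(g)
    clusters = [grp for grp in groups.values() if len(grp) > 1]
    unclustered = [grp[0] for grp in groups.values() if len(grp) == 1]
    return clusters, unclustered


# Hypothetical toy profiles: two co-expressed groups plus one gene with no close partner.
profiles = {
    "g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, 2, 3, 5],
    "g4": [4, 3, 2, 1], "g5": [8, 6, 4, 2], "g6": [5, 3, 2, 1],
    "g7": [1, 3, 1, 3],
}
clusters, unclustered = group_by_components(profiles, k=2)
print(clusters)     # [['g1', 'g2', 'g3'], ['g4', 'g5', 'g6']]
print(unclustered)  # ['g7']
```

    Because correlation distance is scale-invariant, g2 (twice g1) joins g1. Gene g7 has nearest neighbors of its own, but none of them ranks g7 in return, so it gets no edge.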

  6. Genomic analyses of bacterial porin-cytochrome gene clusters

    DOE PAGES

    Shi, Liang; Fredrickson, James K.; Zachara, John M.

    2014-11-26

    In this study, the porin-cytochrome (Pcc) protein complex is responsible for trans-outer membrane electron transfer during extracellular reduction of Fe(III) by the dissimilatory metal-reducing bacterium Geobacter sulfurreducens PCA. The identified and characterized Pcc complex of G. sulfurreducens PCA consists of a porin-like outer-membrane protein, a periplasmic 8-heme c-type cytochrome (c-Cyt) and an outer-membrane 12-heme c-Cyt, and the genes encoding the Pcc proteins are clustered in the same regions of the genome (i.e., the pcc gene clusters) of G. sulfurreducens PCA. A survey of additional microbial genomes has identified the pcc gene clusters in all sequenced Geobacter spp. and in other bacteria from six different phyla, including Anaeromyxobacter dehalogenans 2CP-1, A. dehalogenans 2CP-C, Anaeromyxobacter sp. K, Candidatus Kuenenia stuttgartiensis, Denitrovibrio acetiphilus DSM 12809, Desulfurispirillum indicum S5, Desulfurivibrio alkaliphilus AHT2, Desulfurobacterium thermolithotrophum DSM 11699, Desulfuromonas acetoxidans DSM 684, Ignavibacterium album JCM 16511, and Thermovibrio ammonificans HB-1. The number of genes in the pcc gene clusters varies, ranging from two to nine. Similar to the metal-reducing (Mtr) gene clusters of other Fe(III)-reducing bacteria, such as Shewanella spp., additional genes that encode putative c-Cyts with predicted cellular localizations at the cytoplasmic membrane, periplasm and outer membrane often associate with the pcc gene clusters. This suggests that the Pcc-associated c-Cyts may be part of the pathways for extracellular electron transfer reactions. The presence of pcc gene clusters in microorganisms that do not reduce solid-phase Fe(III) and Mn(IV) oxides, such as D. alkaliphilus AHT2 and I. album JCM 16511, also suggests that some of the pcc gene clusters may be involved in extracellular electron transfer reactions with substrates other than Fe(III) and Mn(IV) oxides.

  7. ORFcurator: molecular curation of genes and gene clusters in prokaryotic organisms.

    PubMed

    Rosenfeld, Jeffrey A; Sarkar, Indra N; Planet, Paul J; Figurski, David H; DeSalle, Rob

    2004-12-12

    The ability to detect clusters of functionally related genes in multiple microbial genomes has enormous potential for enhancing studies on gene function and microbial evolution. The staggering amount of new genome sequence data presents a largely untapped resource for gene cluster discovery. To date, gene cluster analysis has not been fully automated, and one must rely on manual, tedious and time-consuming manipulation of sequences. To facilitate accurate and rapid identification of conserved gene clusters, we developed a database-driven web application, called ORFcurator. We used ORFcurator to find clusters containing any genes similar to those of the 14-gene Widespread Colonization Island of Actinobacillus actinomycetemcomitans. From 126 genomes, ORFcurator identified all 73 clusters previously determined by manual searching. ORFcurator and all associated scripts are freely available as supplementary information. http://www.genomecurator.org/ORFcurator/

  8. Transcription factor clusters regulate genes in eukaryotic cells

    PubMed Central

    Hedlund, Erik G; Friemann, Rosmarie; Hohmann, Stefan

    2017-01-01

    Transcription is regulated through the binding of factors to gene promoters to activate or repress expression; however, the mechanisms by which factors find their targets remain unclear. Using single-molecule fluorescence microscopy, we determined in vivo stoichiometry and spatiotemporal dynamics of a GFP-tagged repressor, Mig1, from a paradigm signaling pathway of Saccharomyces cerevisiae. We find that the repressor operates in clusters, which, upon extracellular signal detection, translocate from the cytoplasm, bind to nuclear targets and turn over. Simulations of Mig1 configuration within a 3D yeast genome model, combined with a promoter-specific fluorescent translation reporter, confirmed that clusters are the functional unit of gene regulation. In vitro and structural analysis of reconstituted Mig1 suggests that clusters are stabilized by depletion forces between intrinsically disordered sequences. We observed similar clusters of a co-regulatory activator from a different pathway, supporting a generalized cluster model for transcription factors that reduces promoter search times through intersegment transfer while stabilizing gene expression. PMID:28841133

  9. Refactoring the nitrogen fixation gene cluster from Klebsiella oxytoca.

    PubMed

    Temme, Karsten; Zhao, Dehua; Voigt, Christopher A

    2012-05-01

    Bacterial genes associated with a single trait are often grouped in a contiguous unit of the genome known as a gene cluster. It is difficult to genetically manipulate many gene clusters because of complex, redundant, and integrated host regulation. We have developed a systematic approach to completely specify the genetics of a gene cluster by rebuilding it from the bottom up using only synthetic, well-characterized parts. This process removes all native regulation, including that which is undiscovered. First, all noncoding DNA, regulatory proteins, and nonessential genes are removed. The codons of essential genes are changed to create a DNA sequence as divergent as possible from the wild-type (WT) gene. Recoded genes are computationally scanned to eliminate internal regulation. They are organized into operons and placed under the control of synthetic parts (promoters, ribosome binding sites, and terminators) that are functionally separated by spacer parts. Finally, a controller consisting of genetic sensors and circuits regulates the conditions and dynamics of gene expression. We applied this approach to an agriculturally relevant gene cluster from Klebsiella oxytoca encoding the nitrogen fixation pathway for converting atmospheric N(2) to ammonia. The native gene cluster consists of 20 genes in seven operons and is encoded in 23.5 kb of DNA. We constructed a "refactored" gene cluster that shares little DNA sequence identity with WT and for which the function of every genetic part is defined. This work demonstrates the potential for synthetic biology tools to rewrite the genetics encoding complex biological functions to facilitate access, engineering, and transferability.
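    The codon-changing step ("as divergent as possible from the wild-type gene") can be illustrated with a toy greedy rule. Each codon is replaced by the synonymous codon that differs from it at the most positions, which leaves the protein unchanged. This is a hypothetical sketch, not the authors' pipeline. It ignores the host codon usage, the scan for internal regulation, and the other constraints the abstract describes. It assumes the standard genetic code (NCBI translation table 1) and an in-frame sequence.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order,
# first base varying slowest. "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group codons by the amino acid (or stop) they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def translate(seq):
    """Translate an in-frame coding sequence (length divisible by 3)."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def recode_max_divergence(seq):
    """Greedy toy recoding: swap each codon for the synonym that differs at the most positions."""
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        synonyms = SYNONYMS[CODON_TABLE[codon]]
        # max() keeps the first synonym among ties, so results are deterministic.
        out.append(max(synonyms, key=lambda s: hamming(s, codon)))
    return "".join(out)


wild_type = "ATGCTGAAAGGCTAA"  # hypothetical Met-Leu-Lys-Gly-stop fragment
recoded = recode_max_divergence(wild_type)
print(recoded)                                     # ATGTTAAAGGGTTAG
print(translate(recoded) == translate(wild_type))  # True
print(hamming(recoded, wild_type))                 # 5
```

    Methionine (ATG) has a single codon and cannot be recoded. This hints at why the abstract says "as divergent as possible" rather than fully divergent.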

  10. Clustering Algorithms: Their Application to Gene Expression Data

    PubMed Central

    Oyelade, Jelili; Isewon, Itunuoluwa; Oladipupo, Funke; Aromolaran, Olufemi; Uwoghiren, Efosa; Ameh, Faridah; Achas, Moses; Adebiyi, Ezekiel

    2016-01-01

    Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data can greatly strengthen the understanding of functional genomics. The complexity of biological networks and the number of genes involved increase the challenge of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Clustering is therefore a first step toward addressing these challenges and is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has proven useful in revealing the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and cell subtypes, mining useful information from noisy data, and understanding gene regulation. A further benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to identify the appropriate clustering technique that will guarantee stability and a high degree of accuracy in its analysis procedure. PMID:27932867

  11. Differential retention of gene functions in a secondary metabolite cluster

    USDA-ARS's Scientific Manuscript database

    In fungi, distribution of secondary metabolite (SM) gene clusters is often associated with host- or environment-specific benefits provided by the SMs. In the plant pathogen Alternaria brassicicola (Dothideomycetes), the DEP cluster confers an ability to synthesize the SM depudecin, a histone deacety...

  12. Why biosynthetic genes for chemical defense compounds cluster.

    PubMed

    Takos, Adam M; Rook, Fred

    2012-07-01

    In plants, the genomic clustering of non-homologous genes for the biosynthesis of chemical defense compounds is an emerging theme. Gene clustering is also observed for polymorphic sexual traits under balancing selection, and examples in plants are self-incompatibility and floral dimorphy. The chemical defense pathways organized as gene clusters are self-contained biosynthetic modules under opposing selection pressures and adaptive polymorphisms, often the presence or absence of a functional pathway, are observed in nature. We propose that these antagonistic selection pressures favor closer physical linkage between beneficially interacting alleles as the resulting reduction in recombination maintains a larger fraction of the fitter genotypes. Gene clusters promote the stable inheritance of functional chemical defense pathways in the dynamic ecological context of natural populations. Copyright © 2012 Elsevier Ltd. All rights reserved.

  13. Entropy-based cluster validation and estimation of the number of clusters in gene expression data.

    PubMed

    Novoselova, Natalia; Tom, Igor

    2012-10-01

    Many external and internal validity measures have been proposed to estimate the number of clusters in gene expression data, but as a rule they do not consider the stability of the groupings produced by a clustering algorithm. Building on approaches that assess the predictive power or stability of a partitioning, we propose a new cluster validation measure and a selection procedure to determine a suitable number of clusters. The validity measure is based on estimating the "clearness" of the consensus matrix, which is the result of a resampling clustering scheme, or consensus clustering. Under the proposed selection procedure, the stable clustering result is determined with reference to the validity measure for the null hypothesis encoding the absence of clusters. The final number of clusters is selected by analyzing the distance between the validity plots for the initial and permuted data sets. We applied the selection procedure to several datasets to estimate their clustering results. The proposed procedure produced an accurate and robust estimate of the number of clusters, in agreement with biological knowledge and gold standards of cluster quality.
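    A consensus matrix records, for each pair of items, the fraction of resampling runs in which the two items were clustered together, counting only runs that drew both items. The abstract does not give the authors' "clearness" formula. The score below is a hypothetical stand-in chosen for illustration: one minus the mean binary entropy of the off-diagonal entries. It equals 1.0 when every pair is always or never co-clustered and drops as entries approach 0.5. The permutation-based comparison used to pick the number of clusters is not shown.

```python
import math
from itertools import combinations


def consensus_matrix(n, runs):
    """Fraction of resampling runs in which items i and j were placed in the same cluster.

    Each run is a dict {item_index: cluster_label} covering only the items drawn in that
    run. Pairs never drawn together get NaN.
    """
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for labels in runs:
        for i, j in combinations(sorted(labels), 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    return [[1.0 if i == j else
             (together[i][j] / sampled[i][j] if sampled[i][j] else math.nan)
             for j in range(n)] for i in range(n)]


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable; 0 when p is 0 or 1."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def clearness(matrix):
    """Illustrative score: 1 - mean binary entropy of the off-diagonal consensus entries."""
    values = [matrix[i][j] for i, j in combinations(range(len(matrix)), 2)
              if not math.isnan(matrix[i][j])]
    return 1.0 - sum(binary_entropy(p) for p in values) / len(values)


# Hypothetical resampling runs over four items.
stable_runs = [
    {0: "a", 1: "a", 2: "b", 3: "b"},
    {0: "x", 1: "x", 2: "y", 3: "y"},
    {0: "a", 1: "a", 2: "b"},  # item 3 was not drawn in this run
]
unstable_runs = [
    {0: 0, 1: 0, 2: 1, 3: 1},
    {0: 0, 1: 1, 2: 0, 3: 1},
]
print(clearness(consensus_matrix(4, stable_runs)))    # 1.0
print(clearness(consensus_matrix(4, unstable_runs)))  # about 0.333
```

    Label names only need to be consistent within a run, since only pairwise co-membership is counted.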

  14. The human RHOX gene cluster: target genes and functional analysis of gene variants in infertile men.

    PubMed

    Borgmann, Jennifer; Tüttelmann, Frank; Dworniczak, Bernd; Röpke, Albrecht; Song, Hye-Won; Kliesch, Sabine; Wilkinson, Miles F; Laurentino, Sandra; Gromoll, Jörg

    2016-09-15

    The X-linked reproductive homeobox (RHOX) gene cluster encodes transcription factors preferentially expressed in reproductive tissues. This gene cluster has important roles in male fertility based on phenotypic defects of Rhox-mutant mice and the finding that aberrant RHOX promoter methylation is strongly associated with abnormal human sperm parameters. However, little is known about the molecular mechanism of RHOX function in humans. Using gene expression profiling, we identified genes regulated by members of the human RHOX gene cluster. Some genes were uniquely regulated by RHOXF1 or RHOXF2/2B, while others were regulated by both of these transcription factors. Several of these regulated genes encode proteins involved in processes relevant to spermatogenesis; e.g. stress protection and cell survival. One of the target genes of RHOXF2/2B is RHOXF1, suggesting cross-regulation to enhance transcriptional responses. The potential role of RHOX in human infertility was addressed by sequencing all RHOX exons in a group of 250 patients with severe oligozoospermia. This revealed two mutations in RHOXF1 (c.515G > A and c.522C > T) and four in RHOXF2/2B (-73C > G, c.202G > A, c.411C > T and c.679G > A), of which only one (c.202G > A) was found in a control group of men with normal sperm concentration. Functional analysis demonstrated that c.202G > A and c.679G > A significantly impaired the ability of RHOXF2/2B to regulate downstream genes. Molecular modelling suggested that these mutations alter RHOXF2/F2B protein conformation. By combining clinical data with in vitro functional analysis, we demonstrate how the X-linked RHOX gene cluster may function in normal human spermatogenesis and we provide evidence that it is impaired in human male fertility.

  15. Hox gene clusters in the Indonesian coelacanth, Latimeria menadoensis.

    PubMed

    Koh, Esther G L; Lam, Kevin; Christoffels, Alan; Erdmann, Mark V; Brenner, Sydney; Venkatesh, Byrappa

    2003-02-04

    The Hox genes encode transcription factors that play a key role in specifying body plans of metazoans. They are organized into clusters that contain up to 13 paralogue group members. The complex morphology of vertebrates has been attributed to the duplication of Hox clusters during vertebrate evolution. In contrast to the single Hox cluster in the amphioxus (Branchiostoma floridae), an invertebrate-chordate, mammals have four clusters containing 39 Hox genes. Ray-finned fishes (Actinopterygii) such as zebrafish and fugu possess more than four Hox clusters. The coelacanth occupies a basal phylogenetic position among lobe-finned fishes (Sarcopterygii), which gave rise to the tetrapod lineage. The lobe fins of sarcopterygians are considered to be the evolutionary precursors of tetrapod limbs. Thus, the characterization of Hox genes in the coelacanth should provide insights into the origin of tetrapod limbs. We have cloned the complete second exon of 33 Hox genes from the Indonesian coelacanth, Latimeria menadoensis, by extensive PCR survey and genome walking. Phylogenetic analysis shows that 32 of these genes have orthologs in the four mammalian HOX clusters, including three genes (HoxA6, D1, and D8) that are absent in ray-finned fishes. The remaining coelacanth gene is an ortholog of hoxc1 found in zebrafish but absent in mammals. Our results suggest that coelacanths have four Hox clusters bearing a gene complement more similar to mammals than to ray-finned fishes, but with an additional gene, HoxC1, which has been lost during the evolution of mammals from lobe-finned fishes.

  16. Clustering of genes necessary for hydrogen oxidation in Rhodobacter capsulatus.

    PubMed Central

    Xu, H W; Wall, J D

    1991-01-01

    Three cosmids previously shown to contain information necessary for the expression of uptake hydrogenase in Rhodobacter capsulatus were found to be present in a cluster on the chromosome. Earlier genetic experiments suggested the presence of at least six genes essential for hydrogenase activity; these genes are now shown to lie in a region of approximately 18 kb that includes the structural genes for the enzyme. A potential response regulator gene was sequenced as part of the hup gene region. PMID:2007559

  17. 3D visualization of gene clusters and networks

    NASA Astrophysics Data System (ADS)

    Zhang, Leishi; Sheng, Weiguo; Liu, Xiaohui

    2005-03-01

    In this paper, we try to provide a global view of the DNA microarray gene expression data analysis and modeling process by combining novel and effective visualization techniques with data mining algorithms. An integrated framework has been proposed to model and visualize short, high-dimensional gene expression data. The framework reduces the dimensionality of variables before applying an appropriate temporal modeling method. A prototype has been built using Java3D to visualize the framework. The prototype takes gene expression data as input, clusters the genes, displays the clustering results using a novel graph layout algorithm, models individual gene clusters using a Dynamic Bayesian Network, and then visualizes the modeling results using simple but effective visualization techniques.

  18. SMART: Unique Splitting-While-Merging Framework for Gene Clustering

    PubMed Central

    Fa, Rui; Roberts, David J.; Nandi, Asoke K.

    2014-01-01

    Successful clustering algorithms are highly dependent on parameter settings. Clustering performance degrades significantly unless parameters are properly set, yet it is difficult to set these parameters a priori. To address this issue, we propose a unique splitting-while-merging clustering framework, named "splitting merging awareness tactics" (SMART), which does not require any a priori knowledge of either the number of clusters or even the possible range of this number. Unlike existing self-splitting algorithms, which over-cluster the dataset into a large number of clusters and then merge some similar clusters, our framework can split and merge clusters automatically during the process and produces the most reliable clustering results by intrinsically integrating many clustering techniques and tasks. The SMART framework is implemented with two distinct clustering paradigms in two algorithms: competitive learning and finite mixture model. Nevertheless, within the proposed SMART framework, many other algorithms can be derived for different clustering paradigms. The minimum message length algorithm is integrated into the framework as the clustering selection criterion. The usefulness of the SMART framework and its algorithms is tested on demonstration datasets and simulated gene expression datasets. Moreover, two real microarray gene expression datasets are studied using this approach. Across many performance metrics, all numerical results show that SMART is superior to the existing self-splitting algorithms and traditional algorithms it was compared with. Three main properties of the proposed SMART framework are: (1) it needs no parameters dependent on the respective dataset or a priori knowledge about the datasets, (2) it is extensible to many different applications, and (3) it offers superior performance compared with counterpart algorithms. PMID:24714159

  19. Identifying a gene expression signature of cluster headache in blood

    PubMed Central

    Eising, Else; Pelzer, Nadine; Vijfhuizen, Lisanne S.; Vries, Boukje de; Ferrari, Michel D.; ‘t Hoen, Peter A. C.; Terwindt, Gisela M.; van den Maagdenberg, Arn M. J. M.

    2017-01-01

    Cluster headache is a relatively rare headache disorder, typically characterized by multiple daily, short-lasting attacks of excruciating, unilateral (peri-)orbital or temporal pain associated with autonomic symptoms and restlessness. To better understand the pathophysiology of cluster headache, we used RNA sequencing to identify differentially expressed genes and pathways in whole blood of patients with episodic (n = 19) or chronic (n = 20) cluster headache in comparison with headache-free controls (n = 20). Gene expression data were analysed by gene and by module of co-expressed genes with particular attention to previously implicated disease pathways including hypocretin dysregulation. Only moderate gene expression differences were identified and no associations were found with previously reported pathogenic mechanisms. At the level of functional gene sets, associations were observed for genes involved in several brain-related mechanisms such as GABA receptor function and voltage-gated channels. In addition, genes and modules of co-expressed genes showed a role for intracellular signalling cascades, mitochondria and inflammation. Although larger study samples may be required to identify the full range of involved pathways, these results indicate a role for mitochondria, intracellular signalling and inflammation in cluster headache. PMID:28074859

  20. Minimum spanning trees for gene expression data clustering.

    PubMed

    Xu, Y; Olman, V; Xu, D

    2001-01-01

    This paper describes a new framework for microarray gene-expression data clustering. The foundation of this framework is a minimum spanning tree (MST) representation of a set of multi-dimensional gene expression data. A key property of this representation is that each cluster of the expression data corresponds to one subtree of the MST, which rigorously converts a multi-dimensional clustering problem into a tree partitioning problem. We have demonstrated that, although the inter-data relationship is greatly simplified in the MST representation, no essential information is lost for the purpose of clustering. Two key advantages of representing a set of multi-dimensional data as an MST are: (1) the simple structure of a tree facilitates efficient implementations of rigorous clustering algorithms, which otherwise are highly computationally challenging; and (2) because MST-based clustering does not depend on the detailed geometric shape of a cluster, it can overcome many of the problems faced by classical clustering algorithms. Based on the MST representation, we have developed a number of rigorous and efficient clustering algorithms, including two with guaranteed global optimality. We have implemented these algorithms in a software package, EXCAVATOR. To demonstrate its effectiveness, we tested it on two data sets: expression data from the yeast Saccharomyces cerevisiae, and Arabidopsis expression data in response to chitin elicitation.
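    The idea that "each cluster corresponds to one subtree of the MST" can be shown with the simplest MST partitioning rule: build the tree, then delete its k − 1 longest edges. This is a sketch, not EXCAVATOR's algorithms, which the abstract does not detail. It assumes Euclidean distance between expression vectors, uses Prim's algorithm on the complete graph, and takes a hypothetical cluster count k.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (length, u, v) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each point to the tree
    link = [-1] * n        # tree vertex providing that connection
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if link[u] >= 0:
            edges.append((best[u], link[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], link[v] = d, u
    return edges


def mst_clusters(points, k):
    """Split the MST into k subtrees by deleting its k-1 longest edges."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    edges = sorted(mst_edges(points))
    kept = edges[:len(edges) - (k - 1)]
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for _, u, v in kept:
        parent[find(u)] = find(v)
    ids = {}
    # Number clusters in order of first appearance.
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


# Hypothetical 2-D expression profiles forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(mst_clusters(points, k=2))  # [0, 0, 0, 1, 1, 1]
```

    Only the tree's edges are consulted after construction, so the groups need not be compact or spherical. This reflects advantage (2) in the abstract.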

  1. Gene Cluster Encoding Cholate Catabolism in Rhodococcus spp.

    PubMed Central

    Wilbrink, Maarten H.; Casabon, Israël; Stewart, Gordon R.; Liu, Jie; van der Geize, Robert; Eltis, Lindsay D.

    2012-01-01

    Bile acids are highly abundant steroids with important functions in vertebrate digestion. Their catabolism by bacteria is an important component of the carbon cycle, contributes to gut ecology, and has potential commercial applications. We found that Rhodococcus jostii RHA1 grows well on cholate, as well as on its conjugates, taurocholate and glycocholate. The transcriptome of RHA1 growing on cholate revealed 39 genes upregulated on cholate, occurring in a single gene cluster. Reverse transcriptase quantitative PCR confirmed that selected genes in the cluster were upregulated 10-fold on cholate versus on cholesterol. One of these genes, kshA3, encoding a putative 3-ketosteroid-9α-hydroxylase, was deleted and found essential for growth on cholate. Two coenzyme A (CoA) synthetases encoded in the cluster, CasG and CasI, were heterologously expressed. CasG was shown to transform cholate to cholyl-CoA, thus initiating side chain degradation. CasI was shown to form CoA derivatives of steroids with isopropanoyl side chains, likely occurring as degradation intermediates. Orthologous gene clusters were identified in all available Rhodococcus genomes, as well as that of Thermomonospora curvata. Moreover, Rhodococcus equi 103S, Rhodococcus ruber Chol-4 and Rhodococcus erythropolis SQ1 each grew on cholate. In contrast, several mycolic acid bacteria lacking the gene cluster were unable to grow on cholate. Our results demonstrate that the above-mentioned gene cluster encodes cholate catabolism and is distinct from a more widely occurring gene cluster encoding cholesterol catabolism. PMID:23024343

  2. Secondary metabolic gene clusters: evolutionary toolkits for chemical innovation.

    PubMed

    Osbourn, Anne

    2010-10-01

    Microbes and plants produce a huge array of secondary metabolites that have important ecological functions. These molecules have long been exploited in medicine as antibiotics, anticancer and anti-infective agents and for a wide range of other applications. Gene clusters for secondary metabolic pathways are common in bacteria and filamentous fungi, and examples have now been discovered in plants. Here, current knowledge of gene clusters across the kingdoms is evaluated with the aim of trying to understand the rules behind cluster existence and evolution. Such knowledge will be crucial in learning how to activate the enormous number of 'silent' gene clusters being revealed by whole-genome sequencing and hence in making available a wealth of novel compounds for evaluation as drug leads and other bioactives. It could also facilitate the development of crop plants with enhanced pest or disease resistance, improved nutritional qualities and/or elevated levels of high-value products.

  3. Clustering gene expression data using a diffraction‐inspired framework

    PubMed Central

    2012-01-01

    Background Recent developments in microarray technology have allowed the simultaneous measurement of gene expression levels. The large amount of captured data challenges conventional statistical tools for analysing and finding inherent correlations between genes and samples. The unsupervised clustering approach is often used, resulting in the development of a wide variety of algorithms. Typical clustering algorithms require selecting certain parameters to operate, for instance the number of expected clusters, as well as defining a similarity measure to quantify the distance between data points. The diffraction‐based clustering algorithm, however, is designed to overcome this need for user‐defined parameters, as it is able to automatically search the data for any underlying structure. Methods The diffraction‐based clustering algorithm presented in this paper is tested using five well‐known expression datasets pertaining to cancerous tissue samples. The clustering results are then compared to those obtained from conventional algorithms such as k‐means, fuzzy c‐means, the self‐organising map, the hierarchical clustering algorithm, the Gaussian mixture model and density‐based spatial clustering of applications with noise (DBSCAN). The performance of each algorithm is measured using an average external criterion and an average validity index. Results The diffraction‐based clustering algorithm is shown to be independent of the number of clusters, as the algorithm searches the feature space and requires no form of parameter selection. The results show that the diffraction‐based clustering algorithm performs significantly better on the real biological datasets than the other existing algorithms. Conclusion The results of the diffraction‐based clustering algorithm presented in this paper suggest that the method can provide researchers with a new tool for successfully analysing microarray data. PMID:23164195

  4. Characterization of the Largest Effector Gene Cluster of Ustilago maydis

    PubMed Central

    Vincon, Volker; Kahmann, Regine

    2014-01-01

    In the genome of the biotrophic plant pathogen Ustilago maydis, many of the genes coding for secreted protein effectors modulating virulence are arranged in gene clusters. The vast majority of these genes encode novel proteins whose expression is coupled to plant colonization. The largest of these gene clusters, cluster 19A, encodes 24 secreted effectors. Deletion of the entire cluster results in severe attenuation of virulence. Here we present the functional analysis of this genomic region. We show that a 19A deletion mutant behaves like an endophyte, i.e. is still able to colonize plants and complete the infection cycle. However, tumors, the most conspicuous symptoms of maize smut disease, are only rarely formed and fungal biomass in infected tissue is significantly reduced. The generation and analysis of strains carrying sub-deletions identified several genes significantly contributing to tumor formation after seedling infection. Another of the effectors could be linked specifically to anthocyanin induction in the infected tissue. As the individual contributions of these genes to tumor formation were small, we studied the response of maize plants to the whole cluster mutant as well as to several individual mutants by array analysis. This revealed distinct plant responses, demonstrating that the respective effectors have discrete plant targets. We propose that the analysis of plant responses to effector mutant strains that lack a strong virulence phenotype may be a general way to visualize differences in effector function. PMID:24992561

  5. Characterization of the largest effector gene cluster of Ustilago maydis.

    PubMed

    Brefort, Thomas; Tanaka, Shigeyuki; Neidig, Nina; Doehlemann, Gunther; Vincon, Volker; Kahmann, Regine

    2014-07-01

    In the genome of the biotrophic plant pathogen Ustilago maydis, many of the genes coding for secreted protein effectors modulating virulence are arranged in gene clusters. The vast majority of these genes encode novel proteins whose expression is coupled to plant colonization. The largest of these gene clusters, cluster 19A, encodes 24 secreted effectors. Deletion of the entire cluster results in severe attenuation of virulence. Here we present the functional analysis of this genomic region. We show that a 19A deletion mutant behaves like an endophyte, i.e. is still able to colonize plants and complete the infection cycle. However, tumors, the most conspicuous symptoms of maize smut disease, are only rarely formed and fungal biomass in infected tissue is significantly reduced. The generation and analysis of strains carrying sub-deletions identified several genes significantly contributing to tumor formation after seedling infection. Another of the effectors could be linked specifically to anthocyanin induction in the infected tissue. As the individual contributions of these genes to tumor formation were small, we studied the response of maize plants to the whole cluster mutant as well as to several individual mutants by array analysis. This revealed distinct plant responses, demonstrating that the respective effectors have discrete plant targets. We propose that the analysis of plant responses to effector mutant strains that lack a strong virulence phenotype may be a general way to visualize differences in effector function.

  6. Heterologous Expression of Novobiocin and Clorobiocin Biosynthetic Gene Clusters

    PubMed Central

    Eustáquio, Alessandra S.; Gust, Bertolt; Galm, Ute; Li, Shu-Ming; Chater, Keith F.; Heide, Lutz

    2005-01-01

    A method was developed for the heterologous expression of biosynthetic gene clusters in different Streptomyces strains and for the modification of these clusters by single or multiple gene replacements or gene deletions with unprecedented speed and versatility. λ-Red-mediated homologous recombination was used for genetic modification of the gene clusters, and the attachment site and integrase of phage φC31 were employed for the integration of these clusters into the heterologous hosts. This method was used to express the gene clusters of the aminocoumarin antibiotics novobiocin and clorobiocin in the well-studied strains Streptomyces coelicolor and Streptomyces lividans, which, in contrast to the natural producers, can be easily genetically manipulated. S. coelicolor M512 derivatives produced the respective antibiotic in yields comparable to those of natural producer strains, whereas S. lividans TK24 derivatives were at least five times less productive. This method could also be used to carry out functional investigations. Shortening of the cosmids' inserts showed which genes are essential for antibiotic production. PMID:15870333

  7. An improved algorithm for clustering gene expression data.

    PubMed

    Bandyopadhyay, Sanghamitra; Mukhopadhyay, Anirban; Maulik, Ujjwal

    2007-11-01

    Recent advances in microarray technology allow simultaneous monitoring of the expression levels of a large number of genes over different time points. Clustering is an important tool for analyzing such microarray data, which are typically characterized by inherent uncertainty, noise and imprecision. In this article, a two-stage clustering algorithm is proposed that employs a recently proposed variable string length genetic scheme and a multiobjective genetic clustering algorithm. It is based on the novel concept of points having significant membership in multiple classes. An iterated version of the well-known Fuzzy C-Means is also utilized for clustering. The significant superiority of the proposed two-stage clustering algorithm over the average linkage method, Self Organizing Map (SOM) and a recently developed weighted Chinese restaurant-based clustering method (CRC), all widely used methods for clustering gene expression data, is established on a variety of artificial and publicly available real-life data sets. The biological relevance of the clustering solutions is also analyzed.
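
    The iterated Fuzzy C-Means step builds on standard fuzzy c-means. In that method, each point receives a graded membership in every cluster rather than a single hard label. The sketch below is a plain-Python version of textbook fuzzy c-means only, with fuzzifier `m = 2`, random initial memberships, and a stop once memberships change by less than `tol`. The function name and defaults are illustrative assumptions, and the sketch does not reproduce the authors' two-stage genetic algorithm.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=300, tol=1e-6, seed=0):
    """Textbook fuzzy c-means on a list of equal-length numeric tuples.

    Returns (centers, memberships); memberships[i][j] is the degree to which
    point i belongs to cluster j, and every row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(iters):
        # Centers are means of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            wsum = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / wsum for d in range(dim)
            ))
        # Memberships depend on the relative distance to each center.
        max_change = 0.0
        for i in range(n):
            dists = [
                max(sum((points[i][d] - ctr[d]) ** 2 for d in range(dim)) ** 0.5, 1e-12)
                for ctr in centers
            ]
            new_row = [
                1.0 / sum((dists[j] / dists[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                for j in range(c)
            ]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if max_change < tol:
            break
    return centers, u
```

    A point lying between two groups ends up with split memberships. This is the kind of point the abstract describes as having significant membership in multiple classes.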

  8. Cluster J Mycobacteriophages: Intron Splicing in Capsid and Tail Genes

    PubMed Central

    Pope, Welkin H.; Jacobs-Sera, Deborah; Best, Aaron A.; Broussard, Gregory W.; Connerly, Pamela L.; Dedrick, Rebekah M.; Kremer, Timothy A.; Offner, Susan; Ogiefo, Amenawon H.; Pizzorno, Marie C.; Rockenbach, Kate; Russell, Daniel A.; Stowe, Emily L.; Stukey, Joseph; Thibault, Sarah A.; Conway, James F.; Hendrix, Roger W.; Hatfull, Graham F.

    2013-01-01

    Bacteriophages isolated on Mycobacterium smegmatis mc²155 represent many distinct genomes sharing little or no DNA sequence similarity. The genomes are architecturally mosaic and are replete with genes of unknown function. A new group of genomes sharing substantial nucleotide sequence similarity constitutes Cluster J. The six mycobacteriophages forming Cluster J are morphologically members of the Siphoviridae, but have unusually long genomes ranging from 106.3 to 117 kbp. Reconstruction of the capsid of mycobacteriophage BAKA by cryo-electron microscopy reveals an icosahedral structure with a triangulation number of 13. All six phages are temperate and homoimmune, and prophage establishment involves integration into a tRNA-Leu gene not previously identified as a mycobacterial attB site for phage integration. The Cluster J genomes provide two examples of intron splicing within the virion structural genes, one in a major capsid subunit gene and one in a tail gene. These genomes also contain numerous free-standing HNH homing endonucleases, and comparative analysis reveals how these could contribute to genome mosaicism. The unusual Cluster J genomes provide new insights into phage genome architecture, gene function, capsid structure, gene mobility, intron splicing, and evolution. PMID:23874930

  9. Identification of Nitrogen-Fixing Genes and Gene Clusters from Metagenomic Library of Acid Mine Drainage

    PubMed Central

    Yin, Huaqun; Liang, Yili; Cong, Jing; Liu, Xueduan

    2014-01-01

    Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms, and little is known about the diversity of nitrogen-fixing genes and the structure of nif gene clusters in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. In parallel, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen for novel nif gene clusters. Metagenomic analyses identified 742 sequences as nif genes, including the structural subunit genes nifH, nifD and nifK and various additional genes. The AMD community is heavily dominated by the genus Acidithiobacillus; however, the phylogenetic diversity of nitrogen-fixing microorganisms in the AMD community is much higher than previously thought. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was recovered by screening the metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster differs significantly from that of H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that the nif genes in this cluster are organized into distinct transcription units. The nifQ gene falls into the same transcription unit as the fixABCX genes, an arrangement not previously reported in other diazotrophs. Together, these results indicate that the AMD community harbors more novel diazotrophs than previously recognized. PMID:24498417

  10. Identification of nitrogen-fixing genes and gene clusters from metagenomic library of acid mine drainage.

    PubMed

    Dai, Zhimin; Guo, Xue; Yin, Huaqun; Liang, Yili; Cong, Jing; Liu, Xueduan

    2014-01-01

    Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms, and little is known about the diversity of nitrogen-fixing genes and the structure of nif gene clusters in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. In parallel, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen for novel nif gene clusters. Metagenomic analyses identified 742 sequences as nif genes, including the structural subunit genes nifH, nifD and nifK and various additional genes. The AMD community is heavily dominated by the genus Acidithiobacillus; however, the phylogenetic diversity of nitrogen-fixing microorganisms in the AMD community is much higher than previously thought. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was recovered by screening the metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster differs significantly from that of H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that the nif genes in this cluster are organized into distinct transcription units. The nifQ gene falls into the same transcription unit as the fixABCX genes, an arrangement not previously reported in other diazotrophs. Together, these results indicate that the AMD community harbors more novel diazotrophs than previously recognized.

  11. Clustered Genes Involved in Cyclopiazonic Acid Production are Next to the Aflatoxin Biosynthesis Gene Cluster in Aspergillus flavus

    USDA-ARS's Scientific Manuscript database

    Cyclopiazonic acid (CPA), an indole-tetramic acid toxin, is produced by many species of Aspergillus and Penicillium. In addition to CPA Aspergillus flavus produces polyketide-derived carcinogenic aflatoxins (AFs). AF biosynthesis genes form a gene cluster in a subtelomeric region. Isolates of A. fla...

  12. Cloning large natural product gene clusters from the environment: Piecing environmental DNA gene clusters back together with TAR

    PubMed Central

    Kim, Jeffrey H; Feng, Zhiyang; Bauer, John D; Kallifidas, Dimitris; Calle, Paula Y; Brady, Sean F

    2010-01-01

    A single gram of soil can contain thousands of unique bacterial species, of which only a small fraction is regularly cultured in the laboratory. Although the fermentation of cultured microorganisms has provided access to numerous bioactive secondary metabolites, with these same methods it is not possible to characterize the natural products encoded by the uncultured majority. The heterologous expression of biosynthetic gene clusters cloned from DNA extracted directly from environmental samples (eDNA) has the potential to provide access to the chemical diversity encoded in the genomes of uncultured bacteria. One of the challenges facing this approach has been that many natural product biosynthetic gene clusters are too large to be readily captured on a single fragment of cloned eDNA. The reassembly of large eDNA-derived natural product gene clusters from collections of smaller overlapping clones represents one potential solution to this problem. Unfortunately, traditional methods for the assembly of large DNA sequences from multiple overlapping clones can be technically challenging. Here we present a general experimental framework that permits the recovery of large natural product biosynthetic gene clusters on overlapping soil-derived eDNA cosmid clones and the reassembly of these large gene clusters using transformation-associated recombination (TAR) in Saccharomyces cerevisiae. The development of practical methods for the rapid assembly of biosynthetic gene clusters from collections of overlapping eDNA clones is an important step toward being able to functionally study larger natural product gene clusters from uncultured bacteria. © 2010 Wiley Periodicals, Inc. Biopolymers 93: 833–844, 2010. PMID:20577994

  13. Genomic analyses of bacterial porin-cytochrome gene clusters

    SciTech Connect

    Shi, Liang; Fredrickson, James K.; Zachara, John M.

    2014-11-26

    The porin-cytochrome (Pcc) protein complex is responsible for trans-outer membrane electron transfer during extracellular reduction of Fe(III) by the dissimilatory metal-reducing bacterium Geobacter sulfurreducens PCA. The characterized Pcc complex of G. sulfurreducens PCA consists of a porin-like outer-membrane protein, a periplasmic 8-heme c-type cytochrome (c-Cyt) and an outer-membrane 12-heme c-Cyt, and the genes encoding the Pcc proteins are clustered in the same regions of the G. sulfurreducens PCA genome (i.e., the pcc gene clusters). In this study, a survey of additional microbial genomes identified pcc gene clusters in all sequenced Geobacter spp. and in other bacteria from six different phyla, including Anaeromyxobacter dehalogenans 2CP-1, A. dehalogenans 2CP-C, Anaeromyxobacter sp. K, Candidatus Kuenenia stuttgartiensis, Denitrovibrio acetiphilus DSM 12809, Desulfurispirillum indicum S5, Desulfurivibrio alkaliphilus AHT2, Desulfurobacterium thermolithotrophum DSM 11699, Desulfuromonas acetoxidans DSM 684, Ignavibacterium album JCM 16511, and Thermovibrio ammonificans HB-1. The number of genes in the pcc gene clusters varies from two to nine. As with the metal-reducing (Mtr) gene clusters of other Fe(III)-reducing bacteria, such as Shewanella spp., additional genes encoding putative c-Cyts with predicted cellular localizations at the cytoplasmic membrane, periplasm and outer membrane are often associated with the pcc gene clusters. This suggests that the Pcc-associated c-Cyts may be part of the pathways for extracellular electron transfer reactions. The presence of pcc gene clusters in microorganisms that do not reduce solid-phase Fe(III) and Mn(IV) oxides, such as D. alkaliphilus AHT2 and I. album JCM 16511, also suggests that some of the pcc gene clusters may be involved in extracellular

  14. Transcriptional regulation of the novobiocin biosynthetic gene cluster.

    PubMed

    Dangel, Volker; Härle, Johannes; Goerke, Christiane; Wolz, Christiane; Gust, Bertolt; Pernodet, Jean-Luc; Heide, Lutz

    2009-12-01

    The aminocoumarin antibiotic novobiocin is a gyrase inhibitor formed by a Streptomyces strain. The biosynthetic gene cluster of novobiocin spans 23.4 kb and contains 20 coding sequences, among them the two regulatory genes novE and novG. We investigated the location of transcriptional promoters within this cluster by insertion of transcriptional terminator cassettes and RT-PCR analysis of the resulting mutants. The cluster was found to contain eight DNA regions with promoter activity. The regulatory protein NovG binds to a previously identified binding site within the promoter region located upstream of novH, but apparently not to any of the other seven promoters. Quantitative real-time PCR was used to compare the number of transcripts in a strain carrying an intact novobiocin cluster with strains carrying mutated clusters. Both in-frame deletion of the regulatory gene novG and insertion of a terminator cassette into the biosynthetic gene novH led to a strong reduction of the number of transcripts of the genes located between novH and novW. This suggested that these 16 biosynthetic genes form a single operon. Three internal promoters are located within this operon but appear to be of minor importance, if any, under our experimental conditions. Transcription of novG was found to depend on the presence of NovE, suggesting that the two regulatory genes, novE and novG, act in a cascade-like mechanism. The resistance gene gyrB(R), encoding an aminocoumarin-resistant gyrase B subunit, may initially be co-transcribed with the genes from novH to novW. However, when the gyrase inhibitor novobiocin accumulates in the cultures, gyrB(R) is transcribed from its own promoter. Previous work has suggested that this promoter is controlled by the superhelical density of chromosomal DNA.

  15. Interpolation based consensus clustering for gene expression time series.

    PubMed

    Chiu, Tai-Yu; Hsu, Ting-Chieh; Yen, Chia-Cheng; Wang, Jia-Shung

    2015-04-16

    Unsupervised analyses such as clustering are essential tools for interpreting time-series expression data from microarrays. Several clustering algorithms have been developed to analyze gene expression data. Early methods such as k-means, hierarchical clustering, and self-organizing maps are popular for their simplicity. However, because of noise and measurement uncertainty, these common algorithms have low accuracy. Moreover, because gene expression is a temporal process, the relationship between successive time points should be considered in the analyses. In addition, biological processes are generally continuous, whereas datasets collected from time-series experiments often have an insufficient number of data points; as a result, compensating for missing data can also be an issue. An affinity propagation-based clustering algorithm for time-series gene expression data is proposed. The algorithm explores the relationships between genes using a sliding-window mechanism to extract a large number of features. In addition, the time-course datasets are resampled with spline interpolation to predict unobserved values. Finally, a consensus process is applied to enhance the robustness of the method. Several real gene expression datasets were analyzed to demonstrate the accuracy and efficiency of the algorithm. The proposed algorithm benefits from the use of cubic B-spline interpolation, a sliding window, affinity propagation, a gene relativity graph, and a consensus process, and, as a result, provides both appropriate and effective clustering of time-series gene expression data. The proposed method was tested with gene expression data from the Yeast galactose dataset, the Yeast cell-cycle dataset (Y5), and the Yeast sporulation dataset, and the results illustrated the relationships between the expressed genes, which may give some insights into the biological processes involved.
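
    The resampling step fills in unobserved time points by spline interpolation. As a rough illustration, the sketch below fits a natural cubic interpolating spline, with zero second derivative at both ends, and evaluates it anywhere in the sampled range. This is a stand-in based on an assumption: the paper names cubic B-splines. The function `natural_cubic_spline` is hypothetical and is not part of the authors' code.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing; second derivatives are zero at both ends.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    M = [0.0] * n  # second derivatives at the knots; M[0] = M[n-1] = 0
    if n > 2:
        # Tridiagonal system for M[1..n-2], solved with the Thomas algorithm.
        a = [h[i - 1] for i in range(1, n - 1)]              # sub-diagonal
        b = [2 * (h[i - 1] + h[i]) for i in range(1, n - 1)]  # diagonal
        c = [h[i] for i in range(1, n - 1)]                  # super-diagonal
        d = [6 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
             for i in range(1, n - 1)]
        m = n - 2
        for i in range(1, m):
            w = a[i] / b[i - 1]
            b[i] -= w * c[i - 1]
            d[i] -= w * d[i - 1]
        sol = [0.0] * m
        sol[-1] = d[-1] / b[-1]
        for i in range(m - 2, -1, -1):
            sol[i] = (d[i] - c[i] * sol[i + 1]) / b[i]
        M[1:n - 1] = sol

    def evaluate(x):
        # Find the interval [xs[i], xs[i+1]] containing x (clamped at the ends).
        i = min(max(bisect_right(xs, x) - 1, 0), n - 2)
        t0, t1, hi = xs[i + 1] - x, x - xs[i], h[i]
        return (M[i] * t0 ** 3 / (6 * hi) + M[i + 1] * t1 ** 3 / (6 * hi)
                + (ys[i] / hi - M[i] * hi / 6) * t0
                + (ys[i + 1] / hi - M[i + 1] * hi / 6) * t1)

    return evaluate
```

    Suppose a time course is sampled at t = 0, 1, 3 and 4. Then `natural_cubic_spline([0, 1, 3, 4], values)(2)` gives an interpolated estimate for the missing t = 2 measurement.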

  16. An Agent-Based Clustering Approach for Gene Selection in Gene Expression Microarray.

    PubMed

    Ramos, Juan; Castellanos-Garzón, José A; González-Briones, Alfonso; de Paz, Juan F; Corchado, Juan M

    2017-03-09

    Gene selection is a major research area in microarray analysis that seeks to discover differentially expressed genes for a particular target annotation. Such genes, often called informative genes, are able to differentiate tissue samples belonging to different classes of the studied disease. Despite the wide range of existing proposals, the complexity of this problem remains a challenge today. This research proposes a gene selection approach based on a clustering-based multi-agent system. The proposal manages different filter methods and gene clustering through coordinated agents to discover informative gene subsets. To assess the reliability of the approach, we used four important public gene expression datasets: two lung cancer datasets, a colon cancer dataset and a leukemia dataset. The results were validated through cluster validity measures, visual analytics and a classifier, and were compared with other gene selection methods, demonstrating the reliability of our proposal.

  17. Evolutionary conservation of regulatory elements in vertebrate Hox gene clusters.

    PubMed

    Santini, Simona; Boore, Jeffrey L; Meyer, Axel

    2003-06-01

    Comparisons of DNA sequences among evolutionarily distantly related genomes permit identification of conserved functional regions in noncoding DNA. Hox genes are highly conserved in vertebrates, occur in clusters, and are uninterrupted by other genes. We aligned (PipMaker) the nucleotide sequences of the HoxA clusters of tilapia, pufferfish, striped bass, zebrafish, horn shark, human, and mouse, which are separated by approximately 500 million years of evolution. In support of our approach, several identified putative regulatory elements known to regulate the expression of Hox genes were recovered. The majority of the newly identified putative regulatory elements contain short fragments that are almost completely conserved and are identical to known binding sites for regulatory proteins (Transfac database). The regulatory intergenic regions located between the genes that are expressed most anteriorly in the embryo are longer and apparently more evolutionarily conserved than those at the other end of Hox clusters. Different presumed regulatory sequences are retained in either the Aα or Aβ duplicated Hox clusters in the fish lineages. This suggests that the conserved elements are involved in different gene regulatory networks and supports the duplication-deletion-complementation model of functional divergence of duplicated genes.

  18. Evolutionary Conservation of Regulatory Elements in Vertebrate Hox Gene Clusters

    PubMed Central

    Santini, Simona; Boore, Jeffrey L.; Meyer, Axel

    2003-01-01

    Comparisons of DNA sequences among evolutionarily distantly related genomes permit identification of conserved functional regions in noncoding DNA. Hox genes are highly conserved in vertebrates, occur in clusters, and are uninterrupted by other genes. We aligned (PipMaker) the nucleotide sequences of the HoxA clusters of tilapia, pufferfish, striped bass, zebrafish, horn shark, human, and mouse, which are separated by approximately 500 million years of evolution. In support of our approach, several identified putative regulatory elements known to regulate the expression of Hox genes were recovered. The majority of the newly identified putative regulatory elements contain short fragments that are almost completely conserved and are identical to known binding sites for regulatory proteins (Transfac database). The regulatory intergenic regions located between the genes that are expressed most anteriorly in the embryo are longer and apparently more evolutionarily conserved than those at the other end of Hox clusters. Different presumed regulatory sequences are retained in either the Aα or Aβ duplicated Hox clusters in the fish lineages. This suggests that the conserved elements are involved in different gene regulatory networks and supports the duplication-deletion-complementation model of functional divergence of duplicated genes. PMID:12799348

  19. Accurate prediction of secondary metabolite gene clusters in filamentous fungi.

    PubMed

    Andersen, Mikael R; Nielsen, Jakob B; Klitgaard, Andreas; Petersen, Lene M; Zachariasen, Mia; Hansen, Tilde J; Blicher, Lene H; Gotfredsen, Charlotte H; Larsen, Thomas O; Nielsen, Kristian F; Mortensen, Uffe H

    2013-01-02

    Biosynthetic pathways of secondary metabolites from fungi are currently subject to an intense effort to elucidate the genetic basis for these compounds due to their large potential within pharmaceutics and synthetic biochemistry. The preferred method is methodical gene deletions to identify supporting enzymes for key synthases one cluster at a time. In this study, we design and apply a DNA expression array for Aspergillus nidulans in combination with legacy data to form a comprehensive gene expression compendium. We apply a guilt-by-association-based analysis to predict the extent of the biosynthetic clusters for the 58 synthases active in our set of experimental conditions. A comparison with legacy data shows the method to be accurate in 13 of 16 known clusters and nearly accurate for the remaining 3 clusters. Furthermore, we apply a data clustering approach, which identifies cross-chemistry between physically separate gene clusters (superclusters), and validate this both with legacy data and experimentally by prediction and verification of a supercluster consisting of the synthase AN1242 and the prenyltransferase AN11080, as well as identification of the product compound nidulanin A. We have used A. nidulans for our method development and validation due to the wealth of available biochemical data, but the method can be applied to any fungus with a sequenced and assembled genome, thus supporting further secondary metabolite pathway elucidation in the fungal kingdom.
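
    In this setting, guilt by association means that genes belonging to a biosynthetic cluster tend to be co-expressed with its synthase. The minimal sketch below predicts a cluster's extent from that signal. It walks outward from the synthase gene along the chromosome and keeps neighbors whose expression profile correlates with the synthase's. Several choices here are assumptions for illustration, not the procedure used in the study: the Pearson measure, the `threshold` of 0.7, the stop-at-first-miss rule, and the function names.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def predict_cluster_extent(profiles, anchor, threshold=0.7):
    """Grow a window of co-expressed genes around the synthase at index `anchor`.

    profiles: expression vectors for genes listed in chromosomal order.
    Growth in each direction stops at the first neighbor whose correlation
    with the anchor falls below `threshold`. Returns (first, last), inclusive.
    """
    ref = profiles[anchor]
    lo = anchor
    while lo > 0 and pearson(profiles[lo - 1], ref) >= threshold:
        lo -= 1
    hi = anchor
    while hi < len(profiles) - 1 and pearson(profiles[hi + 1], ref) >= threshold:
        hi += 1
    return lo, hi
```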

  20. DNA methylation profiling identifies CG methylation clusters in Arabidopsis genes.

    PubMed

    Tran, Robert K; Henikoff, Jorja G; Zilberman, Daniel; Ditt, Renata F; Jacobsen, Steven E; Henikoff, Steven

    2005-01-26

    Cytosine DNA methylation in vertebrates is widespread, but methylation in plants is found almost exclusively at transposable elements and repetitive DNA. Within regions of methylation, methylcytosines are typically found in CG, CNG, and asymmetric contexts. CG sites are maintained by a plant homolog of mammalian Dnmt1 acting on hemi-methylated DNA after replication. Methylation of CNG and asymmetric sites appears to be maintained at each cell cycle by other mechanisms. We report a new type of DNA methylation in Arabidopsis, dense CG methylation clusters found at scattered sites throughout the genome. These clusters lack non-CG methylation and are preferentially found in genes, although they are relatively deficient toward the 5' end. CG methylation clusters are present in lines derived from different accessions and in mutants that eliminate de novo methylation, indicating that CG methylation clusters are stably maintained at specific sites. Because 5-methylcytosine is mutagenic, the appearance of CG methylation clusters over evolutionary time predicts a genome-wide deficiency of CG dinucleotides and an excess of C(A/T)G trinucleotides within transcribed regions. This is exactly what we find, implying that CG methylation clusters have contributed profoundly to plant gene evolution. We suggest that CG methylation clusters silence cryptic promoters that arise sporadically within transcription units.
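
    The predicted genome-wide CG deficiency in transcribed regions can be checked with the standard observed/expected CG ratio. For a sequence of length L, this ratio is `count(CG) * L / (count(C) * count(G))`. A count of C(A/T)G trinucleotides gives the complementary signal. The sketch below computes both for a single sequence. It illustrates the general measures, not the authors' analysis pipeline, and the function names are hypothetical.

```python
def cpg_obs_exp(seq):
    """Observed/expected CG ratio: count(CG) * len / (count(C) * count(G))."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    # Count CG dinucleotides at every position (overlaps are impossible for CG).
    cg = sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "CG")
    return cg * len(s) / (c * g)


def cwg_count(seq):
    """Count C(A/T)G trinucleotides, i.e. CAG or CTG."""
    s = seq.upper()
    return sum(1 for i in range(len(s) - 2)
               if s[i] == "C" and s[i + 1] in "AT" and s[i + 2] == "G")
```

    The two functions can be run over gene bodies and over intergenic sequence. Comparing the results shows whether transcribed regions carry the predicted low CG ratio and elevated C(A/T)G count.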

  1. Phage cluster relationships identified through single gene analysis

    PubMed Central

    2013-01-01

    Background Phylogenetic comparison of bacteriophages requires whole genome approaches such as dotplot analysis, genome pairwise maps, and gene content analysis. Currently mycobacteriophages, a highly studied phage group, are categorized into related clusters based on the comparative analysis of whole genome sequences. With the recent explosion of phage isolation, a simple method for phage cluster prediction would facilitate analysis of crude or complex samples without whole genome isolation and sequencing. The hypothesis of this study was that mycobacteriophage-cluster prediction is possible using comparison of a single, ubiquitous, semi-conserved gene. Tape Measure Protein (TMP) was selected to test the hypothesis because it is typically the longest gene in mycobacteriophage genomes and because regions within the TMP gene are conserved. Results A single gene, TMP, identified the known Mycobacteriophage clusters and subclusters using a Gepard dotplot comparison or a phylogenetic tree constructed from global alignment and maximum likelihood comparisons. Gepard analysis of 247 mycobacteriophage TMP sequences appropriately recovered 98.8% of the subcluster assignments that were made by whole-genome comparison. Subcluster-specific primers within TMP allow for PCR determination of the mycobacteriophage subcluster from DNA samples. Using the single-gene comparison approach for siphovirus coliphages, phage groupings by TMP comparison reflected relationships observed in a whole genome dotplot comparison and confirm the potential utility of this approach to another widely studied group of phages. Conclusions TMP sequence comparison and PCR results support the hypothesis that a single gene can be used for distinguishing phage cluster and subcluster assignments. TMP single-gene analysis can quickly and accurately aid in mycobacteriophage classification. PMID:23777341
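
    The TMP comparisons rely on Gepard dotplots. A dotplot marks every pair of positions at which two sequences share an identical short word, so related sequences appear as diagonal runs. The sketch below is a naive hash-based word-match dotplot in plain Python, plus a crude shared-word fraction. The word length `k`, both function names and the scoring are assumptions for illustration. They do not reflect how Gepard itself is implemented.

```python
def dotplot_points(a, b, k=3):
    """Word-match dotplot: all (i, j) with a[i:i+k] == b[j:j+k]."""
    # Index every length-k word of b by its start positions.
    index = {}
    for j in range(len(b) - k + 1):
        index.setdefault(b[j:j + k], []).append(j)
    return [(i, j) for i in range(len(a) - k + 1) for j in index.get(a[i:i + k], [])]


def shared_word_fraction(a, b, k=3):
    """Fraction of a's length-k words that occur anywhere in b."""
    words_b = {b[j:j + k] for j in range(len(b) - k + 1)}
    n = len(a) - k + 1
    if n <= 0:
        return 0.0
    return sum(1 for i in range(n) if a[i:i + k] in words_b) / n
```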

  2. Ontology-Driven Co-clustering of Gene Expression Data

    NASA Astrophysics Data System (ADS)

    Cordero, Francesca; Pensa, Ruggero G.; Visconti, Alessia; Ienco, Dino; Botta, Marco

    The huge volume of gene expression data produced by microarrays and other high-throughput techniques has encouraged the development of new computational techniques to evaluate the data and to formulate new biological hypotheses. To this end, co-clustering techniques are widely used: these identify groups of genes that show similar activity patterns under a specific subset of the experimental conditions by measuring the similarity in expression within these groups. However, in many applications, distance metrics based only on expression levels fail to capture biologically meaningful clusters.

  3. The Biosynthetic Gene Cluster for Andrastin A in Penicillium roqueforti.

    PubMed

    Rojas-Aedo, Juan F; Gil-Durán, Carlos; Del-Cid, Abdiel; Valdés, Natalia; Álamos, Pamela; Vaca, Inmaculada; García-Rico, Ramón O; Levicán, Gloria; Tello, Mario; Chávez, Renato

    2017-01-01

    Penicillium roqueforti is a filamentous fungus involved in the ripening of several kinds of blue cheeses. In addition, this fungus produces several secondary metabolites, including the meroterpenoid compound andrastin A, a promising antitumoral compound. However, to date the genomic cluster responsible for the biosynthesis of this compound in P. roqueforti has not been described. In this work, we have sequenced and annotated a genomic region of approximately 29.4 kbp (named the adr gene cluster) that is involved in the biosynthesis of andrastin A in P. roqueforti. This region contains ten genes, named adrA, adrC, adrD, adrE, adrF, adrG, adrH, adrI, adrJ and adrK. Interestingly, the adrB gene previously found in the adr cluster from P. chrysogenum was found as a residual pseudogene in the adr cluster from P. roqueforti. RNA-mediated gene silencing of each of the ten genes resulted in significant reductions in andrastin A production, confirming that all of them are involved in the biosynthesis of this compound. Of particular interest was the adrC gene, encoding a major facilitator superfamily transporter. According to our results, this gene is required for the production of andrastin A but does not have any role in its secretion to the extracellular medium. The identification of the adr cluster in P. roqueforti will be important for understanding the molecular basis of andrastin A production and for obtaining strains of P. roqueforti overproducing andrastin A that might be of interest for the cheese industry.

  4. IGSA: Individual Gene Sets Analysis, including Enrichment and Clustering

    PubMed Central

    Liu, Lei; Ma, Hongzhe; Yang, Jingbo; Xie, Hongbo; Liu, Bo; Jin, Qing

    2016-01-01

    Analysis of gene sets has been widely applied in various high-throughput biological studies. One weakness of the traditional methods is that they neglect the heterogeneity of gene expression across samples, which may lead to the omission of some specific and important gene sets. It is also difficult for them to reflect the severity of disease and to provide expression profiles of gene sets for individuals. We developed an application called IGSA that offers powerful analytical capacity for gene set enrichment and sample clustering. IGSA calculates gene set expression scores for each sample and uses an accumulating clustering strategy that groups samples according to disease progression from mild to severe. We evaluated the performance of IGSA on gastric, pancreatic and ovarian cancer data sets. We also compared IGSA's KEGG pathway enrichment results with those of DAVID, GSEA, SPIA and ssGSEA, and analyzed IGSA clustering results under different similarity measures. Notably, IGSA proved more sensitive and specific in finding significant pathways, and can indicate pathway changes related to disease severity. In addition, IGSA provides a significant gene set profile for each sample. PMID:27764138

  5. IGSA: Individual Gene Sets Analysis, including Enrichment and Clustering.

    PubMed

    Wu, Lingxiang; Chen, Xiujie; Zhang, Denan; Zhang, Wubing; Liu, Lei; Ma, Hongzhe; Yang, Jingbo; Xie, Hongbo; Liu, Bo; Jin, Qing

    2016-01-01

    Analysis of gene sets has been widely applied in various high-throughput biological studies. One weakness of the traditional methods is that they neglect the heterogeneity of gene expression across samples, which may lead to the omission of some specific and important gene sets. It is also difficult for them to reflect the severity of disease and to provide expression profiles of gene sets for individuals. We developed an application called IGSA that offers powerful analytical capacity for gene set enrichment and sample clustering. IGSA calculates gene set expression scores for each sample and uses an accumulating clustering strategy that groups samples according to disease progression from mild to severe. We evaluated the performance of IGSA on gastric, pancreatic and ovarian cancer data sets. We also compared IGSA's KEGG pathway enrichment results with those of DAVID, GSEA, SPIA and ssGSEA, and analyzed IGSA clustering results under different similarity measures. Notably, IGSA proved more sensitive and specific in finding significant pathways, and can indicate pathway changes related to disease severity. In addition, IGSA provides a significant gene set profile for each sample.

  6. Evolutionary ecology of beta-lactam gene clusters in animals.

    PubMed

    Suring, Wouter; Meusemann, Karen; Blanke, Alexander; Mariën, Janine; Schol, Tim; Agamennone, Valeria; Faddeeva-Vakhrusheva, Anna; Berg, Matty P; Brouwer, Abraham; van Straalen, Nico M; Roelofs, Dick

    2017-06-01

    Beta-lactam biosynthesis was thought to occur only in fungi and bacteria, but we recently reported the presence of isopenicillin N synthase in a soil-dwelling animal, Folsomia candida. However, it has remained unclear whether this gene is part of a larger beta-lactam biosynthesis pathway and how widespread penicillin biosynthesis is among animals. Here, we analysed the distribution of beta-lactam biosynthesis genes throughout the animal kingdom and identified a beta-lactam gene cluster in the genome of F. candida (Collembola), consisting of isopenicillin N synthase (IPNS), δ-(L-α-aminoadipoyl)-L-cysteinyl-D-valine synthetase (ACVS), and two cephamycin C genes (cmcI and cmcJ) on a genomic scaffold of 0.76 Mb. All genes are transcriptionally active and are inducible by stress (heat shock). A beta-lactam compound was detected in vivo using an ELISA beta-lactam assay. The gene cluster also contains an ABC transporter which is coregulated with IPNS and ACVS after heat shock. Furthermore, we show that different combinations of beta-lactam biosynthesis genes are present in over 60% of springtail families, but they are absent from genome and transcript libraries of other animals, including close relatives of springtails (Protura, Diplura and insects). The presence of beta-lactam genes is strongly correlated with a euedaphic (soil-living) lifestyle. Beta-lactam genes IPNS and ACVS each form a phylogenetic clade in between bacteria and fungi, while cmcI and cmcJ genes cluster within bacteria. This suggests a single horizontal gene transfer event, most probably from a bacterial host, followed by differential loss in more recently evolving species. © 2017 John Wiley & Sons Ltd.

  7. The use of gene clusters to infer functional coupling

    PubMed Central

    Overbeek, Ross; Fonstein, Michael; D’Souza, Mark; Pusch, Gordon D.; Maltsev, Natalia

    1999-01-01

    Previously, we presented evidence that it is possible to predict functional coupling between genes based on conservation of gene clusters between genomes. With the rapid increase in the availability of prokaryotic sequence data, it has become possible to verify and apply the technique. In this paper, we extend our characterization of the parameters that determine the utility of the approach, and we generalize the approach in a way that supports detection of common classes of functionally coupled genes (e.g., transport and signal transduction clusters). Now that the analysis includes over 30 complete or nearly complete genomes, it has become clear that this approach will play a significant role in supporting efforts to assign functionality to the remaining uncharacterized genes in sequenced genomes. PMID:10077608
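The core idea, that a pair of gene families repeatedly found side by side across several genomes is a candidate for functional coupling, can be sketched as follows. This is a minimal illustration, not the authors' method: the `Gene` record, the same-strand requirement, the 300 bp maximum gap, the minimum genome count, and the treatment of each genome as a single linear chromosome are all assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Gene(NamedTuple):
    family: str   # homolog family label (hypothetical; e.g. from best-hit grouping)
    start: int
    end: int
    strand: str   # "+" or "-"


def close_pairs(genes: List[Gene], max_gap: int) -> Set[Tuple[str, ...]]:
    """Family pairs of consecutive, same-strand genes separated by <= max_gap bp."""
    ordered = sorted(genes, key=lambda g: g.start)
    pairs = set()
    for a, b in zip(ordered, ordered[1:]):
        if a.strand == b.strand and b.start - a.end <= max_gap:
            pairs.add(tuple(sorted((a.family, b.family))))
    return pairs


def coupled_families(genomes: Dict[str, List[Gene]], max_gap: int = 300,
                     min_genomes: int = 2) -> Dict[Tuple[str, ...], List[str]]:
    """Family pairs found clustered in at least `min_genomes` genomes."""
    support = defaultdict(set)
    for name, genes in genomes.items():
        for pair in close_pairs(genes, max_gap):
            support[pair].add(name)
    return {pair: sorted(names) for pair, names in support.items()
            if len(names) >= min_genomes}
```

Requiring support from several genomes is what separates conserved clusters from chance neighbours; the real approach also has to account for phylogenetic relatedness among the genomes, which this sketch ignores.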

  8. Gene clustering by latent semantic indexing of MEDLINE abstracts.

    PubMed

    Homayouni, Ramin; Heinrich, Kevin; Wei, Lai; Berry, Michael W

    2005-01-01

    A major challenge in the interpretation of high-throughput genomic data is understanding the functional associations between genes. Previously, several approaches have been described to extract gene relationships from various biological databases using term-matching methods. However, more flexible automated methods are needed to identify functional relationships (both explicit and implicit) between genes from the biomedical literature. In this study, we explored the utility of Latent Semantic Indexing (LSI), a vector space model for information retrieval, to automatically identify conceptual gene relationships from titles and abstracts in MEDLINE citations. We found that LSI identified gene-to-gene and keyword-to-gene relationships with high average precision. In addition, LSI identified implicit gene relationships based on word usage patterns in the gene abstract documents. Finally, we demonstrate here that pairwise distances derived from the vector angles of gene abstract documents can be effectively used to functionally group genes by hierarchical clustering. Our results provide proof-of-principle that LSI is a robust automated method to elucidate both known (explicit) and unknown (implicit) gene relationships from the biomedical literature. These features make LSI particularly useful for the analysis of novel associations discovered in genomic experiments. The 50-gene document collection used in this study can be interactively queried at http://shad.cs.utk.edu/sgo/sgo.html.
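A toy version of the LSI step, projecting documents into a low-rank space and comparing them by the angle between their vectors, is sketched below. It is an illustration under stated assumptions, not the authors' pipeline: raw term counts are used with no weighting scheme, the example documents are invented, and the truncated SVD is computed by power iteration with deflation on the document Gram matrix, which is only sensible for tiny matrices.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple


def term_document_matrix(docs: Sequence[str]) -> Tuple[List[str], List[List[float]]]:
    """Rows are terms, columns are documents; entries are raw term counts."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    row_of = {w: i for i, w in enumerate(vocab)}
    A = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for word, count in Counter(doc.lower().split()).items():
            A[row_of[word]][j] = float(count)
    return vocab, A


def lsi_document_vectors(A: List[List[float]], k: int, iters: int = 300,
                         seed: int = 0) -> List[List[float]]:
    """Rank-k LSI coordinates of each document (rows of V_k S_k).

    Power iteration with deflation on A^T A, whose eigenvalues are the
    squared singular values of A.
    """
    n = len(A[0])
    G = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]  # random positive start vector
        for _ in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * G[i][j] * v[j] for i in range(n) for j in range(n))
        sigma = math.sqrt(max(lam, 0.0))
        for j in range(n):
            coords[j][c] = v[j] * sigma
        # Deflate so the next pass converges to the next-largest component.
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

In practice a sparse SVD routine would replace the power iteration; 1 minus the cosine similarity can then serve as the pairwise distance for hierarchical clustering, as the abstract describes.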

  9. Gene Clusters, Molecular Evolution and Disease: A Speculation

    PubMed Central

    Elizondo, Leah I; Jafar-Nejad, Paymaan; Clewing, J. Marietta; Boerkoel, Cornelius F

    2009-01-01

    Traditionally, eukaryotic genes have been considered to be independently expressed under the control of their promoters and cis-regulatory domains. However, recent studies in worms, flies, mice and humans have shown that genes co-habiting a chromatin domain or “genomic neighborhood” are frequently co-expressed. Often these co-expressed genes neither constitute part of an operon nor function within the same biological pathway. The mechanisms underlying the partitioning of the genome into transcriptional genomic neighborhoods are poorly defined. However, cross-species analyses find that the linkage among the co-expressed genes of these clusters is significantly conserved and that the expression patterns of genes within clusters have coevolved with the clusters. Such selection could be mediated by chromatin interactions with the nuclear matrix and long-range remodeling of chromatin structure. In the context of human disease, we propose that dysregulation of gene expression across genomic neighborhoods will cause highly pleiotropic diseases. Candidate genomic neighborhood diseases include the nuclear laminopathies, chromosomal translocations and genomic instability disorders, imprinting disorders of errant insulator function, syndromes from impaired cohesin complex assembly, as well as diseases of global covalent histone modifications and DNA methylation. The alteration of transcriptional genomic neighborhoods provides an exciting and novel model for studying epigenetic alterations as quantitative traits in complex common human diseases. PMID:19721813

  10. Generalized gene adjacencies, graph bandwidth, and clusters in yeast evolution.

    PubMed

    Zhu, Qian; Adam, Zaky; Choi, Vicky; Sankoff, David

    2009-01-01

    We present a parameterized definition of gene clusters that allows us to control the emphasis placed on conserved order within a cluster. Though motivated by biological rather than mathematical considerations, this parameter turns out to be closely related to the bandwidth parameter of a graph. Our focus will be on how this parameter affects the characteristics of clusters: how numerous they are, how large they are, how rearranged they are, and to what extent they are preserved from ancestor to descendant in a phylogenetic tree. We infer the latter property by dynamic programming optimization of the presence of individual edges at the ancestral nodes of the phylogeny. We apply our analysis to a set of genomes drawn from the Yeast Gene Order Browser.
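One plausible reading of a parameterized cluster definition is sketched below; it is an assumption for illustration, and the paper's exact definition (and its link to graph bandwidth) may differ. Here two genes are θ-adjacent in a genome if their positions in the gene order differ by at most θ, and a cluster is a connected component, of two or more genes, in the graph whose edges join genes that are θ-adjacent in both genomes. With θ = 1 this reduces to conserved ordinary adjacency; larger θ tolerates more local rearrangement.

```python
from typing import Dict, List, Sequence


def theta_clusters(order1: Sequence[str], order2: Sequence[str],
                   theta: int) -> List[List[str]]:
    """Connected components of genes theta-adjacent in both gene orders."""
    pos1 = {g: i for i, g in enumerate(order1)}
    pos2 = {g: i for i, g in enumerate(order2)}
    shared = [g for g in order1 if g in pos2]
    parent: Dict[str, str] = {g: g for g in shared}

    def find(g: str) -> str:
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for i, a in enumerate(shared):
        for b in shared[i + 1:]:
            if abs(pos1[a] - pos1[b]) <= theta and abs(pos2[a] - pos2[b]) <= theta:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for g in shared:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(c) for c in groups.values() if len(c) >= 2)
```

Increasing θ merges clusters, so sweeping it shows how cluster number and size respond to the emphasis on conserved order.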

  11. Cloning and Heterologous Expression of the Grecocycline Biosynthetic Gene Cluster

    PubMed Central

    Bilyk, Oksana; Sekurova, Olga N.; Zotchev, Sergey B.; Luzhetskyy, Andriy

    2016-01-01

    Transformation-associated recombination (TAR) in yeast is a rapid and inexpensive method for cloning and assembly of large DNA fragments, which relies on natural homologous recombination. Two vectors, based on p15a and F-factor replicons that can be maintained in yeast, E. coli and streptomycetes, have been constructed. These vectors have been successfully employed for assembly of the grecocycline biosynthetic gene cluster from Streptomyces sp. Acta 1362. Fragments of the cluster were obtained by PCR and transformed together with the “capture” vector into the yeast cells, yielding a construct carrying the entire gene cluster. The obtained construct was heterologously expressed in S. albus J1074, yielding several grecocycline congeners. Grecocyclines have unique structural moieties such as a disaccharide side chain, an additional amino sugar at the C-5 position and a thiol group. Enzymes from this pathway may be used for the derivatization of known active angucyclines in order to improve their desired biological properties. PMID:27410036

  12. PEACE: Parallel Environment for Assembly and Clustering of Gene Expression.

    PubMed

    Rao, D M; Moler, J C; Ozden, M; Zhang, Y; Liang, C; Karro, J E

    2010-07-01

    We present PEACE, a stand-alone tool for high-throughput ab initio clustering of transcript fragment sequences produced by Next Generation or Sanger Sequencing technologies. It is freely available from www.peace-tools.org. Installed and managed through a downloadable user-friendly graphical user interface (GUI), PEACE can process large data sets of transcript fragments of length 50 bases or greater, grouping the fragments by gene associations with a sensitivity comparable to leading clustering tools. Once clustered, the user can employ the GUI's analysis functions, facilitating the easy collection of statistics and allowing them to single out specific clusters for more comprehensive study or assembly. Using a novel minimum spanning tree-based clustering method, PEACE is the equal of leading tools in the literature, with an interface making it accessible to any user. It produces results of quality virtually identical to those of the WCD tool when applied to Sanger sequences, significantly improved results over WCD and TGICL when applied to the products of Next Generation Sequencing Technology and significantly improved results over Cap3 in both cases. In short, PEACE provides an intuitive GUI and a feature-rich, parallel clustering engine that proves to be a valuable addition to the leading cDNA clustering tools.
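A generic minimum spanning tree clustering of fragments can be sketched as follows: build the MST over all pairwise distances with Kruskal's algorithm, then cut the edges longer than a threshold, so the remaining connected components are the clusters. This is not PEACE's actual distance measure or implementation; the 1 minus Jaccard distance over 4-mer sets and the threshold are assumptions chosen for the example.

```python
from itertools import combinations
from typing import Dict, List, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a: str, b: str, k: int = 4) -> float:
    """1 - Jaccard similarity of k-mer sets (an assumed, simple distance)."""
    sa, sb = kmer_set(a, k), kmer_set(b, k)
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0


class UnionFind:
    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x: int, y: int) -> bool:
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        self.parent[rx] = ry
        return True


def mst_clusters(seqs: List[str], threshold: float, k: int = 4) -> List[List[int]]:
    n = len(seqs)
    edges = sorted((kmer_distance(seqs[i], seqs[j], k), i, j)
                   for i, j in combinations(range(n), 2))
    # Kruskal: keep an edge only if it joins two different components.
    tree = UnionFind(n)
    mst = []
    for d, i, j in edges:
        if tree.union(i, j):
            mst.append((d, i, j))
    # Cut MST edges longer than the threshold; the remaining components are clusters.
    clusters = UnionFind(n)
    for d, i, j in mst:
        if d <= threshold:
            clusters.union(i, j)
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(clusters.find(i), []).append(i)
    return sorted(groups.values())
```

Cutting an MST at a threshold gives the same partition as single-linkage clustering; the all-pairs distance step is quadratic, so real tools rely on heuristics to avoid computing every distance.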

  13. Clustered Xenopus keratin genes: A genomic, transcriptomic, and proteomic analysis.

    PubMed

    Suzuki, Ken-Ichi T; Suzuki, Miyuki; Shigeta, Mitsuki; Fortriede, Joshua D; Takahashi, Shuji; Mawaribuchi, Shuuji; Yamamoto, Takashi; Taira, Masanori; Fukui, Akimasa

    2017-06-15

    Keratin genes belong to the intermediate filament superfamily and their expression is altered following morphological and physiological changes in vertebrate epithelial cells. Keratin genes are divided into two groups, type I and II, and are clustered on vertebrate genomes, including those of Xenopus species. Various keratin genes have been identified and characterized by their unique expression patterns throughout ontogeny in Xenopus laevis; however, compilation of previously reported and newly identified keratin genes in two Xenopus species is required for our further understanding of keratin gene evolution, not only in amphibians but also in all terrestrial vertebrates. In this study, 120 putative type I and II keratin genes in total were identified based on the genome data from two Xenopus species. We revealed that most of these genes are highly clustered on two homeologous chromosomes, XLA9_10 and XLA2 in X. laevis, and XTR10 and XTR2 in X. tropicalis, which are orthologous to those of human, showing conserved synteny among tetrapods. RNA-Seq data from various embryonic stages and adult tissues highlighted the unique expression profiles of orthologous and homeologous keratin genes in developmental stage- and tissue-specific manners. Moreover, we identified dozens of epidermal keratin proteins from the whole embryo, larval skin, tail, and adult skin using shotgun proteomics. In light of our results, we discuss the radiation, diversification, and unique expression of the clustered keratin genes, which are closely related to epidermal development and terrestrial adaptation during amphibian evolution, including Xenopus speciation. Copyright © 2016 Elsevier Inc. All rights reserved.

  14. An alanine tRNA gene cluster from Nephila clavipes.

    PubMed

    Luciano, E; Candelas, G C

    1996-06-01

    We report the sequence of a 2.3-kb genomic DNA fragment from the orb-web spider, Nephila clavipes (Nc). The fragment contains four regions of high homology to tRNA(Ala). The members of this irregularly spaced cluster of genes are oriented in the same direction and have the same anticodon (GCA), but their sequence differs at several positions. Initiation and termination signals, as well as consensus intragenic promoter sequences characteristic of tRNA genes, have been identified in all genes. tRNA(Ala) are involved in the regulation of the fibroin synthesis in the large ampullate Nc glands.

  15. Evolutionary conservation of regulatory elements in vertebrate HOX gene clusters

    SciTech Connect

    Santini, Simona; Boore, Jeffrey L.; Meyer, Axel

    2003-12-31

    Because functional regions tend to be highly conserved, comparing DNA sequences among evolutionarily distant genomes makes it possible to identify functional regions in noncoding DNA. Hox genes are optimal candidate sequences for comparative genome analyses, because they are extremely conserved in vertebrates and occur in clusters. Using PipMaker, we aligned the nucleotide sequences of the HoxA clusters of tilapia, pufferfish, striped bass, zebrafish, horn shark, human and mouse (spanning over 500 million years of evolutionary distance). We identified several highly conserved intergenic sequences that are likely to be important in gene regulation. Only a few of these putative regulatory elements have previously been described as involved in the regulation of Hox genes, while several others are new elements that might have regulatory functions. The majority of these newly identified putative regulatory elements contain short fragments that are almost completely conserved and are identical to known binding sites for regulatory proteins (TRANSFAC). The conserved intergenic regions located between the most rostrally expressed genes in the developing embryo are longer and better retained through evolution. We document that presumed regulatory sequences are retained differentially in one or the other of the two HoxA clusters that resulted from a genome duplication in the fish lineage. This observation supports both the hypothesis that the conserved elements are involved in gene regulation and the Duplication-Deletion-Complementation model.
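PipMaker builds percent-identity plots from local alignments. The sketch below only illustrates the general idea of flagging highly conserved stretches in an existing pairwise alignment with a sliding window. The window size and identity cutoff are assumptions, and gap columns are counted as mismatches.

```python
from typing import List, Tuple


def percent_identity(a: str, b: str) -> float:
    """Fraction of aligned columns with identical, non-gap characters."""
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def conserved_regions(aln_a: str, aln_b: str, window: int = 100,
                      min_identity: float = 0.75) -> List[Tuple[int, int]]:
    """Merge overlapping windows whose identity is >= min_identity.

    Returns half-open (start, end) column intervals of the alignment.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    regions: List[List[int]] = []
    for start in range(len(aln_a) - window + 1):
        end = start + window
        if percent_identity(aln_a[start:end], aln_b[start:end]) >= min_identity:
            if regions and start <= regions[-1][1]:
                regions[-1][1] = end  # extend the current region
            else:
                regions.append([start, end])
    return [(s, e) for s, e in regions]
```

Region boundaries are only as precise as the window, so a flagged region can include a few non-conserved columns at its edges.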

  16. Expression profile based gene clusters for ischemic stroke detection.

    PubMed

    Adamski, Mateusz G; Li, Yan; Wagner, Erin; Yu, Hua; Seales-Bailey, Chloe; Soper, Steven A; Murphy, Michael; Baird, Alison E

    2014-09-01

    In microarray studies, alterations in gene expression in circulating leukocytes have shown utility for ischemic stroke diagnosis. We studied forty candidate markers identified in three gene expression profiles to (1) quantitate individual transcript expression, (2) identify transcript clusters, and (3) assess the clinical diagnostic utility of the identified clusters for ischemic stroke detection. Using high-throughput next-generation qPCR, we found that 16 of the 40 transcripts were significantly up-regulated in stroke patients relative to control subjects (p<0.05). Six clusters of 5 to 7 transcripts each were identified that discriminated between stroke and control (p values between 1.01e-9 and 0.03). A 7-transcript cluster containing PLBD1, PYGL, BST1, DUSP1, FOS, VCAN and FCGR1A showed high accuracy for stroke classification (AUC=0.854). These results validate and improve upon the diagnostic value of transcripts identified in microarray studies for ischemic stroke. The clusters identified show promise for acute ischemic stroke detection. Copyright © 2014 Elsevier Inc. All rights reserved.
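The AUC reported above is the area under the ROC curve. For a single score per subject, it equals the probability that a randomly chosen patient scores higher than a randomly chosen control, with ties counting one half. A minimal sketch of that computation follows; how the study combined the transcripts of a cluster into one score is not described in the abstract, so the scores here are hypothetical.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Mann-Whitney estimate of the ROC AUC (ties count as 0.5)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

An AUC of 0.5 means no discrimination, while 1.0 means every patient outscores every control. The quadratic loop is fine for cohort-sized data.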

  17. Multiscale mutation clustering algorithm identifies pan-cancer mutational clusters associated with pathway-level changes in gene expression.

    PubMed

    Poole, William; Leinonen, Kalle; Shmulevich, Ilya; Knijnenburg, Theo A; Bernard, Brady

    2017-02-01

    Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C.
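The abstract does not describe how the M2C algorithm works, so the following is only a generic illustration of the multiscale idea: mutation positions along a protein are grouped at several resolutions, here by linking consecutive positions whose spacing is at most a chosen gap. The gap scales and the minimum cluster size are assumptions for the example, not parameters of M2C.

```python
from typing import Dict, List, Sequence, Tuple


def gap_clusters(positions: Sequence[int], max_gap: int,
                 min_count: int) -> List[Tuple[int, int, int]]:
    """(first, last, count) for runs of sorted positions with gaps <= max_gap."""
    ordered = sorted(positions)
    clusters: List[Tuple[int, int, int]] = []
    if not ordered:
        return clusters
    run = [ordered[0]]
    for p in ordered[1:]:
        if p - run[-1] <= max_gap:
            run.append(p)
        else:
            if len(run) >= min_count:
                clusters.append((run[0], run[-1], len(run)))
            run = [p]
    if len(run) >= min_count:
        clusters.append((run[0], run[-1], len(run)))
    return clusters


def multiscale_clusters(positions: Sequence[int], scales: Sequence[int],
                        min_count: int = 2) -> Dict[int, List[Tuple[int, int, int]]]:
    """Clusters of mutated residue positions at each gap scale."""
    return {g: gap_clusters(positions, g, min_count) for g in scales}
```

Small gaps resolve hotspots at the level of a few residues, while larger gaps merge them into domain-sized clusters. This mirrors the range between the single-residue and whole-gene views that the abstract sets out to bridge.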

  18. Multiscale mutation clustering algorithm identifies pan-cancer mutational clusters associated with pathway-level changes in gene expression

    PubMed Central

    Poole, William; Leinonen, Kalle; Shmulevich, Ilya

    2017-01-01

    Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C. PMID:28170390

  19. Time is of the essence for ParaHox homeobox gene clustering

    PubMed Central

    2013-01-01

    ParaHox genes, and their evolutionary sisters the Hox genes, are integral to patterning the anterior-posterior axis of most animals. Like the Hox genes, ParaHox genes can be clustered and exhibit colinearity, in which gene order within the cluster matches the order of gene activation. Two new instances of ParaHox clustering provide the first examples of intact clusters outside chordates, with gene expression lending weight to the argument that temporal colinearity is the key to understanding clustering. See research articles: http://www.biomedcentral.com/1741-7007/11/68 and http://www.biomedcentral.com/1471-2148/13/129 PMID:23803337

  20. Combining gene annotations and gene expression data in model-based clustering: weighted method.

    PubMed

    Huang, Desheng; Wei, Peng; Pan, Wei

    2006-01-01

    It has been increasingly recognized that incorporating prior knowledge into cluster analysis can result in more reliable and meaningful clusters. In contrast to standard model-based clustering with a global mixture model, which does not use any prior information, a stratified mixture model was recently proposed to incorporate gene functions or biological pathways as priors in model-based clustering of gene expression profiles: various gene functional groups form the strata in a stratified mixture model. Albeit useful, the stratified method may be less efficient than the global analysis if the strata are non-informative to clustering. We propose a weighted method that aims to strike a balance between a stratified analysis and a global analysis: it weights between the clustering results of the stratified analysis and those of the global analysis, and the weight is determined by the data. More generally, the weighted method can take advantage of the hierarchical structure of most existing gene functional annotation systems, such as MIPS and Gene Ontology (GO), and facilitate choosing appropriate gene functional groups as priors. We use simulated data and real data to demonstrate the feasibility and advantages of the proposed method.
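As a rough illustration of letting the data choose a weight between two analyses, the sketch below mixes two sets of per-gene likelihoods, one from a stratified fit and one from a global fit, as w·s + (1 − w)·g. It then picks w on a grid to maximize the total log-likelihood. This is an assumed stand-in, not the paper's estimator: the abstract does not say whether the weighting is applied to likelihoods, posteriors, or cluster assignments, and the inputs here are hypothetical.

```python
import math
from typing import Sequence, Tuple


def choose_weight(strat_lik: Sequence[float], glob_lik: Sequence[float],
                  grid_points: int = 101) -> Tuple[float, float]:
    """Grid-search w in [0, 1] maximizing sum_i log(w*s_i + (1 - w)*g_i)."""
    if any(x <= 0 for x in strat_lik) or any(x <= 0 for x in glob_lik):
        raise ValueError("likelihoods must be positive")
    best_w, best_ll = 0.0, -math.inf
    for step in range(grid_points):
        w = step / (grid_points - 1)
        ll = sum(math.log(w * s + (1 - w) * g) for s, g in zip(strat_lik, glob_lik))
        if ll > best_ll:
            best_w, best_ll = w, ll
    return best_w, best_ll
```

The log-likelihood is concave in w, so a coarse grid is adequate. When the stratified fit explains the data better, w moves toward 1, and when the strata are non-informative, it moves toward 0.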

  1. Evolution of chemical diversity by coordinated gene swaps in type II polyketide gene clusters.

    PubMed

    Hillenmeyer, Maureen E; Vandova, Gergana A; Berlew, Erin E; Charkoudian, Louise K

    2015-11-10

    Natural product biosynthetic pathways generate molecules of enormous structural complexity and exquisitely tuned biological activities. Studies of natural products have led to the discovery of many pharmaceutical agents, particularly antibiotics. Attempts to harness the catalytic prowess of biosynthetic enzyme systems, for both compound discovery and engineering, have been limited by a poor understanding of the evolution of the underlying gene clusters. We developed an approach to study the evolution of biosynthetic genes on a cluster-wide scale, integrating pairwise gene coevolution information with large-scale phylogenetic analysis. We used this method to infer the evolution of type II polyketide gene clusters, tracing the path of evolution from the single ancestor to those gene clusters surviving today. We identified 10 key gene types in these clusters, most of which were swapped in from existing cellular processes and subsequently specialized. The ancestral type II polyketide gene cluster likely comprised a core set of five genes, a roster that expanded and contracted throughout evolution. A key C24 ancestor diversified into major classes of longer and shorter chain length systems, from which a C20 ancestor gave rise to the majority of characterized type II polyketide antibiotics. Our findings reveal that (i) type II polyketide structure is predictable from its gene roster, (ii) only certain gene combinations are compatible, and (iii) gene swaps were likely a key to evolution of chemical diversity. The lessons learned about how natural selection drives polyketide chemical innovation can be applied to the rational design and guided discovery of chemicals with desired structures and properties.

  2. Expression profile based gene clusters for ischemic stroke detection

    PubMed Central

    Adamski, Mateusz G; Li, Yan; Wagner, Erin; Yu, Hua; Seales-Bailey, Chloe; Soper, Steven A; Murphy, Michael; Baird, Alison E

    2014-01-01

    In microarray studies, alterations in gene expression in circulating leukocytes have shown utility for ischemic stroke diagnosis. We studied forty candidate markers identified in three gene expression profiles to (1) quantitate individual transcript expression, (2) identify transcript clusters, and (3) assess the clinical diagnostic utility of the identified clusters for ischemic stroke detection. Using high-throughput next-generation qPCR, we found that 16 of the 40 transcripts were significantly up-regulated in stroke patients relative to control subjects (p<0.05). Six clusters of 5 to 7 transcripts each discriminated between stroke and control (p values between 1.01e-9 and 0.03). A 7-transcript cluster containing PLBD1, PYGL, BST1, DUSP1, FOS, VCAN and FCGR1A showed high accuracy for stroke classification (AUC=0.854). These results validate and improve upon the diagnostic value of transcripts identified in microarray studies for ischemic stroke. The clusters identified show promise for acute ischemic stroke detection. PMID:25135788

  3. Transcription mediated insulation and interference direct gene cluster expression switches

    PubMed Central

    Nguyen, Tania; Brown, David; Murray, Struan C; Haenni, Simon; Halstead, James M; O'Connor, Leigh; Shipkovenska, Gergana; Steinmetz, Lars M; Mellor, Jane

    2014-01-01

    In yeast, many tandemly arranged genes show peak expression in different phases of the metabolic cycle (YMC) or in different carbon sources, indicative of regulation by a bi-modal switch, but it is not clear how these switches are controlled. Using native elongating transcript analysis (NET-seq), we show that transcription itself is a component of bi-modal switches, facilitating reciprocal expression in gene clusters. HMS2, encoding a growth-regulated transcription factor, switches between sense- or antisense-dominant states that also coordinate up- and down-regulation of transcription at neighbouring genes. Engineering HMS2 reveals alternative mono-, di- or tri-cistronic and antisense transcription units (TUs), using different promoter and terminator combinations, that underlie state-switching. Promoters or terminators are excluded from functional TUs by read-through transcriptional interference, while antisense TUs insulate downstream genes from interference. We propose that the balance of transcriptional insulation and interference at gene clusters facilitates gene expression switches during intracellular and extracellular environmental change. DOI: http://dx.doi.org/10.7554/eLife.03635.001 PMID:25407679

  4. Transcription mediated insulation and interference direct gene cluster expression switches.

    PubMed

    Nguyen, Tania; Fischl, Harry; Howe, Françoise S; Woloszczuk, Ronja; Serra Barros, Ana; Xu, Zhenyu; Brown, David; Murray, Struan C; Haenni, Simon; Halstead, James M; O'Connor, Leigh; Shipkovenska, Gergana; Steinmetz, Lars M; Mellor, Jane

    2014-11-19

    In yeast, many tandemly arranged genes show peak expression in different phases of the metabolic cycle (YMC) or in different carbon sources, indicative of regulation by a bi-modal switch, but it is not clear how these switches are controlled. Using native elongating transcript analysis (NET-seq), we show that transcription itself is a component of bi-modal switches, facilitating reciprocal expression in gene clusters. HMS2, encoding a growth-regulated transcription factor, switches between sense- or antisense-dominant states that also coordinate up- and down-regulation of transcription at neighbouring genes. Engineering HMS2 reveals alternative mono-, di- or tri-cistronic and antisense transcription units (TUs), using different promoter and terminator combinations, that underlie state-switching. Promoters or terminators are excluded from functional TUs by read-through transcriptional interference, while antisense TUs insulate downstream genes from interference. We propose that the balance of transcriptional insulation and interference at gene clusters facilitates gene expression switches during intracellular and extracellular environmental change.

  5. Identification of genes and gene clusters involved in mycotoxin synthesis

    USDA-ARS's Scientific Manuscript database

    Research methods to identify and characterize genes involved in mycotoxin biosynthetic pathways have evolved considerably over the years. Before whole genome sequences were available (e.g. pre-genomics), work focused primarily on chemistry, biosynthetic mutant strains and molecular analysis of sing...

  6. Transcriptional analysis of exopolysaccharides biosynthesis gene clusters in Lactobacillus plantarum.

    PubMed

    Vastano, Valeria; Perrone, Filomena; Marasco, Rosangela; Sacco, Margherita; Muscariello, Lidia

    2016-04-01

    Exopolysaccharides (EPS) from lactic acid bacteria contribute to the specific rheology and texture of fermented milk products and also find applications in non-dairy foods and in therapeutics. Recently, four clusters of genes (cps) associated with surface polysaccharide production have been identified in Lactobacillus plantarum WCFS1, a probiotic and food-associated lactobacillus. These clusters are involved in cell surface architecture and probably in the release and/or exposure of immunomodulating bacterial molecules. Here we present a transcriptional analysis of these clusters. RT-PCR experiments revealed that the cps loci are organized in five operons. Moreover, by reverse transcription-qPCR analysis performed on L. plantarum WCFS1 (wild type) and WCFS1-2 (ΔccpA), we demonstrated that expression of three cps clusters is under the control of the global regulator CcpA. These results, together with the identification of putative CcpA target sequences (catabolite responsive elements, CRE) in the regulatory regions of four of the five transcriptional units, strongly suggest, for the first time, a role for the master regulator CcpA in EPS gene transcription among lactobacilli.
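Finding putative CRE sites in regulatory regions comes down to scanning both DNA strands for a degenerate IUPAC motif. A minimal sketch follows. The consensus TGWNANCGNTNWCA used in the example is a CRE consensus often cited for Bacillus subtilis and is an assumption here, since the abstract does not state which CRE definition the authors used; the test sequence is invented.

```python
import re
from typing import List, Tuple

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
         "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def iupac_regex(motif: str) -> "re.Pattern[str]":
    # A zero-width lookahead lets finditer report overlapping matches.
    return re.compile("(?=(" + "".join(IUPAC[b] for b in motif.upper()) + "))")


def scan_both_strands(seq: str, motif: str) -> List[Tuple[int, str]]:
    """0-based plus-strand start coordinates of motif hits on each strand."""
    seq = seq.upper()
    pattern = iupac_regex(motif)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    for m in pattern.finditer(rc):
        # Map the hit back to plus-strand coordinates.
        hits.append((len(seq) - m.start() - len(motif), "-"))
    return sorted(hits)
```

This consensus is its own reverse complement, so a matching site is reported once per strand at the same coordinate. A degenerate 14-mer also matches by chance fairly often in AT-rich DNA, so hits are candidates rather than confirmed binding sites.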

  7. Reconstructing Histories of Complex Gene Clusters on a Phylogeny

    NASA Astrophysics Data System (ADS)

    Vinař, Tomáš; Brejová, Broňa; Song, Giltae; Siepel, Adam

    Clusters of genes that have evolved by repeated segmental duplication present difficult challenges throughout genomic analysis, from sequence assembly to functional analysis. These clusters are one of the major sources of evolutionary innovation, and they are linked to multiple diseases, including HIV and a variety of cancers. Understanding their evolutionary histories is a key to the application of comparative genomics methods in these regions of the genome. We propose a probabilistic model of gene cluster evolution on a phylogeny, and an MCMC algorithm for reconstruction of duplication histories from genomic sequences in multiple species. Several projects are underway to obtain high quality BAC-based assemblies of duplicated clusters in multiple species, and we anticipate use of our methods in their analysis. Supplementary materials are located at http://compbio.fmph.uniba.sk/suppl/09recombcg/

  8. Functional Analysis of the Fusarielin Biosynthetic Gene Cluster.

    PubMed

    Droce, Aida; Saei, Wagma; Jørgensen, Simon Hartung; Wimmer, Reinhard; Giese, Henriette; Wollenberg, Rasmus Dam; Sondergaard, Teis Esben; Sørensen, Jens Laurids

    2016-12-13

    Fusarielins are polyketides with a decalin core produced by various species of Aspergillus and Fusarium. Although the responsible gene cluster has been identified, the biosynthetic pathway remains to be elucidated. In the present study, members of the gene cluster were deleted individually in a Fusarium graminearum strain overexpressing the local transcription factor. The results suggest that a trans-acting enoyl reductase (FSL5) assists the polyketide synthase FSL1 in the biosynthesis of a polyketide product, which is then released by hydrolysis through a trans-acting thioesterase (FSL2). Deletion of the epimerase (FSL3) resulted in accumulation of an unstable compound, which could be the released product. A novel compound, named prefusarielin, accumulated in the deletion mutant of the cytochrome P450 monooxygenase FSL4. Unlike the known fusarielins from Fusarium, this compound does not contain oxygenated decalin rings, suggesting that FSL4 is responsible for the oxygenation.

  9. Cyclopiazonic acid biosynthesis gene cluster gene cpaM is required for speradine A biosynthesis.

    PubMed

    Tokuoka, Masafumi; Kikuchi, Tomoki; Shinohara, Yasutomo; Koyama, Akifumi; Iio, Shin-Ichiro; Kubota, Takaaki; Kobayashi, Jun'ichi; Koyama, Yasuji; Totsuka, Akira; Shindo, Hitoshi; Sato, Kazuo

    2015-01-01

    Speradine A is a derivative of cyclopiazonic acid (CPA) found in cultures of an Aspergillus tamarii isolate. Heterologous expression of a predicted methyltransferase gene, cpaM, from the cpa biosynthesis gene cluster of A. tamarii resulted in speradine A production in a 2-oxoCPA-producing A. oryzae strain, indicating that cpaM is involved in speradine A biosynthesis.

  10. Analysis of lamprey clustered Fox genes: insight into Fox gene evolution and expression in vertebrates.

    PubMed

    Wotton, Karl R; Shimeld, Sebastian M

    2011-12-01

    In the human genome, members of the FoxC, FoxF, FoxL1, and FoxQ1 gene families are found in two paralogous clusters. One cluster contains the genes FOXQ1, FOXF2, and FOXC1, and the second consists of FOXF1, FOXC2, and FOXL1. In jawed vertebrates these genes are known to be expressed in different pharyngeal tissues and all, except FoxQ1, are involved in patterning the early embryonic mesoderm. We have previously traced the evolution of this cluster in the bony vertebrates, and the gene content is identical in the dogfish, a member of the most basally branching lineage of the jawed vertebrates. Here we extend these analyses to jawless vertebrates. Using genomic searches and molecular approaches, we have identified homologues of these genes from lampreys. We identify two FoxC genes, two FoxF genes, two FoxQ1 genes and a single FoxL1 gene. We examine the embryonic expression of one predominantly mesodermally expressed gene family, FoxC, and of the endodermally expressed member of the cluster, FoxQ1. We identified FoxQ1 transcripts in the pharyngeal endoderm, while the two FoxC genes are differentially expressed in the pharyngeal mesenchyme and ectoderm. Furthermore, we identify conserved expression of lamprey FoxC genes in the paraxial and intermediate mesoderms. We interpret our results through a chordate-wide comparison of expression patterns and discuss gene content in the context of theories on the evolution of the vertebrate genome.

  11. Comparative genomic analysis of sixty mycobacteriophage genomes: Genome clustering, gene acquisition and gene size

    PubMed Central

    Hatfull, Graham F.; Jacobs-Sera, Deborah; Lawrence, Jeffrey G.; Pope, Welkin H.; Russell, Daniel A.; Ko, Ching-Chung; Weber, Rebecca J.; Patel, Manisha C.; Germane, Katherine L.; Edgar, Robert H.; Hoyte, Natasha N.; Bowman, Charles A.; Tantoco, Anthony T.; Paladin, Elizabeth C.; Myers, Marlana S.; Smith, Alexis L.; Grace, Molly S.; Pham, Thuy T.; O'Brien, Matthew B.; Vogelsberger, Amy M.; Hryckowian, Andrew J.; Wynalek, Jessica L.; Donis-Keller, Helen; Bogel, Matt W.; Peebles, Craig L.; Cresawn, Steve G.; Hendrix, Roger W.

    2010-01-01

    Mycobacteriophages are viruses that infect mycobacterial hosts. Expansion of a collection of sequenced phage genomes to a total of sixty – all infecting a common bacterial host – provides further insight into their diversity and evolution. Of the sixty phage genomes, 55 can be grouped into nine clusters according to their nucleotide sequence similarities, five of which can be further divided into subclusters; five genomes do not cluster with other phages. The sequence diversity between genomes within a cluster varies greatly; for example, the six genomes in Cluster D share more than 97.5% average nucleotide similarity with each other. In contrast, similarity between the two genomes in Cluster I is barely detectable by diagonal plot analysis. A total of 6,858 predicted ORFs have been grouped into 1,523 phamilies (phams) of related sequences, 46% of which possess only a single member. Only 18.8% of the phams have sequence similarity to non-mycobacteriophage database entries, and fewer than 10% of all phams can be assigned functions based on database searching or synteny. Genome clustering facilitates the identification of genes that are in greatest genetic flux and are more likely to have been exchanged horizontally in relatively recent evolutionary time. Although mycobacteriophage genes are smaller on average than those of their host (205 versus 315 residues), phage genes in higher flux average only ∼100 amino acids, suggesting that the primary units of genetic exchange correspond to single protein domains. PMID:20064525

  12. Horizontal transfer of a large and highly toxic secondary metabolic gene cluster between fungi.

    PubMed

    Slot, Jason C; Rokas, Antonis

    2011-01-25

    Genes involved in intermediary and secondary metabolism in fungi are frequently physically linked or clustered. For example, in Aspergillus nidulans the entire pathway for the production of sterigmatocystin (ST), a highly toxic secondary metabolite and a precursor to the aflatoxins (AF), is located in a ∼54 kb, 23-gene cluster. We discovered that a complete ST gene cluster in Podospora anserina was horizontally transferred from Aspergillus. Phylogenetic analysis shows that most Podospora cluster genes are adjacent to or nested within Aspergillus cluster genes, although the two genera belong to different taxonomic classes. Furthermore, the Podospora cluster is highly conserved in content, sequence, and microsynteny with the Aspergillus ST/AF clusters, and its intergenic regions contain 14 putative binding sites for AflR, the transcription factor required for activation of the ST/AF biosynthetic genes. Examination of ∼52,000 Podospora expressed sequence tags identified transcripts for 14 genes in the cluster, several of which are expressed at multiple life cycle stages. The presence of putative AflR-binding sites and the expression evidence for several cluster genes, coupled with the recent independent discovery of ST production in Podospora [1], suggest that this horizontal gene transfer (HGT) event probably resulted in a functional cluster. Given the abundance of metabolic gene clusters in fungi, our finding that one of the largest known metabolic gene clusters moved intact between species suggests that such transfers might have contributed significantly to fungal metabolic diversity.

  13. Transcriptional Analysis of Essential Genes of the Escherichia coli Fatty Acid Biosynthesis Gene Cluster by Functional Replacement with the Analogous Salmonella typhimurium Gene Cluster

    PubMed Central

    Zhang, Yan; Cronan, John E.

    1998-01-01

    The genes encoding several key fatty acid biosynthetic enzymes (called the fab cluster) are clustered in the order plsX-fabH-fabD-fabG-acpP-fabF at min 24 of the Escherichia coli chromosome. A difficulty in analysis of the fab cluster by the polar allele duplication approach (Y. Zhang and J. E. Cronan, Jr., J. Bacteriol. 178:3614–3620, 1996) is that several of these genes are essential for the growth of E. coli. We overcame this complication by use of the fab gene cluster of Salmonella typhimurium, a close relative of E. coli, to provide functions necessary for growth. The S. typhimurium fab cluster was isolated by complementation of an E. coli fabD mutant and was found to encode proteins with >94% homology to those of E. coli. However, the S. typhimurium sequences cannot recombine with the E. coli sequences required to direct polar allele duplication via homologous recombination. Using this approach, we found that although approximately 60% of the plsX transcripts initiate at promoters located far upstream and include the upstream rpmF ribosomal protein gene, a promoter located upstream of the plsX coding sequence (probably within the upstream gene, rpmF) is sufficient for normal growth. We have also found that the fabG gene is obligatorily cotranscribed with upstream genes. Insertion of a transcription terminator cassette (Ω-Cm cassette) between the fabD and fabG genes of the E. coli chromosome abolished fabG transcription and blocked cell growth, thus providing the first indication that fabG is an essential gene. Insertion of the Ω-Cm cassette between fabH and fabD caused greatly decreased transcription of the fabD and fabG genes and slower cellular growth, indicating that fabD has only a weak promoter(s). PMID:9642179

  14. Discovery of a widely distributed toxin biosynthetic gene cluster

    PubMed Central

    Lee, Shaun W.; Mitchell, Douglas A.; Markley, Andrew L.; Hensler, Mary E.; Gonzalez, David; Wohlrab, Aaron; Dorrestein, Pieter C.; Nizet, Victor; Dixon, Jack E.

    2008-01-01

    Bacteriocins represent a large family of ribosomally produced peptide antibiotics. Here we describe the discovery of a widely conserved biosynthetic gene cluster for the synthesis of thiazole and oxazole heterocycles on ribosomally produced peptides. These clusters encode a toxin precursor and all proteins necessary for toxin maturation and export. Using the toxin precursor peptide and heterocycle-forming synthetase proteins from the human pathogen Streptococcus pyogenes, we demonstrate the in vitro reconstitution of streptolysin S activity. We provide evidence that the synthetase enzymes, as predicted from our bioinformatics analysis, introduce heterocycles onto precursor peptides, thereby providing molecular insight into the chemical structure of streptolysin S. Furthermore, our studies reveal that the synthetase exhibits relaxed substrate specificity and modifies toxin precursors from both related and distant species. Given our findings, similar peptidic toxins are likely to be discovered rapidly in existing and emerging genomes. PMID:18375757

  15. Gene clusters reflecting macrodomain structure respond to nucleoid perturbations.

    PubMed

    Scolari, Vittore F; Bassetti, Bruno; Sclavi, Bianca; Lagomarsino, Marco Cosentino

    2011-03-01

    Focusing on the DNA-bridging nucleoid proteins Fis and H-NS, and integrating several independent experimental and bioinformatic data sources, we investigate the links between chromosomal spatial organization and global transcriptional regulation. By means of a novel multi-scale spatial aggregation analysis, we uncover contiguous clusters of nucleoid-perturbation-sensitive genes along the genome, whose expression is affected by a combination of topological DNA state and nucleoid-shaping protein occupancy. The clusters correlate well with the macrodomain structure of the genome. The most significant of them lie symmetrically at the edges of the Ter macrodomain and involve all of the flagellar and chemotaxis machinery, in addition to key regulators of biofilm formation, suggesting that the regulation of the physical state of the chromosome by the nucleoid proteins plays an important role in coordinating the transcriptional response leading to the switch between a motile and a biofilm lifestyle.

  16. Translating biosynthetic gene clusters into fungal armor and weaponry.

    PubMed

    Keller, Nancy P

    2015-09-01

    Filamentous fungi are renowned for the production of a diverse array of secondary metabolites (SMs), where the genetic material required for synthesis of an SM is typically arrayed in a biosynthetic gene cluster (BGC). These natural products are valued for their bioactive properties, which stem from their functions in fungal biology, key among them protection from abiotic and biotic stress and establishment of a secure niche. The producing fungus must not only avoid self-harm from endogenous SMs but also deliver specific SMs at the right time to the right tissue, which requires biochemical aid. This review highlights functions of BGCs beyond the enzymatic assembly of SMs, considering the timing and location of SM production and other proteins in the clusters that control SM activity. Specifically, self-protection is provided by both BGC-encoded mechanisms and non-BGC subcellular containment of toxic SM precursors; delivery and timing are orchestrated through cellular trafficking patterns and stress- and developmental-responsive transcriptional programs.

  17. Multi-stage filtering for improving confidence level and determining dominant clusters in clustering algorithms of gene expression data.

    PubMed

    Kasim, Shahreen; Deris, Safaai; Othman, Razib M

    2013-09-01

    A drastic improvement in the analysis of gene expression has led to new discoveries in bioinformatics research. Fuzzy clustering algorithms are widely used to analyse gene expression data. However, the analyses produced by these algorithms may yield conflicting hypotheses about the dominant function of genes of interest. In addition, current fuzzy clustering algorithms do not thoroughly analyse genes with low membership values. Therefore, we present a novel computational framework called "multi-stage filtering-Clustering Functional Annotation" (msf-CluFA) for clustering gene expression data. The framework consists of four components: fuzzy c-means clustering (msf-CluFA-0), achieving dominant clusters (msf-CluFA-1), improving the confidence level (msf-CluFA-2), and a combination of msf-CluFA-0, msf-CluFA-1 and msf-CluFA-2 (msf-CluFA-3). By employing double filtering in msf-CluFA-1 and apriori algorithms in msf-CluFA-2, the framework can determine the dominant clusters and improve the confidence level of genes with lower membership values, allowing unknown genes to be predicted. Copyright © 2013 Elsevier Ltd. All rights reserved.
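    The msf-CluFA-0 component rests on fuzzy c-means (FCM) clustering. As a rough illustration only, the sketch below implements the textbook FCM membership and centre updates for one-dimensional expression values; it is not the msf-CluFA framework itself. The hypothetical `low_membership` helper flags genes whose strongest membership falls below an assumed cutoff of 0.6, i.e. the low-membership genes the abstract says current tools handle poorly. The fuzzifier m = 2, the initial centres, and all function names are assumptions.

```python
from typing import List, Tuple


def fcm_memberships(points: List[float], centers: List[float], m: float = 2.0) -> List[List[float]]:
    """Textbook FCM membership update for 1-D data.

    u[i][j] = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)), with d_ij = |x_i - c_j|.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # Point sits exactly on one or more centres: share full membership among them.
            hits = [j for j, d in enumerate(dists) if d == 0.0]
            row = [1.0 / len(hits) if j in hits else 0.0 for j in range(len(centers))]
        else:
            row = [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]
        memberships.append(row)
    return memberships


def fcm_centers(points: List[float], memberships: List[List[float]], m: float = 2.0) -> List[float]:
    """Centre update: mean of the points weighted by membership ** m."""
    centers = []
    for j in range(len(memberships[0])):
        weights = [row[j] ** m for row in memberships]
        centers.append(sum(w * x for w, x in zip(weights, points)) / sum(weights))
    return centers


def fuzzy_c_means(points: List[float], centers: List[float], m: float = 2.0,
                  iterations: int = 50) -> Tuple[List[float], List[List[float]]]:
    """Alternate the two updates a fixed number of times (no convergence test, for brevity)."""
    for _ in range(iterations):
        centers = fcm_centers(points, fcm_memberships(points, centers, m), m)
    return centers, fcm_memberships(points, centers, m)


def low_membership(memberships: List[List[float]], threshold: float = 0.6) -> List[int]:
    """Hypothetical filter: indices of genes whose best membership is below threshold."""
    return [i for i, row in enumerate(memberships) if max(row) < threshold]
```

With `m` closer to 1 the memberships approach hard k-means assignments; larger `m` spreads them more evenly across clusters.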

  18. EasyCluster: a fast and efficient gene-oriented clustering tool for large-scale transcriptome data

    PubMed Central

    Picardi, Ernesto; Mignone, Flavio; Pesole, Graziano

    2009-01-01

    Background ESTs and full-length cDNAs represent an invaluable source of evidence for inferring reliable gene structures and discovering potential alternative splicing events. In newly sequenced genomes, these tasks may not be practicable owing to the lack of appropriate training sets. However, when expression data are available, they can be used to build EST clusters related to specific genomic transcribed loci. Common strategies recently employed to this end are based on sequence similarity between transcripts and can, under specific conditions, lead to inconsistent and erroneous clustering. To improve cluster building and facilitate all downstream annotation analyses, we developed a simple genome-based methodology to generate gene-oriented clusters of ESTs when a genomic sequence and a pool of related expressed sequences are provided. Our procedure has been implemented in the software EasyCluster and takes into account the spliced nature of ESTs after an ad hoc genomic mapping. Methods EasyCluster uses the well-known GMAP program to perform a very quick EST-to-genome mapping in addition to the detection of reliable splice sites. Given a genomic sequence and a pool of ESTs/FL-cDNAs, EasyCluster starts by building genomic and EST local databases and runs GMAP. It then parses the results, creating an initial collection of pseudo-clusters by grouping ESTs according to the overlap of their genomic coordinates on the same strand. In the final step, EasyCluster refines the clustering by running GMAP again on each pseudo-cluster and grouping together ESTs that share at least one splice site. Results The higher accuracy of EasyCluster with respect to other clustering tools has been verified by means of a manually curated benchmark of human EST clusters. Additional datasets including the Unigene cluster Hs.122986 and ESTs related to the human HOXA gene family have also been used to demonstrate the better clustering capability of EasyCluster over current genome
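    The two-pass grouping described under Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not EasyCluster's code. It takes already-mapped hits (the GMAP step is out of scope) in a hypothetical `EstHit` record and merges overlapping same-strand intervals into pseudo-clusters. It then splits each pseudo-cluster with union-find, so that ESTs stay together only if they are linked through shared splice sites. The abstract does not say how unspliced ESTs are treated; in this sketch they simply end up as singletons.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class EstHit:
    """Hypothetical record for one EST already mapped to the genome."""
    name: str
    strand: str                       # "+" or "-"
    start: int                        # inclusive genomic start
    end: int                          # inclusive genomic end
    splice_sites: FrozenSet[int] = frozenset()


def pseudo_clusters(hits: List[EstHit]) -> List[List[EstHit]]:
    """Pass 1: merge hits whose genomic intervals overlap on the same strand."""
    clusters: List[List[EstHit]] = []
    for strand in sorted({h.strand for h in hits}):
        same_strand = sorted((h for h in hits if h.strand == strand),
                             key=lambda h: (h.start, h.end))
        current: List[EstHit] = []
        current_end = 0
        for hit in same_strand:
            if current and hit.start <= current_end:
                # Overlaps the running interval: extend the pseudo-cluster.
                current.append(hit)
                current_end = max(current_end, hit.end)
            else:
                if current:
                    clusters.append(current)
                current, current_end = [hit], hit.end
        if current:
            clusters.append(current)
    return clusters


def refine_by_splice_sites(cluster: List[EstHit]) -> List[List[EstHit]]:
    """Pass 2: keep ESTs together only if (transitively) linked by a shared splice site."""
    parent = list(range(len(cluster)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    owner: Dict[int, int] = {}  # splice-site coordinate -> first EST index seen with it
    for i, hit in enumerate(cluster):
        for site in hit.splice_sites:
            if site in owner:
                parent[find(i)] = find(owner[site])
            else:
                owner[site] = i

    groups: Dict[int, List[EstHit]] = {}
    for i, hit in enumerate(cluster):
        groups.setdefault(find(i), []).append(hit)
    return list(groups.values())
```

Note that two overlapping ESTs can land in different final clusters if they share no splice site. This is how the refinement step can separate, for example, neighbouring genes whose mapped transcripts happen to overlap.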

  19. Organization of a cluster of erythromycin genes in Saccharopolyspora erythraea.

    PubMed Central

    Weber, J M; Leung, J O; Maine, G T; Potenz, R H; Paulus, T J; DeWitt, J P

    1990-01-01

    We used a series of gene disruptions and gene replacements to mutagenically characterize 30 kilobases of DNA in the erythromycin resistance gene (ermE) region of the Saccharopolyspora erythraea chromosome. Five previously undiscovered loci involved in the biosynthesis of erythromycin were found, eryBI, eryBII, eryCI, eryCII, and eryH; and three known loci, eryAI, eryG, and ermE, were further characterized. The new Ery phenotype, EryH, was marked by (i) the accumulation of the intermediate 6-deoxyerythronolide B (DEB), suggesting a defect in the operation of the C-6 hydroxylase system, and (ii) a block in the synthesis or addition reactions for the first sugar group. Analyses of ermE mutants indicated that ermE is the only gene required for resistance to erythromycin, and that it is not required for production of the intermediate erythronolide B (EB) or for conversion of the intermediate 3-alpha-mycarosyl erythronolide B (MEB) to erythromycin. Mutations in the eryB and eryC loci were similar to previously reported chemically induced eryB and eryC mutations blocking synthesis or attachment of the two erythromycin sugar groups. Insertion mutations in eryAI, the macrolactone synthetase, defined the largest (at least 9-kilobase) transcription unit of the cluster. These mutants help to define the physical organization of the erythromycin gene cluster, and the eryH mutants provide a source for the production of the intermediate DEB. PMID:2185216

  20. Developmental expression and gene/enzyme identifications in the alpha esterase gene cluster of Drosophila melanogaster.

    PubMed

    Campbell, P M; de Q Robin, G C; Court, L N; Dorrian, S J; Russell, R J; Oakeshott, J G

    2003-10-01

    Here we show how the 10 genes of the alpha esterase cluster of Drosophila melanogaster have diverged substantially in their expression profiles. Together with previously described sequence divergence this suggests substantial functional diversification. By peptide mass fingerprinting and in vitro gene expression we have also shown that two of the genes encode the isozymes EST9 (formerly ESTC) and EST23. EST9 is the major 'alpha staining' esterase in zymograms of gut tissues in feeding stages while orthologues of EST23 confer resistance to organophosphorus insecticides in other higher Diptera. The results for EST9 and EST23 concur with previous suggestions that the products of the alpha esterase cluster function in digestion and detoxification of xenobiotic esters. However, many of the other genes in the cluster show developmental or tissue-specific expression that seems inconsistent with such roles. Furthermore, there is generally poor correspondence between the mRNA expression patterns of the remaining eight genes and isozymes previously characterized by standard techniques of electrophoresis and staining, suggesting that the alpha cluster might only account for a small minority of the esterase isozyme profile.

  1. Toward Awakening Cryptic Secondary Metabolite Gene Clusters in Filamentous Fungi

    PubMed Central

    Lim, Fang Yun; Sanchez, James F.; Wang, Clay C.C.; Keller, Nancy P.

    2013-01-01

    Mining for novel natural compounds is of great importance owing to the continuous need for new pharmaceuticals. Filamentous fungi are historically known to harbor the genetic capacity for an arsenal of natural compounds, both beneficial and detrimental to humans. The majority of these metabolites are still cryptic or silent under standard laboratory culture conditions. Mining for these cryptic natural products can be an excellent source for identifying new compound classes. Capitalizing on current knowledge of how secondary metabolite gene clusters are regulated has allowed the research community to unlock many hidden fungal treasures, as described in this chapter. PMID:23084945

  2. From hormones to secondary metabolism: the emergence of metabolic gene clusters in plants.

    PubMed

    Chu, Hoi Yee; Wegel, Eva; Osbourn, Anne

    2011-04-01

    Gene clusters for the synthesis of secondary metabolites are a common feature of microbial genomes. Well-known examples include clusters for the synthesis of antibiotics in actinomycetes, and also for the synthesis of antibiotics and toxins in filamentous fungi. Until recently it was thought that genes for plant metabolic pathways were not clustered, and this is certainly true in many cases; however, five plant secondary metabolic gene clusters have now been discovered, all of them implicated in synthesis of defence compounds. An obvious assumption might be that these eukaryotic gene clusters have arisen by horizontal gene transfer from microbes, but there is compelling evidence to indicate that this is not the case. This raises intriguing questions about how widespread such clusters are, what the significance of clustering is, why genes for some metabolic pathways are clustered and those for others are not, and how these clusters form. In answering these questions we may hope to learn more about mechanisms of genome plasticity and adaptive evolution in plants. It is noteworthy that for the five plant secondary metabolic gene clusters reported so far, the enzymes for the first committed steps all appear to have been recruited directly or indirectly from primary metabolic pathways involved in hormone synthesis. This may or may not turn out to be a common feature of plant secondary metabolic gene clusters as new clusters emerge.

  3. Molecular Characterization of Neurally Expressing Genes in the Para Sodium Channel Gene Cluster of Drosophila

    PubMed Central

    Hong, C. S.; Ganetzky, B.

    1996-01-01

    To elucidate the mechanisms regulating expression of para, which encodes the major class of sodium channels in the Drosophila nervous system, we have tried to locate upstream cis-acting regulatory elements by mapping the transcriptional start site and analyzing the region immediately upstream of para in region 14D of the polytene chromosomes. From these studies, we have discovered that the region contains a cluster of neurally expressing genes. Here we report the molecular characterization of the genomic organization of the 14D region and the genes within this region, which are: calnexin (Cnx), actin related protein 14D (Arp14D), calcineurin A 14D (CnnA14D), and chromosome associated protein (Cap). The tight clustering of these genes, their neuronal expression patterns, and their potential functions related to expression, modulation, or regulation of sodium channels raise the possibility that these genes represent a functionally related group sharing some coordinate regulatory mechanism. PMID:8849894

  4. Evolutionary formation of gene clusters by reorganization: the meleagrin/roquefortine paradigm in different fungi.

    PubMed

    Martín, Juan F; Liras, Paloma

    2016-02-01

    The biosynthesis of secondary metabolites in fungi is catalyzed by enzymes encoded by genes linked in clusters that are frequently co-regulated at the transcriptional level. Gene clusters may form by de novo assembly of genes recruited from other cellular functions, but novel gene clusters are also formed by reorganization of progenitor clusters and are distributed by horizontal gene transfer. This article reviews (i) the published information on the roquefortine/meleagrin/neoxaline gene clusters of Penicillium chrysogenum (Penicillium rubens) and the short roquefortine cluster of Penicillium roqueforti, and (ii) the correlation of the genes present in those clusters with the enzymes and metabolites derived from these pathways. The P. chrysogenum roq/mel cluster consists of seven genes and includes a gene (roqT) encoding a 12-TMS transporter protein of the MFS family. Interestingly, the orthologous P. roqueforti gene cluster has only four genes, and the roqT gene is present as a residual pseudogene that encodes only small peptides. Two of the genes present in the central region of the P. chrysogenum roq/mel cluster have been lost during the evolutionary formation of the short cluster, and the order of the structural genes in the cluster has been rearranged. The two lost genes encode an N1-atom hydroxylase (nox) and a roquefortine scaffold-reorganizing oxygenase (sro). As a consequence, P. roqueforti has lost the ability to convert the roquefortine-type carbon skeleton to the glandicoline/meleagrin-type scaffold and is unable to produce glandicoline B, meleagrin and neoxaline. The loss of this genetic information is not recent and probably occurred millions of years ago, when a progenitor Penicillium strain became adapted to life in a few rich habitats such as cheese, fermented cereal grains or silage. P. roqueforti may be considered a "domesticated" variant of a progenitor common to contemporary P. chrysogenum and related Penicillia.

  5. Functional clustering of time series gene expression data by Granger causality

    PubMed Central

    2012-01-01

    Background A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them. PMID:23107425
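    The abstract does not give the exact test used, so the following is only a sketch of the textbook bivariate Granger test with a single lag. Series x is said to Granger-cause y if adding x(t-1) to an autoregression of y(t) on y(t-1) significantly reduces the residual sum of squares. The helper names are hypothetical, the lag order of 1 is an assumption, and the function returns only the F statistic. Converting it to a p-value would need an F-distribution survival function (for example `scipy.stats.f.sf` with 1 and n - 3 degrees of freedom).

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _rss(rows: List[List[float]], y: List[float]) -> float:
    """Residual sum of squares of an ordinary least-squares fit via the normal equations."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = _solve(xtx, xty)
    return sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def granger_f_lag1(x: Sequence[float], y: Sequence[float]) -> float:
    """F statistic for "x Granger-causes y" with one lag.

    Restricted:   y_t = a0 + a1*y_{t-1}
    Unrestricted: y_t = a0 + a1*y_{t-1} + b1*x_{t-1}
    F = ((RSS_r - RSS_u) / q) / (RSS_u / (n - k)), with q = 1 extra term, k = 3 parameters.
    """
    target = list(y[1:])
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r = _rss(restricted, target)
    rss_u = _rss(unrestricted, target)
    n, k, q = len(target), 3, 1
    return ((rss_r - rss_u) / q) / (rss_u / (n - k))
```

In a clustering setting like the one described, pairwise statistics of this kind could serve as edge weights between genes. How the paper turns Granger "intensity" into a topology is not specified in the abstract.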

  6. Distribution and Genetic Diversity of Bacteriocin Gene Clusters in Rumen Microbial Genomes

    PubMed Central

    Azevedo, Analice C.; Bento, Cláudia B. P.; Ruiz, Jeronimo C.; Queiroz, Marisa V.

    2015-01-01

    Some species of ruminal bacteria are known to produce antimicrobial peptides, but the screening procedures have mostly been based on in vitro assays using standardized methods. Recent sequencing efforts have made available the genome sequences of hundreds of ruminal microorganisms. In this work, we performed genome mining of the complete and partial genome sequences of 224 ruminal bacteria and 5 ruminal archaea to determine the distribution and diversity of bacteriocin gene clusters. A total of 46 bacteriocin gene clusters were identified in 33 strains of ruminal bacteria. Twenty gene clusters were related to lanthipeptide biosynthesis, while 11 gene clusters were associated with sactipeptide production, 7 gene clusters were associated with class II bacteriocin production, and 8 gene clusters were associated with class III bacteriocin production. The frequency of strains whose genomes encode putative antimicrobial peptide precursors was 14.4%. Clusters related to the production of sactipeptides were identified for the first time among ruminal bacteria. BLAST analysis indicated that the majority of the gene clusters (88%) encoding putative lanthipeptides contained all the essential genes required for lanthipeptide biosynthesis. Most strains of Streptococcus (66.6%) harbored complete lanthipeptide gene clusters, in addition to an open reading frame encoding a putative class II bacteriocin. Albusin B-like proteins were found in 100% of the Ruminococcus albus strains screened in this study. The in silico analysis provided evidence of novel biosynthetic gene clusters in bacterial species not previously related to bacteriocin production, suggesting that the rumen microbiota represents an underexplored source of antimicrobial peptides. PMID:26253660

  7. Gene prioritization and clustering by multi-view text mining.

    PubMed

    Yu, Shi; Tranchevent, Leon-Charles; De Moor, Bart; Moreau, Yves

    2010-01-14

    Text mining has become a useful tool for biologists trying to understand the genetics of diseases. In particular, it can help identify the most interesting candidate genes for a disease for further experimental analysis. Many text mining approaches have been introduced, but the effectiveness of disease-gene identification varies across text mining models. Thus, incorporating multiple text mining models may help obtain more refined and accurate knowledge. However, how to combine these models effectively remains a challenging question in machine learning. In particular, it is non-trivial to guarantee that the integrated model performs better than the best individual model. We present a multi-view approach to retrieve biomedical knowledge using different controlled vocabularies. These controlled vocabularies are selected on the basis of nine well-known bio-ontologies and are applied to index the vast amounts of gene-based free-text information available in the MEDLINE repository. The text mining result specified by a vocabulary is considered a view, and the resulting multiple views are integrated by multi-source learning algorithms. We investigate the effect of integration in two fundamental computational disease gene identification tasks: gene prioritization and gene clustering. The performance of the proposed approach is systematically evaluated and compared on real benchmark data sets. In both tasks, the multi-view approach performs significantly better than the other methods compared. In practical research, the relevance of a specific vocabulary to the task is usually unknown. In such cases, multi-view text mining is a superior and promising strategy for text-based disease gene identification.

  8. Gene prioritization and clustering by multi-view text mining

    PubMed Central

    2010-01-01

    Background Text mining has become a useful tool for biologists trying to understand the genetics of diseases. In particular, it can help identify the most interesting candidate genes for a disease for further experimental analysis. Many text mining approaches have been introduced, but the effectiveness of disease-gene identification varies across text mining models. Thus, incorporating multiple text mining models may help obtain more refined and accurate knowledge. However, how to combine these models effectively remains a challenging question in machine learning. In particular, it is non-trivial to guarantee that the integrated model performs better than the best individual model. Results We present a multi-view approach to retrieve biomedical knowledge using different controlled vocabularies. These controlled vocabularies are selected on the basis of nine well-known bio-ontologies and are applied to index the vast amounts of gene-based free-text information available in the MEDLINE repository. The text mining result specified by a vocabulary is considered a view, and the resulting multiple views are integrated by multi-source learning algorithms. We investigate the effect of integration in two fundamental computational disease gene identification tasks: gene prioritization and gene clustering. The performance of the proposed approach is systematically evaluated and compared on real benchmark data sets. In both tasks, the multi-view approach performs significantly better than the other methods compared. Conclusions In practical research, the relevance of a specific vocabulary to the task is usually unknown. In such cases, multi-view text mining is a superior and promising strategy for text-based disease gene identification. PMID:20074336

  9. Gravitation field algorithm and its application in gene cluster

    PubMed Central

    2010-01-01

    Background Searching for optima is one of the most challenging tasks in clustering genes from available experimental data or given functions. Simulated annealing (SA), genetic algorithms (GA), particle swarm optimization (PSO) and other similarly efficient global optimization methods are used by biotechnologists. All these algorithms are based on the imitation of natural phenomena. Results This paper proposes a novel search optimization algorithm called the Gravitation Field Algorithm (GFA), which is derived from the Solar Nebular Disk Model (SNDM), a well-known astronomical theory of planetary formation. GFA simulates a gravitational field and outperforms GA and SA on some multimodal function optimization problems; it can also be applied to unimodal functions. GFA clusters a dataset from the Gene Expression Omnibus well. Conclusions A mathematical proof shows that GFA can converge to the global optimum with probability 1 under three conditions for mass functions of one independent variable. In addition, the optimization concept underlying this paper is used to analyze how SA and GA affect the global search, as well as the inherent defects of SA and GA. Some results and source code (in Matlab) are publicly available at http://ccst.jlu.edu.cn/CSBG/GFA. PMID:20854683

  10. Gravitation field algorithm and its application in gene cluster.

    PubMed

    Zheng, Ming; Liu, Gui-Xia; Zhou, Chun-Guang; Liang, Yan-Chun; Wang, Yan

    2010-09-20

    Searching for optima is one of the most challenging tasks in clustering genes from available experimental data or given functions. Simulated annealing (SA), genetic algorithms (GA), particle swarm optimization (PSO) and other similarly efficient global optimization methods are used by biotechnologists. All these algorithms are based on the imitation of natural phenomena. This paper proposes a novel search optimization algorithm called the Gravitation Field Algorithm (GFA), which is derived from the Solar Nebular Disk Model (SNDM), a well-known astronomical theory of planetary formation. GFA simulates a gravitational field and outperforms GA and SA on some multimodal function optimization problems; it can also be applied to unimodal functions. GFA clusters a dataset from the Gene Expression Omnibus well. A mathematical proof shows that GFA can converge to the global optimum with probability 1 under three conditions for mass functions of one independent variable. In addition, the optimization concept underlying this paper is used to analyze how SA and GA affect the global search, as well as the inherent defects of SA and GA. Some results and source code (in Matlab) are publicly available at http://ccst.jlu.edu.cn/CSBG/GFA.

  11. Characterisation of the gene cluster for L-rhamnose catabolism in the yeast Scheffersomyces (Pichia) stipitis

    Treesearch

    Outi M. Koivistoinen; Mikko Arvas; Jennifer R. Headman; Martina Andberg; Merja Penttilä; Thomas W. Jeffries; Peter Richard

    2012-01-01

    In Scheffersomyces (Pichia) stipitis and related fungal species, the genes for L-rhamnose catabolism RHA1, LRA2, LRA3 and LRA4, but not LADH, are clustered. We find that located next to the cluster is a transcription...

  12. Arrangement of the Clostridium baratii F7 Toxin Gene Cluster with Identification of a σ Factor That Recognizes the Botulinum Toxin Gene Cluster Promoters

    SciTech Connect

    Dover, Nir; Barash, Jason R.; Burke, Julianne N.; Hill, Karen K.; Detter, John C.; Arnon, Stephen S.

    2014-05-22

    Botulinum neurotoxin (BoNT) is the most poisonous substance known, and its eight toxin types (A to H) are distinguished by the inability of polyclonal antibodies that neutralize one toxin type to neutralize any of the other seven toxin types. Infant botulism, an intestinal toxemia orphan disease, is the most common form of human botulism in the United States. It results from swallowed spores of Clostridium botulinum (or rarely, neurotoxigenic Clostridium butyricum or Clostridium baratii) that germinate and temporarily colonize the lumen of the large intestine, where, as vegetative cells, they produce botulinum toxin. Botulinum neurotoxin is encoded by the bont gene, which is part of a toxin gene cluster that includes several accessory genes. In this paper, we sequenced for the first time the complete botulinum neurotoxin gene cluster of nonproteolytic C. baratii type F7. Like the type E and the nonproteolytic type F6 botulinum toxin gene clusters, the C. baratii type F7 had an orfX toxin gene cluster that lacked the regulatory botR gene, which is found in proteolytic C. botulinum strains and codes for an alternative σ factor. In the absence of botR, we identified a putative alternative regulatory gene located upstream of the C. baratii type F7 toxin gene cluster. This putative regulatory gene codes for a predicted σ factor that contains DNA-binding domains homologous to those of both BotR and other members of the TcdR-related group 5 of the σ70 family that are involved in the regulation of toxin gene expression in clostridia. We showed that this TcdR-related protein, in association with RNA polymerase core enzyme, specifically binds to the C. baratii type F7 botulinum toxin gene cluster promoters. This TcdR-related protein may therefore be involved in regulating the expression of the genes of the botulinum toxin gene cluster in neurotoxigenic C. baratii.

  13. Arrangement of the Clostridium baratii F7 Toxin Gene Cluster with Identification of a σ Factor That Recognizes the Botulinum Toxin Gene Cluster Promoters

    PubMed Central

    Dover, Nir; Barash, Jason R.; Burke, Julianne N.; Hill, Karen K.; Detter, John C.; Arnon, Stephen S.

    2014-01-01

    Botulinum neurotoxin (BoNT) is the most poisonous substance known, and its eight toxin types (A to H) are distinguished by the inability of polyclonal antibodies that neutralize one toxin type to neutralize any of the other seven toxin types. Infant botulism, an intestinal toxemia orphan disease, is the most common form of human botulism in the United States. It results from swallowed spores of Clostridium botulinum (or rarely, neurotoxigenic Clostridium butyricum or Clostridium baratii) that germinate and temporarily colonize the lumen of the large intestine, where, as vegetative cells, they produce botulinum toxin. Botulinum neurotoxin is encoded by the bont gene, which is part of a toxin gene cluster that includes several accessory genes. We sequenced for the first time the complete botulinum neurotoxin gene cluster of nonproteolytic C. baratii type F7. Like the type E and the nonproteolytic type F6 botulinum toxin gene clusters, the C. baratii type F7 had an orfX toxin gene cluster that lacked the regulatory botR gene, which is found in proteolytic C. botulinum strains and codes for an alternative σ factor. In the absence of botR, we identified a putative alternative regulatory gene located upstream of the C. baratii type F7 toxin gene cluster. This putative regulatory gene codes for a predicted σ factor that contains DNA-binding domains homologous to those of both BotR and other members of the TcdR-related group 5 of the σ70 family that are involved in the regulation of toxin gene expression in clostridia. We showed that this TcdR-related protein, in association with RNA polymerase core enzyme, specifically binds to the C. baratii type F7 botulinum toxin gene cluster promoters. This TcdR-related protein may therefore be involved in regulating the expression of the genes of the botulinum toxin gene cluster in neurotoxigenic C. baratii. PMID:24853378

  14. Arrangement of the Clostridium baratii F7 toxin gene cluster with identification of a σ factor that recognizes the botulinum toxin gene cluster promoters.

    PubMed

    Dover, Nir; Barash, Jason R; Burke, Julianne N; Hill, Karen K; Detter, John C; Arnon, Stephen S

    2014-01-01

    Botulinum neurotoxin (BoNT) is the most poisonous substance known, and its eight toxin types (A to H) are distinguished by the inability of polyclonal antibodies that neutralize one toxin type to neutralize any of the other seven toxin types. Infant botulism, an intestinal toxemia orphan disease, is the most common form of human botulism in the United States. It results from swallowed spores of Clostridium botulinum (or rarely, neurotoxigenic Clostridium butyricum or Clostridium baratii) that germinate and temporarily colonize the lumen of the large intestine, where, as vegetative cells, they produce botulinum toxin. Botulinum neurotoxin is encoded by the bont gene, which is part of a toxin gene cluster that includes several accessory genes. We sequenced for the first time the complete botulinum neurotoxin gene cluster of nonproteolytic C. baratii type F7. Like the type E and the nonproteolytic type F6 botulinum toxin gene clusters, the C. baratii type F7 had an orfX toxin gene cluster that lacked the regulatory botR gene, which is found in proteolytic C. botulinum strains and codes for an alternative σ factor. In the absence of botR, we identified a putative alternative regulatory gene located upstream of the C. baratii type F7 toxin gene cluster. This putative regulatory gene codes for a predicted σ factor that contains DNA-binding domains homologous to those of both BotR and other members of the TcdR-related group 5 of the σ70 family that are involved in the regulation of toxin gene expression in clostridia. We showed that this TcdR-related protein, in association with RNA polymerase core enzyme, specifically binds to the C. baratii type F7 botulinum toxin gene cluster promoters. This TcdR-related protein may therefore be involved in regulating the expression of the genes of the botulinum toxin gene cluster in neurotoxigenic C. baratii.

  15. Parallel evolutionary events in the haptoglobin gene clusters of rhesus monkey and human

    SciTech Connect

    Erickson, L.M.; Maeda, N.

    1994-08-01

    Parallel occurrences of evolutionary events in the haptoglobin gene clusters of rhesus monkeys and humans were studied. We found six different haplotypes among 11 individuals from two rhesus monkey families. The six haplotypes include two types of haptoglobin gene clusters: one type with a single gene and the other with two genes. DNA sequence analysis indicates that the one-gene and the two-gene clusters were both formed by unequal homologous crossovers between two genes of an ancestral three-gene cluster, near exon 5, the longest exon of the gene. This exon is also the location where a separate unequal homologous crossover occurred in the human lineage, forming the human two-gene haptoglobin gene cluster from an ancestral three-gene cluster. The occurrence of independent homologous unequal crossovers in rhesus monkey and in human within the same region of DNA suggests that the evolutionary history of the haptoglobin gene cluster in primates is the consequence of frequent homologous pairings facilitated by the longest and most conserved exon of the gene. 27 refs., 7 figs., 1 tab.

  16. Identification of the cluster control region for the protocadherin-beta genes located beyond the protocadherin-gamma cluster.

    PubMed

    Yokota, Shinnichi; Hirayama, Teruyoshi; Hirano, Keizo; Kaneko, Ryosuke; Toyoda, Shunsuke; Kawamura, Yoshimi; Hirabayashi, Masumi; Hirabayashi, Takahiro; Yagi, Takeshi

    2011-09-09

    The clustered protocadherins (Pcdhs), Pcdh-α, -β, and -γ, are transmembrane proteins constituting a subgroup of the cadherin superfamily. The three Pcdh clusters are arranged in tandem on the same chromosome. Each of the three Pcdh clusters shows stochastic and combinatorial expression in individual neurons, thus generating a hugely diverse set of possible cell surface molecules. Therefore, the clustered Pcdhs are candidates for determining neuronal molecular diversity. Here, we showed that the targeted deletion of DNase I hypersensitive (HS) site HS5-1, previously identified as a Pcdh-α regulatory element in vitro, especially affects the expression of specific Pcdh-α isoforms in vivo. We also identified a Pcdh-β cluster control region (CCR) containing six HS sites (HS16, 17, 17', 18, 19, and 20) downstream of the Pcdh-γ cluster. This CCR comprehensively activates the expression of the Pcdh-β gene cluster in cis, and its deletion dramatically decreases Pcdh-β expression levels. Deleting the CCR nonuniformly down-regulates some Pcdh-γ isoforms and does not affect Pcdh-α expression. Thus, the CCR effect extends beyond the 320-kb region containing the Pcdh-γ cluster to activate the upstream Pcdh-β genes. We therefore concluded that the CCR is a highly specific regulatory unit for Pcdh-β expression on the clustered Pcdh genomic locus. These findings suggest that each Pcdh cluster is controlled by distinct regulatory elements that activate its expression and that the stochastic gene regulation of the clustered Pcdhs is controlled by the complex chromatin architecture of the clustered Pcdh locus.

  17. Identification of the Cluster Control Region for the Protocadherin-β Genes Located beyond the Protocadherin-γ Cluster

    PubMed Central

    Yokota, Shinnichi; Hirayama, Teruyoshi; Hirano, Keizo; Kaneko, Ryosuke; Toyoda, Shunsuke; Kawamura, Yoshimi; Hirabayashi, Masumi; Hirabayashi, Takahiro; Yagi, Takeshi

    2011-01-01

    The clustered protocadherins (Pcdhs), Pcdh-α, -β, and -γ, are transmembrane proteins constituting a subgroup of the cadherin superfamily. The three Pcdh clusters are arranged in tandem on the same chromosome. Each of the three Pcdh clusters shows stochastic and combinatorial expression in individual neurons, thus generating a hugely diverse set of possible cell surface molecules. Therefore, the clustered Pcdhs are candidates for determining neuronal molecular diversity. Here, we showed that the targeted deletion of DNase I hypersensitive (HS) site HS5-1, previously identified as a Pcdh-α regulatory element in vitro, especially affects the expression of specific Pcdh-α isoforms in vivo. We also identified a Pcdh-β cluster control region (CCR) containing six HS sites (HS16, 17, 17′, 18, 19, and 20) downstream of the Pcdh-γ cluster. This CCR comprehensively activates the expression of the Pcdh-β gene cluster in cis, and its deletion dramatically decreases Pcdh-β expression levels. Deleting the CCR nonuniformly down-regulates some Pcdh-γ isoforms and does not affect Pcdh-α expression. Thus, the CCR effect extends beyond the 320-kb region containing the Pcdh-γ cluster to activate the upstream Pcdh-β genes. We therefore concluded that the CCR is a highly specific regulatory unit for Pcdh-β expression on the clustered Pcdh genomic locus. These findings suggest that each Pcdh cluster is controlled by distinct regulatory elements that activate its expression and that the stochastic gene regulation of the clustered Pcdhs is controlled by the complex chromatin architecture of the clustered Pcdh locus. PMID:21771796

  18. Nucleotide polymorphism in colicin E2 gene clusters: evidence for nonneutral evolution.

    PubMed

    Tan, Y; Riley, M A

    1997-06-01

    To explore the molecular mechanisms behind the diversification of colicin gene clusters, we examined DNA sequence polymorphism for the colicin gene clusters of 14 colicin E2 (ColE2) plasmids obtained from natural isolates of Escherichia coli. Two types of ColE2 plasmids are revealed, with type II gene clusters generated by recombination between type I ColE2 and ColE7 gene clusters. The levels and patterns of DNA polymorphism are different between the two types. Type I polymorphism is distributed evenly along the gene cluster, while type II accumulates polymorphism at an elevated rate in the 5' end of the colicin gene. These differences may be explained by recombinational origins of type II gene clusters. The pattern of divergence between the ColE2 gene cluster and its close relative ColE9 is not correlated with the pattern of polymorphism within ColE2, suggesting that this gene cluster is not evolving in a neutral fashion. A statistical test confirms significant departures from the predictions of neutrality. These data lend further support to the hypothesis that colicin gene clusters may evolve under the influence of nonneutral forces.
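
    The contrast described above, with polymorphism spread evenly along type I clusters but concentrated at the 5' end of the colicin gene in type II clusters, is the kind of pattern a sliding-window estimate of nucleotide diversity (pi, the average proportion of differing sites between pairs of sequences) makes visible. The sketch below is a minimal illustration, assuming a gap-free alignment of equal-length sequences; the function names, window sizes and toy alignment are hypothetical, and this is not the neutrality test used by the authors.

```python
from itertools import combinations


def site_diversity(column):
    """Fraction of sequence pairs that carry different bases at one alignment column."""
    pairs = list(combinations(column, 2))
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def sliding_pi(alignment, window, step):
    """Mean per-site nucleotide diversity (pi) in windows along a gap-free alignment.

    Returns a list of (window start, pi) tuples; positions are 0-based.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    per_site = [site_diversity([seq[i] for seq in alignment]) for i in range(length)]
    return [
        (start, sum(per_site[start:start + window]) / window)
        for start in range(0, length - window + 1, step)
    ]


# Toy alignment (hypothetical): variation is confined to the first two sites,
# mimicking polymorphism concentrated at the 5' end of a gene.
toy = ["AAAAAAAA", "AAAAAAAA", "TTAAAAAA", "TTAAAAAA"]
print(sliding_pi(toy, window=4, step=4))
```

    In this toy case the first window carries all of the diversity and the second carries none; on real data, plotting the window values along the cluster would show whether polymorphism is uniform or concentrated toward one end.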

  19. Polyketide and nonribosomal peptide retro-biosynthesis and global gene cluster matching.

    PubMed

    Dejong, Chris A; Chen, Gregory M; Li, Haoxin; Johnston, Chad W; Edwards, Mclean R; Rees, Philip N; Skinnider, Michael A; Webster, Andrew L H; Magarvey, Nathan A

    2016-12-01

    Polyketides (PKs) and nonribosomal peptides (NRPs) are profoundly important natural products, forming the foundations of many therapeutic regimes. Decades of research have revealed over 11,000 PK and NRP structures, and genome sequencing is uncovering new PK and NRP gene clusters at an unprecedented rate. However, only ∼10% of PKs and NRPs are currently associated with gene clusters, and it is unclear how many of these orphan gene clusters encode previously isolated molecules. Therefore, to efficiently guide the discovery of new molecules, we must first systematically de-orphan emergent gene clusters from genomes. Here we provide, to our knowledge, the first comprehensive retro-biosynthetic program, generalized retro-biosynthetic assembly prediction engine (GRAPE), for PK and NRP families and introduce a computational pipeline, global alignment for natural products cheminformatics (GARLIC), to uncover how observed biosynthetic gene clusters relate to known molecules, leading to the identification of gene clusters that encode new molecules.

  20. Lampreys have a single gene cluster for the fast skeletal myosin heavy chain gene family.

    PubMed

    Ikeda, Daisuke; Ono, Yosuke; Hirano, Shigeki; Kan-no, Nobuhiro; Watabe, Shugo

    2013-01-01

    Muscle tissues contain the most classic sarcomeric myosin, called myosin II, which consists of 2 heavy chains (MYHs) and 4 light chains. In the case of humans (tetrapod), a total of 6 fast skeletal-type MYH genes (MYHs) are clustered on a single chromosome. In contrast, torafugu (teleost) contains at least 13 fast skeletal MYHs, which are distributed in 5 genomic regions; the MYHs are clustered in 3 of these regions. In the present study, the evolutionary relationship among fast skeletal MYHs is elucidated by comparing the MYHs of teleosts and tetrapods with those of cyclostome lampreys, one of two groups of extant jawless vertebrates (agnathans). We found that lampreys contain at least 3 fast skeletal MYHs, which are clustered in a head-to-tail manner in a single genomic region. Although there was apparent synteny in the corresponding MYH cluster regions between lampreys and tetrapods, phylogenetic analysis indicated that lamprey and tetrapod MYHs have independently duplicated and diversified. Subsequent transgenic approaches showed that the 5'-flanking sequences of Japanese lamprey fast skeletal MYHs function as regulatory sequences that drive specific reporter gene expression in the fast skeletal muscle of zebrafish embryos. Although zebrafish MYH promoters showed apparent activity to direct reporter gene expression in myogenic cells derived from mice, promoters from Japanese lamprey MYHs had no activity. These results suggest that the muscle-specific regulatory mechanisms are partially conserved between teleosts and tetrapods but not between cyclostomes and tetrapods, despite the conserved synteny.

  1. Lampreys Have a Single Gene Cluster for the Fast Skeletal Myosin Heavy Chain Gene Family

    PubMed Central

    Ikeda, Daisuke; Ono, Yosuke; Hirano, Shigeki; Kan-no, Nobuhiro; Watabe, Shugo

    2013-01-01

    Muscle tissues contain the most classic sarcomeric myosin, called myosin II, which consists of 2 heavy chains (MYHs) and 4 light chains. In the case of humans (tetrapod), a total of 6 fast skeletal-type MYH genes (MYHs) are clustered on a single chromosome. In contrast, torafugu (teleost) contains at least 13 fast skeletal MYHs, which are distributed in 5 genomic regions; the MYHs are clustered in 3 of these regions. In the present study, the evolutionary relationship among fast skeletal MYHs is elucidated by comparing the MYHs of teleosts and tetrapods with those of cyclostome lampreys, one of two groups of extant jawless vertebrates (agnathans). We found that lampreys contain at least 3 fast skeletal MYHs, which are clustered in a head-to-tail manner in a single genomic region. Although there was apparent synteny in the corresponding MYH cluster regions between lampreys and tetrapods, phylogenetic analysis indicated that lamprey and tetrapod MYHs have independently duplicated and diversified. Subsequent transgenic approaches showed that the 5′-flanking sequences of Japanese lamprey fast skeletal MYHs function as regulatory sequences that drive specific reporter gene expression in the fast skeletal muscle of zebrafish embryos. Although zebrafish MYH promoters showed apparent activity to direct reporter gene expression in myogenic cells derived from mice, promoters from Japanese lamprey MYHs had no activity. These results suggest that the muscle-specific regulatory mechanisms are partially conserved between teleosts and tetrapods but not between cyclostomes and tetrapods, despite the conserved synteny. PMID:24376886

  2. Do cnidarians have a ParaHox cluster? Analysis of synteny around a Nematostella homeobox gene cluster.

    PubMed

    Hui, Jerome H L; Holland, Peter W H; Ferrier, David E K

    2008-01-01

    The Hox gene cluster is renowned for its role in developmental patterning of embryogenesis along the anterior-posterior axis of bilaterians. Its supposed evolutionary sister or paralog, the ParaHox cluster, is composed of Gsx, Xlox, and Cdx, and also has important roles in anterior-posterior development. There is a debate as to whether the cnidarians, as an outgroup to bilaterians, contain true Hox and ParaHox genes, or whether the Hox-like gene complement of cnidarians instead arose from duplications independent of those that generated the genes of the bilaterian Hox and ParaHox clusters. A recent whole-genome analysis of the cnidarian Nematostella vectensis found conserved synteny between this cnidarian and vertebrates, including a region of synteny between the putative Hox cluster of N. vectensis and the Hox clusters of vertebrates. No syntenic region was identified around a potential cnidarian ParaHox cluster. Here we use different approaches to identify a genomic region in N. vectensis that is syntenic with the bilaterian ParaHox cluster. This proves that the duplication that gave rise to the Hox and ParaHox regions of bilaterians occurred before the origin of cnidarians, and that the cnidarian N. vectensis has bona fide Hox and ParaHox loci.

  3. Divisive Correlation Clustering Algorithm (DCCA) for grouping of genes: detecting varying patterns in expression profiles.

    PubMed

    Bhattacharya, Anindya; De, Rajat K

    2008-06-01

    Cluster analysis (of gene-expression data) is a useful tool for identifying biologically relevant groups of genes that show similar expression patterns under multiple experimental conditions. Various methods have been proposed for clustering gene-expression data. However, most of these algorithms have several shortcomings for gene-expression data clustering. In the present article, we focus on several shortcomings of conventional clustering algorithms and propose a new one that is able to produce better clustering solutions than those produced by some other methods. We present the Divisive Correlation Clustering Algorithm (DCCA), which is suitable for finding groups of genes having similar patterns of variation in their expression values. To detect clusters with high correlation and biological significance, we use the correlation clustering concept introduced by Bansal et al. DCCA produces a clustering solution without taking the number of clusters to be created as an input. DCCA uses the correlation matrix in such a way that every gene in a cluster has its highest average correlation with the genes in that cluster. To test the performance of DCCA, we applied DCCA and some well-known conventional methods to an artificial dataset and nine gene-expression datasets, and compared the performance of the algorithms. The clustering results of DCCA are found to be more significantly relevant to the biological annotations than those of the other methods. These results show the superiority of DCCA over several other methods for the clustering of gene-expression data. The software has been developed using the C and Visual Basic languages and can be executed on Microsoft Windows platforms. The software may be downloaded as a zip file from http://www.isical.ac.in/~rajat and must then be installed. Two Word files (included in the zip file) need to be consulted before installation and execution of the software.
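
    To make the idea of divisive, correlation-driven clustering concrete, here is a simplified sketch in the spirit of DCCA; it is not the published algorithm. It needs no preset number of clusters: starting from one cluster of all genes, it splits any cluster that still contains a pair of genes whose Pearson correlation falls below a threshold, seeding the two children with the least-correlated pair and sending each remaining gene to the child with which it has the higher average correlation. The threshold parameter, function names and toy profiles are hypothetical, and genes are assigned greedily in sorted order, so results can depend on gene naming.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile has no defined correlation; treat it as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def split(cluster, corr):
    """Split a cluster in two, seeded by its least-correlated pair of genes."""
    members = sorted(cluster)
    pairs = [(x, y) for i, x in enumerate(members) for y in members[i + 1:]]
    a, b = min(pairs, key=lambda p: corr[p[0]][p[1]])
    left, right = {a}, {b}
    for g in members:
        if g in (a, b):
            continue
        left_mean = sum(corr[g][m] for m in left) / len(left)
        right_mean = sum(corr[g][m] for m in right) / len(right)
        (left if left_mean >= right_mean else right).add(g)
    return left, right


def divisive_correlation_clustering(profiles, threshold=0.0):
    """Keep splitting clusters until no pair inside a cluster correlates below `threshold`."""
    genes = sorted(profiles)
    corr = {a: {b: pearson(profiles[a], profiles[b]) for b in genes} for a in genes}
    finished, pending = [], [set(genes)]
    while pending:
        cluster = pending.pop()
        members = sorted(cluster)
        worst = min(
            (corr[x][y] for i, x in enumerate(members) for y in members[i + 1:]),
            default=1.0,  # a single gene is trivially coherent
        )
        if worst >= threshold:
            finished.append(cluster)
        else:
            pending.extend(split(cluster, corr))
    return sorted(finished, key=sorted)


# Hypothetical profiles: g1/g2 rise together, g3/g4 fall together.
profiles = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1], "g4": [8, 6, 4, 2]}
print(divisive_correlation_clustering(profiles))
```

    Each split strictly shrinks the cluster being divided and single genes are never split, so the loop always terminates; the threshold plays the role that a target cluster count plays in methods such as k-means.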

  4. Regulator of complement activation (RCA) gene cluster in Xenopus tropicalis.

    PubMed

    Oshiumi, Hiroyuki; Suzuki, Yuzuru; Matsumoto, Misako; Seya, Tsukasa

    2009-05-01

    Genome and expressed sequence tag information of Xenopus tropicalis suggested that short-consensus repeat (SCR)-containing proteins are encoded by three genes that are mapped within a 300-kb region downstream of PFKFB2, which is a marker gene for the regulator of complement activation (RCA) loci in human and chicken. Based on this observation, we cloned the three cDNAs of these proteins using the 3'- or 5'-RACE technique. Based on their primary structures and their proximity to the PFKFB2 locus, we named them amphibian RCA protein (ARC) 1, 2, and 3. Expression in human HEK293 or CHO cells suggested that ARC1 is a soluble protein of Mr approximately 67 kDa, ARC2 is a membrane protein with Mr 44 kDa, and ARC3 is a secretory protein with a putative transmembrane region. They were N-glycosylated during maturation. In human and chicken RCA clusters, the genes for soluble, GPI-anchored, and membrane forms of SCR proteins are arranged in that order from distal to proximal relative to the PFKFB2 gene. However, the amphibian ARC1, 2, and 3 resembled one another and did not reflect the same order found in human and chicken RCA genes. This may be due to self-duplication of ARCs to form a family, which evolved after the amphibians separated from the ancestor of the amniotes, which possessed soluble, GPI-anchored, and membrane forms of SCR protein members. Taken together, the frog possesses an RCA locus, but the constitution of its ARC proteins differs from that of the amniotes, with a unique self-resemblance.

  5. Nucleotide sequence and transcriptional analysis of the type A2 neurotoxin gene cluster in Clostridium botulinum.

    PubMed

    Dineen, Sean S; Bradshaw, Marite; Karasek, Charles E; Johnson, Eric A

    2004-06-01

    The nucleotide sequences of the upstream regions of the botulinum neurotoxin type A1 (BoNT/A1) cluster of Clostridium botulinum strain NCTC 2916 and the BoNT/A2 cluster of strain Kyoto-F were determined. A novel gene, designated orfx3, was identified following the orfx2 gene in both clusters. ORF-X2 and ORF-X3 exhibit similarity to the BoNT cluster-associated P-47 protein. The BoNT/A1 and BoNT/A2 clusters share a similar gene arrangement but exhibit differences in the spacing between certain genes. Sequences with similarity to transposases were identified in these intergenic regions, suggesting that these differences arose from an ancestral insertion event. Transcriptional analysis of the BoNT/A2 cluster revealed that the genes of the cluster are primarily transcribed as three polycistronic transcripts. Two divergent polycistronic transcripts, one encoding the orfx1, orfx2, and orfx3 genes, the second encoding the p47, ntnh, and bont/a2 genes, are transcribed from conserved BoNT cluster promoters. The third polycistronic transcript, expressed at low levels, encodes the positive regulatory botR gene and the orfx genes. This is the first complete analysis of a botulinum toxin A2 cluster.

  6. Identification of lethal cluster of genes in the yeast transcription network

    NASA Astrophysics Data System (ADS)

    Rho, K.; Jeong, H.; Kahng, B.

    2006-05-01

    Identification of essential or lethal genes is one of the ultimate goals in drug design. Here we introduce an in silico method to select the cluster with a high population of lethal genes, called the lethal cluster, through microarray assay. We construct a gene transcription network based on the microarray expression levels. Links are added one by one in descending order of the Pearson correlation coefficient between two genes. As the link density p increases, two meaningful link densities, p_m and p_s, are observed. At p_m, which is smaller than the percolation threshold, the number of disconnected clusters is maximal, and the lethal genes are highly concentrated in a certain cluster that needs to be identified. Thus, the deletion of all genes in that cluster could efficiently lead to an inviable mutant. This lethal cluster can be identified by an in silico method. As p increases further beyond the percolation threshold, power-law behavior in the degree distribution of a giant cluster appears at p_s. We measure the degree of each gene at p_s. With the information on the degree of each gene at p_s, we return to the point p_m and calculate the mean degree of the genes in each cluster. We find that the lethal cluster has the largest mean degree.
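
    The first stage of the procedure above, adding links in descending order of Pearson correlation and locating the link density at which the number of disconnected clusters peaks, can be sketched with a union-find structure. This is a minimal sketch, not the authors' code: it assumes that only clusters of two or more genes are counted (otherwise the count would trivially peak at zero links, when every gene is isolated), it names the peak density `p_m` for illustration, and it omits the second stage (locating the power-law density and ranking clusters by mean degree). Function names and toy profiles are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile has no defined correlation; treat it as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def ranked_links(profiles):
    """All gene pairs, ordered by descending Pearson correlation."""
    pairs = combinations(sorted(profiles), 2)
    scored = [(pearson(profiles[a], profiles[b]), a, b) for a, b in pairs]
    return [(a, b) for _, a, b in sorted(scored, reverse=True)]


def cluster_curve(genes, links):
    """Add links one at a time; after each, record (link density, number of clusters with >= 2 genes)."""
    parent = {g: g for g in genes}
    size = {g: 1 for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving keeps trees shallow
            g = parent[g]
        return g

    multi = 0
    curve = []
    for k, (a, b) in enumerate(links, start=1):
        ra, rb = find(a), find(b)
        if ra != rb:
            # two singletons -> +1 cluster; singleton + cluster -> 0; two clusters -> -1
            multi += 1 - (size[ra] > 1) - (size[rb] > 1)
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            parent[rb] = ra  # union by size
            size[ra] += size[rb]
        curve.append((k / len(links), multi))
    return curve


# Hypothetical profiles with two co-expressed pairs.
profiles = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1], "g4": [8, 6, 4, 2]}
curve = cluster_curve(sorted(profiles), ranked_links(profiles))
p_m, n_clusters = max(curve, key=lambda point: point[1])  # first density with the most clusters
print(p_m, n_clusters)
```

    Union-find makes each link addition nearly constant-time, so the whole curve costs little beyond sorting the gene pairs, which matters because the number of pairs grows quadratically with the number of genes on the array.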

  7. Phenotype-based clustering of glycosylation-related genes by RNAi-mediated gene silencing.

    PubMed

    Yamamoto-Hino, Miki; Yoshida, Hideki; Ichimiya, Tomomi; Sakamura, Sho; Maeda, Megumi; Kimura, Yoshinobu; Sasaki, Norihiko; Aoki-Kinoshita, Kiyoko F; Kinoshita-Toyoda, Akiko; Toyoda, Hidenao; Ueda, Ryu; Nishihara, Shoko; Goto, Satoshi

    2015-06-01

    Glycan structures are synthesized by a series of reactions conducted by glycosylation-related (GR) proteins such as glycosyltransferases, glycan-modifying enzymes, and nucleotide-sugar transporters. For example, the common core region of glycosaminoglycans (GAGs) is sequentially synthesized by peptide-O-xylosyltransferase, β1,4-galactosyltransferase I, β1,3-galactosyltransferase II, and β1,3-glucuronyltransferase. This raises the possibility that functional impairment of GR proteins involved in synthesis of the same glycan might result in the same phenotypic abnormality. To examine this possibility, comprehensive silencing of genes encoding GR and proteoglycan core proteins was conducted in Drosophila. The 125 Drosophila GR candidate genes were classified into five functional groups for synthesis of GAGs, N-linked, O-linked, Notch-related, and unknown glycans. Spatiotemporally regulated silencing caused a range of malformed phenotypes that fell into three types: extra veins, thick veins, and depigmentation. The clustered phenotypes reflected the biosynthetic pathways of GAGs, Fringe-dependent glycan on Notch, and glycans placed at or near nonreducing ends (herein termed terminal domains of glycans). Based on the phenotypic clustering, CG33145 was predicted to be involved in formation of terminal domains. Our further analysis showed that CG33145 exhibited galactosyltransferase activity in synthesis of terminal N-linked glycans. Phenotypic clustering, therefore, has potential for the functional prediction of novel GR genes. © 2015 The Authors. Genes to Cells published by Molecular Biology Society of Japan and Wiley Publishing Asia Pty Ltd.
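
    The functional prediction step above (assigning an uncharacterized gene to a pathway because its knockdown phenotypes resemble those of known genes) can be illustrated with a nearest-neighbour lookup by Jaccard similarity of phenotype sets. This is a toy sketch under stated assumptions, not the clustering procedure used in the study: the gene names, phenotype sets and pathway labels are hypothetical, and real profiles would record phenotypes per tissue and silencing condition rather than just the three phenotype types.

```python
def jaccard(a, b):
    """Jaccard similarity of two phenotype sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def predict_pathway(query, annotated):
    """Return (pathway, nearest annotated gene, similarity) for a query phenotype set."""
    gene, (pathway, phenotypes) = max(
        annotated.items(), key=lambda item: jaccard(query, item[1][1])
    )
    return pathway, gene, jaccard(query, phenotypes)


# Hypothetical annotations: gene names and phenotype profiles are illustrative only.
annotated = {
    "gag_gene": ("GAG synthesis", {"extra veins", "thick veins"}),
    "fringe_gene": ("Notch/Fringe glycan", {"extra veins"}),
    "terminal_gene": ("terminal domain", {"depigmentation"}),
}
print(predict_pathway({"depigmentation"}, annotated))
```

    A nearest-neighbour rule is the simplest way to turn phenotype similarity into a prediction; with many genes, a hierarchical clustering of the same Jaccard distances would expose the groups directly.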

  8. Translating biosynthetic gene clusters into fungal armor and weaponry

    PubMed Central

    Keller, Nancy P

    2015-01-01

    Filamentous fungi are renowned for the production of a diverse array of secondary metabolites (SMs) where the genetic material required for synthesis of a SM is typically arrayed in a biosynthetic gene cluster (BGC). These natural products are valued for their bioactive properties stemming from their functions in fungal biology, key among those protection from abiotic and biotic stress and establishment of a secure niche. The producing fungus must not only avoid self-harm from endogenous SMs but also deliver specific SMs at the right time to the right tissue requiring biochemical aid. This review highlights functions of BGCs beyond the enzymatic assembly of SMs, considering the timing and location of SM production and other proteins in the clusters that control SM activity. Specifically, self-protection is provided by both BGC-encoded mechanisms and non-BGC subcellular containment of toxic SM precursors; delivery and timing is orchestrated through cellular trafficking patterns and stress- and developmental-responsive transcriptional programs. PMID:26284674

  9. Mutational and Phylogenetic Analyses of the Mycobacterial mbt Gene Cluster

    PubMed Central

    Chavadi, Sivagami Sundaram; Stirrett, Karen L.; Edupuganti, Uthamaphani R.; Vergnolle, Olivia; Sadhanandan, Gigani; Marchiano, Emily; Martin, Che; Qiu, Wei-Gang; Soll, Clifford E.; Quadri, Luis E. N.

    2011-01-01

    The mycobactin siderophore system is present in many Mycobacterium species, including M. tuberculosis and other clinically relevant mycobacteria. This siderophore system is believed to be utilized by pathogenic and nonpathogenic mycobacteria for iron acquisition in in vivo and ex vivo iron-limiting environments, respectively. Several M. tuberculosis genes located in a so-called mbt gene cluster have been predicted to be required for the biosynthesis of the core scaffold of mycobactin based on sequence analysis. A systematic and controlled mutational analysis probing the hypothesized essential nature of each of these genes for mycobactin production has been lacking. The degree of conservation of mbt gene cluster orthologs remains to be investigated as well. In this study, we sought to conclusively establish whether each of nine mbt genes was required for mycobactin production and to examine the conservation of gene clusters orthologous to the M. tuberculosis mbt gene cluster in other bacteria. We report a systematic mutational analysis of the mbt gene cluster ortholog found in Mycobacterium smegmatis. This mutational analysis demonstrates that eight of the nine mbt genes investigated are essential for mycobactin production. Our genome mining and phylogenetic analyses reveal the presence of orthologous mbt gene clusters in several bacterial species. These gene clusters display significant organizational differences originating from an intricate evolutionary path that might have included horizontal gene transfers. Altogether, the findings reported herein advance our understanding of the genetic requirements for the biosynthesis of an important mycobacterial secondary metabolite with relevance to virulence. PMID:21873494

  10. Enzymology of aminoglycoside biosynthesis-deduction from gene clusters.

    PubMed

    Wehmeier, Udo F; Piepersberg, Wolfgang

    2009-01-01

    The classical aminoglycosides are, with very few exceptions, typically actinobacterial secondary metabolites with antimicrobial activities, all mediated by inhibition of translation on the 30S subunit of the bacterial ribosome. Some chemically related natural products inhibit glucosidases by mimicking oligo-alpha-1,4-glucosides. The biochemistry of the aminoglycoside biosynthetic pathways is still a developing field, since none of the pathways has yet been analyzed to completion. In this chapter we treat the enzymology of aminoglycoside biosynthesis as far as it is apparent from recent investigations based on the availability of DNA sequence data of biosynthetic gene clusters for all major structural classes of these bacterial metabolites. We give a general overview of the field, including descriptions of some key enzymes in various aminoglycoside pathways, whereas Chapter 20 provides a detailed account of the better-studied enzymology known so far for the neomycin and butirosin pathways.

  11. Heterologous expression of pikromycin biosynthetic gene cluster using Streptomyces artificial chromosome system.

    PubMed

    Pyeon, Hye-Rim; Nah, Hee-Ju; Kang, Seung-Hoon; Choi, Si-Sun; Kim, Eung-Soo

    2017-05-31

    Heterologous expression of biosynthetic gene clusters of natural microbial products has become an essential strategy for titer improvement and pathway engineering of various potentially valuable natural products. A Streptomyces artificial chromosomal conjugation vector, pSBAC, was previously applied successfully for precise cloning and tandem integration of a large polyketide tautomycetin (TMC) biosynthetic gene cluster (Nah et al. in Microb Cell Fact 14(1):1, 2015), implying that this strategy could be employed to develop a custom overexpression scheme for natural product pathway clusters present in actinomycetes. To validate the pSBAC system as a generally applicable heterologous overexpression system for large polyketide biosynthetic gene clusters in Streptomyces, the biosynthetic gene cluster of another model polyketide compound, pikromycin, was precisely cloned and heterologously expressed using the pSBAC system. A unique HindIII restriction site was precisely inserted at one of the border regions of the pikromycin biosynthetic gene cluster within the chromosome of Streptomyces venezuelae, followed by site-specific recombination of pSBAC into the flanking region of the pikromycin gene cluster. Unlike the previous cloning process, one HindIII site integration step was skipped through pSBAC modification. pPik001, a pSBAC containing the pikromycin biosynthetic gene cluster, was directly introduced into two heterologous hosts, Streptomyces lividans and Streptomyces coelicolor, resulting in the production of 10-deoxymethynolide, a major pikromycin derivative. When two entire pikromycin biosynthetic gene clusters were tandemly introduced into the S. lividans chromosome, overproduction of 10-deoxymethynolide and the presence of pikromycin, which was previously not detected, were both confirmed. Moreover, comparative qRT-PCR results confirmed that the transcription of pikromycin biosynthetic genes was significantly upregulated in S. lividans containing tandem

  12. A Special Local Clustering Algorithm for Identifying the Genes Associated With Alzheimer’s Disease

    PubMed Central

    Pang, Chao-Yang; Hu, Wei; Hu, Ben-Qiong; Shi, Ying; Vanderburg, Charles R.; Rogers, Jack T.

    2010-01-01

    Clustering is the grouping of similar objects into a class. The local clustering feature refers to the phenomenon whereby one group of data is separated from another, and the data from these different groups are clustered locally. A compact class is defined as a cluster in which all similar elements cluster tightly together. Herein, the essence of the local clustering feature, revealed by mathematical manipulation, yields a novel clustering algorithm termed the special local clustering (SLC) algorithm, which was used to process gene microarray data related to Alzheimer's disease (AD). The SLC algorithm was able to group together genes with similar expression patterns and to identify significantly varied gene expression values as isolated points. If a gene belongs to a compact class in control data and appears as an isolated point in incipient, moderate and/or severe AD gene microarray data, this gene is possibly associated with AD. The application of clustering algorithms to disease-associated gene identification, as in AD, is rarely reported. PMID:20089478
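
    The selection rule above (a gene that is compact in control data but isolated in AD data) can be illustrated with a minimal Python sketch. It is not the published SLC algorithm: it assumes a simple stand-in criterion in which a gene counts as an "isolated point" when no other gene's expression profile lies within a Euclidean distance `eps`. The function names, the threshold and the toy profiles are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set

Profiles = Dict[str, Sequence[float]]  # gene name -> expression profile

def isolated_genes(profiles: Profiles, eps: float) -> Set[str]:
    """Genes whose nearest neighbour (Euclidean) is farther than eps.

    Hypothetical stand-in for SLC's notion of an 'isolated point'.
    """
    isolated = set()
    for g in profiles:
        nearest = min(
            (math.dist(profiles[g], profiles[h]) for h in profiles if h != g),
            default=math.inf,
        )
        if nearest > eps:
            isolated.add(g)
    return isolated

def candidate_genes(control: Profiles, disease: Profiles, eps: float) -> List[str]:
    """Genes that are grouped (not isolated) in control but isolated in disease."""
    return sorted(isolated_genes(disease, eps) - isolated_genes(control, eps))
```

    For example, if gene C sits close to A and B in the control profiles but moves far away in the disease profiles, `candidate_genes` returns `["C"]` for a suitable `eps`.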

  13. Elephant shark (Callorhinchus milii) provides insights into the evolution of Hox gene clusters in gnathostomes.

    PubMed

    Ravi, Vydianathan; Lam, Kevin; Tay, Boon-Hui; Tay, Alice; Brenner, Sydney; Venkatesh, Byrappa

    2009-09-22

    We have sequenced and analyzed Hox gene clusters from elephant shark, a holocephalian cartilaginous fish. Elephant shark possesses 4 Hox clusters with 45 Hox genes that include orthologs for a higher number of ancient gnathostome Hox genes than the 4 clusters in tetrapods and the supernumerary clusters in teleost fishes. Phylogenetic analysis of elephant shark Hox genes from 7 paralogous groups that contain all of the 4 members indicated an ((AB)(CD)) topology for the order of Hox cluster duplication, providing support for the 2R hypothesis (i.e., 2 rounds of whole-genome duplication during the early evolution of vertebrates). Comparisons of noncoding sequences of the elephant shark and human Hox clusters have identified a large number of conserved noncoding elements (CNEs), which represent putative cis-regulatory elements that may be involved in the regulation of Hox genes. Interestingly, in fugu more than 50% of these ancient CNEs have diverged beyond recognition in the duplicated (HoxA, HoxB, and HoxD) as well as the singleton (HoxC) Hox clusters. Furthermore, the b-paralogs of the duplicated fugu Hox clusters are virtually devoid of unique ancient CNEs. In contrast to fugu Hox clusters, elephant shark and human Hox clusters have lost fewer ancient CNEs. If these ancient CNEs are indeed enhancers directing tissue-specific expression of Hox genes, divergence of their sequences in vertebrate lineages might have led to altered expression patterns and presumably the functions of their associated Hox genes.

  14. Elephant shark (Callorhinchus milii) provides insights into the evolution of Hox gene clusters in gnathostomes

    PubMed Central

    Ravi, Vydianathan; Lam, Kevin; Tay, Boon-Hui; Tay, Alice; Brenner, Sydney; Venkatesh, Byrappa

    2009-01-01

    We have sequenced and analyzed Hox gene clusters from elephant shark, a holocephalian cartilaginous fish. Elephant shark possesses 4 Hox clusters with 45 Hox genes that include orthologs for a higher number of ancient gnathostome Hox genes than the 4 clusters in tetrapods and the supernumerary clusters in teleost fishes. Phylogenetic analysis of elephant shark Hox genes from 7 paralogous groups that contain all of the 4 members indicated an ((AB)(CD)) topology for the order of Hox cluster duplication, providing support for the 2R hypothesis (i.e., 2 rounds of whole-genome duplication during the early evolution of vertebrates). Comparisons of noncoding sequences of the elephant shark and human Hox clusters have identified a large number of conserved noncoding elements (CNEs), which represent putative cis-regulatory elements that may be involved in the regulation of Hox genes. Interestingly, in fugu more than 50% of these ancient CNEs have diverged beyond recognition in the duplicated (HoxA, HoxB, and HoxD) as well as the singleton (HoxC) Hox clusters. Furthermore, the b-paralogs of the duplicated fugu Hox clusters are virtually devoid of unique ancient CNEs. In contrast to fugu Hox clusters, elephant shark and human Hox clusters have lost fewer ancient CNEs. If these ancient CNEs are indeed enhancers directing tissue-specific expression of Hox genes, divergence of their sequences in vertebrate lineages might have led to altered expression patterns and presumably the functions of their associated Hox genes. PMID:19805301

  15. Integrative clustering by nonnegative matrix factorization can reveal coherent functional groups from gene profile data.

    PubMed

    Brdar, Sanja; Crnojević, Vladimir; Zupan, Blaz

    2015-03-01

    Recent developments in molecular biology and in techniques for genome-wide data acquisition have produced an abundance of data with which to profile genes and predict their function. These datasets may come from diverse sources, and it is an open question how to treat them jointly and fuse them into a single prediction model. A prevailing technique for identifying groups of related genes that exhibit similar profiles is profile-based clustering. Cluster inference may benefit from consensus across different clustering models. In this paper, we propose a technique that derives separate gene clusters from each of the available data sources and then fuses them by means of nonnegative matrix factorization. We use gene profile data on the budding yeast S. cerevisiae to demonstrate that this approach can successfully integrate heterogeneous datasets and yield high-quality clusters that could not otherwise be inferred by simply merging the gene profiles prior to clustering.
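
    A minimal sketch of fusing per-source clusterings with nonnegative matrix factorization, assuming one common recipe that may differ from the authors' method: each source's partition is turned into a co-clustering matrix, the matrices are averaged into a consensus matrix, the consensus matrix is factorized with Lee-Seung multiplicative updates, and each gene is assigned to the factor with the largest weight. All names, the iteration count and the seed are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

def consensus_matrix(partitions: Sequence[Sequence[int]]) -> Matrix:
    """Entry (i, j) = fraction of partitions placing genes i and j together."""
    n = len(partitions[0])
    m = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    m[i][j] += 1.0 / len(partitions)
    return m

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]

def nmf(v: Matrix, k: int, iters: int = 500, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Lee-Seung multiplicative updates for min ||V - WH||_F^2 with W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][b] * num[a][b] / (den[a][b] + eps) for b in range(m)]
             for a in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[a][b] * num[a][b] / (den[a][b] + eps) for b in range(k)]
             for a in range(n)]
    return w, h

def fuse(partitions: Sequence[Sequence[int]], k: int, **kw) -> List[int]:
    """Assign each gene to the NMF factor with the largest loading."""
    w, _ = nmf(consensus_matrix(partitions), k, **kw)
    return [max(range(k), key=lambda c: row[c]) for row in w]
```

    The multiplicative updates keep W and H nonnegative; a fixed seed makes a run reproducible, but different seeds may converge to different local optima.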

  16. Comparative Analysis of Cluster Validity Indices in Identifying Some Possible Genes Mediating Certain Cancers.

    PubMed

    Ghosh, Anupam; Dhara, Bibhas Chandra; De, Rajat K

    2013-04-01

    In this article, we compare the performance of 19 cluster validity indices in identifying some possible genes mediating certain cancers, based on gene expression data. For the purpose of this comparison, we have developed a method. The proposed method involves cluster generation, selection of the best k-value or c-value, cluster identification, identification of the altered gene cluster, scoring of the altered gene cluster, and determination of the best k-value or c-value by exploring biological repositories. The effectiveness of the method has been demonstrated on three gene expression data sets dealing with human lung cancer, colon cancer, and leukemia. Here, we have used three clustering algorithms: k-means, PAM, and fuzzy c-means. We have used biochemical pathways related to these cancers and p-value statistics for validating the study. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
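
    To make the selection of a "best k-value" concrete, here is a sketch of one validity index, the mean silhouette width, used to choose k for k-means. It is only one example index, not the article's 19-index comparison or its repository-based scoring. Assumptions: Euclidean distance, deterministic farthest-point initialisation, and a silhouette of 0 for points in singleton clusters; the function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]

def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> List[int]:
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [tuple(points[0])]
    while len(centers) < k:
        # Next centre: the point farthest from its nearest existing centre.
        centers.append(tuple(max(points, key=lambda p: min(math.dist(p, c) for c in centers))))
    labels: List[int] = []
    for _ in range(iters):
        new = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new == labels:
            break  # assignments are stable: converged
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels

def silhouette(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean silhouette width, s = (b - a) / max(a, b); singletons score 0."""
    scores = []
    for i, p in enumerate(points):
        by_cluster: Dict[int, List[float]] = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def best_k(points: Sequence[Point], candidates: Sequence[int]) -> int:
    """Candidate k whose k-means partition has the highest mean silhouette."""
    return max(candidates, key=lambda k: silhouette(points, kmeans(points, k)))
```

    On two well-separated groups of four points each, `best_k(points, [2, 3, 4])` picks 2, because splitting either group lowers the mean silhouette.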

  17. Clustered Genes Encoding the Methyltransferases of Methanogenesis from Monomethylamine

    PubMed Central

    Burke, Stephen A.; Lo, Sam L.; Krzycki, Joseph A.

    1998-01-01

    Coenzyme M (CoM) is methylated during methanogenesis from monomethylamine in a reaction catalyzed by three proteins. Using monomethylamine, a 52-kDa polypeptide termed monomethylamine methyltransferase (MMAMT) methylates the corrinoid cofactor bound to a second polypeptide, monomethylamine corrinoid protein (MMCP). Methylated MMCP then serves as a substrate for MT2-A, which methylates CoM. The genes for these proteins are clustered on 6.8 kb of DNA in Methanosarcina barkeri MS. The gene encoding MMCP (mtmC) is located directly upstream of the gene encoding MMAMT (mtmB). The gene encoding MT2-A (mtbA) was found 1.1 kb upstream of mtmC, but no obvious open reading frame was found in the intergenic region between mtbA and mtmC. A single monocistronic transcript was found for mtbA that initiated 76 bp from the translational start. Separate transcripts of 2.4 and 4.7 kb were detected, both of which carried mtmCB. The larger transcript also encoded mtmP, which is homologous to the APC family of cationic amine permeases and may therefore encode a methylamine permease. A single transcriptional start site was found 447 bp upstream of the translational start of mtmC. MtmC possesses the corrinoid binding motif found in corrinoid proteins involved in dimethylsulfide- and methanol-dependent methanogenesis, as well as in methionine synthase. The open reading frame of mtmB was interrupted by a single in-frame, midframe, UAG codon which was also found in mtmB from M. barkeri NIH. A mechanism that circumvents UAG-directed termination of translation must operate during expression of mtmB in this methanogen. PMID:9642198

  18. Clustered genes encoding the methyltransferases of methanogenesis from monomethylamine.

    PubMed

    Burke, S A; Lo, S L; Krzycki, J A

    1998-07-01

    Coenzyme M (CoM) is methylated during methanogenesis from monomethylamine in a reaction catalyzed by three proteins. Using monomethylamine, a 52-kDa polypeptide termed monomethylamine methyltransferase (MMAMT) methylates the corrinoid cofactor bound to a second polypeptide, monomethylamine corrinoid protein (MMCP). Methylated MMCP then serves as a substrate for MT2-A, which methylates CoM. The genes for these proteins are clustered on 6.8 kb of DNA in Methanosarcina barkeri MS. The gene encoding MMCP (mtmC) is located directly upstream of the gene encoding MMAMT (mtmB). The gene encoding MT2-A (mtbA) was found 1.1 kb upstream of mtmC, but no obvious open reading frame was found in the intergenic region between mtbA and mtmC. A single monocistronic transcript was found for mtbA that initiated 76 bp from the translational start. Separate transcripts of 2.4 and 4.7 kb were detected, both of which carried mtmCB. The larger transcript also encoded mtmP, which is homologous to the APC family of cationic amine permeases and may therefore encode a methylamine permease. A single transcriptional start site was found 447 bp upstream of the translational start of mtmC. MtmC possesses the corrinoid binding motif found in corrinoid proteins involved in dimethylsulfide- and methanol-dependent methanogenesis, as well as in methionine synthase. The open reading frame of mtmB was interrupted by a single in-frame, midframe, UAG codon which was also found in mtmB from M. barkeri NIH. A mechanism that circumvents UAG-directed termination of translation must operate during expression of mtmB in this methanogen.

  19. Comparative analysis of magnetosome gene clusters in magnetotactic bacteria provides further evidence for horizontal gene transfer.

    PubMed

    Jogler, Christian; Kube, Michael; Schübbe, Sabrina; Ullrich, Susanne; Teeling, Hanno; Bazylinski, Dennis A; Reinhardt, Richard; Schüler, Dirk

    2009-05-01

    The organization of magnetosome genes was analysed in all available complete or partial genomic sequences of magnetotactic bacteria (MTB), including the magnetosome island (MAI) of the magnetotactic marine vibrio strain MV-1 determined in this study. The MAI was found to differ in gene content and organization between Magnetospirillum species and strains MV-1 or MC-1. Although a similar organization of magnetosome genes was found in all MTB, distinct variations in gene order and sequence similarity were uncovered that may account for the observed diversity of biomineralization, cell biology and magnetotaxis found in various MTB. While several magnetosome genes were present in all MTB, others were confined to Magnetospirillum species, indicating that the minimal set of genes required for magnetosome biomineralization might be smaller than previously suggested. A number of novel candidate genes were implicated in magnetosome formation by gene cluster comparison. Based on phylogenetic and compositional evidence we present a model for the evolution of magnetotaxis within the Alphaproteobacteria, which suggests the independent horizontal transfer of magnetosome genes from an unknown ancestor of magnetospirilla into strains MC-1 and MV-1.

  20. Tandem repeats 3′ of the IGHA genes in the human immunoglobulin heavy chain gene cluster

    SciTech Connect

    Kang, H.K.; Cox, D.W.

    1996-07-01

    The human IGH constant region spans 350 kb and includes nine genes and two pseudogenes. The entire constant region gene cluster has been cloned except for the sequences between the IGHD and IGHG3 genes and between the IGHA1 and IGHA2 genes. The uncloned regions 3′ of the IGHA genes are of interest because transcriptional control elements have been found downstream of the IGHA genes in the rat and mouse IGH loci. In addition, pulsed-field gel electrophoresis mapping identified CpG islands approximately 30 kb downstream of each IGHA gene, within the uncloned portion of the human IGH. These findings indicate that the regions 3′ of the IGHA genes are unclonable by standard cloning methods. Therefore, we applied the inverse PCR technique to amplify the sequences flanking the IGHA1 gene. The new sequence included 20-bp tandem repeats, which we propose are the cause of the unclonability of this region. 39 refs., 6 figs.

  1. The gsdf gene locus harbors evolutionary conserved and clustered genes preferentially expressed in fish previtellogenic oocytes.

    PubMed

    Gautier, Aude; Le Gac, Florence; Lareyre, Jean-Jacques

    2011-02-01

    display a different cellular localization compared to that of the gsdf gene, indicating that the latter gene is not co-regulated. Interestingly, our study identifies new clustered genes that are specifically expressed in previtellogenic oocytes (nup54, aff1, klhl8, sdad1).

  2. Identification and Functional Analysis of the Nocardithiocin Gene Cluster in Nocardia pseudobrasiliensis

    PubMed Central

    Sakai, Kanae; Komaki, Hisayuki; Gonoi, Tohru

    2015-01-01

    Nocardithiocin is a thiopeptide compound isolated from the opportunistic pathogen Nocardia pseudobrasiliensis. It shows strong activity against acid-fast bacteria and is also active against rifampicin-resistant Mycobacterium tuberculosis. Here, we report the identification of the nocardithiocin gene cluster in N. pseudobrasiliensis IFM 0761 based on conserved thiopeptide biosynthesis gene sequences and the whole genome sequence. The predicted gene cluster was confirmed by gene disruption and complementation. As expected, strains containing the disrupted gene did not produce nocardithiocin, while gene complementation restored nocardithiocin production in these strains. The predicted cluster was further analyzed using RNA-seq, which showed that the nocardithiocin gene cluster contains 12 genes within a 15.2-kb region. This finding will promote improvement of nocardithiocin productivity and the production of its derivatives. PMID:26588225

  3. Characterization and expression analysis of the exopolysaccharide gene cluster in Lactobacillus fermentum TDS030603.

    PubMed

    Dan, Tong; Fukuda, Kenji; Sugai-Bannai, Michiko; Takakuwa, Naoya; Motoshima, Hidemasa; Urashima, Tadasu

    2009-12-01

    Part of the exopolysaccharide gene cluster of Lactobacillus fermentum TDS030603 was characterized. This region spans 11,890 base pairs of chromosomal DNA and encodes 13 open reading frames. Of these 13 open reading frames, six were found to be involved in exopolysaccharide synthesis, five were similar to transposase genes of other lactobacilli, and two were functionally unrelated. Expression analysis revealed that the exopolysaccharide synthesis-related genes were expressed during cultivation. Southern analysis using specific primers for the exopolysaccharide genes indicated that duplication of the gene cluster did not occur. The plasmid-cured strain retained its capacity for exopolysaccharide production, confirming that the exopolysaccharide gene cluster of this strain is located in the chromosomal DNA, as in thermophilic lactic acid bacteria. Our results indicate that this exopolysaccharide gene cluster is likely to be functional, although extensive gene rearrangement has occurred.

  4. Identification and Functional Analysis of the Nocardithiocin Gene Cluster in Nocardia pseudobrasiliensis.

    PubMed

    Sakai, Kanae; Komaki, Hisayuki; Gonoi, Tohru

    2015-01-01

    Nocardithiocin is a thiopeptide compound isolated from the opportunistic pathogen Nocardia pseudobrasiliensis. It shows strong activity against acid-fast bacteria and is also active against rifampicin-resistant Mycobacterium tuberculosis. Here, we report the identification of the nocardithiocin gene cluster in N. pseudobrasiliensis IFM 0761 based on conserved thiopeptide biosynthesis gene sequences and the whole genome sequence. The predicted gene cluster was confirmed by gene disruption and complementation. As expected, strains containing the disrupted gene did not produce nocardithiocin, while gene complementation restored nocardithiocin production in these strains. The predicted cluster was further analyzed using RNA-seq, which showed that the nocardithiocin gene cluster contains 12 genes within a 15.2-kb region. This finding will promote improvement of nocardithiocin productivity and the production of its derivatives.

  5. A Bayesian clustering approach for detecting gene-gene interactions in high-dimensional genotype data.

    PubMed

    Chen, Sui-Pi; Huang, Guan-Hua

    2014-06-01

    This paper uses a Bayesian formulation of a clustering procedure, called the Algorithm via Bayesian Clustering to Detect Epistasis (ABCDE), to identify gene-gene interactions in case-control studies. The ABCDE uses Dirichlet process mixtures to model SNP marker partitions, and uses Gibbs weighted Chinese restaurant sampling to simulate the posterior distributions of these partitions. Unlike the representative Bayesian epistasis detection algorithm BEAM, which partitions markers into three groups, the ABCDE can be evaluated at any given partition, regardless of the number of groups. This study also develops permutation tests to validate the disease association of SNP subsets identified by the ABCDE, which can yield results that are more robust to model specification and prior assumptions. This study examines the performance of the ABCDE and compares it with BEAM using various simulated data and a schizophrenia SNP dataset.
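
    The Chinese restaurant process (CRP) named above is the prior over partitions induced by a Dirichlet process: with n markers already assigned, the next one joins an existing group of size n_k with probability n_k / (n + alpha) and opens a new group with probability alpha / (n + alpha). The sketch below draws partitions from this prior only; it is not the ABCDE sampler, whose Gibbs weights also include data likelihood terms that are not reproduced here. Function names and the seed are illustrative.

```python
import random
from collections import Counter
from typing import Dict, List, Union

def crp_probabilities(labels: List[int], alpha: float) -> Dict[Union[int, str], float]:
    """Prior probability that the next item joins each existing group or a new one."""
    n = len(labels)
    probs: Dict[Union[int, str], float] = {
        group: size / (n + alpha) for group, size in Counter(labels).items()
    }
    probs["new"] = alpha / (n + alpha)
    return probs

def sample_crp_partition(n_items: int, alpha: float, seed: int = 0) -> List[int]:
    """Sequentially seat n_items under the CRP prior; groups are labelled 0, 1, 2, ..."""
    rng = random.Random(seed)
    labels: List[int] = []
    for _ in range(n_items):
        probs = crp_probabilities(labels, alpha)
        choice = rng.choices(list(probs), weights=list(probs.values()))[0]
        # A new group gets the next unused integer label.
        labels.append(len(set(labels)) if choice == "new" else choice)
    return labels
```

    Larger `alpha` favours more, smaller groups; because the number of groups is not fixed in advance, a partition of any size can be represented, which matches the abstract's contrast with BEAM's fixed three groups.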

  6. Epigenetic regulation of the RHOX homeobox gene cluster and its association with human male infertility.

    PubMed

    Richardson, Marcy E; Bleiziffer, Andreas; Tüttelmann, Frank; Gromoll, Jörg; Wilkinson, Miles F

    2014-01-01

    The X-linked RHOX cluster encodes a set of homeobox genes that are selectively expressed in the reproductive tract. Members of the RHOX cluster regulate target genes important for spermatogenesis and promote male fertility in mice. Studies show that demethylating agents strongly upregulate the expression of mouse Rhox genes, suggesting that they are regulated by DNA methylation. However, whether this extends to human RHOX genes, whether DNA methylation directly regulates RHOX gene transcription, and how this relates to human male infertility are unknown. To address these issues, we first defined the promoter regions of human RHOX genes and performed gain- and loss-of-function experiments to determine whether human RHOX gene transcription is regulated by DNA methylation. Our results indicated that DNA methylation is necessary and sufficient to silence human RHOX gene expression. To determine whether RHOX cluster methylation is associated with male infertility, we evaluated the methylation status of RHOX genes in sperm from a large cohort of infertility patients. Linear regression analysis revealed a strong association between RHOX gene cluster hypermethylation and three independent types of semen abnormalities. Hypermethylation was restricted specifically to the RHOX cluster; we did not observe it in genes immediately adjacent to it on the X chromosome. Our results strongly suggest that human RHOX homeobox genes are under an epigenetic control mechanism that is aberrantly regulated in infertility patients. We propose that hypermethylation of the RHOX gene cluster serves as a marker for idiopathic infertility and that it is a candidate to exert a causal role in male infertility.

  7. Activation and Characterization of a Cryptic Polycyclic Tetramate Macrolactam Biosynthetic Gene Cluster

    PubMed Central

    Luo, Yunzi; Huang, Hua; Liang, Jing; Wang, Meng; Lu, Lu; Shao, Zengyi; Cobb, Ryan E.; Zhao, Huimin

    2014-01-01

    Polycyclic tetramate macrolactams (PTMs) are a widely distributed class of natural products with important biological activities. However, many of them have not been characterized. Here we apply a plug and play synthetic biology strategy to activate a cryptic PTM biosynthetic gene cluster SGR810-815 from Streptomyces griseus and discover three potential PTMs. This gene cluster is highly conserved in phylogenetically diverse bacterial strains and contains an unusual hybrid polyketide synthase-nonribosomal peptide synthetase (PKS-NRPS) which resembles iterative PKSs known in fungi. To further characterize this gene cluster, we use the same synthetic biology approach to create a series of gene deletion constructs and elucidate the biosynthetic steps for the formation of the polycyclic system. The strategy we employ bypasses the traditional laborious processes to elicit gene cluster expression and should be generally applicable to many other silent or cryptic gene clusters for discovery and characterization of new natural products. PMID:24305602

  8. Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

    PubMed Central

    2010-01-01

    Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can affect the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Result We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses were evaluated using known classes, such as cancer types, and the adjusted Rand index. The performance of the different analyses varies between the data sets, and it is difficult to give general recommendations. However, normalization, gene selection and clustering method all have a significant impact on performance. In particular, gene selection is important, and it is generally necessary to include a relatively large number of genes to obtain good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand index. Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is
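
    The adjusted Rand index used above to score clusterings against known classes can be computed from the contingency table of two labelings. A sketch of the standard Hubert-Arabie formula, ARI = (sum_ij C(n_ij, 2) - E) / ((sum_i C(a_i, 2) + sum_j C(b_j, 2)) / 2 - E), where E = sum_i C(a_i, 2) * sum_j C(b_j, 2) / C(n, 2), n_ij are the contingency counts and a_i, b_j are its row and column sums. The function name is illustrative, and returning 1.0 in degenerate cases is an assumed convention.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence

def adjusted_rand_index(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Hubert-Arabie adjusted Rand index between two labelings of the same items."""
    if len(a) != len(b):
        raise ValueError("labelings must have the same length")
    n = len(a)
    if n < 2:
        return 1.0  # assumed convention: nothing to compare
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())  # contingency cells
    sum_a = sum(comb(c, 2) for c in Counter(a).values())  # row sums
    sum_b = sum(comb(c, 2) for c in Counter(b).values())  # column sums
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate, e.g. both labelings put everything together
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

    The index is invariant to renaming cluster labels, is 1 for identical partitions, and has expected value 0 for random labelings, which is why it suits comparisons against known classes such as cancer types.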

  9. An Ergot Alkaloid Biosynthesis Gene and Clustered Hypothetical Genes from Aspergillus fumigatus

    PubMed Central

    Coyle, Christine M.; Panaccione, Daniel G.

    2005-01-01

    The ergot alkaloids are a family of indole-derived mycotoxins with a variety of significant biological activities. Aspergillus fumigatus, a common airborne fungus and opportunistic human pathogen, and several fungi in the relatively distant taxon Clavicipitaceae (clavicipitaceous fungi) produce different sets of ergot alkaloids. The ergot alkaloids of these divergent fungi share a four-member ergoline ring but differ in the number, type, and position of the side chains. Several genes required for ergot alkaloid production are known in the clavicipitaceous fungi, and these genes are clustered in the genome of the ergot fungus Claviceps purpurea. We investigated whether the ergot alkaloids of A. fumigatus have a common biosynthetic and genetic origin with those of the clavicipitaceous fungi. A homolog of dmaW, the gene controlling the determinant step in the ergot alkaloid pathway of clavicipitaceous fungi, was identified in the A. fumigatus genome. Knockout of dmaW eliminated all known ergot alkaloids from A. fumigatus, and complementation of the mutation restored ergot alkaloid production. Clustered with dmaW in the A. fumigatus genome are sequences corresponding to five genes previously proposed to encode steps in the ergot alkaloid pathway of C. purpurea, as well as additional sequences whose deduced protein products are consistent with their involvement in the ergot alkaloid pathway. The corresponding genes have similarities in their nucleotide sequences, but the orientations and positions within the cluster of several of these genes differ. The data indicate that the ergot alkaloid biosynthetic capabilities in A. fumigatus and the clavicipitaceous fungi had a common origin. PMID:15933009

  10. Simcluster: clustering enumeration gene expression data on the simplex space.

    PubMed

    Vêncio, Ricardo Z N; Varuzza, Leonardo; de B Pereira, Carlos A; Brentani, Helena; Shmulevich, Ilya

    2007-07-11

    Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. Like other counting or voting processes, these measurements constitute compositional data exhibiting properties particular to the simplex space, where the summation of the components is constrained. These properties are not present in regular Euclidean spaces, in which hybridization-based microarray data are often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space. Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
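
    One standard tool from the compositional data framework (Aitchison geometry) is the centred log-ratio (clr) transform, which maps a composition from the simplex into ordinary Euclidean space; the Euclidean distance between clr vectors is the Aitchison distance. The sketch below illustrates that idea only and is not Simcluster's own procedure. The 0.5 pseudocount for zero tag counts is an assumption, and the function names are illustrative.

```python
import math
from typing import List, Sequence

def clr(counts: Sequence[float], pseudocount: float = 0.5) -> List[float]:
    """Centred log-ratio transform of one composition (e.g. a tag-count profile).

    The pseudocount is an assumed fix for zero counts, since log(0) is undefined.
    """
    x = [c + pseudocount for c in counts]
    log_gmean = sum(math.log(v) for v in x) / len(x)  # log of the geometric mean
    return [math.log(v) - log_gmean for v in x]

def aitchison_distance(p: Sequence[float], q: Sequence[float],
                       pseudocount: float = 0.5) -> float:
    """Euclidean distance between clr-transformed compositions."""
    return math.dist(clr(p, pseudocount), clr(q, pseudocount))
```

    With `pseudocount=0`, profiles that differ only by total count (for example, a library sequenced twice as deeply) are at distance 0, reflecting the fact that only relative abundances carry information on the simplex; a Euclidean distance on raw counts would not have this property.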

  11. Simcluster: clustering enumeration gene expression data on the simplex space

    PubMed Central

    2007-01-01

    Background Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. Like other counting or voting processes, these measurements constitute compositional data exhibiting properties particular to the simplex space, where the summation of the components is constrained. These properties are not present in regular Euclidean spaces, in which hybridization-based microarray data are often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space. Results Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Conclusion Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data. PMID:17625017

  12. Unusual mutation clusters provide insight into class I gene conversion mechanisms.

    PubMed Central

    Pease, L R; Horton, R M; Pullen, J K; Yun, T J

    1993-01-01

    Genetic diversity among the K and D alleles of the mouse major histocompatibility complex is generated by gene conversion among members of the class I multigene family. The majority of known class I mutants contain clusters of nucleotide changes that can be traced to linked family members. However, the details of the gene conversion mechanism are not known. The bm3 and bm23 mutations represent exceptions to the usual pattern and provide insight into intermediates generated during the gene conversion process. Both of these variants contain clusters of five nucleotide substitutions, but they differ from the classic conversion mutants in the important respect that no donor gene for either mutation could be identified in the parental genome. Nevertheless, both mutation clusters are composed of individual mutations that do exist within the parent. Therefore, they are not random and appear to be templated. Significantly, the bm3 and bm23 mutation clusters are divided into overlapping regions that match class I genes which have functioned as donor genes in other characterized gene conversion events. The unusual structure of the mutation clusters indicates an underlying gene conversion mechanism that can generate mutation clusters as a result of the interaction of three genes in a single genetic event. The unusual mutation clusters are consistent with a hypothetical gene conversion model involving extrachromosomal intermediates. PMID:8321237

  13. A hypothesis to explain how laeA specifically regulates certain secondary metabolite biosynthesis gene clusters

    USDA-ARS's Scientific Manuscript database

    Biosynthesis of mycotoxins involves transcriptional co-regulation of sets of clustered genes. We hypothesize that specific control of transcription of genes in these clusters by LaeA, a global regulator of secondary metabolite production and development in aspergilli and other filamentous fungi, re...

  14. Genome-Wide Prediction of Metabolic Enzymes, Pathways, and Gene Clusters in Plants

    PubMed Central

    Zhang, Peifen; Kim, Taehyong; Banf, Michael; Chavali, Arvind K.; Nilo-Poyanco, Ricardo; Bernard, Thomas

    2017-01-01

    Plant metabolism underpins many traits of ecological and agronomic importance. Plants produce numerous compounds to cope with their environments but the biosynthetic pathways for most of these compounds have not yet been elucidated. To engineer and improve metabolic traits, we need comprehensive and accurate knowledge of the organization and regulation of plant metabolism at the genome scale. Here, we present a computational pipeline to identify metabolic enzymes, pathways, and gene clusters from a sequenced genome. Using this pipeline, we generated metabolic pathway databases for 22 species and identified metabolic gene clusters from 18 species. This unified resource can be used to conduct a wide array of comparative studies of plant metabolism. Using the resource, we discovered a widespread occurrence of metabolic gene clusters in plants: 11,969 clusters from 18 species. The prevalence of metabolic gene clusters offers an intriguing possibility of an untapped source for uncovering new metabolite biosynthesis pathways. For example, more than 1,700 clusters contain enzymes that could generate a specialized metabolite scaffold (signature enzymes) and enzymes that modify the scaffold (tailoring enzymes). In four species with sufficient gene expression data, we identified 43 highly coexpressed clusters that contain signature and tailoring enzymes, of which eight were characterized previously to be functional pathways. Finally, we identified patterns of genome organization that implicate local gene duplication and, to a lesser extent, single gene transposition as having played roles in the evolution of plant metabolic gene clusters. PMID:28228535

  15. Fragmentation of an aflatoxin-like gene cluster in a forest pathogen

    USDA-ARS's Scientific Manuscript database

    Secondary metabolic pathway genes are typically clustered in fungi. An exception to this paradigm is seen for genes required for the production of dothistromin, an aflatoxin-like virulence factor produced by the pine needle pathogen Dothistroma septosporum. In contrast to the tight clustering of gen...

  16. Rough-fuzzy clustering for grouping functionally similar genes from microarray data.

    PubMed

    Maji, Pradipta; Paul, Sushmita

    2013-01-01

    Gene expression data clustering is one of the important tasks of functional genomics, as it provides a powerful tool for studying the functional relationships of genes in a biological process. Identifying coexpressed groups of genes is the basic challenge in the gene clustering problem. In this regard, a gene clustering algorithm, termed robust rough-fuzzy c-means, is proposed that judiciously integrates the merits of rough sets and fuzzy sets. While the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in cluster definition, the integration of probabilistic and possibilistic memberships of fuzzy sets enables efficient handling of overlapping partitions in a noisy environment. The concept of the possibilistic lower bound and probabilistic boundary of a cluster, introduced in robust rough-fuzzy c-means, enables efficient selection of gene clusters. An efficient method is proposed to select initial prototypes of different gene clusters, which enables the proposed c-means algorithm to converge to an optimum or near-optimum solution and helps to discover coexpressed gene clusters. The effectiveness of the algorithm, along with a comparison with other algorithms, is demonstrated both qualitatively and quantitatively on 14 yeast microarray data sets.
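    For orientation, the fuzzy half of that combination can be sketched with standard fuzzy c-means (Bezdek's alternating membership and centroid updates). This is plain FCM, not the robust rough-fuzzy c-means proposed in the abstract: it has no lower/upper approximations, no possibilistic term, and no prototype-selection step. The fuzzifier m = 2, the function names, and the toy profiles are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def update_memberships(points: List[Point], centers: List[Point], m: float = 2.0) -> List[List[float]]:
    """Return u[i][k], the degree to which point k belongs to cluster i (each column sums to 1)."""
    u = [[0.0] * len(points) for _ in centers]
    for k, x in enumerate(points):
        d = [math.dist(x, v) for v in centers]
        at_center = [i for i, di in enumerate(d) if di == 0.0]
        if at_center:
            # The point sits exactly on one or more centres: share membership among them.
            for i in at_center:
                u[i][k] = 1.0 / len(at_center)
            continue
        for i in range(len(centers)):
            u[i][k] = 1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(len(centers)))
    return u


def update_centers(points: List[Point], u: List[List[float]], m: float = 2.0) -> List[List[float]]:
    """Each centre is the mean of all points weighted by membership**m."""
    dim = len(points[0])
    centers = []
    for row in u:
        w = [uik ** m for uik in row]
        total = sum(w)
        centers.append([sum(wk * x[j] for wk, x in zip(w, points)) / total for j in range(dim)])
    return centers


def fuzzy_c_means(points: List[Point], initial_centers: List[Point], m: float = 2.0,
                  max_iter: int = 100, tol: float = 1e-9) -> Tuple[List[List[float]], List[List[float]]]:
    """Alternate membership and centre updates until the centres stop moving."""
    centers = [list(v) for v in initial_centers]
    for _ in range(max_iter):
        u = update_memberships(points, centers, m)
        new_centers = update_centers(points, u, m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, update_memberships(points, centers, m)


# Toy expression profiles (two conditions per gene) forming two obvious groups
profiles = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centers, u = fuzzy_c_means(profiles, initial_centers=[[1.0, 1.0], [9.0, 9.0]])
```

    Every gene receives a graded membership in every cluster, which is what lets fuzzy methods represent overlapping partitions; the rough-set layer in the proposed algorithm additionally separates a cluster's certain core from its uncertain boundary.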

  17. Identification and analysis of a highly conserved chemotaxis gene cluster in Shewanella species.

    SciTech Connect

    Li, J.; Romine, Margaret F.; Ward, M.

    2007-08-01

    A conserved cluster of chemotaxis genes was identified from the genome sequences of fifteen Shewanella species. An in-frame deletion of the cheA-3 gene, which is located in this cluster, was created in S. oneidensis MR-1 and the gene shown to be essential for chemotactic responses to anaerobic electron acceptors. The CheA-3 protein showed strong similarity to Vibrio cholerae CheA-2 and P. aeruginosa CheA-1, two proteins that are also essential for chemotaxis. The genes encoding these proteins were shown to be located in chemotaxis gene clusters closely related to the cheA-3-containing cluster in Shewanella species. The results of this study suggest that a combination of gene neighborhood and homology analyses may be used to predict which cheA genes are essential for chemotaxis in groups of closely related microorganisms.

  18. A phylogenomic gene cluster resource: The Phylogenetically Inferred Groups (PhIGs) database

    SciTech Connect

    Dehal, Paramvir S.; Boore, Jeffrey L.

    2005-08-25

    We present here the PhIGs database, a phylogenomic resource for sequenced genomes. Although many methods exist for clustering gene families, very few attempt to create truly orthologous clusters sharing descent from a single ancestral gene across a range of evolutionary depths. Although these non-phylogenetic gene family clusters have been used broadly for gene annotation, errors are known to be introduced by the artifactual association of slowly evolving paralogs and lack of annotation for those more rapidly evolving. A full phylogenetic framework is necessary for accurate inference of function and for many studies that address pattern and mechanism of the evolution of the genome. The automated generation of evolutionary gene clusters, creation of gene trees, determination of orthology and paralogy relationships, and the correlation of this information with gene annotations, expression information, and genomic context is an important resource to the scientific community.

  19. Recent advances in awakening silent biosynthetic gene clusters and linking orphan clusters to natural products in microorganisms.

    PubMed

    Chiang, Yi-Ming; Chang, Shu-Lin; Oakley, Berl R; Wang, Clay C C

    2011-02-01

    Secondary metabolites from microorganisms have a broad spectrum of applications, particularly in therapeutics. The growing number of sequenced microbial genomes has revealed a remarkably large number of natural product biosynthetic clusters whose products are still unknown. These cryptic clusters are potentially a treasure house of medically useful compounds. The recent development of new methodologies has made it possible to begin to unlock this treasure house, to discover new natural products, and to determine their biosynthetic pathways. This review highlights some of the most recent strategies to activate silent biosynthetic gene clusters and to elucidate their corresponding products and pathways.

  20. Base J represses genes at the end of polycistronic gene clusters in Leishmania major by promoting RNAP II termination

    PubMed Central

    Reynolds, David L.; Hofmeister, Brigitte T.; Cliffe, Laura; Siegel, T. Nicolai; Anderson, Britta A.; Beverley, Stephen M.; Schmitz, Robert J.; Sabatini, Robert

    2016-01-01

    Summary: The genomes of kinetoplastids are organized into polycistronic gene clusters that are flanked by the modified DNA base J. Previous work has established a role for base J in promoting RNA polymerase II termination in Leishmania spp., where the loss of J leads to termination defects and transcription into adjacent gene clusters. It remains unclear whether these termination defects affect gene expression and whether read-through transcription is detrimental to cell growth, which would explain the essential nature of J. We now demonstrate that reduction of base J at specific sites within polycistronic gene clusters in L. major leads to read-through transcription and increased expression of downstream genes in the cluster. Interestingly, subsequent transcription into the opposing polycistronic gene cluster does not lead to downregulation of sense mRNAs. These findings indicate a conserved role for J in regulating transcription termination and expression of genes within polycistronic gene clusters in trypanosomatids. In contrast to the expectations often attributed to opposing transcription, the essential nature of J in Leishmania spp. is related to its role in gene repression rather than to preventing transcriptional interference resulting from read-through and dual-strand transcription. PMID:27125778

  1. Horizontal Transfer and Death of a Fungal Secondary Metabolic Gene Cluster

    PubMed Central

    Campbell, Matthew A.; Rokas, Antonis; Slot, Jason C.

    2012-01-01

    A cluster composed of four structural and two regulatory genes found in several species of the fungal genus Fusarium (class Sordariomycetes) is responsible for the production of the red pigment bikaverin. We discovered that the unrelated fungus Botrytis cinerea (class Leotiomycetes) contains a cluster of five genes that is highly similar in sequence and gene order to the Fusarium bikaverin cluster. Synteny conservation, nucleotide composition, and phylogenetic analyses of the cluster genes indicate that the B. cinerea cluster was acquired via horizontal transfer from a Fusarium donor. Upon or subsequent to the transfer, the B. cinerea gene cluster became inactivated; one of the four structural genes is missing, two others are pseudogenes, and the fourth structural gene shows an accelerated rate of nonsynonymous substitutions along the B. cinerea lineage, consistent with relaxation of selective constraints. Interestingly, the bik4 regulatory gene is still intact and presumably functional, whereas bik5, which is a pathway-specific regulator, also shows a mild but significant acceleration of evolutionary rate along the B. cinerea lineage. This selective preservation of the bik4 regulator suggests that its conservation is due to its likely involvement in other non–bikaverin-related biological processes in B. cinerea. Thus, in addition to novel metabolism, horizontal transfer of wholesale metabolic gene clusters might also be contributing novel regulation. PMID:22294497

  2. A modified recombineering protocol for the genetic manipulation of gene clusters in Aspergillus fumigatus.

    PubMed

    Alcazar-Fuoli, Laura; Cairns, Timothy; Lopez, Jordi F; Zonja, Bozo; Pérez, Sandra; Barceló, Damià; Igarashi, Yasuhiro; Bowyer, Paul; Bignell, Elaine

    2014-01-01

    Genomic analyses of fungal genome structure have revealed the presence of physically-linked groups of genes, termed gene clusters, where collective functionality of encoded gene products serves a common biosynthetic purpose. In multiple fungal pathogens of humans and plants gene clusters have been shown to encode pathways for biosynthesis of secondary metabolites including metabolites required for pathogenicity. In the major mould pathogen of humans Aspergillus fumigatus, multiple clusters of co-ordinately upregulated genes were identified as having heightened transcript abundances, relative to laboratory cultured equivalents, during the early stages of murine infection. The aim of this study was to develop and optimise a methodology for manipulation of gene cluster architecture, thereby providing the means to assess their relevance to fungal pathogenicity. To this end we adapted a recombineering methodology which exploits lambda phage-mediated recombination of DNA in bacteria, for the generation of gene cluster deletion cassettes. By exploiting a pre-existing bacterial artificial chromosome (BAC) library of A. fumigatus genomic clones we were able to implement single or multiple intra-cluster gene replacement events at both subtelomeric and telomere distal chromosomal locations, in both wild type and highly recombinogenic A. fumigatus isolates. We then applied the methodology to address the boundaries of a gene cluster producing a nematocidal secondary metabolite, pseurotin A, and to address the role of this secondary metabolite in insect and mammalian responses to A. fumigatus challenge.

  3. A Modified Recombineering Protocol for the Genetic Manipulation of Gene Clusters in Aspergillus fumigatus

    PubMed Central

    Alcazar-Fuoli, Laura; Cairns, Timothy; Lopez, Jordi F.; Zonja, Bozo; Pérez, Sandra; Barceló, Damià; Igarashi, Yasuhiro; Bowyer, Paul; Bignell, Elaine

    2014-01-01

    Genomic analyses of fungal genome structure have revealed the presence of physically-linked groups of genes, termed gene clusters, where collective functionality of encoded gene products serves a common biosynthetic purpose. In multiple fungal pathogens of humans and plants gene clusters have been shown to encode pathways for biosynthesis of secondary metabolites including metabolites required for pathogenicity. In the major mould pathogen of humans Aspergillus fumigatus, multiple clusters of co-ordinately upregulated genes were identified as having heightened transcript abundances, relative to laboratory cultured equivalents, during the early stages of murine infection. The aim of this study was to develop and optimise a methodology for manipulation of gene cluster architecture, thereby providing the means to assess their relevance to fungal pathogenicity. To this end we adapted a recombineering methodology which exploits lambda phage-mediated recombination of DNA in bacteria, for the generation of gene cluster deletion cassettes. By exploiting a pre-existing bacterial artificial chromosome (BAC) library of A. fumigatus genomic clones we were able to implement single or multiple intra-cluster gene replacement events at both subtelomeric and telomere distal chromosomal locations, in both wild type and highly recombinogenic A. fumigatus isolates. We then applied the methodology to address the boundaries of a gene cluster producing a nematocidal secondary metabolite, pseurotin A, and to address the role of this secondary metabolite in insect and mammalian responses to A. fumigatus challenge. PMID:25372385

  4. Identification and characterization of a novel diterpene gene cluster in Aspergillus nidulans.

    PubMed

    Bromann, Kirsi; Toivari, Mervi; Viljanen, Kaarina; Vuoristo, Anu; Ruohonen, Laura; Nakari-Setälä, Tiina

    2012-01-01

    Fungal secondary metabolites are a rich source of medically useful compounds due to their pharmaceutical and toxic properties. Sequencing of fungal genomes has revealed numerous secondary metabolite gene clusters, yet products of many of these biosynthetic pathways are unknown since the expression of the clustered genes usually remains silent in normal laboratory conditions. Therefore, to discover new metabolites, it is important to find ways to induce the expression of genes in these otherwise silent biosynthetic clusters. We discovered a novel secondary metabolite in Aspergillus nidulans by predicting a biosynthetic gene cluster with genomic mining. A Zn(II)₂Cys₆-type transcription factor, PbcR, was identified, and its role as a pathway-specific activator for the predicted gene cluster was demonstrated. Overexpression of pbcR upregulated the transcription of seven genes in the identified cluster and led to the production of a diterpene compound, which was characterized with GC/MS as ent-pimara-8(14),15-diene. A change in morphology was also observed in the strains overexpressing pbcR. The activation of a cryptic gene cluster by overexpression of its putative Zn(II)₂Cys₆-type transcription factor led to discovery of a novel secondary metabolite in Aspergillus nidulans. Quantitative real-time PCR and DNA array analysis allowed us to predict the borders of the biosynthetic gene cluster. Furthermore, we identified a novel fungal pimaradiene cyclase gene as well as genes encoding 3-hydroxy-3-methyl-glutaryl-coenzyme A (HMG-CoA) reductase and a geranylgeranyl pyrophosphate (GGPP) synthase. None of these genes have been previously implicated in the biosynthesis of terpenes in Aspergillus nidulans. These results identify the first Aspergillus nidulans diterpene gene cluster and suggest a biosynthetic pathway for ent-pimara-8(14),15-diene.

  5. Organization and differential regulation of a cluster of lignin peroxidase genes of Phanerochaete chrysosporium

    Treesearch

    Stewart, Philip; Cullen, Daniel

    1999-06-01

    The lignin peroxidases of Phanerochaete chrysosporium are encoded by a minimum of 10 closely related genes. Physical and genetic mapping of a cluster of eight lip genes revealed six genes occurring in pairs and transcriptionally convergent, suggesting that portions of the lip family arose by gene duplication events. The completed sequence of lipG and lipJ, together...

  6. Chromosomal position effect influences the heterologous expression of genes and biosynthetic gene clusters in Streptomyces albus J1074.

    PubMed

    Bilyk, Bohdan; Horbal, Liliya; Luzhetskyy, Andriy

    2017-01-04

    Efforts to construct Streptomyces host strains with enhanced yields of heterologous products have focussed mostly on engineering of primary metabolism and/or the deletion of endogenous biosynthetic gene clusters. However, other factors, such as chromosome compaction, have been shown to have a significant influence on gene expression levels in bacteria and fungi. The expression of genes and biosynthetic gene clusters may vary significantly depending on their location within the chromosome. Little is known about the position effect in actinomycetes, which are important producers of various industrially relevant bioactive molecules. To demonstrate the impact of the chromosomal position effect on the heterologous expression of genes and gene clusters in Streptomyces albus J1074, a transposon mutant library was generated with randomly distributed transposons carrying a β-glucuronidase reporter gene. Reporter gene expression levels were shown to depend on the position on the chromosome. Using a combination of the transposon system and a φC31-based vector, the aranciamycin biosynthetic cluster was introduced randomly into the S. albus genome. The production levels of aranciamycin varied up to eightfold depending on the location of the gene cluster within the chromosome of S. albus J1074. One of the isolated mutant strains with an artificially introduced attachment site produced approximately 50% more aranciamycin than strains with endogenous attBs. In this study, we demonstrate that expression of the reporter gene and aranciamycin biosynthetic cluster in Streptomyces albus J1074 varies up to eightfold depending on its position on the chromosome. The integration of the heterologous cluster into different locations on the chromosome may significantly influence the titre of the produced substance. This knowledge can be used for the more efficient engineering of Actinobacteria via the relocation of the biosynthetic gene clusters and insertion of additional

  7. Improving the computational efficiency of recursive cluster elimination for gene selection.

    PubMed

    Luo, Lin-Kai; Huang, Deng-Feng; Ye, Ling-Jun; Zhou, Qi-Feng; Shao, Gui-Fang; Peng, Hong

    2011-01-01

    Gene expression data usually comprise a large number of genes and a relatively small number of samples, which poses many new challenges. Selecting the informative genes has become the main issue in microarray data analysis. Recursive cluster elimination based on support vector machines (SVM-RCE) has shown better classification accuracy on some microarray data sets than recursive feature elimination based on support vector machines (SVM-RFE). However, SVM-RCE is extremely time-consuming. In this paper, we propose an improved version of SVM-RCE called ISVM-RCE. ISVM-RCE first trains an SVM model with all clusters, then scores each cluster by the infinity norm of the weight coefficient vector within that cluster, and finally eliminates the gene clusters with the lowest scores. In addition, when the number of clusters is small, ISVM-RCE eliminates genes within the clusters instead of removing a whole cluster of genes. We tested ISVM-RCE on six gene expression data sets and compared its performance with SVM-RCE and linear-discriminant-analysis-based RFE (LDA-RFE). The experimental results on these data sets show that ISVM-RCE greatly reduces the time cost of SVM-RCE while achieving classification performance comparable to SVM-RCE, whereas LDA-RFE is not stable.
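    The cluster-scoring rule described in this abstract (score each cluster by the infinity norm of its SVM weights, remove the lowest-scoring clusters, and switch to gene-level removal once few clusters remain) can be sketched as follows. The linear SVM is assumed to have been trained elsewhere, so its weight vector is passed in directly; in a full recursive procedure the SVM would be retrained on the surviving genes before each round. The switch-over threshold `min_clusters`, the function names, and the example weights are hypothetical, since the abstract does not give them.

```python
from typing import Dict, List, Sequence

Clusters = Dict[str, List[int]]  # cluster name -> indices of the genes it contains


def score_clusters(weights: Sequence[float], clusters: Clusters) -> Dict[str, float]:
    """Score each cluster by the infinity norm (largest |w|) of its genes' SVM weights."""
    return {name: max(abs(weights[g]) for g in genes) for name, genes in clusters.items()}


def elimination_round(weights: Sequence[float], clusters: Clusters,
                      n_drop: int, min_clusters: int = 2) -> Clusters:
    """Run one elimination round given an already-trained linear SVM's weight vector.

    While more than min_clusters clusters remain, drop the n_drop lowest-scoring clusters
    (never going below min_clusters). Once few clusters remain, drop the n_drop genes
    with the smallest |w| inside the surviving clusters instead.
    """
    if len(clusters) > min_clusters:
        scores = score_clusters(weights, clusters)
        ranked = sorted(clusters, key=lambda name: scores[name])
        dropped = set(ranked[:min(n_drop, len(clusters) - min_clusters)])
        return {name: genes for name, genes in clusters.items() if name not in dropped}
    genes_by_weight = sorted((g for genes in clusters.values() for g in genes),
                             key=lambda g: abs(weights[g]))
    dropped_genes = set(genes_by_weight[:n_drop])
    pruned = {name: [g for g in genes if g not in dropped_genes] for name, genes in clusters.items()}
    return {name: genes for name, genes in pruned.items() if genes}  # discard emptied clusters


# Hypothetical weights for six genes grouped into three clusters
w = [0.1, -0.9, 0.2, 0.05, -0.3, 0.4]
groups = {"A": [0, 1], "B": [2, 3], "C": [4, 5]}
after_cluster_round = elimination_round(w, groups, n_drop=1)            # removes cluster B
after_gene_round = elimination_round(w, after_cluster_round, n_drop=1)  # removes gene 0
```

    Using the largest absolute weight, rather than a sum over the cluster, means a cluster survives as long as at least one of its genes is strongly discriminative.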

  8. Genes for iron-sulphur cluster assembly are targets of abiotic stress in rice, Oryza sativa.

    PubMed

    Liang, Xuejiao; Qin, Lu; Liu, Peiwei; Wang, Meihuan; Ye, Hong

    2014-03-01

    Iron-sulphur (Fe-S) cluster assembly occurs in chloroplasts, mitochondria and cytosol, involving dozens of genes in higher plants. In this study, we have identified 41 putative Fe-S cluster assembly genes in the rice (Oryza sativa) genome, and the expression of all genes was verified. To investigate the role of Fe-S cluster assembly as a metabolic pathway, we applied abiotic stresses to rice seedlings and analysed Fe-S cluster assembly gene expression by qRT-PCR. Our data showed that genes for Fe-S cluster assembly in chloroplasts of leaves are particularly sensitive to heavy metal treatments, and that Fe-S cluster assembly genes in roots were up-regulated in response to iron toxicity, oxidative stress and some heavy metals. The effect of each stress treatment on the Fe-S cluster assembly machinery demonstrated an unexpected tissue or organelle specificity, suggesting that the physiological relevance of Fe-S cluster assembly is more complex than previously thought. Furthermore, our results may reveal potential candidate genes for molecular breeding of rice.

  9. Nonlinear biosynthetic gene cluster dose effect on penicillin production by Penicillium chrysogenum.

    PubMed

    Nijland, Jeroen G; Ebbendorf, Bjorg; Woszczynska, Marta; Boer, Rémon; Bovenberg, Roel A L; Driessen, Arnold J M

    2010-11-01

    Industrial penicillin production levels by the filamentous fungus Penicillium chrysogenum increased dramatically by classical strain improvement. High-yielding strains contain multiple copies of the penicillin biosynthetic gene cluster that encodes three key enzymes of the β-lactam biosynthetic pathway. We have analyzed the gene cluster dose effect on penicillin production using the high-yielding P. chrysogenum strain DS17690 that was cured of its native clusters. The amount of penicillin V produced increased with the penicillin biosynthetic gene cluster number but was saturated at high copy numbers. Likewise, transcript levels of the biosynthetic genes pcbAB [δ-(l-α-aminoadipyl)-l-cysteinyl-d-valine synthetase], pcbC (isopenicillin N synthase), and penDE (acyltransferase) correlated with the cluster copy number. Remarkably, the protein level of acyltransferase, which localizes to peroxisomes, was saturated already at low cluster copy numbers. At higher copy numbers, intracellular levels of isopenicillin N increased, suggesting that the acyltransferase reaction presents a limiting step at a high gene dose. Since the number and appearance of the peroxisomes did not change significantly with the gene cluster copy number, we conclude that the acyltransferase activity is limiting for penicillin biosynthesis at high biosynthetic gene cluster copy numbers. These results suggest that at a high penicillin production level, productivity is limited by the peroxisomal acyltransferase import activity and/or the availability of coenzyme A (CoA)-activated side chains.

  10. CTDGFinder: A Novel Homology-Based Algorithm for Identifying Closely Spaced Clusters of Tandemly Duplicated Genes.

    PubMed

    Ortiz, Juan F; Rokas, Antonis

    2017-01-01

    Closely spaced clusters of tandemly duplicated genes (CTDGs) contribute to the diversity of many phenotypes, including chemosensation, snake venom, and animal body plans. CTDGs have traditionally been identified subjectively as genomic neighborhoods containing several gene duplicates in close proximity; however, CTDGs are often highly variable with respect to gene number, intergenic distance, and synteny. This lack of formal definition hampers the study of CTDG evolutionary dynamics and the discovery of novel CTDGs in the exponentially growing body of genomic data. To address this gap, we developed a novel homology-based algorithm, CTDGFinder, which formalizes and automates the identification of CTDGs by examining the physical distribution of individual members of families of duplicated genes across chromosomes. Application of CTDGFinder accurately identified CTDGs for many well-known gene clusters (e.g., Hox and beta-globin gene clusters) in the human, mouse and 20 other mammalian genomes. Differences between previously annotated gene clusters and our inferred CTDGs were due to the exclusion of nonhomologs that have historically been considered parts of specific gene clusters, the inclusion or absence of genes between the CTDGs and their corresponding gene clusters, and the splitting of certain gene clusters into distinct CTDGs. Examination of human genes showing tissue-specific enhancement of their expression by CTDGFinder identified members of several well-known gene clusters (e.g., cytochrome P450s and olfactory receptors) and revealed that they were unequally distributed across tissues. By formalizing and automating CTDG identification, CTDGFinder will facilitate understanding of CTDG evolutionary dynamics, their functional implications, and how they are associated with phenotypic diversity. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
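    As a much simpler stand-in for the idea of examining the physical distribution of a gene family's members along a chromosome, the sketch below groups homologs into runs whose neighbours lie within a fixed distance. It is not CTDGFinder's algorithm, which the abstract does not detail; the `max_gap` threshold, the minimum run size, and the gene coordinates in the example are arbitrary illustrative assumptions.

```python
from typing import List, Tuple

Gene = Tuple[str, int]  # (gene identifier, start coordinate in bp on one chromosome)


def group_tandem_duplicates(family_members: List[Gene], max_gap: int,
                            min_size: int = 2) -> List[List[str]]:
    """Split one gene family's members on a chromosome into runs of close neighbours.

    Members are sorted by position; a new run starts whenever the distance to the
    previous member exceeds max_gap. Runs shorter than min_size are discarded.
    """
    runs: List[List[str]] = []
    current: List[str] = []
    prev_pos = None
    for name, pos in sorted(family_members, key=lambda g: g[1]):
        if prev_pos is not None and pos - prev_pos > max_gap:
            if len(current) >= min_size:
                runs.append(current)
            current = []
        current.append(name)
        prev_pos = pos
    if len(current) >= min_size:
        runs.append(current)
    return runs


# Hypothetical family members, listed out of order
members = [("g5", 900_000), ("g1", 1_000), ("g4", 500_000),
           ("g3", 9_000), ("g6", 903_000), ("g2", 5_000)]
print(group_tandem_duplicates(members, max_gap=10_000))
# [['g1', 'g2', 'g3'], ['g5', 'g6']]
```

    A single global gap is exactly the kind of subjective cut-off the abstract criticizes, since intergenic distances vary widely between clusters; the sketch only shows the input (homolog positions) and output (groups of neighbouring duplicates) of the task.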

  11. Ancient origin of elicitin gene clusters in Phytophthora genomes.

    PubMed

    Jiang, Rays H Y; Tyler, Brett M; Whisson, Stephen C; Hardham, Adrienne R; Govers, Francine

    2006-02-01

    The genus Phytophthora belongs to the oomycetes in the eukaryotic stramenopile lineage and comprises over 65 species, all of which are destructive plant pathogens on a wide range of dicotyledons. Phytophthora produces elicitins (ELIs), a group of extracellular elicitor proteins that cause a hypersensitive response in tobacco. Database mining revealed several new classes of elicitin-like (ELL) sequences with diverse elicitin domains in Phytophthora infestans, Phytophthora sojae, Phytophthora brassicae, and Phytophthora ramorum. ELIs and ELLs were shown to be unique to Phytophthora and Pythium species. They are ubiquitous among Phytophthora species and belong to one of the most highly conserved and complex protein families in the Phytophthora genus. Phylogeny construction with elicitin domains derived from 156 ELIs and ELLs showed that most of the diversified family members existed prior to divergence of Phytophthora species from a common ancestor. Analysis to discriminate diversifying and purifying selection showed that all 17 ELI and ELL clades are under purifying selection. Within highly similar ELI groups there was no evidence for positively selected amino acids, suggesting that purifying selection contributes to the continued existence of this diverse protein family. Characteristic cysteine spacing patterns were found for each phylogenetic clade. Except for the canonical clade ELI-1, ELIs and ELLs possess C-terminal domains of variable length, many of which have a high threonine, serine, or proline content, suggesting an association with the cell wall. In addition, some ELIs and ELLs have a predicted glycosylphosphatidylinositol site, suggesting anchoring of the C-terminal domain to the cell membrane. The eli and ell genes belonging to different clades are clustered in the genomes. Overall, eli and ell genes are expressed at different levels and in different life cycle stages, but those sharing the same phylogenetic clade appear to have similar expression patterns.

  12. Arrangement of the Clostridium baratii F7 Toxin Gene Cluster with Identification of a σ Factor That Recognizes the Botulinum Toxin Gene Cluster Promoters

    DOE PAGES

    Dover, Nir; Barash, Jason R.; Burke, Julianne N.; ...

    2014-05-22

    Botulinum neurotoxin (BoNT) is the most poisonous substance known, and its eight toxin types (A to H) are distinguished by the inability of polyclonal antibodies that neutralize one toxin type to neutralize any of the other seven toxin types. Infant botulism, an intestinal toxemia orphan disease, is the most common form of human botulism in the United States. It results from swallowed spores of Clostridium botulinum (or, rarely, neurotoxigenic Clostridium butyricum or Clostridium baratii) that germinate and temporarily colonize the lumen of the large intestine, where, as vegetative cells, they produce botulinum toxin. Botulinum neurotoxin is encoded by the bont gene, which is part of a toxin gene cluster that includes several accessory genes. In this paper, we sequenced for the first time the complete botulinum neurotoxin gene cluster of nonproteolytic C. baratii type F7. Like the type E and the nonproteolytic type F6 botulinum toxin gene clusters, the C. baratii type F7 had an orfX toxin gene cluster that lacked the regulatory botR gene, which is found in proteolytic C. botulinum strains and codes for an alternative σ factor. In the absence of botR, we identified a putative alternative regulatory gene located upstream of the C. baratii type F7 toxin gene cluster. This putative regulatory gene codes for a predicted σ factor that contains DNA-binding domains homologous to those of BotR and of other members of the TcdR-related group 5 of the σ70 family that are involved in the regulation of toxin gene expression in clostridia. We showed that this TcdR-related protein, in association with RNA polymerase core enzyme, specifically binds to the C. baratii type F7 botulinum toxin gene cluster promoters. This TcdR-related protein may therefore be involved in regulating the expression of the genes of the botulinum toxin gene cluster in neurotoxigenic C. baratii.

  13. Cloning and characterization of the goadsporin biosynthetic gene cluster from Streptomyces sp. TP-A0584.

    PubMed

    Onaka, Hiroyasu; Nakaho, Mizuho; Hayashi, Keiko; Igarashi, Yasuhiro; Furumai, Tamotsu

    2005-12-01

    The biosynthetic gene cluster of goadsporin, a polypeptide antibiotic containing thiazole and oxazole rings, was cloned from Streptomyces sp. TP-A0584. The cluster contains a structural gene, godA, and nine god (goadsporin) genes involved in post-translational modification, immunity and transcriptional regulation. Although the gene organization is similar to typical bacteriocin biosynthetic gene clusters, each goadsporin biosynthetic gene shows low homology to these genes. Goadsporin biosynthesis is initiated by the translation of godA, and the subsequent cyclization, dehydration and acetylation are probably catalysed by godD, godE, godF, godG and godH gene products. godI shows high similarity to the 54 kDa subunit of the signal recognition particle and plays an important role in goadsporin immunity. Furthermore, four goadsporin analogues were produced by site-directed mutagenesis of godA, suggesting that this biosynthesis machinery is used for the heterocyclization of peptides.

  14. Using Multi-Instance Hierarchical Clustering Learning System to Predict Yeast Gene Function

    PubMed Central

    Liao, Bo; Li, Yun; Jiang, Yan; Cai, Lijun

    2014-01-01

    Time-course gene expression datasets, which record continuous biological processes of genes, have recently been used to predict gene function. However, only a few positive genes can be obtained from annotation databases, such as gene ontology (GO). To obtain more useful information and effectively predict gene function, gene annotations are clustered together to form a learnable and effective learning system. In this paper, we propose a novel multi-instance hierarchical clustering (MIHC) method to establish a learning system by clustering GO and compare this method with other learning system establishment methods. Multi-label support vector machine and multi-label K-nearest neighbor classifiers are used to evaluate these methods on four yeast time-course gene expression datasets. The MIHC method shows good performance and can serve as a guide to annotators or refine annotations in detail. PMID:24621610

  15. The urease gene cluster of Vibrio parahaemolyticus does not influence the expression of the thermostable direct hemolysin (TDH) gene or the TDH-related hemolysin gene.

    PubMed

    Nakaguchi, Yoshitsugu; Okuda, Jun; Iida, Tetsuya; Nishibuchi, Mitsuaki

    2003-01-01

    In order to investigate why the thermostable direct hemolysin (TDH) and the TDH-related hemolysin (TRH) of Vibrio parahaemolyticus are produced at low levels from urease-positive strains, the effect of the functional urease gene cluster of V. parahaemolyticus on the expression of the tdh and trh genes was examined. Transcriptional lacZ fusions with the tdh1, tdh2, trh1 and trh2 genes representing variants of the tdh and trh genes were integrated into the chromosome of an Escherichia coli strain and a urease-negative V. parahaemolyticus strain. The plasmid-borne urease gene cluster introduced and expressed in these constructs did not affect expression of any of the fusion genes. Nor did the amount of TDH produced by a Kanagawa phenomenon-positive V. parahaemolyticus strain change upon introduction of the urease gene cluster. It was therefore concluded that the urease gene cluster is not involved in the regulation of tdh and trh expression.

  16. Precise cloning and tandem integration of large polyketide biosynthetic gene cluster using Streptomyces artificial chromosome system.

    PubMed

    Nah, Hee-Ju; Woo, Min-Woo; Choi, Si-Sun; Kim, Eung-Soo

    2015-09-16

    Direct cloning combined with heterologous expression of a secondary metabolite biosynthetic gene cluster has become a useful strategy for production improvement and pathway modification of potentially valuable natural products present at minute quantities in original isolates of actinomycetes. However, precise cloning and efficient overexpression of an entire biosynthetic gene cluster remains challenging due to the ineffectiveness of current genetic systems in manipulating large-sized gene clusters for heterologous as well as homologous expression. A versatile Escherichia coli-Streptomyces shuttle bacterial artificial chromosomal (BAC) conjugation vector, pSBAC, was used along with a cluster tandem integration approach to carry out homologous and heterologous overexpression of a large 80-kb polyketide biosynthetic pathway gene cluster of tautomycetin (TMC), which is a protein phosphatase PP1/PP2A inhibitor and T cell-specific immunosuppressant. Unique XbaI restriction sites were precisely inserted at both border regions of the TMC biosynthetic gene cluster within the chromosome of TMC-producing Streptomyces sp. CK4412, followed by site-specific recombination of pSBAC into the flanking region of the TMC gene cluster. The entire TMC gene cluster was then rescued as a single giant recombinant pSBAC by XbaI digestion of the chromosomal DNA as well as subsequent self-ligation. Next, the recombinant pSBAC construct containing the entire TMC cluster in E. coli was directly conjugated into model Streptomyces strains, resulting in rapid and enhanced TMC production. Moreover, introduction of the TMC cluster-containing pSBAC into wild-type Streptomyces sp. CK4412 as well as a recombinant S. coelicolor strain resulted in a chromosomal tandem repeat of the entire TMC cluster with 14-fold and 5.4-fold enhanced TMC productivities, respectively. The 80-kb TMC biosynthetic gene cluster was isolated in a single integration vector, pSBAC. Introduction of TMC biosynthetic gene cluster

  17. A recently transferred cluster of bacterial genes in Trichomonas vaginalis - lateral gene transfer and the fate of acquired genes

    PubMed Central

    2014-01-01

    Background Lateral Gene Transfer (LGT) has recently gained recognition as an important contributor to some eukaryote proteomes, but the mechanisms of acquisition and fixation in eukaryotic genomes are still uncertain. A previously defined norm for LGTs in microbial eukaryotes states that the majority are genes involved in metabolism, the LGTs are typically localized one by one, surrounded by vertically inherited genes on the chromosome, and phylogenetics shows that a broad collection of bacterial lineages have contributed to the transferome. Results A unique 34-kbp fragment with 27 clustered genes (TvLF) of prokaryotic origin was identified in the sequenced genome of the protozoan parasite Trichomonas vaginalis. Using a PCR-based approach, we confirmed the presence of the orthologous fragment in four additional T. vaginalis strains. Detailed sequence analyses unambiguously suggest that TvLF is the result of one single, recent LGT event. The proposed donor is a close relative of the firmicute bacterium Peptoniphilus harei. High nucleotide sequence similarity between T. vaginalis strains, as well as to P. harei, and the absence of homologs in other Trichomonas species suggest that the transfer event took place after the radiation of the genus Trichomonas. Some genes have undergone pseudogenization and degradation, indicating that they may not be retained in the future. Functional annotations reveal that genes involved in informational processes are particularly prone to degradation. Conclusions We conclude that, although the majority of eukaryote LGTs are single gene occurrences, they may be acquired in clusters of several genes that are subsequently cleansed of evolutionarily less advantageous genes. PMID:24898731

  18. High-throughput platform for the discovery of elicitors of silent bacterial gene clusters

    PubMed Central

    Seyedsayamdost, Mohammad R.

    2014-01-01

    Over the past decade, bacterial genome sequences have revealed an immense reservoir of biosynthetic gene clusters, sets of contiguous genes that have the potential to produce drugs or drug-like molecules. However, the majority of these gene clusters appear to be inactive for unknown reasons, prompting terms such as “cryptic” or “silent” to describe them. Because natural products have been a major source of therapeutic molecules, methods that rationally activate these silent clusters would have a profound impact on drug discovery. Herein, a new strategy is outlined for awakening silent gene clusters using small molecule elicitors. In this method, a genetic reporter construct affords a facile read-out for activation of the silent cluster of interest, while high-throughput screening of small molecule libraries provides potential inducers. This approach was applied to two cryptic gene clusters in the pathogenic model Burkholderia thailandensis. The results not only demonstrate strong activation of these two clusters but also reveal that the majority of elicitors are themselves antibiotics, most in common clinical use. Antibiotics, which kill B. thailandensis at high concentrations, act as inducers of secondary metabolism at low concentrations. One of these antibiotics, trimethoprim, served as a global activator of secondary metabolism by inducing at least five biosynthetic pathways. Further application of this strategy promises to uncover the regulatory networks that activate silent gene clusters while at the same time providing access to the vast array of cryptic molecules found in bacteria. PMID:24808135

  19. Sparse p-norm Nonnegative Matrix Factorization for clustering gene expression data.

    PubMed

    Liu, Weixiang; Yuan, Kehong

    2008-01-01

    Nonnegative Matrix Factorization (NMF) is a powerful tool for gene expression data analysis because it reduces thousands of genes to a few compact metagenes, especially when clustering gene expression samples for cancer class discovery. Enhancing the sparseness of the factorisation can identify a few dominant coexpressed metagenes and improve clustering effectiveness. Sparse p-norm (p > 1) Nonnegative Matrix Factorization (Sp-NMF) is a sparser representation method that uses a higher-order norm to normalise the decomposed components. In this paper, we investigate the benefit of higher-order normalisation for clustering cancer-related gene expression samples. Experimental results demonstrate that Sp-NMF leads to robust and effective clustering, both in automatically determining the number of clusters and in achieving high accuracy.

  20. Engineered Streptomyces avermitilis host for heterologous expression of biosynthetic gene cluster for secondary metabolites

    PubMed Central

    Komatsu, Mamoru; Komatsu, Kyoko; Koiwai, Hanae; Yamada, Yuuki; Kozone, Ikuko; Izumikawa, Miho; Hashimoto, Junko; Takagi, Motoki; Omura, Satoshi; Shin-ya, Kazuo; Cane, David E.; Ikeda, Haruo

    2014-01-01

    The industrial microorganism Streptomyces avermitilis, a producer of the anthelmintic macrocyclic lactones known as avermectins, has been constructed as a versatile model host for heterologous expression of genes encoding secondary metabolite biosynthesis. Twenty complete biosynthetic gene clusters for secondary metabolites were successively cloned and introduced into the versatile model host S. avermitilis SUKA17 or 22. Almost all S. avermitilis transformants carrying an entire gene cluster produced metabolites as a result of expression of the introduced biosynthetic gene clusters. A few transformants were unable to produce metabolites. Their production was restored in one of two ways: by expressing the biosynthetic genes from an alternative promoter, or by expressing from an alternative promoter a regulatory gene in the cluster that controls the expression of the cluster's biosynthetic genes. Production of metabolites in some transformants of the versatile host was higher than that of the original producers, and cryptic biosynthetic gene clusters in the original producers were also expressed in the versatile host. PMID:23654282

  1. The cylindrospermopsin gene cluster of Aphanizomenon sp. strain 10E6: organization and recombination.

    PubMed

    Stüken, Anke; Jakobsen, Kjetill S

    2010-08-01

    Cylindrospermopsin (CYN), a potent hepatotoxin, occurs in freshwaters worldwide. Several cyanobacterial species produce the toxin, but the producing species vary between geographical regions. Aphanizomenon flos-aquae, a common algal species in temperate fresh and brackish waters, is one of the three well-documented CYN producers in European waters. So far, no genetic information on the CYN genes of this species has been available. Here, we describe the complete CYN gene cluster, including flanking regions, from the German Aphanizomenon sp. strain 10E6, obtained by full-genome sequencing with 454 pyrosequencing and bioinformatic identification of the gene cluster. In addition, we have sequenced an approximately 7-kb fragment covering the genes cyrC (partially), cyrA and cyrB (partially) of the same gene cluster in the CYN-producing Aphanizomenon sp. strains 10E9 and 22D11. Comparisons with the orthologous gene clusters of the Australian Cylindrospermopsis raciborskii strains AWT205 and CS505 and the partial gene cluster of the Israeli Aphanizomenon ovalisporum strain ILC-146 revealed high gene sequence similarity, but also extensive rearrangements of gene order. The high sequence similarity (generally higher than that of 16S rRNA gene fragments from the same strains), atypical GC content and signs of transposase activity support the suggestion that the CYN genes have been horizontally transferred.

  2. Improvement of gougerotin and nikkomycin production by engineering their biosynthetic gene clusters.

    PubMed

    Du, Deyao; Zhu, Yu; Wei, Junhong; Tian, Yuqing; Niu, Guoqing; Tan, Huarong

    2013-07-01

    Nikkomycins and gougerotin are peptidyl nucleoside antibiotics with broad biological activities. The nikkomycin biosynthetic gene cluster comprises one pathway-specific regulatory gene (sanG) and 21 structural genes, whereas the gene cluster for gougerotin biosynthesis includes one putative regulatory gene, one major facilitator superfamily transporter gene, and 13 structural genes. In the present study, we introduced sanG driven by six different promoters into Streptomyces ansochromogenes TH322. Nikkomycin production increased significantly, with the greatest increase in the engineered strain harboring hrdB promoter-driven sanG. In addition, we replaced the native promoters of key structural genes in the gougerotin (gou) gene cluster with the hrdB promoter. The heterologous producer Streptomyces coelicolor M1146 harboring the modified gene cluster produced up to 10-fold more gougerotin than strains carrying the unmodified cluster. Therefore, genetic manipulation of genes involved in antibiotic biosynthesis with the constitutive hrdB promoter provides a robust, easy-to-use system generally useful for improving antibiotic production in Streptomyces.

  3. Discovery and synthetic refactoring of tryptophan dimer gene clusters from the environment.

    PubMed

    Chang, Fang-Yuan; Ternei, Melinda A; Calle, Paula Y; Brady, Sean F

    2013-11-27

    Here we investigate bacterial tryptophan dimer (TD) biosynthesis by probing environmental DNA (eDNA) libraries for chromopyrrolic acid (CPA) synthase genes. Functional and bioinformatics analyses of TD clusters indicate that CPA synthase gene sequences diverge in concert with the functional output of their respective clusters, making this gene a powerful tool for guiding the discovery of novel TDs from the environment. Twelve unprecedented TD biosynthetic gene clusters were recovered from eDNA libraries. They can be arranged into five groups (A-E) based on their ability to generate distinct TD core substructures. Four of these groups contain clusters from both culture-based and culture-independent studies, while the remaining group consists entirely of eDNA-derived clusters. The complete synthetic refactoring of a representative gene cluster from this eDNA-specific group led to the characterization of the erdasporines, cytotoxins with a novel carboxy-indolocarbazole TD substructure. Analysis of CPA synthase genes in crude eDNA suggests the presence of additional TD gene clusters in soil environments.

  4. Sphingolipids regulate telomere clustering by affecting the transcription of genes involved in telomere homeostasis.

    PubMed

    Ikeda, Atsuko; Muneoka, Tetsuya; Murakami, Suguru; Hirota, Ayaka; Yabuki, Yukari; Karashima, Takefumi; Nakazono, Kota; Tsuruno, Masahiro; Pichler, Harald; Shirahige, Katsuhiko; Kodama, Yukiko; Shimamoto, Toshi; Mizuta, Keiko; Funato, Kouichi

    2015-07-15

    In eukaryotic organisms, including mammals, nematodes and yeasts, the ends of chromosomes, the telomeres, are clustered at the nuclear periphery. Telomere clustering is assumed to be functionally important because proper organization of chromosomes is necessary for proper genome function and stability. However, the mechanisms and physiological roles of telomere clustering remain poorly understood. In this study, we demonstrate a role for sphingolipids in telomere clustering in the budding yeast Saccharomyces cerevisiae. Because abnormal sphingolipid metabolism causes downregulation of expression levels of genes involved in telomere organization, sphingolipids appear to control telomere clustering at the transcriptional level. In addition, the data presented here provide evidence that telomere clustering is required to protect chromosome ends from DNA-damage checkpoint signaling. As sphingolipids are found in all eukaryotes, we speculate that sphingolipid-based regulation of telomere clustering and the protective role of telomere clusters in maintaining genome stability might be conserved in eukaryotes.

  5. Transcriptome Analysis of Aspergillus flavus Reveals veA-Dependent Regulation of Secondary Metabolite Gene Clusters, Including the Novel Aflavarin Cluster

    PubMed Central

    Cary, J. W.; Han, Z.; Yin, Y.; Lohmar, J. M.; Shantappa, S.; Harris-Coward, P. Y.; Mack, B.; Ehrlich, K. C.; Wei, Q.; Arroyo-Manzanares, N.; Uka, V.; Vanhaecke, L.; Bhatnagar, D.; Yu, J.; Nierman, W. C.; Johns, M. A.; Sorensen, D.; Shen, H.; De Saeger, S.; Diana Di Mavungu, J.

    2015-01-01

    The global regulatory veA gene governs development and secondary metabolism in numerous fungal species, including Aspergillus flavus. This is especially relevant since A. flavus infects crops of agricultural importance worldwide, contaminating them with potent mycotoxins. The most well-known are aflatoxins, which are cytotoxic and carcinogenic polyketide compounds. The production of aflatoxins and the expression of genes implicated in the production of these mycotoxins are veA dependent. The genes responsible for the synthesis of aflatoxins are clustered, a signature common for genes involved in fungal secondary metabolism. Studies of the A. flavus genome revealed many gene clusters possibly connected to the synthesis of secondary metabolites. Many of these metabolites are still unknown, or the association between a known metabolite and a particular gene cluster has not yet been established. In the present transcriptome study, we show that veA is necessary for the expression of a large number of genes. Twenty-eight out of the predicted 56 secondary metabolite gene clusters include at least one gene that is differentially expressed depending on presence or absence of veA. One of the clusters under the influence of veA is cluster 39. The absence of veA results in a downregulation of the five genes found within this cluster. Interestingly, our results indicate that the cluster is expressed mainly in sclerotia. Chemical analysis of sclerotial extracts revealed that cluster 39 is responsible for the production of aflavarin. PMID:26209694

  6. Transcriptome Analysis of Aspergillus flavus Reveals veA-Dependent Regulation of Secondary Metabolite Gene Clusters, Including the Novel Aflavarin Cluster.

    PubMed

    Cary, J W; Han, Z; Yin, Y; Lohmar, J M; Shantappa, S; Harris-Coward, P Y; Mack, B; Ehrlich, K C; Wei, Q; Arroyo-Manzanares, N; Uka, V; Vanhaecke, L; Bhatnagar, D; Yu, J; Nierman, W C; Johns, M A; Sorensen, D; Shen, H; De Saeger, S; Diana Di Mavungu, J; Calvo, A M

    2015-10-01

    The global regulatory veA gene governs development and secondary metabolism in numerous fungal species, including Aspergillus flavus. This is especially relevant since A. flavus infects crops of agricultural importance worldwide, contaminating them with potent mycotoxins. The most well-known are aflatoxins, which are cytotoxic and carcinogenic polyketide compounds. The production of aflatoxins and the expression of genes implicated in the production of these mycotoxins are veA dependent. The genes responsible for the synthesis of aflatoxins are clustered, a signature common for genes involved in fungal secondary metabolism. Studies of the A. flavus genome revealed many gene clusters possibly connected to the synthesis of secondary metabolites. Many of these metabolites are still unknown, or the association between a known metabolite and a particular gene cluster has not yet been established. In the present transcriptome study, we show that veA is necessary for the expression of a large number of genes. Twenty-eight out of the predicted 56 secondary metabolite gene clusters include at least one gene that is differentially expressed depending on presence or absence of veA. One of the clusters under the influence of veA is cluster 39. The absence of veA results in a downregulation of the five genes found within this cluster. Interestingly, our results indicate that the cluster is expressed mainly in sclerotia. Chemical analysis of sclerotial extracts revealed that cluster 39 is responsible for the production of aflavarin.

  7. The B-type lamin is required for somatic repression of testis-specific gene clusters

    PubMed Central

    Shevelyov, Y. Y.; Lavrov, S. A.; Mikhaylova, L. M.; Nurminsky, I. D.; Kulathinal, R. J.; Egorova, K. S.; Rozovsky, Y. M.; Nurminsky, D. I.

    2009-01-01

    Large clusters of coexpressed tissue-specific genes are abundant on chromosomes of diverse species. The genes coordinately misexpressed in diverse diseases are also found in similar clusters, suggesting that evolutionarily conserved mechanisms regulate expression of large multigenic regions both in normal development and in its pathological disruptions. Studies on individual loci suggest that silent clusters of coregulated genes are embedded in repressed chromatin domains, often localized to the nuclear periphery. To test this model at the genome-wide scale, we studied transcriptional regulation of large testis-specific gene clusters in somatic tissues of Drosophila. These gene clusters showed a drastic paucity of known expressed transgene insertions, indicating that they indeed are embedded in repressed chromatin. Bioinformatics analysis suggested a major role for the B-type lamin, LamDmo, in repression of large testis-specific gene clusters, showing that in somatic cells as many as three-quarters of these clusters interact with LamDmo. Ablation of LamDmo by using mutants and RNAi led to detachment of testis-specific clusters from the nuclear envelope and to their selective transcriptional up-regulation in somatic cells, thus providing the first direct evidence for involvement of the B-type lamin in tissue-specific gene repression. Finally, we found that transcriptional activation of the lamina-bound testis-specific gene cluster in the male germ line is coupled with its translocation away from the nuclear envelope. Our studies, which directly link nuclear architecture with coordinated regulation of tissue-specific genes, advance understanding of the mechanisms underlying both normal cell differentiation and developmental disorders caused by lesions in the B-type lamins and interacting proteins. PMID:19218438

  8. Birth of Four Chimeric Plastid Gene Clusters in Japanese Umbrella Pine

    PubMed Central

    Hsu, Chih-Yao; Wu, Chung-Shien; Chaw, Shu-Miaw

    2016-01-01

    Many genes in the plastid genomes (plastomes) of plants are organized as gene clusters, in which genes are co-transcribed, resembling bacterial operons. These plastid operons are highly conserved, even among conifers, whose plastomes are highly rearranged relative to other seed plants. We have determined the complete plastome sequence of Sciadopitys verticillata (Japanese umbrella pine), the sole member of Sciadopityaceae. The Sciadopitys plastome is characterized by extensive inversions, pseudogenization of four tRNA genes after tandem duplications, and a unique pair of 370-bp inverted repeats involved in the formation of isomeric plastomes. We showed that plastomic inversions in Sciadopitys have led to shuffling of the remote conserved operons, resulting in the birth of four chimeric gene clusters. Our data also demonstrated that the relocated genes can be co-transcribed in these chimeric gene clusters. The plastome of Sciadopitys advances our current understanding of how the conifer plastomes have evolved toward increased diversity and complexity. PMID:27269365

  9. Improved efficiency in amplification of Escherichia coli o-antigen gene clusters using genome-wide sequence comparison

    USDA-ARS's Scientific Manuscript database

    Background: In many bacteria including E. coli, genes encoding O-antigens are clustered in the chromosome, with a 39-bp JUMPstart sequence and gnd gene located upstream and downstream of the cluster, respectively. For determining the DNA sequence of the E. coli O-antigen gene cluster, one set of P...

  10. Isolation and Characterization of the Gibberellin Biosynthetic Gene Cluster in Sphaceloma manihoticola

    PubMed Central

    Bömke, Christiane; Rojas, Maria Cecilia; Gong, Fan; Hedden, Peter; Tudzynski, Bettina

    2008-01-01

    Gibberellins (GAs) are tetracyclic diterpenoid phytohormones that were first identified as secondary metabolites of the fungus Fusarium fujikuroi (teleomorph, Gibberella fujikuroi). GAs were also found in the cassava pathogen Sphaceloma manihoticola, but the spectrum of GAs differed from that in F. fujikuroi. In contrast to F. fujikuroi, the GA biosynthetic pathway has not been studied in detail in S. manihoticola, and none of the GA biosynthetic genes have been cloned from the species. Here, we present the identification of the GA biosynthetic gene cluster from S. manihoticola consisting of five genes encoding a bifunctional ent-copalyl/ent-kaurene synthase (CPS/KS), a pathway-specific geranylgeranyl diphosphate synthase (GGS2), and three cytochrome P450 monooxygenases. The functions of all of the genes were analyzed either by a gene replacement approach or by complementing the corresponding F. fujikuroi mutants. The cluster organization and gene functions are similar to those in F. fujikuroi. However, the two border genes in the Fusarium cluster encoding the GA4 desaturase (DES) and the 13-hydroxylase (P450-3) are absent in the S. manihoticola GA gene cluster, consistent with the spectrum of GAs produced by this fungus. The close similarity between the two GA gene clusters, the identical gene functions, and the conserved intron positions suggest a common evolutionary origin despite the distant relatedness of the two fungi. PMID:18567680

  11. Comparative and Genetic Analyses of the Putative Vibrio cholerae Lipopolysaccharide Core Oligosaccharide Biosynthesis (wav) Gene Cluster

    PubMed Central

    Nesper, Jutta; Kraiß, Anita; Schild, Stefan; Blaß, Julia; Klose, Karl E.; Bockemühl, Jochen; Reidl, Joachim

    2002-01-01

    We identified five different putative wav gene cluster types, which are responsible for the synthesis of the core oligosaccharide (OS) region of Vibrio cholerae lipopolysaccharide. Preliminary evidence that the genes encoded by this cluster are involved in core OS biosynthesis came from analysis of the recently released O1 El Tor V. cholerae genome sequence and sodium dodecyl sulfate-polyacrylamide gel electrophoresis analysis of O1 El Tor mutant strains defective in three genes (waaF, waaL, and wavB). Investigations of 38 different V. cholerae strains by Southern blotting, PCR, and sequencing analyses showed that the O1 El Tor wav gene cluster type is prevalent among clinical isolates of different serogroups associated with cholera and environmental O1 strains. In contrast, we found differences in the wav gene contents of 19 unrelated non-O1, non-O139 environmental and human isolates not associated with cholera. These strains contained four new wav gene cluster types that differ from each other in distinct gene loci, providing evidence for horizontal transfer of wav genes and for limited structural diversity of the core OS among V. cholerae isolates. Our results show genetic diversity in the core OS biosynthesis gene cluster and predominance of the type 1 wav gene locus in strains associated with clinical cholera, suggesting that a specific core OS structure could contribute to V. cholerae virulence. PMID:11953379

  12. Measuring Cluster Stability in a Large Scale Phylogenetic Analysis of Functional Genes in Metagenomes Using pplacer.

    PubMed

    Land, Tyler A; Fizzano, Perry; Kodner, Robin B

    2016-01-01

    Analysis of metagenomic sequence data requires a multi-stage workflow. The results of each intermediate step possess an inherent uncertainty and potentially affect the as-yet-unmeasured statistical significance of downstream analyses. Here, we describe our phylogenetic analysis pipeline, which uses the pplacer program to place many shotgun sequences corresponding to a single functional gene onto a fixed phylogenetic tree. We then use the squash clustering method to compare multiple samples with respect to that gene. We approximate the statistical significance of each gene's clustering result by measuring its cluster stability. Cluster stability is the consistency of that clustering result when the probabilistic placements made by pplacer are systematically reassigned and then clustered again, as measured by the adjusted Rand Index. We find that among the genes investigated, the majority of analyses are stable, based on the average adjusted Rand Index. We investigated properties of each gene that may explain less stable results. These genes tended to have less convex reference trees, fewer total reads recruited to the gene, and a greater Expected Distance between Placement Locations as given by pplacer when examined in aggregate. However, for an individual functional gene, these measures alone do not predict cluster stability.
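    The stability measure described above compares two clusterings of the same samples with the adjusted Rand Index. As a hedged illustration, the index can be computed from two label lists with the standard Hubert and Arabie pair-counting formula. This is a minimal stand-alone sketch, not the authors' pipeline, and it does not involve pplacer or squash clustering. The function name and the example labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two flat clusterings of the same items.

    Hubert and Arabie pair-counting formula: 1.0 means identical partitions,
    values near 0.0 mean chance-level agreement, and values can be negative
    when agreement is worse than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no item pairs on which the clusterings could disagree
    # Contingency table: counts of items in cluster a of A and cluster b of B.
    contingency = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case: both clusterings are all-in-one or all-singletons.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical example: an original clustering and a re-clustering.
original = [0, 0, 1, 2]
reclustered = [0, 0, 1, 1]
print(round(adjusted_rand_index(original, reclustered), 4))  # prints 0.5714
```

    In a stability analysis of the kind described, one would compute this index between the original clustering and each re-clustering and then average the results. The labels themselves are arbitrary: only the grouping matters, so [0, 0, 1, 1] and [1, 1, 0, 0] score 1.0.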

  13. A rough set based rational clustering framework for determining correlated genes.

    PubMed

    Jeyaswamidoss, Jeba Emilyn; Thangaraj, Kesavan; Ramar, Kadarkarai; Chitra, Muthusamy

    2016-06-01

    Cluster analysis plays a foremost role in identifying groups of genes that show similar behavior under a set of experimental conditions. Several clustering algorithms have been proposed to identify gene behaviors and understand their significance. The principal aim of this work is to develop an intelligent rough clustering technique that efficiently removes irrelevant dimensions in a high-dimensional space and obtains meaningful clusters. This paper proposes a novel biclustering technique based on rough set theory. The proposed algorithm uses two measures. The correlation coefficient serves as a similarity measure to cluster the rows and columns of a gene expression data matrix simultaneously, and the mean squared residue is used to generate the initial biclusters. The biclusters are then refined into lower and upper boundaries by determining the membership of genes in the clusters using the mean squared residue. The algorithm is illustrated with yeast gene expression data, and the experiment demonstrates the effectiveness of the method. Its main advantage is that it overcomes the problem of selecting initial clusters. It also lifts the restriction that each object belong to only one cluster by allowing biclusters to overlap.
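    The abstract uses the mean squared residue to seed and refine biclusters. A minimal sketch of that score follows, assuming the usual Cheng and Church definition:

    H(I, J) = (1 / (|I||J|)) * Σ over i in I, j in J of (a_ij − a_iJ − a_Ij + a_IJ)^2

    Here a_iJ is a row mean, a_Ij is a column mean and a_IJ is the bicluster mean. The rough-set boundary refinement is not shown. The function name and the example matrix are hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng-Church mean squared residue H(I, J) of a bicluster.

    matrix: list of lists (genes x conditions); rows, cols: index lists I, J.
    H is 0 for a perfectly additive (coherent) bicluster and grows with noise.
    """
    if not rows or not cols:
        raise ValueError("a bicluster needs at least one row and one column")
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]                   # a_iJ
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows  # a_Ij
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows                            # a_IJ
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


# Hypothetical expression matrix: columns 0-2 follow an additive pattern
# (row offset + column offset); column 3 does not.
expression = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [5.0, 6.0, 7.0, 4.0],
]
print(mean_squared_residue(expression, [0, 1, 2], [0, 1, 2]))     # ~0
print(mean_squared_residue(expression, [0, 1, 2], [0, 1, 2, 3]))  # larger
```

    A low H means the genes shift up and down together across the chosen conditions, so a residue threshold is a natural way to decide whether a candidate bicluster is coherent enough to keep.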

  14. Modeling and visualizing uncertainty in gene expression clusters using dirichlet process mixtures.

    PubMed

    Rasmussen, Carl Edward; de la Cruz, Bernard J; Ghahramani, Zoubin; Wild, David L

    2009-01-01

    Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data, little attention has been paid to uncertainty in the results obtained. Dirichlet process mixture (DPM) models provide a nonparametric Bayesian alternative to the bootstrap approach to modeling uncertainty in gene expression clustering. Most previously published applications of Bayesian model-based clustering methods have been to short time series data. In this paper, we present a case study of the application of nonparametric Bayesian clustering methods to the clustering of high-dimensional non-time-series gene expression data using full Gaussian covariances. We use the probability that two genes belong to the same cluster in a DPM model as a measure of the similarity of these gene expression profiles. Conversely, this probability can be used to define a dissimilarity measure, which, for the purposes of visualization, can be input to one of the standard linkage algorithms used for hierarchical clustering. Biologically plausible results are obtained from the Rosetta compendium of expression profiles, which extend previously published cluster analyses of these data.
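
    The similarity described above can be estimated from posterior samples of cluster assignments. A minimal sketch, assuming a DPM sampler (not shown) has already produced a list of assignment vectors, one per draw, is given below; `coclustering_probability` and `coclustering_dissimilarity` are hypothetical helper names.

```python
def coclustering_probability(posterior_assignments):
    """Estimate P(gene i and gene j share a cluster) from posterior draws.

    posterior_assignments: list of draws, each a list with one cluster label per
    gene. Labels only need to be consistent within a draw, so label switching
    between draws does not affect the estimate.
    """
    n_draws = len(posterior_assignments)
    n_genes = len(posterior_assignments[0])
    counts = [[0] * n_genes for _ in range(n_genes)]
    for labels in posterior_assignments:
        for i in range(n_genes):
            for j in range(n_genes):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_draws for c in row] for row in counts]


def coclustering_dissimilarity(prob):
    """1 - P(same cluster): small for genes the model usually groups together."""
    return [[1.0 - p for p in row] for row in prob]
```

    The resulting dissimilarity matrix is symmetric with a zero diagonal, so it could, for instance, be condensed with SciPy's `scipy.spatial.distance.squareform` and passed to `scipy.cluster.hierarchy.linkage` for a dendrogram.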

  15. Characterization of a plasmid-encoded urease gene cluster found in members of the family Enterobacteriaceae.

    PubMed

    D'Orazio, S E; Collins, C M

    1993-03-01

    Plasmid-encoded urease gene clusters found in uropathogenic isolates of Escherichia coli, Providencia stuartii, and Salmonella cubana demonstrated DNA homology, similar positions of restriction endonuclease cleavage sites, and manners of urease expression and therefore represent the same locus. DNA sequence analysis indicated that the plasmid-encoded urease genes are closely related to the Proteus mirabilis urease genes.

  16. Leveraging long sequencing reads to investigate R-gene clustering and variation in sugar beet

    USDA-ARS's Scientific Manuscript database

    Host-pathogen interactions are of prime importance to modern agriculture. Plants utilize various types of resistance genes to mitigate pathogen damage. Identification of the specific gene responsible for a specific resistance can be difficult due to duplication and clustering within R-gene families....

  17. Prediction of operon-like gene clusters in the Arabidopsis thaliana genome based on co-expression analysis of neighboring genes.

    PubMed

    Wada, Masayoshi; Takahashi, Hiroki; Altaf-Ul-Amin, Md; Nakamura, Kensuke; Hirai, Masami Y; Ohta, Daisaku; Kanaya, Shigehiko

    2012-07-15

    Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or by their neighboring positions in the immediate vicinity of one another on a chromosome. A comprehensive analysis of the expression of neighboring genes is therefore a crucial step toward revealing the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and also provide novel insight into evolutionary aspects of acquiring certain biological functions. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficients of neighboring genes and of randomly selected gene pairs, using a statistical method that takes the false discovery rate (FDR) into consideration, across 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined by BLAST with a cutoff of E < 10^-5) are included in 27 clusters. Five clusters are associated with metabolism, containing P450 genes restricted to the Brassica family and predicted to be involved in secondary metabolism. 
Operon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary
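
    The prediction step compares neighbor correlations against randomly paired genes under an FDR control. The sketch below is one plausible reading, not the authors' exact statistic: it assumes an empirical null from random gene pairs, empirical p-values, the Benjamini-Hochberg procedure, and merging of consecutive significant neighbor pairs into clusters. All function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def neighbor_pvalues(profiles, n_random=1000, seed=0):
    """Empirical p-value for each adjacent gene pair (k, k+1).

    profiles: expression vectors of genes listed in chromosomal order. The null
    distribution is the correlation of randomly chosen gene pairs.
    """
    rng = random.Random(seed)
    null = []
    for _ in range(n_random):
        i, j = rng.sample(range(len(profiles)), 2)
        null.append(pearson(profiles[i], profiles[j]))
    pvalues = []
    for k in range(len(profiles) - 1):
        r = pearson(profiles[k], profiles[k + 1])
        exceed = sum(1 for v in null if v >= r)
        pvalues.append((exceed + 1) / (n_random + 1))  # +1 avoids p = 0
    return pvalues


def benjamini_hochberg(pvalues, fdr):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    n_rejected = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            n_rejected = rank
    return sorted(order[:n_rejected])


def merge_adjacent_pairs(pair_indices):
    """Merge significant adjacent pairs (k, k+1) into runs of consecutive genes."""
    clusters, current = [], []
    for k in sorted(pair_indices):
        if current and k == current[-1]:
            current.append(k + 1)
        else:
            if current:
                clusters.append(current)
            current = [k, k + 1]
    if current:
        clusters.append(current)
    return clusters
```

    A run such as `merge_adjacent_pairs(benjamini_hochberg(neighbor_pvalues(profiles), 0.1))` would then list candidate clusters as gene positions, using the FDR threshold of 0.1 quoted in the abstract.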

  18. A cross-species bi-clustering approach to identifying conserved co-regulated genes

    PubMed Central

    Sun, Jiangwen; Jiang, Zongliang; Tian, Xiuchun; Bi, Jinbo

    2016-01-01

    Motivation: A growing number of studies have explored the process of pre-implantation embryonic development of multiple mammalian species. However, the conservation and variation among different species in their developmental programming are poorly defined due to the lack of effective computational methods for detecting co-regulated genes that are conserved across species. The most sophisticated method to date for identifying conserved co-regulated genes is a two-step approach. This approach first identifies gene clusters for each species by a cluster analysis of gene expression data, and subsequently computes the overlaps of clusters identified from different species to reveal common subgroups. This approach deals poorly with the noise introduced into the expression data by the complicated procedures used to quantify gene expression. Furthermore, due to the sequential nature of the approach, the gene clusters identified in the first step may have little overlap among different species in the second step, making it difficult to detect conserved co-regulated genes. Results: We propose a cross-species bi-clustering approach which first denoises the gene expression data of each species into a data matrix. The rows of the data matrices of different species represent the same set of genes, which are characterized by their expression patterns over the developmental stages of each species as columns. A novel bi-clustering method is then developed to cluster genes into subgroups by a joint sparse rank-one factorization of all the data matrices. This method decomposes a data matrix into a product of a column vector and a row vector, where the column vector is a consistent indicator across the matrices (species) that identifies the same gene cluster, and the row vector specifies for each species the developmental stages in which the clustered genes are co-regulated. An efficient optimization algorithm has been developed, together with a convergence analysis. This approach was first validated on
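
    The joint factorization can be illustrated with a small alternating scheme: each species matrix X_s is approximated by u v_s^T, where the gene vector u is shared and kept sparse by soft-thresholding. This is a sparse power-iteration heuristic chosen for illustration, not the authors' optimization algorithm; the penalty `lam`, the iteration count, and all function names are assumptions.

```python
import math


def soft_threshold(x, lam):
    """Shrink x toward zero by lam (the proximal step for an L1 penalty)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def _stage_vectors(matrices, u):
    """Given unit-norm u, the least-squares stage vector per species is X_s^T u."""
    return [[sum(row[j] * ui for row, ui in zip(X, u)) for j in range(len(X[0]))]
            for X in matrices]


def joint_sparse_rank_one(matrices, lam, n_iter=100):
    """Approximate each X_s (genes x stages_s) by u v_s^T with one sparse gene vector u.

    matrices: one list-of-rows matrix per species, all sharing the same gene rows.
    Returns (u, [v_1, ..., v_S]); genes with u[i] != 0 form the shared cluster and
    v_s describes that cluster's stage profile in species s.
    """
    n_genes = len(matrices[0])
    u = [1.0 / math.sqrt(n_genes)] * n_genes
    for _ in range(n_iter):
        vs = _stage_vectors(matrices, u)
        # Pool evidence for each gene across all species, then sparsify.
        z = [soft_threshold(sum(sum(X[i][j] * v[j] for j in range(len(v)))
                                for X, v in zip(matrices, vs)), lam)
             for i in range(n_genes)]
        norm = math.sqrt(sum(val * val for val in z))
        if norm == 0.0:
            # lam too large: every gene was shrunk to zero.
            return [0.0] * n_genes, [[0.0] * len(X[0]) for X in matrices]
        u = [val / norm for val in z]
    return u, _stage_vectors(matrices, u)
```

    Larger values of `lam` yield smaller gene clusters; because u is shared across matrices, a gene is retained only if its pooled evidence from all species exceeds the threshold.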

  19. Picocyanobacteria containing a novel pigment gene cluster dominate the brackish water Baltic Sea.

    PubMed

    Larsson, John; Celepli, Narin; Ininbergs, Karolina; Dupont, Christopher L; Yooseph, Shibu; Bergman, Bigitta; Ekman, Martin

    2014-09-01

    Photoautotrophic picocyanobacteria harvest light via phycobilisomes (PBS) consisting of the pigments phycocyanin (PC) and phycoerythrin (PE), encoded by genes in conserved gene clusters. The presence and arrangement of these gene clusters give picocyanobacteria characteristic light absorption properties and allow the colonization of specific ecological niches. To date, a full understanding of the evolution and distribution of the PBS gene cluster in picocyanobacteria has been hampered by the scarcity of genome sequences from fresh- and brackish water-adapted strains. To remedy this, we analysed genomes assembled from metagenomic samples collected along a natural salinity gradient, and over the course of a growth season, in the Baltic Sea. We found that while PBS gene clusters in picocyanobacteria sampled in marine habitats were highly similar to known references, brackish-adapted genotypes harboured a novel type not seen in previously sequenced genomes. Phylogenetic analyses showed that the novel gene cluster belonged to a clade of uncultivated picocyanobacteria that dominate the brackish Baltic Sea throughout the summer season, but are uncommon in other examined aquatic ecosystems. Further, our data suggest that the PE genes were lost in the ancestor of PC-containing coastal picocyanobacteria and that multiple horizontal gene transfer events have re-introduced PE genes into brackish-adapted strains, including the novel clade discovered here.

  20. Bacillus cereus-type polyhydroxyalkanoate biosynthetic gene cluster contains R-specific enoyl-CoA hydratase gene.

    PubMed

    Kihara, Takahiro; Hiroe, Ayaka; Ishii-Hyakutake, Manami; Mizuno, Kouhei; Tsuge, Takeharu

    2017-08-01

    Bacillus cereus and Bacillus megaterium both accumulate polyhydroxyalkanoate (PHA) but their PHA biosynthetic gene (pha) clusters that code for proteins involved in PHA biosynthesis are different. Namely, a gene encoding MaoC-like protein exists in the B. cereus-type pha cluster but not in the B. megaterium-type pha cluster. MaoC-like protein has an R-specific enoyl-CoA hydratase (R-hydratase) activity and is referred to as PhaJ when involved in PHA metabolism. In this study, the pha cluster of B. cereus YB-4 was characterized in terms of PhaJ's function. In an in vitro assay, PhaJ from B. cereus YB-4 (PhaJYB4) exhibited hydration activity toward crotonyl-CoA. In an in vivo assay using Escherichia coli as a host for PHA accumulation, the recombinant strain expressing PhaJYB4 and PHA synthase led to increased PHA accumulation, suggesting that PhaJYB4 functioned as a monomer supplier. The monomer composition of the accumulated PHA reflected the substrate specificity of PhaJYB4, which appeared to prefer short chain-length substrates. The pha cluster from B. cereus YB-4 functioned to accumulate PHA in E. coli; however, it did not function when the phaJYB4 gene was deleted. The B. cereus-type pha cluster represents a new example of a pha cluster that contains the gene encoding PhaJ.

  1. Association of interleukin 1 gene cluster and interleukin 1 receptor gene polymorphisms with ischemic heart failure.

    PubMed

    Mahmoudi, M J; Taghvaei, M; Harsini, S; Amirzargar, A A; Hedayat, M; Mahmoudi, M; Nematipour, E; Farhadi, E; Esfahanian, N; Sadr, M; Nourijelyani, K; Rezaei, N

    Proinflammatory cytokines have been known to play a considerable part in the pathomechanisms of chronic heart failure (CHF). Given the importance of proinflammatory cytokines in the context of the failing heart, we assessed whether polymorphisms of the interleukin (IL)-1 gene cluster, including IL-1α, IL-1β, and IL-1 receptor antagonist (IL-1RA), and of the IL-1R gene are predictors of CHF due to ischemic heart disease. Forty-three patients with ischemic heart failure were recruited in this study as the patient group and compared with 140 healthy unrelated control subjects. Using the polymerase chain reaction with sequence-specific primers (PCR-SSP) method, the allele and genotype frequencies of 5 single nucleotide polymorphisms (SNPs) within the IL-1α (-889), IL-1β (-511, +3962), IL-1R (psti 1970), and IL-1RA (mspa1 11100) genes were determined. The frequency of the IL-1β -511/C allele was significantly higher in the patient group than in the control group (p = 0.031). The IL-1β (-511) C/C genotype was significantly overrepresented in patients compared to controls (p = 0.022). A particular allele and genotype of the IL-1β gene were overrepresented in patients with ischemic heart failure, possibly affecting individual susceptibility to this disease (Tab. 1, Ref. 27).
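
    Allele-frequency comparisons of this kind are often made with a 2x2 table of allele counts in cases versus controls. The sketch below assumes a Pearson chi-square test with 1 degree of freedom and no continuity correction; the abstract does not state which test was used, and the genotypes in any call are hypothetical, not the study's data. Function names are illustrative.

```python
import math
from collections import Counter


def allele_counts(genotypes):
    """Count alleles in diploid genotypes written as two-letter strings, e.g. 'CT'."""
    counts = Counter()
    for genotype in genotypes:
        counts.update(genotype)  # each character is one allele
    return counts


def allelic_chi_square(case_genotypes, control_genotypes, allele):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 allele table."""
    cases, controls = allele_counts(case_genotypes), allele_counts(control_genotypes)
    a = cases[allele]                     # cases carrying the allele
    b = sum(cases.values()) - a           # cases carrying other alleles
    c = controls[allele]
    d = sum(controls.values()) - c
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column of the allele table is empty")
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / margins
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value
```

    Counting alleles rather than individuals doubles the sample size and assumes the two alleles of each person are independent; genotype-level comparisons (e.g. C/C versus other genotypes) would use a table of individuals instead.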

  2. High or low correlation between co-occurring gene clusters and 16S rRNA gene phylogeny.

    PubMed

    Rudi, Knut; Sekelja, Monika

    2013-02-01

    Ribosomal RNA (rRNA) genes are universal for all living organisms. Yet, the correspondence between genome composition and rRNA phylogeny remains poorly known. The aim of this study was to use the information from genome sequence databases to address the correlation between rRNA gene phylogeny and total gene composition in bacteria. This was done by analysing 327 genomes with TIGRFAM functional gene annotations. Our approach consisted of two steps. First, we searched for discriminatory clusters of co-occurring genes. Using a multivariate statistical approach, we identified 11 such clusters which contain genes that were co-occurring only in a subset of genomes and contributed to explaining the gene content differences between genome subsets. Second, we mapped the discovered clusters to 16S rRNA-based phylogeny and calculated the correlation between co-occurring genes and phylogeny. Six of the 11 clusters exhibited significant correlation with 16S rRNA gene phylogeny. The most distinct phylogenetic finding was a high correlation between iron-sulfur oxidoreductases in combination with carbon-nitrogen ligases and Chlorobium. The other correlations identified covered relatively large phylogroups: Actinobacteria were positively associated with kinases, while Gammaproteobacteria were positively associated with methylases and acyltransferases. The suggested functional differences between higher phylogroups, however, need experimental verification.

  3. The evolutionary life cycle of the polysaccharide biosynthetic gene cluster based on the Sphingomonadaceae.

    PubMed

    Wu, Mengmeng; Huang, Haidong; Li, Guoqiang; Ren, Yi; Shi, Zhong; Li, Xiaoyan; Dai, Xiaohui; Gao, Ge; Ren, Mengnan; Ma, Ting

    2017-04-21

    Although clustering of genes from the same metabolic pathway is a widespread phenomenon, the evolution of the polysaccharide biosynthetic gene cluster remains poorly understood. To determine the evolution of this pathway, we identified a scattered production pathway of the polysaccharide sanxan by Sphingomonas sanxanigenens NX02, and compared the distribution of genes between sphingan-producing and other Sphingomonadaceae strains. This allowed us to determine how the scattered sanxan pathway developed, and how the polysaccharide gene cluster evolved. Our findings suggested that the evolution of microbial polysaccharide biosynthesis gene clusters is a lengthy cyclic process comprising cluster 1 → scatter → cluster 2. The sanxan biosynthetic pathway proved the existence of a dispersive process. We also report the complete genome sequence of NX02, in which we identified many unstable genetic elements and powerful secretion systems. Furthermore, nine enzymes for the formation of activated precursors, four glycosyltransferases, four acyltransferases, and four polymerization and export proteins were identified. These genes were scattered in the NX02 genome, and the positive regulator SpnA of sphingan synthesis could not regulate sanxan production. Finally, we concluded that the evolution of the sanxan pathway was independent. NX02 evolved naturally as a polysaccharide-producing strain over a long evolutionary period involving gene acquisitions and adaptive mutations.

  4. Efficient transfer of two large secondary metabolite pathway gene clusters into heterologous hosts by transposition

    PubMed Central

    Fu, Jun; Wenzel, Silke C.; Perlova, Olena; Wang, Junping; Gross, Frank; Tang, Zhiru; Yin, Yulong; Stewart, A. Francis; Zhang, Youming

    2008-01-01

    Horizontal gene transfer by transposition has been widely used for transgenesis in prokaryotes. However, conjugation has been preferred for transfer of large transgenes, despite greater restrictions of host range. We examine the possibility that transposons can be used to deliver large transgenes to heterologous hosts. This possibility is particularly relevant to the expression of large secondary metabolite gene clusters in various heterologous hosts. Recently, we showed that the engineering of large gene clusters like type I polyketide/nonribosomal peptide pathways for heterologous expression is no longer a bottleneck. Here, we apply recombineering to engineer either the epothilone (epo) or myxochromide S (mchS) gene cluster for transpositional delivery and expression in heterologous hosts. The 58-kb epo gene cluster was fully reconstituted from two clones by stitching. Then, the epo promoter was exchanged for a promoter active in the heterologous host, followed by engineering into the MycoMar transposon. A similar process was applied to the mchS gene cluster. The engineered gene clusters were transferred and expressed in the heterologous hosts Myxococcus xanthus and Pseudomonas putida. We achieved the largest transposition yet reported for any system and suggest that transposon delivery will become the method of choice for large transgenes, not only for metabolic engineering but also for general transgenesis in prokaryotes and eukaryotes. PMID:18701643

  5. Identification of Nocobactin NA Biosynthetic Gene Clusters in Nocardia farcinica

    PubMed Central

    Hoshino, Yasutaka; Chiba, Kazuhiro; Ishino, Keiko; Fukai, Toshio; Igarashi, Yasuhiro; Yazawa, Katsukiyo; Mikami, Yuzuru; Ishikawa, Jun

    2011-01-01

    We identified the biosynthetic gene clusters of the siderophore nocobactin NA. The nbt clusters, which were discovered as genes highly homologous to the mycobactin biosynthesis genes by the genomic sequencing of Nocardia farcinica IFM 10152, consist of 10 genes separately located at two genomic regions. The gene organization of the nbt clusters and the predicted functions of the nbt genes, particularly the cyclization and epimerization domains, were in good agreement with the chemical structure of nocobactin NA. Disruptions of the nbtA and nbtE genes, respectively, reduced and abolished the productivity of nocobactin NA. The heterologous expression of the nbtS gene revealed that this gene encoded a salicylate synthase. These results indicate that the nbt clusters are responsible for the biosynthesis of nocobactin NA. We also found putative IdeR-binding sequences upstream of the nbtA, -G, -H, -S, and -T genes, whose expression was more than 10-fold higher in the low-iron condition than in the high-iron condition. These results suggest that nbt genes are regulated coordinately by IdeR protein in an iron-dependent manner. The ΔnbtE mutant was found to be impaired in cytotoxicity against J774A.1 cells, suggesting that nocobactin NA production is required for virulence of N. farcinica. PMID:21097631

  6. Characterization of the fumonisin B2 biosynthetic gene cluster in Aspergillus niger and A. awamori.

    USDA-ARS's Scientific Manuscript database

    Aspergillus niger and A. awamori strains isolated from grapes cultivated in Mediterranean basin were examined for fumonisin B2 (FB2) production and presence/absence of sequences within the fumonisin biosynthetic gene (fum) cluster. Presence of 13 regions in the fum cluster was evaluated by PCR assay...

  7. Comparative genomic analysis of secondary metabolite biosynthetic gene clusters in 207 isolates of Fusarium

    USDA-ARS's Scientific Manuscript database

    Fusarium species are known for their ability to produce secondary metabolites (SMs), including plant hormones, pigments, mycotoxins, and other compounds with potential agricultural, pharmaceutical, and biotechnological impact. Understanding the distribution of SM biosynthetic gene clusters across th...

  8. Clusters of antibiotic resistance genes enriched together stay together in swine agriculture

    SciTech Connect

    Johnson, Timothy A.; Stedtfeld, Robert D.; Wang, Qiong; Cole, James R.; Hashsham, Syed A.; Looft, Torey; Zhu, Yong -Guan; Tiedje, James M.

    2016-04-12

    Antibiotic resistance is a worldwide health risk, but the influence of animal agriculture on the genetic context and enrichment of individual antibiotic resistance alleles remains unclear. Using quantitative PCR followed by amplicon sequencing, we quantified and sequenced 44 genes related to antibiotic resistance, mobile genetic elements, and bacterial phylogeny in microbiomes from U.S. laboratory swine and from swine farms from three Chinese regions. We identified highly abundant resistance clusters: groups of resistance and mobile genetic element alleles that cooccur. For example, the abundance of genes conferring resistance to six classes of antibiotics together with class 1 integrase and the abundance of IS6100-type transposons in three Chinese regions are directly correlated. These resistance cluster genes likely colocalize in microbial genomes in the farms. Resistance cluster alleles were dramatically enriched (up to 1 to 10% as abundant as 16S rRNA) and indicate that multidrug-resistant bacteria are likely the norm rather than an exception in these communities. This enrichment largely occurred independently of phylogenetic composition; thus, resistance clusters are likely present in many bacterial taxa. Furthermore, resistance clusters contain resistance genes that confer resistance to antibiotics independently of their particular use on the farms. Selection for these clusters is likely due to the use of only a subset of the broad range of chemicals to which the clusters confer resistance. The scale of animal agriculture and its wastes, the enrichment and horizontal gene transfer potential of the clusters, and the vicinity of large human populations suggest that managing this resistance reservoir is important for minimizing human risk. Agricultural antibiotic use results in clusters of cooccurring resistance genes that together confer resistance to multiple antibiotics. The use of a single antibiotic could select for an entire suite of
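
    The resistance clusters above are defined by co-occurring abundances normalized to 16S rRNA. A minimal sketch, assuming per-sample qPCR copy numbers are available and that genes are grouped by linking any pair whose abundance profiles correlate at or above a chosen threshold (the abstract does not state the grouping rule), could look like this. Function names and any example values are hypothetical.

```python
import math


def relative_abundance(gene_copies, rrna_copies):
    """Gene copies per 16S rRNA gene copy, one value per sample."""
    return [g / r for g, r in zip(gene_copies, rrna_copies)]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cooccurrence_groups(abundances, threshold):
    """Group genes whose abundance profiles correlate at or above `threshold`.

    abundances: dict mapping gene name -> relative abundance per sample.
    Genes are linked when pairwise r >= threshold; each connected component of
    that graph is reported as one group (single-linkage style).
    """
    genes = sorted(abundances)
    links = {g: set() for g in genes}
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if pearson(abundances[a], abundances[b]) >= threshold:
                links[a].add(b)
                links[b].add(a)
    groups, seen = [], set()
    for start in genes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first search over correlation links
            gene = stack.pop()
            component.append(gene)
            for other in links[gene]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(sorted(component))
    return groups
```

    Connected components let a gene join a group through any sufficiently correlated partner, which matches the idea of co-occurring suites but can chain loosely related genes together; a stricter rule (all pairs above threshold) is an alternative design choice.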

  9. Clusters of antibiotic resistance genes enriched together stay together in swine agriculture

    DOE PAGES

    Johnson, Timothy A.; Stedtfeld, Robert D.; Wang, Qiong; ...

    2016-04-12

    Antibiotic resistance is a worldwide health risk, but the influence of animal agriculture on the genetic context and enrichment of individual antibiotic resistance alleles remains unclear. Using quantitative PCR followed by amplicon sequencing, we quantified and sequenced 44 genes related to antibiotic resistance, mobile genetic elements, and bacterial phylogeny in microbiomes from U.S. laboratory swine and from swine farms from three Chinese regions. We identified highly abundant resistance clusters: groups of resistance and mobile genetic element alleles that cooccur. For example, the abundance of genes conferring resistance to six classes of antibiotics together with class 1 integrase and the abundance of IS6100-type transposons in three Chinese regions are directly correlated. These resistance cluster genes likely colocalize in microbial genomes in the farms. Resistance cluster alleles were dramatically enriched (up to 1 to 10% as abundant as 16S rRNA) and indicate that multidrug-resistant bacteria are likely the norm rather than an exception in these communities. This enrichment largely occurred independently of phylogenetic composition; thus, resistance clusters are likely present in many bacterial taxa. Furthermore, resistance clusters contain resistance genes that confer resistance to antibiotics independently of their particular use on the farms. Selection for these clusters is likely due to the use of only a subset of the broad range of chemicals to which the clusters confer resistance. The scale of animal agriculture and its wastes, the enrichment and horizontal gene transfer potential of the clusters, and the vicinity of large human populations suggest that managing this resistance reservoir is important for minimizing human risk. Agricultural antibiotic use results in clusters of cooccurring resistance genes that together confer resistance to multiple antibiotics. The use of a single antibiotic could select for an entire suite of resistance

  10. Clusters of Antibiotic Resistance Genes Enriched Together Stay Together in Swine Agriculture.

    PubMed

    Johnson, Timothy A; Stedtfeld, Robert D; Wang, Qiong; Cole, James R; Hashsham, Syed A; Looft, Torey; Zhu, Yong-Guan; Tiedje, James M

    2016-04-12

    Antibiotic resistance is a worldwide health risk, but the influence of animal agriculture on the genetic context and enrichment of individual antibiotic resistance alleles remains unclear. Using quantitative PCR followed by amplicon sequencing, we quantified and sequenced 44 genes related to antibiotic resistance, mobile genetic elements, and bacterial phylogeny in microbiomes from U.S. laboratory swine and from swine farms from three Chinese regions. We identified highly abundant resistance clusters: groups of resistance and mobile genetic element alleles that cooccur. For example, the abundance of genes conferring resistance to six classes of antibiotics together with class 1 integrase and the abundance of IS6100-type transposons in three Chinese regions are directly correlated. These resistance cluster genes likely colocalize in microbial genomes in the farms. Resistance cluster alleles were dramatically enriched (up to 1 to 10% as abundant as 16S rRNA) and indicate that multidrug-resistant bacteria are likely the norm rather than an exception in these communities. This enrichment largely occurred independently of phylogenetic composition; thus, resistance clusters are likely present in many bacterial taxa. Furthermore, resistance clusters contain resistance genes that confer resistance to antibiotics independently of their particular use on the farms. Selection for these clusters is likely due to the use of only a subset of the broad range of chemicals to which the clusters confer resistance. The scale of animal agriculture and its wastes, the enrichment and horizontal gene transfer potential of the clusters, and the vicinity of large human populations suggest that managing this resistance reservoir is important for minimizing human risk. Agricultural antibiotic use results in clusters of cooccurring resistance genes that together confer resistance to multiple antibiotics. The use of a single antibiotic could select for an entire suite of resistance genes if

  11. Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data

    SciTech Connect

    Data Analysis and Visualization and the Department of Computer Science, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA; International Research Training Group "Visualization of Large and Unstructured Data Sets," University of Kaiserslautern, Germany; Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA; Genomics Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA; Life Sciences Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA; Computer Science Division, University of California, Berkeley, CA, USA; Computer Science Department, University of California, Irvine, CA, USA; All authors are with the Berkeley Drosophila Transcription Network Project, Lawrence Berkeley National Laboratory; Rubel, Oliver; Weber, Gunther H.; Huang, Min-Yu; Bethel, E. Wes; Biggin, Mark D.; Fowlkes, Charless C.; Hendriks, Cris L. Luengo; Keranen, Soile V. E.; Eisen, Michael B.; Knowles, David W.; Malik, Jitendra; Hagen, Hans; Hamann, Bernd

    2008-05-12

    The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii) evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.
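
    Point (iii) concerns choosing the number of clusters k. The abstract does not name the clustering algorithm, so the sketch below assumes k-means (Lloyd's algorithm) and uses the within-cluster sum of squares (SSE) for a range of k, a common "elbow" heuristic. The function names and the seeded random initialization are assumptions.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means. Returns (labels, centers, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]

    def assign(cs):
        # Each point goes to its nearest center.
        return [min(range(k), key=lambda c: squared_distance(p, cs[c])) for p in points]

    for _ in range(n_iter):
        labels = assign(centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Move the center to the mean of its members; an empty cluster keeps its center.
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(centers)
    sse = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, sse


def sse_curve(points, k_values, seed=0):
    """Within-cluster SSE for each k; look for the 'elbow' where gains level off."""
    return {k: kmeans(points, k, seed=seed)[2] for k in k_values}
```

    SSE always decreases as k grows, so the useful signal is where the decrease flattens; in a 3D expression setting each point would be one cell's expression vector.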

  12. Chordate Hox and ParaHox gene clusters differ dramatically in their repetitive element content.

    PubMed

    Osborne, Peter W; Ferrier, David E K

    2010-02-01

    The ParaHox and Hox gene clusters control aspects of animal anterior-posterior development and are related as paralogous evolutionary sisters. Despite this relationship, it is not clear if the clusters operate in similar ways, with similar constraints. To compare clusters, we examined the transposable-element (TE) content of amphioxus and mammalian ParaHox and Hox clusters. Chordate Hox clusters are known to be largely devoid of TEs, possibly due to gene regulation and constraints on clustering in these animals. Here, we describe several novel amphioxus TEs and show that the amphioxus ParaHox cluster is a hotspot for TE insertion. TE contents of mammalian ParaHox loci are at background levels, in stark contrast to chordate Hox clusters. This marks a significant difference between Hox and ParaHox clusters. The presence of so many potentially disruptive elements implies that selection constrains these ParaHox clusters, as they have not dispersed despite 500 My of evolution in each lineage.

  13. Visualization of mappings between the gene ontology and cluster trees

    NASA Astrophysics Data System (ADS)

    Jusufi, Ilir; Kerren, Andreas; Aleksakhin, Vladyslav; Schreiber, Falk

    2012-01-01

    Ontologies and hierarchical clustering are both important tools in biology and medicine to study high-throughput data such as transcriptomics and metabolomics data. Enrichment of ontology terms in the data is used to identify statistically overrepresented ontology terms, giving insight into relevant biological processes or functional modules. Hierarchical clustering is a standard method to analyze and visualize data to find relatively homogeneous clusters of experimental data points. Both methods support the analysis of the same data set, but are usually considered independently. However, often a combined view is desired: visualizing a large data set in the context of an ontology under consideration of a clustering of the data. This paper proposes a new visualization method for this task.
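
    Ontology-term enrichment for a cluster is commonly tested with a one-sided hypergeometric test. The paper's contribution is the visualization, so the sketch below only illustrates the enrichment step under that common assumption, without multiple-testing correction; function names are hypothetical.

```python
from math import comb


def hypergeometric_tail(observed, cluster_size, term_size, background_size):
    """P(X >= observed) for X ~ Hypergeometric(background_size, term_size, cluster_size)."""
    upper = min(term_size, cluster_size)
    hits = sum(comb(term_size, i) * comb(background_size - term_size, cluster_size - i)
               for i in range(observed, upper + 1))
    return hits / comb(background_size, cluster_size)


def term_enrichment(cluster, annotations, background):
    """One-sided enrichment p-value for every term annotated to at least one cluster gene.

    annotations: dict gene -> set of ontology terms; genes missing from it are unannotated.
    """
    terms = set().union(*(annotations.get(g, set()) for g in background))
    result = {}
    for term in terms:
        term_size = sum(1 for g in background if term in annotations.get(g, set()))
        observed = sum(1 for g in cluster if term in annotations.get(g, set()))
        if observed:
            result[term] = hypergeometric_tail(observed, len(cluster), term_size, len(background))
    return result
```

    Applied to every subtree of a cluster dendrogram, such p-values are one way to decide which ontology terms to display alongside each cluster.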

  14. Regulation of transcription of cell division genes in the Escherichia coli dcw cluster.

    PubMed

    Vicente, M; Gomez, M J; Ayala, J A

    1998-04-01

    The Escherichia coli dcw cluster contains cell division genes, such as the phylogenetically ubiquitous ftsZ, and genes involved in peptidoglycan synthesis. Transcription in the cluster proceeds in the same direction as the progress of the replication fork along the chromosome. Regulation is exerted at the transcriptional and post-transcriptional levels. The absence of transcriptional termination signals may, in principle, allow extension of the transcripts initiated at the upstream promoter (mraZ1p) even to the furthest downstream gene (envA). Complementation tests suggest that they extend into ftsW in the central part of the cluster. In addition, the cluster contains other promoters individually regulated by cis- and trans-acting signals. Dissociation of the expression of the ftsZ gene, located after ftsQ and ftsA near the 3' end of the cluster, from its natural regulatory signals leads to an alteration in the physiology of cell division. The complexities observed in the regulation of gene expression in the cluster may therefore have an important biological role. Among them, LexA-binding SOS boxes have been found at the 5' end of the cluster, preceding promoters which direct the expression of ftsI (coding for PBP3, the penicillin-binding protein involved in septum formation). A gearbox promoter, ftsQ1p, forms part of the signals regulating the transcription of ftsQ, A and Z. It is an inversely growth-dependent mechanism driven by RNA polymerase containing sigma s, the factor involved in the expression of stationary phase-specific genes. Although the dcw cluster is conserved to differing extents in a variety of bacteria, the regulation of gene expression, the presence or absence of individual genes, and even the essentiality of some of them vary along the phylogenetic scale, which may reflect adaptation to specific life cycles.

  15. Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes.

    PubMed

    Datta, Susmita; Datta, Somnath

    2006-08-31

    Cluster analysis is the most commonly performed procedure (often regarded as a first step) on a set of gene expression profiles. In most cases, a post hoc analysis is done to see whether genes in the same cluster are functionally related. Although such analyses have often been reported as successful in microarray studies (most of which used standard hierarchical clustering, UPGMA, with one minus the Pearson correlation coefficient as the dissimilarity measure), such groupings can be misleading. More importantly, a systematic evaluation of the entire set of clusters produced by such unsupervised procedures is necessary, since the clusters also contain genes that are seemingly unrelated or that may have more than one common function. Here we quantify the performance of a given unsupervised clustering algorithm applied to a given microarray study in terms of its ability to produce biologically meaningful clusters, using a reference set of functional classes. Such a reference set may come from prior biological knowledge specific to a microarray study or may be formed using the growing gene ontology (GO) databases for the annotated genes of the relevant species. In this paper, we introduce two performance measures for evaluating a clustering algorithm's ability to produce biologically meaningful clusters. The first is a biological homogeneity index (BHI), which, as the name suggests, measures how biologically homogeneous the clusters are. It can be used to quantify how well a given clustering algorithm, such as UPGMA, groups genes in a particular data set, and also to compare several competing clustering algorithms applied to the same data set. The second measure is a biological stability index (BSI). For a given clustering algorithm and expression data set, it measures the consistency of the clustering algorithm's ability to produce biologically
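    A minimal sketch of a biological homogeneity index, assuming the pairwise form in which, for each cluster, one takes the fraction of pairs of annotated genes that share at least one functional class and then averages these fractions over clusters (counting ordered pairs i ≠ j gives the same fraction as counting unordered pairs). The abstract itself only names the index; the function name, the choice to leave clusters with fewer than two annotated genes out of the average, and the toy annotations are assumptions for illustration, not the authors' implementation.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Set


def biological_homogeneity_index(
    clusters: Dict[Hashable, List[str]],
    annotations: Dict[str, Set[str]],
) -> float:
    """Hypothetical BHI sketch: mean over clusters of the fraction of
    annotated gene pairs that share at least one functional class."""
    per_cluster = []
    for genes in clusters.values():
        # Genes with no functional class in the reference set are ignored.
        annotated = [g for g in genes if annotations.get(g)]
        if len(annotated) < 2:
            # Assumption: clusters with no annotated pair are left out of the mean.
            continue
        pairs = list(combinations(annotated, 2))
        shared = sum(1 for a, b in pairs if annotations[a] & annotations[b])
        per_cluster.append(shared / len(pairs))
    return sum(per_cluster) / len(per_cluster) if per_cluster else 0.0


# Toy example: gene and class names are invented for illustration.
clusters = {"A": ["g1", "g2", "g3"], "B": ["g4", "g5", "g6"]}
annotations = {
    "g1": {"x"}, "g2": {"x"}, "g3": {"y"},
    "g4": {"z"}, "g5": {"y", "z"},  # g6 has no annotation
}
print(round(biological_homogeneity_index(clusters, annotations), 3))  # 0.667
```

    Here cluster A scores 1/3 (only g1 and g2 share a class) and cluster B scores 1 (g6 is unannotated and therefore ignored), so the index is their mean, 2/3. A higher value indicates clusters whose members more often share functional classes.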

  16. United we stand: big roles for small RNA gene clusters.

    PubMed

    Felden, Brice; Paillard, Luc

    2017-02-01

    Prokaryotes and eukaryotes evolved relatively similar RNA-based molecular mechanisms to fight potentially deleterious nucleic acids coming from phages, transposons, or viruses. Short RNAs guide effector complexes toward their targets to be silenced or eliminated. These short immunity RNAs are transcribed from clustered loci. Unexpectedly and strikingly, bacterial and eukaryotic immunity RNA clusters share substantial functional and mechanistic resemblances in fighting nucleic acid intruders.

  17. A stationary wavelet entropy-based clustering approach accurately predicts gene expression.

    PubMed

    Nguyen, Nha; Vo, An; Choi, Inchan; Won, Kyoung-Jae

    2015-03-01

    Studying epigenetic landscapes is important for understanding the conditions under which genes are regulated. Clustering is a useful approach for studying epigenetic landscapes, grouping genes according to their epigenetic state. However, classical clustering approaches, which often use a single representative value of the signal in a fixed-size window, do not fully exploit the information contained in epigenetic landscapes. Clustering approaches that make fuller use of the information in epigenetic signals are needed to better understand gene regulatory environments. For effective clustering of multidimensional epigenetic signals, we developed a method called Dewer, which clusters genes using the entropy of the stationary wavelet transform of epigenetic signals inside enriched regions. Interestingly, gene expression levels were highly correlated with the entropy levels of the epigenetic signals. In an assessment using gene expression, Dewer separated genes better than a window-based approach and achieved a correlation coefficient above 0.9 without any training procedure. Our results show that changes in epigenetic signals are useful for studying gene regulation.

  18. Characterization of the ars gene cluster from extremely arsenic-resistant Microbacterium sp. strain A33.

    PubMed

    Achour-Rokbani, Asma; Cordi, Audrey; Poupin, Pascal; Bauda, Pascale; Billard, Patrick

    2010-02-01

    The arsenic resistance gene cluster of Microbacterium sp. A33 contains a novel pair of genes (arsTX) encoding a thioredoxin system that are cotranscribed with an unusual arsRC2 fusion gene, ACR3, and arsC1 in an operon divergent from arsC3. The whole ars gene cluster is required to complement an Escherichia coli ars mutant. ArsRC2 negatively regulates the expression of the pentacistronic operon. ArsC1 and ArsC3 are related to thioredoxin-dependent arsenate reductases; however, ArsC3 lacks the two distal catalytic cysteine residues of this class of enzymes.

  19. Identification of a 12-gene fusaric acid biosynthetic gene cluster in Fusarium species through comparative and functional genomics

    USDA-ARS's Scientific Manuscript database

    In fungi, genes involved in biosynthesis of a secondary metabolite (SM) are often located adjacent to one another in the genome and are coordinately regulated. These SM biosynthetic gene clusters typically encode enzymes, one or more transcription factors, and a transport protein. Fusaric acid is a ...

  20. Yeast homologous recombination-based promoter engineering for the activation of silent natural product biosynthetic gene clusters.

    PubMed

    Montiel, Daniel; Kang, Hahk-Soo; Chang, Fang-Yuan; Charlop-Powers, Zachary; Brady, Sean F

    2015-07-21

    Large-scale sequencing of prokaryotic (meta)genomic DNA suggests that most bacterial natural product gene clusters are not expressed under common laboratory culture conditions. Silent gene clusters represent a promising resource for natural product discovery and the development of a new generation of therapeutics. Unfortunately, the characterization of molecules encoded by these clusters is hampered owing to our inability to express these gene clusters in the laboratory. To address this bottleneck, we have developed a promoter-engineering platform to transcriptionally activate silent gene clusters in a model heterologous host. Our approach uses yeast homologous recombination, an auxotrophy complementation-based yeast selection system and sequence orthogonal promoter cassettes to exchange all native promoters in silent gene clusters with constitutively active promoters. As part of this platform, we constructed and validated a set of bidirectional promoter cassettes consisting of orthogonal promoter sequences, Streptomyces ribosome binding sites, and yeast selectable marker genes. Using these tools we demonstrate the ability to simultaneously insert multiple promoter cassettes into a gene cluster, thereby expediting the reengineering process. We apply this method to model active and silent gene clusters (rebeccamycin and tetarimycin) and to the silent, cryptic pseudogene-containing, environmental DNA-derived Lzr gene cluster. Complete promoter refactoring and targeted gene exchange in this "dead" cluster led to the discovery of potent indolotryptoline antiproliferative agents, lazarimides A and B. This potentially scalable and cost-effective promoter reengineering platform should streamline the discovery of natural products from silent natural product biosynthetic gene clusters.

  1. Clustered brachiopod Hox genes are not expressed collinearly and are associated with lophotrochozoan novelties.

    PubMed

    Schiemann, Sabrina M; Martín-Durán, José M; Børve, Aina; Vellutini, Bruno C; Passamaneck, Yale J; Hejnol, Andreas

    2017-02-22

    Temporal collinearity is often considered the main force preserving Hox gene clusters in animal genomes. Studies that combine genomic and gene expression data are scarce, however, particularly in invertebrates like the Lophotrochozoa. As a result, the temporal collinearity hypothesis is currently built on poorly supported foundations. Here we characterize the complement, cluster, and expression of Hox genes in two brachiopod species, Terebratalia transversa and Novocrania anomala. T. transversa has a split cluster with 10 genes (lab, pb, Hox3, Dfd, Scr, Lox5, Antp, Lox4, Post2, and Post1), whereas N. anomala has 9 genes (apparently missing Post1). Our in situ hybridization, real-time quantitative PCR, and stage-specific transcriptomic analyses show that brachiopod Hox genes are neither strictly temporally nor spatially collinear; only pb (in T. transversa), Hox3 (in both brachiopods), and Dfd (in both brachiopods) show staggered mesodermal expression. Thus, our findings support the idea that temporal collinearity might contribute to keeping Hox genes clustered. Remarkably, expression of the Hox genes in both brachiopod species demonstrates cooption of Hox genes in the chaetae and shell fields, two major lophotrochozoan morphological novelties. The shared and specific expression of Hox genes, together with Arx, Zic, and Notch pathway components in chaetae and shell fields in brachiopods, mollusks, and annelids provide molecular evidence supporting the conservation of the molecular basis for these lophotrochozoan hallmarks.

  2. Unusual Gene Order and Organization of the Sea Urchin Hox Cluster

    SciTech Connect

    Richardson, Paul M.; Lucas, Susan; Cameron, R. Andrew; Rowen, Lee; Nesbitt, Ryan; Bloom, Scott; Rast, Jonathan P.; Berney, Kevin; Arenas-Mena, Cesar; Martinez, Pedro; Davidson, Eric H.; Peterson, Kevin J.; Hood, Leroy

    2005-05-10

    The highly consistent gene order and axial colinear expression patterns found in vertebrate hox gene clusters are less well conserved across the rest of bilaterians. We report the first deuterostome instance of an intact hox cluster with a unique gene order where the paralog groups are not expressed in a sequential manner. The finished sequence from BAC clones from the genome of the sea urchin, Strongylocentrotus purpuratus, reveals a gene order wherein the anterior genes (Hox1, Hox2 and Hox3) lie nearest the posterior genes in the cluster such that the most 3′ gene is Hox5. (The gene order is 5′-Hox1, 2, 3, 11/13c, 11/13b, 11/13a, 9/10, 8, 7, 6, 5-3′.) The finished sequence result is corroborated by restriction mapping evidence and BAC-end scaffold analyses. Comparisons with a putative ancestral deuterostome Hox gene cluster suggest that the rearrangements leading to the sea urchin gene order were many and complex.

  3. Unusual Gene Order and Organization of the Sea Urchin Hox Cluster

    SciTech Connect

    Cameron, R A; Rowen, L; Nesbitt, R; Bloom, S; Rast, J P; Berney, K; Arenas-Mena, C; Martinez, P; Lucas, S; Richardson, P M; Davidson, E H; Peterson, K J; Hood, L

    2005-10-11

    The highly consistent gene order and axial colinear expression patterns found in vertebrate hox gene clusters are less well conserved across the rest of bilaterians. We report the first deuterostome instance of an intact hox cluster with a unique gene order where the paralog groups are not expressed in a sequential manner. The finished sequence from BAC clones from the genome of the sea urchin, Strongylocentrotus purpuratus, reveals a gene order wherein the anterior genes (Hox1, Hox2 and Hox3) lie nearest the posterior genes in the cluster such that the most 3′ gene is Hox5. (The gene order is 5′-Hox1, 2, 3, 11/13c, 11/13b, 11/13a, 9/10, 8, 7, 6, 5-3′.) The finished sequence result is corroborated by restriction mapping evidence and BAC-end scaffold analyses. Comparisons with a putative ancestral deuterostome Hox gene cluster suggest that the rearrangements leading to the sea urchin gene order were many and complex.

  4. Clustered brachiopod Hox genes are not expressed collinearly and are associated with lophotrochozoan novelties

    PubMed Central

    Schiemann, Sabrina M.; Martín-Durán, José M.; Børve, Aina; Passamaneck, Yale J.

    2017-01-01

    Temporal collinearity is often considered the main force preserving Hox gene clusters in animal genomes. Studies that combine genomic and gene expression data are scarce, however, particularly in invertebrates like the Lophotrochozoa. As a result, the temporal collinearity hypothesis is currently built on poorly supported foundations. Here we characterize the complement, cluster, and expression of Hox genes in two brachiopod species, Terebratalia transversa and Novocrania anomala. T. transversa has a split cluster with 10 genes (lab, pb, Hox3, Dfd, Scr, Lox5, Antp, Lox4, Post2, and Post1), whereas N. anomala has 9 genes (apparently missing Post1). Our in situ hybridization, real-time quantitative PCR, and stage-specific transcriptomic analyses show that brachiopod Hox genes are neither strictly temporally nor spatially collinear; only pb (in T. transversa), Hox3 (in both brachiopods), and Dfd (in both brachiopods) show staggered mesodermal expression. Thus, our findings support the idea that temporal collinearity might contribute to keeping Hox genes clustered. Remarkably, expression of the Hox genes in both brachiopod species demonstrates cooption of Hox genes in the chaetae and shell fields, two major lophotrochozoan morphological novelties. The shared and specific expression of Hox genes, together with Arx, Zic, and Notch pathway components in chaetae and shell fields in brachiopods, mollusks, and annelids provide molecular evidence supporting the conservation of the molecular basis for these lophotrochozoan hallmarks. PMID:28228521

  5. The naphthalene catabolic (nag) genes of Polaromonas naphthalenivorans CJ2: Evolutionary implications for two gene clusters and novel regulatory control

    SciTech Connect

    Jeon, C.O.; Park, M.; Ro, H.S.; Park, W.; Madsen, E.L.

    2006-02-15

    Polaromonas naphthalenivorans CJ2, found to be responsible for the degradation of naphthalene in situ at a coal tar waste-contaminated site, is able to grow on mineral salts agar media with naphthalene as the sole carbon source. Beginning from a 484-bp nagAc-like region, we used a genome walking strategy to sequence genes encoding the entire naphthalene degradation pathway and additional flanking regions. We found that the naphthalene catabolic genes in P. naphthalenivorans CJ2 were divided into one large and one small gene cluster, separated by an unknown distance. The large gene cluster is bounded by a LysR-type regulator (nagR). The small cluster is bounded by a MarR-type regulator (nagR2). The catabolic genes of P. naphthalenivorans CJ2 were homologous to many of those of Ralstonia U2, which uses the gentisate pathway to convert naphthalene to central metabolites. However, three open reading frames (nagY, nagM, and nagN), present in Ralstonia U2, were absent. Also, P. naphthalenivorans carries two copies of gentisate dioxygenase (nagI) with 77.4% DNA sequence identity to one another and 82% amino acid identity to their homologue in Ralstonia sp. strain U2. Investigation of the operons using reverse transcription PCR showed that each cluster was controlled independently by its respective promoter. Insertional inactivation and lacZ reporter assays showed that nagR2 is a negative regulator and that expression of the small cluster is not induced by naphthalene, salicylate, or gentisate. Association of two putative Azoarcus-related transposases with the large cluster and one Azoarcus-related putative salicylate 5-hydroxylase gene (ORF2) in the small cluster suggests that mobile genetic elements were likely involved in creating the novel arrangement of catabolic and regulatory genes in P. naphthalenivorans.

  6. Sequencing rare marine actinomycete genomes reveals high density of unique natural product biosynthetic gene clusters.

    PubMed

    Schorn, Michelle A; Alanjary, Mohammad M; Aguinaldo, Kristen; Korobeynikov, Anton; Podell, Sheila; Patin, Nastassia; Lincecum, Tommie; Jensen, Paul R; Ziemert, Nadine; Moore, Bradley S

    2016-12-01

    Traditional natural product discovery methods have nearly exhausted the accessible diversity of microbial chemicals, making new sources and techniques paramount in the search for new molecules. Marine actinomycete bacteria have recently come into the spotlight as fruitful producers of structurally diverse secondary metabolites, yet remain relatively untapped. In this study, we sequenced 21 marine-derived actinomycete strains that have rarely been studied for their secondary metabolite potential and are under-represented in current genomic databases. We found that genome size and phylogeny were good predictors of biosynthetic gene cluster diversity, with larger genomes rivalling the well-known marine producers in the Streptomyces and Salinispora genera. Genomes in the Micrococcineae suborder, however, consistently had the lowest number of biosynthetic gene clusters. By networking individual gene clusters into gene cluster families, we were able to computationally estimate the degree of novelty each genus contributed to the current sequence databases. Based on similarity measures between all actinobacteria in the Joint Genome Institute's Atlas of Biosynthetic Gene Clusters database, rare marine genera show a high degree of novelty and diversity, with the Corynebacterium, Gordonia, Nocardiopsis, Saccharomonospora and Pseudonocardia genera showing the highest gene cluster diversity. This research confirms that rare marine actinomycetes are important candidates for exploration, as they are relatively unstudied and their relatives are historically rich in secondary metabolites.

  7. Engineering Streptomyces coelicolor for heterologous expression of secondary metabolite gene clusters

    PubMed Central

    Gomez‐Escribano, Juan Pablo; Bibb, Mervyn J.

    2011-01-01

    Summary: We have constructed derivatives of Streptomyces coelicolor M145 as hosts for the heterologous expression of secondary metabolite gene clusters. To remove potentially competing sinks of carbon and nitrogen, and to provide a host devoid of antibiotic activity, we deleted four endogenous secondary metabolite gene clusters from S. coelicolor M145: those for actinorhodin, prodiginine, CPK and CDA biosynthesis. We then introduced point mutations into rpoB and rpsL to pleiotropically increase the level of secondary metabolite production. Introduction of the native actinorhodin gene cluster, and of gene clusters for the heterologous production of chloramphenicol and congocidine, revealed dramatic increases in antibiotic production compared with the parental strain. In addition to lacking antibacterial activity, the engineered strains possess relatively simple extracellular metabolite profiles. We believe that, combined with liquid chromatography and mass spectrometry, these genetically engineered strains will markedly facilitate the discovery of new compounds by heterologous expression of cloned gene clusters, particularly the numerous cryptic secondary metabolic gene clusters that are prevalent in actinomycete genome sequences. PMID:21342466

  8. Comparative phylogenomic analyses of teleost fish Hox gene clusters: lessons from the cichlid fish Astatotilapia burtoni

    PubMed Central

    Hoegg, Simone; Boore, Jeffrey L; Kuehl, Jennifer V; Meyer, Axel

    2007-01-01

    Background: Teleost fish have seven paralogous clusters of Hox genes, stemming from two complete genome duplications early in vertebrate evolution and an additional genome duplication during the evolution of ray-finned fish, followed by the secondary loss of one cluster. Gene duplication, on the one hand, and the evolution of regulatory sequences, on the other, are thought to be among the most important mechanisms for the evolution of new gene functions. Cichlid fish, the largest family of vertebrates with about 2500 species, are famous examples of speciation and morphological diversity. Since this diversity could be based on regulatory changes, we chose to study both the coding and the putative regulatory regions of their Hox clusters within a comparative genomic framework. Results: We sequenced and characterized all seven Hox clusters of Astatotilapia burtoni, a haplochromine cichlid fish, and performed comparative analyses with data from other teleost fish such as zebrafish, two pufferfish species, stickleback, and medaka. We traced losses of Hox cluster genes and microRNAs; the medaka lineage seems to have lost more microRNAs than the other fish lineages. We found that each teleost genome studied so far has a unique set of Hox genes. The hoxb7a gene was lost independently several times during teleost evolution, most recently within the radiation of East African cichlid fish. Conserved non-coding sequences (CNS) make up a surprisingly large part of the clusters, especially the HoxAa, HoxCa, and HoxDa clusters. Across all clusters, we observe a trend towards increased CNS content towards the anterior end. Conclusion: The gene content of Hox clusters in teleost fishes is more variable than expected, with each species studied so far having a different set. Although the highest loss rate of Hox genes occurred immediately after whole genome duplications, our analyses showed that gene loss continued and is still ongoing in all teleost

  9. Lampreys, the jawless vertebrates, contain only two ParaHox gene clusters.

    PubMed

    Zhang, Huixian; Ravi, Vydianathan; Tay, Boon-Hui; Tohari, Sumanty; Pillai, Nisha E; Prasad, Aravind; Lin, Qiang; Brenner, Sydney; Venkatesh, Byrappa

    2017-08-22

    ParaHox genes (Gsx, Pdx, and Cdx) are an ancient family of developmental genes closely related to the Hox genes. They play critical roles in the patterning of brain and gut. The basal chordate, amphioxus, contains a single ParaHox cluster comprising one member of each family, whereas nonteleost jawed vertebrates contain four ParaHox genomic loci with six or seven ParaHox genes. Teleosts, which have experienced an additional whole-genome duplication, contain six ParaHox genomic loci with six ParaHox genes. Jawless vertebrates, represented by lampreys and hagfish, are the most ancient group of vertebrates and are crucial for understanding the origin and evolution of vertebrate gene families. We have previously shown that lampreys contain six Hox gene loci. Here we report that lampreys contain only two ParaHox gene clusters (designated as α- and β-clusters) bearing five ParaHox genes (Gsxα, Pdxα, Cdxα, Gsxβ, and Cdxβ). The order and orientation of the three genes in the α-cluster are identical to that of the single cluster in amphioxus. However, the orientation of Gsxβ in the β-cluster is inverted. Interestingly, Gsxβ is expressed in the eye, unlike its homologs in jawed vertebrates, which are expressed mainly in the brain. The lamprey Pdxα is expressed in the pancreas similar to jawed vertebrate Pdx genes, indicating that the pancreatic expression of Pdx was acquired before the divergence of jawless and jawed vertebrate lineages. It is likely that the lamprey Pdxα plays a crucial role in pancreas specification and insulin production similar to the Pdx of jawed vertebrates.

  10. Cytokine Gene Polymorphisms Associated With Symptom Clusters in Oncology Patients Undergoing Radiation Therapy.

    PubMed

    Miaskowski, Christine; Conley, Yvette P; Mastick, Judy; Paul, Steven M; Cooper, Bruce A; Levine, Jon D; Knisely, Mitchell; Kober, Kord M

    2017-09-01

    Most reviews on the biological basis of symptom clusters suggest that inflammatory processes are involved in the development and maintenance of these clusters. However, no studies have evaluated associations between genetic polymorphisms and common symptom clusters (e.g., mood disturbance, sickness behavior). This study examined the associations between cytokine gene polymorphisms and the severity of three distinct symptom clusters (i.e., mood-cognitive, sickness-behavior, treatment-related) in a sample of patients with breast and prostate cancer (n = 157) at the completion of radiation therapy. Symptom severity was assessed using the Memorial Symptom Assessment Scale. Symptom clusters were created using exploratory factor analysis. The associations between cytokine gene polymorphisms and the symptom cluster severity scores were evaluated using regression analyses. Polymorphisms in C-X-C motif chemokine ligand 8 (CXCL8), interleukin 13 (IL13), and nuclear factor kappa B subunit 2 (NFKB2) were associated with severity scores for the mood-cognitive symptom cluster. In addition to a polymorphism in interferon gamma (IFNG1), the same NFKB2 polymorphism (rs1056890) that was associated with the mood-cognitive symptom cluster score was associated with the sickness-behavior symptom cluster. Polymorphisms in interleukin 1 receptor 1 (IL1R1), IL6, and NFKB1 were associated with severity factor scores for the treatment-related symptom cluster. Our findings support the hypotheses that symptoms that cluster together have a common underlying mechanism and that the most common symptom clusters in oncology patients are associated with polymorphisms in genes involved in a variety of inflammatory processes. Copyright © 2017 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.

  11. Orthologous Gene Clusters and Taxon Signature Genes for Viruses of Prokaryotes

    PubMed Central

    Kristensen, David M.; Waller, Alison S.; Yamada, Takuji; Bork, Peer; Mushegian, Arcady R.

    2013-01-01

    Viruses are the most abundant biological entities on earth and encompass a vast amount of genetic diversity. The recent rapid increase in the number of sequenced viral genomes has created unprecedented opportunities for gaining new insight into the structure and evolution of the virosphere. Here, we present an update of the phage orthologous groups (POGs), a collection of 4,542 clusters of orthologous genes from bacteriophages that now also includes viruses infecting archaea and encompasses more than 1,000 distinct virus genomes. Analysis of this expanded data set shows that the number of POGs keeps growing without saturation and that a substantial majority of the POGs remain specific to viruses, lacking homologues in prokaryotic cells outside known proviruses. Thus, the great majority of virus genes apparently remain to be discovered. A complementary observation is that numerous viral genomes remain poorly covered by POGs, if at all. The genome coverage by POGs is expected to increase as more genomes are sequenced. Taxon-specific, single-copy signature genes that are not observed in prokaryotic genomes outside detected proviruses were identified for two-thirds of the 57 taxa (those with genomes available from at least 3 distinct viruses), with half of these present in all members of the respective taxon. These signatures can be used to specifically identify the presence and quantify the abundance of viruses from particular taxa in metagenomic samples and thus gain new insights into the ecology and evolution of viruses in relation to their hosts. PMID:23222723

  12. Two Horizontally Transferred Xenobiotic Resistance Gene Clusters Associated with Detoxification of Benzoxazolinones by Fusarium Species

    PubMed Central

    Glenn, Anthony E.; Davis, C. Britton; Gao, Minglu; Gold, Scott E.; Mitchell, Trevor R.; Proctor, Robert H.; Stewart, Jane E.; Snook, Maurice E.

    2016-01-01

    Microbes encounter a broad spectrum of antimicrobial compounds in their environments and often possess metabolic strategies to detoxify such xenobiotics. We have previously shown that Fusarium verticillioides, a fungal pathogen of maize known for its production of fumonisin mycotoxins, possesses two unlinked loci, FDB1 and FDB2, necessary for detoxification of antimicrobial compounds produced by maize, including the γ-lactam 2-benzoxazolinone (BOA). In support of these earlier studies, microarray analysis of F. verticillioides exposed to BOA identified the induction of multiple genes at FDB1 and FDB2, indicating the loci consist of gene clusters. One of the FDB1 cluster genes encoded a protein having domain homology to the metallo-β-lactamase (MBL) superfamily. Deletion of this gene (MBL1) rendered F. verticillioides incapable of metabolizing BOA and thus unable to grow on BOA-amended media. Deletion of other FDB1 cluster genes, in particular AMD1 and DLH1, did not affect BOA degradation. Phylogenetic analyses and topology testing of the FDB1 and FDB2 cluster genes suggested two horizontal transfer events among fungi, one being transfer of FDB1 from Fusarium to Colletotrichum, and the second being transfer of the FDB2 cluster from Fusarium to Aspergillus. Together, the results suggest that plant-derived xenobiotics have exerted evolutionary pressure on these fungi, leading to horizontal transfer of genes that enhance fitness or virulence. PMID:26808652

  13. Two Horizontally Transferred Xenobiotic Resistance Gene Clusters Associated with Detoxification of Benzoxazolinones by Fusarium Species.

    PubMed

    Glenn, Anthony E; Davis, C Britton; Gao, Minglu; Gold, Scott E; Mitchell, Trevor R; Proctor, Robert H; Stewart, Jane E; Snook, Maurice E

    2016-01-01

    Microbes encounter a broad spectrum of antimicrobial compounds in their environments and often possess metabolic strategies to detoxify such xenobiotics. We have previously shown that Fusarium verticillioides, a fungal pathogen of maize known for its production of fumonisin mycotoxins, possesses two unlinked loci, FDB1 and FDB2, necessary for detoxification of antimicrobial compounds produced by maize, including the γ-lactam 2-benzoxazolinone (BOA). In support of these earlier studies, microarray analysis of F. verticillioides exposed to BOA identified the induction of multiple genes at FDB1 and FDB2, indicating the loci consist of gene clusters. One of the FDB1 cluster genes encoded a protein having domain homology to the metallo-β-lactamase (MBL) superfamily. Deletion of this gene (MBL1) rendered F. verticillioides incapable of metabolizing BOA and thus unable to grow on BOA-amended media. Deletion of other FDB1 cluster genes, in particular AMD1 and DLH1, did not affect BOA degradation. Phylogenetic analyses and topology testing of the FDB1 and FDB2 cluster genes suggested two horizontal transfer events among fungi, one being transfer of FDB1 from Fusarium to Colletotrichum, and the second being transfer of the FDB2 cluster from Fusarium to Aspergillus. Together, the results suggest that plant-derived xenobiotics have exerted evolutionary pressure on these fungi, leading to horizontal transfer of genes that enhance fitness or virulence.

  14. β-globin gene cluster haplotypes in ethnic minority populations of southwest China

    PubMed Central

    Sun, Hao; Liu, Hongxian; Huang, Kai; Lin, Keqin; Huang, Xiaoqin; Chu, Jiayou; Ma, Shaohui; Yang, Zhaoqing

    2017-01-01

    The genetic diversity and relationships among ethnic minority populations of southwest China were investigated using seven polymorphic restriction enzyme sites in the β-globin gene cluster. The haplotypes of 1392 chromosomes from ten ethnic populations living in southwest China were determined. Linkage equilibrium and a recombination hotspot were found between the 5′ sites and the 3′ sites of the β-globin gene cluster. The 5′ haplotypes 2 (+−−−), 6 (−++−+), and 9 (−++++) and the 3′ haplotype FW3 (−+) were predominant. Notably, the frequency of haplotype 9 was significantly high in the southwest populations, distinguishing them from other Chinese populations. Interpopulation differentiation among southwest Chinese minority populations is lower than that among populations of northern China and other continents. Phylogenetic analysis shows that populations sharing the same ethnic origin or language clustered together, indicating that current β-globin cluster diversity in Chinese populations largely reflects their ethnic origin and linguistic affiliations. This study characterizes β-globin gene cluster haplotypes in southwest Chinese minorities for the first time and reveals the genetic variability and affinity of these populations using β-globin cluster haplotype frequencies. The results suggest that ethnic origin plays an important role in shaping variation in the β-globin gene cluster in the southwestern ethnic populations of China. PMID:28205625

  15. Molecular analysis of the cercosporin biosynthetic gene cluster in Cercospora nicotianae.

    PubMed

    Chen, Huiqin; Lee, Miin-Huey; Daub, Margret E; Chung, Kuang-Ren

    2007-05-01

    We describe a core gene cluster, comprising eight genes (designated CTB1-8), that is associated with cercosporin toxin production in Cercospora nicotianae. Sequence analysis identified 10 putative open reading frames (ORFs) flanking the previously characterized CTB1 and CTB3 genes, which encode, respectively, the polyketide synthase and a dual methyltransferase/monooxygenase required for cercosporin production. Expression of eight of the genes was coordinately induced under cercosporin-producing conditions and was regulated by the Zn(II)Cys(6) transcriptional activator CTB8. Expression of the genes, which was affected by nitrogen and carbon sources and pH, was also controlled by another transcriptional activator, CRG1, previously shown to regulate cercosporin production and resistance. Disruption of the CTB2 gene (encoding a methyltransferase) or of the CTB8 gene yielded mutants that were completely defective in cercosporin production and in which expression of the other CTB cluster genes was inhibited. Similar 'feedback' transcriptional inhibition was observed when CTB1 or CTB3, but not CTB4, was inactivated. Expression of four ORFs located at the two distal ends of the cluster did not correlate with cercosporin biosynthesis and was not regulated by CTB8, suggesting that the biosynthetic cluster is limited to CTB1-8. A biosynthetic pathway and a regulatory network leading to cercosporin formation are proposed.

  16. Revealing gene clusters associated with the development of cholangiocarcinoma, based on a time series analysis.

    PubMed

    Wu, Jianyu; Xiao, Zhifu; Zhao, Xiulei; Wu, Xiangsong

    2015-05-01

    Cholangiocarcinoma (CC) is a rapidly lethal malignancy that is currently considered incurable, and biomarkers related to its development remain unclear. The present study aimed to identify differentially expressed genes (DEGs) between normal tissue and intrahepatic CC, as well as specific gene expression patterns that change with the development of CC. A two-way analysis of variance test was used to identify biomarkers that distinguish normal tissue from intrahepatic CC dissected on different days. A k-means cluster method was used to identify gene clusters associated with the development of CC according to their changing expression patterns, and functional enrichment analysis was used to infer the function of each gene set. A time series analysis based on changes in the gene expression profiles was constructed to reveal gene signatures associated with the development of CC. Genes related to CC were shown to be involved in 'mitochondrion' and 'focal adhesion'. Three gene groups of interest were identified by the k-means cluster method, and gene clusters with a unique expression pattern were related to the development of CC. These data may facilitate further work on the genetics of CC.
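
    The k-means step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each gene is represented by a vector of expression values over the sampled time points, z-scores each profile so that genes group by the shape of their change rather than by absolute level, and initializes centroids from the first k profiles. The names `zscore`, `sq_dist` and `kmeans` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def zscore(profile: Sequence[float]) -> List[float]:
    """Standardize one gene's time series to mean 0 and (population) SD 1."""
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in profile) / n)
    if sd == 0:
        return [0.0] * n  # flat profile: no change pattern to compare
    return [(x - mean) / sd for x in profile]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles: List[Sequence[float]], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    data = [zscore(p) for p in profiles]
    centroids = [list(p) for p in data[:k]]  # deterministic init (an assumption)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each gene joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in data]
        if new_labels == labels:
            break  # assignments unchanged, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member profiles.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Two rising and two falling toy profiles (illustrative values, not study data).
labels, _ = kmeans([[1, 2, 3, 4], [4, 3, 2, 1], [2, 3, 4, 5], [5, 4, 3, 2]], k=2)
print(labels)  # → [0, 1, 0, 1]
```

    Z-scoring before clustering is a design choice made here so that a gene rising from 1 to 4 and one rising from 2 to 5 share a pattern; clustering raw values would instead group genes partly by expression level.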

  17. The Genome of Tolypocladium inflatum: Evolution, Organization, and Expression of the Cyclosporin Biosynthetic Gene Cluster

    PubMed Central

    Bushley, Kathryn E.; Raja, Rajani; Jaiswal, Pankaj; Cumbie, Jason S.; Nonogaki, Mariko; Boyd, Alexander E.; Owensby, C. Alisha; Knaus, Brian J.; Elser, Justin; Miller, Daniel; Di, Yanming; McPhail, Kerry L.; Spatafora, Joseph W.

    2013-01-01

    The ascomycete fungus Tolypocladium inflatum, a pathogen of beetle larvae, is best known as the producer of the immunosuppressant drug cyclosporin. The draft genome of T. inflatum strain NRRL 8044 (ATCC 34921), the isolate from which cyclosporin was first isolated, is presented along with comparative analyses of the biosynthesis of cyclosporin and other secondary metabolites in T. inflatum and related taxa. Phylogenomic analyses reveal previously undetected and complex patterns of homology between the nonribosomal peptide synthetase (NRPS) that encodes cyclosporin synthetase (simA) and those of other secondary metabolites with activities against insects (e.g., beauvericin, destruxins), and demonstrate the roles of module duplication and gene fusion in the diversification of NRPSs. The secondary metabolite gene cluster responsible for cyclosporin biosynthesis is described. In addition to the genes necessary for cyclosporin biosynthesis, it harbors a gene for a cyclophilin, a member of a family of immunophilins known to bind cyclosporin. Comparative analyses support a lineage-specific origin of the cyclosporin gene cluster rather than horizontal gene transfer from bacteria or other fungi. RNA-Seq transcriptome analyses in a cyclosporin-inducing medium delineate the boundaries of the cyclosporin cluster and reveal high expression of the cluster's cyclophilin gene. In medium containing insect hemolymph, weaker but significant upregulation of several genes within the cyclosporin cluster, including the highly expressed cyclophilin gene, was observed. T. inflatum also represents the first reference draft genome of Ophiocordycipitaceae, a third family of insect-pathogenic fungi within the fungal order Hypocreales, and supports parallel and qualitatively distinct radiations of insect pathogens. The T. inflatum genome provides additional insight into the evolution and biosynthesis of cyclosporin and lays a foundation for further investigations of the role

  18. The genome of tolypocladium inflatum: evolution, organization, and expression of the cyclosporin biosynthetic gene cluster.

    PubMed

    Bushley, Kathryn E; Raja, Rajani; Jaiswal, Pankaj; Cumbie, Jason S; Nonogaki, Mariko; Boyd, Alexander E; Owensby, C Alisha; Knaus, Brian J; Elser, Justin; Miller, Daniel; Di, Yanming; McPhail, Kerry L; Spatafora, Joseph W

    2013-06-01

    The ascomycete fungus Tolypocladium inflatum, a pathogen of beetle larvae, is best known as the producer of the immunosuppressant drug cyclosporin. The draft genome of T. inflatum strain NRRL 8044 (ATCC 34921), the isolate from which cyclosporin was first isolated, is presented along with comparative analyses of the biosynthesis of cyclosporin and other secondary metabolites in T. inflatum and related taxa. Phylogenomic analyses reveal previously undetected and complex patterns of homology between the nonribosomal peptide synthetase (NRPS) that encodes cyclosporin synthetase (simA) and those of other secondary metabolites with activities against insects (e.g., beauvericin, destruxins), and demonstrate the roles of module duplication and gene fusion in the diversification of NRPSs. The secondary metabolite gene cluster responsible for cyclosporin biosynthesis is described. In addition to the genes necessary for cyclosporin biosynthesis, it harbors a gene for a cyclophilin, a member of a family of immunophilins known to bind cyclosporin. Comparative analyses support a lineage-specific origin of the cyclosporin gene cluster rather than horizontal gene transfer from bacteria or other fungi. RNA-Seq transcriptome analyses in a cyclosporin-inducing medium delineate the boundaries of the cyclosporin cluster and reveal high expression of the cluster's cyclophilin gene. In medium containing insect hemolymph, weaker but significant upregulation of several genes within the cyclosporin cluster, including the highly expressed cyclophilin gene, was observed. T. inflatum also represents the first reference draft genome of Ophiocordycipitaceae, a third family of insect-pathogenic fungi within the fungal order Hypocreales, and supports parallel and qualitatively distinct radiations of insect pathogens. The T. inflatum genome provides additional insight into the evolution and biosynthesis of cyclosporin and lays a foundation for further investigations of the role

  19. Automatic summarization of mouse gene information by clustering and sentence extraction from MEDLINE abstracts.

    PubMed

    Yang, Jianji; Cohen, Aaron M; Hersh, William

    2007-10-11

    Tools to automatically summarize gene information from the literature have the potential to help genomics researchers better interpret gene expression data and investigate biological pathways. Finding information on sets of genes is a common task for genomics researchers, and PubMed is still the first choice because the most recent and original information can only be found in the unstructured, free-text biomedical literature. However, manually searching and scanning the literature for information on a set of genes is a time-consuming and daunting task. We built and evaluated a query-based automatic summarizer of information on mouse genes studied in microarray experiments. The system clusters a set of genes by MeSH, GO and free-text features and presents a summary for each gene as ranked sentences extracted from MEDLINE abstracts. Evaluation suggested that the system provides meaningful clusters and ranks informative sentences higher.

  20. Automatic Summarization of Mouse Gene Information by Clustering and Sentence Extraction from MEDLINE Abstracts

    PubMed Central

    Yang, Jianji; Cohen, Aaron M.; Hersh, William

    2007-01-01

    Tools to automatically summarize gene information from the literature have the potential to help genomics researchers better interpret gene expression data and investigate biological pathways. Finding information on sets of genes is a common task for genomics researchers, and PubMed is still the first choice because the most recent and original information can only be found in the unstructured, free-text biomedical literature. However, manually searching and scanning the literature for information on a set of genes is a time-consuming and daunting task. We built and evaluated a query-based automatic summarizer of information on mouse genes studied in microarray experiments. The system clusters a set of genes by MeSH, GO and free-text features and presents a summary for each gene as ranked sentences extracted from MEDLINE abstracts. Evaluation suggested that the system provides meaningful clusters and ranks informative sentences higher. PMID:18693953

  1. Beta-lactam antibiotic biosynthetic genes have been conserved in clusters in prokaryotes and eukaryotes.

    PubMed Central

    Smith, D J; Burnham, M K; Bull, J H; Hodgson, J E; Ward, J M; Browne, P; Brown, J; Barton, B; Earl, A J; Turner, G

    1990-01-01

    A cosmid clone containing closely linked beta-lactam antibiotic biosynthetic genes was isolated from a gene library of Flavobacterium sp. SC 12,154. The location within the cluster of the DNA thought to contain the gene for delta-(L-alpha-aminoadipyl)-L-cysteinyl-D-valine synthetase (ACVS), the first step in the beta-lactam antibiotic biosynthetic pathway, was identified by a novel method. This DNA facilitated the isolation, by cross-hybridization, of the corresponding DNA from Streptomyces clavuligerus ATCC 27064, Penicillium chrysogenum Oli13 and Aspergillus nidulans R153. Evidence was obtained which confirmed that the cross-hybridizing sequences contained the ACVS gene. In each case the ACVS gene was found to be closely linked to other beta-lactam biosynthetic genes and constituted part of a gene cluster. PMID:2107074

  2. Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters

    PubMed Central

    Cimermancic, Peter; Medema, Marnix H.; Claesen, Jan; Kurita, Kenji; Wieland Brown, Laura C.; Mavrommatis, Konstantinos; Pati, Amrita; Godfrey, Paul A.; Koehrsen, Michael; Clardy, Jon; Birren, Bruce W.; Takano, Eriko; Sali, Andrej; Linington, Roger G.; Fischbach, Michael A.

    2014-01-01

    Summary Although biosynthetic gene clusters (BGCs) have been discovered for hundreds of bacterial metabolites, our knowledge of their diversity remains limited. Here, we used a novel algorithm to systematically identify BGCs in the extensive extant microbial sequencing data. Network analysis of the predicted BGCs revealed large gene cluster families, the vast majority uncharacterized. We experimentally characterized the most prominent family, consisting of two subfamilies of hundreds of BGCs distributed throughout the Proteobacteria; their products are aryl polyenes, lipids with an aryl head group conjugated to a polyene tail. We identified a distant relationship to a third subfamily of aryl polyene BGCs, and together the three subfamilies represent the largest known family of biosynthetic gene clusters, with more than 1,000 members. Although these clusters are widely divergent in sequence, their small molecule products are remarkably conserved, indicating for the first time the important roles these compounds play in Gram-negative cell biology. PMID:25036635

  3. Hox genes of the Japanese eel Anguilla japonica and Hox cluster evolution in teleosts.

    PubMed

    Guo, Baocheng; Gan, Xiaoni; He, Shunping

    2010-03-15

    Compared with other diploid teleosts (2n=48), anguilloid fish have a specialized karyotype (2n=38) and remarkable morphological variation, and they represent a basal group of teleosts. To investigate the Hox gene and cluster inventory in basal teleosts, a PCR-based survey of Hox genes in the Japanese eel (Anguilla japonica) was conducted with both gene-specific and homeobox-targeted degenerate primers. Our data provide evidence that at least 34 distinct Hox genes exist in the Japanese eel genome and that they represent eight Hox clusters. Duplication of Hox genes in the Japanese eel appears to result from the fish-specific genome duplication (FSGD) event, which the Japanese eel shares with other teleosts such as zebrafish and pufferfish. A member of Hox paralog group 1 (HoxA1b) was retained in the Japanese eel but lost in other teleosts. Available Hox data reveal that Hox clusters evolved differently in different teleost lineages. All duplicated Hox clusters were retained after the FSGD event in basal teleosts such as the Japanese eel, whereas crown teleosts lost one cluster (HoxCb or HoxDb). Based on current teleost phylogeny, the HoxDb cluster was lost independently in the teleost lineages Otocephala and Euteleostei.

  4. Cloning and engineering of the cinnamycin biosynthetic gene cluster from Streptomyces cinnamoneus cinnamoneus DSM 40005

    PubMed Central

    Widdick, D. A.; Dodd, H. M.; Barraille, P.; White, J.; Stein, T. H.; Chater, K. F.; Gasson, M. J.; Bibb, M. J.

    2003-01-01

    Lantibiotics are ribosomally synthesized oligopeptide antibiotics that contain lanthionine bridges derived by the posttranslational modification of amino acid residues. Here, we describe the cinnamycin biosynthetic gene cluster (cin) from Streptomyces cinnamoneus cinnamoneus DSM 40005, to our knowledge the first lantibiotic gene cluster from a high-G+C bacterium to be cloned and sequenced. The cin cluster contains many genes not found in lantibiotic clusters from low-G+C Gram-positive bacteria, including a Streptomyces antibiotic regulatory protein regulatory gene, and lacks others found in such clusters, such as a LanT-type transporter and a LanP-type protease. Transfer of the cin cluster to Streptomyces lividans resulted in heterologous production of cinnamycin. Furthermore, modification of the cinnamycin structural gene (cinA) led to production of two naturally occurring lantibiotics closely resembling cinnamycin, duramycin and duramycin B, whereas attempts to make a more widely diverged derivative, duramycin C, failed to generate biologically active material. These results provide a basis for future attempts to construct extensive libraries of cinnamycin variants. PMID:12642677

  5. Selfish Operons: Horizontal Transfer May Drive the Evolution of Gene Clusters

    PubMed Central

    Lawrence, J. G.; Roth, J. R.

    1996-01-01

    A model is presented whereby the formation of gene clusters in bacteria is mediated by transfer of DNA within and among taxa. Bacterial operons are typically composed of genes whose products contribute to a single function. If this function is subject to weak selection or to long periods with no selection, the contributing genes may accumulate mutations and be lost by genetic drift. From a cell's perspective, once several genes are lost, the function can be restored only if all the missing genes are acquired simultaneously by lateral transfer. The probability of transferring multiple genes together increases when the genes are physically proximate. From a gene's perspective, horizontal transfer provides a way to escape evolutionary loss by allowing colonization of organisms lacking the encoded functions. Since organisms bearing clustered genes are more likely to act as successful donors, clustered genes would spread among bacterial genomes. The physical proximity of genes may be considered a selfish property of the operon, since it affects the probability of successful horizontal transfer but may provide no physiological benefit to the host. This process predicts a mosaic structure of modern genomes in which ancestral chromosomal material is interspersed with novel, horizontally transferred operons providing peripheral metabolic functions. PMID:8844169

  6. Shared Gene Structures and Clusters of Mutually Exclusive Spliced Exons within the Metazoan Muscle Myosin Heavy Chain Genes

    PubMed Central

    Kollmar, Martin; Hatje, Klas

    2014-01-01

    Multicellular animals possess two to three different types of muscle tissue. Striated muscles share considerable ultrastructural similarity and contain a core set of proteins, including the muscle myosin heavy chain (Mhc) protein. The ATPase activity of this myosin motor protein largely dictates muscle performance at the molecular level. Two different solutions for adjusting myosin properties to different muscle subtypes have been identified so far: vertebrates and nematodes contain many independent, differentially expressed Mhc genes, while arthropods have single Mhc genes with clusters of mutually exclusive spliced exons (MXEs). The availability of hundreds of metazoan genomes now allowed us to study whether the ancient bilateria already contained MXEs, how MXE complexity subsequently evolved, and whether additional scenarios for controlling contractile properties in different muscles could be proposed. By reconstructing the Mhc genes from 116 metazoans, we showed that all intron positions within the motor domain coding regions are conserved in all bilateria analysed. The last common ancestor of the bilateria already contained a cluster of MXEs coding for part of the loop-2 actin-binding sequence. Subsequently, the protostomes and later the arthropods gained many further clusters, while MXEs were completely and independently lost in several branches (vertebrates and nematodes) and species (for example the annelid Helobdella robusta and the salmon louse Lepeophtheirus salmonis). Several bilateria have been found to encode multiple Mhc genes that might all, or in part, contain clusters of MXEs. Notable examples are a cluster of six tandemly arrayed Mhc genes, two of which contain MXEs, in the owl limpet Lottia gigantea, and four Mhc genes, three of which encode MXEs, in the predatory mite Metaseiulus occidentalis. Our analysis showed that similar solutions for providing different myosin isoforms (multiple genes, clusters of MXEs, or both) have been developed independently several times

  7. Genomics-driven discovery of the pneumocandin biosynthetic gene cluster in the fungus Glarea lozoyensis

    PubMed Central

    2013-01-01

    Background The antifungal therapy caspofungin is a semi-synthetic derivative of pneumocandin B0, a lipohexapeptide produced by the fungus Glarea lozoyensis, and was the first member of the echinocandin class approved for human therapy. The nonribosomal peptide synthetase (NRPS)-polyketide synthase (PKS) gene cluster responsible for pneumocandin biosynthesis in G. lozoyensis has not been elucidated to date. In this study, we report the elucidation of the pneumocandin biosynthetic gene cluster by whole genome sequencing of the G. lozoyensis wild-type strain ATCC 20868. Results The pneumocandin biosynthetic gene cluster contains an NRPS (GLNRPS4) and a PKS (GLPKS4) arranged in tandem, two cytochrome P450 monooxygenases, seven other modifying enzymes, and genes for the biosynthesis of L-homotyrosine, a component of the peptide core. Thus, the pneumocandin biosynthetic gene cluster is significantly more autonomous and organized than that of the recently characterized echinocandin B gene cluster. Disruption mutants of GLNRPS4 and GLPKS4 no longer produced the pneumocandins (A0 and B0), and the Δglnrps4 and Δglpks4 mutants lost antifungal activity against the human pathogenic fungus Candida albicans. In addition to pneumocandins, the G. lozoyensis genome encodes a rich repertoire of natural product-encoding genes, including 24 PKSs, six NRPSs, five PKS-NRPS hybrids, two dimethylallyl tryptophan synthases, and 14 terpene synthases. Conclusions Characterization of the gene cluster provides a blueprint for engineering new pneumocandin derivatives with improved pharmacological properties. Whole genome estimation of the secondary metabolite-encoding genes from G. lozoyensis provides yet another example of the huge potential for drug discovery from natural products of the fungal kingdom. PMID:23688303

  8. Phylogenomics of the benzoxazinoid biosynthetic pathway of Poaceae: gene duplications and origin of the Bx cluster

    PubMed Central

    2012-01-01

    Background The benzoxazinoids 2,4-dihydroxy-1,4-benzoxazin-3-one (DIBOA) and 2,4-dihydroxy-7-methoxy-1,4-benzoxazin-3-one (DIMBOA) are key defense compounds present in major agricultural crops such as maize and wheat. Their biosynthesis involves nine enzymes thought to form a linear pathway leading to the storage of DI(M)BOA as glucoside conjugates. Seven of the genes (Bx1-Bx6 and Bx8) form a cluster at the tip of the short arm of maize chromosome 4 that includes four P450 genes (Bx2-5) belonging to the same CYP71C subfamily. The origin of this cluster is unknown. Results We show that the pathway appeared following several duplications of the TSA gene (α-subunit of tryptophan synthase) and of a Bx2-like ancestral CYP71C gene, and the recruitment of Bx8, before the radiation of Poaceae. The origins of Bx6 and Bx7 remain unclear. We demonstrate that the Bx2-like CYP71C ancestor was not committed to the benzoxazinoid pathway and that, after duplication, the Bx2-Bx5 genes were under positive selection at a few sites and underwent functional divergence, leading to the current specific biochemical properties of the enzymes. The absence of synteny between available Poaceae genomes in the Bx gene regions contrasts with the conserved synteny in the TSA gene region. Conclusions These results demonstrate that rearrangements following duplications of an IGL/TSA gene and of a CYP71C gene probably resulted in the clustering of the new copies (Bx1 and Bx2) at the tip of a chromosome in an ancestor of grasses. Clustering favored cosegregation, and the location at the chromosome tip favored gene rearrangements that allowed further genes to be recruited to the pathway. These events, a founding event and elongation events, may have been the key to the subsequent evolution of the benzoxazinoid biosynthetic cluster. PMID:22577841

  9. Phylogenomics of the benzoxazinoid biosynthetic pathway of Poaceae: gene duplications and origin of the Bx cluster.

    PubMed

    Dutartre, Leslie; Hilliou, Frédérique; Feyereisen, René

    2012-05-11

    The benzoxazinoids 2,4-dihydroxy-1,4-benzoxazin-3-one (DIBOA) and 2,4-dihydroxy-7-methoxy-1,4-benzoxazin-3-one (DIMBOA) are key defense compounds present in major agricultural crops such as maize and wheat. Their biosynthesis involves nine enzymes thought to form a linear pathway leading to the storage of DI(M)BOA as glucoside conjugates. Seven of the genes (Bx1-Bx6 and Bx8) form a cluster at the tip of the short arm of maize chromosome 4 that includes four P450 genes (Bx2-5) belonging to the same CYP71C subfamily. The origin of this cluster is unknown. We show that the pathway appeared following several duplications of the TSA gene (α-subunit of tryptophan synthase) and of a Bx2-like ancestral CYP71C gene, and the recruitment of Bx8, before the radiation of Poaceae. The origins of Bx6 and Bx7 remain unclear. We demonstrate that the Bx2-like CYP71C ancestor was not committed to the benzoxazinoid pathway and that, after duplication, the Bx2-Bx5 genes were under positive selection at a few sites and underwent functional divergence, leading to the current specific biochemical properties of the enzymes. The absence of synteny between available Poaceae genomes in the Bx gene regions contrasts with the conserved synteny in the TSA gene region. These results demonstrate that rearrangements following duplications of an IGL/TSA gene and of a CYP71C gene probably resulted in the clustering of the new copies (Bx1 and Bx2) at the tip of a chromosome in an ancestor of grasses. Clustering favored cosegregation, and the location at the chromosome tip favored gene rearrangements that allowed further genes to be recruited to the pathway. These events, a founding event and elongation events, may have been the key to the subsequent evolution of the benzoxazinoid biosynthetic cluster.

  10. Degeneration of aflatoxin gene cluster in Aspergillus flavus from Africa and North America

    USDA-ARS's Scientific Manuscript database

    Aspergillus flavus is the primary causal agent of food and feed contamination with the toxic fungal metabolites aflatoxins. Aflatoxin-producing potential of A. flavus is known to vary among isolates. The genes involved in aflatoxin biosynthesis are clustered together and the order of genes within th...

  11. Variation in the fumonisin biosynthetic gene cluster in fumonisin-producing and nonproducing black aspergilli

    USDA-ARS's Scientific Manuscript database

    The ability to produce fumonisin mycotoxins varies among members of the black aspergilli. Previously, analyses of selected genes in the fumonisin biosynthetic gene (fum) cluster in black aspergilli from California grapes indicated that fumonisin-nonproducing isolates of Aspergillus welwitschiae lack...

  12. A block mixture model to map eQTLs for gene clustering and networking.

    PubMed

    Wang, Ningtao; Gosik, Kirk; Li, Runze; Lindsay, Bruce; Wu, Rongling

    2016-02-19

    To study how genes function in a cellular and physiological process, a general procedure is to classify gene expression profiles into categories based on their similarity and to reconstruct a regulatory network for functional elements. However, this procedure has not incorporated the genetic mechanisms that underlie the organization of gene clusters and networks, despite much effort to map expression quantitative trait loci (eQTLs) that affect the expression of individual genes. Here we address this issue by developing a computational approach that integrates gene clustering and network reconstruction with genetic mapping in a unified framework. The approach can not only identify specific eQTLs that control how genes are clustered and organized toward biological functions, but also enable investigation of the biological mechanisms that individual eQTLs perturb in a signaling pathway. We applied the new approach to characterize the effects of eQTLs on the structure and organization of gene clusters in Caenorhabditis elegans. This study provides the first characterization, to our knowledge, of the effects of genetic variants on the regulatory network of gene expression. The approach can also facilitate the genetic dissection of other dynamic processes, including development, physiology and disease progression, in any organism.

  13. Molecular analysis of the hrp gene cluster in Xanthomonas oryzae pathovar oryzae KACC10859.

    PubMed

    Cho, Hee-Jung; Park, Young-Jin; Noh, Tae-Hwan; Kim, Yeong-Tae; Kim, Jeong-Gu; Song, Eun-Sung; Lee, Dong-Hee; Lee, Byoung-Moo

    2008-06-01

    Xanthomonas oryzae pathovar oryzae is the causal agent of rice bacterial blight. The plant pathogenic bacterium X. oryzae pv. oryzae expresses a type III secretion system that is necessary both for pathogenicity in susceptible hosts and for induction of the hypersensitive response in resistant plants. This specialized protein transport system is encoded by a 32.18-kb hrp (hypersensitive response and pathogenicity) gene cluster. The hrp gene cluster is composed of nine hrp, nine hrc (hrp conserved) and eight hpa (hrp-associated) genes and is controlled by HrpG and HrpX, known regulators of the hrp gene cluster. Before mutational analysis of these hrp genes, the transcriptional linkages of the core region of the hrp gene cluster, from hpaB to hrcC, in X. oryzae pv. oryzae KACC10859 were determined, and the non-polarity of EZTn5 insertional mutagenesis was demonstrated by reverse transcription polymerase chain reaction. Pathogenicity assays of these non-polar hrp mutants were carried out on the susceptible rice cultivar Milyang-23. In these assays, all hrp-hrc mutants except hrpF, as well as the hpaB mutant, lost pathogenicity, indicating that most hrp-hrc genes encode essential pathogenicity factors. In contrast, most hpa mutants showed decreased virulence in a different pattern, i.e., hpa genes are not essential but are important for pathogenicity.

  14. Fine mapping of disease genes via haplotype clustering.

    PubMed

    Waldron, E R B; Whittaker, J C; Balding, D J

    2006-02-01

    We propose an algorithm for analysing SNP-based population association studies that builds on the one introduced by Molitor et al. [2003: Am J Hum Genet 73:1368-1384]. It uses clustering of haplotypes to overcome the major limitations of many current haplotype-based approaches. We define a between-haplotype score that is simple, yet appears to capture much of the information about the evolutionary relatedness of the haplotypes in the vicinity of an (unobserved) putative causal locus. Haplotype clusters can then be defined via a putative ancestral haplotype and a cut-off distance. The number of an individual's two haplotypes that lie within the cluster predicts the individual's genotype at the causal locus. This predicted genotype can then be investigated for association with the phenotype of interest. We implement our approach within a Markov-chain Monte Carlo algorithm that, in effect, searches over locations and ancestral haplotypes to identify large, case-rich clusters. The algorithm successfully fine-maps a causal mutation in a test analysis using real data and achieves almost 98% accuracy in predicting the genotype at the causal locus. A simulation study indicates that the new algorithm is substantially superior to alternative approaches, and it also allows us to identify situations in which multi-point approaches can substantially improve on single-SNP analyses. Our algorithm runs quickly, and there is scope for extension to a wide range of disease models and genomic scales.
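
    The genotype-prediction step described above can be illustrated with a short sketch. It is not the authors' algorithm: the between-haplotype score here is a hypothetical stand-in (the number of consecutive SNPs, extending outward from the putative locus, at which a haplotype matches the ancestral haplotype), the cut-off is expressed as a minimum score rather than a maximum distance, and the Markov-chain Monte Carlo search over locations and ancestral haplotypes is omitted. Haplotypes are assumed to be 0/1 allele vectors, and `locus` is assumed to be the gap between SNP indices `locus - 1` and `locus`. The names `shared_run` and `predicted_genotype` are hypothetical.

```python
from typing import Sequence, Tuple

Haplotype = Sequence[int]  # alleles coded 0/1 at consecutive SNPs


def shared_run(h: Haplotype, ancestor: Haplotype, locus: int) -> int:
    """Hypothetical similarity score: length of the segment around `locus`
    on which h matches the ancestral haplotype allele for allele."""
    score = 0
    i = locus - 1  # walk left from the putative causal locus
    while i >= 0 and h[i] == ancestor[i]:
        score += 1
        i -= 1
    j = locus  # walk right from the putative causal locus
    while j < len(h) and h[j] == ancestor[j]:
        score += 1
        j += 1
    return score


def predicted_genotype(pair: Tuple[Haplotype, Haplotype], ancestor: Haplotype,
                       locus: int, min_score: int) -> int:
    """Count how many of an individual's two haplotypes fall in the cluster
    (0, 1 or 2); this count serves as the predicted genotype at the locus."""
    return sum(shared_run(h, ancestor, locus) >= min_score for h in pair)


ancestor = [0, 1, 1, 0, 1, 0]
near = [1, 1, 1, 0, 1, 1]  # matches at indices 1-4, so score 4 around locus 3
far = [1, 0, 0, 1, 0, 1]   # differs on both sides of the locus, so score 0
print(predicted_genotype((near, far), ancestor, locus=3, min_score=3))  # → 1
```

    The predicted count (0, 1 or 2) is what would then be tested for association with the phenotype; a case-rich cluster shows up as cases carrying more in-cluster haplotypes than controls.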

  15. Sequence breakpoints in the aflatoxin biosynthesis gene cluster and flanking regions in nonaflatoxigenic Aspergillus flavus isolates.

    PubMed

    Chang, Perng-Kuang; Horn, Bruce W; Dorner, Joe W

    2005-11-01

    Aspergillus flavus populations are genetically diverse. Isolates that produce either, neither, or both aflatoxins and cyclopiazonic acid (CPA) are present in the field. We investigated defects in the aflatoxin gene cluster in 38 nonaflatoxigenic A. flavus isolates collected from the southern United States. PCR assays using aflatoxin-gene-specific primers grouped these isolates into eight deletion patterns (A-H). Patterns C, E, G, and H, which contain 40-kb deletions, were examined for their sequence breakpoints. Pattern C has one breakpoint in the cypA 3' untranslated region (UTR) and another in the verA coding region. Pattern E has a breakpoint in the amdA coding region and another in the ver1 5'UTR. Pattern G contains a deletion identical to the one found in pattern C and has another deletion that extends from the cypA coding region to one end of the chromosome, as suggested by the presence of the telomeric sequence repeats CCCTAATGTTGA. Pattern H has a deletion of the entire aflatoxin gene cluster, from the hexA coding region in the sugar utilization gene cluster to the telomeric region. Thus, deletions in the aflatoxin gene cluster among A. flavus isolates are not rare, and the patterns appear to be diverse. Genetic drift may be a driving force responsible for the loss of the entire aflatoxin gene cluster in nonaflatoxigenic A. flavus isolates when aflatoxins have lost their adaptive value in nature.

  16. Clustering change patterns using Fourier transformation with time-course gene expression data.

    PubMed

    Kim, Jaehee

    2011-01-01

    To understand the behavior of genes, it is important to explore how patterns of gene expression change over time, because biologically related gene groups can share the same change patterns. In this study, the problem of finding similar change patterns is reduced to clustering on derivative Fourier coefficients, with the aim of discovering gene groups that have similar change patterns and share similar biological properties. We developed a statistical model using derivative Fourier coefficients to identify similar change patterns of gene expression, and we used a model-based method to cluster the Fourier series estimates of the derivatives. We applied the model to cluster change patterns in yeast cell-cycle microarray expression data with alpha-factor synchronization. Since the method clusters probabilistically neighboring data, model-based clustering with the proposed model yielded biologically interpretable results. We expect that the proposed Fourier analysis, with suitably chosen smoothing parameters, could serve as a useful tool for classifying genes and interpreting possible biological change patterns.
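
A minimal sketch of the feature-extraction step follows. It assumes each gene's time course is fitted by ordinary least squares with a truncated Fourier series of known period. The function name, the use of `numpy.linalg.lstsq`, and treating the number of harmonics as the smoothing parameter are assumptions, not the paper's exact estimator. Differentiating a_k cos(kwt) + b_k sin(kwt) gives k w b_k cos(kwt) - k w a_k sin(kwt), so the derivative coefficients follow directly from the fitted ones.

```python
import numpy as np


def derivative_fourier_coefficients(t, y, n_harmonics, period):
    """Fit y(t) ~ a0 + sum_k [a_k cos(k w t) + b_k sin(k w t)] by least squares
    and return the Fourier coefficients of the fitted curve's derivative:
    [k w b_k for k = 1..K] (cosine terms) followed by [-k w a_k] (sine terms).
    Needs more than 2K + 1 time points for a well-determined fit."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    omega = 2.0 * np.pi / period
    k = np.arange(1, n_harmonics + 1)
    phase = np.outer(t, k) * omega              # shape (n_times, K)
    X = np.column_stack([np.ones_like(t), np.cos(phase), np.sin(phase)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    a = coef[1:1 + n_harmonics]                 # cosine coefficients a_k
    b = coef[1 + n_harmonics:]                  # sine coefficients b_k
    # d/dt [a cos(kwt) + b sin(kwt)] = k w b cos(kwt) - k w a sin(kwt);
    # the constant a0 drops out of the derivative.
    return np.concatenate([k * omega * b, -k * omega * a])
```

Stacking these vectors for all genes gives the matrix to be clustered. The paper uses a model-based method, and a Gaussian mixture model is one possible choice of that kind.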

  17. Cluster headache is associated with the alcohol dehydrogenase 4 (ADH4) gene.

    PubMed

    Rainero, Innocenzo; Rubino, Elisa; Gallone, Salvatore; Fenoglio, Pierpaola; Negro, Elisa; De Martino, Paola; Savi, Lidia; Pinessi, Lorenzo

    2010-01-01

    Alcohol is a well-known trigger of cluster headache attacks during the active phases of the disease. The alcohol dehydrogenase (ADH) pathway, which converts alcohol to the toxic substance acetaldehyde, is responsible for most alcohol breakdown in the liver. Humans have 7 ADH genes, tightly clustered on chromosome 4q21-q25, that encode different ADH isoforms. The ADH4 gene encodes the class II ADH4 pi subunit, which, in addition to alcohol, contributes to the metabolism of a wide variety of substrates, including retinol, other aliphatic alcohols, hydroxysteroids, and biogenic amines. The purpose of this study was to investigate the association of genetic variants within the ADH4 gene with cluster headache susceptibility and phenotype. A total of 110 consecutive unrelated cluster headache patients and 203 age- and sex-matched healthy controls of Caucasian origin were enrolled. Patients and controls were genotyped for 2 bi-allelic single nucleotide polymorphisms (SNPs) of the ADH4 gene: SNP1 (rs1800759) and SNP2 (rs1126671). Allele, genotype, and haplotype frequencies of the examined polymorphisms were compared between cases and controls. Genotype frequencies of the rs1126671 polymorphism differed significantly between cluster headache patients and controls (chi(2) = 10.269, P = .006). Carriage of the AA genotype, compared with the remaining genotypes, was associated with a significantly increased disease risk (OR = 2.33, 95% CI: 1.25-4.37). Haplotype analysis confirmed the association between the ADH4 gene and the disease. No association was found between the examined polymorphisms and the clinical characteristics of cluster headache. Our data suggest that cluster headache is associated with the ADH4 gene or a linked locus. Additional studies are warranted to elucidate the role of this gene in the etiopathogenesis of the disease.
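
The reported statistics can be checked with a short Python sketch. For a genotype-by-status table with three genotypes and two groups, the chi-square test has 2 degrees of freedom. In that case the upper-tail p-value is exactly exp(-x/2). For x = 10.269 this gives about 0.0059, consistent with the reported P = .006. The odds-ratio function uses the common Woolf (log-scale) interval. That choice is an assumption, since the abstract does not state how the 95% CI was computed. The cell counts used in the checks are hypothetical, not the study's data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table:
    a = cases with the genotype,    b = cases without it,
    c = controls with the genotype, d = controls without it.
    All four cells must be non-zero."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def chi2_pvalue_df2(statistic):
    """Upper-tail p-value of a chi-square statistic with 2 degrees of freedom
    (3 genotypes x 2 groups); for df = 2 the survival function is exp(-x/2)."""
    return math.exp(-statistic / 2)
```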

  18. Characteristics and clustering of human ribosomal protein genes

    PubMed Central

    Ishii, Kyota; Washio, Takanori; Uechi, Tamayo; Yoshihama, Maki; Kenmochi, Naoya; Tomita, Masaru

    2006-01-01

    Background: The ribosome is a central player in the translation system, which in mammals consists of four RNA species and 79 ribosomal proteins (RPs). The control mechanisms of gene expression and the functions of RPs are believed to be identical. Most RP genes have common promoters and were therefore assumed to share a unified mechanism of expression control. Results: We systematically analyzed the homogeneity and heterogeneity of RP genes on the basis of their expression profiles, promoter structures, encoded amino acid compositions, and codon compositions. The results revealed that (1) most RP genes are coordinately expressed at the mRNA level, with higher signals in the spleen, lymph node dissection (LND), and fetal brain; however, 17 genes, including the P protein genes (RPLP0, RPLP1, RPLP2), are expressed in a tissue-specific manner. (2) Most promoters have GC boxes and possible binding sites for nuclear respiratory factor 2, Yin and Yang 1, and/or activator protein 1, but they lack canonical TATA boxes. (3) Analysis of the amino acid composition of the encoded proteins indicated a high lysine and arginine content. (4) The major RP genes exhibit a characteristic synonymous codon composition, with a high rate of G or C at the third codon position and a high content of AAG, CAG, ATC, GAG, CAC, and CTG. Conclusion: Eleven RP genes were identified as unique because they did not exhibit at least some of the above characteristics, indicating that they may have unknown functions not present in other RP genes. Furthermore, we found sequences conserved between human and mouse genes around the transcription start sites and in the intronic regions. This study suggests certain overall trends and characteristic features of human RP genes. PMID:16504170
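
The codon-composition measures in point (4) are simple to compute. Below is a minimal Python sketch, assuming an in-frame coding sequence given as a string. The function names are hypothetical, and an incomplete trailing codon is ignored.

```python
from collections import Counter


def codons(cds: str) -> list:
    """Split an in-frame coding sequence into codons, dropping any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def gc3(cds: str) -> float:
    """Fraction of codons with G or C at the third codon position."""
    cs = codons(cds)
    if not cs:
        raise ValueError("sequence is shorter than one codon")
    return sum(c[2] in "GC" for c in cs) / len(cs)


def codon_counts(cds: str) -> Counter:
    """Codon usage table, e.g. for checking the content of AAG, CAG, ATC, GAG, CAC and CTG."""
    return Counter(codons(cds))
```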

  19. Structural variation of the ribosomal gene cluster within the class Insecta

    SciTech Connect

    Mukha, D.V.; Sidorenko, A.P.; Lazebnaya, I.V.

    1995-09-01

    A general estimate of ribosomal DNA variation within the class Insecta is presented. It is shown that blot hybridization can detect differences in the structure of the ribosomal gene cluster not only between genera within an order but also between species within a genus, including sibling species. The structure of the ribosomal gene cluster of the family Coccinellidae (ladybirds) is analyzed. It is shown that cloned, highly conserved regions of Tetrahymena pyriformis ribosomal DNA can be used as probes for analyzing ribosomal genes in insects. 24 refs., 4 figs.

  20. Biosynthesis of a natural polyketide-isoprenoid hybrid compound, furaquinocin A: identification and heterologous expression of the gene cluster.

    PubMed

    Kawasaki, Takashi; Hayashi, Yutaka; Kuzuyama, Tomohisa; Furihata, Kazuo; Itoh, Nobuya; Seto, Haruo; Dairi, Tohru

    2006-02-01

    Furaquinocin (FQ) A, produced by Streptomyces sp. strain KO-3988, is a natural polyketide-isoprenoid hybrid compound that exhibits potent antitumor activity. As a first step toward understanding the biosynthetic machinery of this unique and pharmaceutically useful compound, we cloned an FQ A biosynthetic gene cluster, taking advantage of the fact that isoprenoid biosynthetic gene clusters in actinomycetes generally lie in the regions flanking mevalonate (MV) pathway gene clusters. Interestingly, Streptomyces sp. strain KO-3988 was the first example of a microorganism equipped with two distinct MV pathway gene clusters. Using heterologous expression in Streptomyces lividans TK23, we localized a 25-kb DNA region harboring FQ A biosynthetic genes (fur genes) both upstream and downstream of one of the MV pathway gene clusters (MV2). This was the first example of a gene cluster responsible for the biosynthesis of a polyketide-isoprenoid hybrid compound. We also confirmed that four genes responsible for viguiepinol [3-hydroxypimara-9(11),15-diene] biosynthesis lie upstream of the other MV pathway gene cluster (MV1), which had previously been cloned from strain KO-3988. This was the first example of prokaryotic enzymes with these biosynthetic functions. Phylogenetic analysis indicated that the two MV pathway clusters were probably distributed independently in strain KO-3988 (orthologs), rather than one cluster having arisen by duplication of the other (paralogs).

  1. Identification and manipulation of the pleuromutilin gene cluster from Clitopilus passeckerianus for increased rapid antibiotic production

    NASA Astrophysics Data System (ADS)

    Bailey, Andy M.; Alberti, Fabrizio; Kilaru, Sreedhar; Collins, Catherine M.; de Mattos-Shipley, Kate; Hartley, Amanda J.; Hayes, Patrick; Griffin, Alison; Lazarus, Colin M.; Cox, Russell J.; Willis, Christine L.; O’Dwyer, Karen; Spence, David W.; Foster, Gary D.

    2016-05-01

    Semi-synthetic derivatives of the tricyclic diterpene antibiotic pleuromutilin from the basidiomycete Clitopilus passeckerianus are important in combatting bacterial infections in human and veterinary medicine. These compounds belong to the only new class of antibiotics for human applications, with a novel mode of action and a lack of cross-resistance, representing a class with great potential. Basidiomycete fungi, being dikaryotic, are not generally amenable to strain improvement. We report identification of the seven-gene pleuromutilin gene cluster and show that various targeted approaches aimed at increasing antibiotic production in C. passeckerianus achieved no improvement in yield. The seven-gene pleuromutilin cluster was reconstructed within Aspergillus oryzae, giving production of pleuromutilin in an ascomycete with a significant increase (2106%) in production. This is the first gene cluster from a basidiomycete to be successfully expressed in an ascomycete, and it paves the way for the exploitation of a metabolically rich but traditionally overlooked group of fungi.

  2. Structure of human type-I interferon gene cluster determined from a YAC clone contig

    SciTech Connect

    Diaz, M.O.; Pomykala, H.M.; Bohlander, S.K.

    1994-08-01

    A map of the type-I interferon gene cluster located on the short arm of human chromosome 9 (9p) has been constructed using a contig of YAC clones. This map contains 26 interferon (IFN) genes and pseudogenes, and it accounts for all but one of the IFN sequences previously reported by other authors, plus a new IFNW pseudogene. The most distal gene on 9p is IFNB, and the most proximal one is IFNWP19. The 20 most distal IFN sequences are transcribed toward the telomere, and the 6 most proximal sequences toward the centromere. Several regions of the cluster show evidence of ancestral duplication events, some of which may be explained by unequal crossing over between adjacent tandem genes. The locations of several breakpoints within the cluster, from deletions associated with leukemias and gliomas, were also determined. 41 refs., 5 figs., 2 tabs.

  3. Clustering gene expression data based on predicted differential effects of GV interaction.

    PubMed

    Pan, Hai-Yan; Zhu, Jun; Han, Dan-Fu

    2005-02-01

    Microarrays have become a popular technology in biological and medical research. However, systematic and stochastic variability in microarray data is expected and unavoidable, so the raw measurements within microarray experiments carry inherent "noise". Currently, logarithmic ratios are usually analyzed directly by various clustering methods, which may bias interpretation when identifying groups of genes or samples. In this paper, we propose a statistical method for microarray cluster analysis based on mixed-model approaches. The underlying rationale is to use an ANOVA model to partition the total observed gene expression level into variation components caused by different factors, and to predict the differential effects of GV (gene by variety) interaction using the adjusted unbiased prediction (AUP) method. The predicted GV interaction effects can then be used as the inputs of cluster analysis. We illustrated the application of the method with a gene expression dataset and demonstrated its utility using an external validation.
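
The idea of feeding gene-by-variety (GV) interaction effects, rather than raw log ratios, into cluster analysis can be illustrated with the ordinary least-squares interaction estimates of a balanced two-way ANOVA. This Python sketch is a simplified stand-in. It does not implement the mixed-model adjusted unbiased prediction (AUP) used in the paper. It also assumes one summary value per gene and variety, and the function name is hypothetical.

```python
import numpy as np


def gv_interaction_effects(y):
    """Least-squares gene-by-variety interaction effects for a balanced
    genes x varieties table of (log) expression values:
    y_ij - mean_i. - mean_.j + mean_..  (gene and variety main effects removed)."""
    y = np.asarray(y, dtype=float)
    gene_means = y.mean(axis=1, keepdims=True)      # mean_i. (one per gene)
    variety_means = y.mean(axis=0, keepdims=True)   # mean_.j (one per variety)
    grand_mean = y.mean()                           # mean_..
    return y - gene_means - variety_means + grand_mean
```

Each row of the result is a gene's interaction profile across varieties, and those rows are what would then be passed to a clustering method. A purely additive table (gene effect plus variety effect) yields all-zero interactions.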

  4. Identification and manipulation of the pleuromutilin gene cluster from Clitopilus passeckerianus for increased rapid antibiotic production.

    PubMed

    Bailey, Andy M; Alberti, Fabrizio; Kilaru, Sreedhar; Collins, Catherine M; de Mattos-Shipley, Kate; Hartley, Amanda J; Hayes, Patrick; Griffin, Alison; Lazarus, Colin M; Cox, Russell J; Willis, Christine L; O'Dwyer, Karen; Spence, David W; Foster, Gary D

    2016-05-04


  5. Identification and manipulation of the pleuromutilin gene cluster from Clitopilus passeckerianus for increased rapid antibiotic production

    PubMed Central

    Bailey, Andy M.; Alberti, Fabrizio; Kilaru, Sreedhar; Collins, Catherine M.; de Mattos-Shipley, Kate; Hartley, Amanda J.; Hayes, Patrick; Griffin, Alison; Lazarus, Colin M.; Cox, Russell J.; Willis, Christine L.; O’Dwyer, Karen; Spence, David W.; Foster, Gary D.

    2016-01-01

    PMID:27143514

  6. Co-clustering phenome–genome for phenotype classification and disease gene discovery

    PubMed Central

    Hwang, TaeHyun; Atluri, Gowtham; Xie, MaoQiang; Dey, Sanjoy; Hong, Changjin; Kumar, Vipin; Kuang, Rui

    2012-01-01

    Understanding the categorization of human diseases is critical for reliably identifying disease-causal genes. Recently, genome-wide studies of abnormal chromosomal locations related to diseases have mapped >2000 phenotype–gene relations, which provide valuable information for classifying diseases and identifying candidate genes as drug targets. In this article, a regularized non-negative matrix tri-factorization (R-NMTF) algorithm is introduced to co-cluster phenotypes and genes and simultaneously detect associations between the detected phenotype clusters and gene clusters. The R-NMTF algorithm factorizes the phenotype–gene association matrix using prior knowledge from a phenotype similarity network and a protein–protein interaction network, supervised by label information from known disease classes and biological pathways. In experiments on disease phenotype–gene associations in OMIM and KEGG disease pathways, R-NMTF significantly improved the classification of disease phenotypes and disease pathway genes compared with support vector machines and Label Propagation in cross-validation on the annotated phenotypes and genes. The newly predicted phenotypes in each disease class are highly consistent with human phenotype ontology annotations. The roles of the new member genes in the disease pathways were examined and validated in protein–protein interaction subnetworks. An extensive literature review also confirmed many new members of the disease classes and pathways, as well as the predicted associations between disease phenotype classes and pathways. PMID:22735708
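
The core of the method can be sketched with the standard multiplicative updates for the Frobenius loss ||X - F S G^T||^2. Here the phenotype-gene matrix X is tri-factorized into non-negative factors F (phenotype clusters), S (cluster-cluster associations), and G (gene clusters). This Python sketch deliberately omits two things that distinguish R-NMTF: regularization by the phenotype-similarity and protein-interaction networks, and label supervision. It is therefore plain NMTF, not the authors' algorithm. The function and parameter names are hypothetical.

```python
import numpy as np


def nmtf(X, k1, k2, n_iter=500, seed=0, eps=1e-9):
    """Plain non-negative matrix tri-factorization X ~ F @ S @ G.T with
    multiplicative updates for the Frobenius loss (no graph regularization,
    no label supervision). X must be non-negative; k1 row clusters, k2 column clusters."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k1))
    S = rng.random((k1, k2))
    G = rng.random((m, k2))
    for _ in range(n_iter):
        # Each update multiplies by (negative gradient part) / (positive gradient part),
        # which keeps the factors non-negative; eps avoids division by zero.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G
```

Hard co-clusters can then be read off as `F.argmax(axis=1)` for phenotypes and `G.argmax(axis=1)` for genes. Large entries of `S` link a phenotype cluster to a gene cluster.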

  7. A putative greigite-type magnetosome gene cluster from the candidate phylum Latescibacteria.

    PubMed

    Lin, Wei; Pan, Yongxin

    2015-04-01

    The intracellular biomineralization of magnetite and/or greigite magnetosomes in magnetotactic bacteria (MTB) is strictly controlled by a group of conserved genes, termed magnetosome genes, which are organized as clusters (or islands) in MTB genomes. So far, all reported MTB have been affiliated with the Proteobacteria phylum, the Nitrospirae phylum, or the candidate division OP3. Here, we report the discovery of a putative magnetosome gene cluster in the draft genome of an uncultivated bacterium belonging to the candidate phylum Latescibacteria (formerly candidate division WS3), recently recovered by Rinke and colleagues; the cluster contains 10 genes with homology to the magnetosome mam genes of magnetotactic Proteobacteria and Nitrospirae. Moreover, these genes are phylogenetically closely related to greigite-type magnetosome genes that had previously been found only in Deltaproteobacteria MTB, suggesting that the greigite genes may have originated earlier than previously thought. These findings indicate that some members of Latescibacteria may be capable of forming greigite magnetosomes and thus may play previously unrecognized roles in environmental iron and sulfur cycles. The conserved genomic structure of the magnetosome gene cluster in the Latescibacteria phylum supports the hypothesis of horizontal transfer of these genes among distantly related bacterial groups in nature.

  8. Co-clustering phenome-genome for phenotype classification and disease gene discovery.

    PubMed

    Hwang, TaeHyun; Atluri, Gowtham; Xie, MaoQiang; Dey, Sanjoy; Hong, Changjin; Kumar, Vipin; Kuang, Rui

    2012-10-01


  9. Paradigm of Tunable Clustering Using Binarization of Consensus Partition Matrices (Bi-CoPaM) for Gene Discovery

    PubMed Central

    Abu-Jamous, Basel; Fa, Rui; Roberts, David J.; Nandi, Asoke K.

    2013-01-01

    Clustering analysis has a growing role in the study of co-expressed genes for gene discovery. Conventional binary and fuzzy clustering do not embrace the biological reality that some genes may be irrelevant to a problem and should not be assigned to a cluster, while other genes may participate in several biological functions and should simultaneously belong to multiple clusters. These algorithms also cannot generate tight clusters that focus on their cores or wide clusters that overlap and contain all possibly relevant genes. In this paper, a new clustering paradigm is proposed in which all three eventualities are possible: a gene may be assigned exclusively to a single cluster, assigned to multiple clusters, or not assigned to any cluster. The primary novelty that makes this possible is the introduction of tunable binarization techniques. Results from multiple clustering experiments are aggregated into one fuzzy consensus partition matrix (CoPaM), which is then binarized to obtain the final binary partitions. This is referred to as Binarization of Consensus Partition Matrices (Bi-CoPaM). The method has been tested with a set of synthetic datasets and a set of five real yeast cell-cycle datasets. The results demonstrate its validity in generating relevant tight, wide, and complementary clusters that can meet the requirements of different gene discovery studies. PMID:23409186
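
The aggregation and binarization steps can be sketched in Python. The sketch assumes the cluster labels of the individual clustering results have already been aligned, so that row k refers to the same cluster in every partition; the alignment step is not shown. It uses a single value-threshold rule as one example of a tunable binarization. The function names and the threshold `delta` are illustrative assumptions rather than the paper's full set of techniques.

```python
import numpy as np


def consensus_partition_matrix(partitions):
    """Average several K x N membership matrices (clusters x genes) whose
    cluster labels have ALREADY been aligned across clustering experiments."""
    return np.mean(np.stack([np.asarray(p, dtype=float) for p in partitions]), axis=0)


def value_threshold_binarize(copam, delta):
    """Assign gene j to cluster k when its consensus membership is >= delta.
    A high delta gives tight clusters; a low delta gives wide, possibly
    overlapping clusters; a gene below delta everywhere stays unassigned."""
    return (np.asarray(copam) >= delta).astype(int)
```

Consider three aligned partitions of four genes in which one gene is placed differently by one experiment. With `delta = 1.0`, only the genes on which every experiment agrees are kept, leaving the ambiguous gene unassigned (a tight result). With `delta = 0.3`, that gene is admitted to both clusters (a wide, overlapping result).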

  10. Organization of Biogenesis Genes for Aggregative Adherence Fimbria II Defines a Virulence Gene Cluster in Enteroaggregative Escherichia coli

    PubMed Central

    Elias, Waldir P.; Czeczulin, John R.; Henderson, Ian R.; Trabulsi, Luiz R.; Nataro, James P.

    1999-01-01

    Several virulence-related genes have been described for prototype enteroaggregative Escherichia coli (EAEC) strain 042, which has been shown to cause diarrhea in human volunteers. Among these factors are the enterotoxins Pet and EAST and the fimbrial antigen aggregative adherence fimbria II (AAF/II), all of which are encoded on the 65-MDa virulence plasmid pAA2. Using nucleotide sequence analysis and insertional mutagenesis, we have found that the genes required for the expression of each of these factors, as well as the transcriptional activator of fimbrial expression AggR, map to a distinct cluster on the pAA2 plasmid map. The cluster is 23 kb in length and includes two regions required for expression of the AAF/II fimbria. These fimbrial biogenesis genes feature a unique organization in which the chaperone, subunit, and transcriptional activator lie in one cluster, whereas the second, unlinked cluster comprises a silent chaperone gene, usher, and invasin reminiscent of Dr family fimbrial clusters. This plasmid-borne virulence locus may represent an important set of virulence determinants in EAEC strains. PMID:10074069

  11. Global identification of genes affecting iron-sulfur cluster biogenesis and iron homeostasis.

    PubMed

    Hidese, Ryota; Mihara, Hisaaki; Kurihara, Tatsuo; Esaki, Nobuyoshi

    2014-03-01

    Iron-sulfur (Fe-S) clusters are ubiquitous cofactors that are crucial for many physiological processes in all organisms. In Escherichia coli, assembly of Fe-S clusters depends on the activity of the iron-sulfur cluster (ISC) assembly and sulfur mobilization (SUF) apparatus. However, the underlying molecular mechanisms, and the mechanisms that control Fe-S cluster biogenesis and iron homeostasis, are still poorly defined. In this study, we performed a global screen to identify factors affecting Fe-S cluster biogenesis and iron homeostasis using the Keio collection, a library of 3,815 single-gene E. coli knockout mutants. The approach was based on radiolabeling cells with [2-(14)C]dihydrouracil, a labeling that depends entirely on the activity of the Fe-S enzyme dihydropyrimidine dehydrogenase. We identified 49 genes affecting Fe-S cluster biogenesis and/or iron homeostasis, including 23 genes that are important only under microaerobic/anaerobic conditions. This study defines key proteins associated with Fe-S cluster biogenesis and iron homeostasis, which will aid further understanding of the cellular mechanisms that coordinate these processes. In addition, we applied the [2-(14)C]dihydrouracil-labeling method to analyze the roles of amino acid residues of the Fe-S cluster assembly scaffold IscU, as a model of the Fe-S cluster assembly apparatus. The analysis showed that Cys37, Cys63, His105, and Cys106 are essential for IscU function in vivo, demonstrating the potential of the method for investigating the in vivo function of proteins involved in Fe-S cluster assembly.

  12. The major chemotaxis gene cluster of Rhizobium leguminosarum bv. viciae is essential for competitive nodulation.

    PubMed

    Miller, Lance D; Yost, Christopher K; Hynes, Michael F; Alexandre, Gladys

    2007-01-01

    Rhizobium leguminosarum biovar viciae strain 3841 is a motile alpha-proteobacterium that can establish a nitrogen-fixing symbiosis within the roots of pea plants. To determine the contribution of chemotaxis to the lifestyle of R. leguminosarum, we characterized the function of two chemotaxis gene clusters (che1 and che2) in controlling motility behaviour. We found that both chemotaxis gene clusters modulate the swimming bias of R. leguminosarum cells and that the che1 cluster is the major pathway controlling swimming bias and chemotaxis. The che2 cluster also contributes to swimming bias but has a minor effect on chemotaxis. Using competitive nodulation assays, we demonstrated that a functional che1 cluster, but not the che2 cluster, promotes competitive nodulation of pea. This finding implies that the environmental cue(s) triggering chemotaxis of R. leguminosarum bv. viciae cells towards pea roots and facilitating colonization are likely processed through the che1 cluster, despite the contribution of both che clusters to swimming behaviour. A phylogenetic analysis of the distribution of che1 and che2 orthologues in the alpha-proteobacteria, together with our results, allows us to propose that che1 homologues are major controllers of chemotaxis and host association in the Rhizobiaceae.

  13. Haemophilus ducreyi Requires the flp Gene Cluster for Microcolony Formation In Vitro

    PubMed Central

    Nika, Joseph R.; Latimer, Jo L.; Ward, Christine K.; Blick, Robert J.; Wagner, Nikki J.; Cope, Leslie D.; Mahairas, Gregory G.; Munson, Robert S.; Hansen, Eric J.

    2002-01-01

    Haemophilus ducreyi, the etiologic agent of chancroid, has been shown to form microcolonies when cultured in the presence of human foreskin fibroblasts. We identified a 15-gene cluster in H. ducreyi that encoded predicted protein products with significant homology to those encoded by the tad (for tight adhesion) locus in Actinobacillus actinomycetemcomitans that is involved in the production of fimbriae by this periodontal pathogen. The first three open reading frames in this H. ducreyi gene cluster encoded predicted proteins with a high degree of identity to the Flp (fimbria-like protein) encoded by the first open reading frame of the tad locus; this 15-gene cluster in H. ducreyi was designated flp. RT-PCR analysis indicated that the H. ducreyi flp gene cluster was likely to be a polycistronic operon. Mutations within the flp gene cluster resulted in an inability to form microcolonies in the presence of human foreskin fibroblasts. In addition, the same mutants were defective in the ability to attach to both plastic and human foreskin fibroblasts in vitro. An H. ducreyi mutant with an inactivated tadA gene exhibited a small decrease in virulence in the temperature-dependent rabbit model for experimental chancroid, whereas another H. ducreyi mutant with inactivated flp-1 and flp-2 genes was as virulent as the wild-type parent strain. These results indicate that the flp gene cluster is essential for microcolony formation by H. ducreyi, whereas this phenotypic trait is not linked to the virulence potential of the pathogen, at least in this animal model of infection. PMID:12010986

  14. Clusters of Antibiotic Resistance Genes Enriched Together Stay Together in Swine Agriculture

    PubMed Central

    Johnson, Timothy A.; Stedtfeld, Robert D.; Wang, Qiong; Cole, James R.; Hashsham, Syed A.; Looft, Torey; Zhu, Yong-Guan

    2016-01-01

    Antibiotic resistance is a worldwide health risk, but the influence of animal agriculture on the genetic context and enrichment of individual antibiotic resistance alleles remains unclear. Using quantitative PCR followed by amplicon sequencing, we quantified and sequenced 44 genes related to antibiotic resistance, mobile genetic elements, and bacterial phylogeny in microbiomes from U.S. laboratory swine and from swine farms in three Chinese regions. We identified highly abundant resistance clusters: groups of resistance and mobile genetic element alleles that co-occur. For example, in the three Chinese regions the abundance of genes conferring resistance to six classes of antibiotics, together with class 1 integrase, is directly correlated with the abundance of IS6100-type transposons. These resistance cluster genes likely colocalize in microbial genomes on the farms. Resistance cluster alleles were dramatically enriched (up to 1 to 10% as abundant as 16S rRNA), indicating that multidrug-resistant bacteria are likely the norm rather than the exception in these communities. This enrichment largely occurred independently of phylogenetic composition; thus, resistance clusters are likely present in many bacterial taxa. Furthermore, resistance clusters contain genes that confer resistance to antibiotics independently of their particular use on the farms. Selection for these clusters is likely due to the use of only a subset of the broad range of chemicals to which the clusters confer resistance. The scale of animal agriculture and its wastes, the enrichment and horizontal gene transfer potential of the clusters, and the proximity of large human populations suggest that managing this resistance reservoir is important for minimizing human risk. PMID:27073098
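
The co-occurrence reasoning can be sketched as follows. Abundances are normalized to 16S rRNA, and genes whose normalized abundances rise and fall together across samples are grouped. Three elements are assumptions for illustration: the choice of Pearson correlation on per-16S relative abundance, the function names, and the example values. The abstract does not specify the exact statistics used.

```python
import math


def per_16s(gene_copies, rrna_copies):
    """Relative abundance: gene copies per 16S rRNA gene copy in each sample."""
    return [g / r for g, r in zip(gene_copies, rrna_copies)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (both must vary)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A relative abundance of 0.01 to 0.1 corresponds to the "1 to 10% as abundant as 16S rRNA" range quoted above. Genes whose pairwise correlations across samples are all high would be candidates for a single resistance cluster.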

  15. Trajectory Clustering: a Non-Parametric Method for Grouping Gene Expression Time Courses, with Applications to Mammary Development

    PubMed Central

    Phang, T.L.; Neville, M.C.; Rudolph, M.; Hunter, L.

    2008-01-01

    Trajectory clustering is a novel and statistically well-founded method for clustering time series data from gene expression arrays. Trajectory clustering uses non-parametric statistics and is hence not sensitive to the particular distributions underlying gene expression data. Each cluster is clearly defined in terms of the direction of change of expression between successive time points (its ‘trajectory’) and therefore has an easily appreciated biological meaning. Applying the method to a dataset from mouse mammary gland development, we demonstrate that it produces different clusters than hierarchical, K-means, and jackknife clustering methods, even when those methods are applied to differences between successive time points. Compared with all of the other methods, trajectory clustering was better able to match a manual clustering by a domain expert and better able to cluster groups of genes with known related functions. PMID:12603041
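
The grouping rule can be sketched in Python. Each gene is labelled by the direction of change over each successive pair of time points, and genes with identical labels form a cluster. The per-interval call below is a deliberately simple rank-based criterion (complete separation of the replicate values), used as a stand-in for the non-parametric test in the paper. With few replicates, a proper test would also need a significance threshold. The function names are hypothetical.

```python
from collections import defaultdict


def step_direction(before, after):
    """Call one interval from replicate values at two successive time points:
    'U' if every later replicate exceeds every earlier one, 'D' if every later
    replicate is below every earlier one, otherwise 'F' (no clear change)."""
    if min(after) > max(before):
        return "U"
    if max(after) < min(before):
        return "D"
    return "F"


def trajectory(replicates_by_time):
    """Trajectory string, one letter per successive pair of time points."""
    return "".join(step_direction(prev, nxt)
                   for prev, nxt in zip(replicates_by_time, replicates_by_time[1:]))


def trajectory_clusters(genes):
    """Group genes (name -> list of replicate lists, one per time point) by trajectory."""
    clusters = defaultdict(list)
    for name, reps in genes.items():
        clusters[trajectory(reps)].append(name)
    return dict(clusters)
```

Because each cluster is keyed by its trajectory string, such as "UD" for up and then down, the biological meaning of a cluster can be read directly from its label.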

  16. Discovery of Unusual Biaryl Polyketides by Activation of a Silent Streptomyces venezuelae Biosynthetic Gene Cluster

    PubMed Central

    Thanapipatsiri, Anyarat; Gomez‐Escribano, Juan Pablo; Song, Lijiang; Bibb, Maureen J.; Al‐Bassam, Mahmoud; Chandra, Govind

    2016-01-01

    Abstract Comparative transcriptional profiling of a ΔbldM mutant of Streptomyces venezuelae with its unmodified progenitor revealed that the expression of a cryptic biosynthetic gene cluster containing both type I and type III polyketide synthase genes is activated in the mutant. The 29.5 kb gene cluster, which was predicted to encode an unusual biaryl metabolite, which we named venemycin, and potentially halogenated derivatives, contains 16 genes including one—vemR—that encodes a transcriptional activator of the large ATP‐binding LuxR‐like (LAL) family. Constitutive expression of vemR in the ΔbldM mutant led to the production of sufficient venemycin for structural characterisation, confirming its unusual biaryl structure. Co‐expression of the venemycin biosynthetic gene cluster and vemR in the heterologous host Streptomyces coelicolor also resulted in venemycin production. Although the gene cluster encodes two halogenases and a flavin reductase, constitutive expression of all three genes led to the accumulation only of a monohalogenated venemycin derivative, both in the native producer and the heterologous host. A competition experiment in which equimolar quantities of sodium chloride and sodium bromide were fed to the venemycin‐producing strains resulted in the preferential incorporation of bromine, thus suggesting that bromide is the preferred substrate for one or both halogenases. PMID:27605017

  17. Discovery of Unusual Biaryl Polyketides by Activation of a Silent Streptomyces venezuelae Biosynthetic Gene Cluster.

    PubMed

    Thanapipatsiri, Anyarat; Gomez-Escribano, Juan Pablo; Song, Lijiang; Bibb, Maureen J; Al-Bassam, Mahmoud; Chandra, Govind; Thamchaipenet, Arinthip; Challis, Gregory L; Bibb, Mervyn J

    2016-11-17

    Comparative transcriptional profiling of a ΔbldM mutant of Streptomyces venezuelae with its unmodified progenitor revealed that the expression of a cryptic biosynthetic gene cluster containing both type I and type III polyketide synthase genes is activated in the mutant. The 29.5 kb gene cluster, which was predicted to encode an unusual biaryl metabolite, which we named venemycin, and potentially halogenated derivatives, contains 16 genes including one, vemR, that encodes a transcriptional activator of the large ATP-binding LuxR-like (LAL) family. Constitutive expression of vemR in the ΔbldM mutant led to the production of sufficient venemycin for structural characterisation, confirming its unusual biaryl structure. Co-expression of the venemycin biosynthetic gene cluster and vemR in the heterologous host Streptomyces coelicolor also resulted in venemycin production. Although the gene cluster encodes two halogenases and a flavin reductase, constitutive expression of all three genes led to the accumulation only of a monohalogenated venemycin derivative, both in the native producer and the heterologous host. A competition experiment in which equimolar quantities of sodium chloride and sodium bromide were fed to the venemycin-producing strains resulted in the preferential incorporation of bromine, thus suggesting that bromide is the preferred substrate for one or both halogenases.

  18. Unbiased Functional Clustering of Gene Variants with a Phenotypic-Linkage Network

    PubMed Central

    Honti, Frantisek; Meader, Stephen; Webber, Caleb

    2014-01-01

    Groupwise functional analysis of gene variants is becoming standard in next-generation sequencing studies. As the function of many genes is unknown and their classification to pathways is scant, functional associations between genes are often inferred from large-scale omics data. Such data types, including protein–protein interactions and gene co-expression networks, are used to examine the interrelations of the implicated genes. Statistical significance is assessed by comparing the interconnectedness of the mutated genes with that of random gene sets. However, interconnectedness can be affected by confounding bias, potentially resulting in false positive findings. We show that genes implicated through de novo sequence variants are biased in their coding-sequence length and longer genes tend to cluster together, which leads to exaggerated p-values in functional studies; we present here an integrative method that addresses these biases. To discern molecular pathways relevant to complex disease, we have inferred functional associations between human genes from diverse data types and assessed them with a novel phenotype-based method. Examining the functional association between de novo gene variants, we control for the heretofore unexplored confounding bias in coding-sequence length. We test different data types and networks and find that the disease-associated genes cluster more significantly in an integrated phenotypic-linkage network than in other gene networks. We present a tool of superior power to identify functional associations among genes mutated in the same disease even after accounting for significant sequencing study bias and demonstrate the suitability of this method to functionally cluster variant genes underlying polygenic disorders. PMID:25166029
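The length bias described above can be controlled in a permutation test by drawing random gene sets whose coding-sequence lengths match those of the mutated genes. The sketch below shows that general technique, not the authors' phenotypic-linkage network method. As assumptions, interconnectedness is measured as a simple edge count, length matching uses fixed-width bins, and the helper names, network, and lengths are all hypothetical.

```python
import random
from itertools import combinations


def connectivity(genes, edges):
    """Count network edges with both endpoints inside the gene set."""
    return sum(1 for a, b in combinations(sorted(genes), 2)
               if frozenset((a, b)) in edges)


def length_bins(lengths, width):
    """Group genes into coding-length bins of a fixed width (bp)."""
    bins = {}
    for gene, length in lengths.items():
        bins.setdefault(length // width, []).append(gene)
    return bins


def matched_sample(genes, lengths, bins, width, rng):
    """Draw random genes with the same per-bin length composition as `genes`."""
    needed = {}
    for g in genes:
        key = lengths[g] // width
        needed[key] = needed.get(key, 0) + 1
    sample = []
    for key, k in needed.items():
        sample.extend(rng.sample(bins[key], k))
    return sample


def length_matched_pvalue(genes, edges, lengths, width=1000, n_perm=1000, seed=0):
    """Empirical p-value of connectivity against length-matched random sets."""
    rng = random.Random(seed)
    bins = length_bins(lengths, width)
    observed = connectivity(genes, edges)
    hits = sum(
        connectivity(matched_sample(genes, lengths, bins, width, rng), edges) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy network and coding-sequence lengths (bp).
lengths = {"g1": 500, "g2": 600, "g3": 700, "g4": 800,
           "g5": 3000, "g6": 3500, "g7": 4000, "g8": 4500}
edges = {frozenset(p) for p in [("g1", "g2"), ("g2", "g3"), ("g1", "g3"), ("g5", "g6")]}
observed, p = length_matched_pvalue(["g1", "g2", "g3"], edges, lengths)
print(observed, round(p, 3))
```

Drawing null sets only from the same length bins means long genes are compared with other long genes. This prevents the length-driven clustering from inflating significance.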

  19. The long noncoding RNA Gm15055 represses Hoxa gene expression by recruiting PRC2 to the gene cluster.

    PubMed

    Liu, Guo-You; Zhao, Guang-Nian; Chen, Xiao-Feng; Hao, De-Long; Zhao, Xiang; Lv, Xiang; Liu, De-Pei

    2016-04-07

    The Hox genes encode transcription factors that determine embryonic pattern formation. In embryonic stem cells, the Hox genes are silenced by PRC2. Recent studies have reported a role for long noncoding RNAs in PRC2 recruitment in vertebrates. However, little is known about how PRC2 is recruited to the Hox genes in ESCs. Here, we used stable knockdown and knockout strategies to characterize the function of the long noncoding RNA Gm15055 in the regulation of Hoxa genes in mouse ESCs. We found that Gm15055 is highly expressed in mESCs and its expression is maintained by OCT4. Gm15055 represses Hoxa gene expression by recruiting PRC2 to the cluster and maintaining the H3K27me3 modification on Hoxa promoters. A chromosome conformation capture assay revealed the close physical association of the Gm15055 locus to multiple sites at the Hoxa gene cluster in mESCs, which may facilitate the in cis targeting of Gm15055 RNA to the Hoxa genes. Furthermore, an OCT4-responsive positive cis-regulatory element is found in the Gm15055 gene locus, which potentially regulates both Gm15055 itself and the Hoxa gene activation. This study suggests how PRC2 is recruited to the Hoxa locus in mESCs, and implies an elaborate mechanism for Hoxa gene regulation in mESCs. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  20. A Genomics Based Discovery of Secondary Metabolite Biosynthetic Gene Clusters in Aspergillus ustus

    PubMed Central

    Pi, Borui; Yu, Dongliang; Dai, Fangwei; Song, Xiaoming; Zhu, Congyi; Li, Hongye; Yu, Yunsong

    2015-01-01

    Secondary metabolites (SMs) produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, and little is known about rare pathogens. In this study, we performed a genomics based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, including the sterigmatocystin biosynthesis pathway that is commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters that were possibly acquired by horizontal gene transfer were identified in Aspergillus for the first time, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but far fewer with other Aspergilli such as A. niger and A. oryzae. These findings should help in understanding the diversity and evolution of SM biosynthesis pathways in the genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in the clinic. PMID:25706180

  1. MS/MS networking guided analysis of molecule and gene cluster families

    PubMed Central

    Nguyen, Don Duy; Wu, Cheng-Hsuan; Moree, Wilna J.; Lamsa, Anne; Medema, Marnix H.; Zhao, Xiling; Gavilan, Ronnie G.; Aparicio, Marystella; Atencio, Librada; Jackson, Chanaye; Ballesteros, Javier; Sanchez, Joel; Watrous, Jeramie D.; Phelan, Vanessa V.; van de Wiel, Corine; Kersten, Roland D.; Mehnaz, Samina; De Mot, René; Shank, Elizabeth A.; Charusanti, Pep; Nagarajan, Harish; Duggan, Brendan M.; Moore, Bradley S.; Bandeira, Nuno; Palsson, Bernhard Ø.; Pogliano, Kit; Gutiérrez, Marcelino; Dorrestein, Pieter C.

    2013-01-01

    The ability to correlate the production of specialized metabolites to the genetic capacity of the organism that produces such molecules has become an invaluable tool in aiding the discovery of biotechnologically applicable molecules. Here, we accomplish this task by matching molecular families with gene cluster families, making these correlations to 60 microbes at one time instead of connecting one molecule to one organism at a time, such as how it is traditionally done. We can correlate these families through the use of nanospray desorption electrospray ionization MS/MS, an ambient pressure MS technique, in conjunction with MS/MS networking and peptidogenomics. We matched the molecular families of peptide natural products produced by 42 bacilli and 18 pseudomonads through the generation of amino acid sequence tags from MS/MS data of specific clusters found in the MS/MS network. These sequence tags were then linked to biosynthetic gene clusters in publicly accessible genomes, providing us with the ability to link particular molecules with the genes that produced them. As an example of its use, this approach was applied to two unsequenced Pseudoalteromonas species, leading to the discovery of the gene cluster for a molecular family, the bromoalterochromides, in the previously sequenced strain P. piscicida JCM 20779T. The approach itself is not limited to 60 related strains, because spectral networking can be readily adopted to look at molecular family–gene cluster families of hundreds or more diverse organisms in one single MS/MS network. PMID:23798442
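The abstract mentions generating amino acid sequence tags from MS/MS data. A minimal sketch of the underlying idea reads residues from mass differences between consecutive fragment peaks, using standard monoisotopic residue masses. It assumes a clean, singly charged fragment-ion ladder with no noise peaks inside it and a hypothetical 0.02 Da tolerance. It is not the authors' peptidogenomics pipeline, and the helper names and peak list are hypothetical.

```python
# Monoisotopic amino acid residue masses (Da); leucine and isoleucine are isobaric.
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_for(delta, tol):
    """Return the single residue matching a mass difference, or None if 0 or >1 match."""
    matches = [aa for aa, m in RESIDUE_MASSES.items() if abs(delta - m) <= tol]
    return matches[0] if len(matches) == 1 else None


def sequence_tags(peaks, tol=0.02, min_len=3):
    """Read residue ladders from consecutive peak-to-peak mass differences."""
    peaks = sorted(peaks)
    tags, current = [], []
    for lo, hi in zip(peaks, peaks[1:]):
        aa = residue_for(hi - lo, tol)
        if aa is not None:
            current.append(aa)
        else:  # ladder broken: keep the tag if it is long enough
            if len(current) >= min_len:
                tags.append("-".join(current))
            current = []
    if len(current) >= min_len:
        tags.append("-".join(current))
    return tags


# Hypothetical ladder: start mass plus residues G, A, S, P, V, then one unrelated peak.
ladder = [200.0]
for aa in ["G", "A", "S", "P", "V"]:
    ladder.append(ladder[-1] + RESIDUE_MASSES[aa])
peaks = ladder + [700.0]
print(sequence_tags(peaks))  # ['G-A-S-P-V']
```

A tag like this can then be searched against translated biosynthetic gene clusters. This is the molecule-to-gene link the abstract describes.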

  2. Bacterial Biosynthetic Gene Clusters Encoding the Anti-cancer Haterumalide Class of Molecules

    PubMed Central

    Matilla, Miguel A.; Stöckmann, Henning; Leeper, Finian J.; Salmond, George P. C.

    2012-01-01

    Haterumalides are halogenated macrolides with strong antitumor properties, making them attractive targets for chemical synthesis. Unfortunately, current synthetic routes to these molecules are inefficient. The potent haterumalide, oocydin A, was previously identified from two plant-associated bacteria through its high bioactivity against plant pathogenic fungi and oomycetes. In this study, we describe oocydin A (ooc) biosynthetic gene clusters identified by genome sequencing, comparative genomics, and chemical analysis in four plant-associated enterobacteria of the Serratia and Dickeya genera. Disruption of the ooc gene cluster abolished oocydin A production and bioactivity against fungi and oomycetes. The ooc gene clusters span between 77 and 80 kb and encode five multimodular polyketide synthase (PKS) proteins, a hydroxymethylglutaryl-CoA synthase cassette and three flavin-dependent tailoring enzymes. The presence of two free-standing acyltransferase proteins classifies the oocydin A gene cluster within the growing family of trans-AT PKSs. The amino acid sequences and organization of the PKS domains are consistent with the chemical predictions and functional peculiarities associated with trans-acyltransferase PKS. Based on extensive in silico analysis of the gene cluster, we propose a biosynthetic model for the production of oocydin A and, by extension, for other members of the haterumalide family of halogenated macrolides exhibiting anti-cancer, anti-fungal, and other interesting biological properties. PMID:23012376

  3. MS/MS networking guided analysis of molecule and gene cluster families.

    PubMed

    Nguyen, Don Duy; Wu, Cheng-Hsuan; Moree, Wilna J; Lamsa, Anne; Medema, Marnix H; Zhao, Xiling; Gavilan, Ronnie G; Aparicio, Marystella; Atencio, Librada; Jackson, Chanaye; Ballesteros, Javier; Sanchez, Joel; Watrous, Jeramie D; Phelan, Vanessa V; van de Wiel, Corine; Kersten, Roland D; Mehnaz, Samina; De Mot, René; Shank, Elizabeth A; Charusanti, Pep; Nagarajan, Harish; Duggan, Brendan M; Moore, Bradley S; Bandeira, Nuno; Palsson, Bernhard Ø; Pogliano, Kit; Gutiérrez, Marcelino; Dorrestein, Pieter C

    2013-07-09

    The ability to correlate the production of specialized metabolites to the genetic capacity of the organism that produces such molecules has become an invaluable tool in aiding the discovery of biotechnologically applicable molecules. Here, we accomplish this task by matching molecular families with gene cluster families, making these correlations to 60 microbes at one time instead of connecting one molecule to one organism at a time, such as how it is traditionally done. We can correlate these families through the use of nanospray desorption electrospray ionization MS/MS, an ambient pressure MS technique, in conjunction with MS/MS networking and peptidogenomics. We matched the molecular families of peptide natural products produced by 42 bacilli and 18 pseudomonads through the generation of amino acid sequence tags from MS/MS data of specific clusters found in the MS/MS network. These sequence tags were then linked to biosynthetic gene clusters in publicly accessible genomes, providing us with the ability to link particular molecules with the genes that produced them. As an example of its use, this approach was applied to two unsequenced Pseudoalteromonas species, leading to the discovery of the gene cluster for a molecular family, the bromoalterochromides, in the previously sequenced strain P. piscicida JCM 20779(T). The approach itself is not limited to 60 related strains, because spectral networking can be readily adopted to look at molecular family-gene cluster families of hundreds or more diverse organisms in one single MS/MS network.

  4. Genome mining demonstrates the widespread occurrence of gene clusters encoding bacteriocins in cyanobacteria.

    PubMed

    Wang, Hao; Fewer, David P; Sivonen, Kaarina

    2011-01-01

    Cyanobacteria are a rich source of natural products with interesting biological activities. Many of these are peptides and the end products of a non-ribosomal pathway. However, several cyanobacterial peptide classes were recently shown to be produced through the proteolytic cleavage and post-translational modification of short precursor peptides. A new class of bacteriocins produced through the proteolytic cleavage and heterocyclization of precursor proteins was recently identified from marine cyanobacteria. Here we show the widespread occurrence of bacteriocin gene clusters in cyanobacteria through comparative analysis of 58 cyanobacterial genomes. A total of 145 bacteriocin gene clusters were discovered through genome mining. These clusters encoded 290 putative bacteriocin precursors. The precursors ranged in length from 28 to 164 amino acids with very little sequence conservation of the core peptide. The gene clusters could be classified into seven groups according to their gene organization and domain composition. This classification is supported by phylogenetic analysis, which further indicated independent evolutionary trajectories of gene clusters in different groups. Our data suggest that cyanobacteria are a prolific source of low-molecular weight post-translationally modified peptides.

  5. Close linkage of the two keratin gene clusters in the human genome

    SciTech Connect

    Milisavljevic, V.; Freedberg, I.M.; Blumenberg, M.

    1996-05-15

    Mapping studies of functional keratin genes in the human genome have localized most of the acidic keratin genes to chromosome 17q12-q21 and the basic keratin genes to chromosome 12q11-q13. Within the acidic keratin locus two clusters were identified, one containing the genes for K15 and K19, the other the genes for K14, K16, and K17. The relative positions and the distance between the two clusters have not been determined previously. In this paper we describe our analysis of P1 clones containing multiple acidic keratin genes, which were studied using restriction analysis and Southern blot hybridization with PCR-amplified probes specific for functional human keratin genes 15, 17, and 19. Our results show that the two clusters are very closely linked to each other, within a 55-kb region in the human genome. The genes are organized 5′ to 3′ in the following order: 5′-K19-K15-K17-K16-K14. Between K15 and K17 at least one additional, unidentified keratin gene is present. 30 refs., 2 figs.

  6. Operon and non-operon gene clusters in the C. elegans genome.

    PubMed

    Blumenthal, Thomas; Davis, Paul; Garrido-Lecca, Alfonso

    2015-04-28

    Nearly 15% of the ~20,000 C. elegans genes are contained in operons, multigene clusters controlled by a single promoter. The vast majority of these are of a type where the genes in the cluster are ~100 bp apart and the pre-mRNA is processed by 3' end formation accompanied by trans-splicing. A spliced leader, SL2, is specialized for operon processing. Here we summarize current knowledge on several variations on this theme including: (1) hybrid operons, which have additional promoters between genes; (2) operons with exceptionally long (> 1 kb) intercistronic regions; (3) operons with a second 3' end formation site close to the trans-splice site; (4) alternative operons, in which the exons are sometimes spliced as a single gene and sometimes as two genes; (5) SL1-type operons, which use SL1 instead of SL2 to trans-splice and in which there is no intercistronic space; (6) operons that make dicistronic mRNAs; and (7) non-operon gene clusters, in which either two genes use a single exon as the 3' end of one and the 5' end of the next, or the 3' UTR of one gene serves as the outron of the next. Each of these variations is relatively infrequent, but together they show a remarkable variety of tight-linkage gene arrangements in the C. elegans genome.

  7. The amt gene cluster of the heterocyst-forming cyanobacterium Anabaena sp. strain PCC 7120.

    PubMed

    Paz-Yepes, Javier; Merino-Puerto, Victoria; Herrero, Antonia; Flores, Enrique

    2008-10-01

    The genome of the heterocyst-forming cyanobacterium Anabaena sp. strain PCC 7120 bears a gene cluster including three amt genes that, based on homology of their protein products, we designate amt4, amt1, and amtB. Expression of the three genes took place upon ammonium withdrawal in combined nitrogen-free medium and was NtcA dependent. The genes were transcribed independently, but an amt4-amt1 dicistronic transcript was also produced, and expression was highest for the amt1 gene. A mutant with the whole amt region removed could grow under laboratory conditions using ammonium, nitrate, or dinitrogen as the nitrogen source.

  8. The amt Gene Cluster of the Heterocyst-Forming Cyanobacterium Anabaena sp. Strain PCC 7120

    PubMed Central

    Paz-Yepes, Javier; Merino-Puerto, Victoria; Herrero, Antonia; Flores, Enrique

    2008-01-01

    The genome of the heterocyst-forming cyanobacterium Anabaena sp. strain PCC 7120 bears a gene cluster including three amt genes that, based on homology of their protein products, we designate amt4, amt1, and amtB. Expression of the three genes took place upon ammonium withdrawal in combined nitrogen-free medium and was NtcA dependent. The genes were transcribed independently, but an amt4-amt1 dicistronic transcript was also produced, and expression was highest for the amt1 gene. A mutant with the whole amt region removed could grow under laboratory conditions using ammonium, nitrate, or dinitrogen as the nitrogen source. PMID:18689479

  9. Genetic localization and in vivo characterization of a Monascus azaphilone pigment biosynthetic gene cluster.

    PubMed

    Balakrishnan, Bijinu; Karki, Suman; Chiu, Shih-Hau; Kim, Hyun-Ju; Suh, Jae-Won; Nam, Bora; Yoon, Yeo-Min; Chen, Chien-Chi; Kwon, Hyung-Jin

    2013-07-01

    Monascus spp. produce several well-known polyketides such as monacolin K, citrinin, and azaphilone pigments. In this study, the azaphilone pigment biosynthetic gene cluster was identified through T-DNA random mutagenesis in Monascus purpureus. The albino mutant W13 bears a T-DNA insertion upstream of a transcriptional regulator gene (mppR1). The transcription of mppR1 and the nearby polyketide synthase gene (MpPKS5) was significantly repressed in the W13 mutant. Targeted inactivation of MpPKS5 also gave rise to an albino mutant, confirming that mppR1 and MpPKS5 belong to an azaphilone pigment biosynthetic gene cluster. This M. purpureus sequence was used to identify the whole biosynthetic gene cluster in the Monascus pilosus genome. MpPKS5 contains SAT/KS/AT/PT/ACP/MT/R domains, and this domain organization is preserved in other azaphilone polyketide synthases. This biosynthetic gene cluster also encodes fatty acid synthase (FAS), which is predicted to assist the synthesis of 3-oxooctanoyl-CoA and 3-oxodecanoyl-CoA. These 3-oxoacyl compounds are proposed to be incorporated into the azaphilone backbone to complete the pigment biosynthesis. A monooxygenase gene (an azaH and tropB homolog) that is located far downstream of the FAS gene is proposed to be involved in pyrone ring formation. A homology search on other fungal genome sequences suggests that this azaphilone pigment gene cluster also exists in the Penicillium marneffei and Talaromyces stipitatus genomes.

  10. Molecular cloning and identification of the laspartomycin biosynthetic gene cluster from Streptomyces viridochromogenes

    PubMed Central

    Wang, Yang; Chen, Ying; Shen, Qirong; Yin, Xihou

    2011-01-01

    The biosynthetic gene cluster for laspartomycins, a family of 11 amino acid peptide antibiotics, has been cloned and sequenced from Streptomyces viridochromogenes ATCC 29814. Annotation of a segment of 88912 bp of S. viridochromogenes genomic sequence revealed the putative las cluster and its flanking regions which harbor 43 open reading frames. The lpm cluster, which spans approximately 60 kb, consists of 21 open reading frames. Those include four NRPS genes (lpmA/orf18, lpmB/orf25, lpmC/orf26 and lpmD/orf27), four genes (orfs 21, 22, 24 and 29) involved in the lipid tail biosynthesis and attachment, four regulatory genes (orfs 13, 19, 32 and 33) and three putative exporters or self-resistance genes (orfs 14, 20 and 30). In addition, the gene involved in the biosynthesis of the nonproteinogenic amino acid Pip was also identified in the lpm cluster while the genes necessary for the biosynthesis of the rare residue diaminopropionic acid (Dap) were found to reside elsewhere on the chromosome. Interestingly, the dabA, dabB and dabC genes predicted to code for the biosynthesis of the unusual amino acid diaminobutyric acid (Dab) are organized into the lpm cluster even though the Dab residue was not found in the laspartomycins. Disruption of the NRPS lpmC gene completely abolished laspartomycin production in the corresponding mutant strain. These findings will allow molecular engineering and combinatorial biosynthesis approaches to expand the structural diversity of the amphomycin-group peptide antibiotics including the laspartomycins and friulimicins. PMID:21640802

  11. Clustering by fast search and merge of local density peaks for gene expression microarray data.

    PubMed

    Mehmood, Rashid; El-Ashram, Saeed; Bie, Rongfang; Dawood, Hussain; Kos, Anton

    2017-04-19

    Clustering is an unsupervised approach to classify elements based on their similarity, and it is used to find the intrinsic patterns of data. Clustering has numerous applications in bioinformatics, pattern recognition, and astronomy. This paper presents a clustering approach based on the idea that density-wise single or multiple connected regions make a cluster, in which the density-maximum point represents the center of the corresponding density region. More precisely, our approach first finds the local density regions and subsequently merges the density-connected regions to form meaningful clusters. This idea empowers the clustering procedure: outliers are automatically detected, denser regions are intuitively determined and merged to form clusters of arbitrary shape, and clusters are identified regardless of the dimensionality of the space in which they are embedded. Extensive experiments are performed on several complex data sets to analyze and compare our approach with the state-of-the-art clustering methods. In addition, we benchmarked the algorithm on gene expression microarray data sets for cancer subtyping; to distinguish normal tissues from tumor; and to classify multiple tissue data sets.

  12. Clustering by fast search and merge of local density peaks for gene expression microarray data

    PubMed Central

    Mehmood, Rashid; El-Ashram, Saeed; Bie, Rongfang; Dawood, Hussain; Kos, Anton

    2017-01-01

    Clustering is an unsupervised approach to classify elements based on their similarity, and it is used to find the intrinsic patterns of data. Clustering has numerous applications in bioinformatics, pattern recognition, and astronomy. This paper presents a clustering approach based on the idea that density-wise single or multiple connected regions make a cluster, in which the density-maximum point represents the center of the corresponding density region. More precisely, our approach first finds the local density regions and subsequently merges the density-connected regions to form meaningful clusters. This idea empowers the clustering procedure: outliers are automatically detected, denser regions are intuitively determined and merged to form clusters of arbitrary shape, and clusters are identified regardless of the dimensionality of the space in which they are embedded. Extensive experiments are performed on several complex data sets to analyze and compare our approach with the state-of-the-art clustering methods. In addition, we benchmarked the algorithm on gene expression microarray data sets for cancer subtyping; to distinguish normal tissues from tumor; and to classify multiple tissue data sets. PMID:28422088
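The density-peak idea these two records build on, in which the point of maximum local density acts as a cluster center, can be sketched as follows. For each point, the sketch computes a local density ρ (neighbors within a cutoff distance) and a distance δ to the nearest denser point. It picks centers with large ρ·δ and lets every other point inherit the label of its nearest denser neighbor. This is a basic density-peak scheme, not the authors' method: the merging of density-connected regions is not implemented. The cutoff `dc`, the number of centers, and the toy points are hypothetical.

```python
from math import dist


def density_peaks(points, dc, n_centers):
    """Basic density-peak clustering; returns one integer label per point."""
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    # rho: local density, the number of other points closer than the cutoff dc.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < dc) for i in range(n)]
    order = sorted(range(n), key=lambda i: -rho[i])  # densest first
    # delta: distance to the nearest point that precedes i in density order.
    delta = [0.0] * n
    nearest_higher = [None] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(d[i])  # densest point gets its largest distance
            continue
        j = min(order[:rank], key=lambda k: d[i][k])
        delta[i] = d[i][j]
        nearest_higher[i] = j
    # Centers: highest rho * delta; the densest point is always kept as a center.
    by_gamma = sorted(range(n), key=lambda i: -rho[i] * delta[i])
    centers = [order[0]] + [i for i in by_gamma if i != order[0]][: n_centers - 1]
    labels = [None] * n
    for label, c in enumerate(centers):
        labels[c] = label
    # Every other point inherits the label of its nearest denser neighbor.
    for i in order:
        if labels[i] is None:
            labels[i] = labels[nearest_higher[i]]
    return labels


# Hypothetical toy data: two well-separated groups of four points.
pts = [(0, 0), (0.5, 0), (0, 0.5), (0.4, 0.4),
       (10, 10), (10.5, 10), (10, 10.5), (10.4, 10.4)]
print(density_peaks(pts, dc=1.0, n_centers=2))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Processing points from densest to sparsest guarantees that each point's nearest denser neighbor is already labeled when it is reached, so a single pass assigns every label.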

  13. Identification of a Cellobiose Utilization Gene Cluster with Cryptic β-Galactosidase Activity in Vibrio fischeri

    PubMed Central

    Adin, Dawn M.; Visick, Karen L.; Stabb, Eric V.

    2008-01-01

    Cellobiose utilization is a variable trait that is often used to differentiate members of the family Vibrionaceae. We investigated how Vibrio fischeri ES114 utilizes cellobiose and found a cluster of genes required for growth on this β-1,4-linked glucose disaccharide. This cluster includes genes annotated as a phosphotransferase system II (celA, celB, and celC), a glucokinase (celK), and a glucosidase (celG). Directly downstream of celCBGKA is celI, which encodes a LacI family regulator that represses cel transcription in the absence of cellobiose. When the celCBGKAI gene cluster was transferred to cellobiose-negative strains of Vibrio and Photobacterium, the cluster conferred the ability to utilize cellobiose. Genomic analyses of naturally cellobiose-positive Vibrio species revealed that V. salmonicida has a homolog of the celCBGKAI cluster, but V. vulnificus does not. Moreover, bioinformatic analyses revealed that CelG and CelK share the greatest homology with glucosidases and glucokinases in the phylum Firmicutes. These observations suggest that distinct genes for cellobiose utilization have been acquired by different lineages within the family Vibrionaceae. In addition, the loss of the celI regulator, but not the structural genes, attenuated the ability of V. fischeri to compete for colonization of its natural host, Euprymna scolopes, suggesting that repression of the cel gene cluster is important in this symbiosis. Finally, we show that the V. fischeri cellobiase (CelG) preferentially cleaves β-d-glucose linkages but also cleaves β-d-galactose-linked substrates such as 5-bromo-4-chloro-3-indolyl-β-d-galactoside (X-gal), a finding that has important implications for the use of lacZ as a marker or reporter gene in V. fischeri. PMID:18487409

  14. The Eucalyptus grandis NBS-LRR Gene Family: Physical Clustering and Expression Hotspots

    PubMed Central

    Christie, Nanette; Tobias, Peri A.; Naidoo, Sanushka; Külheim, Carsten

    2016-01-01

    Eucalyptus grandis is a commercially important hardwood species and is known to be susceptible to a number of pests and pathogens. Determining mechanisms of defense is therefore a research priority. The published genome for E. grandis has aided the identification of one important class of resistance (R) genes that incorporate nucleotide binding sites and leucine-rich repeat domains (NBS-LRR). Using an iterative search process we identified NBS-LRR gene models within the E. grandis genome. We characterized the gene models and identified their genomic arrangement. The gene expression patterns were examined in E. grandis clones, challenged with a fungal pathogen (Chrysoporthe austroafricana) and insect pest (Leptocybe invasa). One thousand two hundred and fifteen putative NBS-LRR coding sequences were located which aligned into two large classes, Toll or interleukin-1 receptor (TIR) and coiled-coil (CC) based on NB-ARC domains. NBS-LRR gene-rich regions were identified with 76% organized in clusters of three or more genes. A further 272 putative incomplete resistance genes were also identified. We determined that E. grandis has a higher ratio of TIR to CC classed genes compared to other woody plant species as well as a smaller percentage of single NBS-LRR genes. Transcriptome profiles indicated expression hotspots, within physical clusters, including expression of many incomplete genes. The clustering of putative NBS-LRR genes correlates with differential expression responses in resistant and susceptible plants indicating functional relevance for the physical arrangement of this gene family. This analysis of the repertoire and expression of E. grandis putative NBS-LRR genes provides an important resource for the identification of novel and functional R-genes; a key objective for strategies to enhance resilience. PMID:26793216
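The abstract reports that 76% of the NBS-LRR genes lie in clusters of three or more genes. The exact clustering criterion is not given here. A minimal sketch of one common way to compute such a figure is shown below. As an assumption, it places consecutive genes on the same chromosome in one physical cluster when their start positions are no more than a chosen gap apart. The gap threshold, coordinates, and function names are hypothetical.

```python
from itertools import groupby


def physical_clusters(genes, max_gap):
    """genes: (chromosome, start) tuples; split a chromosome where the gap exceeds max_gap."""
    clusters = []
    for chrom, group in groupby(sorted(genes), key=lambda g: g[0]):
        current = []
        for _, start in group:
            if current and start - current[-1] > max_gap:
                clusters.append((chrom, current))
                current = []
            current.append(start)
        clusters.append((chrom, current))
    return clusters


def fraction_in_clusters(genes, max_gap, min_size=3):
    """Fraction of genes that fall in clusters of at least min_size genes."""
    clusters = physical_clusters(genes, max_gap)
    clustered = sum(len(c) for _, c in clusters if len(c) >= min_size)
    return clustered / len(genes)


# Hypothetical gene start coordinates (bp).
genes = [("chr1", 100), ("chr1", 5000), ("chr1", 9000), ("chr1", 500000),
         ("chr2", 1000), ("chr2", 3000), ("chr2", 6000), ("chr2", 8000)]
print(fraction_in_clusters(genes, max_gap=20000))  # 0.875
```

The reported percentage depends directly on the gap threshold and minimum cluster size, so both should be stated alongside any such figure.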

  15. Sequencing and mapping hemoglobin gene clusters in the Australian model dasyurid marsupial Sminthopsis macroura

    SciTech Connect

    De Leo, A.A.; Wheeler, D.; Lefevre, C.; Cheng, Jan-Fang; Hope, R.; Kuliwaba, J.; Nicholas, K.R.; Westermanc, M.; Graves, J.A.M.

    2004-07-26

    Comparing globin genes and their flanking sequences across many species has allowed globin gene evolution to be reconstructed in great detail. Marsupial globin sequences have proved to be of exceptional significance. A previous finding of a beta-like omega gene in the alpha cluster in the tammar wallaby suggested that the alpha and beta clusters evolved via genome duplication and loss rather than tandem duplication. To confirm and extend this important finding we isolated and sequenced BACs containing the alpha and beta loci from the distantly related Australian marsupial Sminthopsis macroura. We report that the alpha gene lies in the same BAC as the beta-like omega gene, implying that the alpha-omega juxtaposition is likely to be conserved in all marsupials. The LUC7L gene was found 3' of the S. macroura alpha locus, a gene order shared with humans but not mouse, chicken or fugu. Sequencing a BAC contig that contained the S. macroura beta globin and epsilon globin loci showed that the globin cluster is flanked by olfactory genes, demonstrating a gene arrangement conserved for over 180 MY. Analysis of the region 5' to the S. macroura epsilon globin gene revealed a region similar to the eutherian LCR, containing sequences and potential transcription factor binding sites with homology to eutherian hypersensitive sites 1 to 5. FISH mapping of BACs containing S. macroura alpha and beta globin genes located the beta globin cluster on chromosome 3q and the alpha locus close to the centromere on 1q, resolving contradictory map locations obtained by previous radioactive in situ hybridization.

  16. Regulation of Three Nitrogenase Gene Clusters in the Cyanobacterium Anabaena variabilis ATCC 29413

    PubMed Central

    Thiel, Teresa; Pratte, Brenda S.

    2014-01-01

    The filamentous cyanobacterium Anabaena variabilis ATCC 29413 fixes nitrogen under aerobic conditions in specialized cells called heterocysts that form in response to an environmental deficiency in combined nitrogen. Nitrogen fixation is mediated by the enzyme nitrogenase, which is very sensitive to oxygen. Heterocysts are microxic cells that allow nitrogenase to function in a filament composed primarily of vegetative cells that produce oxygen by photosynthesis. A. variabilis is unique among well-characterized cyanobacteria in that it has three nitrogenase gene clusters that encode different nitrogenases, which function under different environmental conditions. The nif1 genes encode a Mo-nitrogenase that functions only in heterocysts, even in filaments grown anaerobically. The nif2 genes encode a different Mo-nitrogenase that functions in vegetative cells, but only in filaments grown under anoxic conditions. An alternative V-nitrogenase is encoded by vnf genes that are expressed only in heterocysts in an environment that is deficient in Mo. Thus, these three nitrogenases are expressed differentially in response to environmental conditions. The entire nif1 gene cluster, comprising at least 15 genes, is primarily under the control of the promoter for the first gene, nifB1. Transcriptional control of many of the downstream nif1 genes occurs by a combination of weak promoters within the coding regions of some downstream genes and by RNA processing, which is associated with increased transcript stability. The vnf genes show a similar pattern of transcriptional and post-transcriptional control of expression, suggesting that the complex pattern of regulation of the nif1 cluster is conserved in other cyanobacterial nitrogenase gene clusters. PMID:25513762

  17. Epigenetic Characterization of the Growth Hormone Gene Identifies SmcHD1 as a Regulator of Autosomal Gene Clusters

    PubMed Central

    Massah, Shabnam; Hollebakken, Robert; Labrecque, Mark P.; Kolybaba, Addie M.; Beischlag, Timothy V.; Prefontaine, Gratien G.

    2014-01-01

    Regulatory elements for the mouse growth hormone (GH) gene are located distally in a putative locus control region (LCR), in addition to key elements in the promoter-proximal region. The role of promoter DNA methylation in GH gene regulation is not well understood. Pit-1 is a POU transcription factor required for normal pituitary development and obligatory for GH gene expression. In mammals, Pit-1 mutations eliminate GH production, resulting in a dwarf phenotype. In this study, dwarf mice showed that Pit-1 function was obligatory for GH promoter hypomethylation. By monitoring promoter methylation levels during developmental GH expression, we found that the GH promoter became hypomethylated coincident with gene expression. We identified a promoter differentially methylated region (DMR) that was used to characterize a methylation-dependent DNA binding activity. Upon DNA affinity purification using the DMR and nuclear extracts, we identified structural maintenance of chromosomes hinge domain containing 1 (SmcHD1). To better understand the role of SmcHD1 in genome-wide gene expression, we performed microarray analysis and compared changes in gene expression upon reduced levels of SmcHD1 in human cells. Knock-down of SmcHD1 in human embryonic kidney (HEK293) cells revealed that a disproportionate number of up-regulated genes were located on the X chromosome, but also suggested regulation of genes on non-sex chromosomes. Among those, we identified several genes located in the protocadherin β cluster. In addition, we found that imprinted genes in the H19/Igf2 cluster associated with Beckwith-Wiedemann and Silver-Russell syndromes (BWS & SRS) were dysregulated. For the first time using human cells, we showed that SmcHD1 is an important regulator of imprinted and clustered genes. PMID:24818964

  18. Methods for simultaneously identifying coherent local clusters with smooth global patterns in gene expression profiles.

    PubMed

    Tien, Yin-Jing; Lee, Yun-Shien; Wu, Han-Ming; Chen, Chun-Houh

    2008-03-20

    The hierarchical clustering tree (HCT) with a dendrogram [1] and the singular value decomposition (SVD) with a dimension-reduced representative map [2] are popular methods for two-way sorting of the gene-by-array matrix map used in gene expression profiling. While HCT dendrograms tend to optimize local coherent clustering patterns, the leading SVD eigenvectors usually identify better global grouping and transitional structures. This study proposes a flipping mechanism for a conventional agglomerative HCT that uses the rank-two ellipse (R2E, an improved SVD algorithm for sorting purposes) seriation by Chen [3] as an external reference. While HCTs always produce permutations with good local behaviour, the rank-two ellipse seriation gives the best global grouping patterns and smooth transitional trends. The resulting algorithm automatically integrates the desirable properties of each method, so that users have access to a clustering and visualization environment for gene expression profiles that preserves coherent local clusters and identifies global grouping trends. We demonstrate, through four examples, that the proposed method not only possesses better numerical and statistical properties but also provides more meaningful biomedical insights than other sorting algorithms. We suggest that sorted proximity matrices for genes and arrays, in addition to the gene-by-array expression matrix, can greatly aid the search for a comprehensive understanding of gene expression structures. Software for the proposed methods can be obtained at http://gap.stat.sinica.edu.tw/Software/GAP.
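    The general idea of flipping a dendrogram against an external ordering can be illustrated with a short sketch. This is a simplified stand-in, not Chen's R2E seriation or the authors' algorithm: it assumes the dendrogram is given as nested (left, right) tuples of leaf indices and that a one-dimensional reference score per leaf (for example, a projection onto a leading singular vector) is already available. The helper names leaves and flip_to_reference and the example scores are hypothetical. At each internal node the two children are swapped, if needed, so that the child with the lower mean reference score comes first; cluster membership never changes, only the left/right display order.

```python
from statistics import mean


def leaves(node):
    """Return the leaf indices under a node (an int leaf or a (left, right) tuple)."""
    if isinstance(node, int):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def flip_to_reference(node, ref):
    """Recursively orient every internal node so that the child whose leaves have
    the smaller mean reference score is placed first.

    Only sibling order is changed; the tree topology (and therefore every
    cluster) is left intact, which preserves the local coherence of the HCT.
    """
    if isinstance(node, int):
        return node
    left = flip_to_reference(node[0], ref)
    right = flip_to_reference(node[1], ref)
    if mean(ref[i] for i in leaves(left)) > mean(ref[i] for i in leaves(right)):
        left, right = right, left
    return (left, right)


# Hypothetical dendrogram over five genes and hypothetical reference scores.
tree = ((3, (0, 4)), (1, 2))
ref = [0.1, 0.9, 0.7, 0.5, 0.2]
print(leaves(flip_to_reference(tree, ref)))  # [0, 4, 3, 2, 1]
```

    Swapping at the mean score of each subtree is only one possible rule; the published method may weigh the reference ordering differently.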

  19. Bacillus subtilis acyl carrier protein is encoded in a cluster of lipid biosynthesis genes.

    PubMed Central

    Morbidoni, H R; de Mendoza, D; Cronan, J E

    1996-01-01

    A cluster of Bacillus subtilis fatty acid synthetic genes was isolated by complementation of an Escherichia coli fabD mutant encoding a thermosensitive malonyl coenzyme A-acyl carrier protein transacylase. The B. subtilis genomic segment contains genes that encode three fatty acid synthetic proteins, malonyl coenzyme A-acyl carrier protein transacylase (fabD), 3-ketoacyl-acyl carrier protein reductase (fabG), and the N-terminal 14 amino acid residues of acyl carrier protein (acpP). Also present is a sequence that encodes a homolog of E. coli plsX, a gene that plays a poorly understood role in phospholipid synthesis. The B. subtilis plsX gene weakly complemented an E. coli plsX mutant. The order of genes in the cluster is plsX fabD fabG acpP, the same order found in E. coli, except that in E. coli the fabH gene lies between plsX and fabD. The absence of fabH in the B. subtilis cluster is consistent with the different fatty acid compositions of the two organisms. The amino acid sequence of B. subtilis acyl carrier protein was obtained by sequencing the purified protein, and the sequence obtained strongly resembled that of E. coli acyl carrier protein, except that most of the protein retained the initiating methionine residue. The B. subtilis fab cluster was mapped to the 135 to 145 degrees region of the chromosome. PMID:8759840

  20. Hessian regularization based non-negative matrix factorization for gene expression data clustering.

    PubMed

    Liu, Xiao; Shi, Jun; Wang, Congzhi

    2015-01-01

    Because a key step in the analysis of gene expression data is to detect groups of genes with similar expression patterns, clustering techniques are commonly used to analyze such data. Data representation plays an important role in clustering analysis. Non-negative matrix factorization (NMF) is a widely used data representation method that has had great success in machine learning. Although the traditional manifold regularization method, Laplacian regularization (LR), can improve the performance of NMF, LR still suffers from weak extrapolating power. Hessian regularization (HR) is a newly developed manifold regularization method whose natural properties give it stronger extrapolating power, especially for small sample data. In this work, we propose the HR-based NMF (HR-NMF) algorithm and apply it to represent gene expression data for a subsequent clustering task. Clustering experiments conducted on five commonly used gene datasets indicate that the proposed HR-NMF outperforms LR-based NMF and the original NMF, which suggests the potential of HR-NMF for gene expression data analysis.
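    For readers unfamiliar with the NMF baseline that this abstract builds on, a minimal sketch of plain NMF followed by clustering may help. It uses the well-known Lee and Seung multiplicative updates for the squared Frobenius loss and assumes NumPy is available. The Hessian regularization term that defines HR-NMF is deliberately omitted, so this is only the unregularized "original NMF" baseline, not the proposed algorithm. The function names, the argmax cluster assignment and the toy matrix are illustrative assumptions.

```python
import numpy as np


def nmf(X, k, n_iter=500, eps=1e-10, seed=0):
    """Factor a non-negative matrix X (genes x samples) as X ~ W @ H.

    Uses the Lee-Seung multiplicative updates for ||X - WH||_F^2. This is the
    unregularized baseline only; HR-NMF would add a Hessian penalty on H.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        # eps guards against division by zero; updates keep W, H non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def cluster_samples(H):
    """Assign each sample (column) to the factor with the largest coefficient."""
    return H.argmax(axis=0)


# Toy expression matrix: genes 0-2 are high in samples 0-1, genes 3-5 in samples 2-3.
X = np.array([
    [5, 5, 0, 0],
    [4, 5, 0, 0],
    [5, 4, 0, 0],
    [0, 0, 5, 4],
    [0, 0, 4, 5],
    [0, 0, 5, 5],
], dtype=float)
W, H = nmf(X, k=2, n_iter=1000)
print(cluster_samples(H))  # samples 0-1 share one label, samples 2-3 the other
```

    Multiplicative updates are simple and keep the factors non-negative automatically, but they can converge slowly and only to a local optimum, so results may depend on the random seed.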

  1. Functional Gene Networks: R/Bioc package to generate and analyse gene networks derived from functional enrichment and clustering

    PubMed Central

    Aibar, Sara; Fontanillo, Celia; Droste, Conrad; De Las Rivas, Javier

    2015-01-01

    Summary: Functional Gene Networks (FGNet) is an R/Bioconductor package that generates gene networks derived from the results of functional enrichment analysis (FEA) and annotation clustering. The sets of genes enriched with specific biological terms (obtained from a FEA platform) are transformed into a network by establishing links between genes based on common functional annotations and common clusters. The network provides a new view of FEA results revealing gene modules with similar functions and genes that are related to multiple functions. In addition to building the functional network, FGNet analyses the similarity between the groups of genes and provides a distance heatmap and a bipartite network of functionally overlapping genes. The application includes an interface to directly perform FEA queries using different external tools: DAVID, GeneTerm Linker, TopGO or GAGE; and a graphical interface to facilitate the use. Availability and implementation: FGNet is available in Bioconductor, including a tutorial. URL: http://bioconductor.org/packages/release/bioc/html/FGNet.html Contact: jrivas@usal.es Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25600944

  2. Organization and Differential Regulation of a Cluster of Lignin Peroxidase Genes of Phanerochaete chrysosporium

    PubMed Central

    Stewart, Philip; Cullen, Daniel

    1999-01-01

    The lignin peroxidases of Phanerochaete chrysosporium are encoded by a minimum of 10 closely related genes. Physical and genetic mapping of a cluster of eight lip genes revealed six genes occurring in pairs and transcriptionally convergent, suggesting that portions of the lip family arose by gene duplication events. The completed sequence of lipG and lipJ, together with previously published sequences, allowed phylogenetic and intron/exon classifications, indicating two main branches within the lip family. Competitive reverse transcription-PCR was used to assess lip transcript levels in both carbon- and nitrogen-limited media. Transcript patterns showed differential regulation of lip genes in response to medium composition. No apparent correlation was observed between genomic organization and transcript levels. Both constitutive and upregulated transcripts, structurally unrelated to peroxidases, were identified within the lip cluster. PMID:10348854

  3. [Isolation of the PQQ biosynthesis gene cluster from Gluconobacter oxydans based on sorbose-dehydrogenase activity].

    PubMed

    Gao, Shuying; Xiong, Xionghua; Wang, Jianhua; Zhang, Weicai

    2010-08-01

    The aim of this study was to isolate the PQQ biosynthesis gene cluster from Gluconobacter oxydans H24 on the basis of sorbose-dehydrogenase activity. A Gluconobacter oxydans H24 genomic DNA library was constructed in the host strain Escherichia coli JM109s, in which the sdh gene had been integrated at the ptsG site of the JM109 chromosome. Clones carrying PQQ biosynthesis genes were isolated by detecting sorbose-dehydrogenase activity and then subcloned. A positive clone was isolated from the Gluconobacter oxydans H24 genomic DNA library. The 5,400-base-pair DNA fragment contains five open reading frames, corresponding to five of the pqq genes (pqqABCDE). The nucleotide and amino acid sequences showed high homology to the pqq genes of other bacteria. The pqqABCDE gene cluster was thus successfully isolated from Gluconobacter oxydans H24 by screening for sorbose-dehydrogenase activity.

  4. Comparative genome sequencing reveals chemotype-specific gene clusters in the toxigenic black mold Stachybotrys.

    PubMed

    Semeiks, Jeremy; Borek, Dominika; Otwinowski, Zbyszek; Grishin, Nick V

    2014-07-12

    The fungal genus Stachybotrys produces several diverse toxins that affect human health. Its strains comprise two mutually-exclusive toxin chemotypes, one producing satratoxins, which are a subclass of trichothecenes, and the other producing the less-toxic atranones. To determine the genetic basis for chemotype-specific differences in toxin production, the genomes of four Stachybotrys strains were sequenced and assembled de novo. Two of these strains produce atranones and two produce satratoxins. Comparative analysis of these four 35-Mbp genomes revealed several chemotype-specific gene clusters that are predicted to make secondary metabolites. The largest, which was named the core atranone cluster, encodes 14 proteins that may suffice to produce all observed atranone compounds via reactions that include an unusual Baeyer-Villiger oxidation. Satratoxins are suggested to be made by products of multiple gene clusters that encode 21 proteins in all, including polyketide synthases, acetyltransferases, and other enzymes expected to modify the trichothecene skeleton. One such satratoxin chemotype-specific cluster is adjacent to the core trichothecene cluster, which has diverged from those of other trichothecene producers to contain a unique polyketide synthase. The results suggest that chemotype-specific gene clusters are likely the genetic basis for the mutually-exclusive toxin chemotypes of Stachybotrys. A unified biochemical model for Stachybotrys toxin production is presented. Overall, the four genomes described here will be useful for ongoing studies of this mold's diverse toxicity mechanisms.

  5. Form gene clustering method about pan-ethnic-group products based on emotional semantic

    NASA Astrophysics Data System (ADS)

    Chen, Dengkai; Ding, Jingjing; Gao, Minzhuo; Ma, Danping; Liu, Donghui

    2016-09-01

    Currently, the use of form knowledge for pan-ethnic-group products depends primarily on a designer's subjective experience, without user participation. Most studies focus on detecting consumers' perceptual demands for the target product category. Here, a form gene clustering method for pan-ethnic-group products based on emotional semantics is constructed. Consumers' perceptual images of pan-ethnic-group products are obtained through product form gene extraction and coding together with computer-aided product form clustering. A case study of form gene clustering for typical pan-ethnic-group products indicates that the method is feasible. This paper opens a new direction for the future development of product form design, improving the agility of the product design process in the era of Industry 4.0.

  6. Whole genome sequence of Desulfovibrio magneticus strain RS-1 revealed common gene clusters in magnetotactic bacteria

    PubMed Central

    Nakazawa, Hidekazu; Arakaki, Atsushi; Narita-Yamada, Sachiko; Yashiro, Isao; Jinno, Koji; Aoki, Natsuko; Tsuruyama, Ai; Okamura, Yoshiko; Tanikawa, Satoshi; Fujita, Nobuyuki; Takeyama, Haruko; Matsunaga, Tadashi

    2009-01-01

    Magnetotactic bacteria are ubiquitous microorganisms that synthesize intracellular magnetite particles (magnetosomes) by accumulating Fe ions from aquatic environments. Recent molecular studies, including comprehensive proteomic, transcriptomic, and genomic analyses, have considerably improved our hypotheses of the magnetosome-formation mechanism. However, most of these studies have been conducted using pure-cultured bacterial strains of α-proteobacteria. Here, we report the whole-genome sequence of Desulfovibrio magneticus strain RS-1, the only isolate of magnetotactic microorganisms classified under δ-proteobacteria. Comparative genomics of the RS-1 and four α-proteobacterial strains revealed the presence of three separate gene regions (nuo and mamAB-like gene clusters, and gene region of a cryptic plasmid) conserved in all magnetotactic bacteria. The nuo gene cluster, encoding NADH dehydrogenase (complex I), was also common to the genomes of three iron-reducing bacteria exhibiting uncontrolled extracellular and/or intracellular magnetite synthesis. A cryptic plasmid, pDMC1, encodes three homologous genes that exhibit high similarities with those of other magnetotactic bacterial strains. In addition, the mamAB-like gene cluster, encoding the key components for magnetosome formation such as iron transport and magnetosome alignment, was conserved only in the genomes of magnetotactic bacteria as a similar genomic island-like structure. Our findings suggest the presence of core genetic components for magnetosome biosynthesis; these genes may have been acquired into the magnetotactic bacterial genomes by multiple gene-transfer events during proteobacterial evolution. PMID:19675025

  7. Organization, expression and evolution of a disease resistance gene cluster in soybean.

    PubMed Central

    Graham, Michelle A; Marek, Laura Fredrick; Shoemaker, Randy C

    2002-01-01

    PCR amplification was previously used to identify a cluster of resistance gene analogues (RGAs) on soybean linkage group J. Resistance to powdery mildew (Rmd-c), Phytophthora stem and root rot (Rps2), and an ineffective nodulation gene (Rj2) map within this cluster. BAC fingerprinting and RGA-specific primers were used to develop a contig of BAC clones spanning this region in cultivar "Williams 82" [rps2, Rmd (adult onset), rj2]. Two cDNAs with homology to the TIR/NBD/LRR family of R-genes have also been mapped to opposite ends of a BAC in the contig Gm_Isb001_091F11 (BAC 91F11). Sequence analyses of BAC 91F11 identified 16 different resistance-like gene (RLG) sequences with homology to the TIR/NBD/LRR family of disease resistance genes. Four of these RLGs represent two potentially novel classes of disease resistance genes: TIR/NBD domains fused in-frame to a putative defense-related protein (NtPRp27-like) and TIR domains fused in-frame to soybean calmodulin Ca(2+)-binding domains. RT-PCR analyses using gene-specific primers allowed us to monitor the expression of individual genes in different tissues and developmental stages. Three genes appeared to be constitutively expressed, while three were differentially expressed. Analyses of the R-genes within this BAC suggest that R-gene evolution in soybean is a complex and dynamic process. PMID:12524363

  8. Spatial Clustering of de Novo Missense Mutations Identifies Candidate Neurodevelopmental Disorder-Associated Genes.

    PubMed

    Lelieveld, Stefan H; Wiel, Laurens; Venselaar, Hanka; Pfundt, Rolph; Vriend, Gerrit; Veltman, Joris A; Brunner, Han G; Vissers, Lisenka E L M; Gilissen, Christian

    2017-09-07

    Haploinsufficiency (HI) is the best characterized mechanism through which dominant mutations exert their effect and cause disease. Non-haploinsufficiency (NHI) mechanisms, such as gain-of-function and dominant-negative mechanisms, are often characterized by the spatial clustering of mutations, thereby affecting only particular regions or base pairs of a gene. Variants leading to haploinsufficiency might occasionally cluster as well, for example in critical domains, but such clustering is on the whole less pronounced, with mutations often spread throughout the gene. Here we exploit this property and develop a method to specifically identify genes with significant spatial clustering patterns of de novo mutations in large cohorts. We apply our method to a dataset of 4,061 de novo missense mutations from published exome studies of trios with intellectual disability and developmental disorders (ID/DD) and successfully identify 15 genes with clustering mutations, including 12 genes for which mutations are known to cause neurodevelopmental disorders. For 11 of these 12, NHI mutation mechanisms have been reported. Additionally, we identify three candidate ID/DD-associated genes, two of which have an established role in neuronal processes. We further observe a higher intolerance to normal genetic variation in the identified genes compared to known genes for which mutations lead to HI. Finally, 3D modeling of these mutations on their protein structures shows that 81% of the observed mutations are unlikely to affect the overall structural integrity and that they therefore most likely act through a mechanism other than HI. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
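    The core idea of testing whether mutations in one gene sit unusually close together can be shown with a generic permutation test. This sketch is not the scoring method of Lelieveld et al.; it is an illustrative assumption that uses the mean pairwise distance between mutated residue positions along the protein sequence as the statistic and compares it with the same number of mutations placed uniformly at random. The function names, the protein length and the example positions are hypothetical.

```python
import itertools
import random


def mean_pairwise_distance(positions):
    """Mean absolute distance over all pairs of mutated residue positions."""
    pairs = list(itertools.combinations(positions, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def clustering_p_value(positions, protein_length, n_perm=10000, seed=1):
    """One-sided permutation p-value for spatial clustering of mutations.

    Counts how often the same number of mutations, placed uniformly at random
    on residues 1..protein_length, are at least as tightly packed (smaller or
    equal mean pairwise distance) as the observed ones. Requires >= 2 positions.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance(positions)
    hits = 0
    for _ in range(n_perm):
        simulated = [rng.randint(1, protein_length) for _ in positions]
        if mean_pairwise_distance(simulated) <= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical de novo missense positions in a 1,000-residue protein.
clustered = [101, 103, 104, 106, 110]
spread = [50, 250, 480, 720, 950]
print(clustering_p_value(clustered, 1000))  # very small: tightly clustered
print(clustering_p_value(spread, 1000))     # large: no evidence of clustering
```

    A real analysis would also need to handle uneven mutability along the gene and correct for testing many genes; the uniform null used here ignores both.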

  9. Genetic recombination as a major cause of mutagenesis in the human globin gene clusters.

    PubMed

    Borg, Joseph; Georgitsi, Marianthi; Aleporou-Marinou, Vassiliki; Kollia, Panagoula; Patrinos, George P

    2009-12-01

    Homologous recombination is a frequent phenomenon in multigene families, and as such it occurs several times in both the alpha- and beta-like globin gene families. On numerous occasions, genetic recombination has been implicated as a major mechanism driving mutagenesis in the human globin gene clusters, either in the form of unequal crossover or gene conversion. Unequal crossover increases or decreases the number of human globin gene copies, in most cases with minor phenotypic consequences, while gene conversion contributes either to maintaining sequence homogeneity or to generating sequence diversity. The role of genetic recombination, particularly gene conversion, in the evolution of the human globin gene families has been discussed elsewhere. Here, we summarize current knowledge and review existing experimental evidence outlining the role of genetic recombination in the mutagenic process in the human globin gene families.

  10. Gene microarray data analysis using parallel point-symmetry-based clustering.

    PubMed

    Sarkar, Anasua; Maulik, Ujjwal

    2015-01-01

    Identification of co-expressed genes is the central goal in microarray gene expression analysis. Point-symmetry-based clustering is an important unsupervised learning technique for recognising symmetrical convex- or non-convex-shaped clusters. To enable fast clustering of large microarray data, we propose a distributed, time-efficient and scalable approach for the point-symmetry-based K-Means algorithm. A natural basis for analysing gene expression data with a symmetry-based algorithm is to group together genes with similar symmetrical expression patterns. This new parallel implementation also achieves linear speedup without sacrificing the quality of the clustering solution on large microarray data sets. The parallel point-symmetry-based K-Means algorithm is compared with another new parallel symmetry-based K-Means and with existing parallel K-Means over eight artificial and benchmark microarray data sets to demonstrate its superiority in both timing and validity. Statistical analysis is also performed to establish the significance of this message-passing-interface (MPI) based point-symmetry K-Means implementation. We also analysed the biological relevance of the clustering solutions.
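    To make "point-symmetry-based" concrete, the assignment step of a symmetry-based K-Means can be sketched serially. The distance below follows a form common in the symmetry-based clustering literature: reflect the point through the candidate centre, average the distances from the reflected point to its knear nearest data points, and multiply by the Euclidean distance to the centre. Whether the authors use exactly this variant (including knear = 2 and the absence of a fallback threshold) is an assumption, and the MPI distribution is not shown; the sketch only notes that each point is assigned independently, which is what makes the step straightforward to split across processes. The function names and toy data are hypothetical.

```python
import math


def ps_distance(x, c, data, knear=2):
    """Point-symmetry distance of point x with respect to centre c.

    Reflects x through c, averages the distances from the reflected point to
    its knear nearest data points (the symmetry term), and scales that by the
    Euclidean distance from x to c.
    """
    reflected = [2 * ci - xi for xi, ci in zip(x, c)]
    nearest = sorted(math.dist(reflected, p) for p in data)[:knear]
    d_sym = sum(nearest) / len(nearest)
    return d_sym * math.dist(x, c)


def assign(data, centres, knear=2):
    """Assign each point to the centre with the smallest point-symmetry distance.

    Every point is handled independently, so in a parallel version the data
    could be partitioned across workers for this step.
    """
    return [
        min(range(len(centres)), key=lambda j: ps_distance(x, centres[j], data, knear))
        for x in data
    ]


# Two hypothetical symmetric groups of points around (0, 0) and (10, 0).
data = [(1, 0), (-1, 0), (0, 1), (0, -1), (11, 0), (9, 0), (10, 1), (10, -1)]
centres = [(0, 0), (10, 0)]
print(assign(data, centres))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

    A full K-Means loop would alternate this assignment with recomputing each centre as the mean of its assigned points until the labels stop changing.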

  11. The genome of Streptococcus pneumoniae is organized in topology-reacting gene clusters.

    PubMed

    Ferrándiz, María-José; Martín-Galiano, Antonio J; Schvartzman, Jorge B; de la Campa, Adela G

    2010-06-01

    The transcriptional response of Streptococcus pneumoniae was examined after exposure to the GyrB-inhibitor novobiocin. Topoisomer distributions of an internal plasmid confirmed DNA relaxation and recovery of the native level of supercoiling at low novobiocin concentrations. This was due to the up-regulation of DNA gyrase and the down-regulation of topoisomerases I and IV. In addition, >13% of the genome exhibited relaxation-dependent transcription. The majority of the responsive genes (>68%) fell into 15 physical clusters (14.6-85.6 kb) that underwent coordinated regulation, independently of operon organization. These genomic clusters correlated with AT content and codon composition, showing the chromosome to be organized into topology-reacting gene clusters that respond to DNA supercoiling. In particular, down-regulated clusters were flanked by 11-40 kb AT-rich zones that might have a putative structural function. This is the first case where genes responding to changes in the level of supercoiling in a coordinated manner were found organized as functional clusters. Such an organization revealed DNA supercoiling as a general feature that controls gene expression superimposed on other kinds of more specific regulatory mechanisms.

  12. The genome of Streptococcus pneumoniae is organized in topology-reacting gene clusters

    PubMed Central

    Ferrándiz, María-José; Martín-Galiano, Antonio J.; Schvartzman, Jorge B.; de la Campa, Adela G.

    2010-01-01

    The transcriptional response of Streptococcus pneumoniae was examined after exposure to the GyrB-inhibitor novobiocin. Topoisomer distributions of an internal plasmid confirmed DNA relaxation and recovery of the native level of supercoiling at low novobiocin concentrations. This was due to the up-regulation of DNA gyrase and the down-regulation of topoisomerases I and IV. In addition, >13% of the genome exhibited relaxation-dependent transcription. The majority of the responsive genes (>68%) fell into 15 physical clusters (14.6–85.6 kb) that underwent coordinated regulation, independently of operon organization. These genomic clusters correlated with AT content and codon composition, showing the chromosome to be organized into topology-reacting gene clusters that respond to DNA supercoiling. In particular, down-regulated clusters were flanked by 11–40 kb AT-rich zones that might have a putative structural function. This is the first case where genes responding to changes in the level of supercoiling in a coordinated manner were found organized as functional clusters. Such an organization revealed DNA supercoiling as a general feature that controls gene expression superimposed on other kinds of more specific regulatory mechanisms. PMID:20176571

  13. Delineation of metabolic gene clusters in plant genomes by chromatin signatures

    PubMed Central

    Yu, Nan; Nützmann, Hans-Wilhelm; MacDonald, James T.; Moore, Ben; Field, Ben; Berriri, Souha; Trick, Martin; Rosser, Susan J.; Kumar, S. Vinod; Freemont, Paul S.; Osbourn, Anne

    2016-01-01

    Plants are a tremendous source of diverse chemicals, including many natural product-derived drugs. It has recently become apparent that the genes for the biosynthesis of numerous different types of plant natural products are organized as metabolic gene clusters, thereby unveiling a highly unusual form of plant genome architecture and offering novel avenues for discovery and exploitation of plant specialized metabolism. Here we show that these clustered pathways are characterized by distinct chromatin signatures of histone 3 lysine 27 trimethylation (H3K27me3) and the histone H2A variant H2A.Z, associated with cluster repression and activation, respectively, and represent discrete windows of co-regulation in the genome. We further demonstrate that knowledge of these chromatin signatures along with chromatin mutants can be used to mine genomes for cluster discovery. The roles of H3K27me3 and H2A.Z in repression and activation of single genes in plants are well known. However, our discovery of highly localized operon-like co-regulated regions of chromatin modification is unprecedented in plants. Our findings raise intriguing parallels with groups of physically linked multi-gene complexes in animals and with clustered pathways for specialized metabolism in filamentous fungi. PMID:26895889

  14. Streptococcus mutans serotype c tagatose 6-phosphate pathway gene cluster.

    PubMed Central

    Jagusztyn-Krynicka, E K; Hansen, J B; Crow, V L; Thomas, T D; Honeyman, A L; Curtiss, R

    1992-01-01

    DNA cloned into Escherichia coli K-12 from a serotype c strain of Streptococcus mutans encodes three enzyme activities for galactose utilization via the tagatose 6-phosphate pathway: galactose 6-phosphate isomerase, tagatose 6-phosphate kinase, and tagatose-1,6-bisphosphate aldolase. The genes coding for the tagatose 6-phosphate pathway were located on a 3.28-kb HindIII DNA fragment. Analysis of the tagatose proteins expressed by recombinant plasmids in minicells was used to determine the sizes of the various gene products. Mutagenesis of these plasmids with transposon Tn5 was used to determine the order of the tagatose genes. Tagatose 6-phosphate isomerase appears to be composed of 14- and 19-kDa subunits. The sizes of the kinase and aldolase were found to be 34 and 36 kDa, respectively. These values correspond to those reported previously for the tagatose pathway enzymes in Staphylococcus aureus and Lactococcus lactis. PMID:1328153

  15. Clustering of nitrogen fixation (nif) genes in Rhizobium meliloti.

    PubMed Central

    Corbin, D; Ditta, G; Helinski, D R

    1982-01-01

    A cloned 17.3-kilobase region of the Rhizobium meliloti genome with homology to the Klebsiella pneumoniae nitrogenase structural genes was studied. Limits on the extent of homology were determined. Transposon mutagenesis of this region of the genome verified that it contained functional nif genes. Some transposon insertions resulted in a defective symbiotic phenotype, whereas others had no noticeable effect on symbiotic competence. The relative position of insertions yielding these two phenotypic classes suggested that at least three distinct units of gene expression are present in this region. Hybridization of RNA from alfalfa root nodules and from vegetatively grown Rhizobium to this cloned DNA showed that at least 11.1 kilobases of the region was transcribed actively and that transcription was specific for the symbiotic state. PMID:6274844

  16. Contributions of vertical descent, horizontal transfer and gene loss to the distribution of mycotoxin biosynthetic gene clusters in Fusarium

    USDA-ARS's Scientific Manuscript database

    The genus Fusarium produces a diverse array of mycotoxins and other secondary metabolites, but individual species contribute to only a small fraction of this diversity. Here, we employed comparative genomic and phylogenetic analyses to investigate the distribution and evolution of gene clusters resp...

  17. The Naphthalene Catabolic (nag) Genes of Polaromonas naphthalenivorans CJ2: Evolutionary Implications for Two Gene Clusters and Novel Regulatory Control

    PubMed Central

    Jeon, Che Ok; Park, Minjeong; Ro, Hyun-Su; Park, Woojun; Madsen, Eugene L.

    2006-01-01

    Polaromonas naphthalenivorans CJ2, found to be responsible for the degradation of naphthalene in situ at a coal tar waste-contaminated site (C.-O. Jeon et al., Proc. Natl. Acad. Sci. USA 100:13591-13596, 2003), is able to grow on mineral salts agar media with naphthalene as the sole carbon source. Beginning from a 484-bp nagAc-like region, we used a genome walking strategy to sequence genes encoding the entire naphthalene degradation pathway and additional flanking regions. We found that the naphthalene catabolic genes in P. naphthalenivorans CJ2 were divided into one large and one small gene cluster, separated by an unknown distance. The large gene cluster (nagRAaGHAbAcAdBFCQEDJI′ORF1tnpA) is bounded by a LysR-type regulator (nagR). The small cluster (nagR2ORF2I″KL) is bounded by a MarR-type regulator (nagR2). The catabolic genes of P. naphthalenivorans CJ2 were homologous to many of those of Ralstonia U2, which uses the gentisate pathway to convert naphthalene to central metabolites. However, three open reading frames (nagY, nagM, and nagN), present in Ralstonia U2, were absent. Also, P. naphthalenivorans carries two copies of gentisate dioxygenase (nagI) with 77.4% DNA sequence identity to one another and 82% amino acid identity to their homologue in Ralstonia sp. strain U2. Investigation of the operons using reverse transcription PCR showed that each cluster was controlled independently by its respective promoter. Insertional inactivation and lacZ reporter assays showed that nagR2 is a negative regulator and that expression of the small cluster is not induced by naphthalene, salicylate, or gentisate. Association of two putative Azoarcus-related transposases with the large cluster and one Azoarcus-related putative salicylate 5-hydroxylase gene (ORF2) in the small cluster suggests that mobile genetic elements were likely involved in creating the novel arrangement of catabolic and regulatory genes in P. naphthalenivorans. PMID:16461653

  18. Apple contains receptor-like genes homologous to the Cladosporium fulvum resistance gene family of tomato with a cluster of genes cosegregating with Vf apple scab resistance.

    PubMed

    Vinatzer, B A; Patocchi, A; Gianfranceschi, L; Tartarini, S; Zhang, H B; Gessler, C; Sansavini, S

    2001-04-01

    Scab caused by the fungal pathogen Venturia inaequalis is the most common disease of cultivated apple (Malus × domestica Borkh.). Monogenic resistance against scab is found in some small-fruited wild Malus species and has been used in apple breeding for scab resistance. Vf resistance of Malus floribunda 821 is the most widely used scab resistance source. Because breeding a high-quality cultivar in perennial fruit trees takes dozens of years, cloning disease resistance genes and using them in the transformation of high-quality apple varieties would be advantageous. We report the identification of a cluster of receptor-like genes with homology to the Cladosporium fulvum (Cf) resistance gene family of tomato on bacterial artificial chromosome clones derived from the Vf scab resistance locus. Three members of the cluster were sequenced completely. Similar to the Cf gene family of tomato, the deduced amino acid sequences coded by these genes contain an extracellular leucine-rich repeat domain and a transmembrane domain. The transcription of three members of the cluster was determined by reverse transcription-polymerase chain reaction to be constitutive, and the transcription and translation start of one member was verified by 5' rapid amplification of cDNA ends. We discuss the parallels between Cf resistance of tomato and Vf resistance of apple and the possibility that one of the members of the gene cluster is the Vf gene. Cf homologs from other regions of the apple genome also were identified and are likely to represent other scab resistance genes.

  19. Copy Number Variants in the Kallikrein Gene Cluster

    PubMed Central

    Lindahl, Pernilla; Säll, Torbjörn; Bjartell, Anders; Johansson, Anna M.; Lilja, Hans; Halldén, Christer

    2013-01-01

    The kallikrein gene family (KLK1-KLK15) is the largest contiguous group of protease genes within the human genome and is associated with both risk and outcome of cancer and other diseases. We searched for copy number variants in all KLK genes using quantitative PCR analysis and analysis of inheritance patterns of single nucleotide polymorphisms. Two deletions were identified: one 2235-bp deletion in KLK9 present in 1.2% of alleles, and one 3394-bp deletion in KLK15 present in 4.0% of alleles. Each deletion eliminated one complete exon and created out-of-frame coding that eliminated the catalytic triad of the resulting truncated gene product, which therefore likely is a non-functional protein. Deletion breakpoints identified by DNA sequencing located the KLK9 deletion breakpoint to a long interspersed element (LINE) repeated sequence, while the deletion in KLK15 is located in a single copy sequence. To search for an association between each deletion and risk of prostate cancer (PC), we analyzed a cohort of 667 biopsied men (266 PC cases and 401 men with no evidence of PC at biopsy) using short deletion-specific PCR assays. There was no association between evidence of PC in this cohort and the presence of either gene deletion. Haplotyping revealed a single origin of each deletion, with most recent common ancestor estimates of 3000-8000 and 6000-14 000 years for the deletions in KLK9 and KLK15, respectively. The presence of the deletions on the same haplotypes in 1000 Genomes data of both European and African populations indicates an early origin of both deletions. The old age in combination with homozygous presence of loss-of-function variants suggests that some kallikrein-related peptidases have non-essential functions. PMID:23894413

  20. The evolution and maintenance of Hox gene clusters in vertebrates and the teleost-specific genome duplication.

    PubMed

    Kuraku, Shigehiro; Meyer, Axel

    2009-01-01

    Hox genes are known to specify spatial identities along the anterior-posterior axis during embryogenesis. In vertebrates and most other deuterostomes, they are arranged in sets of uninterrupted clusters on chromosomes, and are in most cases expressed in a "colinear" fashion, in which genes closer to the 3′ end of the Hox clusters are expressed earlier and more anteriorly and genes closer to the 5′ end of the clusters later and more posteriorly. In this review, we summarize the current understanding of how Hox gene clusters have been modified from basal lineages of deuterostomes to diverse taxa of vertebrates. Our parsimony reconstruction of Hox cluster architecture at various stages of vertebrate evolution highlights that the variation in Hox cluster structures among jawed vertebrates is mostly due to secondary lineage-specific gene losses and an additional genome duplication that occurred in the actinopterygian stem lineage, the teleost-specific genome duplication (TSGD).

  1. Sequencing, physical organization and kinetic expression of the patulin biosynthetic gene cluster from Penicillium expansum.

    PubMed

    Tannous, Joanna; El Khoury, Rhoda; Snini, Selma P; Lippi, Yannick; El Khoury, André; Atoui, Ali; Lteif, Roger; Oswald, Isabelle P; Puel, Olivier

    2014-10-17

    Patulin is a polyketide-derived mycotoxin produced by numerous filamentous fungi. Among them, Penicillium expansum is by far the most problematic species. This fungus is a destructive phytopathogen capable of growing on fruit, provoking the blue mold decay of apples and producing significant amounts of patulin. The biosynthetic pathway of this mycotoxin is chemically well-characterized, but its genetic bases remain largely unknown, with only a few genes characterized, in economically less relevant species. The present study consisted of the identification and positional organization of the patulin gene cluster in P. expansum strain NRRL 35695. Several amplification reactions were performed with degenerate primers designed on the basis of sequences from the orthologous genes available in other species. An improved genome walking approach was used to sequence the remaining adjacent genes of the cluster. RACE-PCR was also carried out from mRNAs to determine the start and stop codons of the coding sequences. The patulin gene cluster in P. expansum consists of 15 genes in the following order: patH, patG, patF, patE, patD, patC, patB, patA, patM, patN, patO, patL, patI, patJ, and patK. These genes share 60-70% identity with orthologous genes that are grouped differently within a putative patulin cluster described in a non-producing strain of Aspergillus clavatus. The kinetics of patulin cluster gene expression was studied under patulin-permissive conditions (natural apple-based medium) and patulin-restrictive conditions (Eagle's minimal essential medium), and demonstrated a significant association between gene expression and patulin production. In conclusion, the sequence of the patulin cluster in P. expansum constitutes a key step for a better understanding of the mechanisms leading to patulin production in this fungus. It will allow the role of each gene to be elucidated, and help to define strategies to reduce patulin production in apple-based products.

  2. Organization of the human keratin type II gene cluster at 12q13

    SciTech Connect

    Yoon, S.J.; LeBlanc-Straceski, J.; Krauter, K.

    1994-12-01

    Keratin proteins constitute intermediate filaments and are the major differentiation products of mammalian epithelial cells. The epithelial keratins are classified into two groups, type I and type II, and one member of each group is expressed in a given epithelial cell differentiation stage. Mutations in type I and type II keratin genes have now been implicated in three different human genetic disorders, epidermolysis bullosa simplex, epidermolytic hyperkeratosis, and epidermolytic palmoplantar keratoderma. Members of the type I keratins are mapped to human chromosome 17, and the type II keratin genes are mapped to chromosome 12. To understand the organization of the type II keratin genes on chromosome 12, we isolated several yeast artificial chromosomes carrying these keratin genes and examined them in detail. We show that eight already known type II keratin genes are located in a cluster at 12q13, and their relative organization reflects their evolutionary relationship. We also determined that a type I keratin gene, KRT18, is located next to its partner, KRT8, in this cluster. Careful examination of the cluster also revealed that there may be a number of additional keratin genes at this locus that have not been described previously. 41 refs., 3 figs., 1 tab.

  3. Optimization of gene set annotations via entropy minimization over variable clusters (EMVC).

    PubMed

    Frost, H Robert; Moore, Jason H

    2014-06-15

    Gene set enrichment has become a critical tool for interpreting the results of high-throughput genomic experiments. Inconsistent annotation quality and lack of annotation specificity, however, limit the statistical power of enrichment methods and make it difficult to replicate enrichment results across biologically similar datasets. We propose a novel algorithm for optimizing gene set annotations to best match the structure of specific empirical data sources. Our proposed method, entropy minimization over variable clusters (EMVC), filters the annotations for each gene set to minimize a measure of entropy across disjoint gene clusters computed for a range of cluster sizes over multiple bootstrap resampled datasets. As shown using simulated gene sets with simulated data and Molecular Signatures Database collections with microarray gene expression data, the EMVC algorithm accurately filters annotations unrelated to the experimental outcome resulting in increased gene set enrichment power and better replication of enrichment results. http://cran.r-project.org/web/packages/EMVC/index.html. © The Author 2014. Published by Oxford University Press.

  4. Comprehensive annotation of secondary metabolite biosynthetic genes and gene clusters of Aspergillus nidulans, A. fumigatus, A. niger and A. oryzae

    PubMed Central

    2013-01-01

    Background Secondary metabolite production, a hallmark of filamentous fungi, is an expanding area of research for the Aspergilli. These compounds are potent chemicals, ranging from deadly toxins to therapeutic antibiotics to potential anti-cancer drugs. The genome sequences for multiple Aspergilli have been determined, and provide a wealth of predictive information about secondary metabolite production. Sequence analysis and gene overexpression strategies have enabled the discovery of novel secondary metabolites and the genes involved in their biosynthesis. The Aspergillus Genome Database (AspGD) provides a central repository for gene annotation and protein information for Aspergillus species. These annotations include Gene Ontology (GO) terms, phenotype data, gene names and descriptions and they are crucial for interpreting both small- and large-scale data and for aiding in the design of new experiments that further Aspergillus research. Results We have manually curated Biological Process GO annotations for all genes in AspGD with recorded functions in secondary metabolite production, adding new GO terms that specifically describe each secondary metabolite. We then leveraged these new annotations to predict roles in secondary metabolism for genes lacking experimental characterization. As a starting point for manually annotating Aspergillus secondary metabolite gene clusters, we used antiSMASH (antibiotics and Secondary Metabolite Analysis SHell) and SMURF (Secondary Metabolite Unknown Regions Finder) algorithms to identify potential clusters in A. nidulans, A. fumigatus, A. niger and A. oryzae, which we subsequently refined through manual curation. Conclusions This set of 266 manually curated secondary metabolite gene clusters will facilitate the investigation of novel Aspergillus secondary metabolites. PMID:23617571

  5. Exome-based linkage disequilibrium maps of individual genes: functional clustering and relationship to disease.

    PubMed

    Gibson, Jane; Tapper, William; Ennis, Sarah; Collins, Andrew

    2013-02-01

    Exome sequencing identifies thousands of DNA variants and a proportion of these are involved in disease. Genotypes derived from exome sequences provide particularly high-resolution coverage enabling study of the linkage disequilibrium structure of individual genes. The extent and strength of linkage disequilibrium reflects the combined influences of mutation, recombination, selection and population history. By constructing linkage disequilibrium maps of individual genes, we show that genes containing OMIM-listed disease variants are significantly under-represented amongst genes with complete or very strong linkage disequilibrium (P = 0.0004). In contrast, genes with disease variants are significantly over-represented amongst genes with levels of linkage disequilibrium close to the average for genes not known to contain disease variants (P = 0.0038). Functional clustering reveals, amongst genes with particularly strong linkage disequilibrium, significant enrichment of essential biological functions (e.g. phosphorylation, cell division, cellular transport and metabolic processes). Strong linkage disequilibrium, corresponding to reduced haplotype diversity, may reflect selection in utero against deleterious mutations which have profound impact on the function of essential genes. Genes with very weak linkage disequilibrium show enrichment of functions requiring greater allelic diversity (e.g. sensory perception and immune response). This category is not enriched for genes containing disease variation. In contrast, there is significant enrichment of genes containing disease variants amongst genes with more average levels of linkage disequilibrium. Mutations in these genes may be less likely to lead to in utero lethality and may be subject to less intense selection.

  6. A Nomadic Subtelomeric Disease Resistance Gene Cluster in Common Bean

    PubMed Central

    David, Perrine; Chen, Nicolas W.G.; Pedrosa-Harand, Andrea; Thareau, Vincent; Sévignac, Mireille; Cannon, Steven B.; Debouck, Daniel; Langin, Thierry; Geffroy, Valérie

    2009-01-01

    The B4 resistance (R) gene cluster is one of the largest clusters known in common bean (Phaseolus vulgaris [Pv]). It is located in a peculiar genomic environment in the subtelomeric region of the short arm of chromosome 4, adjacent to two heterochromatic blocks (knobs). We sequenced 650 kb spanning this locus and annotated 97 genes, 26 of which correspond to Coiled-Coil-Nucleotide-Binding-Site-Leucine-Rich-Repeat (CNL). Conserved microsynteny was observed between the Pv B4 locus and corresponding regions of Medicago truncatula and Lotus japonicus in chromosomes Mt6 and Lj2, respectively. The notable exception was the CNL sequences, which were completely absent in these regions. The origin of the Pv B4-CNL sequences was investigated through phylogenetic analysis, which reveals that, in the Pv genome, paralogous CNL genes are shared among nonhomologous chromosomes (4 and 11). Together, our results suggest that Pv B4-CNL was derived from CNL sequences from another cluster, the Co-2 cluster, through an ectopic recombination event. Integration of the soybean (Glycine max) genome data enables us to date more precisely this event and also to infer that a single CNL moved from the Co-2 to the B4 cluster. Moreover, we identified a new 528-bp satellite repeat, referred to as khipu, specific to the Phaseolus genus, present both between B4-CNL sequences and in the two knobs identified at the B4 R gene cluster. The khipu repeat is present on most chromosomal termini, indicating the existence of frequent ectopic recombination events in Pv subtelomeric regions. Our results highlight the importance of ectopic recombination in R gene evolution. PMID:19776165

  7. Teaching Gene Technology in an Outreach Lab: Students' Assigned Cognitive Load Clusters and the Clusters' Relationships to Learner Characteristics, Laboratory Variables, and Cognitive Achievement

    ERIC Educational Resources Information Center

    Scharfenberg, Franz-Josef; Bogner, Franz X.

    2013-01-01

    This study classified students into different cognitive load (CL) groups by means of cluster analysis based on their experienced CL in a gene technology outreach lab whose instruction was designed according to CL theory. The relationships of the identified student CL clusters to learner characteristics, laboratory variables, and…

  9. A carotenogenic gene cluster from Brevibacterium linens with novel lycopene cyclase genes involved in the synthesis of aromatic carotenoids.

    PubMed

    Krubasik, P; Sandmann, G

    2000-04-01

    The carotenogenic (crt) gene cluster from Brevibacterium linens, a member of the commercially important group of coryneform bacteria, was cloned and identified. An expression library of B. linens genes was constructed and a fragment of the crt cluster was obtained by functional complementation of a colourless B. flavum mutant, screening transformed cells for production of a yellow pigment. Subsequent screening of a cosmid library resulted in the cloning of the whole crt cluster from B. linens. All genes necessary for the synthesis of the aromatic carotenoid isorenieratene were identified on the basis of sequence homologies. In addition a novel type of lycopene cyclase was identified by complementation of a lycopene-accumulating B. flavum mutant. Two genes, named crtYc and crtYd, which code for polypeptides of 125 and 107 amino acids, respectively, are necessary to convert lycopene to beta-carotene. The amino acid sequences of these polypeptides show no similarity to any of the known lycopene cyclases. This is the first example of a carotenoid biosynthetic conversion in which two different gene products are involved, probably forming a heterodimer.

  10. Identification of a gene cluster associated with triclosan catabolism.

    PubMed

    Kagle, Jeanne M; Paxson, Clayton; Johnstone, Precious; Hay, Anthony G

    2015-06-01

    Aerobic degradation of bis-aryl ethers like the antimicrobial triclosan typically proceeds through oxygenase-dependent catabolic pathways. Although several studies have reported on bacteria capable of degrading triclosan aerobically, there are no reports describing the genes responsible for this process. In this study, a gene encoding the large subunit of a putative triclosan oxygenase, designated tcsA, was identified in a triclosan-degrading fosmid clone from a DNA library of Sphingomonas sp. RD1. Consistent with tcsA's similarity to two-part dioxygenases, a putative FMN-dependent ferredoxin reductase, designated tcsB, was found immediately downstream of tcsA. Both tcsA and tcsB were found in the midst of a putative chlorocatechol degradation operon. We show that RD1 produces hydroxytriclosan and chlorocatechols during triclosan degradation and that tcsA is induced by triclosan. This is the first study to report on the genetics of triclosan degradation.

  11. Isolation of Hox Cluster Genes from Insects Reveals an Accelerated Sequence Evolution Rate

    PubMed Central

    Hadrys, Heike; Simon, Sabrina; Kaune, Barbara; Schmitt, Oliver; Schöner, Anja; Jakob, Wolfgang; Schierwater, Bernd

    2012-01-01

    Among gene families it is the Hox genes and among metazoan animals it is the insects (Hexapoda) that have attracted particular attention for studying the evolution of development. Surprisingly though, no Hox genes have been isolated from 26 out of 35 insect orders yet, and the existing sequences derive mainly from only two orders (61% from Hymenoptera and 22% from Diptera). We have designed insect specific primers and isolated 37 new partial homeobox sequences of Hox cluster genes (lab, pb, Hox3, ftz, Antp, Scr, abd-a, Abd-B, Dfd, and Ubx) from six insect orders, which are crucial to insect phylogenetics. These new gene sequences provide a first step towards comparative Hox gene studies in insects. Furthermore, comparative distance analyses of homeobox sequences reveal a correlation between gene divergence rate and species radiation success with insects showing the highest rate of homeobox sequence evolution. PMID:22685537

  12. A Large Cluster of Highly Expressed Genes Is Dispensable for Growth and Development in Aspergillus Nidulans

    PubMed Central

    Aramayo, R.; Adams, T. H.; Timberlake, W. E.

    1989-01-01

    We investigated the functions of the highly expressed, sporulation-specific SpoC1 genes of Aspergillus nidulans by deleting the entire 38-kb SpoC1 gene cluster. The resultant mutant strain did not differ from the wild type in (1) growth rate, (2) morphology of specialized reproductive structures formed during completion of the asexual or sexual life cycles, (3) sporulation efficiency, (4) spore viability or (5) spore resistance to environmental stress. Thus, deletion of the SpoC1 gene cluster, representing 0.15% of the A. nidulans genome, had no readily detectable phenotypic effects. Implications of this result are discussed in the context of major alterations in gene expression that occur during A. nidulans development. PMID:2471671

  13. Engineering a regulatory region of jadomycin gene cluster to improve jadomycin B production in Streptomyces venezuelae.

    PubMed

    Zheng, Jian-Ting; Wang, Sheng-Lan; Yang, Ke-Qian

    2007-09-01

    Streptomyces venezuelae ISP5230 produces a group of jadomycin congeners with cytotoxic activities. To improve the jadomycin fermentation process, a genetic engineering strategy was designed to replace a 3.4-kb regulatory region of the jad gene cluster, containing four regulatory genes (the 3'-terminal 272 bp of jadW2, jadW3, jadR2, and jadR1) and the native promoter upstream of jadJ (P(J)), with the ermEp* promoter sequence, so that ermEp* drives the expression of the jadomycin biosynthetic genes from jadJ in the engineered strain. As expected, the mutant strain produced jadomycin B without ethanol treatment, and the yield increased to about twofold that of the stressed wild type. These results indicate that manipulating the regulation of a biosynthetic gene cluster is an effective strategy to increase product yield.

  14. Genetic and Transcriptional Analyses of the Flagellar Gene Cluster in Actinoplanes missouriensis.

    PubMed

    Jang, Moon-Sun; Mouri, Yoshihiro; Uchida, Kaoru; Aizawa, Shin-Ichi; Hayakawa, Masayuki; Fujita, Nobuyuki; Tezuka, Takeaki; Ohnishi, Yasuo

    2016-08-15

    Actinoplanes missouriensis, a Gram-positive and soil-inhabiting bacterium, is a member of the rare actinomycetes. The filamentous cells produce sporangia, which contain hundreds of flagellated spores that can swim rapidly for a short period of time until they find niches for germination. These swimming cells are called zoospores, and the mechanism of this unique temporal flagellation has not been elucidated. Here, we report all of the flagellar genes in the bacterial genome and their expected function and contribution for flagellar morphogenesis. We identified a large flagellar gene cluster composed of 33 genes that encode the majority of proteins essential for assembling the functional flagella of Gram-positive bacteria. One noted exception to the cluster was the location of the fliQ gene, which was separated from the cluster. We examined the involvement of four genes in flagellar biosynthesis by gene disruption, fliQ, fliC, fliK, and lytA. Furthermore, we performed a transcriptional analysis of the flagellar genes using RNA samples prepared from A. missouriensis grown on a sporangium-producing agar medium for 1, 3, 6, and 40 days. We demonstrated that the transcription of the flagellar genes was activated in conjunction with sporangium formation. Eleven transcriptional start points of the flagellar genes were determined using the rapid amplification of cDNA 5' ends (RACE) procedure, which revealed the highly conserved promoter sequence CTCA(N15-17)GCCGAA. This result suggests that a sigma factor is responsible for the transcription of all flagellar genes and that the flagellar structure assembles simultaneously. The biology of a zoospore is very interesting from the viewpoint of morphogenesis, survival strategy, and evolution. Here, we analyzed flagellar genes in A. missouriensis, which produces sporangia containing hundreds of flagellated spores each. Zoospores released from the sporangia swim for a short time before germination occurs. We identified a large

  15. Genetic and Transcriptional Analyses of the Flagellar Gene Cluster in Actinoplanes missouriensis

    PubMed Central

    Jang, Moon-Sun; Mouri, Yoshihiro; Uchida, Kaoru; Aizawa, Shin-Ichi; Hayakawa, Masayuki; Fujita, Nobuyuki; Tezuka, Takeaki

    2016-01-01

    ABSTRACT Actinoplanes missouriensis, a Gram-positive and soil-inhabiting bacterium, is a member of the rare actinomycetes. The filamentous cells produce sporangia, which contain hundreds of flagellated spores that can swim rapidly for a short period of time until they find niches for germination. These swimming cells are called zoospores, and the mechanism of this unique temporal flagellation has not been elucidated. Here, we report all of the flagellar genes in the bacterial genome and their expected function and contribution for flagellar morphogenesis. We identified a large flagellar gene cluster composed of 33 genes that encode the majority of proteins essential for assembling the functional flagella of Gram-positive bacteria. One noted exception to the cluster was the location of the fliQ gene, which was separated from the cluster. We examined the involvement of four genes in flagellar biosynthesis by gene disruption, fliQ, fliC, fliK, and lytA. Furthermore, we performed a transcriptional analysis of the flagellar genes using RNA samples prepared from A. missouriensis grown on a sporangium-producing agar medium for 1, 3, 6, and 40 days. We demonstrated that the transcription of the flagellar genes was activated in conjunction with sporangium formation. Eleven transcriptional start points of the flagellar genes were determined using the rapid amplification of cDNA 5′ ends (RACE) procedure, which revealed the highly conserved promoter sequence CTCA(N15–17)GCCGAA. This result suggests that a sigma factor is responsible for the transcription of all flagellar genes and that the flagellar structure assembles simultaneously. IMPORTANCE The biology of a zoospore is very interesting from the viewpoint of morphogenesis, survival strategy, and evolution. Here, we analyzed flagellar genes in A. missouriensis, which produces sporangia containing hundreds of flagellated spores each. Zoospores released from the sporangia swim for a short time before germination occurs
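    The conserved promoter CTCA(N15-17)GCCGAA reported above is a gapped motif: a fixed 4-mer, a spacer of 15 to 17 arbitrary bases, then a fixed 6-mer. As an illustration only (not the authors' procedure), the sketch below scans one DNA strand for that pattern with a regular expression; the function name find_motif is hypothetical, and scanning only the given strand (no reverse complement) is a simplifying assumption.

```python
import re

# CTCA + 15-17 arbitrary bases + GCCGAA. The lookahead lets hits that start at
# different positions overlap; at a given start the greedy {15,17} quantifier
# reports the longest spacer that still fits.
MOTIF = re.compile(r"(?=(CTCA[ACGT]{15,17}GCCGAA))")


def find_motif(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for every hit on this strand."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(seq)]


if __name__ == "__main__":
    demo = "GG" + "CTCA" + "A" * 16 + "GCCGAA" + "TT"
    print(find_motif(demo))
```

    A real promoter search would also scan the reverse complement and tolerate mismatches; both are left out to keep the pattern itself visible.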

  16. A mixture model with random-effects components for clustering correlated gene-expression profiles.

    PubMed

    Ng, S K; McLachlan, G J; Wang, K; Ben-Tovim Jones, L; Ng, S-W

    2006-07-15

    The clustering of gene profiles across some experimental conditions of interest contributes significantly to the elucidation of unknown gene function, the validation of gene discoveries and the interpretation of biological processes. However, this clustering problem is not straightforward as the profiles of the genes are not all independently distributed and the expression levels may have been obtained from an experimental design involving replicated arrays. Ignoring the dependence between the gene profiles and the structure of the replicated data can result in important sources of variability in the experiments being overlooked in the analysis, with the consequent possibility of misleading inferences being made. We propose a random-effects model that provides a unified approach to the clustering of genes with correlated expression levels measured in a wide variety of experimental situations. Our model is an extension of the normal mixture model to account for the correlations between the gene profiles and to enable covariate information to be incorporated into the clustering process. Hence the model is applicable to longitudinal studies with or without replication, for example, time-course experiments by using time as a covariate, and to cross-sectional experiments by using categorical covariates to represent the different experimental classes. We show that our random-effects model can be fitted by maximum likelihood via the EM algorithm for which the E (expectation) and M (maximization) steps can be implemented in closed form. Hence our model can be fitted deterministically without the need for time-consuming Monte Carlo approximations. The effectiveness of our model-based procedure for the clustering of correlated gene profiles is demonstrated on three real datasets, representing typical microarray experimental designs, covering time-course, repeated-measurement and cross-sectional data. In these examples, relevant clusters of the genes are obtained, which are
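    The abstract builds on the ordinary normal mixture model fitted by EM, where both steps have closed forms. The sketch below shows only that baseline, a two-component univariate normal mixture, and not the authors' random-effects extension: the E-step computes each point's posterior membership, and the M-step re-estimates weights, means and variances as responsibility-weighted averages. The function names, the min/max initialization and the small variance floor are illustrative assumptions.

```python
import math


def log_normal_pdf(x, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def em_two_normals(xs, n_iter=200):
    """EM for a plain two-component 1-D normal mixture (no random effects)."""
    n = len(xs)
    mean = sum(xs) / n
    v0 = sum((x - mean) ** 2 for x in xs) / n
    mu = [min(xs), max(xs)]  # assumed initialization: data extremes
    var = [v0, v0]
    pi = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior responsibilities, normalized in log space.
        resp = []
        for x in xs:
            logw = [math.log(pi[k]) + log_normal_pdf(x, mu[k], var[k]) for k in range(2)]
            top = max(logw)
            w = [math.exp(lw - top) for lw in logw]
            s = sum(w)
            resp.append([wk / s for wk in w])
        # M-step: closed-form weighted updates.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12  # guard against an empty component
            pi[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk, 1e-6)
    return pi, mu, var


if __name__ == "__main__":
    data = [0.0, 0.1, -0.1, 0.2, -0.2, 10.0, 10.1, 9.9, 10.2, 9.8]
    print(em_two_normals(data))
```

    Because every update is an explicit formula, the fit is deterministic, which is the property the abstract highlights for its extended model.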

  17. Efficient Mining of Discriminative Co-clusters from Gene Expression Data.

    PubMed

    Odibat, Omar; Reddy, Chandan K

    2014-12-01

    Discriminative models are used to analyze the differences between two classes and to identify class-specific patterns. Most of the existing discriminative models depend on using the entire feature space to compute the discriminative patterns for each class. Co-clustering has been proposed to capture the patterns that are correlated in a subset of features, but it cannot handle discriminative patterns in labeled datasets. In certain biological applications such as gene expression analysis, it is critical to consider the discriminative patterns that are correlated only in a subset of the feature space. The objective of this paper is two-fold: first, it presents an algorithm to efficiently find arbitrarily positioned co-clusters from complex data. Second, it extends this co-clustering algorithm to discover discriminative co-clusters by incorporating the class information into the co-cluster search process. In addition, we also characterize the discriminative co-clusters and propose three novel measures that can be used to evaluate the performance of any discriminative subspace pattern mining algorithm. We evaluated the proposed algorithms on several synthetic and real gene expression datasets, and our experimental results showed that the proposed algorithms outperformed several existing algorithms available in the literature.

  18. Nearest hyperplane distance neighbor clustering algorithm applied to gene co-expression analysis in Alzheimer's disease.

    PubMed

    Pasluosta, Cristian F; Dua, Prerna; Lukiw, Walter J

    2011-01-01

    Microarray analysis can contribute considerably to the understanding of biologically significant cellular mechanisms that yield novel information regarding co-regulated sets of gene patterns. Clustering is one of the most popular tools for analyzing DNA microarray data. In this paper, we present an unsupervised clustering algorithm based on the K-local hyperplane distance nearest-neighbor classifier (HKNN). We adapted the well-known nearest neighbor clustering algorithm for use with hyperplane distance. The result is a simple and computationally inexpensive unsupervised clustering algorithm that can be applied to high-dimensional data. It has been reported that the NFkB1 gene is progressively over-expressed in moderate-to-severe Alzheimer's disease (AD) cases, and that the NF-kB complex plays a key role in neuroinflammatory responses in AD pathogenesis. In this study, we apply the proposed clustering algorithm to identify co-expression patterns involving NFkB1 in gene expression data from hippocampal tissue samples. Finally, we validate our results with a biomedical literature search.
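    HKNN replaces point-to-point distance with the distance from a query to the local hyperplane (affine hull) through its K nearest neighbors in a group. The sketch below computes that distance in plain Python by Gram-Schmidt orthonormalization of the neighbor differences and then measuring the residual of the projected query. It omits any regularization term and is not the authors' clustering procedure; the helper names are hypothetical.

```python
import math


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hyperplane_distance(x, points):
    """Euclidean distance from x to the affine hull of `points`."""
    origin = points[0]
    basis = []  # orthonormal basis of span{p - origin}
    for p in points[1:]:
        v = sub(p, origin)
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-12:  # skip neighbors that add no new direction
            basis.append([vi / norm for vi in v])
    r = sub(x, origin)
    for b in basis:  # remove the component lying in the hyperplane
        c = dot(r, b)
        r = [ri - c * bi for ri, bi in zip(r, b)]
    return math.sqrt(dot(r, r))


def k_local_hyperplane_distance(x, candidates, k):
    """Distance from x to the hyperplane through its k nearest candidates."""
    nearest = sorted(candidates, key=lambda p: dot(sub(p, x), sub(p, x)))[:k]
    return hyperplane_distance(x, nearest)


if __name__ == "__main__":
    plane = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 100]]
    print(k_local_hyperplane_distance([0.5, 0.5, 2.0], plane, k=3))
```

    With K = 1 this reduces to ordinary nearest-neighbor distance, so K controls how much local linear structure the measure takes into account.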

  19. Calcitonin gene-related peptide antagonism and cluster headache: an emerging new treatment.

    PubMed

    Ashina, Håkan; Newman, Lawrence; Ashina, Sait

    2017-08-30

    Calcitonin gene-related peptide (CGRP) is a key signaling molecule involved in migraine pathophysiology. The efficacy of CGRP monoclonal antibodies and antagonists in migraine treatment has fueled increasing interest in the prospect of treating cluster headache (CH) with CGRP antagonism. The exact role of CGRP and its mechanism of action in CH have not been fully clarified. A search for original studies and randomized controlled trials (RCTs) published in English was performed in PubMed and ClinicalTrials.gov. The search terms used were "cluster headache and calcitonin gene related peptide" and "primary headaches and calcitonin gene related peptide." Reference lists of identified articles were also searched for additional relevant papers. Human experimental studies have reported elevated plasma CGRP levels during both spontaneous and glyceryl trinitrate-induced cluster attacks. CGRP may play an important role in cluster headache pathophysiology. More refined human studies, with validated assays and larger sample sizes, are warranted. The results from RCTs may reveal the therapeutic potential of CGRP monoclonal antibodies and antagonists for cluster headache treatment.

  20. A novel harmony search-K means hybrid algorithm for clustering gene expression data.

    PubMed

    Nazeer, Ka Abdul; Sebastian, Mp; Kumar, Sd Madhu

    2013-01-01

    Recent progress in bioinformatics research has led to the accumulation of huge quantities of biological data at various data sources. DNA microarray technology makes it possible to analyze a large number of genes simultaneously across different samples. Clustering of microarray data can reveal hidden gene expression patterns in large quantities of expression data, which in turn offers tremendous possibilities in functional genomics, comparative genomics, disease diagnosis and drug development. The k-means clustering algorithm is widely used in many practical applications, but the original k-means algorithm has several drawbacks. It is computationally expensive, and it generates locally optimal solutions that depend on the random choice of the initial centroids. Several methods have been proposed in the literature for improving the performance of the k-means algorithm. A meta-heuristic optimization algorithm named harmony search helps find near-global optimal solutions by searching the entire solution space. The low clustering accuracy of existing algorithms limits their use in many crucial life-science applications. In this paper we propose a novel Harmony Search-K means Hybrid (HSKH) algorithm for clustering gene expression data. Experimental results show that the proposed algorithm produces clusters with better accuracy than the existing algorithms.
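    The dependence on initial centroids described above can be illustrated with plain k-means (Lloyd's algorithm). This is a minimal sketch, not the HSKH algorithm. The function names and the toy one-dimensional data are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _sqdist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], init: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's algorithm started from the given initial centroids."""
    centroids = [list(c) for c in init]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(len(centroids)), key=lambda j: _sqdist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def sse(points: Sequence[Point], centroids: Sequence[Point], labels: Sequence[int]) -> float:
    """Within-cluster sum of squared errors, the objective k-means minimizes locally."""
    return sum(_sqdist(p, centroids[lab]) for p, lab in zip(points, labels))
```

On the points 0, 1, 10, 11, 20 and 21 with k = 3, starting from `[[0], [1], [15]]` leaves two centroids in the first pair and converges with an SSE of 101. Starting from `[[0], [10], [20]]` reaches an SSE of 1.5. The second run finds the global optimum and the first does not. Avoiding that kind of bad local optimum is what the global search in a hybrid such as HSKH is intended to do.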

  1. Characterization of a Major Cluster of nif, fix, and Associated Genes in a Sugarcane Endophyte, Acetobacter diazotrophicus

    PubMed Central

    Lee, Sunhee; Reth, Alexander; Meletzus, Dietmar; Sevilla, Myrna; Kennedy, Christina

    2000-01-01

    A major 30.5-kb cluster of nif and associated genes of Acetobacter diazotrophicus (syn. Gluconacetobacter diazotrophicus), a nitrogen-fixing endophyte of sugarcane, was sequenced and analyzed. This cluster represents the largest assembly of contiguous nif-fix and associated genes so far characterized in any diazotrophic bacterial species. Northern blots and promoter sequence analysis indicated that the genes are organized into eight transcriptional units. The overall arrangement of genes is most like that of the nif-fix cluster in Azospirillum brasilense, while the individual gene products are more similar to those in species of Rhizobiaceae or in Rhodobacter capsulatus. PMID:11092875

  2. Organization of the biosynthetic gene cluster for the macrolide antibiotic spiramycin in Streptomyces ambofaciens.

    PubMed

    Karray, Fatma; Darbon, Emmanuelle; Oestreicher, Nathalie; Dominguez, Hélène; Tuphile, Karine; Gagnat, Josette; Blondelet-Rouault, Marie-Hélène; Gerbaud, Claude; Pernodet, Jean-Luc

    2007-12-01

    Spiramycin, a 16-membered macrolide antibiotic used in human medicine, is produced by Streptomyces ambofaciens; it comprises a polyketide lactone, platenolide, to which three deoxyhexose sugars are attached. In order to characterize the gene cluster governing the biosynthesis of spiramycin, several overlapping cosmids were isolated from an S. ambofaciens gene library, by hybridization with various probes (spiramycin resistance or biosynthetic genes, tylosin biosynthetic genes), and the sequences of their inserts were determined. Sequence analysis showed that the spiramycin biosynthetic gene cluster spanned a region of over 85 kb of contiguous DNA. In addition to the five previously described genes that encode the type I polyketide synthase involved in platenolide biosynthesis, 45 other genes have been identified. It was possible to propose a function for most of the inferred proteins in spiramycin biosynthesis, in its regulation, in resistance to the produced antibiotic or in the provision of extender units for the polyketide synthase. Two of these genes, predicted to be involved in deoxysugar biosynthesis, were inactivated by gene replacement, and the resulting mutants were unable to produce spiramycin, thus confirming their involvement in spiramycin biosynthesis. This work reveals the main features of spiramycin biosynthesis and constitutes a first step towards a detailed molecular analysis of the production of this medically important antibiotic.

  3. Vertebrate GAGA factor associated insulator elements demarcate homeotic genes in the HOX clusters.

    PubMed

    Srivastava, Surabhi; Puri, Deepika; Garapati, Hita Sony; Dhawan, Jyotsna; Mishra, Rakesh K

    2013-04-22

    Hox genes impart segment identity to body structures along the anterior-posterior axis and are crucial for the proper development of all organisms. Multiple regulatory elements, best defined in Drosophila melanogaster, ensure that Hox expression patterns follow the spatial and temporal colinearity reflected in their tight genomic organization. However, the precise mechanisms that regulate colinear patterns of Hox gene expression remain unclear, especially in higher vertebrates where it is not fully determined how the distinct activation domains of the tightly clustered Hox genes are defined independently of each other. Here, we report the identification of a large number of novel cis-elements at mammalian Hox clusters that can help in regulating their precise expression pattern. We have identified DNA elements at all four murine Hox clusters that show poor association with histone H3 in chromatin immunoprecipitation (ChIP)-chip tiling arrays. The majority of these elements lie in the intergenic regions segregating adjacent Hox genes; we demonstrate that they possess efficient enhancer-blocking activity in mammalian cells. Further, we find that these histone-free intergenic regions bear GA repeat motifs and associate with the vertebrate homolog of the GAGA binding boundary factor. This suggests that they can act as GAGA factor-dependent chromatin boundaries that create independent domains, insulating each Hox gene from the influence of neighboring regulatory elements. Our results reveal a large number of potential regulatory elements throughout the murine Hox clusters. We further demarcate the precise location of several novel cis-elements bearing chromatin boundary activity that appear to segregate successive Hox genes. This reflects a pattern reminiscent of the organization of homeotic genes in Drosophila, where such regulatory elements have been characterized. Our findings thus provide new insights into the regulatory processes and evolutionarily conserved

  4. Detection of a Gene Cluster That Is Dispensable for Human Herpesvirus 6 Replication and Latency

    PubMed Central

    Kondo, Kazuhiro; Nozaki, Hideo; Shimada, Kazuya; Yamanishi, Koichi

    2003-01-01

    The U3-U7 gene cluster of human herpesvirus 6 (HHV-6) was replaced with an enhanced green fluorescent protein-puromycin gene cassette containing the cytomegalovirus major immediate-early promoter. Neither viral replication in T cells nor latency and reactivation in macrophages was impaired. During HHV-6 latency, the cytomegalovirus promoter used the transcription start sites employed in cytomegalovirus latency. PMID:12970461

  5. A gene cluster for the synthesis of serotype g-specific polysaccharide antigen in Aggregatibacter actinomycetemcomitans.

    PubMed

    Tsuzukibashi, Osamu; Saito, Masanori; Kobayashi, Taira; Umezawa, Koji; Nagahama, Fumio; Hiroi, Takachika; Hirasawa, Masatomo; Takada, Kazuko

    2014-04-01

    Aggregatibacter actinomycetemcomitans is an important pathogen related to aggressively progressive periodontal breakdown in adolescents and adults. The species can be divided into six serotypes (a-f) according to their surface carbohydrate antigens. Recently, a new serotype g of A. actinomycetemcomitans was proposed. The aim of the present study was to sequence the gene cluster associated with the biosynthesis of the serotype g-specific polysaccharide antigen and to develop serotype-specific primers for a PCR assay to identify serotype g strains of A. actinomycetemcomitans. The serotype-specific polysaccharide (SSPS) gene cluster of strain NUM-Aa 4039 contained 21 genes within 21,842 bp. The SSPS gene cluster sequence was 96.7% similar to that of the serotype e strain. Seventeen serotype g genes showed more than 90% homology to the serotype e strain at both the nucleotide and amino acid levels. Three additional genes, totaling 1,579 bp, were inserted in NUM-Aa 4039 into the region corresponding to ORF13 of the serotype e strain. The serotype g-specific primers were designed from this insertion region of NUM-Aa 4039. Strains of serotypes a-f were not amplified by the serotype g-specific primers; only NUM-Aa 4039 showed an amplicon band. Strain NUM-Aa 4039 thus differed from the serotype e strain by three genes in the SSPS gene cluster. The specific primers derived from these differing regions are useful for identifying serotype g strains of A. actinomycetemcomitans in clinical samples and for studying their distribution.

  6. Cloning of ascidian homeobox genes provides evidence for a primordial chordate cluster.

    PubMed

    Di Gregorio, A; Spagnuolo, A; Ristoratore, F; Pischetola, M; Aniello, F; Branno, M; Cariello, L; Di Lauro, R

    1995-04-24

    In order to isolate genes important in controlling embryonic development in Tunicates, a genomic library from the ascidian Ciona intestinalis was screened with a degenerate oligodeoxyribonucleotide encoding the third helix of Antennapedia-type homeoboxes. Fourteen C. intestinalis homeobox genes, corresponding to several classes of homeodomains, have been identified. Five of the isolated homeoboxes show their highest homology to members of the Vertebrate HOX clusters. mRNAs for two of the isolated homeoboxes are present in unfertilized C. intestinalis eggs.

  7. Identification, isolation, and analysis of a gene cluster involved in iron acquisition by Pseudomonas mendocina ymp.

    PubMed

    Awaya, Jonathan D; Dubois, Jennifer L

    2008-06-01

    Microbial acquisition of iron from natural sources in aerobic environments is a little-studied process that may lead to mineral instability and trace metal mobilization. Pseudomonas mendocina ymp was isolated from the Yucca Mountain Site for long-term nuclear waste storage. Its ability to solubilize a variety of Fe-containing minerals under aerobic conditions has been investigated previously, but its molecular and genetic potential remained uncharacterized. Here, we have shown that the organism produces a hydroxamate-based rather than a catecholate-based siderophore, synthesized via non-ribosomal peptide synthetases. Gene clustering patterns observed in other Pseudomonads suggested that hybridizing multiple probes to the same library could allow for the identification of one or more clusters of syntenic siderophore-associated genes. Using this approach, two independent clusters were identified. An unfinished draft genome sequence of P. mendocina ymp indicated that these mapped to two independent contigs. The sequenced clusters were investigated informatically. One contains a potentially complete set of genes responsible for siderophore biosynthesis, uptake, and regulation. The other contains an incomplete set of genes with low individual homology to siderophore-associated genes. A mutation in the cluster's pvdA homolog (pmhA) resulted in a siderophore-null phenotype, which could be reversed by complementation. The organism likely produces one siderophore, possibly in different isoforms, with a peptide backbone containing seven residues (predicted sequence: Acyl-Asp-Dab-Ser-fOHOrn-Ser-fOHOrn). A similar approach could be applied to discover Fe- and siderophore-associated genes in unsequenced or poorly annotated organisms.

  8. Clusters of point mutations are found exclusively around rearranged antibody variable genes.

    PubMed

    Gearhart, P J; Bogenhagen, D F

    1983-06-01

    We have examined the nucleotide sequences of a series of murine antibody genes derived from one kappa light chain gene in order to gain insight into the mechanism that specifically mutates variable genes. Six rearranged VK167 genes from hybridoma and myeloma cells were cloned from bacteriophage lambda libraries. The sequences were compared to the germ-line sequence of the VK167 gene, the JK genes, and the CK gene to identify sites of mutation. Four of six rearranged genes had extensive mutation which occurred exclusively in a 1-kilobase region of DNA centered around the V-J gene. No mutations were found at more distant sites in the intervening sequence or in the constant gene. The frequency of mutation was approximately 0.5% (32 mutations per 6,749 base pairs). Mutations were mostly due to nucleotide substitutions with no preference for transitions or transversions. The location of mutations around each gene indicates that they occur in clusters at random sites. The observation of mutations in the intervening sequence downstream from the JK5 gene rules out models for the mechanism of mutagenesis that rely solely on gene conversion or recombination. The distribution and high frequency of mutations are most easily explained by a mechanism of error-prone repair that occurs during several cycles of cell division.
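    The reported mutation frequency follows directly from the counts given above:

```latex
f = \frac{32\ \text{mutations}}{6{,}749\ \text{bp}} \approx 4.74 \times 10^{-3} \approx 0.5\%
```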

  9. The Magea gene cluster regulates male germ cell apoptosis without affecting the fertility in mice

    PubMed Central

    Hou, Siyuan; Xian, Li; Shi, Peiliang; Li, Chaojun; Lin, Zhaoyu; Gao, Xiang

    2016-01-01

    While apoptosis is essential for male germ cell development, improper activation of apoptosis in the testis can affect spermatogenesis and cause reproduction defects. Members of the MAGE-A (melanoma antigen family A) gene family are frequently clustered in mammalian genomes. In normal animals they are expressed exclusively in the testes, but they are abnormally activated in a wide variety of cancers. We investigated the potential roles of these genes in spermatogenesis by generating a mouse model with a 210-kb genomic deletion encompassing six members of the Magea gene cluster (Magea1, Magea2, Magea3, Magea5, Magea6 and Magea8). Male mice carrying the deletion displayed smaller testes from 2 months of age, with a marked increase in apoptotic germ cells in the first wave of spermatogenesis. Furthermore, we found that Magea genes prevented stress-induced spermatogenic apoptosis after N-ethyl-N-nitrosourea (ENU) treatment during the adult stage. Mechanistically, deletion of the Magea gene cluster resulted in a dramatic increase in apoptotic germ cells, predominantly spermatocytes, with activation of p53 and induction of Bax in the testes. These observations demonstrate that the Magea genes are crucial in maintaining normal testicular size and protecting germ cells from excessive apoptosis under genotoxic stress. PMID:27226137

  10. Diversity and depth-specific distribution of SAR11 cluster rRNA genes from marine planktonic bacteria

    SciTech Connect

    Field, K.G.; Gordon, D.; Wright, T.

    1997-01-01

    Small-subunit (SSU) ribosomal DNA (rDNA) gene clusters are phylogenetically related sets of SSU rRNA genes, commonly encountered in genes amplified from natural populations. Genetic variability in gene clusters could result from artifacts (polymerase error or PCR chimera formation), microevolution (variation among rrn copies within strains), or macroevolution (genetic divergence correlated with long-term evolutionary divergence). To better understand gene clusters, this study assessed the genetic diversity and distribution of a single environmental SSU rDNA gene cluster, the SAR11 cluster. SAR11 cluster genes, from an uncultured group of the α subclass of the class Proteobacteria, have been recovered from coastal and midoceanic waters of the North Atlantic and Pacific. We cloned and bidirectionally sequenced 23 new SAR11 cluster 16S rRNA genes, from depths of 80 and 250 m in the Sargasso Sea and from surface coastal waters of the Atlantic and Pacific, and analyzed them together with previously published sequences. Two SAR11 genes were obviously PCR chimeras, but the biological (nonchimeric) origins of most subgroups within the cluster were confirmed by independent recovery from separate gene libraries. Using group-specific oligonucleotide probes, we analyzed depth profiles of nucleic acids, targeting both amplified rDNAs and bulk RNAs. Two subgroups within the SAR11 cluster showed different, highly depth-specific distributions. We conclude that some of the genetic diversity within the SAR11 gene cluster represents macroevolutionary divergence correlated with niche specialization. Furthermore, we demonstrate the utility for marine microbial ecology of oligonucleotide probes based on gene sequences amplified from natural populations, and show that detailed knowledge of sequence variability may be needed to design these probes effectively. 48 refs., 7 figs., 3 tabs.

  11. Isolation of Sorangium cellulosum carrying epothilone gene clusters.

    PubMed

    Hyun, Hyesook; Chung, Jinwoo; Kim, Jihoon; Lee, Jong Suk; Kwon, Byoung-Mog; Son, Kwang-Hee; Cho, Kyungyun

    2008-08-01

    Epothilone and its analogs are a potent new class of anticancer compounds produced by myxobacteria. In an effort to identify new myxobacterial strains producing epothilone and its analogs, cellulose-degrading myxobacteria were isolated from Korean soils, and 13 strains carrying epothilone biosynthetic gene homologs were identified by polymerase chain reaction screening. A migration assay revealed that Sorangium cellulosum KYC3013, 3016, 3017, and 3018 all produced microtubule-stabilizing compounds, and LC-MS/MS analysis showed that S. cellulosum KYC3013 synthesized epothilone A.

  12. Identification of the viridicatumtoxin and griseofulvin gene clusters from Penicillium aethiopicum.

    PubMed

    Chooi, Yit-Heng; Cacho, Ralph; Tang, Yi

    2010-05-28

    Penicillium aethiopicum produces two structurally interesting and biologically active polyketides: the tetracycline-like viridicatumtoxin 1 and the classic antifungal agent griseofulvin 2. Here, we report the concurrent discovery of the two corresponding biosynthetic gene clusters (vrt and gsf) by 454 shotgun sequencing. Gene deletions confirmed that two nonreducing PKSs (NRPKSs), vrtA and gsfA, are required for the biosynthesis of 1 and 2, respectively. Both PKSs share similar domain architectures and lack a C-terminal thioesterase domain. We identified gsfI as the chlorinase involved in the biosynthesis of 2, because deletion of gsfI resulted in the accumulation of dechlorogriseofulvin 3. Comparative analysis with the P. chrysogenum genome revealed that both clusters are embedded within conserved syntenic regions of P. aethiopicum chromosomes. Discovery of the vrt and gsf clusters provided the basis for genetic and biochemical studies of the pathways.

  13. TreeParser-Aided Klee Diagrams Display Taxonomic Clusters in DNA Barcode and Nuclear Gene Datasets

    PubMed Central

    Stoeckle, Mark Y.; Coffran, Cameron

    2013-01-01

    Indicator vector analysis of a nucleotide sequence alignment generates a compact heat map, called a Klee diagram, with potential insight into clustering patterns in evolution. However, so far this approach has been applied only to mitochondrial cytochrome c oxidase I (COI) DNA barcode sequences. To explore further, we developed TreeParser, a freely available web-based program that sorts a sequence alignment according to a phylogenetic tree generated from the dataset. We applied TreeParser to nuclear gene and COI barcode alignments from birds and butterflies. Distinct blocks in the resulting Klee diagrams corresponded to species and higher-level taxonomic divisions in both groups, and this enabled graphic comparison of phylogenetic information in nuclear and mitochondrial genes. Our results demonstrate that TreeParser-aided Klee diagrams objectively display taxonomic clusters in nucleotide sequence alignments. This approach may help establish taxonomy in poorly studied groups and investigate higher-level clustering, which appears widespread but is not well understood. PMID:24022383

  14. GenClust: a genetic algorithm for clustering gene expression data.

    PubMed

    Di Gesú, Vito; Giancarlo, Raffaele; Lo Bosco, Giosué; Raimondi, Alessandra; Scaturro, Davide

    2005-12-07

    Clustering is a key step in the analysis of gene expression data, and in fact, many classical clustering algorithms are used, or more innovative ones have been designed and validated for the task. Despite the widespread use of artificial intelligence techniques in bioinformatics and, more generally, data analysis, there are very few clustering algorithms based on the genetic paradigm, yet that paradigm has great potential in finding good heuristic solutions to a difficult optimization problem such as clustering. GenClust is a new genetic algorithm for clustering gene expression data. It has two key features: (a) a novel coding of the search space that is simple, compact and easy to update; (b) it can be used naturally in conjunction with data driven internal validation methods. We have experimented with the FOM methodology, specifically conceived for validating clusters of gene expression data. The validity of GenClust has been assessed experimentally on real data sets, both with the use of validation measures and in comparison with other algorithms, i.e., Average Link, Cast, Click and K-means. Experiments show that none of the algorithms we have used is markedly superior to the others across data sets and validation measures; i.e., in many cases the observed differences between the worst and best performing algorithm may be statistically insignificant and they could be considered equivalent. However, there are cases in which an algorithm may be better than others and therefore worthwhile. In particular, experiments for GenClust show that, although simple in its data representation, it converges very rapidly to a local optimum and that its ability to identify meaningful clusters is comparable, and sometimes superior, to that of more sophisticated algorithms. In addition, it is well suited for use in conjunction with data driven internal validation measures and, in particular, the FOM methodology.

  15. GenClust: A genetic algorithm for clustering gene expression data

    PubMed Central

    Di Gesú, Vito; Giancarlo, Raffaele; Lo Bosco, Giosué; Raimondi, Alessandra; Scaturro, Davide

    2005-01-01

    Background: Clustering is a key step in the analysis of gene expression data, and in fact, many classical clustering algorithms are used, or more innovative ones have been designed and validated for the task. Despite the widespread use of artificial intelligence techniques in bioinformatics and, more generally, data analysis, there are very few clustering algorithms based on the genetic paradigm, yet that paradigm has great potential in finding good heuristic solutions to a difficult optimization problem such as clustering. Results: GenClust is a new genetic algorithm for clustering gene expression data. It has two key features: (a) a novel coding of the search space that is simple, compact and easy to update; (b) it can be used naturally in conjunction with data driven internal validation methods. We have experimented with the FOM methodology, specifically conceived for validating clusters of gene expression data. The validity of GenClust has been assessed experimentally on real data sets, both with the use of validation measures and in comparison with other algorithms, i.e., Average Link, Cast, Click and K-means. Conclusion: Experiments show that none of the algorithms we have used is markedly superior to the others across data sets and validation measures; i.e., in many cases the observed differences between the worst and best performing algorithm may be statistically insignificant and they could be considered equivalent. However, there are cases in which an algorithm may be better than others and therefore worthwhile. In particular, experiments for GenClust show that, although simple in its data representation, it converges very rapidly to a local optimum and that its ability to identify meaningful clusters is comparable, and sometimes superior, to that of more sophisticated algorithms. In addition, it is well suited for use in conjunction with data driven internal validation measures and, in particular, the FOM methodology. PMID:16336639
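    The abstract does not define the FOM methodology. This sketch assumes the commonly used unadjusted form. One experimental condition is held out, genes are clustered on the remaining conditions, and the figure of merit is the root-mean-square deviation of the held-out values from their cluster means. Summing over all held-out conditions gives an aggregate FOM, and lower values are better. `cluster_fn` stands for any clustering routine (GenClust or another algorithm), and all names here are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Row = Sequence[float]


def figure_of_merit(data: Sequence[Row], left_out: int, labels: Sequence[int]) -> float:
    """RMS deviation of column `left_out` from its cluster means.

    `labels` must come from clustering the data WITHOUT column `left_out`.
    """
    groups: Dict[int, List[float]] = {}
    for row, lab in zip(data, labels):
        groups.setdefault(lab, []).append(row[left_out])
    total = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        total += sum((v - mean) ** 2 for v in values)
    return math.sqrt(total / len(data))


def aggregate_fom(data: Sequence[Row],
                  cluster_fn: Callable[[List[List[float]]], List[int]]) -> float:
    """Sum of FOM values over every held-out condition (column)."""
    total = 0.0
    for e in range(len(data[0])):
        # Cluster on all conditions except e, then score the held-out condition e.
        reduced = [[v for j, v in enumerate(row) if j != e] for row in data]
        total += figure_of_merit(data, e, cluster_fn(reduced))
    return total
```

With four genes `[[0, 0], [0, 2], [10, 10], [10, 12]]` and a toy clusterer that splits genes at a first-column value of 3, holding out condition 0 gives a FOM of 0 and holding out condition 1 gives 1, so the aggregate FOM is 1.0. Because the routine takes the clusterer as a parameter, the same internal validation can be applied to any of the compared algorithms.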

  16. Cloning large gene clusters from E. coli using in vitro single-strand overlapping annealing.

    PubMed

    Wang, Rui-Yan; Shi, Zhen-Yu; Chen, Jin-Chun; Chen, Guo-Qiang

    2012-07-20

    Despite recent advances in genomic sequencing and DNA chemical synthesis, construction of large gene clusters containing DNA fragments is still a difficult and expensive task. To tackle this problem, we developed a gene cluster extraction method based on in vitro single-strand overlapping annealing (SSOA). It starts with digesting the target gene cluster in an existing genome, followed by recovering the digested chromosome fragments. Subsequently, the single-strand DNA overhangs formed during digestion are specifically annealed and covalently joined together with a circular and a linear vector, respectively. The SSOA method was successfully applied to clone an 18-kb DNA fragment encoding NADH:ubiquinone oxidoreductase. Genomic DNA fragments of different sizes, including 11.86, 18.33, 28.67, 34.56, and 55.99 kb, were used to test the cloning efficiency. Combined with genetic information from KEGG and the KEIO strain collection, this method will be useful for cloning any specific region of an E. coli genome smaller than ~28 kb. The method provides a cost-effective alternative to chemically synthesized gene clusters for genome assembly.

  17. Resolving misassembled cattle immune gene clusters with hierarchical, long read sequencing

    USDA-ARS's Scientific Manuscript database

    Animal health is a critical component of productivity; however, current genomic selection genotyping tools have a paucity of genetic markers within key immune gene clusters (IGC) involved in the cattle innate and adaptive immune systems. With diseases such as Bovine Tuberculosis and Johne’s disease ...

  18. Characterization of the Tunicamycin Gene Cluster Unveiling Unique Steps Involved in its Biosynthesis

    USDA-ARS's Scientific Manuscript database

    Tunicamycin, a potent reversible translocase I inhibitor, is produced by several Actinomycetes species. The tunicamycin structure is highly unusual, and contains an 11-carbon dialdose sugar and an α,β-1,1-glycosidic linkage. Here we report the identification of a gene cluster essential for tunicamy...

  19. Characterization of the Complete Zwittermicin A Biosynthesis Gene Cluster from Bacillus cereus

    PubMed Central

    Kevany, Brian M.; Rasko, David A.; Thomas, Michael G.

    2009-01-01

    Bacillus cereus UW85 produces the linear aminopolyol antibiotic zwittermicin A (ZmA). This antibiotic has diverse biological activities, such as suppression of disease in plants caused by protists, inhibition of fungal and bacterial growth, and amplification of the insecticidal activity of the toxin protein from Bacillus thuringiensis. ZmA has an unusual chemical structure that includes a D-amino acid and ethanolamine and glycolyl moieties, as well as an unusual terminal amide that is generated from the modification of the nonproteinogenic amino acid β-ureidoalanine. The diverse biological activities and unusual structure of ZmA have stimulated our efforts to understand how this antibiotic is biosynthesized. Here, we present the identification of the complete ZmA biosynthesis gene cluster from B. cereus UW85. A nearly identical gene cluster is identified on a plasmid from B. cereus AH1134, and we show that this strain is also capable of producing ZmA. Bioinformatics and biochemical analyses of the ZmA biosynthesis enzymes strongly suggest that ZmA is initially biosynthesized as part of a larger metabolite that is processed twice, resulting in the formation of ZmA and two additional metabolites. Additionally, we propose that the biosynthesis gene cluster for the production of the amino sugar kanosamine is contained within the ZmA biosynthesis gene cluster in B. cereus UW85. PMID:19098220

  20. Expanded natural product diversity revealed by analysis of lanthipeptide-like gene clusters in actinobacteria.

    PubMed

    Zhang, Qi; Doroghazi, James R; Zhao, Xiling; Walker, Mark C; van der Donk, Wilfred A

    2015-07-01

    Lanthionine-containing peptides (lanthipeptides) are a rapidly growing family of polycyclic peptide natural products belonging to the large class of ribosomally synthesized and posttranslationally modified peptides (RiPPs). Lanthipeptides are widely distributed in taxonomically distant species, and their currently known biosynthetic systems and biological activities are diverse. Building on the recent natural product gene cluster family (GCF) project, we report here large-scale analysis of lanthipeptide-like biosynthetic gene clusters from Actinobacteria. Our analysis suggests that lanthipeptide biosynthetic pathways, and by extrapolation the natural products themselves, are much more diverse than currently appreciated and contain many different posttranslational modifications. Furthermore, lanthionine synthetases are much more diverse in sequence and domain topology than currently characterized systems, and they are used by the biosynthetic machineries for natural products other than lanthipeptides. The gene cluster families described here significantly expand the chemical diversity and biosynthetic repertoire of lanthionine-related natural products. Biosynthesis of these novel natural products likely involves unusual and unprecedented biochemistries, as illustrated by several examples discussed in this study. In addition, class IV lanthipeptide gene clusters are shown not to be silent, setting the stage to investigate their biological activities.

  1. Expanded Natural Product Diversity Revealed by Analysis of Lanthipeptide-Like Gene Clusters in Actinobacteria

    PubMed Central

    Zhang, Qi; Doroghazi, James R.; Zhao, Xiling; Walker, Mark C.

    2015-01-01

    Lanthionine-containing peptides (lanthipeptides) are a rapidly growing family of polycyclic peptide natural products belonging to the large class of ribosomally synthesized and posttranslationally modified peptides (RiPPs). Lanthipeptides are widely distributed in taxonomically distant species, and their currently known biosynthetic systems and biological activities are diverse. Building on the recent natural product gene cluster family (GCF) project, we report here large-scale analysis of lanthipeptide-like biosynthetic gene clusters from Actinobacteria. Our analysis suggests that lanthipeptide biosynthetic pathways, and by extrapolation the natural products themselves, are much more diverse than currently appreciated and contain many different posttranslational modifications. Furthermore, lanthionine synthetases are much more diverse in sequence and domain topology than currently characterized systems, and they are used by the biosynthetic machineries for natural products other than lanthipeptides. The gene cluster families described here significantly expand the chemical diversity and biosynthetic repertoire of lanthionine-related natural products. Biosynthesis of these novel natural products likely involves unusual and unprecedented biochemistries, as illustrated by several examples discussed in this study. In addition, class IV lanthipeptide gene clusters are shown not to be silent, setting the stage to investigate their biological activities. PMID:25888176

  2. Genetic weighted k-means algorithm for clustering large-scale gene expression data.

    PubMed

    Wu, Fang-Xiang

    2008-05-28

    The traditional (unweighted) k-means is one of the most popular clustering methods for analyzing gene expression data. However, it suffers from three major shortcomings: it is sensitive to the initial partition, its results are prone to local minima, and it is only applicable to data with spherical clusters. The last shortcoming means that we must assume that gene expression data under the different conditions are independently distributed with equal variances. In practice, however, this assumption does not hold. In this paper, we propose a genetic weighted k-means algorithm (denoted GWKMA), which solves the first two problems and partially remedies the third. GWKMA is a hybrid of a genetic algorithm (GA) and a weighted k-means algorithm (WKMA). In GWKMA, each individual is encoded by a partitioning table that uniquely determines a clustering, and three genetic operators (selection, crossover, mutation) and a WKM operator derived from WKMA are employed. The superiority of GWKMA over k-means is illustrated on one synthetic and two real-life gene expression datasets. The proposed algorithm applies generally to clustering large-scale biological data such as gene expression data and peptide mass spectral data.
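    The abstract does not spell out the WKM operator. As a hedged illustration only, the Python sketch below shows one plausible weighted k-means refinement of the kind GWKMA combines with a GA. Squared distances are weighted per condition so that conditions with different variances contribute comparably. The inverse-variance weighting, the function names, and the empty-cluster fallback are assumptions, not the author's method. The GA layer (selection, crossover, and mutation over partitioning tables) is omitted.

```python
from typing import List, Sequence

Vector = Sequence[float]


def inverse_variance_weights(data: List[Vector]) -> List[float]:
    """Hypothetical weighting: one weight per condition (column), equal to 1/variance."""
    n, d = len(data), len(data[0])
    weights = []
    for j in range(d):
        column = [row[j] for row in data]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        weights.append(1.0 / var if var > 0 else 1.0)  # guard against constant columns
    return weights


def weighted_sq_dist(x: Vector, c: Vector, w: Vector) -> float:
    """Squared Euclidean distance with a separate weight for each condition."""
    return sum(wi * (xi - ci) ** 2 for xi, ci, wi in zip(x, c, w))


def centroids_from_labels(data: List[Vector], labels: List[int], k: int) -> List[List[float]]:
    """Mean profile of each cluster in the current partition."""
    d = len(data[0])
    cents = []
    for c in range(k):
        members = [row for row, lab in zip(data, labels) if lab == c]
        if not members:
            # Empty cluster: fall back to an arbitrary data point (a simplification).
            members = [data[c % len(data)]]
        cents.append([sum(row[j] for row in members) / len(members) for j in range(d)])
    return cents


def weighted_kmeans(data: List[Vector], labels: List[int], k: int,
                    w: Vector, max_iter: int = 100) -> List[int]:
    """Lloyd-style iterations starting from a given partition (e.g. one GA individual)."""
    labels = list(labels)
    for _ in range(max_iter):
        cents = centroids_from_labels(data, labels, k)
        # Reassign every profile to its nearest centroid under the weighted distance.
        new_labels = [
            min(range(k), key=lambda c: weighted_sq_dist(row, cents[c], w))
            for row in data
        ]
        if new_labels == labels:  # partition is stable
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    expr = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    w = inverse_variance_weights(expr)
    print(weighted_kmeans(expr, [0, 1, 0, 1], k=2, w=w))  # → [0, 0, 1, 1]
```

    Starting from a deliberately poor partition, the refinement converges to the two obvious groups. In a GA hybrid, a step like this would act as a local-improvement operator on each individual, while crossover and mutation supply the global search that plain k-means lacks.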

  3. Interrogating the function of metazoan histones using engineered gene clusters.

    PubMed

    McKay, Daniel J; Klusza, Stephen; Penke, Taylor J R; Meers, Michael P; Curry, Kaitlin P; McDaniel, Stephen L; Malek, Pamela Y; Cooper, Stephen W; Tatomer, Deirdre C; Lieb, Jason D; Strahl, Brian D; Duronio, Robert J; Matera, A Gregory

    2015-02-09

    Histones and their posttranslational modifications influence the regulation of many DNA-dependent processes. Although an essential role for histone-modifying enzymes in these processes is well established, defining the specific contribution of individual histone residues remains a challenge because many histone-modifying enzymes have nonhistone targets. This challenge is exacerbated by the paucity of suitable approaches to genetically engineer histone genes in metazoans. Here, we describe a platform in Drosophila for generating and analyzing any desired histone genotype, and we use it to test the in vivo function of three histone residues. We demonstrate that H4K20 is essential neither for DNA replication nor for completion of development, contrary to inferences drawn from analyses of H4K20 methyltransferases. We also show that H3K36 is required for viability and that H3K27 is essential for maintenance of cellular identity but not for gene activation. These findings highlight the power of engineering histones to interrogate genome structure and function in animals. Copyright © 2015 Elsevier Inc. All rights reserved.

  4. Genomic and expression analysis of the vanG-like gene cluster of Clostridium difficile.

    PubMed

    Peltier, Johann; Courtin, Pascal; El Meouche, Imane; Catel-Ferreira, Manuella; Chapot-Chartier, Marie-Pierre; Lemée, Ludovic; Pons, Jean-Louis

    2013-07-01

    Primary antibiotic treatment of Clostridium difficile intestinal diseases requires metronidazole or vancomycin therapy. A cluster of genes homologous to the enterococcal glycopeptide resistance vanG genes was found in the genome of C. difficile 630, although this strain remains sensitive to vancomycin. This vanG-like gene cluster was found to consist of five ORFs: a regulatory region consisting of vanR and vanS, and an effector region consisting of vanG, vanXY and vanT. We found that 57 out of 83 C. difficile strains, representative of the main lineages of the species, harbour this vanG-like cluster. The cluster is expressed as an operon and, when present, is found at the same genomic location in all strains. The vanG, vanXY and vanT homologues in C. difficile 630 are co-transcribed and expressed at a low level throughout the growth phases in the absence of vancomycin. Conversely, expression of these genes is strongly induced in the presence of subinhibitory concentrations of vancomycin, indicating that the vanG-like operon is functional at the transcriptional level in C. difficile. Hydrophilic interaction liquid chromatography (HILIC-HPLC) and MS analysis of cytoplasmic peptidoglycan precursors of C. difficile 630 grown without vancomycin revealed the exclusive presence of a UDP-MurNAc-pentapeptide with an alanine at the C terminus. UDP-MurNAc-pentapeptide [d-Ala] was also the only peptidoglycan precursor detected in C. difficile grown in the presence of vancomycin, corroborating the lack of vancomycin resistance. Peptidoglycan structures of a vanG-like mutant strain and of a strain lacking the vanG-like cluster did not differ from that of the C. difficile 630 strain, indicating that the vanG-like cluster also has no impact on cell-wall composition.

  5. Identification, characterization and metagenome analysis of oocyte-specific genes organized in clusters in the mouse genome

    PubMed Central

    Paillisson, Amélie; Dadé, Sébastien; Callebaut, Isabelle; Bontoux, Martine; Dalbiès-Tran, Rozenn; Vaiman, Daniel; Monget, Philippe

    2005-01-01

    Background Genes specifically expressed in the oocyte play key roles in oogenesis, ovarian folliculogenesis, fertilization and/or early embryonic development. In an attempt to identify novel oocyte-specific genes in the mouse, we used an in silico subtraction methodology and focused on genes that are organized in genomic clusters. Results In the present work, five clusters were studied: a cluster of thirteen genes characterized by an F-box domain on chromosome 9; a cluster of six genes related to T-cell leukaemia/lymphoma protein 1 (Tcl1) on chromosome 12; a cluster on chromosome 14 composed of a SPErm-associated glutamate (E)-Rich (Speer) protein expressed in the oocyte, in the vicinity of four unknown genes specifically expressed in the testis; a cluster on chromosome 19 composed of the oocyte secreted protein-1 (Oosp-1) gene and two Oosp-related genes, all three characterized by a partial N-terminal zona pellucida-like domain; and another small cluster of two genes, also on chromosome 19, composed of a TWIK-related spinal cord K+ channel-encoding gene and an unknown gene predicted in silico to be testis-specific. The specificity of expression was confirmed by RT-PCR for eight of these genes and by in situ hybridization for five. Finally, by comparing all of the isolated and clustered oocyte-specific genes identified so far in the mouse genome, we showed that the oocyte-specific clusters are significantly closer to telomeres than isolated oocyte-specific genes are. Conclusion We have studied five clusters of genes specifically expressed in female germ cells, some of which are also expressed in male germ cells. Moreover, unlike non-clustered oocyte-specific genes, those organized in clusters tend to map near chromosome ends, suggesting that this near-telomere position of oocyte clusters in rodents could constitute an evolutionary advantage. Understanding the biological benefits of such an

  6. Supra-operonic clusters of functionally related genes (SOCs) are a source of horizontal gene co-transfers

    PubMed Central

    Pang, Tin Yau; Lercher, Martin J.

    2017-01-01

    Adaptation of bacteria occurs predominantly via horizontal gene transfer (HGT). While it is widely recognized that horizontal acquisitions frequently encompass multiple genes, it is unclear what the size distribution of successfully transferred DNA segments looks like and what evolutionary forces shape this distribution. Here, we identified 1790 gene family pairs that were consistently co-gained on the same branches across a phylogeny of 53 E. coli strains. We estimated a lower limit of their genomic distances at the time they were transferred to their host genomes; this distribution shows a sharp upper bound at 30 kb. The same gene-pairs can have larger distances (up to 70 kb) in other genomes. These more distant pairs likely represent recent acquisitions via transduction that involve the co-transfer of excised prophage genes, as they are almost always associated with intervening phage-associated genes. The observed distribution of genomic distances of co-transferred genes is much broader than expected from a model based on the co-transfer of genes within operons; instead, this distribution is highly consistent with the size distribution of supra-operonic clusters (SOCs), groups of co-occurring and co-functioning genes that extend beyond operons. Thus, we propose that SOCs form a basic unit of horizontal gene transfer. PMID:28067311

  7. Identification of arthritis-related gene clusters by microarray analysis of two independent mouse models for rheumatoid arthritis.

    PubMed

    Fujikado, Noriyuki; Saijo, Shinobu; Iwakura, Yoichiro

    2006-01-01

    Rheumatoid arthritis (RA) is an autoimmune disease affecting approximately 1% of the population worldwide. Previously, we showed that human T-cell leukemia virus type I-transgenic mice and interleukin-1 receptor antagonist-knockout mice develop autoimmunity and joint-specific inflammation that resembles human RA. To identify genes involved in the pathogenesis of arthritis, we analyzed the gene expression profiles of these animal models using high-density oligonucleotide arrays. We found 1,467 genes whose expression differed by more than threefold from that in normal control mice in at least one of these animal models. The gene expression profiles of the two models correlated well. We extracted 554 genes whose expression changed significantly in both models, assuming that pathogenically important genes at the effector phase would change in both. Each of these commonly changed genes was then mapped onto the whole genome at 1-megabase resolution. We found that these genes were not evenly distributed across the chromosomes but formed clusters. These gene clusters include the major histocompatibility complex class I and class II genes, complement genes, and chemokine genes, which are well known to be involved in the pathogenesis of RA at the effector phase. The activation of these gene clusters suggests that antigen presentation and lymphocyte chemotaxis are important for the development of arthritis. Moreover, by searching for such clusters, we could detect genes with marginal expression changes. These gene clusters include schlafen and membrane-spanning four-domains subfamily A genes, whose function in arthritis has not yet been determined. Thus, by combining two etiologically different RA models, we succeeded in efficiently extracting genes functioning in the development of arthritis at the effector phase. Furthermore, we demonstrated that identification of gene clusters by transcriptome mapping is a useful way to find
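    The transcriptome-mapping step can be sketched as binning each changed gene by chromosome and 1-Mb window, then flagging windows that contain several changed genes. This Python sketch is a hypothetical illustration only: the (chromosome, position) input format, the function names, and the fixed `min_genes` threshold are assumptions. The abstract does not state the criterion the authors used to call a cluster.

```python
from collections import Counter
from typing import Iterable, List, Tuple

BIN_SIZE = 1_000_000  # 1-megabase windows, matching the resolution in the abstract


def bin_genes(genes: Iterable[Tuple[str, int]]) -> Counter:
    """Count changed genes per (chromosome, 1-Mb window index)."""
    counts: Counter = Counter()
    for chrom, pos_bp in genes:
        counts[(chrom, pos_bp // BIN_SIZE)] += 1
    return counts


def dense_windows(genes: Iterable[Tuple[str, int]], min_genes: int = 3) -> List[Tuple[str, int]]:
    """Windows holding at least `min_genes` changed genes (hypothetical threshold)."""
    counts = bin_genes(genes)
    return sorted(key for key, n in counts.items() if n >= min_genes)


if __name__ == "__main__":
    changed = [
        ("chr17", 34_100_000),
        ("chr17", 34_500_000),
        ("chr17", 34_900_000),
        ("chr11", 83_000_000),
        ("chr17", 35_200_000),
    ]
    print(dense_windows(changed, min_genes=3))  # → [('chr17', 34)]
```

    A fixed count ignores how gene-dense each window is to begin with; comparing a window's count of changed genes with the total number of genes it contains would be a more defensible way to call a cluster.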

  8. Clustering of two genes putatively involved in cyanate detoxification evolved recently and independently in multiple fungal lineages

    USDA-ARS's Scientific Manuscript database

    Fungi that have the enzymes cyanase and carbonic anhydrase show a limited capacity to detoxify cyanate, a fungicide employed by both plants and humans. Here, we describe a novel two-gene cluster that comprises duplicated cyanase and carbonic anhydrase copies, which we name the CCA gene cluster, trac...

  9. A hybrid NRPS-PKS gene cluster related to the bleomycin family of antitumor antibiotics in Alteromonas macleodii strains.

    PubMed

    Mizuno, Carolina Megumi; Kimes, Nikole E; López-Pérez, Mario; Ausó, Eva; Rodriguez-Valera, Francisco; Ghai, Rohit

    2013-01-01

    Although numerous marine bacteria are known to produce antibiotics via hybrid NRPS-PKS gene clusters, none have been previously described in an Alteromonas species. In this study, we describe in detail a novel hybrid NRPS-PKS cluster identified in the plasmid of the Alteromonas macleodii strain AltDE1 and analyze its relatedness to other similar gene clusters in a sequence-based characterization. This is a mobile cluster, flanked by transposase-like genes, that has even been found inserted into the chromosome of some Alteromonas macleodii strains. The cluster contains separate genes for NRPS and PKS activity. The sole PKS gene appears to carry a novel acyltransferase domain, quite divergent from those currently characterized. The predicted specificities of the adenylation domains of the NRPS genes suggest that the final compound has a backbone very similar to bleomycin related compounds. However, the lack of genes involved in sugar biosynthesis indicates that the final product is not a glycopeptide. Even in the absence of these genes, the presence of the cluster appears to confer complete or partial resistance to phleomycin, which may be attributed to a bleomycin-resistance-like protein identified within the cluster. This also suggests that the compound still shares significant structural similarity to bleomycin. Moreover, transcriptomic evidence indicates that the NRPS-PKS cluster is expressed. Such sequence-based approaches will be crucial to fully explore and analyze the diversity and potential of secondary metabolite production, especially from increasingly important sources like marine microbes.

  10. Comparison of expression of secondary metabolite biosynthesis cluster genes in Aspergillus flavus, A. parasiticus, and A. oryzae

    USDA-ARS's Scientific Manuscript database

    More than 55 secondary metabolite biosynthesis gene clusters are predicted to be present in the Aspergillus flavus genome. In spite of this, the biosynthesis of only a few metabolites, such as aflatoxin, cyclopiazonic acid and aflatrem, has been correlated with a particular gene cluster. Using RN...

  11. Multiplexed CRISPR/Cas9- and TAR-Mediated Promoter Engineering of Natural Product Biosynthetic Gene Clusters in Yeast.

    PubMed

    Kang, Hahk-Soo; Charlop-Powers, Zachary; Brady, Sean F

    2016-09-16

    The use of DNA sequencing to guide the discovery of natural products has emerged as a new paradigm for revealing chemistries encoded in bacterial genomes. A major obstacle to implementing this approach to natural product discovery is the transcriptional silence of biosynthetic gene clusters under laboratory growth conditions. Here we describe an improved yeast-based promoter engineering platform (mCRISTAR) that combines CRISPR/Cas9 and TAR to enable single-marker multiplexed promoter engineering of large gene clusters. mCRISTAR represents the first application of the CRISPR/Cas9 system to multiplexed promoter engineering of natural product biosynthetic gene clusters. In this method, CRISPR/Cas9 is used to induce DNA double-strand breaks in promoter regions of biosynthetic gene clusters, and the resulting operon fragments are reassembled by TAR using synthetic gene-cluster-specific promoter cassettes. mCRISTAR uses a CRISPR array to simplify the construction of a CRISPR plasmid for multiplex CRISPR, and a single auxotrophic selection to overcome the inefficiency of using a CRISPR array for multiplex gene cluster refactoring. mCRISTAR is a simple and generic method for multiplexed replacement of promoters in biosynthetic gene clusters that will facilitate the discovery of natural products from the rapidly growing collection of gene clusters found in microbial genome and metagenome sequencing projects.

  12. The impact of polyploidy on the evolution of a complex NB-LRR resistance gene cluster in soybean

    USDA-ARS's Scientific Manuscript database

    A comparative genomics approach was used to investigate the evolution of a complex NB-LRR gene cluster found in soybean (Glycine max), common bean (Phaseolus vulgaris), and other legumes. In soybean, the cluster is associated with several disease resistance (R) genes of known function including Rpg1...

  13. Birth, death and horizontal transfer of the fumonisin biosynthetic gene cluster during the evolutionary diversification of Fusarium

    USDA-ARS's Scientific Manuscript database

    In fungi, genes required for synthesis of secondary metabolites are often clustered. The FUM gene cluster is required for synthesis of a family of toxic secondary metabolites, fumonisins, produced by species of Fusarium in the Gibberella fujikuroi species complex (GFSC). Fumonisins are a health and ...

  14. Clustered Transcription Factor Genes Regulate Nicotine Biosynthesis in Tobacco

    PubMed Central

    Shoji, Tsubasa; Kajikawa, Masataka; Hashimoto, Takashi

    2010-01-01

    Tobacco (Nicotiana tabacum) synthesizes nicotine and related pyridine alkaloids in the root, and their synthesis increases upon herbivory on the leaf via a jasmonate-mediated signaling cascade. Regulatory NIC loci that positively regulate nicotine biosynthesis have been genetically identified, and their mutant alleles have been used to breed low-nicotine tobacco varieties. Here, we report that the NIC2 locus, originally called locus B, comprises clustered transcription factor genes of an ethylene response factor (ERF) subfamily; in the nic2 mutant, at least seven ERF genes are deleted altogether. Overexpression, suppression, and dominant repression experiments using transgenic tobacco roots showed both functional redundancy and divergence among the NIC2-locus ERF genes. These transcription factors recognized a GCC-box element in the promoter of a nicotine pathway gene and specifically activated all known structural genes in the pathway. The NIC2-locus ERF genes are expressed in the root and upregulated by jasmonate with kinetics that are distinct among the members. Thus, gene duplication events generated a cluster of highly homologous transcription factor genes with transcriptional and functional diversity. The NIC2-locus ERFs are close homologs of ORCA3, a jasmonate-responsive transcriptional activator of indole alkaloid biosynthesis in Catharanthus roseus, indicating that the NIC2/ORCA3 ERF subfamily was recruited independently to regulate jasmonate-inducible secondary metabolism in distinct plant lineages. PMID:20959558

  15. Clusters of genes encoding fructan biosynthesizing enzymes in wheat and barley.

    PubMed

    Huynh, Bao-Lam; Mather, Diane E; Schreiber, Andreas W; Toubia, John; Baumann, Ute; Shoaei, Zahra; Stein, Nils; Ariyadasa, Ruvini; Stangoulis, James C R; Edwards, James; Shirley, Neil; Langridge, Peter; Fleury, Delphine

    2012-10-01

    Fructans are soluble carbohydrates with health benefits and possible roles in plant adaptation. Fructan biosynthetic genes were isolated using comparative genomics and physical mapping followed by BAC sequencing in barley. Genes encoding sucrose:sucrose 1-fructosyltransferase (1-SST), fructan:fructan 1-fructosyltransferase (1-FFT) and sucrose:fructan 6-fructosyltransferase (6-SFT) were clustered together with multiple copies of vacuolar invertase genes and a transposable element on two barley BACs. Intron-exon structures of the genes were similar. Phylogenetic analysis of the fructosyltransferases and invertases in the Poaceae showed that the fructan biosynthetic genes may have evolved from vacuolar invertases. Quantitative real-time PCR was performed using leaf RNA extracted from three wheat cultivars grown under different conditions. The 1-SST, 1-FFT and 6-SFT genes had correlated expression patterns in our wheat experiment and in an existing barley transcriptome database. Single nucleotide polymorphism (SNP) markers were developed and successfully mapped to a major QTL region affecting wheat grain fructan accumulation in two independent wheat populations. The alleles controlling high and low fructan content in the parental lines were also found to be associated with fructan production in a diverse set of 128 wheat lines. To the authors' knowledge, this is the first report on the mapping and sequencing of a fructan biosynthetic gene cluster and, in particular, the isolation of a novel 1-FFT gene from barley.

  16. Cloning of a Vibrio cholerae vibriobactin gene cluster: identification of genes required for early steps in siderophore biosynthesis.

    PubMed Central

    Wyckoff, E E; Stoebner, J A; Reed, K E; Payne, S M

    1997-01-01

    Vibrio cholerae secretes the catechol siderophore vibriobactin in response to iron limitation. Vibriobactin is structurally similar to enterobactin, the siderophore produced by Escherichia coli, and both organisms produce 2,3-dihydroxybenzoic acid (DHBA) as an intermediate in siderophore biosynthesis. To isolate and characterize V. cholerae genes involved in vibriobactin biosynthesis, we constructed a genomic cosmid bank of V. cholerae DNA and isolated clones that complemented mutations in E. coli enterobactin biosynthesis genes. V. cholerae homologs of entA, entB, entC, entD, and entE were identified on overlapping cosmid clones. Our data indicate that the vibriobactin genes are clustered, like the E. coli enterobactin genes, but the organization of the genes within these clusters is different. In this paper, we present the organization and sequences of genes involved in the synthesis and activation of DHBA. In addition, a V. cholerae strain with a chromosomal mutation in vibA was constructed by marker exchange. This strain was unable to produce vibriobactin or DHBA, confirming that in V. cholerae VibA catalyzes an early step in vibriobactin biosynthesis. PMID:9371453

  17. Molecular cloning of the Escherichia coli B L-fucose-D-arabinose gene cluster.

    PubMed Central

    Elsinghorst, E A; Mortlock, R P

    1994-01-01

    To metabolize the uncommon pentose D-arabinose, enteric bacteria often recruit the enzymes of the L-fucose pathway by a regulatory mutation. However, Escherichia coli B can grow on D-arabinose without the requirement of a mutation, using some of the L-fucose enzymes and a D-ribulokinase that is distinct from the L-fuculokinase of the L-fucose pathway. To study this naturally occurring D-arabinose pathway, we cloned and partially characterized the E. coli B L-fucose-D-arabinose gene cluster and compared it with the L-fucose gene cluster of E. coli K-12. The order of the fucA, -P, -I, and -K genes was the same in the two E. coli strains. However, the E. coli B gene cluster contained a 5.2-kb segment located between the fucA and fucP genes that was not present in E. coli K-12. This segment carried the darK gene, which encodes the D-ribulokinase needed for growth on D-arabinose by E. coli B. The darK gene was not homologous with any of the L-fucose genes or with chromosomal DNA from other D-arabinose-utilizing bacteria. D-Ribulokinase and L-fuculokinase were purified to apparent homogeneity and partially characterized. The molecular weights, substrate specificities, and kinetic parameters of these two enzymes were very dissimilar, which, together with DNA hybridization analysis, suggested that these enzymes are not related. D-Arabinose metabolism by E. coli B appears to be the result of acquisitive evolution, but the source of the darK gene has not been determined. PMID:7961494

  18. Web-Type Evolution of Rhodococcus Gene Clusters Associated with Utilization of Naphthalene

    PubMed Central

    Kulakov, Leonid A.; Chen, Shenchang; Allen, Christopher C. R.; Larkin, Michael J.

    2005-01-01

    Clusters of genes which include determinants for the catalytic subunits of naphthalene dioxygenase (narAa and narAb) were analyzed in naphthalene-degrading Rhodococcus strains. We demonstrated (i) that in the region analyzed homologous gene clusters are separated from each other by nonhomologous DNA, (ii) that there are various degrees of homology between related genes, and (iii) that nar genes are located on plasmids in strains NCIMB12038 and P400 and on a chromosome in P200. These observations suggest that genetic exchange and reshuffling of genetic modules, as well as vertical descent of the genetic information, were the main routes in the evolution of naphthalene degradation in Rhodococcus. These conclusions were supported by studies of transcription patterns in the region analyzed. It was found that the nar region is not organized into a single operon but there are several transcription units which differ in the strains investigated. The narA and narB genes were found to be transcribed as a single unit in all strains analyzed, and their transcription was induced by naphthalene. The putative aldolase gene (narC) was found on the same transcript only in strains P200 and P400. In NCIMB12038 transcription of two more gene clusters was induced by growth on naphthalene. Transcription start sites for narA and narB were found to be different in all of the strains studied. Putative regulatory genes (narR1 and narR2) were transcribed as a single mRNA in naphthalene-induced cells. At the same time, a number of the genes known to be essential for naphthalene catabolism in gram-negative bacteria were not found in the region analyzed. PMID:15811998

  19. Identification of the phd gene cluster responsible for phenylpropanoid utilization in Corynebacterium glutamicum.

    PubMed

    Kallscheuer, Nicolai; Vogt, Michael; Kappelmann, Jannick; Krumbach, Karin; Noack, Stephan; Bott, Michael; Marienhagen, Jan

    2016-02-01

    Phenylpropanoids, as abundant lignin-derived compounds, represent sustainable feedstocks for biotechnological production processes. We found that the biotechnologically important soil bacterium Corynebacterium glutamicum is able to grow on phenylpropanoids such as p-coumaric acid, ferulic acid, caffeic acid, and 3-(4-hydroxyphenyl)propionic acid as sole carbon and energy sources. Global gene expression analyses identified a gene cluster (cg0340-cg0341 and cg0344-cg0347) that showed increased transcription levels in response to phenylpropanoids. The gene cg0340 (designated phdT) encodes a putative transporter protein, whereas cg0341 and cg0344-cg0347 (phdA-E) encode enzymes involved in the β-oxidation of phenylpropanoids. The phd gene cluster is transcriptionally controlled by a MarR-type repressor encoded by cg0343 (phdR). Cultivation experiments conducted with C. glutamicum strains carrying single-gene deletions showed that loss of phdA, phdB, phdC, or phdE abolished growth of C. glutamicum with all phenylpropanoid substrates tested. The deletion of phdD (encoding a putative acyl-CoA dehydrogenase) additionally abolished growth with the α,β-saturated phenylpropanoid 3-(4-hydroxyphenyl)propionic acid. However, the observed growth defect of all constructed single-gene deletion strains could be rescued through plasmid-borne expression of the respective genes. These results and the intracellular accumulation of pathway intermediates determined via LC-ESI-MS/MS in single-gene deletion mutants showed that the phd gene cluster encodes a CoA-dependent, β-oxidative deacetylation pathway, which is essential for the utilization of phenylpropanoids in C. glutamicum.

  20. Evolutionary dynamics of rRNA gene clusters in cichlid fish

    PubMed Central

    2012-01-01

    Background Among multigene families, ribosomal RNA (rRNA) genes are the most frequently studied and have been explored as cytogenetic markers to study the evolutionary history of karyotypes among animals and plants. In this report, we applied cytogenetic and genomic methods to investigate the organization of rRNA genes among cichlid fishes. Cichlids are a group of fishes that are of increasing scientific interest due to their rapid and convergent adaptive radiation, which has led to extensive ecological diversity. Results The present paper reports the cytogenetic mapping of the 5S rRNA genes from 18 South American, 22 African and one Asian species and the 18S rRNA genes from 3 African species. The data obtained were comparatively analyzed with previously published information related to the mapping of rRNA genes in cichlids. The number of 5S rRNA clusters per diploid genome ranged from 2 to 15, with the most common pattern being the presence of 2 chromosomes bearing a 5S rDNA cluster. Regarding 18S rDNA mapping, the number of sites ranged from 2 to 6, with the most common pattern being the presence of 2 sites per diploid genome. Furthermore, searching the Oreochromis niloticus genome database led to the identification of a total of 59 copies of 5S rRNA and 38 copies of 18S rRNA genes that were distributed in several genomic scaffolds. The rRNA genes were frequently flanked by transposable elements (TEs) and spread throughout the genome, complementing the FISH analysis, which detects only clustered copies of rRNA genes. Conclusions The organization of rRNA gene clusters seems to reflect their intense and particular evolutionary pathway rather than the evolutionary history of the associated taxa. The possible role of TEs as one source of rRNA gene movement, which could generate the spreading of ribosomal clusters/copies, is discussed. The present paper reinforces the notion that the integration of cytogenetic data and genomic analysis provides a more complete picture for

  1. Relationship between replication and transcriptional activity within the type 1 keratin gene cluster

    SciTech Connect

    Belanger, C.; Royal, A.; Lemieux, N.

    1994-09-01

    Tissue-specific genes are usually replicated according to a developmentally regulated pattern: early in cells expressing those particular genes and late in non-expressing cell types. To study the relationship between transcriptional activity and replication time in relation to cell differentiation, the type 1 keratin gene cluster (Krt-1) was chosen as a model. This relationship is very complex, particularly in multigene families, because genes are not expressed at the same time during development or cellular differentiation. To determine whether the Krt-1 cluster behaves as a single functional unit or is subdivided into functionally distinct regions, we defined the replication times of two specific sequences within the locus with different patterns of expression. The analyses were performed by FISH on non-synchronous mouse interphase nuclei obtained from different cell lines: KLN 205 and OBL 24. The results show two distinct units: a region containing three type 1 keratin genes, K19, K15 and K13, which seem to behave as a single functional unit, and a region containing the K10 gene. We obtained a high percentage of double signals with the K19, K15, and K13 genes, whereas most nuclei showed single hybridization dots with the K10 gene. Furthermore, the K19, K15, K13 region seems to replicate earlier than the K10 gene in KLN 205 cells. In contrast, no relation between replication time and epithelial differentiation type was found with two reference probes: the hypoxanthine phosphoribosyltransferase gene (HPRT), which is early replicating in all the cells examined, and the peripherin gene, which is expected to be late replicating in epithelial cells. Comparison of the replication times of sequences used within the Krt-1 locus suggests that this locus behaves as at least two distinct domains related to different type 1 keratin gene expression in distinct tissues.

  2. Classification and Clustering on Microarray Data for Gene Functional Prediction Using R.

    PubMed

    López-Kleine, Liliana; Montaño, Rosa; Torres-Avilés, Francisco

    2016-01-01

    Gene expression data (microarrays and RNA-sequencing data), as well as other kinds of genomic data, can be obtained from public databases. Here, we explain how to apply multivariate clustering and classification methods to gene expression data. These methods have become very popular and are implemented in freely available software to predict the participation of gene products in a specific functional category of interest. Given the availability of data and of these methods, every biological study should apply them to gain knowledge of the organism studied and the functional category of interest. Special emphasis is placed on nonlinear kernel classification methods.

  3. Phenotype-Dependent Coexpression Gene Clusters: Application to Normal and Premature Ageing.

    PubMed

    Wang, Kun; Das, Avinash; Xiong, Zheng-Mei; Cao, Kan; Hannenhalli, Sridhar

    2015-01-01

    Hutchinson-Gilford progeria syndrome (HGPS) is a rare genetic disease with symptoms of aging at a very early age. Its molecular basis is not entirely clear, although profound gene expression changes have been reported, and there are some known and other presumed overlaps with the normal aging process. Identification of genes with aging- or HGPS-associated expression changes is thus an important problem. However, standard regression approaches are currently unsuitable for this task because of limited sample sizes, which motivates the development of alternative approaches. Here, we report a novel iterative multiple regression approach that leverages co-expressed gene clusters to identify gene clusters whose expression co-varies with age and/or HGPS. We applied our approach to novel RNA-seq profiles of fibroblast cell cultures at three different cellular ages, from both HGPS patients and normal samples. After establishing the robustness of our approach, we performed a comparative investigation of the biological processes underlying normal aging and HGPS. Our results recapitulate previously known processes underlying aging as well as suggest numerous unique processes underlying aging and HGPS. The approach could also be useful for detecting phenotype-dependent co-expression gene clusters in other contexts with limited sample sizes.

  4. Characterisation of the paralytic shellfish toxin biosynthesis gene clusters in Anabaena circinalis AWQC131C and Aphanizomenon sp. NH-5

    PubMed Central

    Mihali, Troco K; Kellmann, Ralf; Neilan, Brett A

    2009-01-01

    Background: Saxitoxin and its analogues, collectively known as the paralytic shellfish toxins (PSTs), are neurotoxic alkaloids and are the cause of the syndrome named paralytic shellfish poisoning. PSTs are produced by a unique biosynthetic pathway, which involves reactions that are rare in microbial metabolic pathways. Nevertheless, distantly related organisms such as dinoflagellates and cyanobacteria appear to produce these toxins using the same pathway. Hypothesised explanations for such an unusual phylogenetic distribution of this shared, uncommon metabolic pathway include a polyphyletic origin, an involvement of symbiotic bacteria, and horizontal gene transfer. Results: We describe the identification, annotation and bioinformatic characterisation of the putative paralytic shellfish toxin biosynthesis clusters in an Australian isolate of Anabaena circinalis and an American isolate of Aphanizomenon sp., both members of the Nostocales. These putative PST gene clusters span approximately 28 kb and contain genes coding for the biosynthesis and export of the toxin. A putative insertion/excision site in the Australian Anabaena circinalis AWQC131C was identified, and the organization and evolution of the gene clusters are discussed. A biosynthetic pathway leading to the formation of saxitoxin and its analogues in these organisms is proposed. Conclusion: The PST biosynthesis gene cluster presents a mosaic structure, whereby genes have apparently transposed in segments of varying size, resulting in different gene arrangements in all three sxt clusters sequenced so far. The organizational structure and sequence similarity of the gene clusters seem to reflect the phylogeny of the producer organisms, indicating that the gene clusters have an ancient origin, or that their lateral transfer was also an ancient event. The knowledge we gain from the characterisation of the PST biosynthesis gene clusters, including the identity and sequence of the genes involved in the biosynthesis, may

  5. Next-generation sequencing approach for connecting secondary metabolites to biosynthetic gene clusters in fungi

    PubMed Central

    Cacho, Ralph A.; Tang, Yi; Chooi, Yit-Heng

    2015-01-01

    Genomics has revolutionized research on fungal secondary metabolite (SM) biosynthesis. To elucidate the molecular and enzymatic mechanisms underlying the biosynthesis of a specific SM compound, the important first step is often to find the genes responsible for its synthesis. Access to fungal genome sequences makes it possible to bypass the cumbersome traditional approach of library construction and screening. Advances in next-generation sequencing (NGS) technologies have further increased the speed and reduced the cost of microbial genome sequencing in the past few years, which has accelerated research in this field. Here, we present an example workflow for identifying the gene cluster encoding the biosynthesis of SMs of interest using an NGS approach. We also review the different strategies that can be employed to rapidly pinpoint the targeted gene clusters, giving several examples from our own work. PMID:25642215

  6. Regularized Non-negative Matrix Factorization for Identifying Differential Genes and Clustering Samples: a Survey.

    PubMed

    Liu, Jin-Xing; Wang, Dong; Gao, Ying-Lian; Zheng, Chun-Hou; Xu, Yong; Yu, Jiguo

    2017-02-07

    Non-negative Matrix Factorization (NMF), a classical method for dimensionality reduction, has been applied in many fields. It is based on the idea that negative numbers are physically meaningless in various data-processing tasks. Apart from its contribution to conventional data analysis, the recent overwhelming interest in NMF is due to its newly discovered ability to solve challenging data mining and machine learning problems, especially in relation to gene expression data. This survey mainly focuses on research examining the application of NMF to identify differentially expressed genes and to cluster samples, and it summarizes the main NMF models, properties, principles, and algorithms, together with their various generalizations, extensions, and modifications. The experimental results demonstrate the performance of the various NMF algorithms in identifying differentially expressed genes and clustering samples.

  7. Stable chromosomal integration of the entire nitrogen fixation gene cluster from Klebsiella pneumoniae in yeast.

    PubMed Central

    Zamir, A; Maina, C V; Fink, G R; Szalay, A A

    1981-01-01

    A bacterial plasmid containing the entire nitrogen fixation (nif) gene cluster (consisting of at least 15 genes) from Klebsiella pneumoniae was used in conjunction with an Escherichia coli-yeast shuttle plasmid containing the yeast his4 gene cluster to cotransform a his4- recipient strain of Saccharomyces cerevisiae. Of 87 histidine-independent clones screened, 2 contained nif DNA. Restriction and hybridization analyses showed that two copies of the nif plasmid (46 kilobases each) are integrated in tandem in the recipient chromosome by recombination between homologous regions in the transforming plasmids. Chromosomal integration was also verified by tetrad analysis, showing that the nif DNA behaved in meiosis like a Mendelian element. During mitotic growth, one of the two copies of the nif region is frequently lost. The remaining copy of nif is stable, even after 40 generations in nonselective medium. PMID:6267596

  8. Natural and engineered hydroxyectoine production based on the Pseudomonas stutzeri ectABCD-ask gene cluster.

    PubMed

    Seip, Britta; Galinski, Erwin A; Kurz, Matthias

    2011-02-01

    We report on the presence of a functional hydroxyectoine biosynthesis gene cluster, ectABCD-ask, in Pseudomonas stutzeri DSM5190(T) and evaluate the suitability of P. stutzeri DSM5190(T) for hydroxyectoine production. Furthermore, we present information on heterologous de novo production of the compatible solute hydroxyectoine in Escherichia coli. In this host, the P. stutzeri gene cluster remained under the control of its salt-induced native promoters. We also noted the absence of trehalose when hydroxyectoine genes were expressed, as well as a remarkable inhibitory effect of externally applied betaine on hydroxyectoine synthesis. The specific heterologous production rate in E. coli under the conditions employed exceeded that of the natural producer Pseudomonas stutzeri and, for the first time, enabled effective hydroxyectoine production at low salinity (2%), with the added advantage of simple product processing due to the absence of other cosolutes.

  9. Interrogating the function of metazoan histones using engineered gene clusters

    PubMed Central

    McKay, Daniel J.; Klusza, Stephen; Penke, Taylor J.R.; Meers, Michael P.; Curry, Kaitlin P.; McDaniel, Stephen L.; Malek, Pamela Y.; Cooper, Stephen W.; Tatomer, Deirdre C.; Lieb, Jason D.; Strahl, Brian D.; Duronio, Robert J.; Matera, A. Gregory

    2015-01-01

    Summary: Histones and their post-translational modifications influence the regulation of many DNA-dependent processes. Although an essential role for histone-modifying enzymes in these processes is well established, defining the specific contribution of individual histone residues remains a challenge because many histone-modifying enzymes have non-histone targets. This challenge is exacerbated by the paucity of suitable approaches for genetically engineering histone genes in metazoans. Here, we describe a facile platform in Drosophila for generating and analyzing any desired histone genotype, and we use it to test the in vivo function of three histone residues. We demonstrate that H4K20 is essential neither for DNA replication nor for completion of development, contrary to conclusions drawn from analyses of H4K20 methyltransferases. We also show that H3K36 is required for viability and that H3K27 is essential for maintenance of cellular identity during development. These findings highlight the power of engineering histones to interrogate genome structure and function in animals. PMID:25669886

  10. Organization and characterization of a biosynthetic gene cluster for bafilomycin from Streptomyces griseus DSM 2608

    PubMed Central

    2013-01-01

    Streptomyces griseus DSM 2608 produces bafilomycin, an antifungal plecomacrolide antibiotic. We cloned and sequenced an 87.4-kb region, including a polyketide synthase (PKS) region, methoxymalonate genes, flavensomycinate genes, and other putative regulatory genes. The 58.5-kb PKS region, consisting of 12 PKS modules arranged in five different PKS genes, was assumed to be responsible for the biosynthesis of the plecomacrolide backbone, including the 16-membered macrocyclic lactone. All the modules showed high similarity to typical type I PKS genes. However, sequence comparison of its acyltransferase domain confirmed that the starting module of the PKS is specific for isobutyrate. The genes for methoxymalonate biosynthesis were located downstream of the PKS region; among them, a gene for an FkbH-like protein was assumed to play an important role in the production of methoxymalonyl-CoA from glyceryl-CoA. In addition, genes encoding flavensomycinyl-ACP biosynthesis for post-PKS tailoring were found upstream of the PKS region. Gene disruption experiments targeting the dehydratase domain of module 12 and the FkbH-like protein confirmed that this gene cluster is involved in the biosynthesis of bafilomycin. PMID:23663353

  11. Two Gene Clusters Coordinate Galactose and Lactose Metabolism in Streptococcus gordonii

    PubMed Central

    Zeng, Lin; Martino, Nicole C.

    2012-01-01

    Streptococcus gordonii is an early colonizer of the human oral cavity and an abundant constituent of oral biofilms. Two tandemly arranged gene clusters, designated lac and gal, were identified in the S. gordonii DL1 genome; they contain genes of the tagatose pathway (lacABCD) and genes for sugar phosphotransferase system (PTS) enzyme II permeases. Genes encoding a predicted phospho-β-galactosidase (LacG), a DeoR family transcriptional regulator (LacR), and a transcriptional antiterminator (LacT) were also present in the clusters. Growth and PTS assays indicated that the permease designated EIILac transports lactose and galactose, whereas EIIGal transports galactose. Expression of the gene for EIIGal was markedly upregulated in cells growing on galactose. Using promoter-cat fusions, a role for LacR in regulating the expression of both gene clusters was demonstrated, and the gal cluster was also shown to be sensitive to repression by CcpA. Deletion of lacT caused an inability to grow on lactose, apparently because of its role in regulating expression of the genes for EIILac, but had little effect on galactose utilization. S. gordonii maintained a selective advantage over Streptococcus mutans in a mixed-species competition assay, associated with its possession of a high-affinity galactose PTS, although S. mutans could persist better at low pH. Collectively, these results support the concept that the galactose and lactose systems of S. gordonii are subject to complex regulation and that a high-affinity galactose PTS may be advantageous when S. gordonii is competing against the caries pathogen S. mutans in oral biofilms. PMID:22660715

  12. DMRT gene cluster analysis in the platypus: new insights into genomic organization and regulatory regions.

    PubMed

    El-Mogharbel, Nisrine; Wakefield, Matthew; Deakin, Janine E; Tsend-Ayush, Enkhjargal; Grützner, Frank; Alsop, Amber; Ezaz, Tariq; Marshall Graves, Jennifer A

    2007-01-01

    We isolated and characterized a cluster of platypus DMRT genes and compared their arrangement, location, and sequence across vertebrates. The DMRT gene cluster on human 9p24.3 harbors, in order, DMRT1, DMRT3, and DMRT2, which share a DM domain. DMRT1 is highly conserved and involved in sexual development in vertebrates, and deletions in this region cause sex reversal in humans. Sequence comparisons of DMRT genes between species have been valuable in identifying exons, control regions, and conserved nongenic regions (CNGs). The addition of platypus sequences is expected to be particularly valuable, since monotremes fill a gap in the vertebrate genome coverage. We therefore isolated and fully sequenced platypus BAC clones containing DMRT3 and DMRT2 as well as DMRT1 and then generated multispecies alignments and ran prediction programs followed by experimental verification to annotate this gene cluster. We found that the three genes have 58-66% identity to their human orthologues, lie in the same order as in other vertebrates, and colocate on 1 of the 10 platypus sex chromosomes, X5. We also predict that optimal annotation of the newly sequenced platypus genome will be challenging. The analysis of platypus sequence revealed differences in structure and sequence of the DMRT gene cluster. Multispecies comparison was particularly effective for detecting CNGs, revealing several novel potential regulatory regions within DMRT3 and DMRT2 as well as DMRT1. RT-PCR indicated that platypus DMRT1 and DMRT3 are expressed specifically in the adult testis (and not ovary), but DMRT2 has a wider expression profile, as it does for other mammals. The platypus DMRT1 expression pattern, and its location on an X chromosome, suggests an involvement in monotreme sexual development.

  13. Structure, function, and regulation of the aldouronate utilization gene cluster from Paenibacillus sp. strain JDR-2.

    PubMed

    Chow, Virginia; Nong, Guang; Preston, James F

    2007-12-01

    Direct bacterial conversion of the hemicellulose fraction of hardwoods and crop residues to biobased products depends upon extracellular depolymerization of methylglucuronoxylan (MeGAX(n)), followed by assimilation and intracellular conversion of aldouronates and xylooligosaccharides to fermentable xylose. Paenibacillus sp. strain JDR-2, an aggressively xylanolytic bacterium, secretes a multimodular cell-associated GH10 endoxylanase (XynA1) that catalyzes depolymerization of MeGAX(n), and it rapidly assimilates the principal products: beta-1,4-xylobiose, beta-1,4-xylotriose, and MeGAX(3), the aldotetrauronate 4-O-methylglucuronosyl-alpha-1,2-xylotriose. Genomic libraries derived from this bacterium have now allowed cloning and sequencing of a unique aldouronate utilization gene cluster composed of genes encoding signal transduction regulatory proteins, ABC transporter proteins, and the enzymes AguA (GH67 alpha-glucuronidase), XynA2 (GH10 endoxylanase), and XynB (GH43 beta-xylosidase/alpha-arabinofuranosidase). Expression of these genes, as well as of xynA1 encoding the secreted GH10 endoxylanase, is induced by growth on MeGAX(n) and repressed by glucose. Sequences in the yesN, lplA, and xynA2 genes within the cluster and in the distal xynA1 gene show significant similarity to the catabolite-responsive element (cre) defined in Bacillus subtilis for recognition by the catabolite control protein (CcpA) and consequent repression of catabolic regulons. The aldouronate utilization gene cluster in Paenibacillus sp. strain JDR-2 operates as a regulon, coregulated with the expression of xynA1, conferring the ability to efficiently assimilate and catabolize the aldouronate product generated by a multimodular cell surface-anchored GH10 endoxylanase. This cluster offers a desirable metabolic potential for bacterial conversion of hemicellulose fractions of hardwood and crop residues to biobased products.

  14. Identification, isolation, and analysis of a gene cluster involved in iron acquisition by Pseudomonas mendocina ymp

    PubMed Central

    Awaya, Jonathan D.

    2013-01-01

    Microbial acquisition of iron from natural sources in aerobic environments is a little-studied process that may lead to mineral instability and trace metal mobilization. Pseudomonas mendocina ymp was isolated from the Yucca Mountain Site for long-term nuclear waste storage. Its ability to solubilize a variety of Fe-containing minerals under aerobic conditions has been investigated previously, but its molecular and genetic potential remained uncharacterized. Here, we have shown that the organism produces a hydroxamate-based rather than a catecholate-based siderophore, synthesized via non-ribosomal peptide synthetases. Gene clustering patterns observed in other pseudomonads suggested that hybridizing multiple probes to the same library could allow identification of one or more clusters of syntenic siderophore-associated genes. Using this approach, two independent clusters were identified. An unfinished draft genome sequence of P. mendocina ymp indicated that these mapped to two independent contigs. The sequenced clusters were analyzed bioinformatically and shown to contain, respectively, a potentially complete set of genes responsible for siderophore biosynthesis, uptake, and regulation, and an incomplete set of genes with low individual homology to siderophore-associated genes. A mutation in the cluster’s pvdA homolog (pmhA) resulted in a siderophore-null phenotype, which could be reversed by complementation. The organism likely produces one siderophore, possibly in different isoforms, with a peptide backbone containing seven residues (predicted sequence: Acyl-Asp-Dab-Ser-fOHOrn-Ser-fOHOrn). A similar approach could be applied to the discovery of Fe- and siderophore-associated genes in unsequenced or poorly annotated organisms. PMID:18058194

  15. Comprehensive curation and analysis of fungal biosynthetic gene clusters of published natural products.

    PubMed

    Li, Yong Fuga; Tsai, Kathleen J S; Harvey, Colin J B; Li, James Jian; Ary, Beatrice E; Berlew, Erin E; Boehman, Brenna L; Findley, David M; Friant, Alexandra G; Gardner, Christopher A; Gould, Michael P; Ha, Jae H; Lilley, Brenna K; McKinstry, Emily L; Nawal, Saadia; Parry, Robert C; Rothchild, Kristina W; Silbert, Samantha D; Tentilucci, Michael D; Thurston, Alana M; Wai, Rebecca B; Yoon, Yongjin; Aiyar, Raeka S; Medema, Marnix H; Hillenmeyer, Maureen E; Charkoudian, Louise K

    2016-04-01

    Microorganisms produce a wide range of natural products (NPs) with clinically and agriculturally relevant biological activities. In bacteria and fungi, genes encoding successive steps in a biosynthetic pathway tend to be clustered on the chromosome as biosynthetic gene clusters (BGCs). Historically, "activity-guided" approaches to NP discovery have focused on bioactivity screening of NPs produced by culturable microbes. In contrast, recent "genome mining" approaches first identify candidate BGCs, then express these biosynthetic genes using synthetic biology methods, and finally test for the production of NPs. Fungal genome mining efforts and the exploration of novel sequence and NP space are limited, however, by the lack of a comprehensive catalog of BGCs encoding experimentally validated products. In this study, we generated a comprehensive reference set of fungal NPs whose biosynthetic gene clusters are described in the published literature. To generate this dataset, we first identified NCBI records that included both a peer-reviewed article and an associated nucleotide record. We filtered these records by text and homology criteria to identify putative NP-related articles and BGCs. Next, we manually curated the resulting articles, chemical structures, and protein sequences. The resulting catalog contains 197 unique NP compounds covering several major classes of fungal NPs, including polyketides, non-ribosomal peptides, terpenoids, and alkaloids. The distribution of articles published per compound shows a bias toward the study of certain popular compounds, such as the aflatoxins. Phylogenetic analysis of biosynthetic genes suggests that much chemical and enzymatic diversity remains to be discovered in fungi. Our catalog was incorporated into the recently launched Minimum Information about a Biosynthetic Gene cluster (MIBiG) repository to create the largest known set of fungal BGCs and associated NPs, a resource that we anticipate will guide future genome mining and

  16. A Telomeric Cluster of Antimony Resistance Genes on Chromosome 34 of Leishmania infantum

    PubMed Central

    Tejera Nevado, Paloma; Bifeld, Eugenia; Höhn, Katharina

    2016-01-01

    The mechanisms underlying the drug resistance of Leishmania spp. are manifold and not completely identified. Apart from the highly conserved multidrug resistance gene family known from higher eukaryotes, Leishmania spp. also possess genus-specific resistance marker genes. One of them, ARM58, was first identified in Leishmania braziliensis using a functional cloning approach, and its domain structure was characterized in L. infantum. Here we report that L. infantum ARM58 is part of a gene cluster at the telomeric end of chromosome 34 also comprising the neighboring genes ARM56 and HSP23. We show that overexpression of all three genes can confer antimony resistance to intracellular amastigotes. Upon overexpression in L. donovani, ARM58 and ARM56 are secreted via exosomes, suggesting a scavenger/secretion mechanism of action. Using a combination of functional cloning and next-generation sequencing, we found that the gene cluster was selected only under antimonyl tartrate challenge and weakly under Cu2+ challenge but not under sodium arsenite, Cd2+, or miltefosine challenge. The selective advantage is less pronounced in intracellular amastigotes treated with sodium stibogluconate, possibly due to the known macrophage-stimulatory activity of this drug, against which these resistance markers may not be active. Our data point to the specificity of these three genes for antimony resistance. PMID:27324767

  17. A Telomeric Cluster of Antimony Resistance Genes on Chromosome 34 of Leishmania infantum.

    PubMed

    Tejera Nevado, Paloma; Bifeld, Eugenia; Höhn, Katharina; Clos, Joachim

    2016-09-01

    The mechanisms underlying the drug resistance of Leishmania spp. are manifold and not completely identified. Apart from the highly conserved multidrug resistance gene family known from higher eukaryotes, Leishmania spp. also possess genus-specific resistance marker genes. One of them, ARM58, was first identified in Leishmania braziliensis using a functional cloning approach, and its domain structure was characterized in L. infantum. Here we report that L. infantum ARM58 is part of a gene cluster at the telomeric end of chromosome 34 also comprising the neighboring genes ARM56 and HSP23. We show that overexpression of all three genes can confer antimony resistance to intracellular amastigotes. Upon overexpression in L. donovani, ARM58 and ARM56 are secreted via exosomes, suggesting a scavenger/secretion mechanism of action. Using a combination of functional cloning and next-generation sequencing, we found that the gene cluster was selected only under antimonyl tartrate challenge and weakly under Cu2+ challenge but not under sodium arsenite, Cd2+, or miltefosine challenge. The selective advantage is less pronounced in intracellular amastigotes treated with sodium stibogluconate, possibly due to the known macrophage-stimulatory activity of this drug, against which these resistance markers may not be active. Our data point to the specificity of these three genes for antimony resistance. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  18. Novel linkage disequilibrium clustering algorithm identifies new lupus genes on meta-analysis of GWAS datasets.

    PubMed

    Saeed, Mohammad

    2017-05-01

    Systemic lupus erythematosus (SLE) is a complex disorder. Genetic association studies of complex disorders suffer from three major issues: phenotypic heterogeneity, false-positive (type I error) results, and false-negative (type II error) results. Hence, genes with low to moderate effects are missed in standard analyses, especially after statistical corrections. OASIS is a novel linkage disequilibrium clustering algorithm that can potentially address false positives and negatives in genome-wide association studies (GWAS) of complex disorders such as SLE. OASIS was applied to two SLE dbGAP GWAS datasets (6077 subjects; ∼0.75 million single-nucleotide polymorphisms). OASIS identified three known SLE genes, namely IFIH1, TNIP1, and CD44, that had not previously been reported using these GWAS datasets. In addition, 22 novel loci for SLE were identified, and the 5 SLE genes previously reported using these datasets were verified. The OASIS methodology was validated using single-variant replication and gene-based analysis with GATES, which verified 60% of the OASIS loci. New SLE genes that OASIS identified and that were further verified include TNFAIP6, DNAJB3, TTF1, GRIN2B, MON2, LATS2, SNX6, RBFOX1, NCOA3, and CHAF1B. This study presents the OASIS algorithm, software, and the meta-analyses of two publicly available SLE GWAS datasets, along with the novel SLE genes. OASIS is thus a novel linkage disequilibrium clustering method that can be universally applied to existing GWAS datasets for the identification of new genes.

  19. Non-ribosomal peptide synthetases: Identifying the cryptic gene clusters and decoding the natural product.

    PubMed

    Singh, Mangal; Chaudhary, Sandeep; Sareen, Dipti

    2017-03-01

    Non-ribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs) present in bacteria and fungi are the major multi-modular enzyme complexes that synthesize secondary metabolites such as pharmacologically important antibiotics and siderophores. Each of the multiple modules of an NRPS activates a different amino or aryl acid, and the activated units are then condensed to synthesize a linear or cyclic natural product. Studies on NRPS domains, together with knowledge of their gene cluster architecture and tailoring enzymes, have aided the in silico genetic screening of the ever-expanding body of sequenced microbial genome data to identify novel NRPS/PKS clusters and thus decipher novel non-ribosomal peptides (NRPs). The adenylation domain is an integral part of NRPSs and is the substrate-selecting unit for the final assembled NRP. In some cases, it also requires a small protein, the MbtH homolog, for optimal activity. The presence of putative adenylation domains and MbtH homologs in a sequenced genome can help identify novel secondary metabolite producers. The role of the adenylation domain in NRPS gene clusters and its characterization as a tool for the discovery of novel cryptic NRPS gene clusters are discussed.

  20. COPD subtypes identified by network-based clustering of blood gene expression

    PubMed Central

    Chang, Yale; Glass, Kimberly; Liu, Yang-Yu; Silverman, Edwin K.; Crapo, James D.; Tal-Singer, Ruth; Bowler, Russ; Dy, Jennifer; Cho, Michael; Castaldi, Peter

    2016-01-01

    One of the most common smoking-related diseases, chronic obstructive pulmonary disease (COPD), results from a dysregulated, multi-tissue inflammatory response to cigarette smoke. We hypothesized that systemic inflammatory signals in genome-wide blood gene expression can identify clinically important COPD-related disease subtypes, and we leveraged pre-existing gene interaction networks to guide unsupervised clustering of blood microarray expression data. Using network-informed non-negative matrix factorization, we analyzed genome-wide blood gene expression from 229 former smokers in the ECLIPSE Study, and we identified novel, clinically relevant molecular subtypes of COPD. These network-informed clusters were more stable and more strongly associated with measures of lung structure and function than clusters derived from a network-naïve approach, and they were associated with subtype-specific enrichment for inflammatory and protein catabolic pathways. These clusters were successfully reproduced in an independent sample of 135 smokers from the COPDGene Study. PMID:26773458

  1. A polyketide synthase-peptide synthetase gene cluster from an uncultured bacterial symbiont of Paederus beetles.

    PubMed

    Piel, Jörn

    2002-10-29

    Many drug candidates from marine and terrestrial invertebrates are suspected metabolites of uncultured bacterial symbionts. The antitumor polyketides of the pederin family, isolated from beetles and sponges, are an example. Drug development from such sources is commonly hampered by low yields and the difficulty of sustaining invertebrate cultures. To obtain insight into the true producer and find alternative supplies of these rare drug candidates, the putative pederin biosynthesis genes were cloned from total DNA of Paederus fuscipes beetles, which use this compound for chemical defense. Sequence analysis of the gene cluster and adjacent regions revealed the presence of ORFs with typical bacterial architecture and homologies. The ped cluster, which is present only in beetle specimens with high pederin content, is located on a 54-kb region bordered by transposase pseudogenes and encodes a mixed modular polyketide synthase/nonribosomal peptide synthetase. Notably, none of the modules contains regions with homology to acyltransferase domains, but two copies of isolated monodomain acyltransferase genes were found at the upstream end of the cluster. In line with an involvement in pederin biosynthesis, the upstream cluster region perfectly mirrors pederin structure. The unexpected presence of additional polyketide synthase/nonribosomal peptide synthetase modules reveals surprising insights into the evolutionary relationship between pederin-type pathways in beetles and sponges.

  2. Sequencing and transcriptional analysis of the biosynthesis gene cluster of putrescine-producing Lactococcus lactis.

    PubMed

    Ladero, Victor; Rattray, Fergal P; Mayo, Baltasar; Martín, María Cruz; Fernández, María; Alvarez, Miguel A

    2011-09-01

    Lactococcus lactis is a prokaryotic microorganism of great importance as a starter culture and has become the model species among the lactic acid bacteria. The long and safe history of use of L. lactis in dairy fermentations has resulted in the classification of this species as GRAS (Generally Recognized As Safe) or QPS (Qualified Presumption of Safety). However, our group has identified several strains of L. lactis subsp. lactis and L. lactis subsp. cremoris that are able to produce putrescine from agmatine via the agmatine deiminase (AGDI) pathway. Putrescine is a biogenic amine that confers undesirable flavor characteristics and may even have toxic effects. The AGDI cluster of L. lactis is composed of a putative regulatory gene, aguR, followed by the genes (aguB, aguD, aguA, and aguC) encoding the catabolic enzymes. These genes are transcribed as an operon that is induced in the presence of agmatine. In some strains, an insertion sequence (IS) element interrupts transcription of the cluster, resulting in a non-putrescine-producing phenotype. Based on this knowledge, a PCR-based test was developed to differentiate non-producing L. lactis strains from those with a functional AGDI cluster. Analysis of the AGDI cluster and its flanking regions revealed that the capacity to produce putrescine via the AGDI pathway could be a specific characteristic that was lost during adaptation to the milk environment through reductive genome evolution.

  3. A polyketide synthase-peptide synthetase gene cluster from an uncultured bacterial symbiont of Paederus beetles

    PubMed Central

    Piel, Jörn

    2002-01-01

    Many drug candidates from marine and terrestrial invertebrates are suspected metabolites of uncultured bacterial symbionts. The antitumor polyketides of the pederin family, isolated from beetles and sponges, are an example. Drug development from such sources is commonly hampered by low yields and the difficulty of sustaining invertebrate cultures. To obtain insight into the true producer and find alternative supplies of these rare drug candidates, the putative pederin biosynthesis genes were cloned from total DNA of Paederus fuscipes beetles, which use this compound for chemical defense. Sequence analysis of the gene cluster and adjacent regions revealed the presence of ORFs with typical bacterial architecture and homologies. The ped cluster, which is present only in beetle specimens with high pederin content, is located on a 54-kb region bordered by transposase pseudogenes and encodes a mixed modular polyketide synthase/nonribosomal peptide synthetase. Notably, none of the modules contains regions with homology to acyltransferase domains, but two copies of isolated monodomain acyltransferase genes were found at the upstream end of the cluster. In line with an involvement in pederin biosynthesis, the upstream cluster region perfectly mirrors pederin structure. The unexpected presence of additional polyketide synthase/nonribosomal peptide synthetase modules reveals surprising insights into the evolutionary relationship between pederin-type pathways in beetles and sponges. PMID:12381784

  4. Cloning, sequencing, analysis, and heterologous expression of the fredericamycin biosynthetic gene cluster from Streptomyces griseus.

    PubMed

    Wendt-Pienkowski, Evelyn; Huang, Yong; Zhang, Jian; Li, Bensheng; Jiang, Hao; Kwon, Hyungjin; Hutchinson, C Richard; Shen, Ben

    2005-11-30

    Fredericamycin (FDM) A, a pentadecaketide featuring two sets of peri-hydroxy tricyclic aromatic moieties connected through a unique chiral spiro carbon center, exhibits potent cytotoxicity and has been studied as a new type of anticancer drug lead because of its novel molecular architecture. The fdm gene cluster was localized to a 33-kb DNA segment of Streptomyces griseus ATCC 49344, and its involvement in FDM A biosynthesis was proven by gene inactivation, complementation, and heterologous expression experiments. The fdm cluster consists of 28 open reading frames (ORFs), encoding a type II polyketide synthase (PKS) and tailoring enzymes as well as several regulatory and resistance proteins. The FDM PKS features a KSalpha subunit with previously unseen tandem cysteines at its active site, a KSbeta subunit that is phylogenetically distinct from the KSbeta of hexa-, octa-, or decaketide PKSs, and a dedicated phosphopantetheinyl transferase. Further study of the FDM PKS could provide new insight into how a type II PKS controls chain length in aromatic polyketide biosynthesis. The availability of the fdm genes, in vivo characterization of the fdm cluster in S. griseus, and heterologous expression of the fdm cluster in Streptomyces albus set the stage to investigate FDM A biosynthesis and to engineer the FDM biosynthetic machinery for the production of novel FDM A analogues.

  5. A Comparison of Fuzzy Clustering Approaches for Quantification of Microarray Gene Expression

    PubMed Central

    Wang, Yu-Ping; Gunampally, Maheswar; Chen, Jie; Bittel, Douglas; Butler, Merlin G.; Cai, Wei-Wen

    2016-01-01

    Despite the widespread application of microarray imaging in biomedical research, barriers still exist regarding its reliability for clinical use. A major problem lies in accurate spot segmentation and the quantification of gene expression levels (mRNA) from microarray images. A variety of commercial and research freeware packages are available, but most cannot handle array spots with complex shapes such as donuts and scratches. Clustering approaches such as k-means and mixture models were introduced to overcome this difficulty, but they use hard labeling of each pixel. In this paper, we apply fuzzy clustering approaches to spot segmentation, which provide soft labeling of each pixel. We compare several fuzzy clustering approaches for microarray analysis and provide a comprehensive study of these approaches for spot segmentation. We show that possibilistic c-means clustering (PCM) provides the best performance in terms of a stability criterion when tested on a variety of simulated and real microarray images. In addition, we compare three statistical criteria for measuring gene expression levels and show that a new asymptotically unbiased statistic quantifies gene expression levels more accurately. PMID:28163819

  6. Genome-wide upstream motif analysis of Cryptosporidium parvum genes clustered by expression profile

    PubMed Central

    2013-01-01

    Background: There are very few molecular genetic tools available to study the apicomplexan parasite Cryptosporidium parvum. The organism is not amenable to continuous in vitro cultivation or transfection, and purification of intracellular developmental stages in sufficient numbers for most downstream molecular applications is difficult and expensive since animal hosts are required. As such, very little is known about gene regulation in C. parvum. Results: We have clustered whole-genome gene expression profiles generated from a previous study of seven post-infection time points of 3,281 genes to identify genes that show similar expression patterns throughout the first 72 hours of in vitro epithelial cell culture. We used the algorithms MEME, AlignACE and FIRE to identify conserved, overrepresented DNA motifs in the upstream promoter region of genes with similar expression profiles. The most overrepresented motifs were E2F (5′-TGGCGCCA-3′); G-box (5′-G.GGGG-3′); a well-documented ApiAP2 binding motif (5′-TGCAT-3′), and an unknown motif (5′-[A/C] AACTA-3′). We generated a recombinant C. parvum DNA-binding protein domain from a putative ApiAP2 transcription factor [CryptoDB: cgd8_810] and determined its binding specificity using protein-binding microarrays. We demonstrate that cgd8_810 can putatively bind the overrepresented G-box motif, implicating this ApiAP2 in the regulation of many gene clusters. Conclusion: Several DNA motifs were identified in the upstream sequences of gene clusters that might serve as potential cis-regulatory elements. These motifs, in concert with protein DNA binding site data, establish for the first time the beginnings of a global C. parvum gene regulatory map that will contribute to our understanding of the development of this zoonotic parasite. PMID:23895416

  7. Genome-wide upstream motif analysis of Cryptosporidium parvum genes clustered by expression profile.

    PubMed

    Oberstaller, Jenna; Joseph, Sandeep J; Kissinger, Jessica C

    2013-07-29

    There are very few molecular genetic tools available to study the apicomplexan parasite Cryptosporidium parvum. The organism is not amenable to continuous in vitro cultivation or transfection, and purification of intracellular developmental stages in sufficient numbers for most downstream molecular applications is difficult and expensive since animal hosts are required. As such, very little is known about gene regulation in C. parvum. We have clustered whole-genome gene expression profiles generated from a previous study of seven post-infection time points of 3,281 genes to identify genes that show similar expression patterns throughout the first 72 hours of in vitro epithelial cell culture. We used the algorithms MEME, AlignACE and FIRE to identify conserved, overrepresented DNA motifs in the upstream promoter region of genes with similar expression profiles. The most overrepresented motifs were E2F (5'-TGGCGCCA-3'); G-box (5'-G.GGGG-3'); a well-documented ApiAP2 binding motif (5'-TGCAT-3'), and an unknown motif (5'-[A/C] AACTA-3'). We generated a recombinant C. parvum DNA-binding protein domain from a putative ApiAP2 transcription factor [CryptoDB: cgd8_810] and determined its binding specificity using protein-binding microarrays. We demonstrate that cgd8_810 can putatively bind the overrepresented G-box motif, implicating this ApiAP2 in the regulation of many gene clusters. Several DNA motifs were identified in the upstream sequences of gene clusters that might serve as potential cis-regulatory elements. These motifs, in concert with protein DNA binding site data, establish for the first time the beginnings of a global C. parvum gene regulatory map that will contribute to our understanding of the development of this zoonotic parasite.

  8. Evolution of the Genome 3D Organization: Comparison of Fused and Segregated Globin Gene Clusters.